By Anders Sundell
Scatterplots are good for showing relationships between continuous variables whose values are spread evenly across a scale. But survey data, for instance, often offer only a limited number of response options, so respondents cluster in a relatively small number of combinations. This makes scatterplots less informative.
An alternative way to visualize the information is then to use bar charts. One of Stata's commands for them lets us quickly summarize variables and present the results in an easily understandable way. In this example we'll look closer at the relationship between education and income in the United States, with the aid of the American General Social Survey. It can be downloaded at https://gss.norc.org/get-the-data/stata and we will use the 2018 edition.
cd "/Users/anderssundell/Dropbox/Jupyter/stathelp/data/"
use "GSS2018.dta", clear
The variable for education is degree, which has five values. The variable for income, realrinc, is a scale that shows the respondent's income in dollars. The United States is known for having a high "education premium," which means that additional education generally comes with higher pay. Wage differences are large.
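Before plotting, it can help to look at the two variables directly. The following is a minimal sketch, assuming the GSS 2018 file is already loaded as above; it only uses the variable names mentioned in the text.

```stata
* Frequency table of the education categories, including missing values
tabulate degree, missing

* Detailed summary of income: mean, median (50th percentile) and extremes
summarize realrinc, detail
```

The frequency table shows how many respondents fall into each education category, and the detailed summary shows whether the income distribution is skewed.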
Let us now make a bar chart with the graph bar command. We will state that we want the bars to show the mean of the variable realrinc, and that we want one bar for each value of the variable degree. Since degree only has five values, we get five bars.
graph bar (mean) realrinc, over(degree)
The diagram reveals big differences: those who have a bachelor's degree earn about twice as much as those who only went to high school or junior college.
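To see the numbers behind the bars, one option is to tabulate the same means. This is a sketch using tabstat, which is not part of the original walkthrough; the format() option is only there to make the output easier to read.

```stata
* Mean income and number of observations for each education level
tabstat realrinc, by(degree) statistics(mean n) format(%9.0fc)
```

The means in the table should match the heights of the bars, and the counts show how many respondents each bar is based on.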
If we want horizontal bars (which might be a good idea if the category labels contain a lot of text), it is easily done: we just switch graph bar to graph hbar in the code.
graph hbar (mean) realrinc, over(degree)
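With horizontal bars it is sometimes easier to compare groups if the bars are ordered by height rather than by category value. A possible variation, not used in the rest of this example, is to add the sort() suboption inside over():

```stata
* Horizontal bars ordered by mean income, highest at the top
graph hbar (mean) realrinc, over(degree, sort(1) descending)
```

Here sort(1) orders the categories by the first (and only) plotted variable, and descending puts the largest value first. Whether this is preferable depends on the data: for an ordered variable such as degree, the original category order may be just as informative.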
The bar chart gives a quick overview of how much the different groups make. But if we want to know the exact amounts, it might be a good idea to display the actual values as well. We do that by adding labels that show the height of the bars.
To do that we need the option blabel(bar), which we add after the option over. We can also change the bars from showing the mean to showing the median, which is often advisable when working with incomes, since the few people with extreme incomes push up the mean a lot.
We will also make two cosmetic changes: we will change the text on the y axis to something neater with the option ytitle, and we will change the color of the bars with the option bar(1, color(purple)).
graph bar (median) realrinc, over(degree) blabel(bar) ytitle("Median income in dollars") bar(1, color(purple))
Now we can see the pattern even more clearly. The median income of those with a graduate degree is almost four times that of those with the lowest level of education.
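To check how much the choice between mean and median matters in each group, the two can be put side by side. This is a small sketch with tabstat, again assuming the same loaded dataset:

```stata
* Compare mean and median income within each education level
tabstat realrinc, by(degree) statistics(mean median) format(%9.0fc)
```

Where the mean is clearly higher than the median, a few high incomes are pulling the mean upward, which is the reason for preferring the median here.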
As with all graphs, you can spend infinite time polishing them to get them exactly the way you want. But you can get really far with just a few adjustments as well. A nice bar chart often does wonders to show the main tendencies in a dataset (and is generally more readable to a lay audience than a scatterplot).