By Anders Sundell

Scatterplots are good for showing relationships between continuous variables, with values evenly spread across a scale. But survey data, for instance, often has a limited number of response options, which means that respondents are concentrated in a relatively small number of combinations. This makes scatterplots less informative.

An alternative way to visualize the information is then to use bar charts. One of Stata's commands for them lets us quickly summarize variables and present the results in an easily understandable way. In this example we'll look closer at the relationship between education and income in the United States, with the aid of the American General Social Survey. It can be downloaded from https://gss.norc.org/get-the-data/stata and we will use the 2018 edition.

In [35]:
cd "/Users/anderssundell/Dropbox/Jupyter/stathelp/data/"
use "GSS2018.dta", clear
/Users/anderssundell/Dropbox/Jupyter/stathelp/data

The variable for education is degree and has five values. The variable for income, realrinc, shows the respondent's income in dollars. The United States is known for having a high "education premium," which means that more education generally pays off in higher income. Wage differences are large.
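Before plotting, it can help to look at the two variables directly. The following is a minimal sketch, assuming the dataset loaded above is still in memory. The output is not reproduced here, since it depends on the data file.

```stata
* Frequency table of the education categories in degree
tabulate degree

* Distribution of income, including percentiles, to see how skewed it is
summarize realrinc, detail
```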

Let us now make a bar chart with the graph bar command. We will state that we want the bars to show the mean of the variable realrinc, and that we want one bar for each value of the variable degree. Since degree only has five values, we get five bars.

In [17]:
graph bar (mean) realrinc, over(degree)
[Figure: bar chart of the mean of realrinc by degree (lt high school, high school, junior college, bachelor, graduate); y axis from 0 to 50,000.]

The diagram reveals big differences: Those who have a bachelor's degree earn about twice as much as those who only went to high school or junior college.
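A reading like "about twice as much" can be checked against the numbers behind the bars. One way to do this, sketched here under the assumption that the same dataset is in memory, is the tabstat command, which prints the statistics that the chart draws.

```stata
* Mean income and number of respondents with a valid income,
* one row per education category
tabstat realrinc, by(degree) statistics(mean n)
```

The n column is worth a glance as well, since a mean based on few respondents is less reliable than the bar suggests.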

Horizontal bars

If we want horizontal bars (which might be a good idea if one has a lot of text in the category labels) it is easily done: We just switch graph bar to graph hbar in the code.

In [18]:
graph hbar (mean) realrinc, over(degree)
[Figure: horizontal bar chart of the mean of realrinc by degree (lt high school, high school, junior college, bachelor, graduate); value axis from 0 to 50,000.]
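Horizontal bars leave room for longer category labels. The sketch below uses the relabel() suboption of over() to spell out two of the categories. The replacement texts are only illustrations and not part of the dataset, and the /// line continuations assume the code is run from a do-file or a notebook cell rather than typed line by line.

```stata
* relabel() refers to categories by position (1 = first category),
* not by the underlying value of degree.
* The new label texts are hypothetical examples.
graph hbar (mean) realrinc, ///
    over(degree, relabel(1 "Less than high school" 5 "Graduate degree")) ///
    ytitle("Mean income in dollars")
```

Note that ytitle still refers to the numeric axis, even though that axis is drawn horizontally in graph hbar.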

Add labels

The bar chart gives a quick overview of how much the different groups make. But if we want to know the exact amounts, it is a good idea to display the actual values as well. We do that by adding labels that show the height of the bars.

To do that we need the option blabel(bar), which we add after the option over. We can also try to change the bars from showing the mean to the median, which is often advisable when you work with incomes, since the few with extreme incomes push up the mean a lot.

We will also make two cosmetic changes: We will change the text on the y axis to something neater with the option ytitle, and we will change the color of the bars with bar(1, color(purple)).

In [38]:
graph bar (median) realrinc, over(degree) blabel(bar) ytitle("Median income in dollars") bar(1, color(purple))
[Figure: bar chart of the median of realrinc by degree, with bar labels: lt high school 9647.5, high school 12485, junior college 17025, bachelor 24970, graduate 37455; y axis "Median income in dollars" from 0 to 40,000.]

Now we can see the pattern even more clearly. The median income of those with a graduate degree (37,455) is almost four times that of those with the lowest level of education (9,647.5).
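To see how much the choice between mean and median matters, both can be drawn side by side. The following is a sketch, assuming the same dataset is in memory: mean_inc and med_inc are names made up here so that realrinc can be used twice, the axis title is an illustration, and format(%9.0fc) rounds the bar labels to whole dollars with thousands separators. The first line checks the ratio between the highest and lowest median, using the values from the bar labels.

```stata
* Ratio of the highest to the lowest median, taken from the bar labels
display %4.2f 37455/9647.5

* Mean and median of income in the same chart, one pair of bars per category.
* mean_inc and med_inc are illustrative names, not variables in the dataset.
graph bar (mean) mean_inc=realrinc (median) med_inc=realrinc, ///
    over(degree) blabel(bar, format(%9.0fc)) ///
    legend(order(1 "Mean" 2 "Median")) ///
    ytitle("Income in dollars")
```

If the mean bars are clearly higher than the median bars within a category, that points to a few high incomes pulling the mean upward.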

Conclusion

As with all graphs, you can spend infinite time polishing them to get them exactly the way you want. But one can get really far with just a few adjustments as well. A nice bar chart often does wonders to show the main tendencies in a dataset (and is generally more readable to a lay audience than a scatterplot).