Welcome to the Statistics help!

Here you can find simple guides to different techniques for statistical analysis with the software Stata. The focus is on running and interpreting the analyses, not on the theory and assumptions that underpin them. The guides show the code and output from the statistics software, together with explanations in text. All code is meant to be reproducible, so if you want, you can download the data linked in the guides and follow along with the instructions.

The site is run by Anders Sundell.

Updates:

2020-03-11: Guide to using t-tests to calculate statistical significance.

2020-01-29: Guide to calculating mean values (averages) in different groups.

2020-01-24: Guide to simple descriptive statistics.

2020-01-13: Guide to visualizing regression coefficients with coefplot.

2020-01-08: Guide to line graphs.

GENERAL

Glossary with short descriptions of common terms

Glossary of words that you may encounter when doing statistical analysis or work with data.

STATA

Regression analysis: a common tool for statistical analysis, used to investigate relationships between two or more variables.

Introduction

Begin here. The basic principles, with two variables.

Control variables

Add control variables to account for alternative explanations.

Dummy variables

Use dummy variables to include categorical variables in the analysis.

Logarithmic variables

Run and interpret analyses with logarithmic variables, for instance to account for diminishing effects.

Interaction effects - two values

Effects that vary between two groups in the sample.

Tables for presenting results from regression analyses

Create nice tables for presenting regression results with the command esttab.

Different operations required to prepare the data for analysis.

Getting started with Stata

The different parts of the program, setting a project folder, loading data, do-files, etc.

Create datasets and import data

Import data or create a dataset from scratch.

Recode variables

Change or remove certain values from variables to prepare them for analysis, using the commands recode, generate and replace.

If qualifiers and conditions

Use conditions to run analyses and other commands on selected groups of observations.

Combining datasets

Add data from other sources with the command merge.

Logarithms

Use the logarithmic transformation on variables to account for skewness, for instance arising from exponential growth.

Various techniques for visualizing data and relationships.

Histograms

Show the distribution of a variable with bars of different heights.

Bar charts

Averages for different groups shown with bars.

Scatterplots

Show relationships between two variables with points.

Line graphs

Show how a variable has changed over time with line graphs.

Maps of the world and regions with spmap

Maps that show countries' values on different variables with colors.

Visualize regression coefficients with coefplot

Present regression coefficients and confidence intervals graphically, using the command coefplot.

Get an overview of the data before proceeding to more advanced analysis.

Simple descriptive statistics

Use the commands codebook, summarize and tab to quickly find out the mean, median, min and max values (among other things) for a variable.

Mean values (averages) in different groups

Compare groups in a straightforward way by looking at their mean values, using the commands sum and table.

t-test

Test differences between groups for statistical significance.

Correlation

A simple and very common measure of the strength and direction of the association between two variables.

Crosstabs

Relationships between two categorical variables shown with percentages.
