Welcome to the Statistics help!

Follow me on Twitter or YouTube for examples of data visualizations.

2021-08-18: Panel regression with fixed effects

2021-04-26: Transform panel data between long and wide format using reshape

2021-04-26: Regression with interaction - continuous variables

2021-03-24: Logistic regression

2021-02-24: Create index variables

2020-11-18: Center, standardize and normalize variables

2020-10-19: Set up data for panel analysis

GENERAL

Glossary with short descriptions of common terms

Glossary of words that you may encounter when doing statistical analysis or working with data.

STATA

Different operations required to prepare the data for analysis.

Getting started with Stata

The different parts of the program, setting a project folder, loading data, do-files, etc.

Create datasets and import data

Import data or create a dataset from scratch.

Recode variables

Change or remove certain values from variables to prepare them for analysis, using the commands "recode", "generate" and "replace".

Center, standardize and normalize variables

Three common transformations of variables: centering, standardizing and normalizing.

Create an index variable

Create index variables that combine values of several variables, and check the reliability using Cronbach's alpha.

If qualifiers and conditions

Use conditions to run analyses and other commands on selected groups of observations.

Combining datasets

Add data from other sources with the command "merge".

Logarithms

Use the logarithmic transformation on variables to account for skewness, for instance arising from exponential growth.

Aggregate datasets

Use the command "collapse" to aggregate datasets to show statistics such as means and standard deviations for groups in the data.

Regression analysis is a common tool for statistical analysis, used to investigate relationships between two or more variables.

Introduction

Begin here. The basic principles, with two variables.

Interpret the results

What the different parts of Stata's output from regression analysis mean, with annotated output.

Control variables

Add control variables to account for alternative explanations.

Predict values

Use the regression equation to predict values - guesses - for observations in the data.

Dummy variables

Use dummy variables to include categorical variables in the analysis.

Logarithmic variables

Run and interpret analyses with logarithmic variables, for instance to account for diminishing effects.

Logistic regression

A type of regression analysis suited for dependent variables that take only two values, 0 or 1. How to run and interpret the analysis, and how it differs from OLS.

Interaction effects - two values

Effects that vary over two groups in the sample.

Interaction effects - continuous variables

Effects that vary over continuous variables.

Tables for presenting results from regression analyses

Create nice tables for presenting regression results with the command esttab.

Get an overview of the data before proceeding to more advanced analysis.

Simple descriptive statistics

Use the commands codebook, summarize and tab to quickly find out the mean, median, min and max values (among other things) for a variable.

Mean values (averages) in different groups

Compare groups in a straightforward way by looking at their mean values, using the commands sum and table.

t-test

Test differences between groups for statistical significance.

Correlation

A simple and very common measure of the strength and direction of the association between two variables.

Crosstabs

Relationships between two categorical variables shown with percentages.

Various techniques for visualizing data and relationships.

Histograms

Show the distribution of a variable with bars of different heights.

Bar charts

Averages for different groups shown with bars.

Scatterplots

Show relationships between two variables with points.

Line graphs

Show how a variable has changed over time with line graphs.

Maps of the world and regions with spmap

Maps that show countries' values on different variables with colors.

Visualize regression coefficients with coefplot

Present regression coefficients and confidence intervals graphically, using the command coefplot.

Work with time in Stata, either for one unit (time series) or many (panel data).

Setting up data for time series

Set time variable, lags, leads, delta variable, plot data over time.

Setting up panel data

Set panel and time variable, the difference between wide and long data, common error messages.

Transform panel data between long and wide with reshape

How to transform data between the two data formats for panel data, wide format and long format, using the command reshape.

Panel regression with fixed effects

How to use and understand so-called "fixed effects" in regression analysis of panel data.
