Introduction to Applied Time Series Regression

Henry Thompson

Auburn University

 

Suppose an economic model suggests that xt should affect yt, with zt as a control variable.  In general functional form the model is yt = f(xt, zt).  Ideally yt would be endogenous and xt and zt exogenous in the underlying economic model.  Economics generally studies systems of equations but this introduction focuses on estimating a single equation. 

Each time series variable depends on its own history, and estimating these underlying processes comes before estimating the relationship of interest between xt and yt.  The best predictor of yt+1 is typically its own history, which includes the influence of all related variables.  In economic theory the focus is on model specification and parameter estimation.  Empirical models isolate and quantify the effects of exogenous variables, and may suggest ways to improve theory. 

Variables in OLS regressions should be normally distributed around a constant mean, with deviations from the mean following a white noise error term.  Variables with trends have low peaks and fat tails and are not normally distributed.  With a positive trend, early observations are below the mean and later ones above the mean.  Standard errors are calculated based on constant means and normal distributions.  If variables are nonstationary then standard errors are understated.

Consider the OLS regression

yt = α0 + α1xt + α2zt + εt          (1)

The goal of the project is to interpret theory in terms of the estimated coefficient α1 = ∂yt/∂xt.  Begin with a theoretical model and then derive (1) to be able to relate estimated coefficients to the theoretical model.  Rely on theory to suggest exogenous variables.  Either xt or zt can represent more than a single variable.  The model yt = α0 + α1xt + εt without the control variable should be estimated and compared to (1).  The residual εt has to be white noise (WN) with a mean close to zero, low autocorrelation, and constant variance. 
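As a minimal sketch of estimating (1) and comparing it to the model without the control variable, the following uses simulated data and a small normal-equations solver.  The helper name ols, the sample size, and the coefficient values are illustrative assumptions, not part of the text.

```python
import random

def ols(y, columns):
    """OLS coefficients for y on an intercept plus the given regressor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

# Hypothetical stationary data: z is the control, x is correlated with z,
# and the assumed true model is y = 1 + 2x - 1.5z + white noise.
random.seed(0)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + random.gauss(0, 1) for zi in z]
y = [1.0 + 2.0 * xi - 1.5 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

full = ols(y, [x, z])      # estimates of alpha0, alpha1, alpha2 in (1)
restricted = ols(y, [x])   # model without the control variable

print("with control:   ", [round(b, 2) for b in full])
print("without control:", [round(b, 2) for b in restricted])
```

Because x is built to be correlated with z, dropping the control changes the estimated coefficient on x, which is the reason for comparing the two regressions.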

The ultimate form of the regression may not be as simple as (1) since OLS assumes normally distributed variables but time series variables typically have trends, may have structural breaks, and may be heteroskedastic with a changing variance over time. 

A key concept in applied time series analysis is stationarity.  A stationary process has a long history and is converging to its dynamic steady state.  Stationarity is a weaker condition than a constant mean but regressions with stationary variables may produce reliable statistics.  The key test is whether the residual εt is white noise (WN).  

The typical issue in time series is that variables are not normally distributed and the OLS regression (1) has understated standard errors.  If theory suggests xt should affect yt and both have trends, they will be correlated and estimated coefficients will appear significant but explanatory power is overstated. 

An OLS regression with nonstationary variables leads to autocorrelation, indicated by a significant correlation corr(εt, εt-1) in the residual series εt.  Autocorrelation implies information remains in the residual and something else must affect yt in a systematic way.  A pattern in the residual suggests there is something more that can be explained, requiring either a different model or transformed variables. 
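The residual check corr(εt, εt-1) can be sketched as follows.  The function name lag1_autocorr and the two simulated series are assumptions for illustration: one series is white noise and the other is a random walk built from the same shocks.

```python
import random

def lag1_autocorr(e):
    """Sample correlation between e[t] and e[t-1]."""
    m = sum(e) / len(e)
    num = sum((e[t] - m) * (e[t - 1] - m) for t in range(1, len(e)))
    den = sum((v - m) ** 2 for v in e)
    return num / den

# Hypothetical series: white noise shocks and their running sum (a random walk).
random.seed(1)
white = [random.gauss(0, 1) for _ in range(500)]
walk, level = [], 0.0
for shock in white:
    level += shock
    walk.append(level)

print("white noise:", round(lag1_autocorr(white), 3))
print("random walk:", round(lag1_autocorr(walk), 3))
```

A common rule of thumb treats a lag-one autocorrelation outside roughly plus or minus 2 divided by the square root of the number of observations as significant.  A residual series that behaves like the random walk here signals that information remains in the residual.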

Understated standard errors with autocorrelation imply overstated coefficient significance and explanatory power.  A spurious OLS regression has biased and underestimated variances, inflated t-statistics, and an inflated R².  With autocorrelated residuals alone, coefficient estimates remain unbiased, just as likely above as below the true value.  They are also consistent, converging to the true value as the number of observations increases and the variance approaches zero.  When the variables are cointegrated, estimated coefficients are in fact super consistent, converging at a faster rate as the number of observations increases. 

If series are not stationary they may be difference stationary random walks, and OLS regressions in differences are then reliable.  A difference stationary random walk series may also be cointegrated, related through an error correction process that adjusts relative to the long run dynamic relationship between variables. 
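The contrast between levels and differences can be illustrated with a small simulation, offered as a sketch under stated assumptions.  Pairs of independent random walks are regressed on each other, first in levels and then in first differences, and the share of slope t-statistics above 2 in absolute value is counted.  The function names, sample size, and number of replications are illustrative choices.

```python
import math
import random

def slope_t_stat(y, x):
    """t-statistic of the slope in an OLS regression of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2) / sxx)
    return b / se

def random_walk(n, rng):
    """Cumulative sum of n standard normal shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

def diff(series):
    """First differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]

def rejection_rates(reps=200, n=100, seed=2):
    """Share of |t| > 2 in levels and in differences for unrelated random walks."""
    rng = random.Random(seed)
    levels = diffs = 0
    for _ in range(reps):
        x, y = random_walk(n, rng), random_walk(n, rng)
        levels += abs(slope_t_stat(y, x)) > 2
        diffs += abs(slope_t_stat(diff(y), diff(x))) > 2
    return levels / reps, diffs / reps

level_rate, diff_rate = rejection_rates()
print("share significant in levels:     ", level_rate)
print("share significant in differences:", diff_rate)
```

Since the two series are unrelated by construction, a reliable test should find a significant slope only about 5 percent of the time.  The levels regressions reject far more often, while the regressions in differences stay close to the nominal rate.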

Economic models with more than a single dependent variable can be solved in reduced form with each dependent variable a function of all exogenous variables.  Consider the market model that determines endogenous price P and quantity Q = D = S from the demand function D = D(P,y) and supply function S = S(P,w).  The exogenous variables in the model are the demand shifter y and the supply shifter w.  It would be appropriate to estimate Q and P as functions of y and w but inappropriate to estimate Q as a function of P.  This identification problem should be addressed in deriving (1).  Theory is flexible in that various assumptions about endogeneity lead to different regression models.
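As a worked example of the reduced form, assume linear demand and supply functions.  The linear forms and the parameter names a and b are assumptions made only for illustration.

```latex
\begin{align*}
D &= a_0 - a_1 P + a_2 y, \qquad S = b_0 + b_1 P - b_2 w, \qquad a_1, b_1 > 0 \\
\intertext{Setting $D = S$ and solving for the endogenous price and quantity gives}
P &= \frac{(a_0 - b_0) + a_2 y + b_2 w}{a_1 + b_1} \\
Q &= \frac{(a_0 b_1 + a_1 b_0) + a_2 b_1 y - a_1 b_2 w}{a_1 + b_1}
\end{align*}
```

Both P and Q are functions of y and w only, so each reduced form equation can be estimated by OLS.  A regression of Q on P would mix the demand and supply slopes, since shifts in either y or w move P and Q together.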

As another example, consider the IS-LM-BP model with national income Y a function of exogenous variables including government spending G, money supply Ms, the foreign interest rate r*, and foreign income Y*.  The domestic interest rate r is endogenous and should not be on the right hand side in a regression with the dependent variable national income.  A floating exchange rate e would be an endogenous variable with the balance of payments B exogenous since e adjusts if B ≠ 0, but a managed exchange rate would be exogenous.  

The time series processes of the variables involved ultimately determine their form in the regression, and lagged effects may be important.  For instance, an increase in the price of coffee might raise the demand for tea next year.  The empirical model should then check for lagged effects of exogenous variables.
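Checking for a lagged effect amounts to aligning yt with xt-k before running the regression.  The sketch below uses the coffee and tea example with simulated data.  The helper names, the one period lag, and the assumed coefficient of 0.8 are hypothetical.

```python
import random

def lag_pairs(y, x, k):
    """Align y[t] with x[t-k], dropping the first k observations of y."""
    return y[k:], x[:len(x) - k]

def simple_ols(y, x):
    """Intercept and slope from an OLS regression of y on a constant and x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: tea demand responds to last period's coffee price.
random.seed(4)
n = 300
coffee = [random.gauss(10, 1) for _ in range(n)]
tea = [5.0 + random.gauss(0, 0.5)]  # first period has no lagged price
for t in range(1, n):
    tea.append(5.0 + 0.8 * coffee[t - 1] + random.gauss(0, 0.5))

for k in (0, 1):
    y_k, x_k = lag_pairs(tea, coffee, k)
    intercept, slope = simple_ols(y_k, x_k)
    print("lag", k, "slope", round(slope, 2))
```

In this constructed example the contemporaneous regression misses the effect, while the regression on the lagged price recovers it.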

Regression options include transforming variables with logarithms, differences, inverses, and lags, and detrending.  The error correction model (ECM) includes the residual εt of (1) in a second stage difference model that separates transitory adjustment from adjustment toward the dynamic equilibrium. 
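The two stages of an error correction model can be sketched as follows, assuming the common form in which the residual enters the difference model lagged one period.  The data are simulated so that x is a random walk and y stays tied to x through a stationary error.  The helper ols and all parameter values are assumptions for illustration, and the control variable zt is omitted for brevity.

```python
import random

def ols(y, columns):
    """OLS coefficients for y on an intercept plus the given regressor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

# Hypothetical cointegrated pair: x is a random walk, u is a stationary AR(1),
# and the assumed long run relationship is y = 1 + 2x + u.
random.seed(3)
n = 400
x, u = [], []
level = shock = 0.0
for _ in range(n):
    level += random.gauss(0, 1)
    shock = 0.5 * shock + random.gauss(0, 1)
    x.append(level)
    u.append(shock)
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# Stage 1: levels regression and its residual.
a0, a1 = ols(y, [x])
resid = [yi - a0 - a1 * xi for xi, yi in zip(x, y)]

# Stage 2: difference model with the lagged residual as the error correction term.
dy = [y[t] - y[t - 1] for t in range(1, n)]
dx = [x[t] - x[t - 1] for t in range(1, n)]
lagged_resid = resid[:-1]  # resid[t-1] paired with the change from t-1 to t
b0, b1, gamma = ols(dy, [dx, lagged_resid])

print("long run slope:        ", round(a1, 2))
print("transitory effect:     ", round(b1, 2))
print("error correction term: ", round(gamma, 2))
```

The coefficient on the differenced variable captures transitory adjustment.  A negative coefficient on the lagged residual indicates adjustment back toward the long run relationship.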

Theory is the guide to variable selection and endogeneity.  Regression results might suggest ways to refine theory.  Empirical analysis should lead to subsequent theoretical and empirical analysis.  Beyond the term project, other time periods or variables can be examined. 

SECTIONS

White noise

Stationary variables

Stationary with a structural break

Difference stationary variables

Unit root with a structural break

Difference models

Error correction models

Lagged transformation models

Detrending

Event Breaks

Other Models: 2SLS, VAR, Causality, Conditional Mean and Variance

Conclusion

The primary goals of time series regression analysis are to interpret economic theory in terms of estimated coefficients and suggest ways to refine theory.  Successful applied time series analysis requires the combination of solid economic theory with reliable time series techniques.  In the results, discuss only significant coefficients and not the signs of insignificant coefficients.  Work through the algebra of any differences, residuals, or lags and relate results to theory.  Report the best possible regression results with residuals as close to WN as possible.  Advanced techniques deal with optimal lags, endogenous influences across processes, simultaneous equations, simultaneous estimation of time varying variance, endogenous structural breaks, instrumental variables, and so on.

This Introduction to Applied Time Series Regression provides the foundation for a term project.  Send an email for information on the full text.