Steury lab page - Auburn University

Data and Analysis Considerations Checklist

This checklist is intended to help researchers keep track of the various things they need to THINK about when conducting statistical analysis. A lot of students have asked for a step-by-step checklist of procedures for analyzing data. However, data analysis cannot (or at least should not) follow a 'cookbook'. Every data set is different. Every scientific question is different. Thus, a 'one-size-fits-all' recipe book for statistics simply isn't possible. Instead, you should THINK about your data. Successfully doing so should guide you in how to analyze your data.

Note that it's probably a good idea to think about these things BEFORE you collect the data.

Here's the checklist.

WHAT IS THE SCIENTIFIC (NOT STATISTICAL) QUESTION YOU ARE TRYING TO ANSWER?
What statistical modeling procedure will HELP you to answer the question above? (Note that statistics should not, and generally cannot, provide you with the answer; it is a tool to help you answer the question.) Answering this may require information from the items below.
What is the statistical goal of your analysis? Is it to:
Determine if an effect or difference is significant?
Estimate the size of an effect or difference?
Make predictions?
What are the independent variables of interest? Are there other independent variables that you aren't interested in, but that may swamp out the effects of variables you are interested in?
Are your independent variables continuous or categorical? Ask this question for each of your independent variables.
For each continuous independent variable, is it possible that its effect on the response is non-linear?
Are other assumptions of your statistical procedure possibly violated (e.g., homoscedasticity, no autocorrelation in the data)?
Are interactions among your independent variables likely to exist?
Is there collinearity among your independent variables? Of course there is; this is ecology! HOW STRONG IS IT? How will you deal with the collinearity? Are there independent variables that you haven't measured that might have confounding effects?
Should you treat your independent variables as random or fixed? Are there any blocking factors or other random effects you need to include in your model?
Are there any measurements that were repeated on a subject? If so, do you have to worry about autocorrelation among those measurements, or can you simply treat the subject as a blocking factor?
Are any variables nested in other variables? In other words, do you need to be concerned about pseudoreplication in your analysis?
What is the nature of your response variable? Is it:
Continuous, normally distributed?
Count data?
Binomial data (coin flip)?
Data that follow some other distribution?
Do you have multiple responses of interest and, if so, should you use multivariate statistics?
How will you build your statistical model?
No Building - just run an analysis of the global model
Build the model using stepwise procedures
Use AIC to find a best model, possibly with multimodel inference of coefficient estimates
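
To make the AIC option above a little more concrete, here is a minimal sketch (not a recipe!) of comparing two candidate models by AIC and converting the AIC values into Akaike weights. Everything in it is an assumption made for illustration: the data are invented, the function names (fit_line, gaussian_aic, akaike_weights) are hypothetical, and the AIC formula used is the least-squares form n*ln(RSS/n) + 2k, which assumes normally distributed errors and drops an additive constant that cancels when the models are fit to the same data.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def gaussian_aic(residuals, n_coef):
    """AIC (up to an additive constant) for a least-squares model.

    Assumes normal errors. k counts the regression coefficients
    plus one for the estimated residual variance.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    k = n_coef + 1
    return n * math.log(rss / n) + 2 * k


def akaike_weights(aics):
    """Relative support for each model, scaled to sum to 1."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]


# Invented data, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]

# Candidate 1: intercept-only model (the response is just its mean).
mean_y = sum(y) / len(y)
aic_null = gaussian_aic([yi - mean_y for yi in y], n_coef=1)

# Candidate 2: straight-line model with intercept and slope.
a, b = fit_line(x, y)
aic_line = gaussian_aic([yi - (a + b * xi) for xi, yi in zip(x, y)], n_coef=2)

weights = akaike_weights([aic_null, aic_line])
print("AIC (intercept only):", aic_null)
print("AIC (straight line): ", aic_line)
print("Akaike weights:      ", weights)
```

A lower AIC indicates more support for a model relative to the other candidates, and the weights express that support on a 0-to-1 scale. Note that this only ranks the models you chose to compare; deciding which models belong in the candidate set is still something you have to THINK about.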
