This checklist is intended to help researchers
keep track of the things they need to THINK about
when conducting a statistical analysis. Many students
have asked for a step-by-step checklist of procedures
for analyzing data. However, data analysis cannot (or
at least should not) follow a 'cookbook'. Every data
set is different. Every scientific question is
different. Thus, a 'one-size-fits-all' recipe book
for statistics simply isn't possible. Instead, you
should THINK about your data. Successfully doing so
should guide you in how to analyze your data.
Note that it's probably a good
idea to think about these things BEFORE you collect
the data.
Here's the checklist.
-
WHAT IS THE SCIENTIFIC (NOT
STATISTICAL) QUESTION YOU ARE TRYING TO ANSWER?
-
What statistical modeling
procedure will HELP you to answer the question
above? (Note that statistics should not, and
generally cannot, provide you with the answer;
it is a tool to help you answer the question.)
Answering this may require information from the
items below.
-
What is the statistical goal
of your analysis? Is it to:
-
Determine if an effect or
difference is significant?
-
Estimate the size of an effect
or difference?
-
Make predictions?
-
What are the independent
variables of interest? Are there other
independent variables that you aren't interested
in, but that may swamp out the effects of
variables you are interested in?
-
Are your independent variables
continuous or categorical? Ask this question for
each of your independent variables.
-
For each continuous
independent variable, is it possible that its
effect on the response is non-linear?
-
Are other assumptions of your
statistical procedure possibly violated (e.g.,
homoscedasticity, no autocorrelation in the
data)?
-
Are interactions among your
independent variables likely to exist?
-
Is there collinearity among
your independent variables? Of course there is;
this is ecology! HOW STRONG IS IT? How will you
deal with the collinearity? Are there
independent variables that you haven't measured
that might have confounding effects?
-
Should you treat your
independent variables as random or fixed? Are
there any blocking factors or other random
effects you need to include in your model?
-
Are there any measurements
that were repeated on a subject? If so, do you
have to worry about autocorrelation among those
measurements, or can you simply treat the subject as
a blocking factor?
-
Are any variables nested in
other variables? In other words, do you need to
be concerned about pseudoreplication in your
analysis?
-
What is the nature of your
response variable? Is it:
-
Continuous, normally
distributed?
-
Count data?
-
Binomial data (coin flip)?
-
Data that follow some other
distribution?
-
Do you have multiple responses
of interest and, if so, should you use
multivariate statistics?
-
How will you build your
statistical model?
-
No model building: just run an
analysis of the global model
-
Build the model using stepwise
procedures
-
Use AIC to find a best model,
possibly with multimodel inference of
coefficient estimates
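
As an illustration of the AIC option in the last item, here is a minimal
sketch in Python of comparing a few candidate models by AIC and converting
the scores to Akaike weights. Everything in it is an assumption made for
the example: the data are made up, the variable names (rainfall, elevation,
biomass) are hypothetical, the candidates are limited to an intercept-only
model and two single-predictor linear models, and the errors are assumed to
be normally distributed. It is not a recommendation for any particular data
set.

```python
import math


def rss_null(y):
    """Residual sum of squares for the intercept-only model y = a."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)


def rss_simple_ols(x, y):
    """Residual sum of squares for the least-squares fit of y = a + b*x."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def gaussian_aic(rss, n, n_coef):
    """AIC for a normal-errors model, up to an additive constant.

    k counts the regression coefficients plus the residual variance.
    """
    k = n_coef + 1
    return n * math.log(rss / n) + 2 * k


def akaike_weights(aics):
    """Convert a dict of AIC values to weights that sum to 1."""
    best = min(aics.values())
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Made-up data, for illustration only.
rainfall = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
elevation = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
biomass = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

n = len(biomass)
aics = {
    "null": gaussian_aic(rss_null(biomass), n, n_coef=1),
    "rainfall": gaussian_aic(rss_simple_ols(rainfall, biomass), n, n_coef=2),
    "elevation": gaussian_aic(rss_simple_ols(elevation, biomass), n, n_coef=2),
}
weights = akaike_weights(aics)
best_model = min(aics, key=aics.get)

for name in sorted(aics, key=aics.get):
    print(f"{name:10s} AIC = {aics[name]:8.2f}  weight = {weights[name]:.3f}")
```

Only differences in AIC between models fitted to the same response are
meaningful, which is why the additive constant can be dropped. With a
sample this small, the small-sample correction (AICc) would usually be the
safer choice. The set of candidate models should still come from thinking
about the scientific question, not from the software.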