Assignment #4 Multivariable modeling and Collinearity - 2019
Case 1
In this example you have two continuous factors (e.g.,
elevation [meters] and degrees north latitude [degrees]), and one categorical
factor with three groups (country). The dependent variable is lotus plant size (grams). In this system,
there is no collinearity among your variables.
1. Write a linear (statistical) model that describes a single multivariable system containing all the x variables; be sure you specify what each ‘x’ represents.
2. What do the various β’s mean in this model?
3. Analyze the data in R using a single multivariable model (lm). Write example sentences that you might include in a manuscript for publication to describe the observed results.
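As a rough sketch of what question 3 asks for, the code below fits a single lm with two continuous predictors and one three-level factor. The data are simulated here only so the example runs on its own; the column names (elevation, latitude, country, size), the country labels, and every parameter value are assumptions for illustration, not the assignment data.

```r
# Simulated stand-in data; all names and numbers are illustrative assumptions.
set.seed(1)
n <- 90
lotus <- data.frame(
  elevation = runif(n, 0, 2000),                           # meters
  latitude  = runif(n, 10, 40),                            # degrees north
  country   = factor(rep(c("A", "B", "C"), each = n / 3))  # three groups
)

# Assumed relationship used only to generate the example response (grams).
country_effect <- c(A = 0, B = 15, C = -10)
lotus$size <- 200 - 0.05 * lotus$elevation + 2 * lotus$latitude +
  unname(country_effect[as.character(lotus$country)]) + rnorm(n, sd = 5)

# One model containing all x variables. R dummy-codes the factor, so the
# coefficients are: intercept (country A), two slopes, and two contrasts
# (country B vs A, country C vs A).
fit <- lm(size ~ elevation + latitude + country, data = lotus)
summary(fit)   # coefficient estimates, standard errors, t tests
anova(fit)     # sequential F tests, one row per term
```

With the real data you would replace the simulation with read.csv() and use your own column names; the model formula keeps the same shape.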
Case 2
In this example you have two continuous factors (e.g.,
plant food abundance and understory cover) that might influence the density of
bunnies. In this case, the two continuous variables are collinear and are
potentially confounding (more bunnies with more food because they have to eat,
but also more bunnies with more understory cover because they have to hide from
predators). The data were created such that bunny density is truly affected by
both understory cover and plant food abundance. Understory cover ranges from
0 to 1 (the proportion of a density board seen from 15 meters away). Food is kg
of browse per square meter and is collinear with understory cover (bunnies tend
to eat their cover). The response (bunnies) is bunnies / hectare. Examine the
equations to see how the data were made and to get the true coefficient values
(truth).
Import the data (saved as a .csv file)
Plot the relationship between understory cover (x) and food (y)
Calculate the r^2 between food and understory cover. Report this value in your Word document.
Plot the relationship between bunny density (y) and understory cover (x)
Run a simple regression between bunny density and cover. Report your results in your Word document using the standard sentence.
Run a multiple regression between bunny density (y) and both cover (x) and food (x)
In your Word document, describe what happened in 6, 7, and 8 above. Be sure to discuss:
The coefficient estimates of the explanatory (x) variable(s) relative to truth and other models run
What your final model would be in this analysis (be sure to explain why) and how you would deal with the collinearity among variables.
What you’ve learned from the exercise.
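The steps in this case might be sketched roughly as follows. Because the assignment's csv file is not reproduced here, the sketch simulates collinear data instead; the file name in the comment, the column names (cover, food, bunnies), and all "true" coefficients are assumptions chosen for illustration and are not the values in the assignment's equations.

```r
# Simulated stand-in data; with the real file you would instead use something
# like: dat <- read.csv("bunnies.csv")   # hypothetical file name
set.seed(42)
n <- 100
cover   <- runif(n, 0, 1)                        # proportion, 0 to 1
food    <- 2 * cover + rnorm(n, sd = 0.2)        # assumed to track cover closely
bunnies <- 10 + 20 * cover + 5 * food + rnorm(n, sd = 2)  # assumed "truth"
dat <- data.frame(cover, food, bunnies)

# Collinearity between the two explanatory variables.
plot(food ~ cover, data = dat)
r2  <- cor(dat$cover, dat$food)^2
vif <- 1 / (1 - r2)    # variance inflation factor when there are two predictors

# Response against one predictor, then the simple and multiple regressions.
plot(bunnies ~ cover, data = dat)
simple   <- lm(bunnies ~ cover, data = dat)
multiple <- lm(bunnies ~ cover + food, data = dat)

# In the simple model the cover slope also absorbs the effect of food, so under
# these assumed parameters it should sit near 20 + 5 * 2 = 30, not 20.
coef(simple)
coef(multiple)
summary(multiple)$coefficients   # note the standard errors under collinearity
```

Comparing the cover coefficient and its standard error between the two models, and against the values used to build the data, is the kind of comparison the discussion points above ask for.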
Case 3
In this example, you have two continuous factors (sediment
load and amount of organic material) that might influence water clarity. In this
case, the two continuous variables are collinear but potentially redundant
(both are basically indicators of run-off, which is what really drives water
clarity). The data were created such that water clarity is a function of
run-off. The run-off data are in the Excel file but not in the csv file, and you
should pretend they are data you didn’t collect and thus don’t have. Sediment
and organic material are both closely correlated with run-off, but sediment is
more closely related to run-off and thus is a better ‘index’ of run-off.
Sediment and organic matter might be in grams per cubic meter, run-off in cubic
feet / minute, and clarity the depth at which a Secchi disk can be seen (in
centimeters), although I’m not a limnologist, so don’t expect the relationships
or numbers to be realistic. Examine the equations to see how the data were made
and to get the true coefficient values (truth).
Import the data (saved as a .csv file)
Plot the relationship between sediment (x) and organic matter (y)
Calculate the r^2 between sediment and organic matter
Run a simple regression between clarity (y) and sediment
The coefficient estimates of the explanatory (x) variable(s) and how they changed throughout the analysis (and why)
What your final model would be in this analysis (be sure to explain why) and how you would deal with the collinearity among variables.
What you’ve learned from the exercise.
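To make the idea of redundant predictors concrete, here is a minimal sketch that simulates two noisy indices of one unobserved driver and compares candidate models. The variable names (runoff, sediment, organic, clarity), the data frame name lake, and every coefficient and noise level are assumptions for illustration, not the values in the assignment's Excel file.

```r
# Simulated stand-in data; all numbers are illustrative assumptions.
set.seed(7)
n <- 100
runoff   <- runif(n, 10, 100)                      # the real driver
sediment <- 3.0 * runoff + rnorm(n, sd = 10)       # tighter index of run-off
organic  <- 1.5 * runoff + rnorm(n, sd = 25)       # looser index of run-off
clarity  <- 300 - 2 * runoff + rnorm(n, sd = 10)   # depends on run-off only

# Run-off is deliberately left out, mimicking data you did not collect.
lake <- data.frame(sediment, organic, clarity)

# Collinearity between the two indices.
r2 <- cor(lake$sediment, lake$organic)^2

# Candidate models: each index alone, then both together.
m_sed  <- lm(clarity ~ sediment, data = lake)
m_org  <- lm(clarity ~ organic, data = lake)
m_both <- lm(clarity ~ sediment + organic, data = lake)

# Compare how the sediment coefficient and the fit change across models.
coef(m_sed)
coef(m_both)
AIC(m_sed, m_org, m_both)
sapply(list(sediment = m_sed, organic = m_org, both = m_both),
       function(m) summary(m)$r.squared)
```

Under these assumptions neither index has its own effect on clarity, so the comparison is mainly about which single index stands in best for run-off and whether the second adds anything once the first is in the model.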