Assignment #4

Assignment #4 Multivariable modeling and Collinearity - 2019

Case 1 - Data; Truth NO Collinearity

In this example you have two continuous factors (e.g., elevation [meters] and degrees north latitude [degrees]), and one categorical factor with three groups (country). The dependent variable is lotus plant size (grams). In this system, there is no collinearity among your variables.

1. Write a linear (statistical) model that describes a single multivariable system containing all the x variables; be sure you specify what each ‘x’ represents.

2. What do the various β’s mean in this model?

3. Analyze the data in R using a single multivariable model (lm). Write example sentences that you might include in a manuscript for publication to describe the observed results. You will need a total of 5 sentences for this part; be sure to include a sentence comparing Nepal to Tibet!

Optional: For your own edification, try running a post-hoc test on your multivariable model to determine the differences among the three countries. Note that you will have to convert your model to an AOV before you do so, and will have to specify the variable you want to run the post-hoc test on. While it may run, are there any red flags that indicate it didn't do quite what you wanted it to?
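
As background for the categorical factor in Case 1, the sketch below shows how a three-level factor is usually turned into 0/1 indicator columns (treatment coding) before it enters a linear model. R's lm() does this automatically for factors, using the first level as the reference by default. The sketch is in Python only to keep the arithmetic visible; "CountryC" is a placeholder for the third country, and the choice of Nepal as the reference level is an assumption made for illustration.

```python
# Hypothetical sketch: treatment (dummy) coding of a three-level factor.
# "Nepal" and "Tibet" come from the assignment; "CountryC" is a placeholder
# for the third group, and the reference level is an assumption.

def dummy_code(values, reference):
    """Return (level_names, rows), where each row holds one 0/1 indicator
    for every level except the reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

countries = ["Nepal", "Tibet", "CountryC", "Tibet", "Nepal"]
levels, rows = dummy_code(countries, reference="Nepal")

print("indicator columns:", levels)
for country, row in zip(countries, rows):
    print(f"{country:10s} -> {row}")
```

Rows for the reference level are all zeros, so each indicator's coefficient is read as a difference from the reference group.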

Case 2 – Dataset1; Truth

In this example you have two continuous factors (e.g., plant food abundance and understory cover) that might influence the density of bunnies. In this case, the two continuous variables are collinear and are potentially confounding (more bunnies if more food because they have to eat, but also more bunnies if more understory cover because bunnies have to hide from predators). The data was created such that bunny density is truly affected by both understory cover and food abundance. Understory cover ranges from 0 to 1 (proportion of a density board seen from 15 meters away). Food is kg of browse per square meter and is collinear with understory cover (bunnies tend to eat their cover). The response (bunnies) is bunnies / hectare. Examine the equations to see how the data was made and get the values of truth.

1. Import the data (saved as a .csv file).
2. Plot the relationship between understory cover (x) and food (y). Paste the graph in a Word document.
3. Calculate the r^2 between food and understory cover. Report this value in your Word document.
4. Plot the relationship between bunny density (y) and understory cover (x). Paste the graph in your Word document.
5. Plot the relationship between bunny density (y) and food (x). Paste the graph in your Word document.
6. Run a simple regression between bunny density and food. Report your results in your Word document using the standard sentence.
7. Run a simple regression between bunny density and cover. Report your results in your Word document using the standard sentence.
8. Run a multiple regression between bunny density (y) and both cover (x) and food (x). Report your results in your Word document using the standard sentence (or in this case, two sentences; one for each x).
9. Calculate the VIF for food and understory cover. Report the VIF in your Word document.
10. In your Word document, describe what happened in steps 6, 7, and 8 above. Be sure to discuss:

The coefficient estimates of the explanatory (x) variable(s) relative to truth and other models run
What your final model would be in this analysis (be sure to explain why) and how you would deal with the collinearity among variables.
What you’ve learned from the exercise.
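
The r^2 and VIF steps above are closely linked: with exactly two predictors, the variance inflation factor of either one is 1 / (1 - r^2), where r^2 is the squared correlation between them. The sketch below illustrates that arithmetic on simulated data. It is written in Python only to show the calculation step by step (in R the VIF is typically obtained from a function such as vif() in the car package), and the slope and noise values used to generate the data are invented for illustration; they are not the assignment's truth values.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r_squared(x1, x2))

# Simulated, hypothetical data: food tracks cover plus random noise.
random.seed(1)
cover = [random.uniform(0, 1) for _ in range(100)]
food = [2.0 * c + random.gauss(0, 0.2) for c in cover]

r2 = r_squared(cover, food)
vif = vif_two_predictors(cover, food)
print(f"r^2 between cover and food: {r2:.3f}")
print(f"VIF for either predictor:   {vif:.2f}")
```

A VIF of 1 means a predictor is uncorrelated with the other; the value grows without bound as r^2 approaches 1.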

Case 3 – Dataset2; Truth

In this example, you have two continuous factors (sediment load and amount of organic material) that might influence water clarity. In this case, the two continuous variables are collinear, but are potentially redundant (both are basically indicators of run-off, which is what really drives water clarity). The data was created such that water clarity is a function of run-off. The run-off data is in the Excel file, but it is not in the csv file, and you should pretend it is data you didn’t collect and thus don’t have. Sediment and organic material are both closely correlated with run-off, but sediment is more closely related to run-off and thus is a better ‘index’ of run-off. Sediment and organic matter might be in grams per cubic meter, run-off might be in cubic feet per minute, and clarity might be the depth at which a Secchi disk can be seen (in centimeters), although I’m not a limnologist, so don’t expect the relationships or numbers to be realistic. Examine the equations to see how the data was made and get the values of truth.

1. Import the data (saved as a .csv file).
2. Plot the relationship between sediment (x) and organic matter (y). Paste the graph in a Word document.
3. Calculate the r^2 between sediment and organic matter. Report this value in your Word document.
4. Plot the relationship between clarity (y) and sediment (x). Paste the graph in your Word document.
5. Plot the relationship between clarity (y) and organic matter (x). Paste the graph in your Word document.
6. Run a simple regression between clarity (y) and sediment (x). Report your results in your Word document using the standard sentence.
7. Run a simple regression between clarity (y) and organic matter (x). Report your results in your Word document using the standard sentence.
8. Run a multiple regression between clarity (y) and both sediment (x) and organic matter (x). Report your results in your Word document using the standard sentence (again, actually 2 sentences).
9. Calculate the VIF for sediment and organic matter. Report this value in your Word document.
10. In your Word document, describe what happened in steps 6, 7, and 8 above. Be sure to discuss:

The coefficient estimates of the explanatory (x) variable(s) and how they changed throughout the analysis (and why)
What your final model would be in this analysis (be sure to explain why) and how you would deal with the collinearity among variables.
What you’ve learned from the exercise.
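
To see why coefficient estimates can shift between the simple and multiple regressions when two predictors are redundant indices of an unmeasured driver, the sketch below simulates that situation and fits all three models by ordinary least squares. It is written in Python only to expose the arithmetic (the assignment itself uses lm in R). Every number in the data-generating lines is invented for illustration and is not the truth behind Dataset2; the ols helper is a hypothetical function defined here, not part of any library.

```python
import random

def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given predictor
    columns. Returns [intercept, slope_1, slope_2, ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Build the normal equations (X'X) b = X'y as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                factor = A[r][k] / A[k][k]
                A[r] = [a - factor * b for a, b in zip(A[r], A[k])]
    return [A[r][p] / A[r][r] for r in range(p)]

# Hypothetical data: run-off drives clarity; sediment is a tight index of
# run-off and organic matter is a noisier one. All values are made up.
random.seed(42)
runoff = [random.uniform(10, 50) for _ in range(200)]
sediment = [2.0 * r + random.gauss(0, 3) for r in runoff]
organic = [1.0 * r + random.gauss(0, 8) for r in runoff]
clarity = [300 - 4.0 * r + random.gauss(0, 10) for r in runoff]

fit_sed = ols([sediment], clarity)            # clarity ~ sediment
fit_org = ols([organic], clarity)             # clarity ~ organic
fit_both = ols([sediment, organic], clarity)  # clarity ~ sediment + organic

print(f"simple:   sediment slope = {fit_sed[1]:.3f}")
print(f"simple:   organic slope  = {fit_org[1]:.3f}")
print(f"multiple: sediment slope = {fit_both[1]:.3f}, "
      f"organic slope = {fit_both[2]:.3f}")
```

In this simulated setup, the weaker index (organic matter) looks influential on its own, but its slope shrinks once the better index (sediment) is in the same model, because both carry the same run-off signal.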