Fall 2019

 

Assignment #3 - Analysis of Categorical Data

 

Dataset #1 - For this dataset, you are conducting an experiment to see if there is a difference in quail density (Y; quail/ha) between predator exclusion plots and control plots (X; groups). Technically, this is a t-test. However, for consistency, I want you to use the lm function, NOT the t.test function. Report your results using the standard sentence.
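
As a rough illustration of why lm can stand in for a t-test here, the sketch below fits a two-group linear model to simulated data. The data frame `quail` and the column names `Treatment` and `Density` are hypothetical placeholders, not the names in the assignment file. The call to t.test appears only to confirm that the two approaches agree; it is not the analysis to report.

```r
# Hypothetical simulated data; the real column names and values will differ.
set.seed(1)
quail <- data.frame(
  Treatment = factor(rep(c("Control", "Exclusion"), each = 10)),
  Density   = c(rnorm(10, mean = 2.0, sd = 0.5),
                rnorm(10, mean = 2.8, sd = 0.5))
)

# Two-group linear model; "Control" is the reference level (alphabetical order).
fit <- lm(Density ~ Treatment, data = quail)
summary(fit)   # TreatmentExclusion = Exclusion mean minus Control mean
confint(fit)   # 95% confidence interval for that difference

# Shown only as a check: an equal-variance t-test gives the same p-value.
tt <- t.test(Density ~ Treatment, data = quail, var.equal = TRUE)
tt$p.value
```

The estimate, confidence interval, and p-value for the `TreatmentExclusion` row are the pieces that go into the standard sentence.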

Dataset #2 - For this dataset, you are conducting an experiment to see if bobwhite quail density (Y; quail/ha) varies between habitat types (X). The habitat types examined are: open pine, closed pine, prairie/pasture, and ag. Answer the following questions:

1. Run an ANOVA in R. Write a sentence describing the results (i.e., p-value). Typically, this sentence is something like "ANOVA indicated that at least two of the [x-variable] were significantly different from each other (p = [p-value])."

2. Plot the relationship between habitat types and quail density in R and paste the graph in your Word document.

3. Run a Tukey's HSD post-hoc test in R. Using the standard language we have been using throughout the class, describe the estimated differences between Closed Pine and Ag and between Open Pine and Ag (be sure to include confidence intervals and p-values).
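
The three steps above (ANOVA, plot, Tukey's HSD) might look something like the following sketch. The data are simulated, and the names `quail2`, `Habitat`, and `Density`, along with the habitat labels, are assumptions for illustration only.

```r
# Hypothetical simulated data; names and values are placeholders.
set.seed(2)
habitat_levels <- c("Ag", "ClosedPine", "OpenPine", "Prairie")
quail2 <- data.frame(
  Habitat = factor(rep(habitat_levels, each = 8)),
  Density = rnorm(32, mean = rep(c(1.0, 1.5, 3.0, 2.5), each = 8), sd = 0.6)
)

# 1. One-way ANOVA: overall test that at least two habitat means differ.
fit_aov <- aov(Density ~ Habitat, data = quail2)
summary(fit_aov)

# 2. Boxplot of density by habitat type.
boxplot(Density ~ Habitat, data = quail2,
        xlab = "Habitat type", ylab = "Quail density (quail/ha)")

# 3. Tukey's HSD: all pairwise differences with 95% CIs and adjusted p-values.
tukey <- TukeyHSD(fit_aov)
tukey
```

Each row of the Tukey output is labeled as one level minus another (for example `ClosedPine-Ag`), with columns `diff`, `lwr`, `upr`, and `p adj`.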

Dataset #3 - For this dataset, you are examining how deer kill rates (Y; deer killed / hunter) are related to hunting method (X; the variable is called 'Hunt'). Types of hunting methods considered in this study include rifle, bow, and Uzi. Answer the following questions:

1. Run an ANOVA in R. Write a sentence (like we discussed in class) describing the results (i.e., p-value).

2. Plot the relationship between hunting method and deer kill rates in R and paste the graph in your Word document.

3. Run a Tukey's HSD post-hoc test in R. Using the standard language we have been using throughout the class, describe the estimated differences among all hunting methods (be sure to include confidence intervals and p-values).

4. For this last dataset, in Excel, I'd like you to dummy code the data. Make 3 new columns - one for each hunting method - and assign 0s and 1s as appropriate. You do not need to use the dummy coded data in your analysis; I just want to make sure you can do it.
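
The assignment asks for the dummy coding to be done in Excel, where a formula along the lines of `=IF(A2="rifle",1,0)` would work, assuming the Hunt value sits in cell A2. For comparison, here is a minimal sketch of the same idea in R. The six rows below are made up, and the new column names `Rifle`, `Bow`, and `Uzi` are only a suggestion.

```r
# Hypothetical example rows; only the column name 'Hunt' comes from the assignment.
hunt <- data.frame(Hunt = c("rifle", "bow", "Uzi", "bow", "rifle", "Uzi"))

# One 0/1 indicator column per hunting method.
hunt$Rifle <- as.integer(hunt$Hunt == "rifle")
hunt$Bow   <- as.integer(hunt$Hunt == "bow")
hunt$Uzi   <- as.integer(hunt$Hunt == "Uzi")

hunt
```

A quick sanity check is that every row has exactly one 1 across the three new columns.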

Dataset #4 - For this dataset, you are examining how fertilizer (X; grams) influences biomass production (Y; kg/ha) in restored prairie plots. Fertilizer has been applied at 0 (control), 2, 4, 6, 8, and 10 grams, each in 5 plots. Run the following analyses:

1. Run an analysis to determine the relationship between biomass and fertilizer where fertilizer is treated as a categorical variable. Because we won't be conducting a post-hoc test, use the lm function for this analysis. Report the difference in biomass between 2 grams and 0 grams of fertilizer using the standard sentence.

2. Continuing from question 1, report the difference in biomass between 2 grams and 4 grams of fertilizer using the standard sentence. Don't get this answer by using a Tukey's post-hoc test. Use the lm function.

3. Run an analysis to determine the relationship between Biomass and Fertilizer, but in this case, treat Fertilizer as a continuous variable. Report the results using the standard sentence.

4. Run an F-drop test to compare the model where Fertilizer was treated as categorical (from questions 1 and 2) to the model where Fertilizer was treated as continuous (from question 3). What are the F-statistic and p-value from this F-drop test?

5. Based on the results from 4, which model is preferred: the one where Fertilizer is continuous or the one where Fertilizer is categorical?

6. Continuing from question 5, what does that mean about the nature of the relationship between Biomass and Fertilizer?
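
One possible way to set up the Dataset #4 analyses is sketched below on simulated data. The data frame `prairie` and the helper columns `FertF` and `FertF2ref` are hypothetical. The simulated relationship is deliberately linear, so the output says nothing about the real data. The sketch assumes that the F-drop test corresponds to comparing the two nested models with `anova()`.

```r
# Hypothetical simulated data: 6 fertilizer levels x 5 plots each.
set.seed(4)
prairie <- data.frame(Fertilizer = rep(c(0, 2, 4, 6, 8, 10), each = 5))
prairie$Biomass <- 100 + 12 * prairie$Fertilizer + rnorm(30, sd = 15)

# Questions 1: fertilizer as a categorical variable (reference level = 0 g).
prairie$FertF <- factor(prairie$Fertilizer)
fit_cat <- lm(Biomass ~ FertF, data = prairie)
summary(fit_cat)    # FertF2 = mean biomass at 2 g minus mean at 0 g
confint(fit_cat)

# Question 2: change the reference level to 2 g and refit with lm.
prairie$FertF2ref <- relevel(prairie$FertF, ref = "2")
fit_cat2 <- lm(Biomass ~ FertF2ref, data = prairie)
summary(fit_cat2)   # FertF2ref4 = mean biomass at 4 g minus mean at 2 g
confint(fit_cat2)

# Question 3: fertilizer as a continuous variable (one slope).
fit_cont <- lm(Biomass ~ Fertilizer, data = prairie)
summary(fit_cont)

# Question 4: F-drop test comparing the simpler (continuous) model
# to the more complex (categorical) model.
fdrop <- anova(fit_cont, fit_cat)
fdrop
```

The continuous model is nested within the categorical one, so the second row of the `anova()` table tests whether the four extra parameters in the categorical model explain significantly more variation than a single straight line.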