Assignment #9 - 2019

Part 1

In part 1 of this assignment, we will be conducting an AIC analysis. In this study, we looked at the presence or absence of red-cockaded woodpeckers in study plots as a function of various habitat attributes of those plots, including stand age, whether the plot is burned regularly, tree density, and road density (What kind of model is this?). You are to run an AIC analysis on all possible model combinations of the four variables (an "all-subsets analysis"; that's 16 models).

Data to be analyzed

Truth (you don't need this for your analysis - it's included here just for your information)

Using the MuMIn package in R, conduct an all-subsets analysis of the data set above, complete with multi-model inference. Write a paragraph that provides the following information (typical of published research). What was the best model? What was the Akaike weight of the best model? What are the Akaike weights of the parameters considered (careful with wording here: use wording that reports the Akaike weights but also interprets what those weights mean)? Write standard sentences explaining the estimated effects of the variables as determined by the model-averaged coefficient estimates. Since this is a glm, don't forget to report exp(beta) as 'times as likely'. Also, since we can't use the confint function to get the confidence limits, use LCL = exp(beta - 2*SE) and UCL = exp(beta + 2*SE) to calculate the limits, where SE is the unconditional standard error that MuMIn provides.
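Here is a minimal sketch of that workflow, assuming the data are in a data frame called 'rcw' with a 0/1 presence column 'present' and predictors 'stand_age', 'burned', 'tree_density', and 'road_density' (all placeholder names; substitute the column names in the provided data set).

```r
library(MuMIn)

# Global model: presence/absence response, so a binomial (logistic) glm.
# na.action = na.fail is required before dredge() will run.
global <- glm(present ~ stand_age + burned + tree_density + road_density,
              family = binomial, data = rcw, na.action = na.fail)

# All-subsets analysis: dredge() fits and ranks all 16 candidate models
all_models <- dredge(global)
all_models                      # model-selection table with AICc and Akaike weights

# Multi-model inference: model-averaged coefficients and unconditional SEs
avg <- model.avg(all_models)
summary(avg)
sw(avg)                         # Akaike weights summed for each parameter

# exp(beta) effect sizes with approximate confidence limits built from the
# unconditional SEs, as described above
ct   <- coefTable(avg, full = TRUE)   # column 1 = estimate, column 2 = SE
beta <- ct[, 1]
se   <- ct[, 2]
data.frame(exp_beta = exp(beta),
           LCL      = exp(beta - 2 * se),
           UCL      = exp(beta + 2 * se))
```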

Part 2

In part 2 of the assignment, you will be evaluating the predictive performance of a model using cross-validation.

In this study, you are trying to predict the age (in months) of a study species (your choice) as a function of body length (cm), sex, mass (mg, g, or kg), and color saturation (% of maximum). Note that in the real world, your continuous variables would likely be collinear, but to keep things simple these data have no collinearity in them.

Data to be analyzed

Truth

First, divide the dataset into a training dataset (75% of data) and a testing dataset (25% of data). Run an analysis on the training dataset (all variables are significant; no need for model building). Then use those results to evaluate the predictive performance of the model using the testing dataset.
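Here is a minimal sketch of the split and the training-data fit, assuming the data are in a data frame called 'critters' with columns 'age', 'length', 'sex', 'mass', and 'saturation' (all placeholder names; substitute the columns in the provided data set).

```r
set.seed(42)                                    # make the random split repeatable
n         <- nrow(critters)
train_idx <- sample(n, size = round(0.75 * n))  # 75% of rows chosen at random
train_dat <- critters[train_idx, ]
test_dat  <- critters[-train_idx, ]             # the remaining 25% for testing

# Fit the full model on the training data only
fit <- lm(age ~ length + sex + mass + saturation, data = train_dat)
summary(fit)
```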

Report the r^2 between the model predictions and the y-values from the testing dataset. Do you think this is good predictive ability?

Also, plot the relationship between model predictions and y-values from the testing dataset.
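Continuing the sketch above (same placeholder object and column names), the test-set predictions, the r^2, and the plot might look like this:

```r
# Predict ages for the held-out 25% of the data
preds <- predict(fit, newdata = test_dat)

# r^2 between model predictions and observed test-set ages
cor(preds, test_dat$age)^2

# Observed vs. predicted, with a dashed 1:1 reference line
plot(preds, test_dat$age,
     xlab = "Predicted age (months)", ylab = "Observed age (months)")
abline(0, 1, lty = 2)
```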

Finally, use the 'train' function from the 'caret' package to run a k-fold (10 folds, 1 repeat) cross-validation of your model.

Report the average RMSE (+/- SD) and r^2 (+/- SD) for those 10 folds. Write standard sentences describing the effect of each x-variable on Age; use the optimal betas provided by the k-fold cross-validation training procedure (just leave off C.I. and p-values, since R doesn't provide measures of uncertainty for those optimal values).
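A minimal sketch of the caret call, again using the placeholder data frame and column names from the sketches above; whether to cross-validate on the full data set or only the training data is left to you.

```r
library(caret)

# 10-fold cross-validation, 1 repeat
ctrl   <- trainControl(method = "cv", number = 10)
cv_fit <- train(age ~ length + sex + mass + saturation,
                data      = critters,
                method    = "lm",
                trControl = ctrl)

cv_fit                     # average RMSE and R-squared across the 10 folds
cv_fit$resample            # per-fold RMSE and R-squared; use mean() and sd() on these
coef(cv_fit$finalModel)    # the optimal betas from the training procedure
```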