Assignment #10 - Multivariate Analyses - 2019

Part 1

In the first part of this assignment, you are interested in how cheetah morphometrics might influence their success at predation. The data for this part of the assignment is here. You caught and radio-collared 150 cheetahs. At the time of capture, you measured body length (snout-vent), body mass, hind-foot length (HFL), and tail length (bigger tails = better balance when cornering), with all lengths in cm. You followed the cheetahs for 1 month and recorded the average number of prey killed per week for each animal. Each row in your dataset represents a single animal.

You want to run a multiple regression analyzing the relationship between predation rate and morphometrics. Specifically, you hypothesize that larger animals should have higher predation rates, but you are also interested in whether specific morphometrics influence predation (e.g., longer tails help predation because of balance, longer legs help predation because of greater speed, etc.). Unfortunately, you have very strong collinearity among your x-variables: so strong that none of the parameters are significant when you run the model (although the model itself is significant). You'll need PCA to remove the collinearity.

1. Run an lm between Predation rate and the morphometric measures. Write appropriate sentences describing your results (should be four sentences here).

2. Run the PCA on the four morphometric measures. To make your results easier to interpret, do not scale your data before performing the PCA! You can either create a new data frame that removes Predation, or you can use a formula statement in princomp (i.e., ~BodyLength+Mass+HFL+Tail), which leaves out the y-variable.

3. What proportion of the variance in the morphometric data does each of the principal components explain?

4. Based upon the loadings, give your description of the meaning of each of the principal components. Don't just describe them, but try to come up with a caricature that captures the meaning of the component. Given that these are animals, you might try to describe them using stereotypical body types from dogs or other animals (e.g., wiener dog, bulldog, elephants for big ears, etc.). However, any caricature is fine.

5. Run an lm between Predation rate and the PCA scores. Note that you'll have to get creative with your formula statement because the Predation data is in a different location (datum dataframe) from the scores (part of the PCA results). You can either add the scores to the datum data frame, or use the following formula statement:

datum$Predation~PCA$scores[,1]+PCA$scores[,2]+PCA$scores[,3]+PCA$scores[,4]

where 'PCA' is whatever you called your PCA object (might be 'results').

6. In a single sentence, describe the results of this lm. Don't get too verbose or complicated; just tell me what's significant and what's not. If you want to challenge yourself, you might describe the significant relationship; in this case, you should know that the unit of the PCA scores is 'standard deviation'.
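
If you are unsure how the steps in Part 1 fit together, the following is a minimal sketch of the workflow on simulated data. It is not the assignment data and not the only way to do it. The data frame name 'toy', the object names 'pca' and 'fit', and every simulated value are assumptions made for illustration; only the column names come from the formula statement in step 2.

```r
# Simulated stand-in data (an assumption, NOT the assignment data set).
set.seed(1)
n <- 150
size <- rnorm(n)  # shared "overall size" factor that induces collinearity

toy <- data.frame(
  BodyLength = 120 + 10 * size + rnorm(n, sd = 2),
  Mass       = 45  +  6 * size + rnorm(n, sd = 2),
  HFL        = 28  +  2 * size + rnorm(n, sd = 1),
  Tail       = 70  +  5 * size + rnorm(n, sd = 2)
)
toy$Predation <- 3 + 0.5 * size + rnorm(n, sd = 0.5)

# Unscaled PCA (cor = FALSE uses the covariance matrix).
# The formula has no left-hand side, so Predation is left out.
pca <- princomp(~ BodyLength + Mass + HFL + Tail, data = toy, cor = FALSE)

print(summary(pca))   # includes a "Proportion of Variance" row
print(loadings(pca))  # loadings used to interpret each component

# The same proportions computed by hand from the component standard deviations.
prop_var <- pca$sdev^2 / sum(pca$sdev^2)

# Add the scores (columns Comp.1 ... Comp.4) to the data frame, then regress.
toy_pc <- cbind(toy, as.data.frame(pca$scores))
fit <- lm(Predation ~ Comp.1 + Comp.2 + Comp.3 + Comp.4, data = toy_pc)
print(summary(fit))

# The scores are uncorrelated with each other, which is why PCA removes collinearity.
score_cor <- cor(pca$scores)
print(round(score_cor, 3))
```

Adding the scores to a data frame keeps the formula short; the longer formula in step 5 that indexes the scores matrix directly should fit the same model.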

Part 2

Data used for this part of the assignment

Your job in part 2 of this assignment is to run a DFA to build a model that could be used to distinguish among three species of canids (grey wolves, red wolves, and coyotes) as a function of various morphometrics.

1. Plot a scatterplot matrix of the four morphometric measurements.

2. Run a MANOVA to test for differences among groups. What is the F-statistic for that test? Which morphometrics are significantly different among the 3 groups?

3. Run the DFA. Looking at the group means from the DFA, how would you describe grey wolves, red wolves, and coyotes morphometrically?

4. Describe the linear discriminants - what information is being used to distinguish between grey wolves, coyotes, and red wolves?

5. What proportion of the 'trace' is determined by the first linear discriminant (LD1)?

6. How well does the DFA predict class membership? In other words, what proportion of classifications are mistakes when the DFA model attempts to classify the original data?

7. Be sure to include a plot of the DFA results, with the first two linear discriminants (LD1 and LD2) as the axes and the groups labeled in the graph.
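
For orientation, here is a minimal sketch of the Part 2 workflow on simulated data, assuming the lda function from the MASS package is what you use for the DFA. The data frame 'canid', the column names Species and m1 to m4, and all simulated values are hypothetical placeholders; substitute the names in your own data.

```r
library(MASS)  # provides lda()

# Simulated stand-in data (an assumption, NOT the assignment data set).
set.seed(2)
n_per <- 40
species <- factor(rep(c("coyote", "grey wolf", "red wolf"), each = n_per))
size_shift  <- rep(c(0, 3, 1.5), each = n_per)  # groups differ in overall size
shape_shift <- rep(c(0, 0, 1),   each = n_per)  # one group also differs in shape

canid <- data.frame(
  Species = species,
  m1 = 20 + 2.0 * size_shift + rnorm(3 * n_per),
  m2 = 15 + 1.5 * size_shift + rnorm(3 * n_per),
  m3 = 10 + 1.0 * size_shift + 2 * shape_shift + rnorm(3 * n_per),
  m4 = 8  + 0.5 * size_shift + rnorm(3 * n_per)
)

# Scatterplot matrix of the four measurements, colored by species.
pairs(canid[, c("m1", "m2", "m3", "m4")], col = as.integer(canid$Species))

# MANOVA: multivariate test, then one univariate ANOVA per measurement.
man <- manova(cbind(m1, m2, m3, m4) ~ Species, data = canid)
print(summary(man))   # Pillai's trace by default, with an approximate F
print(summary.aov(man))

# DFA: printing the object shows group means, coefficients of the
# linear discriminants, and the proportion of trace.
dfa <- lda(Species ~ m1 + m2 + m3 + m4, data = canid)
print(dfa)
prop_trace <- dfa$svd^2 / sum(dfa$svd^2)

# Classify the original data and tabulate observed vs. predicted species.
pred <- predict(dfa)
confusion <- table(observed = canid$Species, predicted = pred$class)
error_rate <- 1 - sum(diag(confusion)) / sum(confusion)
print(confusion)
print(error_rate)

# Plot the discriminant scores with the groups labeled in a legend.
plot(pred$x[, 1], pred$x[, 2], col = as.integer(canid$Species), pch = 19,
     xlab = "LD1", ylab = "LD2")
legend("topright", legend = levels(canid$Species),
       col = seq_along(levels(canid$Species)), pch = 19)
```

Because the model is classifying the same data it was fit to, the error rate from this table is likely to be optimistic compared with performance on new animals.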