##### Code for Testing Assumptions of Regression in R

### Import data
datum=read.csv(file.choose()) ### creates a data frame named datum from imported csv file

### check data was imported properly
head(datum) ### column headers of data file plus first 6 rows of data
summary(datum) ### summary statistics for datum

### Run a regression - Generic Code - Procedures below require you to run a regression first
results=lm(Y~X,data=datum) ### runs a linear regression between 'Y' and 'X' columns in 'datum', calls results 'results'
summary(results) ### coefficients, standard errors, and R-squared for the regression

##### Testing of assumptions

### continuous variable - no need to test, should be obvious

### ScatterPlot - Good for testing all assumptions
plot(Y~X,data=datum)
abline(results) ### plots regression line over data
# if non-linear relationship, will have groups of points above and below line
# if heteroscedastic data, variance in error will not be constant across x values
# if non-normally distributed residuals, error will be skewed above or below the line
# if autocorrelated data, points will seem to follow each other in a path

### Plot of Residuals - Good for testing all assumptions
residuals(results) ### provides residual error of points
plot(residuals(results)~datum$X) ### plot residuals in order of X values
abline(h=0) ### the regression line is now horizontal at Y = 0
# if non-linear relationship, will have groups of points above and below line
# if heteroscedastic data, variance in error will not be constant across x values
# if non-normally distributed residuals, error will be skewed above or below the line
# if autocorrelated data, points will seem to follow each other in a path

### Histogram of residuals - Good for testing normality assumption
# Note that this tests whether residuals are globally normal (across all x), not whether residuals are locally normal (at each x)
help(hist) ### help file for histograms
hist(residuals(results)) ### draws a histogram of residuals
hist(residuals(results),breaks=10) ### the 'breaks' argument suggests the number of bars (R may adjust it to get round break points)
# if non-normally distributed residuals, histogram will appear non-normal

### Autocorrelation function
help(acf) ### help file for autocorrelation function
acf(residuals(results)) ### runs autocorrelation function on residuals from regression
# to test for autocorrelation, residuals must be in the order in which autocorrelation might exist (usually x-order)
acf(residuals(results)[order(datum$X)]) ### [order(datum$X)] sorts residuals into x-order
# if autocorrelated data, bars at lag = 1,2,3,... will cross the dotted horizontal lines, indicating significant autocorrelation
# (the bar at lag 0 is always 1 and can be ignored)
# Note that violating the assumption of linearity can also appear as autocorrelation.
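
### Worked sketch on simulated data - autocorrelation check without reading the plot
# This is an illustrative sketch only, not part of the original analysis.
# Assumptions: the data are simulated, and the names sim, fit, res, a, r and bound
# are hypothetical names chosen for this example.
# It fits a straight line to a curved relationship, then compares the residual
# autocorrelations to the same limits that acf() draws as dotted lines, which
# (with the default 95% setting) are +/- qnorm(0.975)/sqrt(n).
set.seed(1) ### makes the simulated noise reproducible
n = 100
sim = data.frame(X = seq(1, 10, length.out = n))
sim$Y = 2 + 0.5*sim$X^2 + rnorm(n) ### curved (quadratic) relationship plus normal noise
fit = lm(Y~X, data=sim) ### straight-line fit, so the linearity assumption is violated
res = residuals(fit)[order(sim$X)] ### residuals sorted into x-order
a = acf(res, plot=FALSE) ### computes autocorrelations without drawing the plot
r = a$acf[-1] ### drops lag 0, which is always 1
bound = qnorm(0.975)/sqrt(a$n.used) ### height of the dotted lines in the acf plot
which(abs(r) > bound) ### lags whose bars would cross the dotted lines
# Here the residuals are not truly time-dependent; the curved pattern left by the
# straight-line fit is what shows up as autocorrelation at the low lags.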