Notes on Presenting Data

 

Enumerative Tables – research data in the form of a list.

 

Look at our data  -- Variable Gender

 

Men          7

Women        3

 

You may want to convert the counts to percentages, since raw numbers often tell only part of the story.

 

Men          70%

Women        30%
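The conversion from counts to percentages can be sketched in a few lines of Python. This is only an illustration, not part of the handout; the function name `to_percentages` is made up for the example.

```python
# Counts from the notes: 7 men, 3 women.
counts = {"Men": 7, "Women": 3}

def to_percentages(counts):
    """Convert category counts to percentages of the total number of cases."""
    n = sum(counts.values())  # n = total number of cases
    return {category: 100 * count / n for category, count in counts.items()}

percentages = to_percentages(counts)
for category, pct in percentages.items():
    print(f"{category:<8}{pct:.0f}%")
```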

 

 

What about Student Averages or Grades?

Example of condensing information down into categories.

 

A             4

B             1

C             3

D             0

F              2
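A rough sketch of how individual averages get condensed into the categories above. Both the scores and the 10-point cutoffs are assumptions: the scores are invented so that the counts match the table, and the notes do not say which cutoffs were used.

```python
from collections import Counter

# Hypothetical student averages, invented so the counts match the table above.
scores = [95, 91, 90, 98, 85, 72, 75, 79, 40, 55]

def letter_grade(score):
    """Map a numeric average to a letter grade using assumed 10-point cutoffs."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"

tally = Counter(letter_grade(score) for score in scores)
# Report every category, including empty ones such as D.
grade_counts = {grade: tally.get(grade, 0) for grade in "ABCDF"}

for grade, count in grade_counts.items():
    print(grade, count)
```

Listing the empty category (D = 0) matters: leaving it out would hide the gap between the C and F students.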

 

Contingency tables, sometimes called crosstabs, give us a little more information.

They look at the categories of two variables to see how they are related.

You also need to include your n (number of cases) somewhere in the table.

Each potential block is a cell. There are 6 in the following table.

Percentages must always sum to 100%, and the numbers (cases) in the cells must sum to the total number of cases.

 

For example – Voting for the Presidential Incumbent by Race.

 

                        Voted for Incumbent     Voted for Challenger     # of cases (n)

White                         50%                     50%                     200
Black                         90%                     10%                     100
Hispanic                      80%                     20%                     100

All voters                    67.5%                   32.5%                   400
n (number of cases)           270                     130

(n can be put in either place: as a column or as a row.)
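As a check on the table, here is a small Python sketch that rebuilds the percentages and totals from cell counts. The counts are not given in the notes; they are worked out from each row's percentages and n (for example, 90% of 100 Black voters = 90). The helper name `row_percentages` is made up for the example.

```python
# Cell counts implied by the table: row percentage times the row's n.
counts = {
    "White": {"Incumbent": 100, "Challenger": 100},
    "Black": {"Incumbent": 90, "Challenger": 10},
    "Hispanic": {"Incumbent": 80, "Challenger": 20},
}

def row_percentages(row):
    """Return (percentages, n) for one row of cell counts."""
    n = sum(row.values())
    return {column: 100 * count / n for column, count in row.items()}, n

# Column totals give the "n" row at the bottom of the table.
column_totals = {}
for row in counts.values():
    for column, count in row.items():
        column_totals[column] = column_totals.get(column, 0) + count

# Percentages for all voters come from the column totals, not from
# averaging the three row percentages (the groups differ in size).
all_voters_pct, total_n = row_percentages(column_totals)

for race, row in counts.items():
    pct, n = row_percentages(row)
    print(race, pct, n)
print("All voters", all_voters_pct, total_n)
print("n", column_totals)
```

With these counts the combined row works out to 67.5% and 32.5% of 400 voters, and each row of percentages sums to 100%.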

 

 

Other ways of presenting data: bar graphs, line graphs, pie graphs. SPSS will do them all for you.

 


Describing Data – Summarizing distributions on one variable (univariate statistics)

 

We use different statistics to describe data of different “levels” of measurement – nominal (categorical), ordinal, and interval.

 

Nominal – values reflect categories (male or female, religion, party id)

Ordinal – there is some order to the values, but not a definitive distance between each value (1st, 2nd, 3rd place)

Interval (scale) – There is order and a definitive, equal distance between each value (dollars, percents, etc)

 

We can describe the data observed for each of our variables using measures of central tendency and dispersion.  The measures that we use are different for each level of measurement.

 

 

Level of Measurement     Central Tendency     Dispersion

Nominal                  Mode                 Variation ratio
Ordinal                  Median               Interquartile range
Interval                 Median, mean         Standard deviation

 

Each measure of central tendency reported should be accompanied by the appropriate dispersion measure.

 

Central Tendency – The most “typical” value – The one value or score that best represents the entire set of cases on that variable

 

Dispersion – How large the variation of scores is around the “typical” score (central tendency)

                The smaller the dispersion, the greater our confidence that that typical score is really representative of the population.

                Greater dispersion says that there is a lot of variance in the scores – they are very different, and thus may not represent the “typical” case.

 

Example: the typical “average” yearly income of University of North Carolina alumni since about 1980 is about $250,000 a year.  Why?  Most make about 40 to 50 K, but UNC has had one super-rich grad in the last 20 years: Michael Jordan.  His millions of dollars pulled the average up.  So it is good to report additional central tendency measures and a measure of dispersion to show us the “range” of values for our cases.

 

All measures of central tendency start with a frequency distribution: an ordered count of the number of cases that take on each value for any given variable.

 

For  nominal variables:

 

                Mode:  The most frequently occurring value – The value occurring in the largest number of cases

                                Ex.  See the data I handed out yesterday.

                                What variables could we use the mode for describing? (gender, mode=0 or male; homestate, mode=1 or 4)

Many times the mode is a single value. Sometimes, as with homestate, a variable is bimodal: it has two modes.

Variation Ratio:  The percentage of all cases that do not fit into the mode.  In some circumstances the cases are spread across many different values.  You will still have a mode, but a large percentage of cases will not fit into it.

To calculate it, add up the number of cases that do NOT fit in the mode, then divide this number by N (your number of cases).

Ex:  The variation ratio is 30% for gender and 40% for homestate.  This tells us that the modal responses for these variables are fairly “typical”.
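A minimal Python sketch of the mode and variation ratio, using the gender counts from the top of these notes (7 men, 3 women). The function name is made up, and one choice here is an assumption: when two values tie for the mode, the sketch counts the cases outside every modal category, which seems to be how a 40% figure would arise for a bimodal variable like homestate. The more common textbook definition uses a single modal category.

```python
from collections import Counter

def mode_and_variation_ratio(values):
    """Return (list of modes, share of cases that fall outside the mode(s))."""
    counts = Counter(values)
    top_count = max(counts.values())
    # More than one value can tie for most frequent (bimodal data).
    modes = sorted(v for v, c in counts.items() if c == top_count)
    cases_in_modes = top_count * len(modes)  # assumption: exclude all tied modes
    n = len(values)
    return modes, (n - cases_in_modes) / n

# Gender data from the notes: 7 men and 3 women.
gender = ["male"] * 7 + ["female"] * 3
modes, variation_ratio = mode_and_variation_ratio(gender)
print(modes, f"{variation_ratio:.0%}")
```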

 

For Ordinal variables:

                We have a little more information, and can get a little more specific on our descriptions.  Here we should find the median and the inter-quartile range.

 

Median:  The value of the middle case in the distribution.  Arrange the cases (values) in order from lowest to highest.  Count down to the middle case – if you have 20 cases, count down to the value that encompasses the 10th case.  That value is the median.

                Ex.  Find the median for parental interest in politics.  Arrange values (cases) in a frequency distribution from smallest value to highest.

                               

Values     # of cases

1          1
2          0
3          2
4          0
5          0
6          4
7          2
8          0
9          0
10         1

 

                In this example, what is the median?  6 (with 10 cases, the 5th case falls in value 6).
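The “count down to the middle case” procedure can be sketched in Python as follows. The helper `value_at_case` is a made-up name, and the sketch follows the rule in these notes (count to case N/2) rather than the usual convention of averaging the two middle cases when N is even. Here both give the same answer.

```python
# Frequency distribution from the notes: value -> number of cases.
frequencies = {1: 1, 2: 0, 3: 2, 4: 0, 5: 0, 6: 4, 7: 2, 8: 0, 9: 0, 10: 1}

def value_at_case(frequencies, position):
    """Count up through the ordered values until `position` cases are covered."""
    cumulative = 0
    for value in sorted(frequencies):
        if frequencies[value] == 0:
            continue  # no cases take this value
        cumulative += frequencies[value]
        if cumulative >= position:
            return value
    raise ValueError("position is beyond the number of cases")

n = sum(frequencies.values())               # 10 cases in total
median = value_at_case(frequencies, n / 2)  # count down to the 5th case
print(median)
```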

 

Interquartile range:  The middle 50% of the cases – The values between which 50% of the cases fall.  Thus, you remove the bottom 25% and the top 25% of the cases, and see what values 50% of the cases fall between.  The smaller the distances between the values, the more certain you are that that median is fairly typical of the population.

                Thus, the 25% case (case 2.5) in this example occurs in value 3.

                The 75% case (case 7.5) in this example occurs in value 7.

                Thus the interquartile range is 3-7.  The middle 50% of the cases have values between 3 and 7.
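The same counting approach gives the quartiles. This is a sketch under the rule used in these notes (find the value that contains case 0.25 × N and case 0.75 × N); statistical packages use several slightly different quartile rules and may report different cut points on small data sets.

```python
# Frequency distribution from the notes: value -> number of cases.
frequencies = {1: 1, 2: 0, 3: 2, 4: 0, 5: 0, 6: 4, 7: 2, 8: 0, 9: 0, 10: 1}

def value_at_case(frequencies, position):
    """Count up through the ordered values until `position` cases are covered."""
    cumulative = 0
    for value in sorted(frequencies):
        if frequencies[value] == 0:
            continue  # no cases take this value
        cumulative += frequencies[value]
        if cumulative >= position:
            return value
    raise ValueError("position is beyond the number of cases")

n = sum(frequencies.values())
lower_quartile = value_at_case(frequencies, 0.25 * n)  # case 2.5
upper_quartile = value_at_case(frequencies, 0.75 * n)  # case 7.5
print(f"Interquartile range: {lower_quartile}-{upper_quartile}")
```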

 

Interval level data.  Most complete information (values) we can have.  Here we find the mean and standard deviation.  Note:  We can always use statistics for “lower levels of measurement” to describe data.  For example, it might be more meaningful to use the median to describe interval data with a large variation in values or an “outlier”.  See the Michael Jordan example above.  You can NOT use more advanced statistics (like the median or mean) on the lower “levels of measurement” (the nominal variables).  They would be meaningless!

 

Mean (the average):  A measure that locates the central point of a distribution in terms of both the numbers of cases on either side of the point and their distance from it.  Sum up the values for ALL cases, divide by the number of cases.

                Ex.  What is the mean student IQ?

                                (110+150+95+70+100+99+140+125+100+75) / 10 = 1064 / 10                             Mean = 106.4, or about 106

               

                                If you have a large N (number of cases), then you may want to work with a formula like

                                [ (value) (# of cases with that value) + (value) (# of cases with that value) + … ] / N

                                Use this formula to find the mean workweek:

                                [ (0) (1) + (10) (4) + (20) (3) + (25) (1) + (30) (1) ] / 10 = 155 / 10                            Mean = 15.5
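Both versions of the mean can be checked with a short Python sketch. The function names are made up for illustration. The IQ scores sum to 1064, so the exact mean is 106.4, which rounds to 106.

```python
# IQ scores from the notes.
iq_scores = [110, 150, 95, 70, 100, 99, 140, 125, 100, 75]

def mean(values):
    """Sum up the values for ALL cases, then divide by the number of cases."""
    return sum(values) / len(values)

def mean_from_frequencies(frequencies):
    """Sum of (value * number of cases with that value), divided by N."""
    n = sum(frequencies.values())
    return sum(value * count for value, count in frequencies.items()) / n

# Workweek data from the notes: hours worked -> number of cases.
workweek = {0: 1, 10: 4, 20: 3, 25: 1, 30: 1}

iq_mean = mean(iq_scores)
workweek_mean = mean_from_frequencies(workweek)
print(iq_mean, workweek_mean)
```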

 

The measure of dispersion for interval data is the standard deviation.  We get to the standard deviation by way of the variance, a measure of the average squared “distance” of each value from the mean.  The greater the variance and the standard deviation, the greater the dispersion.

 

To calculate the variance and standard deviation, follow these 6 steps.

1.  Calculate the mean for the variable.  We will use IQ for this example.  The mean IQ is 106.

 

2.  Calculate the distance between the mean and each value.  (Don’t worry about the sign of the integer – negative or positive because we square these next).

 

3.  Square the distances calculated in step 2.

 

4.  Add the squared distances calculated in step 3.  Answer is 5968

 

5.  To get the variance, divide the sum from step 4 by N (number of cases).  Variance =596.8

 

6.  To get the standard deviation, take the square root of the variance.  Standard deviation = 24.43, i.e., many students fall within about 24 IQ points of the mean.

 

 

 

Value     Value - Mean          (Value - Mean)^2

110       110 - 106 =   4             16
150       150 - 106 =  44           1936
95         95 - 106 = -11            121
70         70 - 106 = -36           1296
100       100 - 106 =  -6             36
99         99 - 106 =  -7             49
140       140 - 106 =  34           1156
125       125 - 106 =  19            361
100       100 - 106 =  -6             36
75         75 - 106 = -31            961

                      Total =       5968
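The six steps can be followed one at a time in Python. This sketch uses the rounded mean of 106, as the worked table does, and divides by N (the population formula from step 5) rather than N - 1. The function name `variance_steps` is made up for the example.

```python
import math

# IQ scores from the notes.
iq_scores = [110, 150, 95, 70, 100, 99, 140, 125, 100, 75]

def variance_steps(values, mean):
    """Follow steps 2-6: distances, squares, total, variance, standard deviation."""
    distances = [value - mean for value in values]  # step 2
    squared = [d ** 2 for d in distances]           # step 3 (squaring removes the sign)
    total = sum(squared)                            # step 4
    variance = total / len(values)                  # step 5: divide by N
    std_dev = math.sqrt(variance)                   # step 6
    return total, variance, std_dev

# Step 1: the mean, rounded to 106 as in the worked table.
total, variance, std_dev = variance_steps(iq_scores, mean=106)
print(total, variance, round(std_dev, 2))
```

Using the unrounded mean of 106.4 changes the total slightly (to about 5966), but the standard deviation still comes out at about 24.4.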

 

 

This standard deviation gives us an idea of how dispersed the scores are around that mean.  It would let us compare other groups of students’ IQs with this group.  But it is pretty meaningless to us in this form.  We don’t really know what a “good” dispersion would be. 

 

We know that for normal distributions with 1 mode (most averages you will use fall into this category), 68.3% of all cases will fall within 1 standard deviation of the mean, 95.4% within 2 standard deviations, and 99.7% within 3 standard deviations.  Thus, if the data above were normally distributed, about 68.3% of cases would fall between IQs of 82 and 130, about 95% between 57 and 155, etc.
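These ranges and percentages can be reproduced with Python's standard library. This is a sketch that assumes the scores are normally distributed and uses the rounded figures from the example (mean 106, standard deviation 24.4). The function names are made up.

```python
from statistics import NormalDist

def ranges_around_mean(mean, std_dev, multiples=(1, 2, 3)):
    """Return {k: (low, high)} for mean +/- k standard deviations."""
    return {k: (mean - k * std_dev, mean + k * std_dev) for k in multiples}

def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

ranges = ranges_around_mean(106, 24.4)
for k, (low, high) in ranges.items():
    print(f"{k} SD: {low:.1f} to {high:.1f} covers {normal_coverage(k):.1%}")
```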

 

We can also calculate a standard score for any particular value, that will tell us how far from the mean a case is – Can be used to compare between 2 or more cases.  We calculate a z score.

 

Z= (value – mean) / standard deviation.

 

Thus the z score for an IQ of 100 would be (100 - 106) / 24.4 = -.25.

Turning to Appendix A6, we see that the area between the mean and this z score is about .10; that is, roughly 10% of cases fall between an IQ of 100 and the mean.
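The z score and the table lookup can both be sketched in Python. `NormalDist` from the standard library stands in for the printed table here; that is a substitution for illustration, and the appendix itself may round differently.

```python
from statistics import NormalDist

def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev

z = z_score(100, mean=106, std_dev=24.4)

# Area under the standard normal curve between the mean (z = 0) and this z.
area_from_mean = abs(NormalDist().cdf(z) - 0.5)

print(round(z, 2), round(area_from_mean, 2))
```

The sign of z only says which side of the mean the case is on. The area is the same for z = -.25 and z = +.25.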