Notes on Presenting Data

Enumerative Tables – research data in the form of a list.

Look at our data -- Variable Gender

Men 7

Women 3

May want to put things in percentages – Numbers often tell only part of the story.

Men 70%

Women 30%
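The conversion from counts to percentages can be sketched in a few lines of Python. This is only an illustration using the gender counts above; the variable names are made up for the example.

```python
# Counts from the gender example above.
counts = {"Men": 7, "Women": 3}

n = sum(counts.values())  # total number of cases

# Each category's share of all cases, expressed as a percentage.
percentages = {category: 100 * count / n for category, count in counts.items()}

for category, pct in percentages.items():
    print(f"{category}: {pct:.0f}%")
```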

What about Student Averages or Grades?

Example of condensing information down into categories.

A 4

B 1

C 3

D 0

F 2

Contingency tables give us a little more information – sometimes called crosstabs

Looks at categories of two variables to see how they are related:

Also need to include your n (number of cases) somewhere in the table

Each potential block is a cell – There are 6 in the following table.

Percentages must always sum to 100%, and the numbers (cases) in the cells must sum to the total number of cases.

For example – Voting for the Presidential Incumbent by Race.

	Voted for Incumbent	Voted for Challenger	# of cases (n)
White	50%	50%	200
Black	90%	10%	100
Hispanic	80%	20%	100
All voters	67.5%	32.5%	400
n (number of cases)	270	130	400

(The n can be put in either place: the last column or the last row.)
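A rough Python sketch of how such a crosstab is built from raw cell counts. The cell counts are not given in the notes; they are derived here from the row percentages and row n's (for example, 50% of 200 White voters is 100), and the names are illustrative. The "All voters" percentages are computed from the column totals.

```python
# Cell counts implied by the row percentages and row n's in the table
# (derived for this example, not stated in the notes).
counts = {
    "White":    {"Incumbent": 100, "Challenger": 100},
    "Black":    {"Incumbent": 90,  "Challenger": 10},
    "Hispanic": {"Incumbent": 80,  "Challenger": 20},
}
columns = ["Incumbent", "Challenger"]

# Row percentages: each cell as a share of its own row total.
row_pcts = {}
for group, cells in counts.items():
    row_n = sum(cells.values())
    row_pcts[group] = {col: 100 * cells[col] / row_n for col in columns}

# Marginals: column totals, grand total, and overall percentages.
col_totals = {col: sum(cells[col] for cells in counts.values()) for col in columns}
total_n = sum(col_totals.values())
overall_pcts = {col: 100 * col_totals[col] / total_n for col in columns}

for group in counts:
    print(group, row_pcts[group], sum(counts[group].values()))
print("All voters", overall_pcts, total_n)
print("n", col_totals)
```

Each row of percentages sums to 100%, and the cell counts sum to the total number of cases, which is the check described above.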

Other ways of presenting data: bar graphs, line graphs, pie graphs. SPSS will do them all for you.

Describing Data – Summarizing distributions on one variable (univariate statistics)

We use different statistics to describe data of different “levels” of measurement – nominal (categorical), ordinal, and interval.

Nominal – values reflect categories (male or female, religion, party id)

Ordinal – there is some order to the values, but not a definitive distance between each value. (1st, 2nd, 3rd place)

Interval (scale) – There is order and a definitive, equal distance between each value (dollars, percents, etc)

We can describe the data observed for each of our variables using measures of central tendency and dispersion. The measures that we use are different for each level of measurement.

Level of Measurement	Central Tendency	Dispersion
Nominal	Mode	Variation ratio
Ordinal	Median	Interquartile range
Interval	Median, mean	Standard deviation

Each measure of central tendency reported should be accompanied by the appropriate dispersion measure.

Central Tendency – The most “typical” value – The one value or score that best represents the entire set of cases on that variable

Dispersion – How large the variation of scores is around the “typical” score (central tendency)

The smaller the dispersion, the greater our confidence that that typical score is really representative of the population.

Greater dispersion says that there is a lot of variance in the scores – they are very different, and thus may not represent the “typical” case.

Example: the typical “average” yearly income of University of North Carolina alumni since about 1980 is about $250,000 a year. Why? Most make about $40,000 to $50,000, but UNC has one super-rich graduate from the last 20 years: Michael Jordan. His millions of dollars pull the average up. So it is good to report additional central tendency measures and a measure of dispersion to show us the “range” of values for our cases.

All measures of central tendency start with a frequency distribution: an ordered count of the number of cases that take on each value of a given variable.

For nominal variables:

Mode: The most frequently occurring value – The value occurring in the largest number of cases

Ex. See the data I handed out yesterday.

What variables could we use the mode for describing? (gender, mode=0 or male)

(homestate, mode=1 or 4)

Many times the mode is a single value. Sometimes, as for homestate, the distribution is bimodal: it has two modes.

Variation Ratio: The percentage of all cases that do not fit into the mode. You may have circumstances where the cases are spread across many different values. In that case you still have a mode, but a large percentage of cases do not fit this category.

Ex: The variation ratio is 30% for gender and 40% for homestate. This tells us that the modal responses for these variables are fairly “typical”.

To calculate it, add up the number of cases that do NOT fit in the mode, then divide this number by N (your number of cases).
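The mode and variation ratio can be sketched in Python as follows. This is a minimal illustration, assuming gender is coded 0 = male and 1 = female (the notes only state that 0 is male) and that, for bimodal variables, cases in either mode count as "in the mode". The function names are made up for the example.

```python
from collections import Counter

# Gender codes for the 10 cases: 7 men (0) and 3 women (1, an assumed code).
gender = [0] * 7 + [1] * 3

def modes(values):
    """Return every value tied for the highest frequency (handles bimodal data)."""
    freq = Counter(values)
    top = max(freq.values())
    return sorted(v for v, c in freq.items() if c == top)

def variation_ratio(values):
    """Share of cases that do NOT fall in the modal category (or categories)."""
    freq = Counter(values)
    top = max(freq.values())
    in_mode = sum(c for c in freq.values() if c == top)
    return (len(values) - in_mode) / len(values)

print(modes(gender))            # [0]
print(variation_ratio(gender))  # 0.3
```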

For Ordinal variables:

We have a little more information, and can get a little more specific on our descriptions. Here we should find the median and the inter-quartile range.

Median: The value of the middle case in the distribution. Arrange the cases (values) in order from lowest to highest. Count down to the middle case – if you have 20 cases, count down to the value that encompasses the 10th case. That value is the median.

Ex. Find the median for parental interest in politics. Arrange values (cases) in a frequency distribution from smallest value to highest.

Values	# of cases
1	1
2	0
3	2
4	0
5	0
6	4
7	2
8	0
9	0
10	1

In this example, the median is? 6 (with 10 cases, counting down to the 5th case lands in value 6).

Interquartile range: The middle 50% of the cases – The values between which 50% of the cases fall. Thus, you remove the bottom 25% and the top 25% of the cases, and see what values 50% of the cases fall between. The smaller the distances between the values, the more certain you are that that median is fairly typical of the population.

Thus the 25% case (case 2.5) in this example occurs in value 3.

The 75% case (case 7.5) in this example occurs in value 7.

Thus the interquartile range is 3-7: the middle 50% of the cases have values between 3 and 7.
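The counting procedure for the median and the quartiles can be sketched in Python. This is only an illustration of the "count down to the case" rule used in these notes: it returns the first value whose cumulative count reaches the target case. Statistical packages often interpolate instead and may report slightly different quartiles. The function name is made up for the example.

```python
# Frequency distribution for parental interest in politics (value -> # of cases).
freq = {1: 1, 2: 0, 3: 2, 4: 0, 5: 0, 6: 4, 7: 2, 8: 0, 9: 0, 10: 1}

def value_at_position(freq, position):
    """Return the first value whose cumulative case count reaches `position`."""
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        if cumulative >= position:
            return value
    raise ValueError("position is larger than the number of cases")

n = sum(freq.values())                    # 10 cases

median = value_at_position(freq, n / 2)   # the 5th case
q1 = value_at_position(freq, 0.25 * n)    # the 25% case (case 2.5)
q3 = value_at_position(freq, 0.75 * n)    # the 75% case (case 7.5)

print(median, q1, q3)  # 6 3 7
```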

Interval-level data: the most complete information (values) we can have. Here we find the mean and standard deviation. Note: we can always use statistics for “lower levels of measurement” to describe data. For example, it might be more meaningful to use the median to describe interval data with a large variation in values or an “outlier” (see the Michael Jordan example above). You can NOT use more advanced statistics (like the median or mean) on the lower levels of measurement (such as nominal variables). They would be meaningless!

Mean (the average): A measure that locates the central point of a distribution in terms of both the numbers of cases on either side of the point and their distance from it. Sum up the values for ALL cases, divide by the number of cases.

Ex. What is the mean student IQ?

(110+150+95+70+100+99+140+125+100+75) / 10 = 1064 / 10 = 106.4, so Mean = 106 (rounded)

If you have a large N (number of cases), then you may want to work with a formula like

[(value) (# of cases with that value) + (value) (# of cases with that value) + …] / N

Use this formula to find the mean workweek:

[(0) (1) + (10) (4) + (20) (3) + (25) (1) + (30) (1)] / 10 = 155 / 10 Mean = 15.5
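Both ways of computing the mean can be sketched in Python. This is a minimal illustration with made-up variable names. The IQ scores sum to 1,064, so the exact mean is 106.4, which rounds to the 106 used in these notes.

```python
# Mean from raw scores: sum the values for ALL cases, divide by the number of cases.
iq_scores = [110, 150, 95, 70, 100, 99, 140, 125, 100, 75]
mean_iq = sum(iq_scores) / len(iq_scores)

# Mean from a frequency distribution (value -> # of cases with that value).
workweek_hours = {0: 1, 10: 4, 20: 3, 25: 1, 30: 1}
n = sum(workweek_hours.values())
mean_workweek = sum(value * cases for value, cases in workweek_hours.items()) / n

print(mean_iq)        # 106.4
print(mean_workweek)  # 15.5
```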

The measure of dispersion for interval data is the standard deviation. We get to the standard deviation by way of the variance, a measure of the average squared “distance” of each value from the mean. The greater the variance and the standard deviation, the greater the dispersion.

To calculate the variance and standard deviation, follow these 6 steps.

1. Calculate the mean for the variable. We will use IQ for this example. The mean IQ is 106.

2. Calculate the distance between the mean and each value. (Don’t worry about the sign, negative or positive, because we square these next.)

3. Square the distances calculated in step 2.

4. Add the squared distances calculated in step 3. Answer is 5968

5. To get the variance, divide the sum from step 4 by N (number of cases). Variance =596.8

6. To get the standard deviation, take the square root of the variance. Standard deviation = 24.43, i.e., many students fall within about 24 IQ points of the mean.

Value	Value - Mean	(Value - Mean)²
110	110 - 106 = 4	16
150	150 - 106 = 44	1936
95	95 - 106 = -11	121
70	70 - 106 = -36	1296
100	100 - 106 = -6	36
99	99 - 106 = -7	49
140	140 - 106 = 34	1156
125	125 - 106 = 19	361
100	100 - 106 = -6	36
75	75 - 106 = -31	961

Total = 5968
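The six steps can be sketched in Python as follows. This is a minimal illustration that follows the notes in rounding the mean to 106 and in dividing by N (the population formula) rather than by N - 1. The variable names are made up for the example.

```python
import math

iq_scores = [110, 150, 95, 70, 100, 99, 140, 125, 100, 75]
n = len(iq_scores)

# Step 1: the mean, rounded to a whole number as in the notes (106.4 -> 106).
mean = round(sum(iq_scores) / n)

# Step 2: distance of each value from the mean (the sign is kept here).
distances = [score - mean for score in iq_scores]

# Step 3: square each distance, which removes the sign.
squared = [d ** 2 for d in distances]

# Step 4: add the squared distances.
total = sum(squared)

# Step 5: variance = sum of squared distances divided by N.
variance = total / n

# Step 6: standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

print(total, variance, round(std_dev, 2))
```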

This standard deviation gives us an idea of how dispersed the scores are around that mean. It would let us compare other groups of students’ IQs with this group. But it is pretty meaningless to us in this form. We don’t really know what a “good” dispersion would be.

We know that for normal distributions with 1 mode (most averages you will use fall into this category), 68.3% of all cases will fall within 1 standard deviation of the mean, 95.5% within 2 standard deviations, and 99.7% within 3 standard deviations. Thus for the data above (mean 106, standard deviation about 24), 68.3% of cases would fall between 82 and 130 IQ, about 95% between 58 and 154, etc.
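A short Python sketch of these ranges, assuming the scores are normally distributed. The shares are computed from the normal curve using the standard library's `math.erf`, and the function name is made up for the example. With a standard deviation of 24.4 rather than 24, the endpoints differ slightly from the rounded figures above.

```python
import math

mean, sd = 106, 24.4  # values from the IQ example (sd rounded to one decimal)

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Lower and upper bounds for 1, 2, and 3 standard deviations around the mean.
intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

for k, (low, high) in intervals.items():
    print(f"within {k} sd: {low:.1f} to {high:.1f} ({share_within(k):.2%} of cases)")
```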

We can also calculate a standard score for any particular value, that will tell us how far from the mean a case is – Can be used to compare between 2 or more cases. We calculate a z score.

Z= (value – mean) / standard deviation.

Thus the z score for an IQ of 100 would be (100 – 106) / 24.4 = –.246, or –.24 when truncated to two decimals for the table lookup.

Turning to Appendix A6, we see that the area between the mean and the z for this score is about .09. That is, about 9% of cases fall between an IQ of 100 and the mean.
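The z score and the table lookup can be sketched in Python. This is a minimal illustration, assuming Appendix A6 is a standard normal table giving the area between the mean and z. The area is computed here with `math.erf` instead of being read from a table, and the function names are made up for the example.

```python
import math

def z_score(value, mean, sd):
    """Distance of a value from the mean, in standard deviation units."""
    return (value - mean) / sd

def area_between_mean_and_z(z):
    """Area under the standard normal curve between the mean (z = 0) and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

z = z_score(100, 106, 24.4)
print(round(z, 3))                                # -0.246

# Area for the two-decimal z used in the table lookup.
print(round(area_between_mean_and_z(-0.24), 4))   # 0.0948
```

Because the normal curve is symmetric, the sign of z does not change the area, only the side of the mean on which the case falls.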