
7/25/2019 Lec 5 - Normality Testing


Testing for Normality


For each mean and standard deviation combination a theoretical

normal distribution can be determined. This distribution is based

on the proportions shown below.
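As a rough illustration, the sketch below computes the share of a normal distribution that lies within 1, 2 and 3 standard deviations of the mean. It assumes these are the proportions meant here; the function name `proportion_within` is hypothetical, and the mean and standard deviation are the values used in the lecture's example (66.51 and 18.265).

```python
from statistics import NormalDist

def proportion_within(k, mean=0.0, sd=1.0):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

# The proportions do not depend on which mean and SD are chosen.
for k in (1, 2, 3):
    share = proportion_within(k, mean=66.51, sd=18.265)
    print(f"within {k} SD of the mean: {share:.4f}")
```

The same proportions come out for any mean and standard deviation, which is why a single theoretical curve can be fitted to each combination.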


This theoretical normal distribution can then be compared to the

actual distribution of the data.

Are the actual data statistically different than the computed

normal curve?

Theoretical normal distribution calculated from a mean of 66.51 and a standard deviation of 18.265.

The actual data distribution that has a mean of 66.51 and a standard deviation of 18.265.


There are several methods of assessing whether data are

normally distributed or not. They fall into two broad categories:

graphical and statistical. Some common techniques are:

Graphical

Q-Q probability plots

Cumulative frequency (P-P) plots

Statistical

W/S test

Jarque-Bera test

Shapiro-Wilk test

Kolmogorov-Smirnov test

D'Agostino test


Q-Q plots display the observed values against normally

distributed data (represented by the line).

Normally distributed data fall along the line.
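A minimal sketch of how the points of a normal Q-Q plot can be computed, assuming the reference line comes from a normal distribution fitted with the sample mean and standard deviation. The function name `qq_points`, the plotting positions (i - 0.5)/n, and the sample values are illustrative assumptions, not taken from the lecture.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(data)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n are one common convention; others exist.
    return [(fitted.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]

sample = [4.1, 4.8, 5.0, 5.2, 5.3, 5.9, 6.4]  # hypothetical values
for theoretical, observed in qq_points(sample):
    print(f"{theoretical:6.2f}  {observed:6.2f}")
```

If the data are close to normal, each observed value is close to its theoretical quantile, so the plotted points fall near the line y = x.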


Graphical methods are typically not very useful when the sample size is

small. This is a histogram of the last example. These data do not look

normal, but they are not statistically different than normal.


Tests of Normality

              Kolmogorov-Smirnov (a)         Shapiro-Wilk
              Statistic   df     Sig.        Statistic   df     Sig.
Age           .110        1048   .000        .931        1048   .000

a. Lilliefors Significance Correction

Tests of Normality

              Kolmogorov-Smirnov (a)         Shapiro-Wilk
              Statistic   df     Sig.        Statistic   df     Sig.
TOTAL_VALU    .283        149    .000        .463        149    .000

a. Lilliefors Significance Correction

Tests of Normality

              Kolmogorov-Smirnov (a)         Shapiro-Wilk
              Statistic   df     Sig.        Statistic   df     Sig.
Z100          .071        100    .200*       .985        100    .333

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction


Statistical tests for normality are more precise since actual

probabilities are calculated.

Tests for normality calculate the probability of obtaining sample data like these if the sample was

drawn from a normal population.

The hypotheses used are:

Ho: The sample data are not significantly different than a

normal population.

Ha: The sample data are significantly different than a normal population.


Typically, we are interested in finding a difference between groups.

When we are, we look for small probabilities.

If the probability of finding an event is rare (less than 5%) and we actually find it, that is of interest.

When testing normality, we are not looking for a difference.

In effect, we want our data set to be NO DIFFERENT than

normal. We want to accept the null hypothesis.

So when testing for normality:

Probabilities > 0.05 mean the data are not significantly different from normal.

Probabilities < 0.05 mean the data are NOT normal.
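The decision rule above can be sketched as a small function. The name `normality_decision` is hypothetical, and treating a probability exactly equal to alpha as a rejection is an assumption, since the rule as stated only covers values above and below 0.05. The two example probabilities are Shapiro-Wilk Sig. values from the SPSS tables shown earlier (Z100 and Age).

```python
def normality_decision(p_value, alpha=0.05):
    """Apply the lecture's decision rule to the probability from a normality test."""
    if p_value > alpha:
        return "do not reject Ho: data are not significantly different from normal"
    # Assumption: a probability equal to alpha is treated as a rejection.
    return "reject Ho: data are significantly different from normal"

print(normality_decision(0.333))  # Z100, Shapiro-Wilk Sig.
print(normality_decision(0.000))  # Age, Shapiro-Wilk Sig.
```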


Remember that LARGE probabilities denote normally distributed data. Below are examples taken from SPSS.

Non-Normally Distributed Data

              Kolmogorov-Smirnov (a)         Shapiro-Wilk
              Statistic   df     Sig.        Statistic   df     Sig.
Average PM10  .142        72     .001        .841        72     .000

a. Lilliefors Significance Correction

Normally Distributed Data

              Kolmogorov-Smirnov (a)         Shapiro-Wilk
              Statistic   df     Sig.        Statistic   df     Sig.
Asthma Cases  .069        72     .200*       .988        72     .721

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction


Normally Distributed Data

              Kolmogorov-Smirnov (a)         Shapiro-Wilk
              Statistic   df     Sig.        Statistic   df     Sig.
Asthma Cases  .069        72     .200*       .988        72     .721

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

In the SPSS output above, the probabilities are greater than 0.05 (the typical alpha level), so we accept Ho: these data are not different from normal.


Non-Normally Distributed Data

              Kolmogorov-Smirnov (a)         Shapiro-Wilk
              Statistic   df     Sig.        Statistic   df     Sig.
Average PM10  .142        72     .001        .841        72     .000

a. Lilliefors Significance Correction

In the SPSS output above, the probabilities are less than 0.05 (the typical alpha level), so we reject Ho: these data are significantly different from normal.

Important: As the sample size increases, normality tests become MORE restrictive and it becomes harder to declare that the data are normally distributed. So for very large data sets, normality testing becomes less important.


Three Simple Tests for Normality


W/S Test for Normality

A fairly simple test that requires only the sample standard

deviation and the data range.

Based on the q statistic, which is the studentized (meaning t

distribution) range, or the range expressed in standard

deviation units. Tests kurtosis.

Should not be confused with the Shapiro-Wilk test.

$$q = \frac{w}{s}$$

where q is the test statistic, w is the range of the data and s is the standard deviation.


[Figure: two panels, "Range constant, SD changes" and "Range changes, SD constant"]


Village            Population Density
Aranza             4.13
Corupo             4.53
San Lorenzo        4.69
Cheranatzicurin    4.76
Nahuatzen          4.77
Pomacuaran         4.96
Sevina             4.97
Arantepacua        5.00
Cocucho            5.04
Charapan           5.10
Comachuen          5.25
Pichataro          5.36
Quinceo            5.94
Nurio              6.06
Turicuaro          6.19
Urapicho           6.30
Capacuaro          7.73

Standard deviation (s) = 0.866

Range (w) = 3.6

n = 17

$$q = \frac{w}{s} = \frac{3.6}{0.866} = 4.16$$

$$q_{\text{Critical Range}} = 3.06 \text{ to } 4.31$$

The W/S test uses a critical range. IF the calculated value falls WITHIN the range,

then accept Ho. IF the calculated value falls outside the range then reject Ho.

Since 3.06 < q=4.16 < 4.31, then we accept Ho.
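The W/S calculation above can be reproduced with a short sketch. The function name `ws_statistic` is hypothetical; the critical bounds 3.06 and 4.31 are copied from the lecture for n = 17 at an alpha level of 0.05 and are not computed here.

```python
from statistics import stdev

# Population densities of the 17 villages from the table above.
densities = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
             5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

def ws_statistic(data):
    """Return q = w / s, the range expressed in sample standard deviation units."""
    w = max(data) - min(data)   # range of the data
    s = stdev(data)             # sample standard deviation (n - 1 denominator)
    return w / s

# Critical range as given in the lecture (n = 17, alpha = 0.05).
Q_LOW, Q_HIGH = 3.06, 4.31

q = ws_statistic(densities)
within_range = Q_LOW < q < Q_HIGH
print(f"q = {q:.2f}")
print("accept Ho" if within_range else "reject Ho")
```

For other sample sizes the bounds have to be looked up in a W/S critical value table; only the statistic itself is calculated by the code.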


Since we have a critical range, it is difficult to determine a probability

range for our results. Therefore we simply state our alpha level.

The sample data set is not significantly different than normal (W/S = 4.16, p > 0.05).


Jarque-Bera Test

A goodness-of-fit test of whether sample data have the skewness and kurtosis matching a normal distribution.

$$k_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3} \qquad k_4 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$$

$$JB = n \left( \frac{(k_3)^2}{6} + \frac{(k_4)^2}{24} \right)$$

Where x is each observation, n is the sample size, s is the standard deviation, k3 is skewness, and k4 is kurtosis.


Village            Population   Mean       Mean          Mean
                   Density      Deviates   Deviates^3    Deviates^4
Aranza             4.13         -1.21      -1.771561      2.14358881
Corupo             4.53         -0.81      -0.531441      0.43046721
San Lorenzo        4.69         -0.65      -0.274625      0.17850625
Cheranatzicurin    4.76         -0.58      -0.195112      0.11316496
Nahuatzen          4.77         -0.57      -0.185193      0.10556001
Pomacuaran         4.96         -0.38      -0.054872      0.02085136
Sevina             4.97         -0.37      -0.050653      0.01874161
Arantepacua        5.00         -0.34      -0.039304      0.01336336
Cocucho            5.04         -0.30      -0.027000      0.00810000
Charapan           5.10         -0.24      -0.013824      0.00331776
Comachuen          5.25         -0.09      -0.000729      0.00006561
Pichataro          5.36          0.02       0.000008      0.00000016
Quinceo            5.94          0.60       0.216000      0.12960000
Nurio              6.06          0.72       0.373248      0.26873856
Turicuaro          6.19          0.85       0.614125      0.52200625
Urapicho           6.30          0.96       0.884736      0.84934656
Capacuaro          7.73          2.39      13.651919     32.62808641
Sum                                        12.595722     37.433505

$$\bar{x} = 5.34 \qquad s = 0.87$$


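$$k_3 = \frac{12.6}{(17)(0.87)^3} = 1.13 \qquad k_4 = \frac{37.43}{(17)(0.87)^4} - 3 = 0.843$$

$$JB = 17 \left( \frac{(1.13)^2}{6} + \frac{(0.843)^2}{24} \right)$$

$$JB = 17 \left( \frac{1.2769}{6} + \frac{0.711}{24} \right)$$

$$JB = 17 \, (0.2128 + 0.0296)$$

$$JB = 4.12$$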
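A minimal sketch of the same calculation in code, following the lecture's formulas with the sample standard deviation (n - 1 denominator), which is an assumption about how s was obtained. The function name `jarque_bera` is hypothetical. Because the hand calculation rounds s to 0.87 and the sums to a few digits, the code's result should land close to, but not exactly on, 4.12.

```python
from statistics import mean, stdev

# Population densities of the 17 villages from the table above.
densities = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
             5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]

def jarque_bera(data):
    """Return (k3, k4, JB) using the formulas given in the lecture."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # assumption: sample standard deviation, unrounded
    k3 = sum((x - x_bar) ** 3 for x in data) / (n * s ** 3)      # skewness
    k4 = sum((x - x_bar) ** 4 for x in data) / (n * s ** 4) - 3  # kurtosis
    jb = n * (k3 ** 2 / 6 + k4 ** 2 / 24)
    return k3, k4, jb

k3, k4, jb = jarque_bera(densities)
print(f"k3 = {k3:.3f}, k4 = {k4:.3f}, JB = {jb:.2f}")
```

Statistical packages often compute skewness and kurtosis with the n denominator instead, so their Jarque-Bera values can differ somewhat from both this sketch and the hand calculation.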


The Jarque-Bera statistic can be compared to the χ² distribution (table) with 2 degrees of freedom (df or ν) to determine the critical value at an alpha level of 0.05.

The critical χ² value is 5.991. Our calculated Jarque-Bera statistic is 4.12, which is less than the critical value. In the χ² table it falls between the columns for probabilities 0.5 and 0.1, so its probability is greater than 0.05.
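For 2 degrees of freedom the χ² distribution has a simple closed form, so the table lookup can be checked directly: the upper-tail probability is exp(-x/2) and the critical value is -2 ln(alpha). The sketch below uses that special case only; the function names are hypothetical, and other degrees of freedom would need a table or a statistics library.

```python
import math

def chi2_upper_tail_df2(x):
    """P(X > x) for a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-x / 2)

def chi2_critical_df2(alpha):
    """Value exceeded with probability alpha under chi-square with 2 df."""
    return -2 * math.log(alpha)

print(round(chi2_critical_df2(0.05), 3))   # 5.991
print(round(chi2_upper_tail_df2(4.12), 3)) # 0.127
```

The probability of about 0.127 lies between the 0.5 and 0.1 columns of the table, in agreement with the lookup.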

Ther