# Non-Normal Distribution in Statistics – Skewness and Kurtosis (3-9)

Now that I have explained to you the

ubiquity of the normal distribution, its regular appearance in human measurements,

you may begin to hope or even expect that all of the distributions that we

will encounter will be normal curves, but if that is your expectation, you will

have to get used to disappointment. (Princess Bride reference there.) Because

many curves, perhaps most curves, are not normal distributions, we need a way to

talk about the shape of distributions when they differ from normality. The

first difference that we may find is that the scores in the distribution are

more spread out than we would have expected, or we may find the scores are

more closely packed together than we expected. The name for the peakedness or

flatness of a curve is kurtosis. When the scores are very close together,

then the curve becomes peaked. We call this a "leptokurtic" curve; think of the

scores leaping up – leptokurtic. When the scores are very spread out, the curve

becomes flat like a plate; we call this platykurtic. "Plat" rhymes with flat.
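To put a number on these shapes, here is a minimal Python sketch that computes excess kurtosis (the fourth standardized moment minus 3, which is about 0 for a normal curve) for three simulated samples. The function name `excess_kurtosis`, the sample size, and the choice of uniform, normal, and Laplace data are illustrative assumptions, not something taken from the lecture. Keep in mind that this statistic responds mostly to how heavy the tails are, so the tall-versus-flat picture is best treated as a rough mnemonic.

```python
import random
from statistics import fmean

def excess_kurtosis(scores):
    """Fourth standardized moment minus 3 (population form, no bias correction)."""
    m = fmean(scores)
    m2 = fmean((x - m) ** 2 for x in scores)  # variance
    m4 = fmean((x - m) ** 4 for x in scores)  # fourth central moment
    return m4 / m2 ** 2 - 3.0

random.seed(0)  # fixed seed so the run is repeatable
n = 20_000

flat = [random.uniform(0, 1) for _ in range(n)]   # bounded, no tails
normal = [random.gauss(0, 1) for _ in range(n)]   # the reference shape
# The difference of two exponential draws follows a Laplace distribution (heavy tails).
heavy = [random.expovariate(1) - random.expovariate(1) for _ in range(n)]

k_flat = excess_kurtosis(flat)      # theoretical value: -1.2
k_normal = excess_kurtosis(normal)  # theoretical value: 0
k_heavy = excess_kurtosis(heavy)    # theoretical value: 3

print(f"uniform {k_flat:+.2f}, normal {k_normal:+.2f}, Laplace {k_heavy:+.2f}")
```

A negative result corresponds to the flatter, plate-like case and a positive result to the taller, more peaked case described here.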

Platykurtic is a flattened curve in the shape of a plate. A normal curve is

mesokurtic. Its kurtosis is medium. So kurtosis can be described as leptokurtic –

tall, platykurtic – flat, or mesokurtic – medium. Kurtosis is caused by the

variability in the distribution. Another thing that can happen to a curve is when

the scores are pulled out in only one direction. When the scores are dragged down (or

rather, out) in only one direction, this creates a skew in our curve. Therefore,

we need to talk about the skewness of our distribution. Negatively skewed

distributions have a higher than expected frequency of high or extreme

scores on the right, and the tail is pulled out to the left end of the number

line on the x-axis. For example, if we were interested in the running speeds of

football players, we might find a lot of very fast players – high scores, but only a

few slower runners – low scores. Skewness is always caused by outliers in the

direction of the tail. In a positively skewed distribution, the higher than

expected frequencies are on the low end of the curve. The tail is pulled out toward

the right or positive end of the number line. If we were measuring reaction time,

we would expect to have a large number of very quick responses – low scores, and

only a few slower responses, taking more time, further up the positive end of that

scale. Skewed distributions are not normal. How can you remember which

direction is positive or negative when we talk about skewness? Stats cow tells

us that the skew is in the tail. Skewness is caused by outliers, extreme scores in

the tail of the distribution. The direction that the tail is pulled out

(positive or negative) is the direction of the skew. Here are two curves. The first

one is positively skewed, and the second is negatively skewed. The top curve is

positively skewed because the tail is pulled out on the right, or the positive

direction of the number line. The bottom curve is negatively skewed: the tail is

pulled out on the negative, or left end of the

number line. In both of these curves, you can see what happens to the

mean and the median in the case of skewness. Both of them are pulled in the

direction of the outlier but the mean is pulled further. That is because the mean

is more susceptible to the outlier that is causing the skewness. Mathematically,

we can calculate a measure of skewness by comparing the mean and the median, and this will give us a value that we can use to quantify the skewness of our

curve. But there are other things that can go wrong with our normal curve!
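One common formula that compares the mean and the median is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. Whether this is the exact measure intended here is an assumption; the sketch below simply shows how such a comparison turns into a signed number. The function name and the reaction-time scores are made up for illustration.

```python
from statistics import mean, median, stdev

def pearson_median_skewness(scores):
    """Pearson's second skewness coefficient: 3 * (mean - median) / SD."""
    return 3 * (mean(scores) - median(scores)) / stdev(scores)

# Hypothetical reaction times in seconds: mostly quick, a few slow responses.
reaction_times = [0.3, 0.3, 0.4, 0.4, 0.4, 0.5, 0.5, 0.6, 1.2, 2.0]
# Mirror image of the same scores, so the tail points left instead of right.
mirrored = [-t for t in reaction_times]
# A perfectly symmetric set of scores for comparison.
symmetric = [1, 2, 3, 4, 5]

skew_pos = pearson_median_skewness(reaction_times)
skew_neg = pearson_median_skewness(mirrored)
skew_zero = pearson_median_skewness(symmetric)

print(f"tail to the right: {skew_pos:+.2f}")
print(f"tail to the left:  {skew_neg:+.2f}")
print(f"symmetric:         {skew_zero:+.2f}")
```

The sign of the result follows the tail: the mean is dragged past the median toward the extreme scores, so a right tail gives a positive value and a left tail gives a negative one.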

Instead of having one peak, sometimes we have two peaks. This occurs when there is more than one most frequently occurring score; we call this type of curve bimodal. A curve can be bimodal when there really are two most frequently occurring scores.

For instance, when is the best time to go fishing? At what time of day will you

catch the most fish? Probably early in the morning, and then in the evening when the sun is going down. In the middle of the day, when the sun is at its height,

you will catch fewer fish. So if we plot the number of fish caught, we will see a

peak in the morning at dawn, and another peak in the evening at dusk. This would

be a true bimodal distribution. On the other hand, we might have a bimodal

distribution when there are actually two distributions overlying each other. When

we had both males and females on the football field and we were comparing

heights, we saw that there was a distribution for males and another

distribution for females. The distributions overlapped – some females

were taller than some males – but the average heights were taller for males.

They really were two distinct distributions that should be separated

before being analyzed. A multimodal distribution has three or more most

frequently occurring scores. You may wonder why we don't call it a

trimodal distribution or a quadrimodal distribution – four peaks – the answer is

that when we start getting three, four, or five modes, there is something very wrong in our data set. Three or more modes is multimodal, and it's messed up. We need to figure out what is going on before we try to analyze those data. Rectangular

distributions have the same frequency for all scores. If you roll a single die

100 times, how many times do you expect to get a one? About one-sixth of the time.

In fact, you would expect to get each of the scores, one through six, approximately one-sixth of the time. That is a rectangular distribution. Once you add a

second die, however, the distribution of the sum will begin to look more normal.
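The die example is easy to try out. This is a small simulation sketch, assuming fair dice and using far more rolls than 100 so the pattern is easier to see; the variable names and the seed are arbitrary choices.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable
rolls = 60_000

# Tally the faces of a single die, then the totals of two dice rolled together.
one_die = Counter(random.randint(1, 6) for _ in range(rolls))
two_dice = Counter(
    random.randint(1, 6) + random.randint(1, 6) for _ in range(rolls)
)

print("one die (rectangular):")
for face in range(1, 7):
    print(f"  {face}: {one_die[face] / rolls:.3f}")

print("sum of two dice (peaked in the middle):")
for total in range(2, 13):
    share = two_dice[total] / rolls
    print(f"  {total:2d}: {share:.3f} " + "#" * round(share * 100))
```

Each face of the single die should come out close to one-sixth, while the two-dice totals pile up around 7 and thin out toward 2 and 12.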

Rectangular distributions have exactly the same frequency for all scores, and do

not have tails. Before we conclude, there is – one more thing – that I want to tell

you about the normal curve, and that is that the normal curve can be overlaid

with a number line, and this is where things get really interesting and quite

useful. If we have a normal curve, we can add the value of the mean right in the

middle where it belongs, and in this example we're going to imagine that our

mean is 50, so then we could lay out a number line with four point delineations.

In a normal curve, half of our scores will always be above the mean, or above 50. The remaining half

of the scores will always be below 50. Because the normal curve is symmetric, the mean

sits at the same place as the median, the point at which half of the scores fall above and

half of the scores fall below. The next thing that we could do is measure

the proportion of the scores that fall within a certain range, above or below

the mean. The

proportion is the total area under the normal curve that corresponds to the

relative frequency of those scores. To better understand this, let's return to

our picture of the people standing on the football field. Remember that

everyone (100%) is standing below the rope that represents

our distribution. We want to know the proportion of people who are between

five foot six and five foot nine inches tall. We ask everyone who is in those

rows (five foot six, seven, eight, and nine) to stay where they are; everyone else,

please leave the field. So how many people are in these four rows? Divide the

number of people in the four rows by the total number of people and you have a

proportion. This is the proportion of people who are in that range underneath

the distribution. It would also be the relative frequency of the number of

people in that range, and this is going to become a very useful technique when

we talk about z-scores. But for now, just remember what we've learned about the

frequency table, and specifically how the relative frequency relates to what we

know about the normal curve.
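The football-field exercise can be written as a short calculation. This is only a sketch: the list of heights is invented for illustration, and `proportion_in_range` is a hypothetical helper name.

```python
# Hypothetical heights in inches for 20 people on the field (illustrative data).
heights = [62, 63, 64, 65, 65, 66, 66, 67, 67, 67,
           68, 68, 68, 69, 69, 70, 71, 72, 73, 75]

def proportion_in_range(scores, low, high):
    """Share of scores with low <= score <= high (the relative frequency)."""
    kept = [s for s in scores if low <= s <= high]  # the people who stay on the field
    return len(kept) / len(scores)

# Five foot six is 66 inches and five foot nine is 69 inches.
p = proportion_in_range(heights, 66, 69)
print(f"{p:.2f} of the group is between 66 and 69 inches tall")
```

Dividing the number of people left in those four rows by the total number of people gives the proportion, which is the same thing as the relative frequency for that range.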

funny and informative

Great video, but better takeaway would be that the 50% Point is the Median Not the Mean!! Only in rare cases, if the data is 100% symmetric both are identical.

Thanks for the video it helped a lot

sir what is non Gaussian distribution and will it have skewness??pls explain me the answer

Thank you so much for these great courses Doctor … I have non true bimodal distributed data… i learned from you they must be analysed separatetly and so i did….but i need the source to quote it in my thesis please… i have some question too… do you think we must perform a Shapiro Wilk test after we get a graph showing a non normal distribution just to support the result or is it overtesting… Great Thanks

Actually, kurtosis has nothing to do with pointiness or flatness of the peak. You can have an infinitely pointy peak with negative excess kurtosis, and you can have a flat peak with infinite kurtosis.

Instead, kurtosis measures the tail heaviness (outlier potential) of the distribution. Data values near the peak contribute very little to kurtosis. It is an unfortunate historical accident (no doubt due largely to both Fisher and Pearson) that people keep repeating the incorrect "peakedness" interpretation.

Westfall, P.H. (2014). Kurtosis as Peakedness, 1905 – 2014. R.I.P. The American Statistician, 68, 191–195.

thx for posting! what is the probability you will be within 1 standard deviation of a non normal dist. i know the prob of 1sd in a normal dist is 68%. Do we know what the sd is for a non normal dist?

Types of non normal distribution…Plz sir reply me..