Skewness and Kurtosis

Introduction-

Skewness and kurtosis are important concepts in statistics with several applications. They characterize the shape of a data distribution, which makes data analysis easier. I discuss skewness and kurtosis separately in the sections below.

Skewness-

Skewness measures the lack of symmetry in a data distribution. Measures of skewness are used in data analytics, machine learning and data science, typically in the preprocessing of data. If a data distribution is symmetric with a single peak, its mean, median and mode coincide, i.e. mean = median = mode. Such a data set has skewness 0, i.e. there is no asymmetry in the data set.
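As a quick check of the symmetric case, here is a minimal sketch using Python's standard statistics module. The data set is made up for illustration and is not taken from this post. It shows that the mean, median and mode coincide and that the cubed deviations from the mean cancel out, which is what makes the skewness zero.

```python
import statistics

# Hypothetical symmetric data set, chosen only for illustration
data = [10, 20, 30, 30, 30, 40, 50]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Sum of cubed deviations from the mean; it is zero for symmetric data,
# so the moment-based skewness is zero as well
cubed_deviations = sum((v - mean) ** 3 for v in data)

print("Mean:", mean, "Median:", median, "Mode:", mode)
print("Sum of cubed deviations:", cubed_deviations)
```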

Consider the data set

0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0

Mean: 28.75
Median: 30.0
Mode: 50
Standard Deviation: 17.275343701356565
Skewness : -0.24321198774750508

The coefficient of skewness is negative, so this data distribution has negative skewness (it is skewed to the left).

Formula of skewness

A simple measure is Pearson's second coefficient of skewness:

Coef. of Skewness = 3 (Mean - Median) / Standard Deviation
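To make the formula concrete, here is a minimal sketch that applies it to the first data set using only Python's standard statistics module. The function name pearson_second_skewness is my own choice for illustration, and I assume the population standard deviation is intended, since that matches the standard deviation reported above.

```python
import statistics

def pearson_second_skewness(values):
    """Return 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    return 3 * (mean - median) / std

# First data set from this post
x = [0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0]

coef = pearson_second_skewness(x)
print("Pearson's second coefficient of skewness:", coef)
```

With mean 28.75, median 30 and standard deviation of about 17.275, this gives roughly -0.217. The sign agrees with the skewness value listed above, but the magnitude differs because that value comes from a different, moment-based definition.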

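The statistics listed for the data set are calculated using the following Python code. Note that scipy.stats.skew computes the moment-based coefficient of skewness (the third central moment divided by the cube of the standard deviation), not the Pearson formula above, so the two values differ slightly.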

Python Code for Calculating Coefficient of Skewness

from scipy.stats import skew
import numpy as np
import statistics
import matplotlib.pyplot as plt

# Data set from the example above
x = [0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0]
print(x)

mean = np.mean(x)
median = np.median(x)
mode = statistics.mode(x)
std = np.std(x)  # population standard deviation (ddof=0)
print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Standard Deviation:", std)

# Plot the values in their given order
lines = plt.plot(x)
plt.setp(lines, color='r', linewidth=2.0)

# Moment-based coefficient of skewness
print("Skewness :", skew(x))
plt.savefig("skewness.jpg")

Consider another data set

0, 10, 20, 30, 40, 50, 50, 50, 60, 60, 70, 80, 90, 100, 110, 120, 70, 60, 60, 50, 50, 50, 40, 30, 20, 10, 0

Mean: 51.111111111111114
Median: 50.0
Mode: 50
Standard Deviation: 30.83208205669246
Skewness : 0.32780083058284104

The coefficient of skewness is positive, so this data distribution has positive skewness (it is skewed to the right).
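The skewness values reported for both data sets can be reproduced without SciPy. The sketch below assumes the moment-based definition, i.e. the third central moment divided by the second central moment raised to the power 1.5, which to my understanding is what scipy.stats.skew computes with its default settings. The function name moment_skewness is hypothetical.

```python
def moment_skewness(values):
    """Third central moment divided by the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# The two data sets from this post
first = [0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0]
second = [0, 10, 20, 30, 40, 50, 50, 50, 60, 60, 70, 80, 90, 100, 110, 120,
          70, 60, 60, 50, 50, 50, 40, 30, 20, 10, 0]

skew_first = moment_skewness(first)
skew_second = moment_skewness(second)
print("Skewness of first data set :", skew_first)
print("Skewness of second data set:", skew_second)
```

The first value is negative and the second is positive, matching the negative and positive skewness described above.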

Kurtosis-

Kurtosis measures how heavy the tails of a data distribution are. It is also used for outlier detection in a data set, because heavy tails indicate that more values lie far from the mean and so differ in character from the rest of the data.

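The formula for kurtosis is

Coef. of Kurtosis = E[(X - μ)^4] / σ^4

where μ is the mean and σ is the standard deviation (σ^2 is the variance). For a normal distribution this coefficient equals 3, so the excess kurtosis is often reported instead, which is the coefficient minus 3.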
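Here is a minimal sketch of the kurtosis calculation in plain Python, assuming population moments (dividing by n). The function names and the small sample are my own choices for illustration.

```python
def kurtosis_coefficient(values):
    """Fourth central moment divided by the squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2

def excess_kurtosis(values):
    """Kurtosis relative to the normal distribution, whose kurtosis is 3."""
    return kurtosis_coefficient(values) - 3.0

# Hypothetical sample, chosen so the arithmetic is easy to follow
sample = [1, 2, 3, 4, 5]
kurt = kurtosis_coefficient(sample)
excess = excess_kurtosis(sample)
print("Kurtosis:", kurt)
print("Excess kurtosis:", excess)
```

For this sample the variance is 2 and the fourth central moment is 6.8, so the kurtosis is 6.8 / 4 = 1.7 and the excess kurtosis is -1.3, which indicates lighter tails than a normal distribution.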

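Python code for Kurtosis

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import math
from scipy.stats import kurtosis

mu = 0
variance = 1
sigma = math.sqrt(variance)

# 50 evenly spaced points from mu - 5*sigma to mu + 5*sigma
x = np.linspace(mu - 5*sigma, mu + 5*sigma, 50)

# Normal density evaluated at each point of x
y1 = stats.norm.pdf(x, mu, sigma)
print("x", x)
print("y1", y1)
plt.plot(x, y1)

# kurtosis() returns the excess kurtosis by default;
# here it is applied to the density values y1, not to a random sample
print("Kurtosis\n", kurtosis(y1))
plt.savefig("kurtosis.jpg")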

The values of y1 (the normal density evaluated at the 50 points of x) printed by the above Python code are

1.48671951e-06 4.03963981e-06 1.05285406e-05 2.63211976e-05
6.31182642e-05 1.45183206e-04 3.20324125e-04 6.77914385e-04
1.37616968e-03 2.67966838e-03 5.00497661e-03 8.96674844e-03
1.54091915e-02 2.54001718e-02 4.01610804e-02 6.09096432e-02
8.86091674e-02 1.23646888e-01 1.65500632e-01 2.12484892e-01
2.61678710e-01 3.09115411e-01 3.50255414e-01 3.80680815e-01
3.96870719e-01 3.96870719e-01 3.80680815e-01 3.50255414e-01
3.09115411e-01 2.61678710e-01 2.12484892e-01 1.65500632e-01
1.23646888e-01 8.86091674e-02 6.09096432e-02 4.01610804e-02
2.54001718e-02 1.54091915e-02 8.96674844e-03 5.00497661e-03
2.67966838e-03 1.37616968e-03 6.77914385e-04 3.20324125e-04
1.45183206e-04 6.31182642e-05 2.63211976e-05 1.05285406e-05
4.03963981e-06 1.48671951e-06

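The plot of the data is saved by the code as kurtosis.jpg.

The kurtosis of these values is -0.24249670483561347. This is the excess kurtosis, which scipy.stats.kurtosis returns by default.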
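Note that the value above is the kurtosis of the density values y1, not of data drawn from the normal distribution. As a rough illustration of the difference, the sketch below draws a random sample from a standard normal distribution with Python's random module and computes its excess kurtosis. The sample size, the seed and the helper function are assumptions made for this example.

```python
import random

def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

random.seed(0)  # fixed seed so the run is repeatable
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]

sample_excess = excess_kurtosis(sample)
print("Excess kurtosis of the normal sample:", sample_excess)
```

For a sample of this size the result should be close to 0, since the normal distribution has kurtosis 3 and therefore excess kurtosis 0.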

Conclusion- 

In this post, I have explained skewness and kurtosis, which are very important for understanding a data distribution. Both of these measures are widely used in machine learning, data science and big data analytics. I hope the concepts explained here will help you.

 

 


©Postnetwork-All rights reserved.