**Skewness and Kurtosis**

## Introduction

Skewness and kurtosis are important concepts in statistics with several applications. They characterize the shape of a data distribution, which makes data analysis easier. I discuss skewness and kurtosis separately in the following sections.

*Skewness*

Skewness measures the lack of symmetry in a data distribution. Measures of skewness have applications in data analytics, machine learning and data science, especially in the pre-processing of data. If the mean, median and mode of a data distribution coincide, i.e. mean = median = mode, then the coefficient of skewness given by the formula below is 0, i.e. the data set shows no asymmetry by this measure.

Consider the following data set:

0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0

Mean: 28.75

Median: 30.0

Mode: 50

Standard Deviation: 17.275343701356565

Skewness : -0.24321198774750508

The coefficient of skewness is negative; this type of skewness in a data distribution is called negative skewness (the longer tail is on the left).

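Formula for skewness (Pearson's second coefficient of skewness):

$$\text{Coef. of Skewness} = \frac{3\,(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$

Note that the skewness value reported above comes from `scipy.stats.skew`, which computes the moment-based coefficient $m_3 / m_2^{3/2}$ rather than this formula, so the two numbers differ slightly (this formula gives about -0.217 for the data set above).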

The statistics above were calculated using the following Python code.

## Python Code for Calculating Coefficient of Skewness

    from scipy.stats import skew
    import numpy as np
    import statistics
    import matplotlib.pyplot as plt

    x = [0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0]
    print(x)

    # Summary statistics (np.std returns the population standard deviation)
    mean = np.mean(x)
    median = np.median(x)
    mode = statistics.mode(x)
    std = np.std(x)
    print("Mean:", mean)
    print("Median:", median)
    print("Mode:", mode)
    print("Standard Deviation:", std)

    # Plot the values in their given order, then print the skewness and save the figure
    lines = plt.plot(x)
    plt.setp(lines, color='r', linewidth=2.0)
    print("Skewness :", skew(x))
    plt.savefig("skewness.jpg")
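The formula given earlier and `scipy.stats.skew` are two different measures of skewness, which is why they do not produce the same number for this data set. The following is a minimal sketch that computes both using only the standard library. The helper names `pearson_skew` and `moment_skew` are illustrative and not part of the original code; `moment_skew` is assumed to mirror the default (biased) behaviour of `scipy.stats.skew`.

```python
import statistics

x = [0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0]


def pearson_skew(data):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    std = statistics.pstdev(data)  # population standard deviation, like np.std
    return 3 * (mean - median) / std


def moment_skew(data):
    """Moment-based coefficient m3 / m2**1.5 (third standardized moment)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in data) / n  # third central moment
    return m3 / m2 ** 1.5


print("Pearson's second coefficient:", pearson_skew(x))  # about -0.217
print("Moment-based skewness:", moment_skew(x))          # about -0.243
```

Both values are negative here, so the two measures agree on the direction of the skew even though their magnitudes differ.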

Now consider a second data set:

0, 10, 20, 30, 40, 50, 50, 50, 60, 60, 70, 80, 90, 100, 110, 120, 70, 60, 60, 50, 50, 50, 40, 30, 20, 10, 0

Mean: 51.111111111111114

Median: 50.0

Mode: 50

Standard Deviation: 30.83208205669246

Skewness : 0.32780083058284104

The coefficient of skewness is positive; this type of skewness in a data distribution is called positive skewness (the longer tail is on the right).
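The sign of the coefficient is what separates the two cases above. As a rough sketch, the check can be written as a small helper that labels a data set by the sign of its moment-based skewness. The names `moment_skew` and `skew_direction`, the tolerance `tol`, and the third (symmetric) data set are assumptions made for illustration only.

```python
def moment_skew(data):
    """Moment-based skewness m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


def skew_direction(data, tol=1e-12):
    """Label a data set by the sign of its skewness (tol is an arbitrary cutoff)."""
    s = moment_skew(data)
    if s > tol:
        return "positive"
    if s < -tol:
        return "negative"
    return "symmetric"


first = [0, 10, 20, 20, 30, 40, 40, 50, 50, 50, 50, 40, 30, 20, 10, 0]
second = [0, 10, 20, 30, 40, 50, 50, 50, 60, 60, 70, 80, 90, 100, 110, 120,
          70, 60, 60, 50, 50, 50, 40, 30, 20, 10, 0]
balanced = [10, 20, 30, 40, 50]  # hypothetical symmetric data set

for name, data in [("first", first), ("second", second), ("balanced", balanced)]:
    print(name, "->", skew_direction(data))
```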

**Kurtosis**

Kurtosis measures how heavy the tails of a data distribution are. It is also used for outlier detection in a data set, that is, for judging how many values behave differently from the bulk of the data.

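The formula for kurtosis is

$$\text{Coef. of Kurtosis} = \frac{E\left[(X-\mu)^{4}\right]}{\sigma^{4}}$$

where $\mu$ is the mean and $\sigma^{2}$ is the variance, so the denominator is the squared variance. A normal distribution has a kurtosis of 3; subtracting 3 gives the excess kurtosis.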
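A minimal sketch of the kurtosis calculation, assuming the standard definition (the mean of the fourth powers of the deviations from the mean, divided by the squared variance). The helper name `kurtosis_moment` and the small data sets are illustrative assumptions. The second data set adds one extreme value to show how a heavy tail raises the kurtosis.

```python
def kurtosis_moment(data):
    """Fourth central moment divided by the squared variance (m4 / m2**2)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n  # population variance
    m4 = sum((v - mean) ** 4 for v in data) / n  # fourth central moment
    return m4 / m2 ** 2


values = [1, 2, 3, 4, 5]      # illustrative data without outliers
with_outlier = values + [50]  # same data plus one extreme value

k = kurtosis_moment(values)
print("Kurtosis:", k)             # about 1.7
print("Excess kurtosis:", k - 3)  # about -1.3
print("Kurtosis with outlier:", kurtosis_moment(with_outlier))
```

The single extreme value pushes the kurtosis well above that of the original five points, which is the property that makes kurtosis useful as a rough outlier indicator.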

## Python code for Kurtosis

    import math

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats as stats
    from scipy.stats import kurtosis

    mu = 0
    variance = 1
    sigma = math.sqrt(variance)

    # 50 evenly spaced points covering mu ± 5 sigma
    x = np.linspace(mu - 5*sigma, mu + 5*sigma, 50)
    # Normal probability density evaluated at each point
    y1 = stats.norm.pdf(x, mu, sigma)
    print("x", x)
    print("y1", y1)

    plt.plot(x, y1)
    # Note: this is the excess kurtosis of the 50 density values in y1,
    # not the kurtosis of a sample drawn from the normal distribution
    print("Kurtosis\n", kurtosis(y1))
    plt.savefig("kurtosis.jpg")

The normal density values (`y1`) printed by the above Python code are

    1.48671951e-06 4.03963981e-06 1.05285406e-05 2.63211976e-05
    6.31182642e-05 1.45183206e-04 3.20324125e-04 6.77914385e-04
    1.37616968e-03 2.67966838e-03 5.00497661e-03 8.96674844e-03
    1.54091915e-02 2.54001718e-02 4.01610804e-02 6.09096432e-02
    8.86091674e-02 1.23646888e-01 1.65500632e-01 2.12484892e-01
    2.61678710e-01 3.09115411e-01 3.50255414e-01 3.80680815e-01
    3.96870719e-01 3.96870719e-01 3.80680815e-01 3.50255414e-01
    3.09115411e-01 2.61678710e-01 2.12484892e-01 1.65500632e-01
    1.23646888e-01 8.86091674e-02 6.09096432e-02 4.01610804e-02
    2.54001718e-02 1.54091915e-02 8.96674844e-03 5.00497661e-03
    2.67966838e-03 1.37616968e-03 6.77914385e-04 3.20324125e-04
    1.45183206e-04 6.31182642e-05 2.63211976e-05 1.05285406e-05
    4.03963981e-06 1.48671951e-06

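The plot of the data is saved by the code as `kurtosis.jpg`.

**And the kurtosis is -0.24249670483561347**

Note that this value is the excess kurtosis of the 50 density values in `y1`, as returned by `scipy.stats.kurtosis`. A normal distribution itself has an excess kurtosis of 0.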
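The value above describes the list of density values, not a set of normally distributed observations. To see the kurtosis of normally distributed data itself, one option is to draw a random sample and compute its excess kurtosis, which should come out close to 0. The sketch below uses only the standard library; the seed, the sample size and the helper name `excess_kurtosis` are arbitrary choices made for illustration.

```python
import random


def excess_kurtosis(data):
    """Fourth standardized moment minus 3 (the Fisher definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m4 / m2 ** 2 - 3


rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(0, 1) for _ in range(100_000)]  # draws from N(0, 1)

print("Excess kurtosis of the sample:", excess_kurtosis(sample))  # close to 0
```

With SciPy available, `scipy.stats.kurtosis(sample)` should report a similar value, since it also returns the excess (Fisher) kurtosis by default.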

**Conclusion**

In this post, I have explained skewness and kurtosis, which are important for understanding a data distribution. Both measures are widely used in machine learning, data science and big data analytics. I hope the concepts explained here will help you.