Statistical Functions in Python

In this tutorial, we cover some useful statistical functions that can be applied to pandas Series and DataFrame objects.

By Priya Sengar, Data Scientist with Old Dominion University on October 12, 2022 in Python

Photo by Andrea Piacquadio

Statistical functions are of great help in analyzing data and drawing meaningful conclusions. This tutorial walks through a few of them, each of which can be called directly on a pandas Series or DataFrame.

The following statistical functions are covered in this tutorial:

pct_change()
cov()
corr()
corrwith()

pct_change()

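The pct_change() method can be applied to a pandas Series or DataFrame to calculate the percent change over a given number of periods. By default, each element is compared with the one immediately before it (periods = 1). The result is expressed as a fraction, not multiplied by 100, and the first element is NaN because it has no predecessor.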

Calculating pct_change() without specifying the number of periods

Code:

import pandas as pd
import numpy as np

series = pd.Series(np.random.randn(10))

series.pct_change()

Output:

0         NaN
1   -0.881470
2   -5.025007
3    0.728078
4   -0.577371
5    1.173420
6   -1.578389
7   -3.520208
8   -1.927874
9   -1.600583
dtype: float64
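Because the random input above changes on every run, the following is a minimal sketch with hand-picked values (the `prices` series is hypothetical, not from the original example). It compares pct_change() with the manual formula (current - previous) / previous.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so the changes are easy to verify by hand.
prices = pd.Series([100.0, 110.0, 99.0, 148.5])

# Built-in: change of each element relative to the previous one.
builtin = prices.pct_change()

# Manual equivalent: (current - previous) / previous.
previous = prices.shift(1)
manual = (prices - previous) / previous

print(builtin)  # NaN, then roughly 0.10, -0.10, 0.50
print(np.allclose(builtin.iloc[1:], manual.iloc[1:]))
```

A value of 0.10 here means a 10% increase; multiply by 100 if a percentage figure is needed.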

Calculating pct_change() by specifying the number of periods

Code:

df = pd.DataFrame(np.random.randn(10,2))

df.pct_change(periods = 2)

Output (first five rows):

	0	1
0	NaN	NaN
1	NaN	NaN
2	-0.095052	-1.399525
3	0.073909	-7.491512
4	-0.882174	-1.150202
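To make the effect of periods easier to see, here is a small sketch on made-up data (the columns "x" and "y" are assumptions for illustration). With periods = 2, each value is compared with the value two rows earlier, so the first two rows are NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "x" doubles every row, "y" grows by 10 every row.
df = pd.DataFrame({
    "x": [1.0, 2.0, 4.0, 8.0, 16.0],
    "y": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Change relative to the value two rows earlier.
two_step = df.pct_change(periods=2)

# Manual equivalent using shift().
manual = df / df.shift(2) - 1

print(two_step)
print(np.allclose(two_step, manual, equal_nan=True))
```

Since "x" quadruples over any two rows, its two-period change is 3.0 (a 300% increase) wherever it is defined.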

Covariance: cov()

The cov() method calculates covariance. Called on a Series, it returns the covariance with another Series. Called on a DataFrame, it returns the pairwise covariance between the DataFrame's columns.

In both cases, missing values are excluded from the calculation.
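The handling of missing values can be checked with a short sketch. The two series below are hypothetical; the comparison assumes that dropping every row with a missing value in either series by hand should give the same result as calling cov() on the raw data.

```python
import numpy as np
import pandas as pd

# Hypothetical series, each with one missing value at a different position.
s1 = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
s2 = pd.Series([2.0, 4.0, 6.0, np.nan, 10.0])

# cov() only uses positions where both series have a value.
cov_with_nan = s1.cov(s2)

# Same computation after dropping incomplete rows by hand.
complete = pd.concat([s1, s2], axis=1, keys=["s1", "s2"]).dropna()
cov_complete = complete["s1"].cov(complete["s2"])

print(cov_with_nan, cov_complete)
print(np.isclose(cov_with_nan, cov_complete))
```

Only three complete pairs remain here, so the covariance is based on three observations rather than five.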

Calculating covariance between two series

Code:

series1 = pd.Series(np.random.randn(200))
series2 = pd.Series(np.random.randn(200))

series1.cov(series2)

Output:

-0.14817157321848334

Calculating covariance of a DataFrame

Code:

df = pd.DataFrame(np.random.randn(4,5),columns = ["a","b","c","d","e"])
df.cov()

Output:

	a	b	c	d	e
a	2.095402	0.191502	0.049185	0.090229	-1.052856
b	0.191502	0.628889	0.377184	-0.507893	0.404180
c	0.049185	0.377184	0.336220	-0.077814	0.571139
d	0.090229	-0.507893	-0.077814	0.950198	0.164894
e	-1.052856	0.404180	0.571139	0.164894	1.722546
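A few properties of the covariance matrix are worth checking in code. The sketch below uses its own seeded random data (the seed and shape are arbitrary assumptions) and confirms that the matrix is symmetric, that its diagonal holds each column's variance, and that an off-diagonal entry matches Series.cov().

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 3)), columns=["a", "b", "c"])

cov_matrix = df.cov()

# The diagonal holds the sample variance of each column.
diagonal = pd.Series(np.diag(cov_matrix.to_numpy()), index=cov_matrix.index)
print(np.allclose(diagonal, df.var()))

# The matrix is symmetric: cov(a, b) == cov(b, a).
print(np.allclose(cov_matrix, cov_matrix.T))

# Each off-diagonal entry matches the Series-level call.
print(np.isclose(cov_matrix.loc["a", "b"], df["a"].cov(df["b"])))
```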

Correlation: corr()

Correlation is computed using the corr() method. Its method parameter accepts the following options:

pearson (default): the standard correlation coefficient
kendall: the Kendall Tau correlation coefficient
spearman: the Spearman rank correlation coefficient

Calculating the correlation between two columns of a DataFrame using the default Pearson method

Code:

df = pd.DataFrame(np.random.randn(200,4), columns = ["a","b","c","d"])
df["a"].corr(df["b"])

Output:

0.08425780768544051
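The Pearson coefficient is the covariance of the two series divided by the product of their standard deviations. The following sketch, on its own seeded random data (an assumption, since the original values are not reproducible), checks that relationship against corr().

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible; the seed is arbitrary.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((200, 2)), columns=["a", "b"])

# Built-in Pearson correlation.
pearson = df["a"].corr(df["b"])

# Manual equivalent: cov(a, b) / (std(a) * std(b)).
manual = df["a"].cov(df["b"]) / (df["a"].std() * df["b"].std())

print(pearson, manual)
print(np.isclose(pearson, manual))
```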

Calculating the correlation between two columns of a DataFrame using the spearman method

Code:

df["a"].corr(df["b"], method = "spearman")

Output:

0.053819845496137414
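Spearman correlation is the Pearson correlation of the ranks, which makes it sensitive to any monotonic relationship rather than only a linear one. The sketch below uses hypothetical data where y is the cube of x. It calls DataFrame.corr() for the Spearman value, since in some pandas versions the Series-level call with method = "spearman" relies on SciPy being installed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: y increases with x, but not linearly.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
df["y"] = df["x"] ** 3

pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

# Spearman computed by hand: Pearson correlation of the ranks.
rank_pearson = df["x"].rank().corr(df["y"].rank())

print(pearson)       # below 1, because the relationship is not linear
print(spearman)      # 1.0 up to rounding, because the ordering is preserved
print(np.isclose(spearman, rank_pearson))
```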

Calculating the pairwise correlation between DataFrame columns

Code:

df.corr()

Output:

	a	b	c	d
a	1.000000	0.084258	-0.074284	0.054453
b	0.084258	1.000000	0.022995	0.029727
c	-0.074284	0.022995	1.000000	-0.028279
d	0.054453	0.029727	-0.028279	1.000000

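corrwith()

The corrwith() method is applied to a DataFrame to calculate the correlation between like-labeled Series in two different DataFrame objects. By default (axis = 0), columns with the same name are correlated using the rows the two objects have in common. With axis = 1, rows with the same index label are correlated instead. A label that appears in only one of the two objects produces NaN.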

Code:

index = ["a","b","c","d","e"]

columns = ["one","two","three","four"]

df1 = pd.DataFrame(np.random.randn(5,4), index = index, columns = columns )

df2 = pd.DataFrame(np.random.randn(4,4), index = index[:4], columns = columns)

df1.corrwith(df2)

Output:

one      0.277569
two     -0.052151
three   -0.754392
four     0.526614
dtype: float64

Code:

df2.corrwith(df1, axis=1)

Output:

a    0.346955
b   -0.707590
c    0.711081
d    0.753457
e         NaN
dtype: float64
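The label alignment is easier to follow with values whose correlation is known in advance. In this sketch (all data is hypothetical), column "one" of df2 is a scaled copy of df1's and column "two" is its negation, so the column-wise results should be close to 1 and -1. Row "e" exists only in df1, so it yields NaN when correlating row-wise.

```python
import numpy as np
import pandas as pd

index = ["a", "b", "c", "d", "e"]

df1 = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 4.0, 5.0], "two": [5.0, 3.0, 4.0, 1.0, 2.0]},
    index=index,
)

# df2 covers only rows a-d: "one" is df1["one"] * 10, "two" is -df1["two"].
df2 = pd.DataFrame(
    {"one": [10.0, 20.0, 30.0, 40.0], "two": [-5.0, -3.0, -4.0, -1.0]},
    index=index[:4],
)

# Column-wise: same-named columns, using only the shared rows a-d.
by_column = df1.corrwith(df2)
print(by_column)

# Row-wise: same-labeled rows; "e" has no partner in df2, so it is NaN.
by_row = df1.corrwith(df2, axis=1)
print(by_row)
```

With only two columns, each row-wise correlation is computed from two points, so this part is useful only for showing the alignment, not as a meaningful statistic.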

Priya Sengar (Medium, Github) is a Data Scientist with Old Dominion University. Priya is passionate about solving problems in data and converting them into solutions.
