Visualization in Python

Akshar Rastogi
4 min read · Jun 11, 2021

Plotting is a visual way of understanding our data. It gives us a clear idea of what the information means and helps us identify relationships between variables.

Visualization condenses data spread across spreadsheets or CSV files into a single plot or map.

Matplotlib is the first name that comes to mind for Python visualization. It sits underneath many other Python libraries, which make the tedious process of coding plots brief and fast. Even plotting in pandas is based on matplotlib.
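A quick way to see that pandas plotting rests on matplotlib is to look at what `DataFrame.plot` returns. This is only a minimal sketch: the small table `df_demo` is made up for illustration and is not one of the datasets used later in the post.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# Small made-up table, used only for illustration.
df_demo = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 3, 6]})

# DataFrame.plot draws with matplotlib and hands back a matplotlib Axes.
ax = df_demo.plot(x="x", y="y", kind="line")

# Because it is a regular Axes, the usual matplotlib methods apply.
ax.set_title("pandas plot, matplotlib Axes")
ax.set_ylabel("y")

print(isinstance(ax, Axes))
```

Calling `plt.show()` afterwards displays the figure like any other matplotlib plot.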

In this post, I will compare seaborn and matplotlib to give an idea of why to choose seaborn over matplotlib. One thing I also want to mention: where a plot has no matplotlib code attached, either that feature doesn't exist in matplotlib or it is very hard to code there. (P.S. It is very much advisable to get familiar with matplotlib concepts first.)

What is Matplotlib?

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

What is Seaborn?

Seaborn provides an API on top of Matplotlib that offers sane choices for plot style and color defaults, defines simple high-level functions for common statistical plot types, and integrates with the functionality provided by Pandas.

# loading data with seaborn
df_titanic = sns.load_dataset('titanic')
df_iris = sns.load_dataset('iris')
df_flights = sns.load_dataset('flights')

The datasets used are Titanic, Iris, and Flights, all from the seaborn library.

Different Types of Plots

Plots can be classified by the number of variables they use. The four types covered here are:

  • Univariate Plots
  • Bivariate Plots
  • Multivariate Plots
  • Numerical Variables against Categorical Variables

Univariate Plots

Plotting a single variable creates a univariate plot.

Distplots

Distplots show the univariate distribution of data, i.e. a histogram of one variable with its estimated density curve drawn on top.

# Note: distplot is deprecated since seaborn 0.11
plt.figure(figsize=(16, 8))
sns.distplot(df_titanic['fare'])
plt.title('Distribution of Fare in titanic')
plt.show()

Countplot

Countplots show how many observations fall into each category of a categorical variable.

# countplot
plt.figure(figsize=(12, 6))
# name the column with x= so seaborn counts its categories
sns.countplot(x='sex', data=df_titanic)
plt.show()
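To see what countplot saves us, here is roughly the same plot done by hand: count the rows per category first, then draw the bars. This is a sketch on a made-up `sex` series standing in for `df_titanic['sex']`, not the seaborn implementation.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for df_titanic['sex'].
sex = pd.Series(["male", "female", "male", "male", "female"], name="sex")

# value_counts gives the number of rows per category (largest first).
counts = sex.value_counts()
print(counts.to_dict())  # {'male': 3, 'female': 2}

# One bar per category, with the count as its height.
fig, ax = plt.subplots(figsize=(12, 6))
ax.bar(counts.index, counts.values)
ax.set_xlabel("sex")
ax.set_ylabel("count")
```

The extra aggregation step is exactly what `sns.countplot` does for us in a single call.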

Bivariate Plots

Bivariate plots show the relation between two variables.

Barplot

It shows the relationship between a numeric and a categorical variable. Each category of the categorical variable is represented as a bar.

plt.bar(df_flights['month'], df_flights['passengers'])
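For comparison, a sketch of the seaborn counterpart. `sns.barplot` aggregates the rows of each category before drawing, and its default estimator is the mean. The table `df_demo` below is a made-up miniature in the shape of the flights data (two years, three months), used so the numbers are easy to check by hand.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in for df_flights: two years of passengers for three months.
df_demo = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "passengers": [112, 118, 132, 115, 126, 141],
})

fig, ax = plt.subplots()

# barplot aggregates the rows per category; the default estimator is the mean.
sns.barplot(x="month", y="passengers", data=df_demo, ax=ax)

# The same aggregation done by hand with pandas, for comparison.
means = df_demo.groupby("month", sort=False)["passengers"].mean()
print(means.to_dict())
```

Each bar height should match the corresponding value in `means`.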

Scatterplot

The scatterplot shows the relationship between two variables as individual points.

sns.scatterplot(x='sepal_length', y='sepal_width', data=df_iris)
plt.title('Scatterplot in seaborn')

Here is the same scatterplot in matplotlib:

plt.scatter(df_iris['sepal_length'],df_iris['sepal_width'])
plt.title('Scatter plot in matplotlib')
plt.show()

Jointplot

Jointplot displays a relationship between 2 variables (bivariate) as well as 1D profiles (univariate) in the margins.

sns.jointplot(x=df_iris['sepal_length'], y=df_iris['sepal_width'], kind='hex')

Multivariate Plots

Multivariate plots show the relationship between more than two variables.

Scatterplot with Hue

This scatterplot shows the same relationship as points, with a categorical variable mapped to color as the hue.

sns.scatterplot(x='sepal_length', y='sepal_width', hue='species', data=df_iris)
plt.title('Scatterplot with hue in Seaborn')
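To show why there is no one-line matplotlib version here, this is a sketch of one common way to get a similar result by hand: loop over the groups and call `scatter` once per category. The table `df_demo` is a made-up six-row stand-in for `df_iris`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for df_iris with two rows per species.
df_demo = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

fig, ax = plt.subplots()

# One scatter call per category; matplotlib picks a new color each time.
for name, group in df_demo.groupby("species"):
    ax.scatter(group["sepal_length"], group["sepal_width"], label=name)

# The legend and axis labels also have to be added by hand.
ax.legend(title="species")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```

Seaborn's `hue` argument handles the grouping, coloring, and legend in one go.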

Barplot with Hue

It shows the relationship between a numeric and a categorical variable. Each category of the categorical variable is represented as a bar, and a third variable is shown as the hue.

sns.barplot(x='sex', y='fare', data=df_titanic, hue='class')

Numerical Variables against Categorical Variables

Boxplot

A boxplot shows the distribution by quartiles. It shows the shape of the distribution, its central value, and its variability.

sns.boxplot(x='sex', y='age', data=df_titanic)
plt.show()
sns.boxplot(x='species', y='sepal_length', data = df_iris)
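The numbers behind a box can be computed directly with pandas. The sketch below uses a made-up `ages` series, not the Titanic column, and applies the 1.5 × IQR rule that matplotlib and seaborn use by default for the whiskers.

```python
import pandas as pd

# Made-up ages, used only to keep the arithmetic easy to follow.
ages = pd.Series([2, 18, 22, 25, 29, 31, 36, 40, 54, 80], name="age")

# The box spans the first to third quartile; the line inside is the median.
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box edges are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print(q1, median, q3)     # 22.75 30.0 39.0
print(outliers.tolist())  # [80]
```

So in a boxplot of this series, the box would run from 22.75 to 39 and the value 80 would appear as a separate point above the upper whisker.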

Violinplot

A violinplot is an advanced version of the boxplot. It combines a boxplot with a density curve, so it also shows the shape of the distribution.

sns.violinplot(x='month', y='passengers', data=df_flights)

Swarmplot

A swarmplot shows not only the distribution but also every individual data point.

sns.swarmplot(y=df_iris['petal_length'], x=df_iris['species'])

Pair Plots

These plots show the relation between all pairs of variables within a single figure. (One of my favourites xD)

# seaborn
sns.pairplot(df_iris, hue='species', diag_kind='hist')
# pandas equivalent (uses the numeric columns only)
g = pd.plotting.scatter_matrix(df_iris, figsize=(10, 10), marker='*', hist_kwds={'bins': 10}, s=60, alpha=0.8)
plt.show()
