16 Seaborn

1 Introduction

Seaborn is a statistical data visualization library for Python built on top of Matplotlib.

It provides:

  • High-level, easy-to-use functions
  • Beautiful default themes
  • Strong integration with Pandas DataFrames
  • Many plot types commonly used in statistics

In practice, this means:

  • You work directly with DataFrame columns
  • Many defaults are automatically meaningful
  • Complex plots require very little code

We import Seaborn as:

import seaborn as sns
import matplotlib.pyplot as plt

2 Seaborn Themes and Style

Seaborn provides clean built-in themes that enhance Matplotlib plots.

sns.set_theme()   # Default theme

Other options:

sns.set_style("whitegrid")
sns.set_style("darkgrid")
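As a minimal sketch of how these settings interact: set_theme() also accepts style and palette arguments, and axes_style() can be used as a context manager to restyle a single figure. The particular style and palette names below are chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Set the style and the default palette globally in one call.
sns.set_theme(style="whitegrid", palette="colorblind")
grid_on_globally = plt.rcParams["axes.grid"]

# Temporarily switch to a grid-free style for one figure only.
with sns.axes_style("white"):
    fig, ax = plt.subplots()
    grid_inside = plt.rcParams["axes.grid"]

# Outside the block, the global whitegrid style is active again.
grid_after = plt.rcParams["axes.grid"]
```

The context-manager form is handy when one figure in a notebook should look different without changing the theme for every later plot.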

Load the palmerpenguins data:

import pandas as pd
penguins = pd.read_csv("data/penguins.csv")
penguins.head()
   species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007
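Row 3 of this output has missing measurements, so it can be worth counting missing values before plotting. The sketch below retypes the five rows shown above by hand (so it runs without the CSV file) and uses plain pandas calls; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# The five rows printed by penguins.head(), typed in by hand.
head = pd.DataFrame({
    "species": ["Adelie"] * 5,
    "island": ["Torgersen"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
    "bill_depth_mm": [18.7, 17.4, 18.0, np.nan, 19.3],
    "flipper_length_mm": [181.0, 186.0, 195.0, np.nan, 193.0],
    "body_mass_g": [3750.0, 3800.0, 3250.0, np.nan, 3450.0],
    "sex": ["male", "female", "female", np.nan, "female"],
    "year": [2007] * 5,
})

missing_per_column = head.isna().sum()  # number of NaN values in each column
complete = head.dropna()                # keep only fully observed rows
```

On the full data set the same two calls apply to penguins directly.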

3 The 5 Seaborn Plot Families

Seaborn groups its visualization functions into:

  1. Relational plots
    • visualize relationships between variables
    • functions: scatterplot(), lineplot(), relplot()
  2. Distributional plots
    • visualize distribution of one or two variables
    • functions: histplot(), kdeplot(), displot()
  3. Categorical plots
    • compare categories across numerical values
    • functions: boxplot(), violinplot(), stripplot(), catplot()
  4. Regression plots
    • show relationships + fitted models
    • functions: regplot(), lmplot()
  5. Matrix plots
    • visualize entire grids / correlation structures
    • functions: heatmap(), clustermap()

4 Relational Plots - relplot

  • Purpose: Visualize the relationship between two variables.
  • Common high-level function:
    • relplot() → wrapper for relational plots (scatter or line)
  • Individual functions:
    • scatterplot()
    • lineplot()

4.1 scatterplot

sns.scatterplot(data=penguins, x="flipper_length_mm", y="bill_length_mm")

4.2 lineplot

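(A line plot is not very meaningful for the penguins data, but it is shown here for demonstration. Where several penguins share the same flipper length, lineplot() plots their mean and shades a 95% confidence interval.)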

sns.lineplot(
    data=penguins.sort_values("flipper_length_mm"),
    x="flipper_length_mm",
    y="bill_length_mm"
)

4.3 High-level relplot

sns.relplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    kind="scatter"
)
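One practical difference between the individual functions and the high-level wrapper is the object they return. The sketch below uses a small made-up data frame (the values are illustrative, not the real penguins data) to show that scatterplot() returns a Matplotlib Axes, while relplot() returns a Seaborn FacetGrid.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.axes
import pandas as pd
import seaborn as sns

# Made-up values, only to keep the example self-contained.
df = pd.DataFrame({
    "flipper_length_mm": [190.0, 185.0, 200.0, 192.0, 221.0, 213.0],
    "bill_length_mm": [40.4, 37.3, 51.1, 46.6, 49.5, 45.6],
})

# Axes-level function: draws on the current Axes and returns it.
ax = sns.scatterplot(data=df, x="flipper_length_mm", y="bill_length_mm")

# Figure-level function: creates its own figure and returns a FacetGrid.
g = sns.relplot(data=df, x="flipper_length_mm", y="bill_length_mm", kind="scatter")
single_axes = g.ax  # the one Axes inside the grid when there is no faceting
```

This matters later for customization and saving: Axes methods such as set_title() apply to ax, while grid-level helpers apply to g.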

5 Customizing Plots with hue, style, size, alpha, palette

Seaborn allows rich customization of plots using visual encodings. These help communicate more variables through the plot.

5.1 hue (color by category)

sns.scatterplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species"
)

5.2 hue + style (Shape)

sns.scatterplot(
    data=penguins,
    x="bill_length_mm",
    y="flipper_length_mm",
    hue="species",
    style="sex"
)

5.3 size (continuous variable)

Visualizing body mass differences:

sns.scatterplot(
    data=penguins,
    x="bill_depth_mm",
    y="bill_length_mm",
    hue="species",
    size="body_mass_g",
    sizes=(20, 200)
)
plt.title("Adding SIZE to Encode Body Mass")
plt.show()

5.4 alpha (Transparency)

Useful for overlapping points:

sns.scatterplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    alpha=0.6
)

5.5 Custom color palette

Seaborn has many beautiful palettes.

Built-in palettes

sns.color_palette()
sns.palettes.SEABORN_PALETTES.keys()
dict_keys(['deep', 'deep6', 'muted', 'muted6', 'pastel', 'pastel6', 'bright', 'bright6', 'dark', 'dark6', 'colorblind', 'colorblind6'])

hue + palette

sns.scatterplot(
    data=penguins,
    x="bill_length_mm",
    y="body_mass_g",
    hue="species",
    palette="viridis"
)

Other nice palettes:

  • "deep"
  • "muted"
  • "bright"
  • "dark"
  • "colorblind"
  • "pastel"
  • "mako"
  • "rocket"
  • "icefire"
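Of the names above, "deep", "muted", "bright", "dark", "colorblind" and "pastel" are qualitative palettes suited to categories, whereas "viridis", "mako" and "rocket" are sequential and "icefire" is diverging, so they tend to suit numeric variables better. As a sketch, color_palette() can also return a fixed number of colors that are then assigned to categories explicitly. The data values here are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up values, only to keep the example self-contained.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "bill_length_mm": [40.4, 37.3, 51.1, 46.6, 49.5, 45.6],
    "body_mass_g": [4040.0, 3370.0, 3940.0, 3530.0, 5480.0, 4680.0],
})

colors = sns.color_palette("colorblind", 3)  # three RGB tuples
hex_codes = colors.as_hex()                  # the same colors as hex strings

# Pin each species to a specific color with a dict.
species_colors = dict(zip(["Adelie", "Chinstrap", "Gentoo"], colors))

ax = sns.scatterplot(
    data=df,
    x="bill_length_mm",
    y="body_mass_g",
    hue="species",
    palette=species_colors,
)
```

A dict palette keeps the species colors consistent across several plots, even if one plot shows only a subset of species.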

5.6 hue + style + size + alpha + palette

This combines everything into one plot:

sns.scatterplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    style="sex",
    size="body_mass_g",
    sizes=(30, 200),
    alpha=0.7,
    palette="dark"
)
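With this many encodings the legend becomes long and can cover the points. One possible remedy, sketched below on made-up data, is move_legend(), which assumes Seaborn 0.11.2 or later; the anchor coordinates are an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up values, only to keep the example self-contained.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "sex": ["male", "female", "male", "female", "male", "female"],
    "flipper_length_mm": [190.0, 185.0, 200.0, 192.0, 221.0, 213.0],
    "bill_length_mm": [40.4, 37.3, 51.1, 46.6, 49.5, 45.6],
    "body_mass_g": [4040.0, 3370.0, 3940.0, 3530.0, 5480.0, 4680.0],
})

ax = sns.scatterplot(
    data=df,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    style="sex",
    size="body_mass_g",
    sizes=(30, 200),
    alpha=0.7,
    palette="dark",
)

# Place the legend just outside the right edge of the plotting area.
sns.move_legend(ax, "upper left", bbox_to_anchor=(1.02, 1))
legend = ax.get_legend()
```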

6 Distributional Plots - displot

  • Purpose: Understand the distribution of one or two variables.
  • Common high-level function:
    • displot() → wrapper for histograms, KDEs, ECDFs
  • Individual functions:
    • histplot()
    • kdeplot()
    • ecdfplot()

6.1 histplot

sns.histplot(
    data=penguins,
    x="bill_length_mm",
    kde=True
)

6.2 Kernel Density Estimate - kdeplot

sns.kdeplot(
    data=penguins,
    x="body_mass_g",
    fill=True
)

6.3 High-level displot

sns.displot(
    data=penguins,
    x="bill_length_mm",
    hue="species",
    kind="hist",
    kde=True
)

7 Categorical Plots - catplot

  • Purpose: Compare numeric values across categories.
  • Common high-level function:
    • catplot() → wrapper for 8 categorical plot types
  • Individual functions:
    • boxplot()
    • violinplot()
    • stripplot()
    • swarmplot()
    • barplot()
    • countplot()

7.1 boxplot

sns.boxplot(
    data=penguins,
    x="species",
    y="body_mass_g"
)

7.2 violinplot

sns.violinplot(
    data=penguins,
    x="species",
    y="flipper_length_mm"
)

7.3 barplot (Aggregate)

By default, barplot() shows the mean of each group with a 95% confidence interval (estimated by bootstrapping), which is very useful for statistics.

Average body mass by species:

sns.barplot(
    data=penguins,
    x="species",
    y="body_mass_g"
)

7.4 countplot

For categorical frequencies:

sns.countplot(data=penguins, x="island")
plt.title("Count of Penguins by Island")

7.5 High-level catplot

sns.catplot(
    data=penguins,
    x="species",
    y="body_mass_g",
    kind="box"
)

8 Regression Plots

  • lmplot: Plot data and regression model fits across a FacetGrid.
  • regplot: Plot data and a linear regression model fit.

8.1 lmplot

Roughly equivalent to ggplot2's geom_smooth(method="lm"): a scatter plot with a fitted linear regression line and its confidence band.

sns.lmplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm"
)

sns.lmplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species"
)
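regplot() is the axes-level counterpart and is not shown above. Seaborn draws the fitted line but does not return its coefficients, so the sketch below also estimates them separately with numpy.polyfit. The data are made up (a line with slope 0.25 plus small alternating noise) purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: bill length grows linearly with flipper length, plus small noise.
flipper = np.arange(180.0, 190.0)
noise = np.where(np.arange(10) % 2 == 0, 0.1, -0.1)
df = pd.DataFrame({
    "flipper_length_mm": flipper,
    "bill_length_mm": 0.25 * flipper - 5.0 + noise,
})

# Scatter plot, fitted line, and a 95% confidence band on a single Axes.
ax = sns.regplot(data=df, x="flipper_length_mm", y="bill_length_mm", ci=95)

# Estimate slope and intercept separately with an ordinary least-squares fit.
slope, intercept = np.polyfit(df["flipper_length_mm"], df["bill_length_mm"], deg=1)
```

For a full regression summary (standard errors, p-values), a dedicated statistics package is the usual route; the plot is meant as a visual check.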

9 Matrix Plots

  • heatmap: Plot rectangular data as a color-encoded matrix.
  • clustermap: Plot a matrix dataset as a hierarchically-clustered heatmap.

9.1 heatmap

Correlation between numeric variables:

corr = penguins.select_dtypes("number").corr()

sns.heatmap(
    corr,
    annot=True,
    cmap="coolwarm"
)
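A correlation matrix is symmetric, so half of the cells repeat the other half. One common refinement, sketched here on made-up values, is to mask the upper triangle and fix the color scale to the range from -1 to 1 so that zero sits at the middle of the colormap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up values, only to keep the example self-contained.
df = pd.DataFrame({
    "flipper_length_mm": [190.0, 185.0, 200.0, 192.0, 221.0, 213.0],
    "bill_length_mm": [40.4, 37.3, 51.1, 46.6, 49.5, 45.6],
    "body_mass_g": [4040.0, 3370.0, 3940.0, 3530.0, 5480.0, 4680.0],
})

corr = df.select_dtypes("number").corr()

# True marks cells to hide: everything above the diagonal.
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

ax = sns.heatmap(
    corr,
    mask=mask,
    annot=True,
    fmt=".2f",
    cmap="coolwarm",
    vmin=-1,
    vmax=1,
    center=0,
    square=True,
)
```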

10 Multi-plot grids

10.1 Faceting with col (Multiple Panels)

The col argument creates separate panels for each category, which helps compare patterns across groups.

sns.relplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    col="sex"
)

10.2 FacetGrid

Faceting by species and sex:

g = sns.FacetGrid(
    penguins,
    col="species",
    row="sex"
)
g.map_dataframe(
    sns.scatterplot,
    x="flipper_length_mm",
    y="bill_length_mm"
)

This is the equivalent of ggplot2’s facet_grid().
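The grid object has its own helpers for labels and panel titles. A short sketch on made-up data follows; the label text and title templates are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up values: one observation per species/sex combination.
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "sex": ["male", "female", "male", "female", "male", "female"],
    "flipper_length_mm": [190.0, 185.0, 200.0, 192.0, 221.0, 213.0],
    "bill_length_mm": [40.4, 37.3, 51.1, 46.6, 49.5, 45.6],
})

g = sns.FacetGrid(df, col="species", row="sex", margin_titles=True)
g.map_dataframe(sns.scatterplot, x="flipper_length_mm", y="bill_length_mm")

# Grid-level helpers set labels and titles for all panels at once.
g.set_axis_labels("Flipper length (mm)", "Bill length (mm)")
g.set_titles(col_template="{col_name}", row_template="{row_name}")

grid_shape = g.axes.shape  # (number of rows, number of columns)
```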

10.3 pairplot

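For multivariate exploration, pairplot() draws pairwise scatter plots of all numeric columns, with each variable's distribution on the diagonal: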

sns.pairplot(penguins)

You can also color by category:

sns.pairplot(penguins, hue="species")

10.4 jointplot

Combines a bivariate plot (a scatter plot by default) with the marginal distribution of each variable.

sns.jointplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    kind="reg"
)

Kinds include: scatter, kde, hist, hex, reg, resid. The hue argument is supported only for scatter, kde, and hist.

sns.jointplot(
    data=penguins,
    x="flipper_length_mm",
    y="bill_length_mm",
    hue="species",
    kind="hist"
)

11 Customizing with Matplotlib

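Axes-level Seaborn functions such as scatterplot() return a Matplotlib Axes object, so customization is simple. (Figure-level functions such as relplot() return a FacetGrid instead.)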

ax = sns.scatterplot(
    data=penguins,
    x="body_mass_g",
    y="bill_length_mm",
    hue="species"
)

ax.set_title("Bill Length vs Body Mass")
ax.set_xlabel("Body Mass (g)")
ax.set_ylabel("Bill Length (mm)")
plt.show()

12 Saving Figures

ax = sns.histplot(
    data=penguins,
    x="body_mass_g"
)

# histplot() returns an Axes; save through its parent Figure
ax.figure.savefig("penguins_hist.png", dpi=300, bbox_inches="tight")
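Figure-level functions return a grid object with its own savefig() method. The sketch below uses made-up values and writes into a temporary directory; the file name is illustrative.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up values, only to keep the example self-contained.
df = pd.DataFrame({
    "sex": ["male", "female", "male", "female", "male", "female"],
    "flipper_length_mm": [190.0, 185.0, 200.0, 192.0, 221.0, 213.0],
    "bill_length_mm": [40.4, 37.3, 51.1, 46.6, 49.5, 45.6],
})

out_dir = tempfile.mkdtemp()  # throwaway folder for the example output

# relplot() returns a FacetGrid; saving the grid writes the whole figure.
g = sns.relplot(data=df, x="flipper_length_mm", y="bill_length_mm", col="sex")
grid_path = os.path.join(out_dir, "penguins_relplot.png")
g.savefig(grid_path, dpi=300)
```

Saving through the returned object avoids depending on which figure Matplotlib currently treats as active.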

13 Summary

  • Seaborn is a high-level statistical visualization library
  • Built on top of Matplotlib
  • Easy DataFrame integration
  • Beautiful defaults and themes
  • Excellent for exploratory and statistical graphics
  • Faceting, regression plots, heatmaps, and distribution plots all take only a few lines of code

Matplotlib is still needed for fine control, but as a rule of thumb Seaborn simplifies the large majority (roughly 80–90%) of everyday plotting tasks.