Code
library(reticulate)
py_require(c('celluloid','seaborn','IPython'))

1 Introduction to Seaborn

Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics. This document demonstrates how to create various plots using Seaborn, customize their appearance, and save them.

Code
import seaborn as sns
print(sns.__version__)
0.13.2
Code
# Import necessary libraries
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib

# Load an example dataset
tips = sns.load_dataset("tips")
tips.head()
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

2 Scatter Plot

A scatter plot is used to display the relationship between two continuous variables. Each point on the plot represents an observation in the dataset.

Code
sns.scatterplot(data=tips, x='tip', y='total_bill')

2.1 Color by Group

You can color the points in a scatter plot based on a categorical variable to visualize relationships within different groups.

Code
sns.scatterplot(data=tips, x='tip', y='total_bill', hue='sex')

2.2 Size by Group

Similarly, the size of the points can be varied based on a numerical or categorical variable.

Code
sns.scatterplot(data=tips, x='tip', y='total_bill', size='size')
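
The color and size encodings from the two subsections above can be combined in one call. The following is a minimal sketch, assuming a small hand-made table (`demo`, a hypothetical stand-in for `tips`) so that it runs without downloading anything; the `sizes` range is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical miniature of the tips data, so the sketch needs no download
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male"],
    "size": [2, 3, 3, 2, 4, 4],
})

fig, ax = plt.subplots()
sns.scatterplot(
    data=demo, x="tip", y="total_bill",
    hue="sex",         # color encodes the categorical group
    size="size",       # marker size encodes the party size
    sizes=(30, 200),   # smallest and largest marker size used for the mapping
    ax=ax,
)
```

Passing `ax=` keeps the plot on an explicitly created axes, which makes it easier to adjust titles or labels afterwards.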

3 Line Plot

A line plot is ideal for visualizing the trend of a continuous variable over a continuous interval or time.

Code
dowjones = sns.load_dataset("dowjones")
dowjones.head()
        Date  Price
0 1914-12-01  55.00
1 1915-01-01  56.55
2 1915-02-01  56.00
3 1915-03-01  58.30
4 1915-04-01  66.45
Code
sns.lineplot(data=dowjones, x='Date', y='Price')

3.1 Line Plot with Dots

You can also add markers to the line plot to highlight the data points.

Code
sns.lineplot(data=dowjones, x='Date', y='Price', marker='o')

3.2 Color by Group

Different lines can be plotted for different categories to compare trends.

Code
import random

# Create datasets for comparison
dowjones2 = dowjones.copy()
dowjones2['type'] = 'old'

dowjones3 = dowjones.copy()
# Shift every price by one shared random offset (the value differs on each run)
dowjones3['Price'] = dowjones3['Price'] + random.random() * 200
dowjones3['type'] = 'new'

dowjones4 = pd.concat([dowjones2, dowjones3], ignore_index=True)
dowjones4 = dowjones4.sort_values('Date').reset_index(drop=True)
Code
dowjones4.head()
        Date       Price type
0 1914-12-01   55.000000  old
1 1914-12-01  148.624455  new
2 1915-01-01  150.174455  new
3 1915-01-01   56.550000  old
4 1915-02-01   56.000000  old
Code
sns.lineplot(data=dowjones4, x='Date', y='Price', hue='type')

4 Histogram

A histogram is used to represent the distribution of a single numerical variable.

Code
sns.histplot(data=tips, x='tip')

4.1 Color by Group

Histograms can be grouped by a categorical variable to compare distributions.

Code
sns.histplot(data=tips, x='tip', hue='sex', multiple="dodge")

5 Bar Chart

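A bar chart represents categorical data with rectangular bars whose lengths are proportional to the values they represent. By default, sns.barplot aggregates the observations in each category with the mean, so each bar below shows the average tip for that group; errorbar=None hides the confidence-interval bars that would otherwise be drawn.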

Code
sns.barplot(data=tips, x='sex', y='tip', errorbar=None)

5.1 Show Number on Bars

You can display the value of each bar directly on the plot.

Code
ax = sns.barplot(data=tips, x='sex', y='tip', errorbar=None)

# Label every bar with its height, formatted to two decimals
for container in ax.containers:
    ax.bar_label(container, fmt='%.2f')

5.2 Horizontal Bar Plot

Bar charts can also be plotted horizontally.

Code
ax = sns.barplot(data=tips, y='sex', x='tip', errorbar=None, orient='h')
plt.show()

6 Box Plot

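A box plot summarizes a distribution through its quartiles: the box spans the first to the third quartile, and the line inside it marks the median. With Seaborn's default settings the whiskers extend to the most extreme observations within 1.5 times the interquartile range of the box, and points beyond that are drawn individually as outliers, so the whisker ends equal the minimum and maximum only when there are no outliers.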

Code
sns.boxplot(data=tips, x='day', y='tip')
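
The quantities a box plot draws can be computed by hand. The sketch below does so with NumPy on a small invented sample (`tips_sample` is hypothetical), assuming the default whisker rule of 1.5 times the interquartile range.

```python
import numpy as np

# Hypothetical sample with one unusually large value
tips_sample = np.array([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0])

# Box: first quartile, median, third quartile
q1, median, q3 = np.percentile(tips_sample, [25, 50, 75])
iqr = q3 - q1

# Fences: observations outside this range are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme observations that are still inside the fences
inside = tips_sample[(tips_sample >= lower_fence) & (tips_sample <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = tips_sample[(tips_sample < lower_fence) | (tips_sample > upper_fence)]

print(q1, median, q3)             # 1.875 2.75 3.625
print(whisker_low, whisker_high)  # 1.0 4.0
print(outliers)                   # [10.]
```

Here the upper whisker stops at 4.0 rather than at the maximum of 10.0, which would appear as a separate outlier point.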

6.1 Color by Group

Box plots can be grouped by a categorical variable to compare the distributions.

Code
sns.boxplot(data=tips, x='day', y='tip', hue='sex')

7 Strip Plot

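A strip plot is a scatter plot where one of the variables is categorical. By default Seaborn adds a small random jitter along the categorical axis so that overlapping points stay visible, which makes the plot useful for showing every observation in each category.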

Code
sns.stripplot(data=tips, x='day', y='tip')

7.1 Color by Group

Strip plots can also be grouped by a categorical variable.

Code
sns.stripplot(data=tips, x='day', y='tip', hue='sex', dodge=True)

8 Joint Plot

A joint plot shows the relationship between two variables along with their individual distributions.

Code
sns.jointplot(data=tips, x='total_bill', y='tip', kind='reg')
<seaborn.axisgrid.JointGrid object at 0x1118ed110>

9 Facet Plot

Facet plots split the data into subsets and draw one panel per subset, so the same relationship can be compared across groups.

Code
g = sns.FacetGrid(data=tips, col="day", hue="sex")
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")

Code
g.add_legend()

9.1 Two Plots per Row

Setting col_wrap=2 starts a new row after every two facets, which keeps the grid compact when the faceting variable has many levels.

Code
g = sns.FacetGrid(data=tips, col="day", col_wrap=2, hue="sex")
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")

Code
g.add_legend()
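
The `FacetGrid` plus `map_dataframe` pattern above can also be written with the figure-level function `sns.relplot`, which builds the grid, draws the scatter plots, and adds the legend in one call. The following is a sketch on a small invented table (`demo` is hypothetical); `height=3` is an arbitrary panel size.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data with four days, two rows each
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male", "Male", "Male"],
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
})

# One panel per day, wrapped after two panels, colored by sex
g = sns.relplot(
    data=demo, x="total_bill", y="tip",
    col="day", col_wrap=2, hue="sex",
    kind="scatter", height=3,
)
```

`relplot` returns a `FacetGrid`, so methods such as `g.set_axis_labels(...)` remain available for further customization.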

10 Subplots

You can create a figure with multiple subplots to display several plots at once.

Code
fig, axes = plt.subplots(1, 2)

sns.boxplot(data=tips, x='day', y='tip', hue='sex', ax=axes[0])
sns.boxplot(data=tips, x='day', y='tip', ax=axes[1])

11 Displaying Chinese Characters on macOS

To display Chinese characters correctly in plots on macOS, you need to set the font family to one that supports them.

Code
# Use a font that contains Chinese glyphs
plt.rcParams['font.family'] = ['Arial Unicode MS'] # To display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False # To display the minus sign correctly
sns.set_style('whitegrid', {'font.sans-serif': ['Arial Unicode MS', 'Arial']})
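
The settings above only work if the named font is installed. A sketch of one way to check first: the helper `first_available_font` and the candidate list are hypothetical, and the candidate names are assumptions about fonts that commonly include Chinese glyphs, not a guaranteed list.

```python
import matplotlib.pyplot as plt
from matplotlib import font_manager


def first_available_font(candidates):
    """Return the first candidate font name known to Matplotlib, else None."""
    installed = {font.name for font in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


# Assumed candidates; adjust to the fonts present on your system
cjk_candidates = ["Arial Unicode MS", "PingFang SC", "Heiti TC", "Noto Sans CJK SC"]
font_name = first_available_font(cjk_candidates)

if font_name is not None:
    plt.rcParams["font.family"] = [font_name]
    plt.rcParams["axes.unicode_minus"] = False  # keep the minus sign renderable
print(font_name)
```

If the helper prints `None`, none of the candidates is installed and Chinese labels would still render as empty boxes.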

12 Customizing Title, Size, and Axis Labels

12.1 Add Title

You can add a title to your plot to describe its content.

Code
df = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", data=df)
ax.set_title("Tips Box Plot")

12.2 Adjust Size

The size of the plot can be adjusted to fit your needs.

Code
plt.clf()
plt.figure(figsize=(10, 6))
ax = sns.boxplot(x="day", y="total_bill", data=df)
ax.set_title("Tips Box Plot")
plt.show()

12.3 Change Axis Labels

You can customize the labels for the x and y axes.

Code
ax = sns.boxplot(x="day", y="total_bill", data=df)
ax.set_title("Tips Box Plot")
ax.set(xlabel='X-axis Label', ylabel='Y-axis Label')

13 Applying Themes

Seaborn comes with several built-in themes to style your plots.
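
The built-in style names are `darkgrid`, `whitegrid`, `dark`, `white`, and `ticks`. The sketch below shows two ways of applying them, assuming a small invented table (`demo` is hypothetical): `sns.set_theme` changes the defaults for every later figure, while `sns.axes_style` used as a context manager affects only the figure created inside the block.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data so the sketch needs no download
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "tip": [1.0, 2.0, 1.5, 3.0, 2.5, 4.0],
})

# Global: every later figure picks up this style and palette
sns.set_theme(style="whitegrid", palette="muted")
grid_enabled_globally = plt.rcParams["axes.grid"]

# Local: the "dark" style applies only inside the with-block
with sns.axes_style("dark"):
    fig, ax = plt.subplots()
    sns.boxplot(data=demo, x="day", y="tip", ax=ax)
    ax.set_title("Temporary dark style")

# After the block, the global whitegrid settings are back in effect
grid_enabled_after_block = plt.rcParams["axes.grid"]
```

The context-manager form is convenient when a single figure should differ from the rest of a document.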

14 Saving a Plot

You can save your plot to a file in various formats.

Code
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset("tips")
plt.clf()
plt.style.use('default')
sns.boxplot(x="day", y="total_bill", data=df)
plt.savefig("output.png", dpi=100, bbox_inches="tight")
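
The output format follows from the file extension. The sketch below writes the same figure as PNG, PDF, and SVG into a temporary directory; the data (`demo`), the file name `boxplot`, and the `dpi` value are illustrative assumptions.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in data so the sketch needs no download
demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.boxplot(data=demo, x="day", y="total_bill", ax=ax)

out_dir = Path(tempfile.mkdtemp())
saved = []
for suffix in [".png", ".pdf", ".svg"]:
    path = out_dir / f"boxplot{suffix}"
    # dpi matters for raster output such as PNG; PDF and SVG are vector formats
    fig.savefig(path, dpi=300, bbox_inches="tight")
    saved.append(path)

plt.close(fig)  # release the figure once it is written
print([p.name for p in saved])
```

Calling `savefig` on the figure object rather than through `plt` avoids saving the wrong figure when several are open.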

15 Animated Plot

You can create animated plots to show changes over time or another variable.

Code
from celluloid import Camera
from matplotlib import pyplot as plt

fig = plt.figure()
camera = Camera(fig)

a = sns.lineplot(data=dowjones4, x='Date', y='Price', hue='type')
hands, labs = a.get_legend_handles_labels()

new_data = dowjones4.sample(50, random_state=42)
new_data = new_data.sort_values(by=['Date'], ascending=True)

for i in new_data["Date"]:
    data = dowjones4.query('Date <= @i')
    sns.lineplot(data=data, x='Date', y='Price', hue='type')
    plt.legend(handles=hands, labels=labs)
    camera.snap()

animation = camera.animate()
Code
from IPython.display import HTML
HTML(animation.to_html5_video())
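
As an alternative to celluloid, Matplotlib's own `FuncAnimation` can produce a similar growing-line animation. This is a different technique from the one used above, sketched here on invented numeric data (`steps` and `prices` are hypothetical) rather than on `dowjones4`.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Hypothetical series: 24 time steps with a steadily rising price
steps = np.arange(24)
prices = 100 + steps * 2.5

fig, ax = plt.subplots()
(line,) = ax.plot([], [])  # empty line that each frame will extend
ax.set_xlim(steps.min(), steps.max())
ax.set_ylim(prices.min(), prices.max())
ax.set_xlabel("Step")
ax.set_ylabel("Price")


def update(frame):
    """Show the series up to and including the given frame index."""
    line.set_data(steps[: frame + 1], prices[: frame + 1])
    return (line,)


anim = FuncAnimation(fig, update, frames=len(steps), interval=100, blit=True)
```

In a notebook, the result can be displayed in the same way as above, for example with `HTML(anim.to_html5_video())`, which requires ffmpeg to be installed.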
