---
title: "Seaborn Chart"
execute:
warning: false
error: false
format:
html:
toc: true
toc-location: right
code-fold: show
code-tools: true
number-sections: true
code-block-bg: true
code-block-border-left: "#31BAE9"
---

```{r}
library(reticulate)
py_require(c('celluloid','seaborn','IPython'))
```
# Introduction to Seaborn
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics. This document demonstrates how to create various plots using Seaborn, customize their appearance, and save them.
```{python}
import seaborn as sns
print(sns.__version__)
```
```{python}
# Import necessary libraries
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
# Load an example dataset
tips = sns.load_dataset("tips")
tips.head()
```
# Scatter Plot
A scatter plot is used to display the relationship between two continuous variables. Each point on the plot represents an observation in the dataset.
```{python}
sns.scatterplot(data=tips, x='tip', y='total_bill')
```
## Color by Group
You can color the points in a scatter plot based on a categorical variable to visualize relationships within different groups.
```{python}
sns.scatterplot(data=tips, x='tip', y='total_bill', hue='sex')
```
## Size by Group
Similarly, the size of the points can be varied based on a numerical or categorical variable.
```{python}
sns.scatterplot(data=tips, x='tip', y='total_bill', size='size')
```
# Line Plot
A line plot is ideal for visualizing the trend of a continuous variable over a continuous interval or time.
```{python}
dowjones = sns.load_dataset("dowjones")
dowjones.head()
```
```{python}
sns.lineplot(data=dowjones, x='Date', y='Price')
```
## Line Plot with Dots
You can also add markers to the line plot to highlight the data points.
```{python}
sns.lineplot(data=dowjones, x='Date', y='Price', marker='o')
```
## Color by Group
Different lines can be plotted for different categories to compare trends.
```{python}
#| code-fold: true
import random
# Create datasets for comparison
dowjones2 = dowjones.copy()
dowjones2['type'] = 'old'
dowjones3 = dowjones.copy()
# Shift every price by the same random offset so the two series are easy to tell apart
dowjones3['Price'] = dowjones3['Price'] + random.random() * 200
dowjones3['type'] = 'new'
dowjones4 = pd.concat([dowjones2, dowjones3], ignore_index=True)
dowjones4 = dowjones4.sort_values('Date').reset_index(drop=True)
```
```{python}
dowjones4.head()
```
```{python}
sns.lineplot(data=dowjones4, x='Date', y='Price', hue='type')
```
# Histogram
A histogram is used to represent the distribution of a single numerical variable.
```{python}
sns.histplot(data=tips, x='tip')
```
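To see what the bars of a histogram encode, the binning can be reproduced by hand. The sketch below uses `numpy.histogram` on a small made-up list of tip amounts. The values and the choice of four bins are assumptions for illustration and are not taken from the `tips` dataset.

```python
import numpy as np

# Hypothetical tip amounts, not taken from the tips dataset
tips_sample = np.array([1.0, 1.5, 2.0, 2.0, 2.5, 3.0, 3.5, 5.0])

# Split the value range into 4 equal-width bins and count observations per bin
counts, edges = np.histogram(tips_sample, bins=4)

# Each bin is half-open, except the last one, which also includes its right edge
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

Each count corresponds to the height of one bar; `sns.histplot` picks the number of bins automatically unless you pass `bins`.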
## Color by Group
Histograms can be grouped by a categorical variable to compare distributions.
```{python}
sns.histplot(data=tips, x='tip', hue='sex', multiple="dodge")
```
# Bar Chart
A bar chart represents categorical data with rectangular bars. The lengths of the bars are proportional to the values they represent.
```{python}
sns.barplot(data=tips, x='sex', y='tip', errorbar=None)
```
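By default `sns.barplot` aggregates the `y` values within each category using the mean, so the bar heights can be checked with a plain pandas group-by. The following is a minimal sketch on a hypothetical five-row frame (`df_small` and its values are made up for illustration).

```python
import pandas as pd

# Hypothetical miniature version of the tips data
df_small = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Female", "Male"],
    "tip": [1.0, 2.0, 4.0, 3.0, 3.0],
})

# Mean tip per group: the heights barplot would draw with its default estimator
bar_heights = df_small.groupby("sex")["tip"].mean()
print(bar_heights)
```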
## Show Number on Bars
You can display the value of each bar directly on the plot.
```{python}
ax = sns.barplot(data=tips, x='sex', y='tip', errorbar=None)
for container in ax.containers:
    # Label every bar in the container with its height
    ax.bar_label(container)
```
## Horizontal Bar Plot
Bar charts can also be plotted horizontally.
```{python}
ax = sns.barplot(data=tips, y='sex', x='tip', errorbar=None, orient='h')
plt.show()
```
# Box Plot
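A box plot summarizes a distribution through its quartiles: the box spans the first to the third quartile, with a line at the median. In Seaborn, the whiskers by default extend to the most extreme observations within 1.5 times the interquartile range of the box, and anything beyond is drawn as an individual outlier point. The whiskers therefore mark the minimum and maximum only when there are no outliers.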
```{python}
sns.boxplot(data=tips, x='day', y='tip')
```
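The quantities a box plot draws can be computed directly. The sketch below uses NumPy on a hypothetical set of values (made up for illustration, with one deliberately extreme point) and applies the common 1.5 × IQR rule to flag outliers.

```python
import numpy as np

# Hypothetical tip amounts; 10.0 is deliberately extreme
values = np.array([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 10.0])

# Quartiles define the box and the median line
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Limits for the whiskers under the 1.5 * IQR rule
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Observations outside the limits are drawn as individual outlier points
outliers = values[(values < lower_limit) | (values > upper_limit)]
print(values.min(), q1, median, q3, values.max())
print(outliers)
```

Here the upper whisker would stop at 4.0, the largest value inside the limit, rather than at the maximum.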
## Color by Group
Box plots can be grouped by a categorical variable to compare the distributions.
```{python}
sns.boxplot(data=tips, x='day', y='tip', hue='sex')
```
# Strip Plot
A strip plot is a scatter plot where one of the variables is categorical. It is useful for visualizing the distribution of data points.
```{python}
sns.stripplot(data=tips, x='day', y='tip')
```
## Color by Group
Strip plots can also be grouped by a categorical variable.
```{python}
sns.stripplot(data=tips, x='day', y='tip', hue='sex', dodge=True)
```
# Joint Plot
A joint plot shows the relationship between two variables along with their individual distributions.
```{python}
sns.jointplot(data=tips, x='total_bill', y='tip', kind='reg')
```
# Facet Plot
Facet plots allow you to create multiple plots based on the subsets of your data.
```{python}
g = sns.FacetGrid(data=tips, col="day", hue="sex")
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.add_legend()
```
## Two Plots per Row
Set `col_wrap` to start a new row of facets after a fixed number of columns. With `col_wrap=2`, each row holds two plots.
```{python}
g = sns.FacetGrid(data=tips, col="day", col_wrap=2, hue="sex")
g.map_dataframe(sns.scatterplot, x="total_bill", y="tip")
g.add_legend()
```
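The resulting grid shape follows from simple arithmetic: the number of columns equals `col_wrap`, and the number of rows is the facet count divided by `col_wrap`, rounded up. The helper below is a hypothetical illustration of that arithmetic, not part of Seaborn's API.

```python
import math

def facet_grid_shape(n_facets, col_wrap):
    """Return (rows, cols) for n_facets panels wrapped at col_wrap columns."""
    rows = math.ceil(n_facets / col_wrap)
    return rows, col_wrap

# The tips dataset has four distinct days, so col_wrap=2 gives a 2 x 2 grid
print(facet_grid_shape(4, 2))
```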
# Subplots
You can create a figure with multiple subplots to display several plots at once.
```{python}
fig, axes = plt.subplots(1, 2)
sns.boxplot(data=tips, x='day', y='tip', hue='sex', ax=axes[0])
sns.boxplot(data=tips, x='day', y='tip', ax=axes[1])
```
# Displaying Chinese Characters on macOS
To display Chinese characters correctly in plots on macOS, you need to set the font family to one that supports them.
```{python}
# Select a font that includes Chinese glyphs so the labels render correctly
plt.rcParams['font.family'] = ['Arial Unicode MS']
# Keep the minus sign rendering correctly with this font
plt.rcParams['axes.unicode_minus'] = False
sns.set_style('whitegrid', {'font.sans-serif': ['Arial Unicode MS', 'Arial']})
```
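If the chosen font is not installed, Matplotlib falls back to another font and the characters may show up as empty boxes. One way to guard against this is to check which fonts Matplotlib has registered before setting `font.family`. The sketch below does that with `matplotlib.font_manager`. The helper `first_available_font` and the candidate names are assumptions for illustration, so adjust them to the fonts on your system.

```python
from matplotlib import font_manager

def first_available_font(candidates):
    """Return the first candidate font name that Matplotlib has registered, else None."""
    installed = {font.name for font in font_manager.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None

# Candidate names are assumptions; replace them with fonts you expect to have
chosen = first_available_font(["Arial Unicode MS", "PingFang SC", "Heiti TC"])
print(chosen)
```

If `chosen` is not `None`, it can be assigned to `plt.rcParams['font.family']` in place of the hard-coded name.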
# Customizing Title, Size, and Axis Labels
## Add Title
You can add a title to your plot to describe its content.
```{python}
df = sns.load_dataset("tips")
ax = sns.boxplot(x="day", y="total_bill", data=df)
ax.set_title("Tips Box Plot")
```
## Adjust Size
The size of the plot can be adjusted to fit your needs.
```{python}
plt.clf()
plt.figure(figsize=(10, 6))
ax = sns.boxplot(x="day", y="total_bill", data=df)
ax.set_title("Tips Box Plot")
plt.show()
```
## Change Axis Labels
You can customize the labels for the x and y axes.
```{python}
ax = sns.boxplot(x="day", y="total_bill", data=df)
ax.set_title("Tips Box Plot")
ax.set(xlabel='X-axis Label', ylabel='Y-axis Label')
```
# Applying Themes
Seaborn comes with several built-in themes to style your plots.
::: {.panel-tabset .nav-pills}
## Darkgrid Theme
The "darkgrid" theme is the style that `sns.set_theme()` applies by default. It features a gray background with white grid lines.
```{python}
import seaborn as sns
df = sns.load_dataset("tips")
sns.set_theme()
# For the style alone, this has the same effect
# (set_theme() also sets the default palette and context):
# sns.set_style("darkgrid")
sns.boxplot(x="day", y="total_bill", data=df)
```
## Whitegrid Theme
The "whitegrid" theme has a white background with gray grid lines.
```{python}
import seaborn as sns
df = sns.load_dataset("tips")
sns.set_style("whitegrid")
sns.boxplot(x="day", y="total_bill", data=df)
```
## Dark Theme
The "dark" theme is similar to "darkgrid" but without the grid lines.
```{python}
import seaborn as sns
df = sns.load_dataset("tips")
sns.set_style("dark")
sns.boxplot(x="day", y="total_bill", data=df)
```
## White Theme
The "white" theme is similar to "whitegrid" but without the grid lines.
```{python}
import seaborn as sns
df = sns.load_dataset("tips")
sns.set_style("white")
sns.boxplot(x="day", y="total_bill", data=df)
```
## Ticks Theme
The "ticks" theme is like the "white" theme but adds ticks to the axes.
```{python}
import seaborn as sns
df = sns.load_dataset("tips")
sns.set_style("ticks")
sns.boxplot(x="day", y="total_bill", data=df)
```
## fivethirtyeight Theme
This theme mimics the style of the FiveThirtyEight website.
```{python}
plt.clf()
plt.style.use('fivethirtyeight')
sns.boxplot(x="day", y="total_bill", data=df)
plt.show()
```
## ggplot Theme
This theme emulates the popular `ggplot2` library in R.
```{python}
plt.clf()
plt.style.use('ggplot')
sns.boxplot(x="day", y="total_bill", data=df)
plt.show()
```
## tableau-colorblind10 Theme
This theme uses a color palette that is friendly to colorblind viewers.
```{python}
plt.clf()
plt.style.use('tableau-colorblind10')
sns.boxplot(x="day", y="total_bill", data=df)
plt.show()
```
## dark_background Theme
This theme uses a dark background for the plots.
```{python}
plt.clf()
plt.style.use('dark_background')
sns.boxplot(x="day", y="total_bill", data=df)
plt.show()
```
:::
# Saving a Plot
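Call `plt.savefig()` to write the current figure to a file. The output format is inferred from the file extension, for example `.png`, `.pdf`, or `.svg`.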
```{python}
import seaborn as sns
df = sns.load_dataset("tips")
plt.clf()
plt.style.use('default')
sns.boxplot(x="day", y="total_bill", data=df)
plt.savefig("output.png", dpi=100, bbox_inches="tight")
```
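To export the same figure in several formats, you can loop over the extensions and call `savefig` once per file. The sketch below uses plain Matplotlib with made-up bar values and writes to a temporary directory. The file names, values, and the choice of formats are assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["Thur", "Fri"], [2.8, 2.7])  # hypothetical values for illustration

out_dir = tempfile.mkdtemp()
saved = []
for ext in ("png", "svg", "pdf"):
    path = os.path.join(out_dir, f"output.{ext}")
    # The format is inferred from the extension; dpi mainly affects raster output such as PNG
    fig.savefig(path, dpi=100, bbox_inches="tight")
    saved.append(path)

plt.close(fig)
print(saved)
```

Calling `savefig` on the figure object rather than through `plt` makes it explicit which figure is written when several are open.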
# Animated Plot
You can create animated plots to show changes over time or another variable.
```{python}
#| output: false
from celluloid import Camera
from matplotlib import pyplot as plt
fig = plt.figure()
camera = Camera(fig)
a = sns.lineplot(data=dowjones4, x='Date', y='Price', hue='type')
hands, labs = a.get_legend_handles_labels()
new_data = dowjones4.sample(50, random_state=42)
new_data = new_data.sort_values(by=['Date'], ascending=True)
for i in new_data["Date"]:
    # Draw all data up to the current date, then capture the frame
    data = dowjones4.query('Date <= @i')
    sns.lineplot(data=data, x='Date', y='Price', hue='type')
    plt.legend(handles=hands, labels=labs)
    camera.snap()
animation = camera.animate()
```
```{python}
from IPython.display import HTML
# Rendering to HTML5 video requires ffmpeg to be installed
HTML(animation.to_html5_video())
```
# References
- [Seaborn Official Documentation](https://seaborn.pydata.org/index.html)
- [YouTube Tutorial on Seaborn](https://www.youtube.com/watch?v=ooqXQ37XHMM)