---
title: "Clustering in R"
execute:
  warning: false
  error: false
  eval: false
format:
  html:
    toc: true
    toc-location: right
    code-fold: show
    code-tools: true
    number-sections: true
    code-block-bg: true
    code-block-border-left: "#31BAE9"
---
## Introduction
This document demonstrates how to perform clustering in R. Although `tidymodels` is loaded, the models themselves are fitted with base R's `kmeans()` and `hclust()`, and the results are visualized with `factoextra`. Clustering is an unsupervised learning technique that groups similar observations based on their measured features, without using any labels. We will use the `iris` dataset for this demonstration.
## Load Data
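First, we load the necessary libraries and the `iris` dataset. We then drop the `Species` column so that only the four numeric measurements are used for clustering.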
```{r}
#| label: load-data
#| echo: true
library(tidyverse)
library(tidymodels)
library(factoextra)
data(iris)
iris_data <- iris %>% select(-Species)
```
## K-Means Clustering
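K-Means is a popular clustering algorithm that partitions observations into a fixed number of clusters by minimizing the within-cluster sum of squares. We will use it to group the iris data into 3 clusters, setting a seed and using 25 random starts (`nstart = 25`) so that the result is reproducible and less sensitive to initialization.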
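The choice of three clusters matches the three species recorded in `iris`, but it can also be checked from the data. The following is a minimal sketch, not part of the original analysis, of the elbow heuristic: it refits `kmeans()` for several values of `k` and records the total within-cluster sum of squares. The names `k_values` and `wss` and the range 1 to 8 are illustrative assumptions.

```{r}
#| label: kmeans-elbow
#| echo: true
# Illustrative sketch: k_values and wss are hypothetical names, and the
# range 1:8 is an arbitrary choice for demonstration.
data(iris)
iris_data <- iris[, 1:4]  # the four numeric measurements, without Species

set.seed(123)
k_values <- 1:8

# Refit k-means for each k and keep the total within-cluster sum of squares
wss <- sapply(k_values, function(k) {
  kmeans(iris_data, centers = k, nstart = 25)$tot.withinss
})

# Plot the criterion against k; the "elbow" is the point after which
# adding more clusters gives only a small further reduction.
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a visual heuristic rather than a formal test, so it is best read as a sanity check on the chosen number of clusters.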
```{r}
#| label: kmeans
#| echo: true
set.seed(123)
kmeans_model <- kmeans(iris_data, centers = 3, nstart = 25)
# Visualize the clusters
fviz_cluster(kmeans_model, data = iris_data)
```
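Because `iris` also records the species of each flower, the cluster assignments can be compared against it after fitting. The sketch below is an illustrative addition: it inspects the fitted object and cross-tabulates the cluster labels with `iris$Species` using base R's `table()`. The species labels are not used when fitting the model, and `cluster_vs_species` is a hypothetical name.

```{r}
#| label: kmeans-inspect
#| echo: true
# Illustrative sketch: refits the same model so the chunk is self-contained.
data(iris)
iris_data <- iris[, 1:4]  # the four numeric measurements, without Species

set.seed(123)
kmeans_model <- kmeans(iris_data, centers = 3, nstart = 25)

# Cluster sizes and centers (one row per cluster, one column per variable)
kmeans_model$size
kmeans_model$centers

# Share of the total sum of squares accounted for by the between-cluster part
kmeans_model$betweenss / kmeans_model$totss

# Cross-tabulate cluster labels against the known species
# (cluster_vs_species is a hypothetical name)
cluster_vs_species <- table(cluster = kmeans_model$cluster,
                            species = iris$Species)
cluster_vs_species
```

The cluster numbers are arbitrary labels, so only the pattern of the table matters, not which cluster is called 1, 2, or 3.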
## Hierarchical Clustering
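Hierarchical clustering is another common clustering method. Instead of fixing the number of clusters in advance, it builds a tree (dendrogram) by successively merging the closest groups. Here we compute Euclidean distances between observations, apply Ward's linkage (`ward.D2`), and cut the dendrogram into 3 groups.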
```{r}
#| label: hclust
#| echo: true
# Calculate the distance matrix
dist_matrix <- dist(iris_data, method = "euclidean")
# Perform hierarchical clustering
hclust_model <- hclust(dist_matrix, method = "ward.D2")
# Visualize the dendrogram
fviz_dend(hclust_model,
          k = 3,                    # cut the tree into 3 groups
          cex = 0.5,                # label size
          k_colors = c("#2E9FDF", "#00AFBB", "#E7B800"),
          color_labels_by_k = TRUE, # color labels by group
          rect = TRUE)              # draw a rectangle around each group
```
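The dendrogram shows the groups visually, but the group memberships themselves can be extracted with base R's `cutree()`. The sketch below is an illustrative addition that cuts the same tree into 3 groups and cross-tabulates the result against the K-Means assignments. The name `hclust_groups` is hypothetical.

```{r}
#| label: hclust-cut
#| echo: true
# Illustrative sketch: rebuilds both models so the chunk is self-contained.
data(iris)
iris_data <- iris[, 1:4]  # the four numeric measurements, without Species

dist_matrix <- dist(iris_data, method = "euclidean")
hclust_model <- hclust(dist_matrix, method = "ward.D2")

# Cut the tree into 3 groups; the result is an integer vector with one
# group id per observation (hclust_groups is a hypothetical name)
hclust_groups <- cutree(hclust_model, k = 3)
table(hclust_groups)

# Compare with the K-Means assignments from the same data
set.seed(123)
kmeans_model <- kmeans(iris_data, centers = 3, nstart = 25)
table(hclust = hclust_groups, kmeans = kmeans_model$cluster)
```

As with K-Means, the group ids are arbitrary, so agreement between the two methods shows up as one dominant cell per row rather than as matching numbers.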
## Conclusion
This document provided a brief overview of clustering in R. We demonstrated both K-Means and hierarchical clustering on the `iris` dataset, using base R's `kmeans()` and `hclust()` for model fitting and `factoextra` for visualization.