Input & Output in R

Author

Tony Duan

This document provides a comprehensive guide to reading data into R (input) and writing data out of R (output). We will cover a wide variety of common file formats, from traditional text files to modern, high-performance formats.

Code

pak::pak(c('tidyverse','openxlsx','arrow','readxl'))

Code

# Load common packages for I/O operations
library(tidyverse)
library(openxlsx)
library(arrow)
library(readxl)

1. Input: Reading Data into R

1.1. Text Files (CSV, TXT)

Reading CSV Files

Comma-Separated Values (CSV) are one of the most common formats for storing tabular data. The readr package (part of the tidyverse) provides the read_csv() function, which is fast and efficient.

Code

# Create a sample data frame to write to a CSV
sample_df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  score = c(95, 82, 78, 91, 88)
)

write_csv(sample_df, "data/sample_data.csv")

# Read the CSV file back into R
data_from_csv <- read_csv("data/sample_data.csv")
head(data_from_csv)

# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

Reading Delimited Files

For text files that use delimiters other than commas (like tabs or semicolons), you can use read_delim().

Code

# Write a tab-delimited file
write_delim(sample_df, "data/sample_data.txt", delim = "\t")

# Read the tab-delimited file
data_from_txt <- read_delim("data/sample_data.txt", delim = "\t")
head(data_from_txt)

# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.2. Excel Files (.xlsx)

The readxl package is excellent for reading .xls and .xlsx files. The openxlsx package can be used for both reading and writing.

Code

# Write the sample data to an Excel file
write.xlsx(sample_df, "data/sample_data.xlsx")

# Read the data from the Excel file
data_from_excel <- read_excel("data/sample_data.xlsx")
head(data_from_excel)

# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.3. High-Performance Formats (Parquet, Feather)

For large datasets, text-based formats are inefficient. Binary formats like Parquet and Feather are much faster to read and write and take up less disk space.

Reading Parquet Files

Parquet is a columnar storage format optimized for big data processing. The arrow package provides the interface to work with Parquet files.

Code

# Write data to a Parquet file
write_parquet(sample_df, "data/sample_data.parquet")

# Read data from the Parquet file
data_from_parquet <- read_parquet("data/sample_data.parquet")
head(data_from_parquet)

# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

Reading Feather Files

Feather is a fast, lightweight binary format designed for sharing data between R and Python.

Code

# Write data to a Feather file
write_feather(sample_df, "data/sample_data.feather")

# Read data from the Feather file
data_from_feather <- read_feather("data/sample_data.feather")
head(data_from_feather)

# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.4. R-Specific Formats (.Rds, .RData)

R has its own native formats for saving any R object.

.Rds: Saves a single R object.
.RData: Saves multiple R objects and can save the entire workspace.

Reading an .Rds File

Code

# Save a single object (our data frame)
saveRDS(sample_df, "data/sample_df.rds")

# Read the object back
data_from_rds <- readRDS("data/sample_df.rds")
head(data_from_rds)

# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

Loading an .RData File

Code

# Save multiple objects to one file
save(sample_df, mtcars, file = "data/multiple_objects.RData")

# Clear the objects from the environment to show they are loaded
rm(sample_df, mtcars)

# Load the objects back into the workspace
load("data/multiple_objects.RData")

# Check that they exist again
head(sample_df)

# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.5. Reading from a Database

Connecting to databases is covered in detail in a separate guide (6 SQL database.qmd), but here is a quick example using SQLite.

Code

library(DBI)
library(RSQLite)

# Create an in-memory SQLite database and write mtcars to it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars_table", mtcars)

# Read the full table back into an R data frame
mtcars_from_db <- dbReadTable(con, "mtcars_table")
dbDisconnect(con)

head(mtcars_from_db)

   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

1.6 Read .docx (Word) file

Code

library(officer)
# docx = read_docx("data/example03.docx")
# docx_summary(docx)

1.7 Read PDF Tables

Code

library(tabulapdf)
# out <- extract_tables('data/60870-By-the-Numbers.pdf')
# out[[1]]

2. Output: Writing Data from R

Writing data is the reverse of reading. The functions are generally named write_*().

2.1. Write to CSV

Code

# The `write_csv` function from readr is a great choice
write_csv(mtcars, "data/mtcars_output.csv")

2.2. Write to Excel

Code

# The `write.xlsx` function from openxlsx is very convenient
write.xlsx(iris, "data/iris_output.xlsx")

2.3. Write to Parquet

Code

# Use the arrow package for high-performance writing
write_parquet(airquality, "data/airquality_output.parquet")

2.4. Write to Feather

Code

# Ideal for R-to-Python data exchange
write_feather(diamonds, "data/diamonds_output.feather")

2.5. Write to R-Specific Formats

Code

# Save a single model object
my_model <- lm(mpg ~ hp, data = mtcars)
saveRDS(my_model, "data/linear_model.rds")

# Save the entire current workspace
# save.image(file = "data/my_full_workspace.RData")

2.6. Write to a Database

Code

# Establish a connection
con <- dbConnect(RSQLite::SQLite(), "data/my_output_database.sqlite")

# Write the iris dataset to a new table named 'iris_table'
# `overwrite = TRUE` will replace the table if it already exists
dbWriteTable(con, "iris_table", iris, overwrite = TRUE)

# List tables to confirm it was created
print(dbListTables(con))

[1] "iris_table"

Code

# Clean up
dbDisconnect(con)

2.7 Write to Google Sheets

You can write a data frame to a new or existing Google Sheet. This requires authentication.

Code

library(googlesheets4)

# Create a data frame to write
my_data_to_write <- data.frame(
  Name = c("Ada Lovelace", "Grace Hopper"),
  Contribution = c("First algorithm", "COBOL")
)

# Write the data to a new Google Sheet file named 'R-Sheet-Output'
ss <- write_sheet(my_data_to_write, ss = "R-Sheet-Output", sheet = "Sheet1")

# The returned object `ss` contains the URL of the new sheet
print(ss)

--- title: "Input & Output in R" author: "Tony Duan" execute: warning: false error: false format: html: toc: true toc-location: right code-fold: show code-tools: true number-sections: false code-block-bg: true code-block-border-left: "#31BAE9" --- This document provides a comprehensive guide to reading data into R (input) and writing data out of R (output). We will cover a wide variety of common file formats, from traditional text files to modern, high-performance formats. ![](images/1_-OQi5hp-uoL_KdIydr4hew.gif){width="500"} ```{r} pak::pak(c('tidyverse','openxlsx','arrow','readxl')) ``` ```{r} # Load common packages for I/O operations library(tidyverse) library(openxlsx) library(arrow) library(readxl) ``` # 1. Input: Reading Data into R ## 1.1. Text Files (CSV, TXT) ### Reading CSV Files Comma-Separated Values (CSV) are one of the most common formats for storing tabular data. The `readr` package (part of the `tidyverse`) provides the `read_csv()` function, which is fast and efficient. ```{r} # Create a sample data frame to write to a CSV sample_df <- tibble( id = 1:5, name = c("Alice", "Bob", "Charlie", "David", "Eve"), score = c(95, 82, 78, 91, 88) ) write_csv(sample_df, "data/sample_data.csv") # Read the CSV file back into R data_from_csv <- read_csv("data/sample_data.csv") head(data_from_csv) ``` ### Reading Delimited Files For text files that use delimiters other than commas (like tabs or semicolons), you can use `read_delim()`. ```{r} # Write a tab-delimited file write_delim(sample_df, "data/sample_data.txt", delim = "\t") # Read the tab-delimited file data_from_txt <- read_delim("data/sample_data.txt", delim = "\t") head(data_from_txt) ``` ## 1.2. Excel Files (.xlsx) The `readxl` package is excellent for reading `.xls` and `.xlsx` files. The `openxlsx` package can be used for both reading and writing. ```{r} # Write the sample data to an Excel file write.xlsx(sample_df, "data/sample_data.xlsx") # Read the data from the Excel file data_from_excel <- read_excel("data/sample_data.xlsx") head(data_from_excel) ``` ## 1.3. High-Performance Formats (Parquet, Feather) For large datasets, text-based formats are inefficient. Binary formats like Parquet and Feather are much faster to read and write and take up less disk space. ### Reading Parquet Files Parquet is a columnar storage format optimized for big data processing. The `arrow` package provides the interface to work with Parquet files. ```{r} # Write data to a Parquet file write_parquet(sample_df, "data/sample_data.parquet") # Read data from the Parquet file data_from_parquet <- read_parquet("data/sample_data.parquet") head(data_from_parquet) ``` ### Reading Feather Files Feather is a fast, lightweight binary format designed for sharing data between R and Python. ```{r} # Write data to a Feather file write_feather(sample_df, "data/sample_data.feather") # Read data from the Feather file data_from_feather <- read_feather("data/sample_data.feather") head(data_from_feather) ``` ## 1.4. R-Specific Formats (.Rds, .RData) R has its own native formats for saving any R object. - `.Rds`: Saves a **single** R object. - `.RData`: Saves **multiple** R objects and can save the entire workspace. ### Reading an .Rds File ```{r} # Save a single object (our data frame) saveRDS(sample_df, "data/sample_df.rds") # Read the object back data_from_rds <- readRDS("data/sample_df.rds") head(data_from_rds) ``` ### Loading an .RData File ```{r} # Save multiple objects to one file save(sample_df, mtcars, file = "data/multiple_objects.RData") # Clear the objects from the environment to show they are loaded rm(sample_df, mtcars) # Load the objects back into the workspace load("data/multiple_objects.RData") # Check that they exist again head(sample_df) ``` ## 1.5. Reading from a Database Connecting to databases is covered in detail in a separate guide (`6 SQL database.qmd`), but here is a quick example using SQLite. ```{r} library(DBI) library(RSQLite) # Create an in-memory SQLite database and write mtcars to it con <- dbConnect(RSQLite::SQLite(), ":memory:") dbWriteTable(con, "mtcars_table", mtcars) # Read the full table back into an R data frame mtcars_from_db <- dbReadTable(con, "mtcars_table") dbDisconnect(con) head(mtcars_from_db) ``` ## 1.6 Read .docx (Word) file ```{r} library(officer) # docx = read_docx("data/example03.docx") # docx_summary(docx) ``` ## 1.7 Read PDF Tables ```{r} library(tabulapdf) # out <- extract_tables('data/60870-By-the-Numbers.pdf') # out[[1]] ``` # 2. Output: Writing Data from R Writing data is the reverse of reading. The functions are generally named `write_*()`. ## 2.1. Write to CSV ```{r} # The `write_csv` function from readr is a great choice write_csv(mtcars, "data/mtcars_output.csv") ``` ## 2.2. Write to Excel ```{r} # The `write.xlsx` function from openxlsx is very convenient write.xlsx(iris, "data/iris_output.xlsx") ``` ## 2.3. Write to Parquet ```{r} # Use the arrow package for high-performance writing write_parquet(airquality, "data/airquality_output.parquet") ``` ## 2.4. Write to Feather ```{r} # Ideal for R-to-Python data exchange write_feather(diamonds, "data/diamonds_output.feather") ``` ## 2.5. Write to R-Specific Formats ```{r} # Save a single model object my_model <- lm(mpg ~ hp, data = mtcars) saveRDS(my_model, "data/linear_model.rds") # Save the entire current workspace # save.image(file = "data/my_full_workspace.RData") ``` ## 2.6. Write to a Database ```{r} # Establish a connection con <- dbConnect(RSQLite::SQLite(), "data/my_output_database.sqlite") # Write the iris dataset to a new table named 'iris_table' # `overwrite = TRUE` will replace the table if it already exists dbWriteTable(con, "iris_table", iris, overwrite = TRUE) # List tables to confirm it was created print(dbListTables(con)) # Clean up dbDisconnect(con) ``` ## 2.7 Write to Google Sheets You can write a data frame to a new or existing Google Sheet. This requires authentication. ```{r} #| eval: false library(googlesheets4) # Create a data frame to write my_data_to_write <- data.frame( Name = c("Ada Lovelace", "Grace Hopper"), Contribution = c("First algorithm", "COBOL") ) # Write the data to a new Google Sheet file named 'R-Sheet-Output' ss <- write_sheet(my_data_to_write, ss = "R-Sheet-Output", sheet = "Sheet1") # The returned object `ss` contains the URL of the new sheet print(ss) ```