Author

Tony Duan

This document provides a comprehensive guide to reading data into R (input) and writing data out of R (output). We will cover a wide variety of common file formats, from traditional text files to modern, high-performance formats.

Code
pak::pak(c('tidyverse','openxlsx','arrow','readxl'))
Code
# Load common packages for I/O operations
library(tidyverse)
library(openxlsx)
library(arrow)
library(readxl)

1. Input: Reading Data into R

1.1. Text Files (CSV, TXT)

Reading CSV Files

Comma-Separated Values (CSV) are one of the most common formats for storing tabular data. The readr package (part of the tidyverse) provides the read_csv() function, which is fast and efficient.

Code
# Create a sample data frame to write to a CSV
sample_df <- tibble(
  id = 1:5,
  name = c("Alice", "Bob", "Charlie", "David", "Eve"),
  score = c(95, 82, 78, 91, 88)
)

write_csv(sample_df, "data/sample_data.csv")

# Read the CSV file back into R
data_from_csv <- read_csv("data/sample_data.csv")
head(data_from_csv)
# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

Reading Delimited Files

For text files that use delimiters other than commas (like tabs or semicolons), you can use read_delim().

Code
# Write a tab-delimited file
write_delim(sample_df, "data/sample_data.txt", delim = "\t")

# Read the tab-delimited file
data_from_txt <- read_delim("data/sample_data.txt", delim = "\t")
head(data_from_txt)
# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.2. Excel Files (.xlsx)

The readxl package is excellent for reading .xls and .xlsx files. The openxlsx package can be used for both reading and writing.

Code
# Write the sample data to an Excel file
write.xlsx(sample_df, "data/sample_data.xlsx")

# Read the data from the Excel file
data_from_excel <- read_excel("data/sample_data.xlsx")
head(data_from_excel)
# A tibble: 5 × 3
     id name    score
  <dbl> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.3. High-Performance Formats (Parquet, Feather)

For large datasets, text-based formats are inefficient. Binary formats like Parquet and Feather are much faster to read and write and take up less disk space.

Reading Parquet Files

Parquet is a columnar storage format optimized for big data processing. The arrow package provides the interface to work with Parquet files.

Code
# Write data to a Parquet file
write_parquet(sample_df, "data/sample_data.parquet")

# Read data from the Parquet file
data_from_parquet <- read_parquet("data/sample_data.parquet")
head(data_from_parquet)
# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

Reading Feather Files

Feather is a fast, lightweight binary format designed for sharing data between R and Python.

Code
# Write data to a Feather file
write_feather(sample_df, "data/sample_data.feather")

# Read data from the Feather file
data_from_feather <- read_feather("data/sample_data.feather")
head(data_from_feather)
# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.4. R-Specific Formats (.Rds, .RData)

R has its own native formats for saving any R object.

  • .Rds: Saves a single R object.
  • .RData: Saves multiple R objects and can save the entire workspace.

Reading an .Rds File

Code
# Save a single object (our data frame)
saveRDS(sample_df, "data/sample_df.rds")

# Read the object back
data_from_rds <- readRDS("data/sample_df.rds")
head(data_from_rds)
# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

Loading an .RData File

Code
# Save multiple objects to one file
save(sample_df, mtcars, file = "data/multiple_objects.RData")

# Clear the objects from the environment to show they are loaded
rm(sample_df, mtcars)

# Load the objects back into the workspace
load("data/multiple_objects.RData")

# Check that they exist again
head(sample_df)
# A tibble: 5 × 3
     id name    score
  <int> <chr>   <dbl>
1     1 Alice      95
2     2 Bob        82
3     3 Charlie    78
4     4 David      91
5     5 Eve        88

1.5. Reading from a Database

Connecting to databases is covered in detail in a separate guide (6 SQL database.qmd), but here is a quick example using SQLite.

Code
library(DBI)
library(RSQLite)

# Create an in-memory SQLite database and write mtcars to it
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars_table", mtcars)

# Read the full table back into an R data frame
mtcars_from_db <- dbReadTable(con, "mtcars_table")
dbDisconnect(con)

head(mtcars_from_db)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

1.6 Read .docx (Word) file

Code
library(officer)
# docx = read_docx("data/example03.docx")
# docx_summary(docx)

1.7 Read PDF Tables

Code
library(tabulapdf)
# out <- extract_tables('data/60870-By-the-Numbers.pdf')
# out[[1]]

2. Output: Writing Data from R

Writing data is the reverse of reading. The functions are generally named write_*().

2.1. Write to CSV

Code
# The `write_csv` function from readr is a great choice
write_csv(mtcars, "data/mtcars_output.csv")

2.2. Write to Excel

Code
# The `write.xlsx` function from openxlsx is very convenient
write.xlsx(iris, "data/iris_output.xlsx")

2.3. Write to Parquet

Code
# Use the arrow package for high-performance writing
write_parquet(airquality, "data/airquality_output.parquet")

2.4. Write to Feather

Code
# Ideal for R-to-Python data exchange
write_feather(diamonds, "data/diamonds_output.feather")

2.5. Write to R-Specific Formats

Code
# Save a single model object
my_model <- lm(mpg ~ hp, data = mtcars)
saveRDS(my_model, "data/linear_model.rds")

# Save the entire current workspace
# save.image(file = "data/my_full_workspace.RData")

2.6. Write to a Database

Code
# Establish a connection
con <- dbConnect(RSQLite::SQLite(), "data/my_output_database.sqlite")

# Write the iris dataset to a new table named 'iris_table'
# `overwrite = TRUE` will replace the table if it already exists
dbWriteTable(con, "iris_table", iris, overwrite = TRUE)

# List tables to confirm it was created
print(dbListTables(con))
[1] "iris_table"
Code
# Clean up
dbDisconnect(con)

2.7 Write to Google Sheets

You can write a data frame to a new or existing Google Sheet. This requires authentication.

Code
library(googlesheets4)

# Create a data frame to write
my_data_to_write <- data.frame(
  Name = c("Ada Lovelace", "Grace Hopper"),
  Contribution = c("First algorithm", "COBOL")
)

# Write the data to a new Google Sheet file named 'R-Sheet-Output'
ss <- write_sheet(my_data_to_write, ss = "R-Sheet-Output", sheet = "Sheet1")

# The returned object `ss` contains the URL of the new sheet
print(ss)
Back to top