Working with Data
R packages can include data — not just functions. This is helpful when:
- You want to ship example data with your package
- You’re documenting datasets for teaching or analysis
- You use specific reference data across projects
There are two types of package data:
- External data: available to users after loading the package
- Internal data: used inside the package only
Step 1: Add Raw Data Script
Run:
usethis::use_data_raw("my_dataset")This creates:
- A
data-raw/folder (if it doesn’t exist) - A starter script at
data-raw/my_dataset.R
Inside that script, read in and clean your data.
Step 2: Read and Save Data
Replace the contents of data-raw/my_dataset.R with something like:
# Read raw CSV
my_dataset <- read.csv("path/to/your.csv")
# Clean or rename variables here
# my_dataset <- dplyr::rename(my_dataset, new_col = old_col)
# Save it for package use
usethis::use_data(my_dataset, overwrite = TRUE)After this runs, you’ll see:
data/
└── my_dataset.rda
Step 3: Load and Use the Data
After loading your package:
library(mypackage)
my_datasetThe dataset is ready for use — just like iris or mtcars.
The object name (
my_dataset) is the same as the name in your R script and.rdafile.
Step 4: Document the Dataset
You document datasets with roxygen2, just like functions.
Create a new file, e.g. R/data-documentation.R:
#' Example dataset: my_dataset
#'
#' A dataset containing example data imported from a CSV.
#'
#' @format A data frame with 100 rows and 5 variables:
#' \describe{
#' \item{col1}{Description of column 1}
#' \item{col2}{Description of column 2}
#' \item{col3}{Description of column 3}
#' }
#' @source path/to/your.csv
"my_dataset"Then run:
devtools::document()Now you can view:
?my_dataset✅ You’ll see a proper help page for the data!
Internal vs External Data
If you want to use a dataset inside your package but not expose it to users:
usethis::use_data(my_secret_data, internal = TRUE)This stores it in R/sysdata.rda. You can still use it inside your package code, but users won’t see it in the global namespace.
Tips for Good Data Documentation
📋 When documenting data, always include:
- What the dataset represents
- The number of rows and columns
- Descriptions for each variable (
@format) - The source of the data
- Any transformation or filtering applied
Use @examples too if it makes sense:
#' @examples
#' summary(my_dataset)Coming Up Next
Now that you can include data and document it properly, let’s look at how to build, install, and share your package.