This is a checklist of the steps I most commonly follow when adding data to data packages. (If you are serious about building packages you will learn much more at http://r-pkgs.had.co.nz/.)

After building the fundamental infrastructure of a data package (see how), you need to actually add data. Do this from a script in data-raw/, such as data-raw/file.R or data-raw/file.Rmd. Two approaches are standard:

  1. Import and clean all data in a single data-raw/data.R file. This is a good idea if all datasets are closely related to each other (e.g. similar source, type, cleaning).

  2. Import and clean each dataset in an individual data-raw/data-file.R. This is best if the datasets aren’t closely related.

Create folders and files to store, clean and document data

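library(usethis)
library(fs)
# Create data-raw/ and add it to .Rbuildignore
use_data_raw()
# Create R/data.R, where the datasets are documented
use_r("data.R")
# Start data-raw/data.R, the import-and-clean script, as a copy of R/data.R
fs::file_copy("R/data.R", "data-raw/data.R")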
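
R/data.R is the conventional place to document exported datasets with roxygen2 comments. Below is a minimal sketch of one entry. The dataset name surveys and its three columns are hypothetical, chosen only for illustration, and the source line is a placeholder to fill in.

```r
# R/data.R
# Hypothetical documentation entry; `surveys` and its columns are made up.

#' Small-mammal survey records (hypothetical example dataset)
#'
#' @format A data frame with one row per captured animal and 3 columns:
#' \describe{
#'   \item{plot_id}{Integer identifier of the survey plot.}
#'   \item{species}{Two-letter species code.}
#'   \item{weight_g}{Weight in grams.}
#' }
#' @source Replace with the URL or citation of the raw data.
"surveys"
```

The quoted name on the last line tells roxygen2 which dataset the block describes; running devtools::document() then generates the help page.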

Now, working from some file in data-raw/, keep adding data to data/ with usethis::use_data().

  • Place a raw dataset in data-raw/ (manually is OK).
  • In data-raw/some-file.R:
    • Document the source of the raw dataset.
    • Clean the dataset.
    • Export the dataset with usethis::use_data(OBJECT-NAME-GOES-HERE).
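
The three steps above can be sketched as a single script. This is only an illustration under stated assumptions: the raw file, its column names, and the object name surveys are hypothetical, and the raw file is written to a temporary directory so the sketch runs anywhere. In a real package the raw file would already sit in data-raw/.

```r
# data-raw/some-file.R (hypothetical example)

# 1. Document the source of the raw dataset.
# Source: replace with the URL, retrieval date and licence of the raw file.

# Stand-in for a raw file that would normally live in data-raw/.
raw_path <- file.path(tempdir(), "surveys-raw.csv")
writeLines(
  c("Plot ID,Species,Weight (g)", "1,DM,40", "2,,48", "2,DO,NA"),
  raw_path
)

# 2. Clean the dataset.
raw <- read.csv(
  raw_path,
  check.names = FALSE,        # keep the original headers until we rename them
  na.strings = c("", "NA"),   # treat empty cells as missing
  stringsAsFactors = FALSE
)
surveys <- raw
names(surveys) <- c("plot_id", "species", "weight_g")
surveys <- surveys[complete.cases(surveys), ]  # drop rows with missing values

# 3. Export the dataset. Inside a package this writes data/surveys.rda.
# The guard only exists so that the sketch also runs outside a package.
if (file.exists("DESCRIPTION") && requireNamespace("usethis", quietly = TRUE)) {
  usethis::use_data(surveys, overwrite = TRUE)
}
```

Setting overwrite = TRUE lets you rerun the script whenever the cleaning changes; without it, use_data() refuses to replace an existing file in data/.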

Adding private data

You may do this if you have private data that can’t be uploaded to GitHub.

use_directory("data-raw/private")
use_git_ignore("data-raw/private")

Manually add private data to data-raw/private/.
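
Before committing, it may be worth confirming that the folder really is listed in .gitignore. The helper below is a hypothetical sketch, not part of usethis: it only looks for a literal line match and does not implement Git's pattern rules (git check-ignore is the authoritative check). The demonstration runs in a throwaway directory standing in for the package root.

```r
# Hypothetical helper: TRUE if `entry` appears verbatim as a line in .gitignore.
# Literal match only; wildcards and negations are not interpreted.
is_listed_in_gitignore <- function(entry, root = ".") {
  ignore_file <- file.path(root, ".gitignore")
  if (!file.exists(ignore_file)) {
    return(FALSE)
  }
  entry %in% trimws(readLines(ignore_file, warn = FALSE))
}

# Throwaway directory that stands in for the package root.
root <- file.path(tempdir(), "demo-pkg")
dir.create(file.path(root, "data-raw", "private"),
           recursive = TRUE, showWarnings = FALSE)

# Stand-in for the line that use_git_ignore("data-raw/private") adds.
writeLines(c(".Rproj.user", "data-raw/private"), file.path(root, ".gitignore"))

listed <- is_listed_in_gitignore("data-raw/private", root)
not_listed <- is_listed_in_gitignore("data-raw/public", root)
```

In a real package you would call is_listed_in_gitignore("data-raw/private") from the package root.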


In the terminal (in RStudio, send commands from an R script to the terminal with Control + Alt + Enter):

git add .
git commit -a -m "Add infrastructure to host private data only locally."
git pull
git push

Confirm the private data hasn’t been pushed to the dev branch.

browse_github()

Browse to branch dev > data-raw.

Compare with local version.

Note that private/ exists in the local version but not in the remote one. Your private data lives nowhere online.

WARNING: The private data is not under version control, because Git isn’t tracking it.