This is a checklist of the steps I use most often when adding data to a data package. (If you are serious about building packages, you will learn much more at http://r-pkgs.had.co.nz/.)
After building the basic infrastructure of a data package (see how), you need to add the data itself. Do this from a script such as data-raw/file.R or data-raw/file.Rmd. Two approaches are common:
- Import and clean all data in a single data-raw/data.R file. This is a good idea if all datasets are closely related to each other (e.g. similar source, type, cleaning).
- Import and clean each dataset in an individual data-raw/data-file.R. This is best if the datasets aren’t closely related.
Create folders and files to store, clean and document data
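library(usethis)
library(fs)
# Create data-raw/ and add it to .Rbuildignore
use_data_raw()
# Create R/data.R, conventionally used to document the datasets
use_r("data.R")
# Start data-raw/data.R as a copy of R/data.R
fs::file_copy("R/data.R", "data-raw/data.R")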
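As a sketch of what R/data.R might eventually contain, the block below documents one dataset with roxygen2 comments. The dataset name trees, its columns, and their descriptions are hypothetical and chosen only for illustration; replace them with the details of your own data.

```r
# R/data.R
# Hypothetical example: the dataset `trees` and its columns are assumptions.

#' Tree diameters (hypothetical dataset)
#'
#' @format A data frame with 2 variables:
#' \describe{
#'   \item{tree_id}{Tree identifier.}
#'   \item{dbh_mm}{Diameter at breast height, in millimetres.}
#' }
#' @source Describe the origin of the raw data here.
"trees"
```

Running devtools::document() afterwards should turn these comments into a help page for the dataset.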
Now, working from some file in data-raw/, keep adding data to data/ with usethis::use_data().
- Place a raw dataset in data-raw/ (manually is OK).
- In data-raw/some-file.R:
  - Document the source of the raw dataset.
  - Clean the dataset.
  - Export the dataset with usethis::use_data(OBJECT-NAME-GOES-HERE).
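The steps above can be sketched as a single script. This is only an illustration under stated assumptions: the object name trees, the column names, the inline raw data, and the helper clean_trees() are all hypothetical. In practice you would read the raw file you placed in data-raw/ instead of building a data frame by hand.

```r
# data-raw/some-file.R

# Source: describe where the raw dataset came from (URL, contact, date accessed).

# Hypothetical raw data; in practice read the file placed in data-raw/,
# e.g. read.csv("data-raw/some-file.csv", stringsAsFactors = FALSE)
raw <- data.frame(
  Tree.ID = c("a1", "a2", "a2", NA),
  DBH.mm = c("120", "85", "85", "40"),
  stringsAsFactors = FALSE
)

# Clean: consistent names, correct types, no missing ids, no duplicated rows
# (clean_trees() is a hypothetical helper written for this sketch)
clean_trees <- function(x) {
  names(x) <- tolower(gsub(".", "_", names(x), fixed = TRUE))
  x$dbh_mm <- as.numeric(x$dbh_mm)
  x <- x[!is.na(x$tree_id), , drop = FALSE]
  x <- unique(x)
  rownames(x) <- NULL
  x
}

trees <- clean_trees(raw)

# Export to data/trees.rda, but only when run from a package's root directory
if (requireNamespace("usethis", quietly = TRUE) && file.exists("DESCRIPTION")) {
  usethis::use_data(trees, overwrite = TRUE)
}
```

Keeping the cleaning in a small function makes it easy to rerun the script whenever the raw file changes.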
Adding private data
You may do this if you have private data that can’t be uploaded to GitHub.
use_directory("data-raw/private")
use_git_ignore("data-raw/private")
Manually add private data to data-raw/private/.
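Before committing, it may be worth checking that the ignore rule is really in place. The helper below is a minimal sketch, not part of usethis: is_git_ignored() is a hypothetical name, and it only looks for an exact line in .gitignore rather than interpreting Git's wildcard patterns.

```r
# Check whether a path is listed in a .gitignore file.
# Exact-line match only; Git's wildcard patterns are not interpreted.
is_git_ignored <- function(path, gitignore = ".gitignore") {
  if (!file.exists(gitignore)) {
    return(FALSE)
  }
  entries <- trimws(readLines(gitignore, warn = FALSE))
  # Treat "data-raw/private" and "data-raw/private/" as the same entry
  strip_slash <- function(x) sub("/+$", "", x)
  strip_slash(path) %in% strip_slash(entries)
}

# Run from the package's root directory
is_git_ignored("data-raw/private")
```

For a check that follows Git's own rules, git check-ignore data-raw/private in the terminal is the more reliable option.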
In the terminal (send commands from an R script to the terminal with Control + Alt + Enter):
git add .
git commit -a -m "Add infrastructure to host private data only locally."
git pull
git push
Confirm the private data hasn’t been pushed to the dev branch.
browse_github()
Browse to branch dev > data-raw.
Compare with local version.
Note that private/ exists in the local version but not in the remote one, so your private data lives nowhere online.
WARNING: The private data is not under version control; Git isn’t tracking it.