Writing Packages with R
Based on the official CRAN guidelines in Writing R Extensions and on Hadley Wickham's book R Packages. This guide assumes a basic working knowledge of how to use libraries in R and of version control tools like Git. I often use the syntax package::function() to call functions from external packages.
Tools
- devtools - creates the package structure and performs development tasks. The most used core functions are create(), document(), check(), build(), and test().
- roxygen2 - used to add documentation
- testthat - provides functions for testing and error handling
Package Setup
An R package contains the following components:
- Functions
- Data (optional)
- Documentation
- Vignettes (longer descriptions of package function usage)
- Tests
Naming
- Package names may contain only letters, numbers, and dots. They must have at least two characters, start with a letter, and not end in a dot.
- CamelCase is OK, but 70% of R package names are lowercase
- No underscores
- Avoid names that already exist on CRAN
# Get all package names on CRAN
options(repos = c(CRAN = "https://cran.rstudio.com/"))
pkgs <- available.packages(filters = c("CRAN", "duplicates"))[, "Package"]
# Check whether the pkgs vector contains the name "myutils"
"myutils" %in% pkgs
# Alternatively, use the available package
available::available("myutils")
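The naming rules above can also be checked locally before querying CRAN. The following is a minimal sketch in base R. The helper name is_valid_pkg_name is hypothetical, and the regular expression is only an encoding of the rules listed above, not an official CRAN function.

```r
# Hypothetical helper: checks a candidate name against the naming rules above.
# Rules encoded: letters, digits, and dots only; at least two characters;
# starts with a letter; does not end in a dot.
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

is_valid_pkg_name("myutils")   # TRUE
is_valid_pkg_name("my_utils")  # FALSE: underscores are not allowed
is_valid_pkg_name("utils.")    # FALSE: ends in a dot
is_valid_pkg_name("2utils")    # FALSE: starts with a digit
```

A name that passes this check can still be taken on CRAN, so the availability check is still needed.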
Initialization
RStudio has built-in tools to select a working directory and create an R package template, so I'd recommend just using that. But you can also use the usethis::create_package() function to create the package directory from plain R. The resulting directory should have:
- DESCRIPTION - a file containing metadata about the package: author, version, dependencies, etc. This is important because it is displayed on the package's CRAN webpage (if you submit the package to CRAN)
  - Many other fields can be added beyond the defaults, and there are functions that automate setting these fields
- LICENSE - a file describing the package usage agreement
- NAMESPACE - a file listing the functions the package exports and the functions it imports from other packages
- R - directory containing R scripts
- man - directory containing documentation files
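To get a feel for the DESCRIPTION file, here is a minimal sketch that writes one to a temporary directory and reads it back with base R. All field values (the package name myutils, the title, the license string) are placeholders chosen for illustration, and a real DESCRIPTION would also need author information.

```r
# Hypothetical package metadata; every value here is a placeholder.
desc <- data.frame(
  Package = "myutils",
  Title = "Utility Functions for Summarising Vectors",
  Version = "0.1.0",
  Description = "Provides small helpers for summarising character vectors.",
  License = "MIT + file LICENSE",
  Encoding = "UTF-8",
  stringsAsFactors = FALSE
)

# DESCRIPTION uses the Debian control file (DCF) format: one "Field: value" per entry
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(desc, file = desc_file)

# Read the file back as a character matrix with one column per field
fields <- read.dcf(desc_file)
colnames(fields)
fields[, "Version"]
```

In a real package the file sits in the package root and is usually edited by hand or through helper functions rather than written this way.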
Versioning & Licensing
After the devtools library is loaded, the use_version() function can be run to update the version automatically. Generally, versions are at least three numbers separated by '.', for example 0.1.0.
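To illustrate what updating a version means, here is a small sketch in base R. The helper bump_version is hypothetical and assumes a three-part major.minor.patch version; it is not how use_version() is implemented.

```r
# Hypothetical helper that bumps one component of a major.minor.patch
# version string; an illustration only, not the usethis implementation.
bump_version <- function(version, which = c("patch", "minor", "major")) {
  which <- match.arg(which)
  parts <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(parts) == 3, !anyNA(parts))
  idx <- match(which, c("major", "minor", "patch"))
  parts[idx] <- parts[idx] + 1L
  # Components to the right of the bumped one reset to zero
  if (idx < 3) parts[(idx + 1):3] <- 0L
  paste(parts, collapse = ".")
}

bump_version("0.1.0", "patch")  # "0.1.1"
bump_version("0.1.9", "minor")  # "0.2.0"

# package_version() compares versions component by component, not as text
package_version("0.9.0") < package_version("0.10.0")  # TRUE
```

The last line shows why versions should be compared with package_version() rather than as strings: as text, "0.9.0" would sort after "0.10.0".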
There are two main types of open-source license: permissive and copyleft. A permissive license allows the code to be copied, modified in any way, and published as long as the license is preserved; the most popular is the MIT license. A copyleft license also allows modification and redistribution, but requires that derivative works be distributed under the same license; the most popular is the GPL-3 license. Run use_mit_license("Company Name") or use_gpl_license(version = 3, include_future = TRUE) to set the license.
Adding R Functions to Package
Add an R script to the 'R' directory in the project, or use the use_r('filename') function to create the script.
Create a function in the R script like so:
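# Summarise a character vector: length, number of missing values, and
# number of unique values (NA is excluded from the unique count when na.rm = TRUE)
char_summary <- function(x, na.rm = FALSE) {
  n <- length(x)
  n_miss <- sum(is.na(x))
  values <- if (na.rm) x[!is.na(x)] else x
  n_unique <- length(unique(values))
  c(length = n,
    Nmiss = n_miss,
    Nunique = n_unique)
}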
You can also remove the example R script R/hello.R and its manual page man/hello.Rd, which were added by default.
Now, instead of using source() as we usually would, use the load_all() function from devtools to load all the functions in this package.
If the @export tag is included in a function's roxygen2 comment, the function becomes visible to users of the package; otherwise it is only available as an internal function.
Checking Packages
Use the devtools::check() function to ensure the package works and passes all requirements. Ideally it returns no errors and no warnings. At this stage it may give a warning because there are undocumented functions in our package. We will fix this later.
Documentation
Each exported function in an R package must have a help topic. The roxygen2 package lets you write special comments directly above each function to create its documentation.
To insert a documentation skeleton in RStudio, click inside a function, then go to the Code menu and select Insert Roxygen Skeleton. This should generate a comment at the top of the function.
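Once the skeleton is filled in, the comment block for the char_summary function from earlier might look like the sketch below. The wording of the title, description, and parameter text is my own illustration, and the na.rm behaviour shown (dropping NA before counting unique values) is an assumption about what the argument should do.

```r
#' Summarise a character vector
#'
#' Counts the elements, the missing values, and the unique values of a
#' character vector.
#'
#' @param x A character vector.
#' @param na.rm Logical. If TRUE, missing values are dropped before the
#'   unique values are counted (assumed behaviour for this sketch).
#'
#' @return A named integer vector with elements length, Nmiss, and Nunique.
#' @export
#'
#' @examples
#' char_summary(c("a", "b", NA, "a"))
char_summary <- function(x, na.rm = FALSE) {
  values <- if (na.rm) x[!is.na(x)] else x
  c(length = length(x),
    Nmiss = sum(is.na(x)),
    Nunique = length(unique(values)))
}

# Returns length = 4, Nmiss = 1, Nunique = 3 (NA counts as a unique value)
char_summary(c("a", "b", NA, "a"))
```

Lines starting with #' are ordinary comments to R, so the file still runs as a normal script; only roxygen2 treats them as documentation.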
Run devtools::document() to generate an .Rd file and update the DESCRIPTION and NAMESPACE files. If you get an error about NAMESPACE, delete the NAMESPACE file generated by default and run it again.
Now that our package documentation is available, we can run load_all() and then ?char_summary to read the generated documentation.
Documentation can also be formatted and supports Markdown. See the roxygen2 documentation. Never start project or function descriptions with "This package..." or "This function...".
A vignette is a long-form guide to your package. To see the existing vignettes for a specific package, execute browseVignettes("packageName"). There could be multiple vignettes or just one for the whole package. To add a vignette, use use_vignette('mypackage'). This will modify the DESCRIPTION file to add knitr as a suggested dependency and create a vignettes/ directory. Once the documentation is finished, build the vignettes with build_vignettes(), which creates the doc/ subdirectory that contains the HTML files.
A README.md file may also be added to provide a high-level description and the goals of the package, which can be read from the GitHub repo.
Publishing
To initialize a Git repo, execute use_git(); it will walk you through the setup and then prompt you to restart RStudio. To publish the package on GitHub, run use_github(). Once the package is ready, others can install it with devtools::install_github('myuser/mypackage').
Here is a sample .gitignore file listing the files usually excluded from an R package repository:
# History files
.Rhistory
.Rapp.history
# Session Data files
.RData
.RDataTmp
# User-specific files
.Ruserdata
# Example code in package build process
*-Ex.R
# Output files from R CMD build
/*.tar.gz
# Output files from R CMD check
/*.Rcheck/
# RStudio files
.Rproj.user/
# produced vignettes
vignettes/*.html
vignettes/*.pdf
# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth
# knitr and R markdown default cache directories
*_cache/
/cache/
# Temporary files created by R markdown
*.utf8.md
*.knit.md
# R Environment Variables
.Renviron
# pkgdown site
docs/
# translation temp files
po/*~
# RStudio Connect folder
rsconnect/
Submission to CRAN
CRAN submission is very strict, so make sure the package is checked and tested thoroughly! See Prepare for CRAN for details, and use the submission form.