“R will always be arcane to those who do not make a
serious effort to learn it. It is not meant to be intuitive and easy for
casual users to just plunge into. It is far too complex and powerful for
that. But the rewards are great for serious data analysts who put in the
effort.”
When posting a question on forums or mailing lists, keep a few things
in mind:
Read the posting guidelines before posting!
Be respectful of other people and their time. R is free
software. People are offering their free time to help. They are under no
obligation to help you. If you are disrespectful or act like they owe
you anything, you will rub people the wrong way and will be less likely
to get help.
Provide a minimal, reproducible example. Providing a minimal,
reproducible example can be crucial for getting a helpful response. By
going to the trouble of creating a minimal, reproducible example and
identifying the minimum conditions necessary to reproduce the issue, you
will often figure out how to resolve it. Here are guidelines on
providing a minimal, reproducible example: https://stackoverflow.com/help/minimal-reproducible-example
(archived at https://perma.cc/6NUB-UTYF). Here are a good example and
guidelines for providing a minimal, reproducible example in
R: https://stackoverflow.com/a/5963610 (archived at https://perma.cc/PC9L-DQZG). Provide a
reprex whenever possible: https://reprex.tidyverse.org.
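As a rough sketch of what a minimal, reproducible example can look like, the snippet below builds a tiny dataset inline (rather than relying on a file only you have), shows the problem in a few lines, and uses dput() to produce a copy-pasteable version of the data. The data and variable names are made up for illustration.

```r
# Small made-up dataset created inline so anyone can run the example
df <- data.frame(
  id = 1:3,
  score = c("10", "20", "thirty"),  # stored as text, which causes the issue
  stringsAsFactors = FALSE
)

# The problem in as few lines as possible:
# as.numeric() returns NA for "thirty" (the warning is suppressed here)
score_numeric <- suppressWarnings(as.numeric(df$score))
print(score_numeric)

# dput() prints R code that recreates the object, ready to paste into a post
dput(df)

# Include version information so others can match your setup
print(R.version.string)
```

Pasting the dput() output into a post lets others recreate the exact object without needing your files.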
2 Initial Set Up
Note: many of these initial setup steps described below are not
necessary for general use; many of these steps are necessary only for
using lab-related repositories (e.g., to gain API access to export data
from REDCap, to use absolute paths rather than relative
paths so repos can communicate with each other, etc.).
Make sure you are logged onto a computer that can access the lab
server (either a lab computer or a computer that can connect to the lab
server through VPN), and that you have admin access to install and uninstall
software.
If R was already installed in a directory that contains
spaces (e.g., C:/Program Files/R/[R-VERSION]), uninstall
R before installing it in a directory that doesn’t contain
spaces
Install RStudio Desktop (https://www.rstudio.com/products/rstudio/download/) in
the main program files directory; may have to right click and “Run As
Administrator”. RStudio is the best available graphical
user interface for R.
Set the executables for R and RStudio to
always run with administrator permissions.
If on Windows, open File Explorer and find the main executable of
R (C:/R/[R-VERSION]/bin/R.exe) and
RStudio
(C:\Program Files\RStudio\bin\RStudio.exe). Right-click it
to open the contextual menu. Then, click or tap on “Properties”. In the
Properties window, go to the Compatibility tab. At the bottom of the
window, check the box next to the “Run this program as an administrator”
option, and then click or tap on Apply or OK.
Install tools to allow you to compile R packages so you
can install packages from source, if necessary (i.e., if package
binaries are not available):
If on Windows, install Rtools; may
have to right click and “Run As Administrator”
Set up git, GitLab, and the
GitHub Desktop App in the main program files directory; may
have to right click and “Run As Administrator”. For instructions on setting
up and using GitLab, see here: https://devpsylab.github.io/DataAnalysis/git.html#toBegin
The Rprofile.site file in the etc folder
of the R installation directory contains code that is run for
every user each time R starts. We will update the default Rprofile.site file with the
lab’s Rprofile.site file so R installs
packages in the correct location, sets the default package repository,
updates packages, and gives you a fortune cookie. To do this, perform
the following steps:
Rename the Rprofile.site file in the R
installation directory
(C:/R/R-[InsertVersionNumber]/etc/Rprofile.site) to be
Rprofile_BACKUP.site
The .Rprofile file in the user’s Documents
folder contains code that is run for that particular
user each time R starts. We will update
the default .Rprofile file (if there is one) with the lab’s
.Rprofile file so R knows which computer you
are using and which path to use (relative to where your R
projects are located). To do this, perform the following steps:
Open the lab’s .Rprofile file, and revise it with your
HawkID
Revise the lab’s .Rprofile file with the local path to
the Documents folder for each of the computers you will use
to access R (e.g., home computer, work computer, laptop).
Make sure to use forward slashes (/), not back slashes
(\) in the path.
You will save the file in your HOME directory. To find
the HOME directory, open R and type the
following command: Sys.getenv("HOME")—the output of the
command is the location of your HOME directory; If this is
a lab computer, it may be located here:
//home.iowa.uiowa.edu/[user]/Documents. If this is your
personal computer, it may be located here: PC:
C:/Users/[user]/Documents; Mac: /Users/[user].
Then close R.
If your HOME directory is in a OneDrive folder (or
another cloud-based sync folder), you will want to change the directory
of your HOME path so that it is not in a OneDrive folder.
To do that, open Environment Variables
(archived at https://perma.cc/A2E5-B5VA) in Windows. Then, add/edit
HOME as the “variable name” with the intended location as
the “variable value” (e.g., C:/Users/[user]/Documents,
where you replace “user” with your HawkID).
You may also solve this issue by placing the following command in
the Rprofile.site from the previous step
Sys.setenv(HOME = "C:/Users/[specific user ID]/Documents")
Move the revised .Rprofile file to the
HOME directory and overwrite the original
.Rprofile file (if it exists). You may have to show hidden
files in order to see the file (PC: see Windows Explorer settings; Mac:
Command+Shift+Dot).
Make sure to show filename extensions in your file explorer window,
and make sure the file is named .Rprofile (not
.Rprofile.Rprofile). Make sure there is a period at the
beginning of the filename.
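To double-check where R will look for the user-level .Rprofile file, a minimal sketch in base R follows. It only inspects the HOME location described above; nothing in it is specific to the lab's file, and the OneDrive check is a simple text match on the path.

```r
# Locate the HOME directory that R reports and the .Rprofile path inside it
home_dir <- Sys.getenv("HOME")
rprofile_path <- file.path(home_dir, ".Rprofile")

# Flag HOME locations that appear to sit inside a OneDrive folder
in_onedrive <- grepl("OneDrive", home_dir, fixed = TRUE)

cat("HOME:", home_dir, "\n")
cat(".Rprofile found:", file.exists(rprofile_path), "\n")
cat("HOME inside OneDrive:", in_onedrive, "\n")
```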
Run RStudio. If the Rprofile.site and
.Rprofile files are correctly set up, they should
pre-populate your path location when you open R. If the
contents of the Global Environment in RStudio
are empty, your Rprofile.site and/or .Rprofile
files are not set up correctly.
If you get this error
(Error: could not find function "install.packages"), run
the following line manually and then restart RStudio after
the package finishes installing:
install.packages("fortunes")
For reproducibility purposes,
prevent R/RStudio from saving your workspaces
automatically using the following steps:
With RStudio running, choose Tools → Global Options
from the menus.
In the Options dialog, change the value for
Save workspace to .RData on exit to
Never.
Click OK.
Install the petersenlab R package using
the following steps:
Install the remotes package using the following
command: install.packages("remotes")
Install the petersenlab package using the following
command:
remotes::install_github("DevPsyLab/petersenlab")
Request an API
token for the following REDCap project(s); note: please check with
Dr. P before requesting an API token. In general, RAs should not have an
API token.
Revise the API tokens to reflect yours, then run the script to save
your encrypted credentials on the lab server and your encryption key on
your local computer
Verify that the Encryption Key
(REDCap Encryption Key.RData) was saved where you intended
it to be saved on your local computer
Verify that a file named with your HawkID was saved here:
//lc-rs-store24.hpc.uiowa.edu/lss_itpetersen/Lab/Studies/School Readiness Study/Data Management/REDCap/Tokens/
Copy the Encryption Key (REDCap Encryption Key.RData)
to the comparable location of any other computers you own that you plan
to access the data from
The file has to be in the comparable location (relative to the
path variable you set in Rprofile.site) of
every computer in order for it to be found by the
Export Data.R script. The default location is:
file.path(path, "GitHub/R/Data/REDCap Encryption Key.RData"),
so if path is set as "C:/User/YourName", the
file would be saved in:
C:/User/YourName/GitHub/R/Data/REDCap Encryption Key.RData.
The recommended location for GitHub repos is to create a
folder titled GitHub in your Documents folder,
and to put repos in the GitHub folder; it is NOT
recommended to put git repos in a OneDrive folder because
git
files tend not to play nice with syncing services (archived at https://perma.cc/XZ6F-43G3; e.g., OneDrive,
Dropbox)
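To see how the default key location described above resolves on a given computer, the following sketch builds the full path and checks whether the file is there. The value assigned to path here is a placeholder taken from the example above; in the lab setup, path is expected to be set by the profile files.

```r
# Placeholder value; in the lab setup, `path` is set by the profile files
path <- "C:/User/YourName"

# Build the default location of the encryption key relative to `path`
key_file <- file.path(path, "GitHub/R/Data/REDCap Encryption Key.RData")
print(key_file)

# Report when the key is missing from the expected location
if (!file.exists(key_file)) {
  message("Encryption key not found at: ", key_file)
}
```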
Add the SRS Data Processing repo from the lab drive to your
GitHub Desktop App
(//lc-rs-store24.hpc.uiowa.edu/lss_itpetersen/Lab/Studies/School Readiness Study/Data Processing)
Ensure your HawkID and location of your encryption key in the script
are correct, and then run the script to verify that you can export data
from REDCap
For antialiased plots in RStudio, change the Graphics
backend to Cairo:
Tools → Global Options → Graphics
To install and load R packages, see the instructions here.
5 Update Packages
To update packages, use the following code:
update.packages(checkBuilt = TRUE)
One indication that the packages might not be updating to the latest
version is seeing the same packages showing as needing an update after
having run the update.packages() function. If this does not
update the package(s) to the latest version, you may need to install the
latest version of the package(s) from source (see the section on “Initial Set Up” of R for the software
needed to install R packages from source):
update.packages(checkBuilt = TRUE, type = "source")
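Before or after updating, it can help to see which installed packages were built under a different version of R than the one currently running. The sketch below uses only base R; the helper major_minor() is a name introduced here for illustration.

```r
# List installed packages with the R version each one was built under
pkgs <- as.data.frame(
  installed.packages()[, c("Package", "Version", "Built"), drop = FALSE],
  stringsAsFactors = FALSE
)

# Helper (illustrative): reduce "4.3.1" to "4.3"
major_minor <- function(v) sub("^([0-9]+\\.[0-9]+).*$", "\\1", v)

# Version of R that is currently running
current_r <- as.character(getRversion())

# Packages built under a different major.minor version of R
built_elsewhere <- pkgs[major_minor(pkgs$Built) != major_minor(current_r), ]
print(head(built_elsewhere))
```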
Install the new R version into a directory that
contains no spaces (see Step 2 in the Initial Set
Up section above)
[You only need to do this step if you installed packages in the
R-version-specific “Library” folder rather than the common/shared
“Packages” folder—that is, you don’t need to do this step if you used
the lab’s Rprofile.site file, as described above, which
installs packages to the common/shared “Packages” folder]:
Copy installed packages in the “Library” folder to the “Library”
folder in the new installation
R will run the file named Rprofile.site at
initial runtime.
Set the executables for R and RStudio to
always run with administrator permissions.
If on Windows, open File Explorer and find the main executable of
R (C:\R\R-VERSION\bin\R.exe) and
RStudio
(C:\Program Files\RStudio\bin\RStudio.exe). Right-click it
to open the contextual menu. Then, click or tap on “Properties”. In the
Properties window, go to the Compatibility tab. At the bottom of the
window, check the box next to the “Run this program as an administrator”
option, and then click or tap on Apply or OK.
Make sure you have the latest version of the tools necessary to
compile packages from source (i.e., Rtools for Windows or R
Compiler Tools for Rcpp on MacOS; see the instructions in the section on
initial set up)
Open the new R version and run
update.packages(checkBuilt = TRUE, ask = FALSE), and
install any necessary packages
Close R
Delete anything left of the old installation
7 Style Guide and Best Practices
7.1 Create RStudio Project
For each data analysis project (i.e., each GitLab/GitHub repo), create an
RStudio Project. This helps keep your project files organized.
7.2 Use R Notebooks for “Computational Notebooks”
Using R Notebooks for “Computational Notebooks” is
helpful for reproducible code that can be shared with others. To create
computational notebooks, see the Markdown section on computational notebooks
in the Data Analysis guides.
7.3 Separate sections in code
In R scripts, use sections.
To insert a section in RStudio, use
CTRL-Shift-R or “Code” - “Insert Section”
In R Notebooks/Markdown, use Headers and code chunks.
Headers: 1, 2, or 3 pound signs
Code Chunks: Ctrl+Alt+I; or click “Insert” button then
“R”
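As an illustration of sections in an R script, the sketch below uses section comments of the kind RStudio inserts (a comment line ending in four or more dashes). The section names and data are made up.

```r
# Load Data ----
scores <- c(12, 15, 9, 20)

# Compute Summaries ----
mean_score <- mean(scores)

# Report Results ----
print(mean_score)
```

In RStudio, these section comments show up in the document outline, which makes longer scripts easier to navigate.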
7.4 Naming variables
Use meaningful variable names; we want to know what a variable
represents without having to consult an external codebook for every
variable
Variable names should include the prefix for the measure followed by
an underscore
e.g., cbcl_ for the Child Behavior Checklist
variables
Use lower camel case for variable naming
e.g., prefix_thisIsTheVariableName
Do not include spaces in variable names
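The naming conventions above can be sketched as follows. The item names are invented for illustration and are not actual Child Behavior Checklist variable names.

```r
# Made-up example data with poorly named columns
df <- data.frame(
  V1 = c(1, 0, 2),
  `total score` = c(10, 4, 17),
  check.names = FALSE
)

# Rename using the measure prefix (cbcl_) plus lower camel case
names(df) <- c("cbcl_arguesALot", "cbcl_totalScore")

# Check that no variable name contains a space
stopifnot(!any(grepl(" ", names(df), fixed = TRUE)))
print(names(df))
```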
7.5 Comment code frequently and clearly!
It is important to comment code frequently and clearly. You (and others)
should be able to understand your code easily if you come back
to it several years later!
7.6 Don’t save your workspace image
For reproducibility purposes, it is important not
to save your workspace image (archived at https://perma.cc/9SCZ-L4DE). It is best practice to
begin each R session with a clean workspace. If there is a
.RData file in the same folder as the
RStudio Project, RStudio will automatically load the
objects into the workspace at the beginning of the session. This is
problematic because those objects can interact/interfere with the code
and can lead to problems with replicability for others who are running
the code without those objects in the workspace. When you exit
RStudio, RStudio asks if you want to “Save
workspace image to [filepath]/.RData?” Make sure to select
“Don’t Save”! However, do make sure to save your R scripts
before exiting RStudio.
To troubleshoot R issues, be resourceful. Try googling
the error message or issue you are experiencing. See here for a list of places you can pose
R-related questions for help. In addition, particular
errors/warnings/issues are included below:
13.1 General Troubleshooting Tips
Run code line-by-line to identify the source of the
error
If you are using RStudio, a red line will often appear on the
lefthand side of the screen to indicate the particular line(s) of code
that result in an error
It may be useful to clear your working environment in R before
trying to diagnose an error to be sure that you are not experiencing any
interference from existing objects in the environment
Running code line-by-line is particularly useful because it allows
you to inspect the data at each step of its processing
Data issues are commonly the culprit of code errors (see below)
Check for typos
Are all brackets/parentheses/etc. closed?
Does the object/variable that you are referring to exist? (If
the object does not exist, you will likely get an error message
indicating [object name] not found)
Has it been defined somewhere prior in the code?
Does it depend on something else to be created?
Is its name spelled correctly?
Highlighting the object or variable in question and using
CTRL-F to search for it is a good way to check these
things
Did you spell the function you want to use correctly?
Do you have commas separating items within a
list/function/etc.?
Does your working environment have what it needs?
Are all necessary packages/libraries installed? Do any require an
update?
Are all expected objects (dataframes, lists, environment variables,
etc) present? Are they correct?
Investigate the structure of your data
Errors such as non-numeric argument to binary operator
indicate that the function you are trying to use is expecting to receive
numeric data as an input but one or more of the inputs are not numeric.
The function class(data$variable) (replace “data” and
“variable” with whatever you are trying to investigate) is useful for
determining how R is interpreting your data. Often, data that look like
numbers can end up being stored as a character (text) variable. This can
be due to import/export processes or due to an issue with data entry.
For example, when 00 is entered instead of
0 in a numeric field, R is likely to interpret the variable
as a character instead of a number when importing the data.
Check to make sure that all expected rows/columns are present and
that they look the way you expect them to
If your dataset has been created by joining/merging multiple
dataframes, any duplicate columns not accounted for when joining may
have a suffix appended to distinguish them (i.e.,
variable.x or variable.y)
To resolve this, either call the appended variable in subsequent
manipulations (variable.x instead of variable)
or deal with the duplicated columns before/during the joining process
(e.g., include duplicate columns in the by argument of
joining function or remove the redundant column from one of the datasets
to be joined)
If you have been creating/computing variables in your dataset,
ensure that they are being computed as expected
Check for NA (missing) or NaN (not a
number) values. Such values may not be an issue or error in all cases,
but if you are not expecting that as a result of your computations, this
might suggest an issue with the code and/or data structure
Check whether the computed values are reasonable
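Several of the data checks above can be sketched in base R as follows. All data here are made up, and the character column is created directly rather than imported, so the example does not depend on how a particular import function handles 00.

```r
# A column that looks numeric but is stored as text
df <- data.frame(id = 1:3, score = c("5", "00", "7"), stringsAsFactors = FALSE)
score_class_before <- class(df$score)
print(score_class_before)

# Arithmetic on text fails; capture the error message instead of stopping
err <- tryCatch(df$score + 1, error = function(e) conditionMessage(e))
print(err)

# Convert to numeric once you have confirmed the values are valid numbers
df$score <- as.numeric(df$score)

# Merging data frames that share a non-key column appends .x/.y suffixes
visits <- data.frame(id = 1:3, age = c(4, 5, 6))
survey <- data.frame(id = 1:3, age = c(4, 5, 6), total = c(10, 12, 9))
merged_by_id <- merge(visits, survey, by = "id")
print(names(merged_by_id))

# Including the shared column in `by` avoids the duplicated columns
merged_by_both <- merge(visits, survey, by = c("id", "age"))
print(names(merged_by_both))

# Check computed values for NA and NaN (is.na() is TRUE for both)
ratio <- c(1, 0, NA) / c(2, 0, 4)
print(sum(is.na(ratio)))
print(sum(is.nan(ratio)))
```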
13.2 Warning: PACKAGENAME package in FILEPATH library will not be updated
This warning might indicate that it cannot update the package because
it is part of the base R installation. To fix this:
find and delete the package that is part of the base R
installation in the library directory of the R
installation (e.g., C:\R\R-4.3.1\library\PACKAGENAME\)
make sure the package is also not in any other package installation
locations (e.g., C:\R\Packages\PACKAGENAME\); if it is,
delete it from there as well
then, after deleting the package from these locations, restart
R and run install.packages("PACKAGENAME") to
reinstall the package
close R
if the package was part of the base R installation,
move the installed package folder
C:\R\Packages\PACKAGENAME\ to the library
directory of the R installation (e.g.,
C:\R\R-4.3.1\library\PACKAGENAME\)
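To find out which library locations hold a copy of a package before deleting anything, a sketch along these lines may help. It uses the stats package purely as a stand-in for PACKAGENAME.

```r
# All library locations R searches, in order
print(.libPaths())

# Folder from which the package would be loaded ("stats" is only an example)
pkg_dir <- find.package("stats")
print(pkg_dir)

# Every copy of the package across all library locations
all_copies <- file.path(.libPaths(), "stats")
all_copies <- all_copies[dir.exists(all_copies)]
print(all_copies)
```

If more than one path is printed at the end, the package is installed in several locations.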
Save the file as a .bat file in the desired
location
Once the .bat file has been created, search
Windows Task Scheduler in the search bar
In the Actions selection bar, select
Create Basic Task...
Name the task and provide a description
Next, set the trigger for the new task (i.e., how often the task
should run)
Set the action for the task by selecting
Start a program
Under Program/script, browse to the .bat
file that was created in step 1 and select Next
Click Finish and the script is now configured to run
automatically
If you wish to schedule a task to run during a time at which your
computer is likely to be asleep, you’ll need to change the
Conditions of the task to ensure that the computer “wakes
up” to execute the script
In the Task Scheduler Library, select the task you just created
The task should appear on the lower half of the screen (double click
to open in a new window). Select Conditions from the tabs
across the top of the task window
Select Wake the computer to run this task
Note that elevated account permissions may be required to set this
condition
Note: When R is updated, the path to the
bin folder within R needs to be updated to
reflect an accurate absolute path to R.
Example: C:\R\4.1.3\bin\R CMD BATCH changed to
C:\R\4.3.0\bin\R CMD BATCH
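After updating R, one way to look up the current bin folder is to ask R itself. The sketch below composes a command line of the kind shown in the example above; the script path is a placeholder, and the .bat file itself still has to be edited by hand.

```r
# Folder holding the R executable for the running installation
r_bin <- R.home("bin")

# Placeholder script to schedule; replace with your own file
script_file <- "C:/path/to/myScript.R"

# Command line that a .bat file could contain (quoted for the Windows shell)
batch_command <- paste(
  shQuote(file.path(r_bin, "R"), type = "cmd"),
  "CMD BATCH",
  shQuote(script_file, type = "cmd")
)
cat(batch_command, "\n")
```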
14.1 Troubleshooting
14.1.1 Pandoc error
This error may appear if you are attempting to render a markdown
file
pandoc version 1.12.3 or higher is required and was not found.
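To check whether R can find pandoc, a sketch like the following may be useful. It relies on the RSTUDIO_PANDOC environment variable, which RStudio sets for sessions it launches; a script run outside RStudio (e.g., from a scheduled task) may not have it, which is one possible reason for this error.

```r
# Path to a pandoc executable on the system PATH ("" if none is found)
pandoc_on_path <- unname(Sys.which("pandoc"))

# Folder of the pandoc bundled with RStudio, if this session knows about it
rstudio_pandoc <- Sys.getenv("RSTUDIO_PANDOC")

cat("pandoc on PATH:", if (nzchar(pandoc_on_path)) pandoc_on_path else "not found", "\n")
cat("RSTUDIO_PANDOC:", if (nzchar(rstudio_pandoc)) rstudio_pandoc else "not set", "\n")

# If the rmarkdown package is installed, ask it directly
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  print(rmarkdown::pandoc_available("1.12.3"))
}
```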
When using R to perform such actions as rendering sites and
processing data, it can be useful to include code that commits and
pushes relevant changes into the appropriate git
repository. The steps below assume that you have Git and
the GitHub desktop app installed. The instructions to
download and configure git software can be found here
Stage files with a specific file extension:
add(repo, "*.txt")
Stage files with one of any specified file extension:
add(repo, c("*.txt", "*.csv", "*.html"))
Stage files within a specific folder:
add(repo, "/folderName/*")
Stage one specific file:
add(repo, "fileName.txt")
Stage multiple specific files:
add(repo, c("thisFile.txt", "thatFile.txt", "anotherFile.txt"))
Commit the changes
Commit with a customized commit message:
commit(repo, "My commit message")
Commit with a generated message using system or environment
variables:
# example: include system date in commit message
date <- format(Sys.Date(), format = "%Y-%m-%d")
message <- paste(date, "daily site update", sep = " ")
commit(repo, message)
Push changes
push(repo)
Note: The code included above will prompt
Git to authenticate your credentials by manually providing
your username and password in a popup window. The push()
function accepts a
credentials = cred_user_pass("username", "password")
argument that will “bypass” the need for user input by including the
relevant credentials in the push() function call. One could
provide their username and password directly in the script running this
function, but other more-secure options include:
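One such option, sketched here as an assumption rather than as the lab's own method, is to keep the credentials in environment variables and read them at run time so they never appear in the script. The variable names GIT_USERNAME and GIT_PASSWORD are hypothetical.

```r
# Hypothetical environment variable names; set them outside the script
# (e.g., in a user-level .Renviron file), not in code that gets committed
git_user <- Sys.getenv("GIT_USERNAME", unset = NA)
git_pass <- Sys.getenv("GIT_PASSWORD", unset = NA)

credentials_ready <- !is.na(git_user) && !is.na(git_pass)

if (credentials_ready && requireNamespace("git2r", quietly = TRUE)) {
  cred <- git2r::cred_user_pass(git_user, git_pass)
  # push(repo, credentials = cred)  # `repo` as defined earlier in your script
} else {
  message("Git credentials (or the git2r package) not available; skipping push.")
}
```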
install.packages("excel.link")
library("excel.link")
passwordProtectedBook <- xl.read.file(
  file.path("full path to workbook"), # full path to the workbook
  password = "pass",                  # password to open the workbook
  write.res.password = "pass")        # second password, required for editing the workbook
17 Sending slacks with R
Occasionally, it can be helpful to send a Slack message using
R. For example, if a script does not run, a Slack message
can be sent to inform the appropriate team members. These
instructions (archived at https://perma.cc/9CWJ-J5ZT) can largely be followed to
set up R to send Slack messages. However, there are some
differences:
When setting up the configuration file, use the below template. The
slack API token should be placed in the token category.
Note the token will need to be updated every 30 days. You can
generate a new token by navigating to the Slack API and selecting
OAuth & Permissions
Once the configuration is complete, it is possible to send messages.
For now, we have found it helpful to embed the Slack messages in a
tryCatch() call.
tryCatch(
  {
    # CODE YOU WANT TO RUN
  },
  error = function(e) {
    # message to send if the code doesn't run
    my_message <- paste("example message")
    slackr_msg(my_message, channel = "#recruitment")
  }
)
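The same pattern can be tried without Slack access. In the sketch below, notify() is a stand-in for slackr_msg() and risky_task() is a made-up function that always fails; both names are introduced here only for illustration.

```r
# Stand-in for slackr_msg() so the example runs without Slack access
notify <- function(text) {
  message("[notification] ", text)
  invisible(text)
}

# Made-up task that always fails
risky_task <- function() {
  stop("export failed")
}

result <- tryCatch(
  {
    risky_task()
    "success"
  },
  error = function(e) {
    # Include the original error text so the recipient knows what went wrong
    notify(paste("Script did not run:", conditionMessage(e)))
    "failure"
  }
)
print(result)
```

Passing conditionMessage(e) into the notification makes the message more useful than a fixed string.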
Many notes fields in projects exported from REDCap contain line breaks
denoted as \n. Use the code below to replace them with commas so these
fields are more readable.
gsub('\\n', ', ', df$notesField)
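Whether a field contains real line breaks or the two literal characters backslash and n depends on how the data were exported, so it can be worth handling each case explicitly. The sketch below uses fixed = TRUE to match the text exactly rather than as a regular expression; the example strings are made up.

```r
# Made-up notes: one with a real line break, one with a literal backslash + n
notes_newline <- "called parent\nleft voicemail"
notes_literal <- "called parent\\nleft voicemail"

# fixed = TRUE matches the pattern as plain text, not as a regular expression
clean_newline <- gsub("\n", ", ", notes_newline, fixed = TRUE)
clean_literal <- gsub("\\n", ", ", notes_literal, fixed = TRUE)

print(clean_newline)
print(clean_literal)
```

Note that gsub() returns a new vector, so the result needs to be assigned back (e.g., to df$notesField) for the change to persist.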
19 Working with R on a Network Drive
When working with R on a network drive, it may be
helpful to configure the project to store .Rproj.user on
the local C:/ drive rather than on the network drive, because storing it
on the network drive results in slow execution times.
After the package source (.tar.gz file) is built, open
a terminal window directly in RStudio by clicking on the “Terminal” tab
at the bottom of RStudio
Run R CMD check --as-cran
In the terminal window, navigate to the directory where your package
source (.tar.gz file) is located (commonly up one
directory):
cd ..
Then, run R CMD check --as-cran followed by the name of
your package tarball (making sure to update the package version in the
filename). For example:
R CMD check --as-cran petersenlab_1.0.0.tar.gz
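To avoid typing the version into the filename by hand, the tarball name can be derived from the package's DESCRIPTION file. The sketch below writes a small stand-in DESCRIPTION file so that it is self-contained; in practice, desc_file would point at your package's own DESCRIPTION.

```r
# Write a small stand-in DESCRIPTION file so the example is self-contained
desc_file <- tempfile("DESCRIPTION")
writeLines(c("Package: petersenlab", "Version: 1.0.0"), desc_file)

# Read the package name and version
desc <- read.dcf(desc_file, fields = c("Package", "Version"))

# Build the tarball name and the corresponding check command
tarball <- sprintf("%s_%s.tar.gz", desc[1, "Package"], desc[1, "Version"])
check_command <- paste("R CMD check --as-cran", tarball)
cat(check_command, "\n")
```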
20.2.1 Troubleshooting
20.2.1.1 If errors compiling the PDF manual
In the terminal:
cd ./petersenlab
R CMD Rd2pdf . --output=man/figures/manual.pdf --force --no-preview --no-clean
Then, delete the created folder whose name begins with
“.Rd2pdf…”
In the R Console, build the source package in R (or
using the instructions described above):