Analysis scripts for High Performance Computing (HPC) at the University of Iowa. Currently uses Argon.
Workflow: https://workflow.uiowa.edu/entry/new/3282/11927336
Cluster Systems Documentation: https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513411/Cluster+Systems+Documentation (archived at https://perma.cc/EKB8-ZKR7)
Argon Cluster Documentation: https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513466/Argon+Cluster (archived at https://perma.cc/6HST-VV6Y)
Significant amounts of data storage are provided on Argon, but data are not backed up in any way unless special arrangements are made. It is the responsibility of the user to back up important information.
User agrees to refrain from storing Restricted data on HPC resources. Data is classified as Restricted when the unauthorized disclosure, alteration or destruction of that data could cause a significant level of risk to the University or its affiliates. Examples of Restricted data include data protected by state or federal privacy regulations and data pertaining to identified human subjects that has not been deidentified.
At present there are no fees for the use of the Argon cluster for low-priority usage. For large users, or those who want access to dedicated resources, the option of purchasing or renting supplemental system hardware may be available. If you are interested in dedicated hardware, contact research-computing@uiowa.edu.
/Shared/lss_itpetersen
This PC and select
Map Network DriveY and type
\\data.hpc.uiowa.edu\argon_home\Documentsargon.hpc.uiowa.edu
Windows Explorer:
\\data.hpc.uiowa.edu\argon_home (username: itpetersen@uiowa.edu or hawkid@uiowa.edu)
Using SecureCRT for an SSH connection: 1. Download SecureCRT from
UIowa Informational Technology Services: https://its.uiowa.edu/securecrt 2. In Hostname type
argon.hpc.uiowa.edu and in username type HawkID 3. Click
connect
Mac OS Terminal:
ssh itpetersen@argon.hpc.uiowa.edu (on campus)
ssh -p 40 itpetersen@argon.hpc.uiowa.edu (off campus
without VPN)
- `pwd` print current working directory
- `cd path/to/directory` sets working directory
- `module load moduleName` loads modules required for analyses; necessary to complete prior to submitting jobs involving R
- `module list` lists downloaded modules
- `qsub` submit a job script to Argon processing
- `R CMD BATCH` submit a single R script to Argon processing
Argon requires Linux-compatible file endings, which is problematic for files created outside Argon (using DOS or CRLF file endings). Use the following commands to check if your files are Argon compatible (i.e., Unix LF) and resolve incompatible file endings:
- `file myfile.job | grep CRLF` checks if a file ending uses the incompatible CRLF format. Remember to check the ending of your file; this script assumes it is .job
- `dos2unix myfile.job` converts CRLF file endings to Argon-compatible LF endings
- `find ~/DirectoryName -name '*job' -exec dos2unix "{}" \;` converts all files in a directory that end with .job to LF format; change '*job' to change other file types
cd /Users/itpetersen/Documents/Projects/Bayesian_IRT/
cd /Users/itpetersen/Documents/Projects/EXT_pilot/
cd /Users/itpetersen/Documents/Projects/Multiple_Imputation/
cd /Users/itpetersen/Documents/Projects/SelfRegulation_IRT/
cd /Users/itpetersen/Documents/Projects/Test/
R Packagesmodule load stack/2022.2
module load r/4.2.2_gcc-9.5.0
module load geos/3.9.1_gcc-9.5.0
module load gdal/2.4.4_gcc-9.5.0
module load proj/5.2.0_gcc-9.5.0
module load gsl/2.7.1_gcc-9.5.0
module load nlopt/2.7.0_gcc-9.5.0
module load jags/4.3.0_gcc-9.5.0-dev
module load zlib/1.2.13_gcc-9.5.0-dev
module load cmake/3.25.0_gcc-9.5.0
module load glpk/4.65_gcc-9.5.0-dev
module load libxml2/2.10.3_gcc-9.5.0
module load r-nloptr
module load r-zlibbioc/1.44.0_gcc-9.5.0
module load r-data-table/1.14.4_gcc-9.5.0
module load r-stringi/1.7.8_gcc-9.5.0
module load r-selectr/0.4-2_gcc-9.5.0
module load r-generics/0.1.3_gcc-9.5.0
module load r-fansi/1.0.3_gcc-9.5.0
module load r-utf8/1.2.2_gcc-9.5.0
module load r-pkgconfig/2.0.3_gcc-9.5.0
module load r-gtable/0.3.1_gcc-9.5.0
module load r-scales/1.2.1_gcc-9.5.0
module load r-tzdb/0.3.0_gcc-9.5.0
module load r-timechange/0.1.1_gcc-9.5.0
module load r-dbi/1.1.3_gcc-9.5.0
module load stack
module load stack/2022.2
module load r/4.2.2_gcc-9.5.0
cd path
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513440/Argon+Software+List (archived at https://perma.cc/WJ4Q-GDUS)
module load stack/2022.2
#!/bin/sh
# Set working directory
cd /Users/itpetersen/Documents/Projects/Bayesian_IRT/
# Specify qsub options
#$ -pe smp 4
#$ -M isaac-t-petersen@uiowa.edu
#$ -m eas
#$ -l mf=512G
#$ -l h_vmem=512G
#$ -cwd
#$ -q UI-HM
#$ -e /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/
#$ -o /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/
# Load the environment modules
module load stack/2022.2
module load r/4.2.2_gcc-9.5.0
# Run the R script
Rscript ./Analyses/factorScores.R
-pe smp 4: specify a parellel environment and number of
cores to be used (smp = shared memory parallel environment)
- The OMP_NUM_THREADS variable is set to ‘1’ by default. If
your code can take advantage of the threading then specify
OMP_NUM_THREADS to be equal to the number of job cores per
node requested.
-M isaac-t-petersen@uiowa.edu: Set the email address to
receive email about jobs. This must be your University of Iowa email
address.
-m eas: Specify when to send an email message (; ; ; ;
)
b = beginning of jobe = end of joba = when job is aborteds = when job is suspendedn = no mail is sent-l mf=512G: request a particular quantity of memory you
expect to use (to be available for your computation to start; the
request is only applicable at scheduling time. It is not a limit.)
-l h_vmem=512G: request a particular quantity of virtual
memory you expect to use (to be available for your computation to start;
the request is only applicable at scheduling time. It is not a
limit.)
-cwd: Determines whether the job will be executed from
the current working directory. If not specified, the job will be run
from your home directory.
-q UI-HM: specify queue
-e /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/:
Name of a file or directory for standard error.
-o /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/:
Name of a file or directory for standard output.
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513450/Basic+Job+Submission (archived at https://perma.cc/2SS2-LEJR)
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513452/Advanced+Job+Submission (archived at https://perma.cc/8H6G-2M2F)
cd path/to/dataSet123
[cd /Users/itpetersen/Documents/Projects/Test/Jobs]
qsub myscript.job
Job dependency (run Job B when Job A is
finished):
qsub -hold_jid JOB_ID test_B.job
cd /Users/itpetersen/Documents/Projects/Bayesian_IRT/Jobs
qsub bayesianIRT.job
qsub -hold_jid JOB_ID factorScores.job
cd /Users/itpetersen/Documents/Projects/Multiple_Imputation/Jobs
qsub srs_selfRegulation.R
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513468/Queues+and+Policies (archived at https://perma.cc/UUR7-XLBZ)
UI
UI-HM
UI-GPU-HM
UI-DEVELOP
all.q
qstat -g c -q QUEUE_NAME
qstat -g c -q UI
qstat -u itpetersen
qstat -j JOB_ID
qacct -j JOB_IDqstat -j JOB_ID | grep usageqdel JOB_ID
R
ScriptSee here
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76514707/R+Programs+in+Batch+mode+for+HPC (archived at https://perma.cc/99JQ-43ZG)
R are installedmodule spider R
But, there may be more recent version of R installed in
the “Additional Software Stacks” (https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513440/Argon+Software+List;
archived at https://perma.cc/WJ4Q-GDUS)
RIf you want to compile a more recent version of R than
is available in the software stacks, see here
(archived at https://perma.cc/C6EX-EZL4).
R packageshttps://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76514803/Adding+R+programs+to+a+personal+library (archived at https://perma.cc/3SRR-2JE7)
R environment module
module load r/4.2.2_gcc-9.5.0Rpackage_name):
install.packages("package_name", repos = "http://cran.r-project.org", dependencies = TRUE, type = "source", Ncpus = 40)packrat, it is preferable to install packages
by source, if possible, but you can remove type = source if
you want to install binariesWarning in install.packages("package_name", repos = "http://cran.r-project.org") :
lib = "/opt/R/3.0.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (y/n)yy again when prompted to create the directory;
your package should download and install into your personal library
directoryExample:
install.packages(c(
"renv","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags"),
repos = "http://cran.r-project.org",
dependencies = TRUE,
Ncpus = 40)
R package locally from sourceinstall.packages(path_to_file, repos = NULL, type = "source", Ncpus = 40)
Example:
install.packages(c(
"renv","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags"),
repos = "http://cran.r-project.org",
type = "source",
dependencies = TRUE,
Ncpus = 40)
renvcd to the above directoryrenv package:
library("renv")R prompt, initialize the renv
project on the local repository of R packages with:
renv::init(project = "/Users/itpetersen/Documents/Projects/SelfRegulation_IRT/")renv::init() if you are in the intended working
directoryR session in the given project
directory after running init in order for the changes to take
effect!renv project.
Installed packages will go into a library within this project. After
initializing the renv project on the local repository of
R packages, packages from the local repository can be
installed with renv::install():
renv::install("package_name")renv project, simply start R
from the directory created in step (1). The project will initialize
automatically.To update version of renv:
renv::upgrade()
To install packages: renv::install("package_name")
To update packages: renv::update()
To save the current state of your library:
renv::snapshot()
To restore the state of your library from the lock file:
renv::restore()
To disable renv on a project:
renv::deactivate()
renv using a
DESCRIPTION fileIf you want to control which packages are installed in a
renv project, you can use a DESCRIPTION file
to specify the packages that should be installed:
Create a DESCRIPTION file in the project directory
with the following format:
Type: project
Description: My project.
Depends:
packageName1,
packageName2,
packageName3Run renv::settings$snapshot.type("explicit") to
suppress dependency discover and to enable “explicit” mode: https://rstudio.github.io/renv/reference/dependencies.html#explicit-dependencies
Run renv::init(bare=TRUE) to initialize the project
without attempting to discover and install R package
dependencies
renv has been initializedR environment module (see
Install R packages section above)R projectRpackage_name):
install.packages("package_name", repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)install.packages(c("renv","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags","renv"), repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)renv::snapshot()update.packages(ask = FALSE)
R
Scriptmodule load r/4.0.5_gcc-9.3.0
cd path/to/dataSet123
[cd /Users/itpetersen/Documents/Projects/Test/Analyses]
Rscript path/to/program.R
[Rscript test.R]
q()
https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Linux (archived at https://perma.cc/89H9-L8S6)
Makevars:
\\data.hpc.uiowa.edu\argon_home\.R\Makevars
Sometimes Argon will fail to install packages in an R workspace. We advise the following steps:
Sometimes R provides an error stating something like “fatal error: modulename: No such file or directory”. In this instance, you may wish to exist R with q() and load all the modules that begin with the listed module name, as listed above in “Installing Linux Packages to Install R Packages”. Then reopen R and try to install the package again.
You can also try installing the package from binary using the RStudio Package Manager (RSPM):
remotes::install_github("cran4linux/rspm")
rspm::enable()
install.packages("PACKAGE_NAME")If that does not work, you can try downloading the .tar file directly from the CRAN repository. Then copy the file into your Argon folder and type ‘install.packages(“/Users/path/to/directory/package_name”, repos = NULL, type = “source”)’. Note that you may need to download an older version of the package (e.g., from the CRAN Archive), such that it is compatible with the version of R you are running on Argon (which it typically not the most recent R version):
install.packages("http://cran.r-project.org/src/contrib/Archive/MASS/MASS_7.3-60.0.1.tar.gz", repos = NULL, type = "source")Check if you are installing the necessary modules within your job script:
module load stack/2022.2
module load r/4.2.2_gcc-9.5.0Add a line to your R script to make sure it is accessing the same repository where you installed the packages manually:
.libPaths(c("/old_Users/HAWKID/Rlibs",
"/cvmfs/argon.hpc.uiowa.edu/2022.2/apps/linux-centos7-broadwell/gcc-9.5.0/r-4.2.2-lv7yirk/rlib/R/library"))If the prior steps fail, email research-computing@uiowa.edu with the error, asking for help to figure out how to install the packages.
You may have to set environment variables in each module file to help
the compiler find headers and libraries. Note that if you run into any
C++ code, you will need to set the CPLUS_INCLUDE_PATH
variable.
nloptrmodule load stack/2022.1
module load r/4.1.3_gcc-9.4.0
module load nlopt
LIBRARY_PATH so linker can find library while
launching R session (single line below):LIBRARY_PATH=$ROOT_NLOPT/lib64:$LIBRARY_PATH R
R console, install nloptr (two lines
below)install.packages(verbose = 1, "nloptr")
tkrplotmodule load stack/2022.1
module load r/4.1.3_gcc-9.4.0
R session (single line below):C_INCLUDE_PATH=$ROOT_XPROTO/include LIBRARY_PATH=$ROOT_LIBXEXT/lib:$ROOT_LIBXSCRNSAVER/lib R
R console, install tkrplot (two lines
below)install.packages(verbose = 1, "tkrplot")
Error in unserialize(node$con) : error reading from connection
Calls: parlmice ... FUN -> recvData -> recvData.SOCK0node -> unserialize
There likely wasn’t sufficient memory for a given core. Try increasing the max memory available and decreasing the number of cores and/or slots, so there is more memory available per core:
https://stackoverflow.com/questions/46186375/r-parallel-error-in-unserializenodecon-in-hpc (archived at https://perma.cc/MF6V-NAVS)
https://stackoverflow.com/questions/17015598/error-calling-serialize-r-function (archived at https://perma.cc/3Q75-DA2D)
https://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/#Memory_load (archived at https://perma.cc/2JRF-8Y5F)
module load stack/2022.1
module load r/4.1.3_gcc-9.4.0
cd path
module load stack/2021.1
module load r/4.0.5_gcc-9.3.0
cd path
module load stack/2020.2
module load r/4.0.2_gcc-8.4.0
cd path
module load stack/2020.1
module load r/3.6.2_gcc-9.2.0
cd path
packratPlease note: we now use renv rather
than packrat for package management
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76514803/Adding+R+programs+to+a+personal+library (archived at https://perma.cc/3SRR-2JE7)
cd to the above directorypackrat package:
library("packrat")R prompt, initialize the packrat
project on the local repository of R packages with:
packrat::init(project = "/Users/itpetersen/Documents/Projects/INSERT_PROJECT_NAME/", options = list(local.repos = "/Users/itpetersen/R/x86_64-pc-linux-gnu-library/4.0"))R session in the given project
directory after running init in order for the changes to take
effect!packrat
project. Installed packages will go into a library within this project.
After initializing the packrat project on the local
repository of R packages, packages from the local
repository can be installed with packrat::install_local():
packrat::install_local("package_name")R from the
directory created in step (1). The project will initialize
automatically.To save the current state of your library:
packrat::snapshot(); if that command fails due to an error
when fetching sources, try
packrat::snapshot(snapshot.sources = FALSE)
To disable packrat on a project:
disable(restart = FALSE)
packrat has been
initializedR environment module (see
Install R packages section above)R projectRpackage_name):
install.packages("package_name", repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)install.packages(c("packrat","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags","renv"), repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)packrat::snapshot(); if that command fails due to an
error when fetching sources, try
packrat::snapshot(snapshot.sources = FALSE)packrat environment on a compute node (only if necessary if
package load fails due to issues building packages)qlogin – for more info, see here: https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513454/Qlogin+for+Interactive+Sessions
(archived at https://perma.cc/Y3A9-WQ3W)module load stack/2020.2-base_arch – this will ensure
that the modules point to the lowest common multiarchitecture on Argon
and will run on all nodesmodule load rR projectRpackrat project (see above)packratexit – to exit the qlogin sessionNote: Your packrat environment will then be linked to
the proper glpk library and will run on any Argon node. You
do not need to use the 2020.2-base_arch module at run time,
only build time.