Analysis scripts for High Performance Computing (HPC) at the University of Iowa. Currently uses Argon.
Workflow: https://workflow.uiowa.edu/entry/new/3282/11927336
Cluster Systems Documentation: https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513411/Cluster+Systems+Documentation (archived at https://perma.cc/EKB8-ZKR7)
Argon Cluster Documentation: https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513466/Argon+Cluster (archived at https://perma.cc/6HST-VV6Y)
Significant amounts of data storage are provided on Argon, but data are not backed up in any way unless special arrangements are made. It is the responsibility of the user to back up important information.
User agrees to refrain from storing Restricted data on HPC resources. Data is classified as Restricted when the unauthorized disclosure, alteration or destruction of that data could cause a significant level of risk to the University or its affiliates. Examples of Restricted data include data protected by state or federal privacy regulations and data pertaining to identified human subjects that has not been deidentified.
At present there are no fees for the use of the Argon cluster for low-priority usage. For large users, or those who want access to dedicated resources, the option of purchasing or renting supplemental system hardware may be available. If you are interested in dedicated hardware, contact research-computing@uiowa.edu.
/Shared/lss_itpetersen
Map network drive (Windows): from `This PC`, select `Map Network Drive`, choose drive `Y`, and type `\\data.hpc.uiowa.edu\argon_home\Documents`
argon.hpc.uiowa.edu
Windows Explorer: `\\data.hpc.uiowa.edu\argon_home` (username: itpetersen@uiowa.edu or hawkid@uiowa.edu)
Using SecureCRT for an SSH connection:
1. Download SecureCRT from UIowa Information Technology Services: https://its.uiowa.edu/securecrt
2. In Hostname, type `argon.hpc.uiowa.edu`, and in Username, type your HawkID
3. Click `Connect`
Mac OS Terminal:
- `ssh itpetersen@argon.hpc.uiowa.edu` (on campus)
- `ssh -p 40 itpetersen@argon.hpc.uiowa.edu` (off campus without VPN)
- `pwd` print current working directory
- `cd path/to/directory` sets working directory
- `module load moduleName` loads modules required for analyses; necessary to complete prior to submitting jobs involving R
- `module list` lists the modules that are currently loaded
- `qsub` submit a job script to Argon processing
- `R CMD BATCH` submit a single R script to Argon processing
Argon requires Unix-style (LF) line endings, which is a problem for files created outside Argon with DOS/Windows (CRLF) line endings. Use the following commands to check whether your files are Argon-compatible (i.e., Unix LF) and to fix incompatible line endings:
- `file myfile.job | grep CRLF` checks whether a file uses the incompatible CRLF line endings (no output means no CRLF was detected). Substitute your own file name and extension; this example assumes `.job`
- `dos2unix myfile.job` converts CRLF line endings to Argon-compatible LF line endings
- `find ~/DirectoryName -name '*job' -exec dos2unix "{}" \;` converts all files under a directory (including subdirectories) whose names end with `job` to LF format; change `'*job'` to match other file types
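Where `file` or `dos2unix` is not available, the same check and conversion can be approximated with standard tools. The following is a minimal sketch, not the documented Argon procedure: the demo file is a hypothetical temporary file, and `tr -d '\r'` removes every carriage return in the file, which is assumed to be acceptable for plain-text job scripts.

```shell
# Hypothetical demo file standing in for a job script saved with CRLF endings.
demo_file=$(mktemp)
printf 'echo one\r\necho two\r\n' > "$demo_file"

# A literal carriage-return character to search for.
cr=$(printf '\r')

# Count the lines that contain a carriage return before conversion.
crlf_before=$(grep -c "$cr" "$demo_file" || true)

# Strip carriage returns (assumption: no CR is legitimately needed in the file).
tr -d '\r' < "$demo_file" > "$demo_file.lf" && mv "$demo_file.lf" "$demo_file"

# Count again after conversion.
crlf_after=$(grep -c "$cr" "$demo_file" || true)

echo "lines with CR before: $crlf_before, after: $crlf_after"
rm -f "$demo_file"
```

A nonzero "before" count means the file would need conversion before being passed to `qsub`.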
cd /Users/itpetersen/Documents/Projects/Bayesian_IRT/
cd /Users/itpetersen/Documents/Projects/EXT_pilot/
cd /Users/itpetersen/Documents/Projects/Multiple_Imputation/
cd /Users/itpetersen/Documents/Projects/SelfRegulation_IRT/
cd /Users/itpetersen/Documents/Projects/Test/
Installing Linux Packages to Install R Packages
module load stack/2022.2
module load r/4.2.2_gcc-9.5.0
module load geos/3.9.1_gcc-9.5.0
module load gdal/2.4.4_gcc-9.5.0
module load proj/5.2.0_gcc-9.5.0
module load gsl/2.7.1_gcc-9.5.0
module load nlopt/2.7.0_gcc-9.5.0
module load jags/4.3.0_gcc-9.5.0-dev
module load zlib/1.2.13_gcc-9.5.0-dev
module load cmake/3.25.0_gcc-9.5.0
module load glpk/4.65_gcc-9.5.0-dev
module load libxml2/2.10.3_gcc-9.5.0
module load r-nloptr
module load r-zlibbioc/1.44.0_gcc-9.5.0
module load r-data-table/1.14.4_gcc-9.5.0
module load r-stringi/1.7.8_gcc-9.5.0
module load r-selectr/0.4-2_gcc-9.5.0
module load r-generics/0.1.3_gcc-9.5.0
module load r-fansi/1.0.3_gcc-9.5.0
module load r-utf8/1.2.2_gcc-9.5.0
module load r-pkgconfig/2.0.3_gcc-9.5.0
module load r-gtable/0.3.1_gcc-9.5.0
module load r-scales/1.2.1_gcc-9.5.0
module load r-tzdb/0.3.0_gcc-9.5.0
module load r-timechange/0.1.1_gcc-9.5.0
module load r-dbi/1.1.3_gcc-9.5.0
module load stack
module load stack/2022.2
module load r/4.2.2_gcc-9.5.0
cd path
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513440/Argon+Software+List (archived at https://perma.cc/WJ4Q-GDUS)
module load stack/2022.2
#!/bin/sh
# Set working directory
cd /Users/itpetersen/Documents/Projects/Bayesian_IRT/
# Specify qsub options
#$ -pe smp 4
#$ -M isaac-t-petersen@uiowa.edu
#$ -m eas
#$ -l mf=512G
#$ -l h_vmem=512G
#$ -cwd
#$ -q UI-HM
#$ -e /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/
#$ -o /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/
# Load the environment modules
module load stack/2022.2
module load r/4.2.2_gcc-9.5.0
# Run the R script
Rscript ./Analyses/factorScores.R
- `-pe smp 4`: specify a parallel environment and the number of cores to be used (`smp` = shared memory parallel environment)
  - The `OMP_NUM_THREADS` variable is set to '1' by default. If your code can take advantage of threading, set `OMP_NUM_THREADS` equal to the number of job cores per node requested.
- `-M isaac-t-petersen@uiowa.edu`: set the email address to receive email about jobs. This must be your University of Iowa email address.
- `-m eas`: specify when to send an email message
  - `b` = beginning of job
  - `e` = end of job
  - `a` = when job is aborted
  - `s` = when job is suspended
  - `n` = no mail is sent
- `-l mf=512G`: request a particular quantity of memory you expect to use (to be available for your computation to start). The request applies only at scheduling time; it is not a limit.
- `-l h_vmem=512G`: request a particular quantity of virtual memory. Unlike `mf`, `h_vmem` is generally enforced by Grid Engine as a hard limit, so a job that exceeds it is typically terminated.
- `-cwd`: execute the job from the current working directory. If not specified, the job will be run from your home directory.
- `-q UI-HM`: specify the queue
- `-e /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/`: name of a file or directory for standard error
- `-o /Users/itpetersen/Documents/Projects/Bayesian_IRT/Output/`: name of a file or directory for standard output
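Inside a running job, Grid Engine exports the number of granted slots as `NSLOTS`, so the `OMP_NUM_THREADS` advice above can be applied without hard-coding the core count. The following is a minimal sketch, assuming the job was submitted with `-pe smp N`; the helper name `set_omp_threads` is hypothetical, and the fallback of 1 thread mirrors the default described above.

```shell
# Hypothetical helper: match OpenMP threads to the slots granted by -pe smp.
# NSLOTS is set by Grid Engine inside a job; outside a job it is unset,
# so fall back to 1 thread (the default noted above).
set_omp_threads() {
  OMP_NUM_THREADS="${NSLOTS:-1}"
  export OMP_NUM_THREADS
}

set_omp_threads
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

In a job script, the call would go after the `module load` lines and before `Rscript`.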
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513450/Basic+Job+Submission (archived at https://perma.cc/2SS2-LEJR)
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513452/Advanced+Job+Submission (archived at https://perma.cc/8H6G-2M2F)
cd path/to/dataSet123
[cd /Users/itpetersen/Documents/Projects/Test/Jobs]
qsub myscript.job
Job dependency (run Job B when Job A is finished):
qsub -hold_jid JOB_ID test_B.job
cd /Users/itpetersen/Documents/Projects/Bayesian_IRT/Jobs
qsub bayesianIRT.job
qsub -hold_jid JOB_ID factorScores.job
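The `JOB_ID` placeholder has to be copied from the message `qsub` prints on submission. The following is a minimal sketch of capturing it automatically, assuming the usual Grid Engine confirmation line of the form `Your job 1234567 ("name") has been submitted`; the helper name `parse_job_id` and the sample message are hypothetical, and array-job messages are not handled.

```shell
# Hypothetical helper: read qsub's confirmation message on stdin and
# print the numeric job ID (assumes "Your job <id> (...) has been submitted").
parse_job_id() {
  sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Demonstration with a sample message instead of a real submission.
sample_msg='Your job 1234567 ("bayesianIRT.job") has been submitted'
job_id=$(printf '%s\n' "$sample_msg" | parse_job_id)
echo "parsed job ID: $job_id"

# On Argon, the same helper could chain the two jobs from the text:
#   job_id=$(qsub bayesianIRT.job | parse_job_id)
#   qsub -hold_jid "$job_id" factorScores.job
```

Where supported, `qsub -terse` prints only the job ID, which may make the parsing step unnecessary.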
cd /Users/itpetersen/Documents/Projects/Multiple_Imputation/Jobs
qsub srs_selfRegulation.R
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513468/Queues+and+Policies (archived at https://perma.cc/UUR7-XLBZ)
UI
UI-HM
UI-GPU-HM
UI-DEVELOP
all.q
qstat -g c -q QUEUE_NAME
qstat -g c -q UI
qstat -u itpetersen
qstat -j JOB_ID
qacct -j JOB_ID
qstat -j JOB_ID | grep usage
qdel JOB_ID
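The `qstat -u` listing can be summarized by job state when many jobs are queued. The following is a minimal sketch, assuming the default Grid Engine layout (two header lines, then one row per job with the state in the fifth column); the helper name `count_job_states` and the sample listing are hypothetical.

```shell
# Hypothetical helper: read a qstat listing on stdin and print "state count"
# pairs, skipping the two header lines (assumes state is the fifth column).
count_job_states() {
  awk 'NR > 2 && NF >= 5 { count[$5]++ } END { for (s in count) print s, count[s] }' | sort
}

# Made-up sample listing used only to demonstrate the helper.
sample_listing='job-ID  prior   name       user         state submit/start at     queue      slots
--------------------------------------------------------------------------------------
1234567 0.50000 bayesianIR itpetersen   r     01/15/2024 09:00:00 UI-HM@node 4
1234568 0.00000 factorScor itpetersen   hqw   01/15/2024 09:00:05            4
1234569 0.50000 srs_selfRe itpetersen   r     01/15/2024 09:01:00 UI@node    1'

state_summary=$(printf '%s\n' "$sample_listing" | count_job_states)
echo "$state_summary"
```

On Argon the helper would be fed directly, e.g. `qstat -u itpetersen | count_job_states`.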
R Script
See here: https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76514707/R+Programs+in+Batch+mode+for+HPC (archived at https://perma.cc/99JQ-43ZG)
Check which versions of R are installed:
module spider R
However, there may be a more recent version of R installed in the “Additional Software Stacks” (https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513440/Argon+Software+List; archived at https://perma.cc/WJ4Q-GDUS).
R
If you want to compile a more recent version of R than is available in the software stacks, see here (archived at https://perma.cc/C6EX-EZL4).
Install R packages
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76514803/Adding+R+programs+to+a+personal+library (archived at https://perma.cc/3SRR-2JE7)
1. Load the R environment module:
module load r/4.2.2_gcc-9.5.0
2. Start R:
R
3. Install the package (`package_name`):
install.packages("package_name", repos = "http://cran.r-project.org", dependencies = TRUE, type = "source", Ncpus = 40)
With `packrat`, it is preferable to install packages from source, if possible, but you can remove `type = "source"` if you want to install binaries.
4. If the system library is not writable, you will see a warning like the following:
Warning in install.packages("package_name", repos = "http://cran.r-project.org") :
'lib = "/opt/R/3.0.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (y/n)
5. Type `y`
6. Type `y` again when prompted to create the directory; your package should download and install into your personal library directory
Example:
install.packages(c(
"renv","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags"),
repos = "http://cran.r-project.org",
dependencies = TRUE,
Ncpus = 40)
Install an R package locally from source:
install.packages(path_to_file, repos = NULL, type = "source", Ncpus = 40)
Example:
install.packages(c(
"renv","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags"),
repos = "http://cran.r-project.org",
type = "source",
dependencies = TRUE,
Ncpus = 40)
`renv`
1. `cd` to the project directory
2. Load the `renv` package:
library("renv")
3. At the R prompt, initialize the `renv` project on the local repository of R packages with:
renv::init(project = "/Users/itpetersen/Documents/Projects/SelfRegulation_IRT/")
or `renv::init()` if you are in the intended working directory
4. Restart your R session in the given project directory after running init in order for the changes to take effect!
5. Work within the `renv` project. Installed packages will go into a library within this project. After initializing the `renv` project on the local repository of R packages, packages from the local repository can be installed with `renv::install()`:
renv::install("package_name")
6. To return to the `renv` project later, simply start R from the project directory. The project will initialize automatically.
To update the version of `renv`:
renv::upgrade()
To install packages: renv::install("package_name")
To update packages: renv::update()
To save the current state of your library:
renv::snapshot()
To restore the state of your library from the lock file:
renv::restore()
To disable `renv` on a project:
renv::deactivate()
`renv` using a `DESCRIPTION` file
If you want to control which packages are installed in a `renv` project, you can use a `DESCRIPTION` file to specify the packages that should be installed:
Create a `DESCRIPTION` file in the project directory with the following format:
Type: project
Description: My project.
Depends:
packageName1,
packageName2,
packageName3
Run `renv::settings$snapshot.type("explicit")` to suppress dependency discovery and to enable “explicit” mode: https://rstudio.github.io/renv/reference/dependencies.html#explicit-dependencies
Run `renv::init(bare=TRUE)` to initialize the project without attempting to discover and install R package dependencies
After `renv` has been initialized
1. Load the R environment module (see the Install R packages section above)
2. Go to the R project directory
3. Start R
4. Install the package (`package_name`):
install.packages("package_name", repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)
install.packages(c("renv","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags","renv"), repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)
renv::snapshot()
update.packages(ask = FALSE)
Run an R Script
module load r/4.0.5_gcc-9.3.0
cd path/to/dataSet123
[cd /Users/itpetersen/Documents/Projects/Test/Analyses]
Rscript path/to/program.R
[Rscript test.R]
q()
https://github.com/stan-dev/rstan/wiki/Installing-RStan-on-Linux (archived at https://perma.cc/89H9-L8S6)
Makevars:
\\data.hpc.uiowa.edu\argon_home\.R\Makevars
Sometimes Argon will fail to install packages in an R workspace. We advise the following steps:
Sometimes R reports an error such as “fatal error: modulename: No such file or directory”. In this case, exit R with `q()` and load all the modules that begin with the listed module name, as listed above in “Installing Linux Packages to Install R Packages”. Then reopen R and try to install the package again.
You can also try installing the package from binary using the RStudio Package Manager (RSPM):
remotes::install_github("cran4linux/rspm")
rspm::enable()
install.packages("PACKAGE_NAME")
If that does not work, you can try downloading the .tar.gz file directly from the CRAN repository. Then copy the file into your Argon folder and run `install.packages("/Users/path/to/directory/package_name", repos = NULL, type = "source")`. Note that you may need to download an older version of the package (e.g., from the CRAN Archive) that is compatible with the version of R you are running on Argon (which is typically not the most recent R version):
install.packages("http://cran.r-project.org/src/contrib/Archive/MASS/MASS_7.3-60.0.1.tar.gz", repos = NULL, type = "source")
If the steps above fail, email research-computing@uiowa.edu with the error message and ask for help installing the packages.
You may have to set environment variables in each module file to help the compiler find headers and libraries. Note that if you run into any C++ code, you will need to set the `CPLUS_INCLUDE_PATH` variable.
nloptr
module load stack/2022.1
module load r/4.1.3_gcc-9.4.0
module load nlopt
Set `LIBRARY_PATH` so the linker can find the library while launching the R session (single line below):
LIBRARY_PATH=$ROOT_NLOPT/lib64:$LIBRARY_PATH R
In the R console, install `nloptr`:
install.packages(verbose = 1, "nloptr")
tkrplot
module load stack/2022.1
module load r/4.1.3_gcc-9.4.0
Set the include and library paths while launching the R session (single line below):
C_INCLUDE_PATH=$ROOT_XPROTO/include LIBRARY_PATH=$ROOT_LIBXEXT/lib:$ROOT_LIBXSCRNSAVER/lib R
In the R console, install `tkrplot`:
install.packages(verbose = 1, "tkrplot")
Error in unserialize(node$con) : error reading from connection
Calls: parlmice ... FUN -> recvData -> recvData.SOCK0node -> unserialize
There likely wasn’t sufficient memory for a given core. Try increasing the max memory available and decreasing the number of cores and/or slots, so there is more memory available per core:
https://stackoverflow.com/questions/46186375/r-parallel-error-in-unserializenodecon-in-hpc (archived at https://perma.cc/MF6V-NAVS)
https://stackoverflow.com/questions/17015598/error-calling-serialize-r-function (archived at https://perma.cc/3Q75-DA2D)
https://gforge.se/2015/02/how-to-go-parallel-in-r-basics-tips/#Memory_load (archived at https://perma.cc/2JRF-8Y5F)
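The trade-off described above is simple arithmetic: the memory available per core is the total request divided by the number of cores. The following is a minimal sketch, assuming the memory request is a job-wide total split evenly across cores (whether Argon applies a request per job or per slot is not stated here); the helper name `mem_per_core_gb` is hypothetical, and the figures come from the example job script (512G, 4 cores).

```shell
# Hypothetical helper: whole gigabytes of memory available per core,
# assuming the total is split evenly across the requested cores.
mem_per_core_gb() {
  total_gb=$1
  cores=$2
  echo $(( total_gb / cores ))
}

per_core_4=$(mem_per_core_gb 512 4)
per_core_2=$(mem_per_core_gb 512 2)
echo "4 cores: ${per_core_4}G per core; 2 cores: ${per_core_2}G per core"
```

Halving the core count doubles the memory each worker can use, which is the adjustment suggested for the `unserialize` error.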
module load stack/2022.1
module load r/4.1.3_gcc-9.4.0
cd path
module load stack/2021.1
module load r/4.0.5_gcc-9.3.0
cd path
module load stack/2020.2
module load r/4.0.2_gcc-8.4.0
cd path
module load stack/2020.1
module load r/3.6.2_gcc-9.2.0
cd path
`packrat`
Please note: we now use `renv` rather than `packrat` for package management
https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76514803/Adding+R+programs+to+a+personal+library (archived at https://perma.cc/3SRR-2JE7)
1. `cd` to the project directory
2. Load the `packrat` package:
library("packrat")
3. At the R prompt, initialize the `packrat` project on the local repository of R packages with:
packrat::init(project = "/Users/itpetersen/Documents/Projects/INSERT_PROJECT_NAME/", options = list(local.repos = "/Users/itpetersen/R/x86_64-pc-linux-gnu-library/4.0"))
4. Restart your R session in the given project directory after running init in order for the changes to take effect!
5. Work within the `packrat` project. Installed packages will go into a library within this project. After initializing the `packrat` project on the local repository of R packages, packages from the local repository can be installed with `packrat::install_local()`:
packrat::install_local("package_name")
6. To return to the project later, simply start R from the project directory. The project will initialize automatically.
To save the current state of your library:
packrat::snapshot()
If that command fails due to an error when fetching sources, try:
packrat::snapshot(snapshot.sources = FALSE)
To disable `packrat` on a project:
packrat::disable(restart = FALSE)
After `packrat` has been initialized
1. Load the R environment module (see the Install R packages section above)
2. Go to the R project directory
3. Start R
4. Install the package (`package_name`):
install.packages("package_name", repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)
install.packages(c("packrat","psych","tidyverse","data.table","nlme","lme4","mirt","TeachingDemos","Amelia","mice","miceadds","abind","future","lavaan","blavaan","Rcpp","igraph","shinystan","StanHeaders","brms","rstan","rjags","renv"), repos = "http://cran.r-project.org", type = "source", dependencies = TRUE, Ncpus = 40)
packrat::snapshot()
If that command fails due to an error when fetching sources, try:
packrat::snapshot(snapshot.sources = FALSE)
Building the `packrat` environment on a compute node (only necessary if package load fails due to issues building packages)
1. `qlogin`: for more info, see here: https://uiowa.atlassian.net/wiki/spaces/hpcdocs/pages/76513454/Qlogin+for+Interactive+Sessions (archived at https://perma.cc/Y3A9-WQ3W)
2. `module load stack/2020.2-base_arch`: this will ensure that the modules point to the lowest common multiarchitecture on Argon and will run on all nodes
3. `module load r`
4. Go to the R project directory
5. Start R
6. Initialize the `packrat` project and install packages with `packrat` (see above)
7. `exit`: to exit the qlogin session
Note: Your `packrat` environment will then be linked to the proper `glpk` library and will run on any Argon node. You do not need to use the `2020.2-base_arch` module at run time, only build time.