Useful links and examples of batch jobs for the commonly used software on various computers.
See the migration table below; note that Cedar and Plato have the same setup except for the allocation. The computecanada domain name has changed to alliancecan, so please use the alliancecan address when e-mailing support.
Links:
Compute Canada (Alliance) login: https://ccdb.computecanada.ca/security/login
Multifactor authentication (MFA): https://docs.alliancecan.ca/wiki/Multifactor_authentication
Main storage for the acl-jerzy group is on Cedar; access it by issuing, from your home directory:
cd /project/rrg-jerzy-ab/jerzygroup
(It is recommended to create a link: ln -s /project/rrg-jerzy-ab/jerzygroup jerzygroup, or
ln -s /project/6004094/jerzygroup/ jerzygroup if the above does not work.)
Information on permissions is in the file /jerzygroup/backup_records/permisionsetup
and under ACL on: https://docs.alliancecan.ca/wiki/Sharing_data
Use the quota command to check used space and number of files.
Do not store the wfc files produced by QE. Either delete them (rm *.wfc*) or exclude them when backing up, for example:
rsync -rv --exclude '*.wfc*' * szpunarb@cedar.computecanada.ca:/home/szpunarb/dirname
Rarely used directories can be archived, but please add a README file describing their content.
To "zip" a directory, the command is:
tar -zcvf directory_name.tar.gz directory_name/
This tells tar to compress with gzip (z), create an archive (c), list the files verbosely (v), and write to the named file (f).
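To list or restore such an archive later (standard tar usage, shown here for convenience):
tar -ztvf directory_name.tar.gz   # list the contents without extracting
tar -zxvf directory_name.tar.gz   # extract into directory_name/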
The RAC allocation is limited, and the priority of the default allocation may become higher than that of the RAC. If your jobs do not start on the allocation you are currently using, check your usage level for all accessible allocations and compare them with the following commands (here the default and RAC allocations are compared):

sshare -l -A def-szpunarb_cpu -a --format=Account,User,EffectvUsage,LevelFS
Account              User   EffectvUsage   LevelFS
def-szpunarb_cpu            0.000239       4.174797

sshare -l -A rrg-szpunarb-ad_cpu -a --format=Account,User,EffectvUsage,LevelFS
Account              User   EffectvUsage   LevelFS
rrg-szpunarb-ad_cpu         0.000063       1.508949

As you can see, the LevelFS for the default allocation is 4.174797 while it is only 1.508949 for the rrg-* accounting group, because the rrg-* allocation has been used more heavily.
See more (Ali):
https://docs.alliancecan.ca/wiki/Frequently_Asked_Questions#Why_are_my_jobs_taking_so_long_to_start.3F
Resources:
Compute Canada / Alliance technical documentation: https://docs.alliancecan.ca/wiki/Technical_documentation
Good training: https://computecanada.github.io/2019-01-21-sfu/
WestGrid web site including training materials: https://westgrid.ca
Getting started videos: http://bit.ly/2sxGO33
Compute Canada YouTube channel: http://bit.ly/2ws0JDC
JupyterHub: https://docs.alliancecan.ca/wiki/JupyterHub
py4vasp: To export the data as a CSV file, change .plot() to .to_csv(). To create a downloadable image, change .plot() to .to_image().
Display of atomistic structures and more: VESTA, OVITO.
For the most recent version of a code, modify your script as indicated on the Alliance cluster by the command: module spider vasp (lammps, ...)
Batch-job examples for the Alliance clusters Cedar, Graham, Narval (64 cores/node) and Beluga (40 cores/node); migration target: Rorqual (https://docs.alliancecan.ca/wiki/Rorqual/en, rorqual.alliancecan.ca, 64 cores/node).

Software | Example batch jobs / notes
QE |
QE run array with %1 only | sbatch --array=1-108%1 jobname
QE (pw, ph), verified 2020 | jobcomplete_cedar_intel_QE6.2.2; jobcomplete_cedar_gcc_QE6.2.2 (not working); for primitive unit cells
EPW |
ShengBTE | Beluga_BTE complete_current.job, tested with QE6.2.2
almaBTE |
GULP |
WIEN2K |
Phono3py, Phonopy (phonons), Phonopy-QHA | Phono3pyinst_StdEnv/2020_example
VASP (licensed) # | VASP (Jaya), VASP new for group, VASP 6.5 (Dhanish)
VASP elastic |
VASP LAMMPS ML (Ata) |
BoltzTraP (v2b.py, vasp2boltz.py), BoltzTraP2 |
Priority/share | [jaya@cedar5 ~]$ sshare -U
Notes |
*Note that this web page is managed on a PC running Windows; therefore, after downloading the above batch_job examples to a Unix system, convert any DOS-format file to Unix format, for example with: dos2unix batch_job
Link to VASP tutorials: https://www.vasp.at/tutorials/latest/ (geometry, DOS, BANDS with py4vasp; geometry, DOS, BANDS with vaspkit).
Example of a batch job for geometry optimization: vasp_loop_job.sh; and for spin-polarized calculations: Ni ferromagnetic case, NiO, Dudarev Hubbard U (a minimal batch-script sketch is given below).
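A minimal sketch of such a VASP batch job; the account name, module versions, core count and walltime are examples only and should be adjusted to your allocation and cluster (the linked files above remain the group's reference versions):

#!/bin/bash
#SBATCH --account=def-szpunarb        # or the rrg-* RAC account
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48          # e.g. a whole Cedar node
#SBATCH --mem=0                       # all memory of the node (see the & note below)
#SBATCH --time=24:00:00

module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5 vasp/6.4.2
srun vasp_std > vasp.out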
Summary of how to run VASP on the cluster using the Jupyter interface: Aliancejupyterlab_VASP.pdf
# Installation of VASP by Ata: To install the most recent version from source use: eb VASP-6.5.0-iimpi-2023a.eb --sourcepath=/home/szpunarb/nearline/def-szpunarb/VASPsource
* The name of the executable may differ from one version to another.
$ To access the potentials for a given version do, e.g.:
module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5 vasp/6.4.2
cd $EBROOTVASP/pseudopotentials/
module load StdEnv/2023 intel/2023.2.1 intelmpi/2021.9.0 vasp/6.4.3 (Barbara's access, to copy the most recent pseudopotentials)
cd $EBROOTVASP/pseudopotentials/
source /project/6001430/vasp-6.4.3/bin/vasp_std (group access, but no potentials)
For the group to get the most recent potentials, ask the code holder (access from the portal).
^BoltzTraP2 (original guide): you can install it yourself using the attached instructions from Ata: BoltzTraP2instalation.txt
$ Phono3py https://iopscience.iop.org/article/10.1088/1361-648X/acd831
@Elastic constants in VASP can be calculated by setting IBRION=6 and ISIF=3, which gives the stress tensor for whatever crystal you are using, as recommended here.
Ata installed 'Elastic' by Paweł T. Jochym (covers non-cubic cells) as a wheel on the clusters; the instructions are in the link copied here: Running_elastic
% Ata upgraded the EasyBuild (eb) installation of CASTEP to the latest version:
module --force purge
module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5
eb CASTEP-24.1-iofb-2023a.eb --sourcepath=/home/szpunarb/nearline/def-szpunarb/CASTEP/Download
ML VASP-LAMMPS (from Asmabi, VASP Forum): Since VASP 6.5 it is possible to use the VASP machine-learned potentials in LAMMPS. There is a comprehensive guide on how to interface the two codes, and another one on how to create the potentials in VASP.
+ LAMMPS Zr (GAPML potential) https://gitlab.com/yluo13/eam-tabgap/-/tree/main/tabGAPs?ref_type=heads,
(Jesper Byggmästar et al.) ref. paper: https://doi.org/10.1016/j.commatsci.2023.112730
&If you take a whole node, you can request all of its memory with --mem=0 (see the examples prepared by Jaya for Graham); a sketch is shown below.
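A minimal sketch of the relevant SLURM directives; the node and core counts are examples for a 40-core Beluga node and should be adjusted to the cluster you use:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40   # all cores of the node
#SBATCH --mem=0                # request all available memory on the node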
#To check the current QE versions do:
module -r spider 'quantumespresso*'
Versions (expanded info on the current ones):
quantumespresso/6.8
quantumespresso/7.0
quantumespresso/7.1
quantumespresso/7.2
quantumespresso/7.3.1
module spider quantumespresso/7.3.1
To set up the environment you will need to load all module(s) on any one of the lines below before the "quantumespresso/7.3.1" module is available to load:
StdEnv/2023 gcc/12.3 openmpi/4.1.5
StdEnv/2023 intel/2023.2.1 openmpi/4.1.5
To use the GCC build (the one for which EPW works OK) do: module load gcc/5.4.0; a more recent Intel build on Beluga works too, see the notes above.
module load quantumespresso/6.3 (version 6.4.1 lists resistivity).
To use the more common Intel build: module load quantumespresso/6.3 (version 6.4 works OK on Beluga with the new Intel build on one node).
For detailed information about a specific "quantumespresso" module (including how to load the modules) use the module's full name, for example:
$ module spider quantumespresso/7.3.1
Then use
$ module load quantumespresso/7.3.1
to set up the runtime environment as shown above.
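A minimal sketch of a pw.x batch job using these modules; the account, resources and input file name are placeholders to be adjusted to your own allocation and calculation:

#!/bin/bash
#SBATCH --account=def-szpunarb       # or the rrg-* RAC account
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40         # e.g. a Beluga node
#SBATCH --time=12:00:00

module load StdEnv/2023 intel/2023.2.1 openmpi/4.1.5 quantumespresso/7.3.1
srun pw.x < scf.in > scf.out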
Example of finding the path of an executable:
which thirdorder_espresso.py
/cvmfs/soft.computecanada.ca/easybuild/software/2017/avx/MPI/intel2016.4/openmpi2.1/shengbte/1.1.1/bin/thirdorder_espresso.py
Fix for an error in diagonalization (Olivier):
1) Make Davidson (the default) use serial diagonalization, as checked on Beluga, with this change in the batch job:
srun pw.x -ndiag 1 < DISP.un_sc.in > DISP.un_sc.in.out
2) Switch to the conjugate-gradient method and play with the mixing (in the electrons section of the input file); an example is sketched below:
diagonalization = 'cg'
mixing_ndim = 8 (8 is the default; it could be lowered, e.g. to 4)
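A sketch of how this looks in the &ELECTRONS namelist of the pw.x input, with the values discussed above (everything else in your input stays unchanged):

&ELECTRONS
   diagonalization = 'cg'
   mixing_ndim     = 4      ! lowered from the default of 8
/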
The most recent LAMMPS (August 2019) is installed on Cedar, Graham and Beluga. Unfortunately, the LATTE package did not work; I will have to find a way to install it separately.
To load the module, use:
module load nixpkgs/16.09 intel/2018.3 openmpi/3.1.2 lammps-omp/20190807
As for the scripts, the same ones from the documentation should work:
https://docs.computecanada.ca/wiki/LAMMPS
Note that the name of the executable is lmp_icc_openmpi.
The following packages are included in this version:
$ cat ${EBROOTLAMMPS}/list-packages.txt | grep -a "Installed YES:"
Installed YES: package ASPHERE
Installed YES: package BODY
…
The following packages are not supported in this version:
$ cat ${EBROOTLAMMPS}/list-packages.txt | grep -a "Installed NO:"
Installed NO: package GPU
Installed NO: package KOKKOS
…
To find the name of the LAMMPS executable, do the following (example for lammps-omp/20170811):
module load lammps-omp/20170811
ls ${EBROOTLAMMPS}/bin/
lmp lmp_icc_openmpi
From this output, the executable is lmp_icc_openmpi, and lmp is a symbolic link to it. The same has been done for each installed version: there is a symbolic link to each executable called lmp, so no matter which module you pick, lmp will work as the executable for that module.
For detailed information about a specific "lammps" module (including how to load the modules) use the module's full name. For example: $ module spider lammps/20170331
Check the available modules:
module avail
To find other possible module matches execute:
$ module -r spider '.*lammps.*'
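A minimal sketch of a LAMMPS batch job using one of the modules above; the account, core count and input file name are placeholders:

#!/bin/bash
#SBATCH --account=def-szpunarb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48        # e.g. a Cedar node
#SBATCH --time=24:00:00

module load nixpkgs/16.09 intel/2018.3 openmpi/3.1.2 lammps-omp/20190807
srun lmp -in in.lammps > lammps.out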
See also: SLURM directives on Plato and Cedar.
Details on the performance of a running job can be found via the portal; log in to see, e.g. on Narval:
https://portail.narval.calculquebec.ca/
Info on a running job:
scontrol show jobid 13448796
To update a job's time limit:
scontrol update jobid=446153 timelimit=10-00:00:00
To cancel a job:
scancel 13448796
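To list your own queued and running jobs (standard SLURM usage; on the Alliance clusters the short sq wrapper should give the same information):
squeue -u $USER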
The details about efficiency of completed job can be found via seff:
seff 40220208
Job ID: 40220208
Cluster: cedar
User/Group: szpunarb/szpunarb
State: TIMEOUT (exit code 0)
Nodes: 6
Cores per node: 48
CPU Utilized: 186-22:22:53
CPU Efficiency: 16.23% of 1152-01:12:00 core-walltime
Job Wall-clock time: 4-00:00:15
Memory Utilized: 5.54 GB
Memory Efficiency: 0.49% of 1.10 TB
A less useful command for info on a job:
sacct -j 39677871
Example output (key fields): Account: rrg-szpun+, User: szpunarb, JobID: 39677871, Start: 2022-07-18T04:15:16, End: 2022-07-20T04:15:24, AllocCPUS: 48, Elapsed: 2-00:00:08, AllocTRES: billing=48,cpu=48,mem=187.50G+, CPUTime: 96-00:06:24, NodeList: cdr2183, ExitCode: 0:0, State: TIMEOUT
https://www.rc.fas.harvard.edu/resources/documentation/convenient-slurm-commands/
https://researchcomputing.princeton.edu/support/knowledge-base/memory
https://westgrid.github.io/manitobaSummerSchool2018/4-materials.html
https://www.westgrid.ca/support/training
Info on new servers:
https://docs.computecanada.ca/wiki/Available_software
https://docs.computecanada.ca/wiki/Project_layout
https://docs.computecanada.ca/wiki/Sharing_data
https://docs.computecanada.ca/wiki/Utiliser_des_modules/en
Note Ali's presentation and remarks about LAMMPS:
In case you are interested in how the program scales, I gave a presentation this week (an online webinar) with a few slides about how the time spent computing interactions between the particles (in LAMMPS) scales with the number of processors and the number of particles. For a given system, the more processors you add, the more communication between the processors increases, which kills the performance of the program. I used one type of potential, but the idea is the same, since for almost all potentials the program spends more than 80% of the time computing the pair interactions. Even for a system with 1,000,000 particles, the efficiency drops beyond 16 or 32 cores. The results may differ a little if the shape of the simulation box is different.
You can see the slides here:
https://www.westgrid.ca/events/introduction_classical_molecular_dynamics_simulations
Resources on local servers:
Globus and other: https://www.usask.ca/ict/services/research-technologies/advanced-computing/plato/running-jobs.php
WestGrid link: https://www.westgrid.ca/support/quickstart
Calcul Québec: https://wiki.calculquebec.ca/w/Accueil?setlang=en
https://wiki.calculquebec.ca/w/Ex%C3%A9cuter_une_t%C3%A2che/en
Plato:
https://wiki.usask.ca/display/ARC/Plato+technical+specifications
https://wiki.usask.ca/display/ARC/Quantum+ESPRESSO+on+Plato
https://wiki.usask.ca/display/ARC/LAMMPS+on+Plato
To search:
https://wiki.usask.ca/display/ARC/Advanced+Research+Computing
Note:
avx1 nodes have 16 processors with 310000M of memory per node available for use;
avx2 nodes have 40 processors with 190000M of memory per node available for use.
See: https://wiki.usask.ca/pages/viewpage.action?pageId=1620607096
Memory limit on Plato: the physical memory on each node is 16 x 1.94 = 31.04 GB.
Increase the available memory by using more nodes.
More info on Plato according to Juan:
Time limits (and priorities) are as follows:
long: 504 hours (lowest priority)
short: 84 hours (normal priority)
rush: 14 hours (highest priority)
Jobs are simply sorted according to walltime: if you request less than 14 hours, the job goes to rush and gets a priority boost. A sketch of the relevant directive is shown below.
The 'R' or 'S' stand for Researchers or Students. Students' jobs also have higher priority, i.e. the time limits are the same, but Srush has higher priority than Rrush.
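A minimal sketch of requesting a walltime under 14 hours so the job lands in the rush class (the time value is just an example):
#SBATCH --time=13:30:00    # under 14 h, so the job is sorted into rush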
Plato at USask documentation (vpn.usask.ca login required from outside the university):
https://wiki.usask.ca/display/ARC/University+Pages
https://wiki.usask.ca/display/ARC/System+status
https://wiki.usask.ca/display/ARC/Advanced+Research+Computing
https://wiki.usask.ca/display/ARC/Plato+HPC+Cluster
Note: the installations of QE and VASP from Alliance/Compute Canada are only fully compatible with the avx2 nodes.
Batch jobs*: Note that users who run VASP on Plato have a separate access account for all software and are required to add, on the second line of the batch job (see the VASP batch job and the sketch below): #SBATCH --account=hpc_p_szpunar
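A minimal sketch of the top of such a Plato batch job; the resource numbers are placeholders for an avx2 node, and only the account line is the required addition noted above:

#!/bin/bash
#SBATCH --account=hpc_p_szpunar
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40      # avx2 node on Plato
#SBATCH --time=13:00:00           # under 14 h lands in the rush class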
Software | Plato | Guillimin | Grex | Bugaboo | Jasper
Third on avx2 | | | | |
QE | | | | |
QE (pw, ph) | | | | |
EPW | | | | |
BTE | | | | |
almaBTE | | | | |
LAMMPS | | | | |
GULP | | | | |
VASP | | | | |
VASP/LAMMPS ML (DhanishS) | | | | |
WIEN2K | | | | |
STATUS | qstat -u UN | showq -u UN | showq -u UN | showq -u UN | showq -u UN
*As noted above, the web page is managed on a Windows PC, so after downloading the batch_job examples to a Unix system, convert any DOS-format file to Unix format with, e.g.: dos2unix batch_job
# Versions of QE on Plato:
Versions:
quantumespresso/6.0
quantumespresso/6.1
quantumespresso/6.2.2
quantumespresso/6.3
quantumespresso/6.4.1
[...]
$ module spider quantumespresso/6.4.1
[...]
You will need to load all module(s) on any one of the lines below before the "quantumespresso/6.4.1" module is available to load.
nixpkgs/16.09 gcc/7.3.0 openmpi/3.1.2
nixpkgs/16.09 intel/2019.3 openmpi/4.0.1
[...]
$ module load gcc/7.3.0
$ module load openmpi/3.1.2
$ module load quantumespresso/6.4.1
Workstation in Prof. Szpunar's lab: me-mz018ce7.usask.ca
To access it, e.g. for user bas627, use the VPN and log in, e.g. via PuTTY, using this address:
bas627@me-mz018ce7.usask.ca
Accessible directories and shared area:
bas627@me-mz018ce7:~/snap/snapd-desktop-integration$ pwd
/home/bas627/engr-me/snap/snapd-desktop-integration
bas627@me-mz018ce7:~/snap/snapd-desktop-integration$ ls
253 common current
bas627@me-mz018ce7:~/snap/snapd-desktop-integration$ cd /datastore/JerzyGroup/
bas627@me-mz018ce7:/datastore/JerzyGroup$ pwd
/datastore/JerzyGroup
bas627@me-mz018ce7:/datastore/JerzyGroup$ ls
bas627 djo764 eej452 jad391 jir520 lim520 mdr456
bas627@me-mz018ce7:/datastore$ cd /data
bas627@me-mz018ce7:/data$ pwd
/data
bas627@me-mz018ce7:/data$ ls
lost+found
INFO from Hu Song:
1. Storage
Each home dir is 100 GB on our NFSFiles server. Data is backed up daily.
/data (5 TB local disk) is read/write for all your group members. This is not backed up anywhere.
Back up important data to /datastore/JerzyGroup.
2. NVIDIA and CUDA
soh516@me-mz018ce7:~$ nvidia-smi
Wed Apr 9 15:42:12 2025
NVIDIA-SMI 550.120, Driver Version: 550.120, CUDA Version: 12.4
GPU 0: NVIDIA RTX A1000, memory usage 77 MiB / 8188 MiB, GPU utilization 0% (processes: /usr/lib/xorg/Xorg 40 MiB, /usr/bin/gnome-shell 6 MiB)
soh516@me-mz018ce7:~$ /usr/local/cuda-12.8/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_
3. Applications
VASP and LAMMPS are installed system-wide:
soh516@me-mz018ce7:~$ which vasp_
vasp_gam vasp_ncl vasp_std
soh516@me-mz018ce7:~$ which lmp
/usr/bin/lmp
BoltzTraP2, Phonopy and Phono3py are just Python packages. You can install them yourself via a Python venv or Anaconda:
# if you would like to use python venv
python3 -m venv mypyenv
source mypyenv/bin/activate
pip install BoltzTraP2
# if you would like to use conda
/opt/anaconda3/bin/conda init
source .bashrc
conda create --name conda_test
conda activate conda_test
conda install package
vaspkit is something you can install in your home dir; I think you will use the same procedure as on Plato.
4. Permissions
Dhanish's NSID has sudo permissions to install from the enabled repos, etc.
Running applications (Dhanish):

Code | Command | Comments
VASP | nohup mpirun -np 30 /opt/vasp/vasp.6.5.0/bin/vasp_std > vasp.out 2>&1 & | instead of 30, use the number of cores you want
CASTEP | mpirun -np 4 castep.mpi inputfile | module use /data/castep/modules/all; module load CASTEP/24.1-foss-2023b
LAMMPS | nohup mpirun -np 4 /usr/bin/lmp -in in.lammps > lammps.out 2>&1 & | instead of 4, use the number of cores you want
Phonopy | |
Phono3py | |
BoltzTraP2 | ./boltztrap.sh | chmod +x boltztrap.sh
VESTA | | start ssh with X: ssh -X your_username@me-mz018ce7
MATLAB | nohup /usr/local/MATLAB/R2024a/bin/matlab -nodisplay -r "run('matlabprogram.m'); exit" > /dev/null 2>&1 & | include the number of cores to be used in the program
VASPkit | vaspkit | each group member has to add this line to their ~/.bashrc: export PATH=/data/vaspkit/vaspkit.1.5.1/bin:$PATH
MIGRATION

UNIVERSITY | CONSORTIA | CLUSTER NAME | DEFUNDING
U. Guelph | Compute Ontario | Mako | 7/31/2017
U. Waterloo | Compute Ontario | Saw | 7/31/2017
U. Toronto | Compute Ontario | TCS | 9/27/2017
U. Alberta | WestGrid | Jasper | 9/30/2017
U. Alberta | WestGrid | Hungabee | 9/30/2017
U. Toronto | Compute Ontario | GPC | 12/31/2017
Dalhousie U. | ACENET | Glooscap | 4/18/2018
Memorial U. | ACENET | Placentia | 4/18/2018
St. Mary's U. | ACENET | Mahone | 4/18/2018
U. New Brunswick | ACENET | Fundy | 4/18/2018
McMaster U. | Compute Ontario | Requin | 4/18/2018
Queen's U. | Compute Ontario | CAC | 4/18/2018
U. Guelph | Compute Ontario | Global_b | 4/18/2018
U. Guelph | Compute Ontario | Redfin | 4/18/2018
U. Waterloo | Compute Ontario | Orca | 4/18/2018
Western U. | Compute Ontario | Monk | 4/18/2018
Western U. | Compute Ontario | Global_a | 4/18/2018
Western U. | Compute Ontario | Global_c | 4/18/2018
Western U. | Compute Ontario | Kraken | 4/18/2018
Simon Fraser U. | WestGrid | Bugaboo | 4/18/2018
U. British Columbia | WestGrid | Orcinus | 4/18/2018
U. Calgary | WestGrid | Parallel | 4/18/2018
U. Manitoba | WestGrid | Grex | 4/18/2018
Concordia U. | Calcul Quebec | Psi | 12/31/2018
McGill U. | Calcul Quebec | Guillimin | 12/31/2018
U. Laval | Calcul Quebec | Colosse | 12/31/2018
U. Montreal | Calcul Quebec | Cottos | 12/31/2018
U. Montreal | Calcul Quebec | Briarée | 12/31/2018
U. Montreal | Calcul Quebec | Hadès | 12/31/2018
U. Sherbrooke | Calcul Quebec | MS2 | 12/31/2018
U. Sherbrooke | Calcul Quebec | MP2 | 12/31/2018