Xu Lab

Division of Biological Sciences, University of Missouri-Columbia

Use Launcher to bundle multiple serial jobs on Stampede2

As I was exploring Stampede2 at TACC for more RNA-seq data analysis, I needed to map the RNA-seq reads for 68 samples to the Daphnia genome. On Stampede2, even a single serial job using 1 CPU takes up the resources of an entire node of 16 CPUs, so I decided to use the Launcher tool built into Stampede2 to bundle multiple serial jobs and let Launcher run the 68 mapping jobs in parallel.

The RNA-seq aligner I used was STAR, which can be loaded as a module on Stampede2. I first created a mapping script for each paired-end RNA-seq sample. Below is an example. For the specific meaning of each option, please refer to the STAR manual.


cd /your/working/directory

STAR --genomeDir /directory/to/your/STARindex \
--outFileNamePrefix SRR2062534 \
--outSAMstrandField intronMotif \
--quantMode GeneCounts \
--twopassMode Basic \
--runThreadN 2 \
--readFilesIn SRR2062534_1.fastq SRR2062534_2.fastq \
--outFilterMultimapNmax 1 \
--outReadsUnmapped Fastx \
--outFilterMatchNminOverLread 0.1 \
--outFilterScoreMinOverLread 0.1 \
--outSAMtype BAM SortedByCoordinate


These scripts are named 1.sh, 2.sh, 3.sh, ..., 68.sh and are all placed in a directory named mapping_scripts. One way to generate them is sketched below.
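For example (a sketch, not the exact commands I used), you could save the STAR example above as a template — I'll call it template.sh here, an assumed name — and substitute each sample's accession for SRR2062534 with sed, assuming the paired fastq files are named sample_1.fastq and sample_2.fastq:

i=0
for r1 in *_1.fastq; do
    i=$(( i + 1 ))
    sample=${r1%_1.fastq}
    # replace every occurrence of the example accession with this sample's name
    sed "s/SRR2062534/${sample}/g" template.sh > mapping_scripts/${i}.sh
done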

To load the launcher, use the command: module load launcher

Copy the file $LAUNCHER_DIR/extras/batch-scripts/launcher.slurm into the mapping_scripts directory, for example as shown below. The file content is listed after that; the places to change are noted in the inline comments.
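A sketch of these two steps, assuming you are in the directory that contains mapping_scripts:

module load launcher
cp $LAUNCHER_DIR/extras/batch-scripts/launcher.slurm mapping_scripts/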


#! /bin/bash

# Simple SLURM script for submitting multiple serial
# jobs (e.g. parametric studies) using a script wrapper
# to launch the jobs.
#
# To use, build the launcher executable and your
# serial application(s) and place them in your WORKDIR
# directory. Then, edit the CONTROL_FILE to specify
# each executable per process.
#-------------------------------------------------------
#-------------------------------------------------------
#
# <------ Setup Parameters ------>
#
#SBATCH -J STAR #name of the job
#SBATCH -N 1    #how many nodes you need
#SBATCH -n 7  #how many jobs to run in parallel
#SBATCH -p normal #the queue on Stampede2 to use
#SBATCH -o STAR.o%j  #change according to your job name
#SBATCH -e STAR.e%j #change according to your job name
#SBATCH -t 48:00:00     #number of hours for the job to run. 48 hr is the maximum for the normal queue.
#SBATCH --mail-user=youremail@gmail.com  #email address to send notifications to
#SBATCH --mail-type=all # Send email at begin and end of job
# <------ Account String ------>
# <--- (Use this ONLY if you have MULTIPLE accounts) --->
##SBATCH -A
#-------------------------------------------------------

export LAUNCHER_PLUGIN_DIR=$LAUNCHER_DIR/plugins
export LAUNCHER_RMI=SLURM
export LAUNCHER_JOB_FILE=/directory/to/your/jobfile       #change the path here to point to your job file

$LAUNCHER_DIR/paramrun


Then, I created a jobfile that asks the system to execute 7 mapping scripts; an example is below. I used 7 because of the memory requirement for the Daphnia genome (~200 Mb): one node only has enough memory for about 7 concurrent mapping jobs. I experimented a bit with different numbers of jobs to make sure they could all run without a problem. If a few jobs don't have enough allocated memory, they will quit while the others keep running until they finish. We can use the development queue to test for the optimal number (see the note after the jobfile example).


bash /full/path/to/your/1.sh

bash /full/path/to/your/2.sh

...

bash /full/path/to/your/7.sh

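For the test runs in the development queue mentioned above, only the queue and time-limit lines in launcher.slurm need to change; something like the following (a sketch — the 2-hour limit is the development queue's maximum, if I recall correctly):

#SBATCH -p development   #use the development queue for quick tests
#SBATCH -t 02:00:00      #shorter time limit for test runs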

Once you have a jobfile like this, you need to create more jobfiles to cover the remaining mapping scripts. Similarly, we need to create several files that are identical to the launcher.slurm file (above) except that the export LAUNCHER_JOB_FILE=/directory/to/your/jobfile line and the job names are changed to point to the corresponding jobfiles. Let's call these files launcher-1.slurm, launcher-2.slurm, and so on: launcher-1.slurm points to the jobfile with the first 7 mapping tasks, launcher-2.slurm points to the jobfile with the second batch of 7 mapping tasks, etc. A sketch of how to generate all of these automatically is below.
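Here is one possible way to generate all of the jobfiles and launcher-*.slurm files in one go (a sketch under my own assumptions: 68 scripts named 1.sh through 68.sh, 7 jobs per batch, and the unedited launcher.slurm template in the current directory):

total=68          # number of mapping scripts
per_batch=7       # jobs per node
batches=$(( (total + per_batch - 1) / per_batch ))
for b in $(seq 1 $batches); do
    start=$(( (b - 1) * per_batch + 1 ))
    end=$(( b * per_batch ))
    [ $end -gt $total ] && end=$total
    # write the jobfile for this batch
    for i in $(seq $start $end); do
        echo "bash /full/path/to/your/${i}.sh"
    done > jobfile-${b}
    # copy launcher.slurm, pointing it at this jobfile and giving it a unique job name
    sed -e "s|^export LAUNCHER_JOB_FILE=.*|export LAUNCHER_JOB_FILE=$PWD/jobfile-${b}|" \
        -e "s|^#SBATCH -J .*|#SBATCH -J STAR-${b}|" \
        launcher.slurm > launcher-${b}.slurm
done

The last batch ends up with fewer than 7 lines, which Launcher handles fine since it simply runs whatever is listed in the jobfile.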

After this is all done, we can submit each of the launcher-*.slurm files, for example: sbatch launcher-1.slurm
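Or, to submit all of them at once:

for f in launcher-*.slurm; do sbatch "$f"; done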

After submission, each launcher script will spread its jobs across a node and run them separately. The status of the jobs can be monitored using the command: showq -u yourUserName
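Once the mapping jobs finish, one quick sanity check (a sketch, relying on STAR's default log names for the prefixes used above) is to pull the mapping rate from each sample's Log.final.out file:

grep "Uniquely mapped reads %" *Log.final.out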