これは、mpiジョブの再送信に使用するジョブスクリプトです。このスクリプトは元々tcshシェル用に書かれています。bashシェル用に書き直そうとしましたが、エラーが発生します。スクリプトの修正を手伝ってください。
##============================================================================
#!/bin/bash
#PBS -l mem=10GB
#PBS -l walltime=12:00:00
#PBS -l nodes=2:ppn=6
#PBS -v NJOBS,NJOB
if [ X$NJOBS == X ]; then
$ECHO "NJOBS (total number of jobs in sequence) is not set - defaulting to 1"
export NJOBS=1
fi
if [ X$NJOB == X ]; then
$ECHO "NJOB (current job number in sequence) is not set - defaulting to 1"
export NJOB=1
fi
#
# Quick termination of job sequence - look for a specific file
#
if [ -f STOP_SEQUENCE ] ; then
$ECHO "Terminating sequence at job number $NJOB"
exit 0
fi
#
# Pre-job file manipulation goes here ...
# =============================================================================
# INSERT CODE
# =============================================================================
module load openmpi/1.4.3
startnum= 0
x=1
i= $(($NJOB + $startnum - $x))
j= $(($i + $x))
$ECHO "This is job $i"
#$ECHO floobuks.$i.blah
#$ECHO flogwhilp.$j.txt
#===========================================================================
# actual execution code
#===========================================================================
# this is just a sample
echo "job $i is followed by $j"
#===========================================================================
RUN COMPLETE
#===========================================================================
#
# Check the exit status
#
errstat=$?
if [ $errstat -ne 0 ]; then
# A brief nap so PBS kills us in normal termination
# If execution line above exceeded some limit we want PBS
# to kill us hard
sleep 5
$ECHO "Job number $NJOB returned an error status $errstat - stopping job sequence."
exit $errstat
fi
#
# Are we in an incomplete job sequence - more jobs to run ?
#
if [ $NJOB -lt $NJOBS ]; then
#
# Now increment counter and submit the next job
#
NJOB=$(($NJOB+1))
$ECHO "Submitting job number $NJOB in sequence of $NJOBS jobs"
qsub recur2.bash
else
$ECHO "Finished last job in sequence of $NJOBS jobs"
fi
#==============================================================================
実行すると次のエラーが発生します
qsub -v NJOBS=4 recur2.bash
ModuleCmd_Load.c(200):ERROR:105: Unable to locate a modulefile for 'openmpi/1.4.3'
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 115: 0: command not found
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 117: 0: command not found
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 118: 1: command not found
/home/nsubramanian/bin/gromacs_3.3.3/bin/grompp_mpi: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such\
file or directory
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 128: mpirun: command not found
私はopenmpiのエラーを理解することができましたが、残りはできませんでした。私はそれを機能させる方法がわかりません。
注:plsは、元のファイルとは異なる行番号を無視します。