0

これは、mpiジョブの再送信に使用するジョブスクリプトです。このスクリプトは元々tcshシェル用に書かれています。bashシェル用に書き直そうとしましたが、エラーが発生します。スクリプトの修正を手伝ってください。

##============================================================================

#!/bin/bash                                                                                                                                                                                            
#PBS -l mem=10GB                                                                                                                                
#PBS -l walltime=12:00:00                                                                                                                       
#PBS -l nodes=2:ppn=6                                                                                                                                                                                                                                             
#PBS -v NJOBS,NJOB

if [ X$NJOBS == X ]; then
    $ECHO "NJOBS (total number of jobs in sequence) is not set - defaulting to 1"
    export NJOBS=1
fi

if [ X$NJOB == X ]; then
    $ECHO "NJOB (current job number in sequence) is not set - defaulting to 1"
    export NJOB=1
fi

#                                                                                                                                               
# Quick termination of job sequence - look for a specific file                                                                                  
#                                                                                                                                               
if [ -f STOP_SEQUENCE ] ; then
    $ECHO  "Terminating sequence at job number $NJOB"
    exit 0
fi

#                                                                                                                                               
# Pre-job file manipulation goes here ...                                                                                                       
# =============================================================================                                                                                                                                              
# INSERT CODE             
# =============================================================================

module load openmpi/1.4.3

startnum= 0
x=1
i= $(($NJOB + $startnum - $x))
j= $(($i + $x))

$ECHO "This is job $i"
#$ECHO floobuks.$i.blah                                                                                                                         
#$ECHO flogwhilp.$j.txt                                                                                                                         


#===========================================================================
# actual execution code                                                                                                                    
#===========================================================================                 

# this is just a sample 
echo "job $i is followed by $j"

#=========================================================================== 
RUN COMPLETE
#===========================================================================

#
# Check the exit status
#
errstat=$?
if [ $errstat -ne 0 ]; then
    # A brief nap so PBS kills us in normal termination
    # If execution line above exceeded some limit we want PBS
    # to kill us hard
    sleep 5
    $ECHO "Job number $NJOB returned an error status $errstat - stopping job sequence."
    exit $errstat
fi

#
# Are we in an incomplete job sequence - more jobs to run ?
#
if [ $NJOB -lt $NJOBS ]; then


#
# Now increment counter and submit the next job
#
    NJOB=$(($NJOB+1))
    $ECHO "Submitting job number $NJOB in sequence of $NJOBS jobs"
    qsub recur2.bash
else
    $ECHO "Finished last job in sequence of $NJOBS jobs"
fi

#==============================================================================

実行すると次のエラーが発生します

qsub -v NJOBS=4  recur2.bash



ModuleCmd_Load.c(200):ERROR:105: Unable to locate a modulefile for 'openmpi/1.4.3'
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 115: 0: command not found
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 117: 0: command not found
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 118: 1: command not found
/home/nsubramanian/bin/gromacs_3.3.3/bin/grompp_mpi: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such\
 file or directory
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 128: mpirun: command not found

私はopenmpiのエラーを理解することができましたが、残りはできませんでした。私はそれを機能させる方法がわかりません。

注:plsは、元のファイルとは異なる行番号を無視します。

4

1 に答える 1

1

あなたのシステムには openmpi/1.4.3 のようなモジュールはありません。そしてこれらの行で

startnum= 0
i= $(($NJOB + $startnum - $x))
j= $(($i + $x))

等号の後にスペースを入れてはいけません。

これを見つけるために必要だったのは、bash シェルでスクリプトを 1 行ずつ実行することだけでした。

于 2012-07-21T20:38:44.377 に答える