Indiana University
University Information Technology Services
  
What are archived documents?

ARCHIVED: How do I combine several serial jobs into one MPI parallel job?

Note: Although this document assumes you are using the Research SP at Indiana University, it applies to other platforms as well.

An MPI job consists of multiple processes executing the same piece of code in parallel. Each process is uniquely identified by a "rank" variable. Depending on this rank, different processes execute different parts of the code at the same time. The following Fortran 90 program mytest.F illustrates this idea: program mytest include "mpif.h" integer myrank integer n integer ierr call mpi_init(ierr) call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr) if (myrank==0) then call system("ls") write(*,*) else call system("echo ""This is a test""") end if call mpi_finalize(ierr) end The above MPI program uses two processes to execute two different shell commands in parallel. One lists the content of the current directory, and the other one prints the text "This is a test" to the standard output. Since mpi_init() initializes an MPI execution environment and creates a group (MPI_COMM_WORLD) that contains all the processes this MPI job has, it must be called before any other MPI call in the program. The mpi_comm_rank() call retrieves the rank (myrank) of the calling process from this group. Different processes get different rank values (starting at 0 and ascending) when they perform this call. Therefore, if two processes are used, one process gets 0 for myrank, and the other gets 1. Next, the process with rank 0 will execute the ls command, while the process with rank 1 executes the echo command. The commands are issued via the system subroutine, which passes its string argument to the sh command as input. The sh command will interpret that input as a command and run it. To terminate the MPI execution environment, mpi_finalize() is used. You would compile the program using:

mpxlf -o mytest mytest.F

The LoadLeveler job submission script for this program would be:

#@ class = pb #@ job_type = parallel #@ network.MPI = css0,shared,IP #@ node = 1,1 #@ tasks_per_node = 2 #@ wall_clock_limit = 0:10:00 #@ initialdir = . #@ executable = /bin/poe #@ arguments = mytest -stdoutmode ordered -labelio yes #@ environment = COPY_ALL; MP_EUILIB=ip; MP_INFO_LEVEL=2; #@ output = mytest.out #@ error = mytest.error #@ queue

The output of the program would be:

0:hostlist 0:mytest 0:script 0:test.F 0:test.error 0:test.out 0: 1:This is a test

In the job submission script, the -stdoutmode ordered argument to POE is to instruct it to order the output from different processes; otherwise, output from different processes executing in parallel may get interleaved. The -labelio yes argument instructs the POE to label the lines of output according to the rank of the processes.

As another example, suppose there is a program called myprog that takes input from an input file, performs certain operations, and then outputs the result into an output file. All these operations are done in one directory. Assume that you want to apply myprog to several input files that are distributed into several directories (/N/u/someuser/dir1, /N/u/someuser/dir2, /N/u/someuser/dir3), and in each directory there is a copy of myprog. Then the following Fortran 90 program, myprogmpi.F, bundles these serial jobs into one parallel job:

program myprogmpi include "mpif.h" integer myrank integer n integer ierr call mpi_init(ierr) call mpi_comm_rank(MPI_COMM_WORLD, myrank, ierr) if (myrank==0) then call system("cd /N/u/someuser/dir1; myprog input") elseif (myrank==1) then call system("cd /N/u/someuser/dir2; myprog input") elseif (myrank==2) then call system("cd /N/u/someuser/dir3; myprog input") end if call mpi_finalize(ierr) end

It's best to request three processors in the job submission script; you can instruct each process to run a different executable, too.

Using this idea, you could combine several serial jobs into one parallel job. (For example, instead of submitting several jobs to the class a or class b queue on the Research SP, you can submit one MPI job to the pb queue.) Suppose you have n serial jobs; you can create n MPI processes. Each process is in charge of executing one of the serial jobs (e.g., using the system() call). As long as these serial jobs do not depend on or interfere with each other, they will execute in parallel and yield correct results.

Dependency occurs when the correct execution of one serial program depends on the result of another one. In this case, you should consider using job steps (keywords step_name and dependency) in the LoadLeveler job submission script. For example, you can write a script that indicates step1 will run with the executable called myprogram1, and step2 will run only if LoadLeveler removes step1 from the system. If step2 does run, the executable called myprogram2 is run. Following is an example: # @ class = a # @ job_type = serial # @ node_usage=shared # @ environment = COPY_ALL; MP_EUILIB=IP; MP_SHARED_MEMORY=yes; # @ output = /N/u/someuser/$(step_name).in # @ output = /N/u/someuser/$(step_name).out # @ error = /N/u/someuser/$(step_name).err # Beginning of step1 # @ step_name = step1 # @ executable = myprogram1 # @ arguments=arg1 # @ queue # Beginning of step2 # @ step_name = step2 # @ dependency = (step1 == CC_REMOVED) # @ executable = myprogram2 # @ arguments=arg2 # @ queue

CC_REMOVED is a job return code indicating the job step is removed from the system. For more information on dependency and job return codes and their applications in job steps, visit:

https://sp-www.iu.edu/LoadL/lllv2mst95.html

You will need to enter your Indiana University Network ID username and passphrase to access this resource.

An example of interference among serial programs is two serial programs all writing to the same output file (e.g., standard output). For example, if myprog in the above example writes to the standard output, then several instances of myprog run concurrently will garble the standard output file. For this specific example, changing the system call to call system("cd /N/u/someuser/dir1; myprog input > stdout1") will solve this problem.

If you do not resolve the problem, you should realize that the output file will contain mixed output from both programs, and thus will need proper interpretation.

For a tutorial on MPI programming, refer to the Parallel Programming Tutorial at:

http://www.iu.edu/~rac/hpc/mpi_tutorial/index.html
This is document akqy in domain all.
Last modified on July 17, 2006.
Please tell us, did you find the answer to your question?