ARCHIVED: On AVIDD at IU, how do I run a parallel job interactively using LAM?
Note: UITS began retiring the AVIDD system on May 1 as originally scheduled, but portions of the system will remain online until mid-August. UITS retired AVIDD-T, AVIDD-N, and AVIDD-I on May 1, but is delaying the retirement of AVIDD-B and AVIDD-O pending the installation of a replacement system based on Intel's X86-64 technology. If you use AVIDD-B or AVIDD-O, UITS encourages you to migrate to Big Red and/or the TeraGrid; for help, email High Performance Computing. If you cannot migrate, AVIDD-B and AVIDD-O will remain available to you until the new system is ready for use in mid-August.
For instructions on how to compile your program, see ARCHIVED: On AVIDD at IU, how do I compile my program so I can run it as a parallel job using LAM? To run your parallel job as a batch job, see ARCHIVED: On AVIDD at IU, how do I run my MPI program as a parallel batch job using LAM?
To run your parallel job interactively using LAM on the AVIDD system at Indiana University, follow these steps:
- Assign yourself as many nodes as you need from the Portable
Batch System (PBS) for an interactive job. For example,
the following command on the AVIDD head node will give you two nodes
for ten minutes:
[hpc@bh2 hpc]$ qsub -I -l nodes=2:ppn=2 -l walltime=10:00
Note: If you do not have +lam-gm-intel before the @avidd line in your ~/.soft file, you'll need to run the following on the compute node:
[hpc@bc01 hpc]$ soft add +lam-gm-intel
- Use the lamboot command to boot up LAM as follows:
[hpc@bc01 hpc]$ lamboot $PBS_NODEFILE
Points to note:
  - The host where you issue the lamboot command must be included in the LAM machine_file.
  - By default, the host names in $PBS_NODEFILE have an -eth1 suffix, indicating that the Ethernet interface will be used. If you wish to use Myrinet instead of Ethernet, you can create your own machine file containing the node names (which you can get from $PBS_NODEFILE) but with a -myri0 suffix. You can then pass the new machine file to lamboot instead of $PBS_NODEFILE:
[hpc@bc01 hpc]$ cat $PBS_NODEFILE | sed 's/-eth1//' | sed 's/$/-myri0 cpu=2/' > ~/my_machine_file
[hpc@bc01 hpc]$ lamboot ~/my_machine_file
When you boot LAM using lamboot, a LAM daemon will be started on each node using SSH. For more information, including details about the format of machine_file, refer to the man page for lamboot.
- After successfully booting LAM, run your parallel job using mpirun:
[hpc@bc01 hpc]$ mpirun C ~/bin/helloworlds
For more information about mpirun, refer to its man page. You might also find the man page for appschema useful (for MPMD programs).
- When you are done, terminate the LAM daemons on each node by running lamhalt:
[hpc@bc01 hpc]$ lamhalt
If your MPI program happens to crash unexpectedly, use lamwipe to wipe out any stray orphan processes.
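The boot, run, and halt steps above can be collected into a small shell helper. What follows is only a sketch under stated assumptions, not a supported AVIDD tool: the function names to_myrinet_machine_file, run, and lam_session and the DRY_RUN variable are hypothetical, and the only external commands used are sed, lamboot, mpirun, and lamhalt, invoked exactly as in the steps above. With DRY_RUN set, the commands are printed rather than executed, so the sequence can be checked on a machine without LAM.

```shell
#!/bin/sh
# Sketch only: the helper names and DRY_RUN are hypothetical, not part of
# LAM or PBS.

# Rewrite Ethernet host names (-eth1) as Myrinet names (-myri0) and declare
# two CPUs per host, using the same sed expressions as the lamboot step.
to_myrinet_machine_file() {
    sed -e 's/-eth1//' -e 's/$/-myri0 cpu=2/' "$1"
}

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# lam_session MACHINE_FILE PROGRAM
# Boot LAM, run PROGRAM on every CPU, then halt the LAM daemons even if
# mpirun failed. Returns mpirun's exit status.
lam_session() {
    run lamboot "$1" || return 1
    run mpirun C "$2"
    status=$?
    run lamhalt
    return "$status"
}

# Example use inside an interactive PBS job (left commented out here):
#   to_myrinet_machine_file "$PBS_NODEFILE" > ~/my_machine_file
#   lam_session ~/my_machine_file ~/bin/helloworlds
```

Halting the daemons regardless of how mpirun exits keeps a failed run from leaving LAM booted on the allocated nodes; lamwipe remains the manual fallback if the program crashes.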
For more information on LAM, refer to the LAM FAQ at:
http://www.lam-mpi.org/faq/
This document was developed with support from the National Science Foundation (NSF) under Grant No. 0503697 to the University of Chicago and subcontracted to Indiana University. Additional support was provided by IU through its participation in the TeraGrid, which is supported by the NSF under Grants No. 0833618, SCI451237, SCI535258, and SCI504075. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the NSF.
Last modified on December 18, 2007.






