Sys Admin Pocket Survival Guide - OS

Parallel Environment

MPI vs OpenMP

Both are designed to let a programmer use more CPUs by leveraging parallel processing.

Perhaps the biggest difference is memory locality: whether the memory a CPU works on is shared with the other workers or private to each of them.

OpenMP is newer than MPI; it targets multi-core CPUs and multi-CPU machines. At the basic level an OpenMP program runs within a single machine, so it is a shared-memory system: memory is local to the machine and accessible to all "threads" of the program.

MPI is an older API and does NOT rely on shared memory; each host was more or less expected to have a single CPU/core. So each MPI process (the MPI counterpart of a "thread") is independent and has no access to the memory of the other processes.
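To make the shared-memory side concrete, here is a minimal C sketch (the function name square_all is made up for illustration). All OpenMP threads see the same in and out arrays, so each thread just writes its share of the indices and nothing is copied. An MPI version of the same loop would first have to send each process its slice of in and then collect each slice of out back, which is the scatter-gather pattern covered in the MPI section.

```c
#include <stddef.h>

/* Square every element of in into out. Both arrays live in the one address
 * space that all OpenMP threads share, so each thread simply writes its own
 * subset of indices into out; nothing is sent to or received from the
 * threads. Compiled without -fopenmp, the pragma is ignored and the same
 * loop runs serially with identical results. */
void square_all(const int *in, int *out, size_t n)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        out[i] = in[i] * in[i];
}
```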

With the above mindset, it is easier to see who the audiences for MPI and OpenMP are.

MPI

MPI is considered a lower-level API than OpenMP. To coordinate distributed processing, an MPI program needs to ship the data to the remote node/process (because memory is not shared).

The main programming paradigm is scatter-gather: ship data to the remote nodes, have them run a specific computation, then collect the results from them. So this often lends itself to SIMD processing.
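On the scatter side, the root has to decide which slice of the data each process gets. MPI_Scatterv and MPI_Gatherv take per-rank counts and displacements arrays for exactly this. A minimal sketch of filling them for an even block split, written in plain C so it runs without an MPI library (block_partition is a hypothetical helper, not part of MPI):

```c
/* Split n items over nprocs ranks as evenly as possible, in the layout that
 * MPI_Scatterv / MPI_Gatherv expect: rank r gets counts[r] items starting at
 * offset displs[r] of the root's buffer. The first (n % nprocs) ranks get
 * one extra item each. */
void block_partition(int n, int nprocs, int *counts, int *displs)
{
    int base = n / nprocs;
    int extra = n % nprocs;
    int offset = 0;

    for (int r = 0; r < nprocs; r++) {
        counts[r] = base + (r < extra ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }
}
```

With MPI, the root would pass counts and displs as the sendcounts and displs arguments of MPI_Scatterv (and the matching arguments of MPI_Gatherv), while each rank uses counts[rank] as its receive count.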

To carry out the scatter-gather processing, a diverse set of functions is provided via the message-passing approach (which means they are low-level programming constructs):

	* send / receive operations (think of multiple processes doing client/server-style send/receive of data: point-to-point, rendezvous-type operations).
	* process synchronization (barrier).
	* functions to find out the network size, topology, and neighbour info (e.g. a weather simulation may need to know its neighbours' data but not that of far-off nodes; this topology info goes beyond simple map-reduce). These functions are also what makes MPI hard (along with the fact that it is lower level than a MapReduce API).
	* gather and reduce (akin to the MapReduce paradigm of Hadoop?).
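The neighbour info mentioned above usually comes from a Cartesian topology: on a periodic 1-D layout, MPI_Cart_shift(comm, 0, 1, &left, &right) returns the rank to receive from and the rank to send to. The arithmetic behind it is simple; a plain C sketch that runs without MPI (ring_neighbours is a hypothetical name):

```c
/* Left and right neighbours of rank in a periodic 1-D ring of size
 * processes: the same pair MPI_Cart_shift(comm, 0, 1, &left, &right)
 * reports on a 1-D communicator created with periods = {1}. */
void ring_neighbours(int rank, int size, int *left, int *right)
{
    *left  = (rank - 1 + size) % size;   /* wraps rank 0 to size-1 */
    *right = (rank + 1) % size;          /* wraps size-1 to rank 0 */
}
```

In a weather-style halo exchange, each rank would then send its right edge to right and receive its left halo from left, e.g. with MPI_Sendrecv.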

The diversity of constructs means an MPI program isn't really restricted to scatter/gather. It can be MIMD; it is up to the programmer how to use the communication API to process data.

Because of this programming model, MPI tends to require its own "mindset", and a program pretty much has to be written from the ground up with that MPI mindset.

OpenMP

OpenMP was built with a focus on symmetric multiprocessing (SMP): it leverages multi-core CPUs and multiple CPUs on the same host, with memory readily available to every thread. So parallelization is simpler.

It uses a multi-threading model. Different threads can do different things, so it is not limited to SIMD.

The multi-threading model allows a programmer to adapt a core piece of the program to OpenMP gradually, instead of writing the whole program as MPI-style code.

The code uses #pragma directives that tell the compiler how to parallelize with OpenMP. A compiler without OpenMP support ignores them and produces serial code.
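A minimal sketch of the incremental approach (function names are illustrative): one hot loop gets a single directive and the rest of the program is left alone. Built with gcc -fopenmp the loop runs on several threads; built without it, the pragma is ignored, _OPENMP is undefined, and the same code runs serially.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum of squares of x[0..n-1]. The directive is the only OpenMP-specific
 * part: reduction(+:sum) gives each thread a private partial sum and adds
 * the partial sums together at the end, so threads never race on sum.
 * Without -fopenmp the pragma is ignored and the loop runs serially. */
double sum_of_squares(const double *x, long n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Number of threads the OpenMP runtime would use for the next parallel
 * region (honours OMP_NUM_THREADS); 1 when built without OpenMP. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

For general floating-point data the parallel summation order, and so the last bits of the result, can differ from a serial run.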

gcc 4.2 and later support OpenMP (OpenMP 3.0 from gcc 4.4).

At least when running on a single-host shared-memory SMP machine, there is probably nothing to do at the sysadmin level once the program is compiled with a suitable compiler (and the appropriate LD_LIBRARY_PATH settings are in place).

Cluster OpenMP (from Intel) and other projects extend it to distributed-memory systems.

https://computing.llnl.gov/tutorials/openMP/exercise.html has simple hello_world exercise for OMP.

By default it seems to start as many threads as there are CPU cores in the SMP machine.
The example shows compiling with icc, pgcc, and gcc. From 2005!

Running OpenMP under SGE with -pe smp is largely the same: no special daemon is needed. So on a shared-memory system, it is not clear what OpenMP offers over pthreads; perhaps only the ability to scale to other nodes when an appropriate compiler/library is used?

Further notes on OpenMP:
	* by itself not designed for distributed memory
	* runs within a single (SMP) computer
	* does not span machines by default, so it was not covered in grad school (also, v1.0 for C/C++ was only released in Oct 1998)
	* so really think of it as an alternative to pthreads...
	* in an HPC/cluster environment, an SGE PE for OpenMP has little to do, since the generated code just runs. Mostly, it sets the correct OMP_NUM_THREADS for the node and knows where to launch the job.
	* a couple of web examples of OpenMP in SGE suggest just specifying a PE that allows defining how many cores to take, e.g. -pe threads 2-8, then setting OMP_NUM_THREADS=$NSLOTS in the qsub script (SGE defines NSLOTS inside the job)
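The usual approach is the one above: export OMP_NUM_THREADS=$NSLOTS in the job script. If a program has to pick up the slot count itself, a minimal sketch could look like the following (parse_thread_count and threads_from_env are hypothetical helpers; treating a bad or missing value as "use the fallback" is an assumption):

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a thread count such as the value of NSLOTS. Returns fallback when
 * the string is missing, empty, not a plain positive integer, or too big. */
int parse_thread_count(const char *s, int fallback)
{
    if (s == NULL || *s == '\0')
        return fallback;

    char *end;
    long v = strtol(s, &end, 10);
    if (*end != '\0' || v <= 0 || v > INT_MAX)
        return fallback;
    return (int)v;
}

/* Thread count from the environment variable name, e.g. "NSLOTS", which SGE
 * sets inside a job that requested a parallel environment. */
int threads_from_env(const char *name, int fallback)
{
    return parse_thread_count(getenv(name), fallback);
}
```

A program would then call something like omp_set_num_threads(threads_from_env("NSLOTS", 1)) inside an #ifdef _OPENMP block, so it still builds without OpenMP.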

sky/code/openMP > cat hello_slac.c

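/* http://www.slac.stanford.edu/comp/unix/farm/openmp.html */
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>    /* only needed (and available) when OpenMP is enabled */
#endif

int main(int argc, char *argv[]) {
  int iam = 0, np = 1;

  /* every thread runs this block; iam and np are per-thread copies */
  #pragma omp parallel default(shared) private(iam, np)
  {
    #if defined (_OPENMP)
      np = omp_get_num_threads();
      iam = omp_get_thread_num();
    #endif
    printf("Hello from thread %d out of %d\n", iam, np);
  }
  return 0;
}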



# http://www.dartmouth.edu/~rc/classes/intro_openmp/compile_run.html
# gcc 4.2 and above (OpenMP 3.0 needs gcc 4.4+)
cc  -fopenmp hello_llbl.c      # cc is usually gcc on Linux
gcc -fopenmp hello_llbl.c
icc -openmp  hello_slac.c      # newer Intel compilers spell this -qopenmp


export OMP_NUM_THREADS=4
# if not specified, typically runs as many threads as available cpu cores (or hyperthreads if enabled)
# if you ask for more threads than available cores, the OS time-slices them, which is usually slower.
./a.out

ScaleMP

Marketed as vSMP Foundation, it aggregates multiple physical servers into one virtual SMP system rather than depending on a single shared-memory or NUMA host.

http://www.scalemp.com/products/product-comparison/ : there is a vSMP Foundation Free edition for up to 8 nodes / 1TB shared memory, but it is essentially a commercial product.

It can use IB (InfiniBand) as the interconnect to speed up data/memory transfer, even bonding IB links in the advanced version.

But then presumably some sort of daemon process needs to run on each node, and the cheaper licensing model is node-locked, so it creates quite a complex environment to use with a batch scheduler.
This and the cost issue may be why folks just use MPI? ScaleMP or OpenMP doesn't seem to be used much, at least not in the life science space.

http://www.scalemp.com/industries/lifescience/computational-chemistry/ lists Schrodinger Jaguar, DOCK, Glide, Amber, Gaussian, OpenEye FRED and OMEGA, HMMER, and mpiBLAST (but not GPU BLAST?). Touted for departments without dedicated IT.

MPI API Implementation Details

MPICH versions

MPICH 1.x - Original implementation by Argonne National Lab and Mississippi State Univ. Schrodinger is compiled with MPICH 1.2.
MPICH2 - Successor to MPICH 1.x, also from Argonne; a new implementation supporting MPI-2, written mostly in C.
MVAPICH - MPI-1 implementation over InfiniBand by Ohio State Univ, based on MPICH. Doesn't work with Schrodinger Jaguar?
OpenMPI - One of the newer implementations, based on MPI-2. It grew out of a merger of several earlier projects, including LAM/MPI. It is quite popular now and widely supported; the InfiniBand OFED stack is compiled for it (and for MVAPICH). SGE touts OpenMPI as the best thing to use, as it is tightly integrated and SGE has full control of it, starting/stopping it as needed. Likewise, OpenMPI automatically communicates with Torque/PBS Pro to retrieve the host and np info: jobs are still submitted with qsub, and mpirun inside the job script picks up the allocated hosts without a hostfile or -np. http://www.open-mpi.org/faq/?category=tm
But many of the older programs compiled for MPICH won't work out of the box with OpenMPI.

There are many other implementations, including commercial ones and ones for MATLAB, Java, etc. See: Wikipedia, MPI implementations.

MPICH v1

(See config-backup/sw/mpi/mpich1.test.txt for more info).

Starting MPICH

Environment VARs:
MPI_HOME
MPI_USEP4SSPORT=yes
MPI_P4SSPORT=4644

/etc/hosts.equiv or .rhosts needs to be set up, even if using ssh! Some system calls in MPICH need it for auth.

$MPI_HOME/share/machines.LINUX		# host (+cpu) definition file
					# node1:2 would mean a 2 cpu machine, but it also implies shared memory,
					# which parallel Jaguar doesn't support.  Instead, repeat the line for a node
					# once per CPU, eg :
					# node1
					# node1
					# node2
					# node2
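Converting an existing host:n style list into the one-line-per-CPU layout by hand is error-prone. A minimal C sketch of the per-entry conversion, assuming the n after the colon is the CPU count (expand_machine_entry is a hypothetical helper, not an MPICH tool):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expand one machines-file entry. "host:n" becomes n lines of "host" (one
 * per CPU, the layout parallel Jaguar needs); a bare "host" is copied once.
 * Writes the lines into out (NUL-terminated) and returns the number of
 * lines, or -1 if n is not a positive integer or out is too small. */
int expand_machine_entry(const char *entry, char *out, size_t outlen)
{
    const char *colon = strchr(entry, ':');
    int hostlen = colon ? (int)(colon - entry) : (int)strlen(entry);
    long n = 1;

    if (colon != NULL) {
        char *end;
        n = strtol(colon + 1, &end, 10);
        if (*end != '\0' || n <= 0)
            return -1;
    }
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    size_t used = 0;
    for (long i = 0; i < n; i++) {
        /* %.*s prints only the host part, i.e. the text before the colon */
        int w = snprintf(out + used, outlen - used, "%.*s\n", hostlen, entry);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
    }
    return (int)n;
}
```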

To start a shared daemon as root:
ssh node1 "serv_p4 -o -p 1235 -l /nfs/mpilogs/node1.log"
ssh node2 "serv_p4 -o -p 1235 -l /nfs/mpilogs/node2.log"
# an rc script on each node to start it up would be good,
# but a centralized script in the above form to start/kill would also be useful.
# Alternatively, the Schrodinger mpich utility can start serv_p4 correctly
# (without the problem of chp4_servs, which results in non-sharable daemons).


For a per-user process, MPICH can be started/monitored with:

tools (scripts) in $MPI_HOME/sbin/
chp4_servs -port=4644           # script to start serv_p4 on all nodes, DOESN'T obey MPI_P4SSPORT (defaults to 1234)
                                # at some point in the past also used port 1235
chp4_servs -hosts=filename      # use filename to get the list of hosts to start serv_p4 on (defaults to machines.LINUX)
chp4_servs -hunt                # list all serv_p4 processes on all mpi nodes (on all ports)

chkserv -port 4644              # see which nodes don't have the mpi daemon running
                                # DOESN'T obey MPI_P4SSPORT (defaults to 1234)
                                # no output = all good.
                                # (parallel jaguar will trigger it to start anyway)

NOTE: Schrodinger has an mpich utility to monitor MPICH status as well.

Testing MPICH


$MPI_HOME/sbin/tstmachines -v
	# see if daemons are fine.

cat $HOME/.server_apps
	# exact path to each binary; should be populated automatically.


cd $MPI_HOME/examples
mpirun -np 16 cpi
	# run pi calculation test on 16 procs.
	# doesn't really start a serv_p4 process, so can't be used to test sharing the daemon between users.

Per-User Environment

(There is no need for this unless the shared root daemon process doesn't work.)
MPICH allows a per-user MPICH daemon ring instead of depending on a shared daemon run by root. This has been tested to work with Parallel Jaguar. To use it, set an environment variable defining the port you want to use for your own MPI daemon ring. Your 4-digit phone extension would be a good number to use. It may be best to add it to your $HOME/.cshrc, like this:

  setenv MPI_P4SSPORT     4644                    #change number to a unique port for yourself

After this, parallel jaguar (or mpirun) jobs should work. If there are problems, check that you have sourced /protos/package/skels/local.cshrc.linux.apps and that these variables are defined:

  setenv MPI_HOME /protos/package/linux/mpich	
  setenv PATH "${PATH}:${MPI_HOME}/bin"	
  setenv MPI_USEP4SSPORT yes
  setenv P4_RSHCOMMAND ssh	
  setenv SCHRODINGER_MPI_START yes                  # Parallel Jaguar to start its own MPICH serv_p4 on user defined port

After this setup, a parallel jaguar job can be run like:
$SCHRODINGER/jaguar run -HOST "vic1 vic2 vic3 vic4" -PROCS 4 piperidine

OpenMPI

mpirun hostname
mpirun -H n0301,n0302 hostname
mpirun --hostfile myHostfile hostname

Example mpi host file

Example mpi host file with cpu info

n0000 slots=16
n0001 slots=16
n0002 slots=16
n0003 slots=16

If FQDNs are needed, e.g. when there are multiple "domains"/clusters, use this setting:

export OMPI_MCA_orte_keep_fqdn_hostnames=t

ref:

https://docs.oracle.com/cd/E19708-01/821-1319-10/mca-params.html

https://www.open-mpi.org/faq/?category=running

PVM

Ref: http://www.csm.ornl.gov/pvm/

source pvm.env # get PVM_ROOT, etc
pvm # starts monitor, starting pvmd* daemon if needed.

$PVM_ROOT/lib/pvmd pvmhost.conf
# starts the PVM daemon on the hosts listed in the conf file, one host per line.
# may want to put it in the background; ^C will end everything.
# it uses RSH (or ssh, if defined correctly) to log in to the remote hosts to start processes.
# Need to ensure an ssh login will source the env correctly for pvm/pvmd to run.
# Can be started by any user (what about more than one user??)

kill -SIGTERM can be used to kill the daemon.
If using kill -9 (or another uncatchable signal), be sure to clean up /tmp/pvmd.

pvm> commands
ps
conf
halt
exit

To run an OpenEye omega/rocs job, the
$PVM_ROOT/bin/$PVM_ARCH dir must have access to the desired binary (e.g. a symlink to omega),
because PATH from .login will not be sourced.

run command as:
omega -pvmconf omega.pvmconf -in carboxylic_acids_1--100.smi -out carboxylic_acids_1--100.oeb.gz -log omega_pvm.log

Each user that starts pvm will have her own independent instance of pvmd3.
pvm uses rsh/ssh to start itself on the remote hosts, so port numbers are not likely to be static.
It uses UDP for communication.

from lsof -i4 -n

process name / pid uid ...
pvmd3 27808 tinh 7u IPv4 17619158 UDP 10.220.3.20:33430
pvmd3 27808 tinh 8u IPv4 17619159 UDP 10.220.3.20:33431

tin 27808 1 0 14:25 pts/29 00:00:00 /app/pvm/pvm345/lib/LINUX/pvmd3


## omega.pvmconf
## host = required keyword
## hostname: may sometimes need to be an FQDN, depending on what the "hostname" command returns
## n = number of PVM instances to run
host  phpc-cn01 1
host  phpc-cn02 2
host  phpc-cn03 2

##/home/common/Environments/pvm.env

# csh environment setup for PVM 3.4.5
# currently only available for LINUX64 (LSF cluster)

setenv PVM_ROOT /app/pvm/pvm345

source ${PVM_ROOT}/lib/cshrc.stub

# http://mail.hudat.com/~ken/help/unix/.cshrc
#alias ins2path  'if ("$path:q" !~ *"\!$"* ) set path=( \!$ $path )'
#alias add2path  'if ("$path:q" !~ *"\!$"* ) set path=( $path \!$ )'
##add2path ${PVM_ROOT}/bin

## : has special meaning in cshrc, so need to escape it for it to be taken verbatim
## there is no auto shell conversion between $manpath and $MANPATH as it does for PATH
## csh is convoluted.
setenv MANPATH $MANPATH\:${PVM_ROOT}/man