This is the old DOUG wiki; please go to http://www.dougdevel.org/trac/wiki

Profiling

This page describes the profiling methods that have been tried with DOUG and the issues encountered.

A list of performance tools is available at http://www.llnl.gov/computing/tutorials/performance_tools/

Free

gprof

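Works if the GMON_OUT_PREFIX environment variable is set, so that each process writes its own profile file (named <prefix>.<pid>) instead of a single gmon.out.

Try compiling with the -p or -pg compiler flag.

  • gprof for MPI hint

The following should be added or changed in the PBS script:

  • Set the GMON_OUT_PREFIX environment variable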
  • Pass it to mpirun with -x (in the case of LAM-MPI)
...
export GMON_OUT_PREFIX=dougmon
...
mpirun -x GMON_OUT_PREFIX -np $NP /home/olegus/doug/doug_trunk/src/main/doug_main
...

Run gprof to see the results:

gprof /home/olegus/doug/doug_trunk/src/main/doug_main $GMON_OUT_PREFIX.*
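
With more than a few processes the list of profile files grows quickly. The following is a minimal sketch of how the per-process files could be listed and merged into one report. The function names are hypothetical, the prefix is assumed to contain no whitespace, and the merge relies on gprof's -s option, which writes the summed profile to gmon.sum in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print every per-process profile file for a prefix,
# one per line. The files are named <prefix>.<pid>.
list_gmon_files() {
    prefix=$1
    for f in "$prefix".*; do
        # An unmatched glob stays literal, so check that the file exists.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Hypothetical helper: merge all per-process profiles into gmon.sum
# (gprof -s) and then print the combined report.
merge_gmon_files() {
    exe=$1
    prefix=$2
    files=$(list_gmon_files "$prefix")
    [ -n "$files" ] || { echo "no $prefix.* files found" >&2; return 1; }
    # Word splitting of $files is intended: one argument per file.
    gprof -s "$exe" $files > /dev/null && gprof "$exe" gmon.sum
}
```

For the run above this could be called as merge_gmon_files /home/olegus/doug/doug_trunk/src/main/doug_main dougmon.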

OCM (OMIS Compliant Monitoring system)

Not tried on DOUG, because the OCM source could not be found and OCM-G adds a complication (a Globus dependency).

The goal of the OCM project is a monitoring system for PVM and MPI (MPICH) with monitoring, profiling and checkpointing capabilities. The project appears to have been sacrificed in favour of its Grid extension, OCM-G; at least, I could not find source or binaries to download.

OCM-G is an extension of OCM for the Grid (Globus).

mpiP

mpiP: Lightweight, Scalable MPI Profiling

Homepage: http://mpip.sourceforge.net/

Download it to some folder.

Configure mpiP on kuu:

  • Add /usr/local/mpi/ifort/lam/bin to your path
  • Run configure
  PATH=/usr/local/mpi/ifort/lam/bin:$PATH
  
  ./configure --with-include="-I/usr/include -I/usr/local/mpi/ifort64/lam/include" \
    --with-ldflags="-L/usr/lib64 -L/usr/local/mpi/lib64 -L/usr/lib \
    -L/usr/local/mpi/ifort64/lam/lib" --with-f77=/opt/intel/fce/9.1.037/bin/ifort \
    FFLAGS="-g -O2"

Now configure DOUG (link it against the profiling libraries).

  export LD_LIBRARY_PATH=/opt/intel/fce/9.1.037/lib
   
  ./configure --with-mpi=/usr/local/mpi/ifort64/lam CPPFLAGS=-I/usr/local/mpi/include \
      LDFLAGS="-L/usr/local/mpi/lib64 -L/home/username/mpiP-3.1.0/lib \
      -lmpiP -lbfd -liberty -lm -lrt" --disable-shared FCFLAGS=-g 

Note: -lrt is added because otherwise clock_gettime() is not found.

Compile DOUG:

  make

Now run DOUG as usual:

  mpirun -np 1 /home/username/doug/src/main/doug_main 

mpiP prints some information at the beginning and at the end of the program output. The profiling results are written to a *.mpiP file in the current directory.

If DOUG fails to start because of missing libraries (libimf.so, libirc.so), they were not found on the other nodes. On kuu they are in /opt/intel/fce/9.1.037/lib, which is why LD_LIBRARY_PATH was set to that directory above. You can copy them to your home folder (/home/username/lib) and set LD_LIBRARY_PATH in your .bash_profile file:

  export LD_LIBRARY_PATH=/home/username/lib
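
To find out which libraries a node cannot resolve, ldd can be run on that node. The sketch below filters its output; the function name is hypothetical and it assumes ldd's usual "name => not found" line format.

```shell
#!/bin/sh
# Hypothetical helper: read `ldd` output on stdin and print the names of
# the libraries reported as "not found", one per line.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

On a compute node this could be used as ldd /home/username/doug/src/main/doug_main | missing_libs; empty output means that all libraries were found.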

If you run a DOUG binary linked against these profiling libraries with more than 1 process, you may get a segmentation fault similar to the following:

  forrtl: severe (174): SIGSEGV, segmentation fault occurred
  Image              PC                Routine            Line        Source
  doug_main          0000000000432148  Unknown               Unknown  Unknown
  doug_main          00000000004C4BB3  Unknown               Unknown  Unknown
  doug_main          000000000040775D  Unknown               Unknown  Unknown
  doug_main          00000000004080EF  Unknown               Unknown  Unknown
  doug_main          0000000000406BEA  Unknown               Unknown  Unknown
  libc.so.6          000000312AD1C3FB  Unknown               Unknown  Unknown
  doug_main          0000000000406B2A  Unknown               Unknown  Unknown 

Here is an example of an mpiP log file of doug_main run with only 1 process.

MPE (MPI Parallel Environment) and Jumpshot-4

http://www-unix.mcs.anl.gov/perfvis/

MPE is a profiling and tracing library that is part of the MPICH implementation. It also works with other MPI implementations and can be downloaded separately. Jumpshot-4 is a visualiser for the log files. The easiest way to use MPE, however, is with MPICH (or MPICH2 with MPE2).

Understanding linking

This section describes how the MPE profiling code is inserted and may help in understanding why profiling does not work.

When libraries are linked into the final executable, the order in which they appear on the command line matters: a library can use routines only from libraries listed after it. In the following example lib1 routines can use lib2 routines, but not the other way round.

gcc -o myexe my.o my2.o -llib1 -llib2

For profiling we need the following order (LAM-MPI example):

gcc -o myexe my.o mylib.a -llamf77mpi -llmpe -lmpe -lmpi ...

The important part is that lmpe and mpe must come after mylib.a and lamf77mpi, but before mpi. The program calls the FORTRAN MPI wrapper routines, which call the MPI routines from the profiling libraries, which in turn call the actual MPI (PMPI_*) routines of the MPI implementation (program -> FORTRAN wrappers -> profiling wrappers -> actual MPI routines). If the profiling libraries are placed after the mpi library or before the FORTRAN wrapper library, the FORTRAN wrappers call the actual MPI routines directly and the profiling routines are not used.

Here is what happens if you use mpif77 (LAM-MPI on kuu):

[olegus@kuu assembled]$ mpif77 -show -o myexe my.o mylib.a -llmpe -lmpe
/opt/intel/fce/9.1.037/bin/ifort -I/usr/local/mpi/ifort64/lam/include -pthread 
 -o myexe my.o mylib.a -llmpe -lmpe -L/usr/local/mpi/ifort64/lam/lib 
 -llammpio -llamf77mpi -lmpi -llam -laio -laio -lutil -ldl

The profiling libraries are inserted before lamf77mpi (before all LAM-MPI libraries), so profiling will not work. One possible solution is to specify lamf77mpi explicitly on the command line before the profiling libraries:

[olegus@kuu assembled]$ mpif77 -show -o myexe my.o mylib.a -llamf77mpi -llmpe -lmpe
/opt/intel/fce/9.1.037/bin/ifort -I/usr/local/mpi/ifort64/lam/include -pthread 
 -o myexe my.o mylib.a -llamf77mpi -llmpe -lmpe -L/usr/local/mpi/ifort64/lam/lib 
 -llammpio -llamf77mpi -lmpi -llam -laio -laio -lutil -ldl
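
The ordering rule can also be checked mechanically. The following is a minimal sketch, assuming the LAM-MPI library names used above; the function name is hypothetical, and only the first occurrence of each library on the command line is considered.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the link command passed as
# arguments lists -llamf77mpi before -llmpe and -lmpe, and both of those
# before -lmpi. Only the first occurrence of each library is considered.
mpe_link_order_ok() {
    i=0 f77=0 lmpe=0 mpe=0 mpi=0
    for word in "$@"; do
        i=$((i + 1))
        case $word in
            -llamf77mpi) [ "$f77" -eq 0 ] && f77=$i ;;
            -llmpe)      [ "$lmpe" -eq 0 ] && lmpe=$i ;;
            -lmpe)       [ "$mpe" -eq 0 ] && mpe=$i ;;
            -lmpi)       [ "$mpi" -eq 0 ] && mpi=$i ;;
        esac
    done
    # All four libraries must be present (position 0 means "not seen").
    [ "$f77" -gt 0 ] && [ "$lmpe" -gt 0 ] &&
        [ "$mpe" -gt 0 ] && [ "$mpi" -gt 0 ] || return 1
    # FORTRAN wrappers first, then profiling wrappers, then the MPI library.
    [ "$f77" -lt "$lmpe" ] && [ "$f77" -lt "$mpe" ] &&
        [ "$lmpe" -lt "$mpi" ] && [ "$mpe" -lt "$mpi" ]
}
```

For example, mpe_link_order_ok $(mpif77 -show -o myexe my.o mylib.a -llamf77mpi -llmpe -lmpe) && echo "order ok" would print "order ok" for the second command above and nothing for the first.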

kuu + lam-ifort

It suffices to add LDFLAGS="-llamf77mpi -llmpe -lmpe" to the configure command line. When doug_main and doug_aggr are run, a CLOG2 file appears in the current directory.

It may be much simpler to (i) delete doug_aggr, (ii) run make in src/main, (iii) copy the link command into the shell and edit it as described in the previous section (add -llmpe -lmpe in the correct place) and (iv) run it (i.e. link doug_aggr from the command line).

kuu + mpich

For example, on the kuu cluster, build and install MPICH2 with the MPE routines:

  ./configure --enable-mpe --prefix=/home/<username>/<mpich2_installation_folder>

Add MPICH2 to your PATH and library path in .bash_profile:

  export PATH=/home/<username>/<mpich2_installation_folder>/bin:$PATH
  export LD_LIBRARY_PATH=/home/<username>/<mpich2_installation_folder>/lib:$LD_LIBRARY_PATH

Create the file mpd.hosts in your home folder with the following contents:

  kuu:2
  node2:2
  node3:2
  node4:2
  node5:2
  node6:2
  node7:2
  node8:2
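
The host list can also be generated rather than typed. This is a small sketch that assumes the kuu naming scheme shown above (head node kuu, compute nodes node2 to node8); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an mpd.hosts file with <cpus> processes on the
# head node kuu and on node2..node<last>.
make_mpd_hosts() {
    last=$1
    cpus=$2
    echo "kuu:$cpus"
    n=2
    while [ "$n" -le "$last" ]; do
        echo "node$n:$cpus"
        n=$((n + 1))
    done
}
```

Running make_mpd_hosts 8 2 > "$HOME/mpd.hosts" would reproduce the file listed above.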

Link DOUG against MPE profiling/tracing libraries:

  ./configure --with-mpi=/home/<username>/<mpich2_installation_folder> \
  CPPFLAGS=-I/usr/local/mpi/include \
  LDFLAGS="-L/usr/local/mpi/lib64 \
  -L/home/<username>/<mpich2_installation_folder>/lib -lfmpich -ltmpe" \
  --disable-shared FCFLAGS=-g

Modify the job submission script:

  #!/bin/sh
  ### Job name
  #PBS -N DOUG_MPE
  #PBS -l nodes=8:ppn=2
  NP=16
  ### Run the application
  mpdboot -n 8
  cd $PBS_O_WORKDIR
  mpirun -np $NP /home/<username>/DOUG/src/main/doug_main
  mpdallexit
  exit 0
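
The values NP=16 and -n 8 are hard-coded to match nodes=8:ppn=2. As a sketch, they could instead be derived from the node file that PBS provides in $PBS_NODEFILE, which lists one line per allocated processor slot. The function names are hypothetical, and this has not been tried on kuu.

```shell
#!/bin/sh
# Hypothetical helpers: derive the process and host counts from a PBS node
# file, which lists one host name per allocated processor slot.

# Number of processor slots, i.e. the value for mpirun -np.
count_slots() { wc -l < "$1" | tr -d ' '; }

# Number of distinct hosts, i.e. the value for mpdboot -n.
count_hosts() { sort -u "$1" | wc -l | tr -d ' '; }
```

In the job script this would read NP=$(count_slots "$PBS_NODEFILE") and mpdboot -n "$(count_hosts "$PBS_NODEFILE")".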

When the job has finished, there will be a log file with the default name "Unknown.clog2" in the directory from which the job was launched. To analyze it with Jumpshot-4, it has to be converted to the SLOG-2 format:

  clog2TOslog2 Unknown.clog2 

Now launch Jumpshot-4 and open the resulting slog2 file.

Proprietary

ITA (Intel Trace Analyzer, formerly known as Vampir)

Not tried, as no licence is available.
