Roy and Niels

Roy and Niels

Sunday, January 22, 2012

The Quick and Dirty Guide for Parallelizing FLUKA

(Single PC version)

Imagine you got a desktop or laptop PC with 4 or perhaps even 8 CPU cores available, and you want to run the Monte Carlo particle transport program  FLUKA on it using all CPU cores.
The FLUKA execution script rfluka however was designed to run in "serial" mode. That is, if you request to repeat your simulation a lot of times (say, 100) issuing the command rfluka -N0 -M100 example, each process is launched serially, instead of utilizing all available cores on your PC.

A solution can be to use a job queuing system and a scheduler. Here, I'll present one way to do it on a Debian based Linux system. Ubuntu might work just as well, since Ubuntu is very similar to Debian. A feature of the method presented here, is that it can easily be extended to cover several PCs on your network, so you can use the computing power of your colleagues when they do not use their PCs (e.g. at night). However, this post will try to make it very simple, namely set it just on your own PC. In less than 10 minutes you'll have it up and running...

The idea is to use TORQUE in a very minimal configuration. There will be no fuzz with Maui or similar schedulers, we will only use packages we can get from the Debian/Ubuntu software repositories.
In order to be friendly to all the Ubuntu users out there, all commands issued as root are here prefixed with the "sudo" command. As a Debian user you can become root using the "su" command first.

First install these packages:

$ sudo apt-get install torque-server torque-scheduler 
$ sudo apt-get install torque-common torque-mom libtorque2
and either
$ sudo apt-get install torque-client
or
$ sudo apt-get install torque-client-x11

after installation we need to setup torque properly. I here assume that your PC hostname cannot be resolved by DNS, which is quite common on small local networks. You can test whether your hostname can be resolved by the "host" command. Assuming your PC has the name "kepler", you may get an answer like:

$ host $HOSTNAME
Host kepler not found: 3(NXDOMAIN)

this means you may need to edit the /etc/hosts file, so your PC can associate an IP number with your hostname. Debian like distros may have a propensity to assign the hostname to 127.0.1.1 which will not work with torque. Instead I looked up my IP number (which in my case is pretty static) using /sbin/ifconfig, and edited the /etc/hosts accordingly, using your favourite text editor (emacs, gedit, vi...)
My /etc/hosts file ended up looking like this:

127.0.0.1 localhost
#127.0.1.1 kepler.lan kepler
192.168.1.108   kepler

If your hostname of your PC can be resolved, you can ommit the last line, but under all circumstances you must comment out the line starting with 127.0.1.1.


Once this is done, execute the following commands to configure torque:
$ sudo echo $HOSTNAME > /etc/torque/server_name
$ sudo echo $HOSTNAME > /var/spool/torque/server_name
$ sudo pbs_server -t create
$ sudo echo $HOSTNAME np=`grep proc /proc/cpuinfo | wc -l` > /var/spool/torque/server_priv/nodes 
$ sudo qterm
$ sudo pbs_server
$ sudo pbs_mom

(Update: If qterm fails, you probably have a problem with your /etc/hosts file. You can still kill the server with $killall -r "pbs_*".)

Now let's  see if things are running as expected:
$ pbsnodes -a
kepler
     state = free
     np = 4
     ntype = cluster
     status = rectime=1326926041,varattr=,jobs=,state=free,netload=3304768553,gres=,loadave=0.09,ncpus=4,physmem=3988892kb,availmem=6643852kb,totmem=7876584kb,idletime=2518,nusers=2,nsessions=8,sessions=1183 1760 2170 2271 2513 15794 16067 16607,uname=Linux kepler 3.1.0-1-amd64 #1 SMP Tue Jan 10 05:01:58 UTC 2012 x86_64,opsys=linux

and also
$sudo momctl -d 0 -h $HOSTNAME

Host: kepler/kepler   Version: 2.4.16   PID: 16835
Server[0]: kepler (192.168.1.108:15001)
  Last Msg From Server:   279 seconds (CLUSTER_ADDRS)
  Last Msg To Server:     9 seconds
HomeDirectory:          /var/spool/torque/mom_priv
MOM active:             280 seconds
LogLevel:               0 (use SIGUSR1/SIGUSR2 to adjust)
NOTE:  no local jobs detected

Now setup a queue, which here is called "batch".
$ sudo qmgr -c 'create queue batch'
$ sudo qmgr -c 'set queue batch queue_type = Execution'
$ sudo qmgr -c 'set queue batch resources_default.nodes = 1'
$ sudo qmgr -c 'set queue batch resources_default.walltime = 01:00:00'
$ sudo qmgr -c 'set queue batch enabled = True'
$ sudo qmgr -c 'set queue batch started = True'
$ sudo qmgr -c 'set server default_queue = batch'
$ sudo qmgr -c 'set server scheduling = True'

[update: you may want to increase walltime to 10:00:00 so jobs dont stop after 1 hour]

and start the scheduler:
$ sudo pbs_sched

The rest of the commands can be issued as a normal user (i.e. non-root).

Let's see if all servers are running:
$ ps -e | grep pbs
 1286 ?        00:00:00 pbs_mom
 1293 ?        00:00:00 pbs_server
 2174 ?        00:00:00 pbs_sched

Anything in the queue?
$ qstat
$ 
Nope, it's empty.

Lets try to submit a simple job
echo "sleep 20" | qsub

and within the next 20 seconds you can test, if its in the queue:
$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.kepler                 STDIN            bassler                0 R batch


Great, now were ready to rock 'n roll! This is really a minimalistic setup, which just works. For more bells and whistles, check the torque manual.

All we need, is a simple FLUKA job submission script: rtfluka.sh
#!/bin/bash
#
# how to use this
# change to directory with the files you want to run
# and enter:
# $ qsub -V -t 0-9 -d . rtfluka.sh
#
#PBS -N FLUKA_JOB
#
start="$PBS_ARRAYID"
let stop="$start+1"
stop_pad=`printf "%03i\n" $stop`
#
# Init new random number sequence for each calculation. 
# This may be a poor solution.
cp $FLUPRO/random.dat ranexample$stop_pad
sed -i '/RANDOMIZE        1.0/c\RANDOMIZE        1.0 '"${RANDOM}"'.0 \' example.inp
$FLUPRO/flutil/rfluka -N$start -M$stop example -e flukadpm3

Update: Note that your RANDOMIZE card in your own .inp file must match the sed regular expression above, else you may repeat the exact same simulation over and over again...


Let's submit 10 jobs:
$ qsub -V -t 0-9 -d . rtfluka.sh

And watch the blinkenlichts.
$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
15-0.kepler               FLUKA_JOB-0      bassler                0 R batch          
15-1.kepler               FLUKA_JOB-1      bassler                0 R batch          
15-2.kepler               FLUKA_JOB-2      bassler                0 R batch          
15-3.kepler               FLUKA_JOB-3      bassler                0 R batch          
15-4.kepler               FLUKA_JOB-4      bassler                0 Q batch          
15-5.kepler               FLUKA_JOB-5      bassler                0 Q batch          
15-6.kepler               FLUKA_JOB-6      bassler                0 Q batch          
15-7.kepler               FLUKA_JOB-7      bassler                0 Q batch          
15-8.kepler               FLUKA_JOB-8      bassler                0 Q batch          
15-9.kepler               FLUKA_JOB-9      bassler                0 Q batch 

Surely, this can be improved a lot, suggestions are most welcome in the comments below. One problem is for instance, that the random number seed is limited to a 16 bit integer, which only covers a very small fraction of the possible seeds for the RANDOMIZE card.
Update: There is also a very small risk that the same seed occasionally is used twice (or more often). Alternatively one could just add a random number to a starting seed after each run. (Any MC random number experts out there?)

Output data can be processed in regular ways, using flair
Alternatively you may use some of the scripts in the auflukatools package, which for instance can do the merging of USRBIN output with a single command. Auflukatools also includes rtfluka.sh as well as a CONDOR job submission script rcfluka.py, which is better suited for heterogenous clusters.

Finally, here is a job script for SHIELD_HITxxA, (which is even shorter):

#!/bin/bash
#
# how to use
# change to directory you want to run
# $ qsub -V -t 0-9 -d . rtshield.sh
#
#PBS -N SHIELD_JOB
shield_exe  -N$PBS_ARRAYID

Enjoy!

Totally unrelated: englishrussia.com just posted some nice pics from the Budker institute for Nuclear Physics in Novosibirsk, Russia. Certainly worth visiting, have a look at:
http://englishrussia.com/2012/01/21/the-budker-institute-of-nuclear-physics/
 :-) Heaps of pioneering accelerator technology was developed there, such as electron cooling, the first collider, lithium lenses (e.g. for capturing antiprotons), and they supplied the conventional magnets for the beam transfer lines to the LHC at CERN. I visited the center many years ago but my pics are not as good. :-/ The German wiki about Budker himself, is also worth reading.


12 comments:

  1. This is really fun! Now, I just have to find some more CPUs ...

    ReplyDelete
    Replies
    1. That will be covered in a future post... :)

      Delete
  2. Have you ever tried to run your FLUKA processes as map tasks in a Hadoop cluster? without knowing anything about it, it would seem that the Hadoop scheduling will take care of all of that for you and the merging/reduce step as well.

    ReplyDelete
    Replies
    1. Thanks for your comment. No, I have never messed with Hadoop, but I think Roy might have (you reading this, Roy?).

      So far I liked the pure torque idea, since all packages are available in the Debian repository. They are pretty good preconfigured, so you don't have to mess with generic configuration scripts and worry how to have a seamless system integration. Updates flow in automatically from Debian, again less trouble for me, since I do not have to get and install new tarballs myself.
      Finally, data merging isn't really an issue, due to flair and auflukatools.

      However, if Hadoop installation and configuration is more simple than what I mentioned in this post, or simplify other parts, let me know...

      Delete
    2. We messed around some with Fluka + Hadoop. The general mapreduce work flow doesn't seem ideal for this, but is certainly possible. Our collaborators used a tree-based overlay network (TBON), which is like a generalized mapreduce to run Fluka jobs for us on our local cluster. Of course this was more of a proof of principle test.

      We messed around with Hadoop some, but we ended up using our in house framework to run our cloud calculations. Hadoop is Java-based, which I'm not a fan of. I did briefly look at some other mapreduce frameworks, such as dumbo (python-based).

      Overall doing Monte Carlo with mapreduce sort of feels like forcing a square peg into a round hole. Monte Carlo already consists of n independent runs (n being an arbitrary number), making the whole mapping / partitioning part of mapreduce frameworks an awkward fit.

      Delete
  3. You can download the qfluka.sh form the flair website (download section). It works for torque you simply have to put the correct queue name. Add it to the flair queue system. "Tools->Preferences->Submit->right-click->Add"
    and then simply "Spawn" your input in the "Run" Frame to as many CPU's you have select the submit script and click run.
    Flair will automatically do everything for you as you describe in your page. Put a unique random number and spawn many processes, as well collect and process your results.

    ReplyDelete
    Replies
    1. Very good, thanks for pointing this out, Vasilis!

      Delete
  4. Upon doing "echo 'sleep 20' | qsub", I kept getting error like:
    $ qsub: Bad UID for job execution MSG=ruserok failed validating xxxx/xxxx from xxxx-XXXX

    Google helped me to solve the question by adding the following qmgr:
    $ sudo qmgr -c 'set server allow_node_submit = True'

    ReplyDelete
    Replies
    1. Thanks Xijun, I had the same error, but that fixed it

      Thank you Roy for the handy guide. I wanted a local version of Torque to practice writing qsubs on, but local scheduling will be useful

      Delete