Gogolist Install

You can install Gogolist by running this command in your terminal:

pip install -i "http://cerpypi.ifremer.fr/simple" gogolist

Quick Start (on Cersat Infrastructure)

Concept

[Figure: Gogolist concept diagram (_images/gogolist_concept9.png)]

Step-by-step Guide

  1. Connect to the submit host

    ssh cerhouse1
  2. Prepare your SSH environment

    • Create your SSH key pair (do not set a passphrase; just press ENTER when prompted):

      ssh-keygen -t rsa -f ~/.ssh/id_rsa_cercloud
    • Add the generated public key to your authorized_keys2 file, to allow connections without a password:

      cat ~/.ssh/id_rsa_cercloud.pub >> ~/.ssh/authorized_keys2
      chmod 600 ~/.ssh/authorized_keys2
    • Add these lines to your ~/.ssh/config (create the file if needed):

      host cerhouse* cercloud* 134.246.156.* br156-* 10.0.0.*
          IdentityFile ~/.ssh/id_rsa_cercloud
          StrictHostKeyChecking no
          UserKnownHostsFile=/dev/null
    • Set the correct file permissions if needed:

      chmod 600 ~/.ssh/config
    • To check that everything works, run ssh cerhouse1 (from cerhouse1). It should not ask you for a password:

      yourusername@cerhouse1:~> ssh cerhouse1
      Warning: Permanently added 'cerhouse1,134.246.158.137' (RSA) to the list of known hosts.
      ...
      ...
      yourusername@cerhouse1:~>

    If you get a prompt without being asked for a password, the setup is OK!
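The checks in this step can also be scripted. The following is a minimal sketch, not part of Gogolist: the helper names has_mode and check_passwordless_ssh are hypothetical, and has_mode assumes GNU stat (as found on Linux hosts). It relies on the standard ssh option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
# Sketch only: these helpers are hypothetical and not shipped with Gogolist.

# Succeed if FILE has exactly the octal permission MODE.
# Assumes GNU stat (the -c option); other platforms use a different syntax.
has_mode() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "$2" ]
}

# Succeed if HOST accepts a key-based login without any prompt.
# BatchMode=yes disables password prompts, so a missing or rejected key
# results in a non-zero exit status instead of an interactive question.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Usage (run on cerhouse1):
#   has_mode ~/.ssh/config 600 || echo "run: chmod 600 ~/.ssh/config"
#   has_mode ~/.ssh/authorized_keys2 600 || echo "run: chmod 600 ~/.ssh/authorized_keys2"
#   check_passwordless_ssh cerhouse1 && echo "passwordless login works"
```

If check_passwordless_ssh fails, the permissions on ~/.ssh/config and ~/.ssh/authorized_keys2 are the first thing to verify.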

  3. Run a distributed job and monitor it

    Let’s say that you want to run a script which takes a single number as its argument.

    In the following example, we run the “sleep” command with a number of seconds as its argument (it does nothing but wait that many seconds before exiting) on the cloudphys cluster, reserving 500 MB of RAM for each execution of “sleep NUMBER”:

    yourusername@cerhouse1:~> seq 30 90 | /home5/begmeil/tools/gogolist/bin/gogolist.py \
    --stdin --workspace ./workspace \
    --execute 'sleep' --qsub-options='-l nodes=1:cloudphys,mem=500mb' \
    --split-max-lines=1 \
    --reporting

    That’s all! If everything is running correctly, you should get a status report every minute showing the current state of the job:

    Job workspace : ./workspace/20120802/000002
    Job successfully registered in monitor. Go to : http://cercloudweb/jobsmonitor/job/29111/
    Batch Manager : torque    Job id : 54908[].cerhouse1.ifremer.fr
    job name:sleep id:54908[].cerhouse1.ifremer.fr (Q:0 / R:0 / C:0 / E:0 / H:0 / W:0 / X:0) )
    No running jobs. Remaining Jobs to process : 61
    
    (... 60 seconds later...)
    
    job name:sleep id:54908[].cerhouse1.ifremer.fr (Q:0 / R:35 / C:26 / E:0 / H:0 / W:0 / X:0) )
    Remaining Jobs to process (including currently running) : 35 [2012-08-02T17:35:55Z]
      Jobs launched: 61/61 (running: 35  terminated: 26)
      Exit OK = 26 | Exit ERROR = 0 | Lines submitted = 26/61  (42.62%)
        exec time : mean=0:00:42.576923, sum=0:18:27
    
    (... 60 seconds later...)
    
    job name:sleep id:54908[].cerhouse1.ifremer.fr (Q:0 / R:0 / C:61 / E:0 / H:0 / W:0 / X:0) )
    job completed (id=54908[].cerhouse1.ifremer.fr)
    workspace: ./workspace/20120802/000002
    Remaining Jobs to process (including currently running) : 0 [2012-08-02T17:36:55Z]
      Jobs launched: 61/61 (running: 0  terminated: 61)
      Exit OK = 61 | Exit ERROR = 0 | Lines submitted = 61/61  (100.00%)
        exec time : mean=0:01:00.098360, sum=1:01:06
    Job workspace: ./workspace/20120802/000002
    Job is TERMINATED
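To see what the example above submits, the input list can be inspected locally before piping it to gogolist.py. This is a sketch under one assumption drawn from the sample report (61 jobs for 61 input lines): with --split-max-lines=1, each input line becomes one execution of the command. The helpers preview_commands and sum_lines are hypothetical and are not Gogolist commands; nothing is submitted to the cluster.

```shell
# Sketch only: local preview helpers, not part of Gogolist.

# Read arguments on stdin (one per line) and print the command line
# that each of them would correspond to, e.g. "sleep 30".
preview_commands() {
    cmd=$1
    while IFS= read -r arg; do
        printf '%s %s\n' "$cmd" "$arg"
    done
}

# Sum the numbers read on stdin (one per line).
sum_lines() {
    awk '{ s += $1 } END { print s + 0 }'
}

# Usage:
#   seq 30 90 | wc -l                           # 61 input lines, hence 61 jobs
#   seq 30 90 | preview_commands sleep | head -n 3
#   seq 30 90 | sum_lines                       # 3660 seconds of sleep in total
```

The total of 3660 seconds (1:01:00) is close to the execution time sum of 1:01:06 shown in the final report.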
  4. Notes

    • While the job is running, you can also monitor it in the Jobsmonitor web interface, using the URL given in the output. Here: http://cercloudweb/jobsmonitor/job/29111/

      [Figure: Jobsmonitor web interface (_images/jobsmonitor19.png)]
    • Once the job is launched, you can interrupt the reporting (CTRL-C): the job will continue to run. To restart the reporting, pass the job workspace to the report command:

      /home5/begmeil/tools/gogolist/bin/gogolist.py report \
      --loop-time=10 \
      ./workspace/20120802/000002
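If you no longer have the job workspace path at hand, it can be looked up from the workspace root. The sketch below assumes the layout seen in the sample output (ROOT/YYYYMMDD/NNNNNN, so that sorting by name gives the most recent job last). The helper latest_job_workspace is hypothetical and is not a Gogolist command.

```shell
# Sketch only: assumes job workspaces are laid out as ROOT/YYYYMMDD/NNNNNN,
# as in the sample output (./workspace/20120802/000002).

# Print the most recent job workspace under ROOT, or fail if there is none.
latest_job_workspace() {
    root=$1
    latest=
    # Glob results are sorted by name, so the last directory matched
    # has the latest date and the highest sequence number.
    for dir in "$root"/[0-9]*/[0-9]*/; do
        [ -d "$dir" ] && latest=${dir%/}
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage:
#   /home5/begmeil/tools/gogolist/bin/gogolist.py report \
#       --loop-time=10 \
#       "$(latest_job_workspace ./workspace)"
```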