Gogolist Install

Install Gogolist by running this command in a terminal:

pip install -i "http://cerpypi.ifremer.fr/simple" gogolist

Gogolist usage

To get help, use the “--help” option:

user@hostname:~$ gogolist.py  --help
Usage: gogolist.py [options]

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -w FOLDER, --workspace=FOLDER
                        path to workspace directory [REQUIRED]
  --stdin               use stdin to get listing to process (instead of using
                        args). See also --split-max-lines
  -e EXECUTE_CMD, --execute=EXECUTE_CMD
                        command to execute [REQUIRED]
  --streaming           execute command giving listing with a pipe on stdin
  --qsub-options=QSUB_OPTIONS
                        options we will give to qsub
  --qsub-max-running-jobs=QSUB_MAX_RUNNING_JOBS
                        nbr jobs running at same time for qsub jobarrays
                        (modulo option in jobarray)
  --batch-manager=BATCH_MANAGER
                        batch manager : torque | pbspro | local
                        [default=torque]
  --split-max-lines=SPLIT_MAX_LINES
                        max lines for internal listing [required when using
                        --stdin].  (note : '0' will generate a temporary
                        complete listing)
  --split-max-jobs=SPLIT_MAX_JOBS
                        max jobs to run (used to get the number of lines for
                        internal listing)
  -v, --verbose         activate verbose output
  -d, --debug           run in debug mode
  --dry-run             prepare workspace and listings, but do not execute the
                        command (=> no log, no report)
  --reporting           prepare workspace and listings, but do not execute the
                        command (=> no log, no report)
  --no-register         do not register job in monitor [url=http://cercloudweb
                        /jobsmonitor/api/v1/gogolistjob/]
  -c CONFIG_FILE, --config=CONFIG_FILE
                        config file (json syntax)

To launch a task, Gogolist needs the following:

  • an executable (given with “-e”/“--execute”)
  • a listing (each line is an argument line for the executable)

Gogolist workspace

For each execution, Gogolist creates a workspace folder containing:

  • the job configuration
  • the input listings
  • the error listings
  • the reporting and monitoring information
  • the executable output logs

With the “-w” (or “--workspace”) option, the user sets the root of the workspaces (called [UserRootWorkspaceDir] in this document). Gogolist then creates a hierarchy based on the current date and an execution ID. For example, the third execution on 2012/08/02 gets the following structure:

[UserRootWorkspaceDir]/20120802/000003/{report,input,monitor,logs,output}
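A minimal Python sketch of how such a date/ID layout could be computed. It assumes execution IDs are 6-digit, zero-padded counters that restart each day, as the 000003 example suggests. `next_workspace` is a hypothetical helper, not Gogolist's own code, and it does not guard against two executions starting at the same instant.

```python
from datetime import date
from pathlib import Path

# Sub-folders from the example layout above.
SUBDIRS = ("report", "input", "monitor", "logs", "output")


def next_workspace(root: Path, day: date) -> Path:
    """Create and return [root]/YYYYMMDD/NNNNNN for the next execution of `day`.

    Hypothetical helper: assumes IDs are 6-digit, zero-padded counters that
    restart every day (e.g. 20120802/000003 for the third run that day).
    """
    day_dir = root / day.strftime("%Y%m%d")
    ids = []
    if day_dir.is_dir():
        # Existing executions of the day are the 6-digit sub-folders.
        ids = [int(p.name) for p in day_dir.iterdir()
               if p.is_dir() and len(p.name) == 6 and p.name.isdigit()]
    workspace = day_dir / f"{max(ids, default=0) + 1:06d}"
    for sub in SUBDIRS:
        # parents=True also creates day_dir and workspace on the first pass.
        (workspace / sub).mkdir(parents=True)
    return workspace
```

Calling `next_workspace(Path("/path/to/root"), date.today())` three times on the same day yields the 000001, 000002 and 000003 folders.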

Note

  • For monitoring to work, the Jobsmonitor tool must be able to reach the workspace; an NFS-shared folder is a common solution.
  • Some jobs put a heavy I/O load on the workspace. In these cases, place it on fast storage (for example, a RAID array).

User listing management

1. How do I submit a listing to Gogolist?

The user listing can be sent to Gogolist in two ways:

  1. as a Gogolist argument, giving the listing file path (default mode)

    gogolist.py [options] /path/to/userlisting.txt
    
  2. through a pipe (“--stdin” option; “--split-max-lines” is then required)

    cat /path/to/userlisting.txt | gogolist.py --stdin [options]
    

    Tip

    This is a convenient way to provide a listing to Gogolist without intermediate files. Some examples:

    • If you want to launch the process on the first five lines of your listing:
    cat /path/to/userlisting.txt | head -n5 | gogolist --stdin ...
    
    • Generate your listing dynamically and pipe it to Gogolist:
    ~user/listing_generator.sh | gogolist --stdin ...
    find /path/to/data -type f | gogolist --stdin ...
    
    • Relaunch only the listing lines that failed:
    cat [UserRootWorkspaceDir]/YYYYMMDD/XXXXXX/output/listing.err* | gogolist --stdin
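The listing generator in the examples above can be any program that prints one listing line per line. A minimal Python sketch of such a generator, roughly equivalent to `find /path/to/data -type f` but sorted and optionally filtered by file suffix; the script name `make_listing.py` and its arguments are hypothetical.

```python
import sys
from pathlib import Path
from typing import Iterator, List


def iter_listing(data_dir: Path, suffix: str = "") -> Iterator[str]:
    """Yield one listing line (a file path) per regular file under data_dir.

    Roughly `find data_dir -type f`, but sorted and optionally restricted to
    names ending with `suffix` (for example ".nc").
    """
    for path in sorted(data_dir.rglob("*")):
        if path.is_file() and path.name.endswith(suffix):
            yield str(path)


def main(args: List[str]) -> None:
    if not args:
        print("usage: make_listing.py DATA_DIR [SUFFIX]", file=sys.stderr)
        return
    suffix = args[1] if len(args) > 1 else ""
    for line in iter_listing(Path(args[0]), suffix):
        print(line)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: `python make_listing.py /path/to/data .nc | gogolist --stdin ...`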
    

2. User-listing split

Gogolist splits the user listing into sub-listings, which are then processed in parallel. The following options control how the listing is split.

Note

The number of sub-listings is the maximum number of tasks which can run in parallel.

  • “--split-max-lines”: the maximum number of lines in each sub-listing Gogolist creates.

    Example: for an input listing of 100 lines, --split-max-lines=5 creates 20 sub-listings.

  • “--split-max-jobs”: the maximum number of sub-listings Gogolist creates (Gogolist derives the number of lines per sub-listing from it).
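The split arithmetic can be sketched in Python. The semantics below are assumptions drawn from the option descriptions (0 lines meaning a single complete listing, lines per sub-listing derived as ceil(N / max_jobs)); `split_listing` is a hypothetical function, not Gogolist's code.

```python
import math
from typing import List, Optional


def split_listing(lines: List[str], max_lines: Optional[int] = None,
                  max_jobs: Optional[int] = None) -> List[List[str]]:
    """Split a user listing into consecutive sub-listings.

    Assumed semantics (a sketch, not Gogolist's code):
      * max_lines: at most this many lines per sub-listing (0 = one listing);
      * max_jobs: at most this many sub-listings, obtained by using
        ceil(len(lines) / max_jobs) lines per sub-listing.
    max_lines wins if both are given.
    """
    if not lines:
        return []
    if max_lines is None:
        if max_jobs is None or max_jobs <= 0:
            raise ValueError("give max_lines, or a positive max_jobs")
        max_lines = math.ceil(len(lines) / max_jobs)
    if max_lines < 0:
        raise ValueError("max_lines must be >= 0")
    if max_lines == 0:
        return [list(lines)]
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
```

With 100 lines, `max_lines=5` gives the 20 sub-listings of the example above; `max_jobs=8` gives 13 lines per sub-listing, hence 8 sub-listings, the last one holding 9 lines.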

3. Job submission to the batch scheduler

Gogolist supports several batch schedulers; choose yours with the “--batch-manager” option:

  • torque/maui (uses qsub command)
  • pbspro (uses qsub command)
  • oar (uses oarsub command)
  • local: Gogolist's internal scheduler, based on the Python multiprocessing module (limited options)
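The idea behind a local scheduler can be sketched with `multiprocessing.pool.ThreadPool`, which caps how many tasks run at once. This is an illustration under the assumption that each task is one external command; `run_local` and `max_running` are hypothetical names, not Gogolist's API.

```python
import subprocess
from multiprocessing.pool import ThreadPool
from typing import List, Sequence


def run_task(argv: Sequence[str]) -> int:
    """Run one task as a child process and return its exit code."""
    return subprocess.run(list(argv)).returncode


def run_local(tasks: List[Sequence[str]], max_running: int = 4) -> List[int]:
    """Run all tasks, never more than `max_running` at the same time.

    Threads are enough: each one only waits for its child process.
    Exit codes are returned in the same order as `tasks`.
    """
    with ThreadPool(processes=max_running) as pool:
        return pool.map(run_task, tasks)
```

Threads rather than worker processes are used here because the real work happens in the child processes; the pool only limits concurrency.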

You may need to pass scheduler-specific options with “--qsub-options”, typically to set the cluster name and resource reservations. Example:

--qsub-options="-l nodes=1:cercloudcluster,mem=2gb"
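When launching Gogolist from a Python script, building the command as an argument list avoids shell quoting issues such as the space inside the “--qsub-options” value. A sketch using only options shown by “--help”; `gogolist_argv` is a hypothetical helper.

```python
import shlex
from typing import List, Optional


def gogolist_argv(workspace: str, execute: str, listing: str,
                  batch_manager: str = "torque",
                  qsub_options: Optional[str] = None,
                  split_max_lines: Optional[int] = None) -> List[str]:
    """Build a gogolist.py command line from options shown by --help."""
    argv = ["gogolist.py",
            f"--workspace={workspace}",
            f"--execute={execute}",
            f"--batch-manager={batch_manager}"]
    if qsub_options:
        # One argv element: no shell quoting needed for the embedded space.
        argv.append(f"--qsub-options={qsub_options}")
    if split_max_lines is not None:
        argv.append(f"--split-max-lines={split_max_lines}")
    argv.append(listing)  # listing file as positional argument (default mode)
    return argv


if __name__ == "__main__":
    argv = gogolist_argv("/path/to/workspaces", "myexecutable",
                         "/path/to/userlisting.txt",
                         qsub_options="-l nodes=1:cercloudcluster,mem=2gb")
    print(shlex.join(argv))  # equivalent shell command line
```

To submit the job, pass the list to `subprocess.run(argv, check=True)`.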

4. Sub-listings processing

In “default” mode, each sub-listing is processed sequentially: the executable is launched once per line, with that line as its arguments.

myexecutable sublisting_lineX

In “streaming” mode, the sub-listing is sent to the executable's standard input through a Unix pipe (“|”).

cat /path/to/sublistingY.txt | myexecutable

In “streaming-in-exec” mode, the executable is launched once and receives the sub-listing path as an argument.

myexecutable /path/to/sublistingY.txt
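The three modes can be contrasted in a short Python sketch that plans and runs the child processes for one sub-listing. The mode names come from this page; `plan`, `run`, and splitting each line with `shlex.split` (shell-like word splitting) are assumptions, not Gogolist's implementation.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# A planned invocation: (argv, text fed to stdin, or None to inherit stdin).
Invocation = Tuple[List[str], Optional[str]]


def plan(mode: str, executable: str, sublisting: Path) -> List[Invocation]:
    """Return the invocations needed to process one sub-listing in `mode`."""
    exe = shlex.split(executable)
    if mode == "default":
        # One launch per non-empty line; the line supplies the arguments.
        return [(exe + shlex.split(line), None)
                for line in sublisting.read_text().splitlines()
                if line.strip()]
    if mode == "streaming":
        # One launch; the whole sub-listing goes to stdin (cat file | exe).
        return [(exe, sublisting.read_text())]
    if mode == "streaming-in-exec":
        # One launch; the sub-listing path is the last argument.
        return [(exe + [str(sublisting)], None)]
    raise ValueError(f"unknown mode: {mode!r}")


def run(invocations: List[Invocation]) -> List[int]:
    """Run the invocations one after another and return their exit codes."""
    return [subprocess.run(argv, input=stdin, text=True).returncode
            for argv, stdin in invocations]
```

For a sub-listing of n lines, `plan("default", ...)` returns n invocations while both streaming modes return exactly one.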

Tip

The streaming modes pay off when launching the executable n times (once per line) takes much longer than launching it once for n lines. Typical reasons are:

  • Initialization time of the executable
  • Post-processing
  • Network connection and session setup
  • ...
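A back-of-the-envelope cost model makes this tip concrete. Assume (a simplification, not a measurement) that each launch of the executable costs a fixed overhead t_init, each line costs t_line of actual work, and one sub-listing of n lines is processed sequentially:

```latex
T_{\mathrm{default}} = n\,(t_{\mathrm{init}} + t_{\mathrm{line}}), \qquad
T_{\mathrm{streaming}} = t_{\mathrm{init}} + n\,t_{\mathrm{line}}
```

For example, with n = 1000, t_init = 2 s and t_line = 0.1 s, the default mode takes 1000 × 2.1 = 2100 s against 2 + 100 = 102 s in a streaming mode, about 20 times faster. When t_init is small compared with t_line, both modes cost about the same.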

Monitoring tools

Gogolist provides monitoring tools to follow the processing progress and report on it.

To enable the progress report, use the “--reporting” option.

By default, Gogolist registers each new job in a web-based monitoring tool (Jobsmonitor); use the “--no-register” option to disable this.

Configuration file

The configuration file (JSON syntax, see the “-c”/“--config” option) is optional. It overrides some default parameters.

Gogolist looks for the configuration file in this order:

  1. an environment variable
  2. the home directory
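The lookup order above could be sketched as follows. Everything specific here is hypothetical, since this page does not name it: the environment variable `GOGOLIST_CONFIG`, the file name `~/.gogolist.json`, and the `batch_manager` key used in the example. The sketch only shows the order (environment variable first, then home directory) and a JSON overlay on default values.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional

# HYPOTHETICAL names: the documentation does not give them.
ENV_VAR = "GOGOLIST_CONFIG"
HOME_FILE = ".gogolist.json"


def find_config(env: Optional[Mapping[str, str]] = None,
                home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file: env var first, then home dir."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    candidate = env.get(ENV_VAR)
    if candidate and Path(candidate).is_file():
        return Path(candidate)
    home_file = home / HOME_FILE
    if home_file.is_file():
        return home_file
    return None


def load_config(defaults: Dict[str, Any],
                env: Optional[Mapping[str, str]] = None,
                home: Optional[Path] = None) -> Dict[str, Any]:
    """Overlay the JSON config file (if any) on a copy of the defaults."""
    merged = dict(defaults)
    path = find_config(env, home)
    if path is not None:
        merged.update(json.loads(path.read_text()))
    return merged
```

If the environment variable points to a missing file, this sketch falls back to the home directory; Gogolist's actual behavior may differ.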

to be continued...