Multiple serial jobs

High Throughput Computing on BlueGene/P - BG

Section contributed by IICT-BAS


The IBM Blue Gene/P machine is split into partitions, each of which has its own I/O node. A partition can be run in one of two modes. The more natural one is the HPC mode, in which a parallel job occupies the whole partition. The other mode is called High Throughput Computing (HTC); in this mode a partition of the machine can be allocated to many independent single-node jobs, a type of operation more typical of Grid computing. On the Blue Gene/P in Sofia one partition runs in HTC mode so that configuration scripts can be executed and return their output immediately.

HTC mode is well suited to long-running workloads that consist of many independent tasks: if a node fails, the remaining tasks can simply continue on the other nodes instead of the whole computation having to start from scratch. The code that runs on the compute nodes is also much cleaner, since it contains only the work to be performed and leaves the coordination to a script or scheduler. In our case HTC provides 128 compute nodes, each with a 4-core CPU. In principle it is possible to make the Blue Gene/P work like a cluster using HTC; however, this is not a typical use case for the Blue Gene and is not the most efficient way of using the machine's capabilities.

High Throughput Computing on BlueGene/P - UVT

Section contributed by IICT-BAS

Besides traditional MPI processing with inter-process communication, the IBM Blue Gene/P supercomputer supports a special processing paradigm that allows multiple instances of the same application to execute on different sets of data without any communication between them. This paradigm is called HTC, high throughput computing. The design of HTC is simple: it focuses on running a large number of short jobs, where each job normally runs the same code but with a different set of input data. In HTC mode a BG/P compute node can handle one, two or four different jobs, depending on the execution mode (SMP, dual or virtual node) in which the partition was booted.

No specific modification of the source code is needed to support HTC: the code is compiled in the same way as for HPC mode, but without MPI support and with whatever optimization options suit the specific application.
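As an illustration, a typical HTC application could be a plain serial program like the minimal sketch below (a hypothetical example, not taken from the BG/P documentation; the input file argument and the simple summation it performs are purely illustrative). It contains no MPI calls; each instance receives the name of its own input file on the command line and processes it independently of all other instances.

 #include <stdio.h>
 #include <stdlib.h>
 
 int
 main(int argc, char *argv[])
 {
    FILE *in;
    double value, sum = 0.0;
 
    if (argc != 2) {
       fprintf(stderr, "usage: %s <input-file>\n", argv[0]);
       exit(EXIT_FAILURE);
    }
 
    /* Each HTC instance works only on the data set it was given. */
    in = fopen(argv[1], "r");
    if (in == NULL) {
       perror(argv[1]);
       exit(EXIT_FAILURE);
    }
 
    /* The actual work: here simply accumulate the numbers in the file. */
    while (fscanf(in, "%lf", &value) == 1)
       sum += value;
    fclose(in);
 
    printf("%s: sum = %f\n", argv[1], sum);
    return EXIT_SUCCESS;
 }

Each instance of such a program is then submitted as a separate HTC job with its own input file, so the set of input files effectively plays the role of the job parameters.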

An HTC job is submitted and handled by the submit tool, which is similar to the mpirun tool. The only difference is that for HTC jobs the compute node partition must be booted beforehand, and after the job has finished the partition must be destroyed manually. The submit tool also acts as a proxy for stdout, stdin and stderr, and each job can be configured with its own standard input/output/error redirection.

HTC jobs do not require any special code modification, but the BG/P system software provides routines for checking the type of job execution environment at run time using the BG/P personality:

 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
 #include <common/bgp_personality.h>
 #include <common/bgp_personality_inlines.h>
 #include <spi/kernel_interface.h>
 
 int
 main(void)
 {
    _BGP_Personality_t pers;
 
    /* Ask the compute node kernel for this node's personality,
       which describes how the partition was booted. */
    if (Kernel_GetPersonality(&pers, sizeof(pers)) == -1) {
       fprintf(stderr, "could not get personality\n");
       exit(EXIT_FAILURE);
    }
 
    if (pers.Kernel_Config.NodeConfig & _BGP_PERS_ENABLE_HighThroughput) {
       /* HTC job code */
    } else {
       /* non-HTC job code */
    }
 
    return EXIT_SUCCESS;
 }
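The personality structure returned by Kernel_GetPersonality() is filled in by the compute node kernel and describes how the node and its partition were booted, so a single binary can use this check to select between HTC-specific and regular behaviour at run time.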

Parametric studies on HPC clusters

Section to be contributed by IMBB