System software, middleware and programming environments

System software

Local Resource Management System

Open Grid Scheduler

Home page: http://gridscheduler.sourceforge.net

This is an open source fork of Sun Grid Engine.

cpuset integration on the SGI UV 1000 machine

SGI UV 1000: http://www.sgi.com/products/servers/uv/specs.html

We recommend the following OGS version for cpuset integration: OGS/GE 2011.11p1

Browser URL: http://gridscheduler.svn.sourceforge.net/viewvc/gridscheduler/tags/GE2011.11p1/

Details about SVN access: http://sourceforge.net/scm/?type=svn&group_id=343697
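
The tag can be checked out with Subversion. A hypothetical checkout command, assuming the standard SourceForge SVN layout implied by the browse URL above:

svn co http://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/tags/GE2011.11p1 GE2011.11p1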

Wrapper script for sge_shepherd

Please check these variables: START_CPU, CPU_PER_SLOT, SGE_CPUSET, SGE_ADMIN.

#!/bin/bash
# 
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: sge_shepherd_cpuset_wrapper.sh
#
# 0-5 CPU cores (boot cpuset)
# 6-11 CPU cores (reserve cpuset)
# 12- CPU cores (SGE cpuset)
#

START_CPU=12
CPU_PER_SLOT=6
SGE_CPUSET="/dev/cpuset/sge"
SGE_ADMIN="ogs-admin@example.com"

# Parse JOB_ID and SGE_TASK_ID from the job's "environment" file (KEY=VALUE lines)
while IFS== read key val
do
    case "${key}" in
        JOB_ID) JOB_ID="${val}";;
        SGE_TASK_ID) SGE_TASK_ID="${val}";;
    esac
done <environment

if [ "${SGE_TASK_ID}" != "undefined" ]
then
   sge_id=${JOB_ID}-${SGE_TASK_ID}
else
   sge_id=${JOB_ID}
fi

if [ "`cat ${PWD}/pe_hostfile | cut -d " " -f 4`" == "UNDEFINED" ]
then
  mail -s "ERROR!!! There is no cpuset allocation by this job: ${sge_id}" $SGE_ADMIN
  exec /usr/bin/sge_shepherd "$@"
  exit 0
fi

# Compute absolute CPU ids from the 4th field of pe_hostfile (socket,core pairs):
# cpu = START_CPU + socket * CPU_PER_SLOT + core
SGE_BINDING=`cat ${PWD}/pe_hostfile | cut -d " " -f 4 | sed 's/:/\n/g;s/,/ /g' | awk -v cpu_per_slot=${CPU_PER_SLOT} -v start_cpu=${START_CPU} '{print start_cpu + ($1 * cpu_per_slot) + $2 }' | awk '{printf "%d ", $1}'`

# Map absolute CPU ids to NUMA node ids (node = cpu / CPU_PER_SLOT)
function get_nodes() {
  for cpu_id in $1
  do
    nodes="${nodes} `expr ${cpu_id} / ${CPU_PER_SLOT}`"
  done 
  echo `echo ${nodes} | sed 's/ /\n/g' | sort | uniq | sed 's/\n/ /g'`
}

if [ ! -d ${SGE_CPUSET}/${sge_id} ]
then
  mkdir ${SGE_CPUSET}/${sge_id}
fi

NODE_BINDING=`get_nodes "${SGE_BINDING}"`

cpus=`echo ${SGE_BINDING} | sed "s/ /,/g"`
echo ${cpus} > ${SGE_CPUSET}/${sge_id}/cpus
nodes=`echo ${NODE_BINDING} | sed "s/ /,/g"`
echo ${nodes} > ${SGE_CPUSET}/${sge_id}/mems

echo 1 > ${SGE_CPUSET}/${sge_id}/notify_on_release 
echo $$ > ${SGE_CPUSET}/${sge_id}/tasks 

export SGE_BINDING NODE_BINDING

exec /usr/bin/sge_shepherd "$@"
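
For reference, the wrapper reads the fourth field of pe_hostfile, which holds the core binding selected by the scheduler as colon-separated socket,core pairs. A hypothetical line (host and queue names are illustrative):

uv42 2 test.q@uv42 0,0:0,1

With START_CPU=12 and CPU_PER_SLOT=6, the pair 0,0 maps to CPU 12 and 0,1 to CPU 13 (cpu = START_CPU + socket * CPU_PER_SLOT + core).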

JSV script

Please check these variables: cpu_per_slot, h_vmem

#!/bin/bash
# 
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: cpuset_jsv.sh
#

jsv_on_start()
{
   return
}

jsv_on_verify()
{
  cpu_per_slot="6"
  slots=$cpu_per_slot
  serial_job="false"

  if [ "`jsv_get_param pe_name`" != "" ]; then
     pe_min=`jsv_get_param pe_min`
     pe_min_remainder=`expr $pe_min % $cpu_per_slot`
     pe_min_int=`expr $pe_min / $cpu_per_slot`
     pe_max=`jsv_get_param pe_max`
     pe_max_remainder=`expr $pe_max % $cpu_per_slot`
     pe_max_int=`expr $pe_max / $cpu_per_slot`
     if [ "$pe_max" == "9999999" ]; then
         # an open slot range was requested; pe_max will always be made equal to pe_min
         if [ "$pe_min_remainder" != "0" ]; then
            pe_min=`expr $pe_min_int \* $cpu_per_slot + $cpu_per_slot`
            jsv_set_param pe_min $pe_min
         fi
         jsv_set_param pe_max $pe_min
         slots=$pe_min
     else
         if [ "$pe_max_remainder" != "0" ]; then
            pe_max=`expr $pe_max_int \* $cpu_per_slot + $cpu_per_slot`
            jsv_set_param pe_max $pe_max
         fi
         jsv_set_param pe_min $pe_max
         slots=$pe_max
     fi
     jsv_set_param binding_amount $slots
  else
       jsv_set_param binding_amount $cpu_per_slot
       jsv_set_param pe_name "serial"
       jsv_set_param pe_min $cpu_per_slot
       jsv_set_param pe_max $cpu_per_slot
       serial_job="true"
  fi

  if [ "`jsv_is_param t_max`" != "false" ]; then
     if [ "`jsv_get_param pe_name`" == "serial" ]; then
       jsv_set_param pe_name "array"
     fi
  fi

  jsv_set_param binding_strategy "linear_automatic"
  jsv_set_param binding_type "pe"

  l_hard=`jsv_get_param l_hard`
  if [ "$l_hard" != "" ]; then
    has_h_vmem=`jsv_sub_is_param l_hard h_vmem`
    if [ "$has_h_vmem" = "true" ]; then
       # enforce the per-core memory limit, overriding the user's request
       jsv_sub_add_param l_hard h_vmem 5.2G
    fi
  else
    jsv_sub_add_param l_hard h_vmem 5.2G
  fi

  jsv_correct "Job was modified before it was accepted: CPU/NODE binding added"

  return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

jsv_main

The JSV runs on the submit host, therefore the following line should be added to $SGE_ROOT/$SGE_CELL/common/sge_request:

-jsv  /usr/share/gridengine/scripts/cpuset_jsv.sh

Additional information:

The JSV script rounds the slot count of parallel jobs up to a multiple of six. This makes it possible to allocate whole CPU sockets instead of individual CPU cores; on a NUMA machine, CPUs are faster when they use their "local" memory.
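
A worked example of the rounding, following the script's arithmetic (cpu_per_slot=6; a request of 10 slots is assumed):

pe_max=10
pe_max_int=`expr $pe_max / 6`        # 1
pe_max_remainder=`expr $pe_max % 6`  # 4, non-zero, so round up
pe_max=`expr $pe_max_int \* 6 + 6`   # 12, i.e. two full sockets
echo $pe_max                         # prints 12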

Dummy parallel environments for non-parallel (serial) jobs

Commands:

qconf -ap serial

pe_name            serial
slots              3000
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      max
accounting_summary FALSE

qconf -ap array

pe_name            array
slots              3000
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      max
accounting_summary FALSE

Adding consumable resources to the UV machine

Commands:

qconf -mc

#name               shortcut   type        relop requestable consumable default  urgency
h_vmem              h_vmem     MEMORY      <=    YES         YES        5.2G     0

qconf -me uv

complex_values        slots=1140,h_vmem=5929G

Additional information: 5.2G is the memory limit for one CPU core (for a parallel job this value is multiplied by the number of slots). The SGE cpuset contains 190 memory nodes, therefore complex_values is 190 * 6 * 5.2G = 5928G, plus 1G that OGS appears to require (the reason is unknown).

One memory node contains 6 CPU cores and 32 GB of memory.

Example:

numactl --hardware | grep "node 2 "

Output:

node 2 cpus: 12 13 14 15 16 17
node 2 size: 32768 MB
node 2 free: 31759 MB

Checkpointing: BLCR integration

Home page: https://ftg.lbl.gov/projects/CheckpointRestart/

BLCR scripts for SGE integration

Creation of BLCR checkpointing environment

Command:

qconf -ackpt BLCR

ckpt_name BLCR
interface APPLICATION-LEVEL
ckpt_command /usr/share/gridengine/scripts/blcr_checkpoint.sh $job_id $job_pid
migr_command /usr/share/gridengine/scripts/blcr_migrate.sh $job_id $job_pid
restart_command none
clean_command /usr/share/gridengine/scripts/blcr_clean.sh $job_id $job_pid
ckpt_dir none
signal none
when xsmr
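
The blcr_checkpoint.sh, blcr_migrate.sh and blcr_clean.sh helper scripts are site-specific and are not reproduced on this page. A minimal sketch of what blcr_checkpoint.sh could look like, assuming the job was started under BLCR's cr_run (so its process is checkpointable) and using BLCR's cr_checkpoint utility; the context file name is an assumption:

#!/bin/bash
# Hypothetical blcr_checkpoint.sh: checkpoint a running job process with BLCR.
job_id=$1
job_pid=$2
# Save the process state (including open files) into a per-job context file.
cr_checkpoint --save-all -f context.${job_id} ${job_pid}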

Adding to queue

Command:

qconf -mq test.q

ckpt_list             BLCR
starter_method        /usr/share/gridengine/scripts/blcr_submit.sh

Job submission

qsub -ckpt BLCR -r yes job.sh

GPU integration

Queue creation

Command:

qconf -aq gpu.q

qname                 gpu.q
hostlist              gpu1 gpu2
seq_no                0
load_thresholds       np_load_avg=1.1,mem_free=2G
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               NONE
rerun                 FALSE
slots                 24
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

Wrapper script for sge_shepherd

#!/bin/bash
# 
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: sge_shepherd_gpu_wrapper.sh
#

# Read the requested GPU count from the job's "environment" file (set by the JSV)
while IFS== read key val
do
    case "$key" in
        GPU_NUMBER) GPU_NUMBER="$val";;
    esac
done <environment

CUDA_VISIBLE_DEVICES=""

for gpu_id in `/usr/bin/nvidia-smi -L | cut -d ':' -f 1| awk '{printf "%s ", $2}'`
do
  # keep this GPU only if it is currently idle (no running compute processes)
  /usr/bin/nvidia-smi -i $gpu_id | grep "No running compute processes found" > /dev/null
  if [ $? == 0 ]
  then
     if [ ! -z "$CUDA_VISIBLE_DEVICES" ]
     then
       CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES,$gpu_id"
     else
       CUDA_VISIBLE_DEVICES="$gpu_id"
     fi
     GPU_NUMBER=`expr $GPU_NUMBER - 1`
  fi
  fi

  if [ "$GPU_NUMBER" == "0" ]
  then
    break
  fi

done

export CUDA_VISIBLE_DEVICES

exec /usr/bin/sge_shepherd "$@"
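
For reference, the GPU id list is parsed from nvidia-smi -L output of the following form (device names and UUIDs are illustrative); the cut/awk pipeline keeps only the leading indices 0, 1, ...:

GPU 0: Tesla M2070 (UUID: GPU-...)
GPU 1: Tesla M2070 (UUID: GPU-...)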

Wrapper script configuration for the gpu1 and gpu2 machines:

qconf -mconf gpu1
qconf -mconf gpu2

shepherd_cmd      /usr/share/gridengine/scripts/sge_shepherd_gpu_wrapper.sh

Adding a consumable resource to the GPU machines

Commands:

qconf -mc

#name                 shortcut              type        relop requestable consumable default  urgency
gpu                   gpu                   INT         <=      YES         YES        0        0

qconf -me gpu1
qconf -me gpu2

complex_values        slots=24,gpu=6

Reason: gpu1 and gpu2 each contain 24 CPU cores and 6 GPUs (2 * 24 CPUs and 2 * 6 GPUs in total).

JSV script

#!/bin/bash
# 
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-12-14
#
# Filename: gpu_jsv.sh
#

jsv_on_start()
{
   return
}

jsv_on_verify()
{

  if [ "`jsv_get_param q_hard`" == "gpu.q" ]; then
     has_h_gpu=`jsv_sub_is_param l_hard gpu`
     if [ "$has_h_gpu" = "true" ]; then
        gpu=`jsv_sub_get_param l_hard gpu`
        jsv_add_env GPU_NUMBER $gpu
     else
        jsv_sub_add_param l_hard gpu 1
        jsv_add_env GPU_NUMBER 1
     fi
  fi

  # a JSV reports exactly one result per job verification
  jsv_correct "Job was modified before it was accepted: GPU configuration added"
   
  return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

jsv_main

The JSV runs on the submit host, therefore the following line should be added to $SGE_ROOT/$SGE_CELL/common/sge_request:

-jsv  /usr/share/gridengine/scripts/gpu_jsv.sh

Job submission

Submit script (job.sh):

#!/bin/bash
#$ -N GPU_test_job
#$ -q gpu.q
#$ -l gpu=3

./MonteCarloMultiGPU -noprompt

Submission command:

qsub job.sh

The job will use only 3 GPUs (exported by the wrapper via CUDA_VISIBLE_DEVICES).

Puppet

Home page: http://puppetlabs.com

Puppet is IT automation software that helps system administrators manage infrastructure throughout its lifecycle, from provisioning and configuration to patch management and compliance. Using Puppet, you can easily automate repetitive tasks, quickly deploy critical applications, and proactively manage change, scaling from 10s of servers to 1000s, on-premise or in the cloud.

Puppet uses a declarative, model-based approach to IT automation:

  • Define the desired state of the infrastructure’s configuration using Puppet’s declarative configuration language.
  • Simulate configuration changes before enforcing them (see the example below).
  • Enforce the deployed desired state automatically, correcting any configuration drift.
  • Report on the differences between actual and desired states, and on any changes made while enforcing the desired state.
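
A minimal illustration of the simulate-then-enforce workflow with the puppet command line tool (the manifest path is hypothetical):

# Simulate: report what would change, without modifying the system
puppet apply --noop /etc/puppet/manifests/site.pp
# Enforce: apply the desired state, correcting any drift
puppet apply /etc/puppet/manifests/site.pp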


Middleware

rCUDA

Remote CUDA (rCUDA) is a middleware that enables Compute Unified Device Architecture (CUDA) to be used over a commodity network. It allows an application to use a CUDA-compatible graphics processing unit (GPU) installed in a remote computer as if it were installed in the computer where the application is being executed. This approach is based on the observation that the GPUs in a cluster are usually not fully utilized, and it is intended to reduce the number of GPUs in the cluster, thus lowering the costs related to acquisition and maintenance while keeping performance close to that of the fully equipped configuration.

rCUDA follows a client-server distributed architecture. Clients employ a library of wrappers around the high-level CUDA Runtime API, while the GPU network service listens for requests on a TCP port. When an application requests a GPU service, the request is routed to the client side of the architecture. The client forwards it to one of the servers, which accesses the GPU installed in that computer and executes the request there. Time-multiplexing (sharing) the GPU is accomplished by spawning a different server process for each remote execution, over a new GPU context.
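
A sketch of a client-side session, assuming the environment-variable convention described in the rCUDA user guide for naming remote GPU servers (host names and the port are illustrative):

export RCUDA_DEVICE_COUNT=2           # number of remote GPUs to use
export RCUDA_DEVICE_0=gpuserver1:8308 # first remote GPU server
export RCUDA_DEVICE_1=gpuserver2:8308 # second remote GPU server
./cuda_application                    # linked against the rCUDA client library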

rCUDA includes highly optimized TCP and low-level InfiniBand communications. It can be useful in three different environments:

  • Clusters. To reduce the number of GPUs installed in high performance clusters. This increases GPU utilization and yields energy savings, as well as related savings in acquisition costs, maintenance, space, cooling, etc.
  • Academia. In commodity networks, to offer many students concurrent access to a few high performance GPUs.
  • Virtual machines. To enable access to the CUDA facilities of the physical machine from within virtual machines.

The current version of rCUDA is 4.0. It implements most of the functions in the CUDA Runtime API version 4.2, excluding only those related to graphics interoperability. rCUDA 4.0 targets the Linux and Windows operating systems (32- and 64-bit architectures) on both the client and the server side. Currently, rCUDA-ready applications have to be programmed using the plain C API, and host and device code need to be compiled separately. A conversion utility, CU2rCU, has been developed to assist with the code transformation.

Link: http://www.rcuda.net

EMI

Home page: http://www.eu-emi.eu/middleware

gLite

Home page: http://glite.cern.ch

UNICORE

Home page: http://www.unicore.eu

ARC

Home page: http://www.nordugrid.org/arc

Open Grid Scheduler integration

An ARC middleware configuration that uses Open Grid Scheduler can be downloaded from here. Some parts of the server.xml file need to be modified, for example the share name (queue): test.q

ARC Linux repository: http://download.nordugrid.org/repos-11.05.html

Recommended ARC version: 1.1.1

You need to install these packages (an example install command follows the list):

  • nordugrid-arc-plugins-needed
  • nordugrid-arc-aris
  • nordugrid-arc-debuginfo
  • nordugrid-arc-hed
  • nordugrid-arc-client
  • nordugrid-arc-plugins-globus
  • nordugrid-arc-arex
  • nordugrid-arc
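
On an RPM-based system with the NorduGrid repository enabled, the packages can be installed, for example, with:

yum install nordugrid-arc nordugrid-arc-hed nordugrid-arc-arex \
    nordugrid-arc-aris nordugrid-arc-client nordugrid-arc-plugins-needed \
    nordugrid-arc-plugins-globus nordugrid-arc-debuginfo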

ARC/A-REX daemon start command:

arched -c server.xml

Programming environments

CUDA Parallel Nsight tool

NVIDIA Nsight is a development platform for heterogeneous computing that allows efficient development, debugging and profiling of GPU code. Nsight helps users gain a better understanding of their code: it identifies and analyzes bottlenecks and observes the behavior of all system activities. NVIDIA Nsight is available for Windows, Linux and Mac OS users.

Nsight Development Platform, Visual Studio Edition (formerly NVIDIA Parallel Nsight) brings GPU computing into Microsoft Visual Studio. It enables users to build, debug, profile and trace heterogeneous compute and graphics applications using CUDA C/C++, OpenCL, DirectCompute, Direct3D, and OpenGL. The current version is 2.2, which has been updated with numerous bug fixes over the previous Nsight 2.2 release (build 12160). All graphics developers are advised to update to this latest version, as the majority of improvements and bug fixes are graphics related.

NVIDIA Nsight Eclipse Edition is a full-featured IDE powered by the Eclipse platform that provides an all-in-one integrated environment to edit, build, debug and profile CUDA C applications. Nsight Eclipse Edition supports a rich set of commercial and free plugins. It comprises the Nsight Source Code Editor, the Nsight Debugger and the Nsight Profiler. The latest version, with support for CUDA C/C++ and the Kepler architecture, ships with CUDA 5.0 and is supported on Mac and Linux platforms. It is part of the CUDA Toolkit.
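
Nsight Eclipse Edition can be started from the command line. A sketch assuming a default CUDA 5.0 toolkit installation path:

/usr/local/cuda-5.0/bin/nsight &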

Link: http://www.nvidia.com/object/nsight.html
