System software, middleware and programming environments

From HP-SEE Wiki

Revision as of 11:29, 14 October 2012

System software

Local Resource Management System

Open Grid scheduler

Home page: http://gridscheduler.sourceforge.net

Open Grid Scheduler is an open-source fork of Sun Grid Engine.

cpuset configuration on SGI UV 1000

BLCR checkpointing

Home page: https://ftg.lbl.gov/projects/CheckpointRestart/

GPU integration

Queue creation

Command:

qconf -aq gpu.q

Queue configuration:

qname                 gpu.q
hostlist              gpu1 gpu2
seq_no                0
load_thresholds       np_load_avg=1.1,mem_free=2G
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               NONE
rerun                 FALSE
slots                 24
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
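
Once the queue exists, qconf -sq gpu.q prints the same attribute list, which is handy for checking individual settings. The following is a minimal sketch of such a check. The helper name queue_attr is hypothetical (it is not part of Grid Engine), and the sample input is copied from the configuration above so that the snippet runs without a Grid Engine installation.

```shell
#!/bin/bash
# queue_attr NAME: print the value of attribute NAME from a queue
# configuration read on standard input (format: "name   value ...").
# Hypothetical helper, not part of Grid Engine.
queue_attr() {
    awk -v name="$1" '$1 == name { $1 = ""; sub(/^ +/, ""); print; exit }'
}

# Sample input taken from the gpu.q configuration shown above.
sample_config='qname                 gpu.q
hostlist              gpu1 gpu2
slots                 24
qtype                 BATCH INTERACTIVE'

printf '%s\n' "$sample_config" | queue_attr hostlist   # prints: gpu1 gpu2
printf '%s\n' "$sample_config" | queue_attr slots      # prints: 24
```

On a live system the helper could be fed directly, for example with qconf -sq gpu.q | queue_attr slots.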

Wrapper script for sge_shepherd

#!/bin/bash
# 
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# File: sge_shepherd_gpu_wrapper.sh
#

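# Read GPU_NUMBER (the number of GPUs the job asks for) from the job's
# "environment" file (KEY=value lines) in the current directory.
GPU_NUMBER=""
while IFS== read -r key val
do
    case "$key" in
        GPU_NUMBER) GPU_NUMBER="$val";;
    esac
done < environment

CUDA_VISIBLE_DEVICES=""

# Jobs that do not set GPU_NUMBER to a positive integer are given no GPU.
if [ "${GPU_NUMBER:-0}" -gt 0 ] 2>/dev/null
then
  # "nvidia-smi -L" prints one "GPU <index>: ..." line per device; keep the index.
  for gpu_id in $(/usr/bin/nvidia-smi -L | cut -d ':' -f 1 | awk '{print $2}')
  do
    # A GPU counts as free when nvidia-smi reports no compute processes on it.
    # The exact message may differ between driver versions.
    if /usr/bin/nvidia-smi -i "$gpu_id" | grep -q "No running compute processes found"
    then
      if [ -n "$CUDA_VISIBLE_DEVICES" ]
      then
        CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES,$gpu_id"
      else
        CUDA_VISIBLE_DEVICES="$gpu_id"
      fi
      GPU_NUMBER=$((GPU_NUMBER - 1))
    fi

    # Stop once the requested number of GPUs has been collected.
    if [ "$GPU_NUMBER" -eq 0 ]
    then
      break
    fi
  done
fi

export CUDA_VISIBLE_DEVICES

# Hand over to the real shepherd, preserving the original arguments.
exec /usr/bin/sge_shepherd "$@"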
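
The two steps of the wrapper, reading GPU_NUMBER and collecting free devices, can be tried without a GPU or a Grid Engine installation. The sketch below is an illustration under assumptions, not the wrapper itself: read_gpu_number and pick_gpus are hypothetical functions, the list of free GPU indices is passed in as arguments instead of being obtained from nvidia-smi, and a request of 0 is assumed to yield no GPU.

```shell
#!/bin/bash
# read_gpu_number: print the value of GPU_NUMBER found in "KEY=value"
# lines read on standard input. Hypothetical helper.
read_gpu_number() {
    while IFS='=' read -r key val
    do
        case "$key" in
            GPU_NUMBER) printf '%s\n' "$val";;
        esac
    done
}

# pick_gpus WANTED FREE_ID...: print a comma-separated list of at most
# WANTED ids taken from the FREE_ID arguments, in order. Hypothetical helper.
pick_gpus() {
    wanted="$1"
    shift
    visible=""
    for gpu_id in "$@"
    do
        # Stop as soon as the request is satisfied.
        [ "$wanted" -gt 0 ] || break
        if [ -n "$visible" ]
        then
            visible="$visible,$gpu_id"
        else
            visible="$gpu_id"
        fi
        wanted=$((wanted - 1))
    done
    printf '%s\n' "$visible"
}

# Illustrative input: a job environment that requests two GPUs.
requested=$(printf 'HOME=/home/user\nGPU_NUMBER=2\n' | read_gpu_number)

pick_gpus "$requested" 0 1 3   # prints: 0,1
pick_gpus 4 1 2                # prints: 1,2 (fewer free GPUs than requested)
```

The printed value is what the wrapper would export as CUDA_VISIBLE_DEVICES before starting sge_shepherd.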

Wrapper script configuration for the gpu1 and gpu2 machines. Open the configuration of each host:

qconf -mconf gpu1
qconf -mconf gpu2

and set the following entry in each:

shepherd_cmd      /usr/share/gridengine/scripts/sge_shepherd_gpu_wrapper.sh
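
From the user's side, a job asks for GPUs by setting GPU_NUMBER at submission time, and the wrapper passes the selected devices to the job through CUDA_VISIBLE_DEVICES. A minimal job script sketch follows. The file name gpu_job.sh and the function report_gpus are hypothetical, and passing the variable with qsub -v is an assumption about how jobs are submitted at a given site.

```shell
#!/bin/bash
# gpu_job.sh (hypothetical name): report which GPUs the wrapper assigned.

# report_gpus LIST: print how many device ids the comma-separated LIST holds.
report_gpus() {
    if [ -z "$1" ]
    then
        echo "no GPU assigned"
        return
    fi
    # Count the comma-separated entries.
    count=$(printf '%s\n' "$1" | awk -F ',' '{ print NF }')
    echo "$count GPU(s) assigned: $1"
}

report_gpus "${CUDA_VISIBLE_DEVICES:-}"
```

Such a script could be submitted with, for example, qsub -q gpu.q -v GPU_NUMBER=2 gpu_job.sh.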

Middleware

EMI

Home page: http://www.eu-emi.eu/middleware

gLite

Home page: http://glite.cern.ch

UNICORE

Home page: http://www.unicore.eu

ARC

Home page: http://www.nordugrid.org/arc

Open Grid Scheduler integration

Programming environments

CUDA Parallel Nsight tool

NVIDIA Nsight is a development platform for heterogeneous computing that supports developing, debugging and profiling GPU code. It helps users understand their code better by identifying and analyzing bottlenecks and by observing the behavior of all system activities. NVIDIA Nsight is available for Windows, Linux and Mac OS.

Nsight Development Platform, Visual Studio Edition (formerly NVIDIA Parallel Nsight) brings GPU computing into Microsoft Visual Studio. It enables users to build, debug, profile and trace heterogeneous compute and graphics applications using CUDA C/C++, OpenCL, DirectCompute, Direct3D and OpenGL. The current version is 2.2. Nsight Visual Studio Edition 2.2 has been updated with numerous bug fixes over the previous Nsight 2.2 release, build 12160. It is recommended that all graphics developers update to this latest version, as the majority of improvements and bug fixes are graphics related.

NVIDIA Nsight Eclipse Edition is a full-featured IDE powered by the Eclipse platform that provides an all-in-one integrated environment to edit, build, debug and profile CUDA C applications. Nsight Eclipse Edition supports a rich set of commercial and free plugins. It comprises the Nsight Source Code Editor, Nsight Debugger and Nsight Profiler. The latest version of NVIDIA Nsight Eclipse Edition, with support for CUDA C/C++ and the Kepler architecture, is available with CUDA 5.0 and is supported on Mac and Linux platforms. It is part of the CUDA Toolkit.

Link: http://www.nvidia.com/object/nsight.html
