System software, middleware and programming environments
From HP-SEE Wiki
Revision as of 12:24, 14 October 2012
System software
Local Resource Management System
Open Grid scheduler
Home page: http://gridscheduler.sourceforge.net
This is an open source fork of Sun Grid Engine.
cpuset configuration on SGI UV 1000
Queue creation
Wrapper script for sge_shepherd
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: sge_shepherd_cpuset_wrapper.sh
#
# 0-5   boot cpuset
# 6-11  reserve cpuset
# 12-   sge cpuset
#
START_CPU=12
CPU_PER_SLOT=6
SGE_CPUSET="/dev/cpuset/sge"

# Read the job parameters from the "environment" file in the job spool directory.
while IFS== read key val
do
    case "${key}" in
        JOB_ID) JOB_ID="${val}";;
        SGE_TASK_ID) SGE_TASK_ID="${val}";;
        PE) PE="${val}";;
        NSLOTS) NSLOTS="${val}";;
    esac
done <environment

if [ ! -z "${PE}" ]
then
    if [ "${PE}" = "openmp" ]
    then
        export OMP_NUM_THREADS=${NSLOTS}
    fi
fi

# Array tasks get their own cpuset: <job id>-<task id>.
if [ "${SGE_TASK_ID}" != "undefined" ]
then
    sge_id=${JOB_ID}-${SGE_TASK_ID}
else
    sge_id=${JOB_ID}
fi

# Without a core binding no cpuset can be built: notify the admin and start the job unbound.
if [ "`cat ${PWD}/pe_hostfile | cut -d " " -f 4`" == "UNDEFINED" ]
then
    mail -s "ERROR!!! There is no cpuset allocation for this job: ${sge_id}" ogs-amin@example.com </dev/null
    exec /usr/bin/sge_shepherd "$@"
fi

# Convert the "socket,core:socket,core:..." binding into a list of absolute CPU IDs.
SGE_BINDING=`cat ${PWD}/pe_hostfile | cut -d " " -f 4 | sed 's/:/\n/g;s/,/ /g' | awk -v cpu_per_slot=${CPU_PER_SLOT} -v start_cpu=${START_CPU} '{print start_cpu + ($1 * cpu_per_slot) + $2 }' | awk '{printf "%d ", $1}'`

# Print the unique memory node IDs that belong to the given CPU IDs.
function get_nodes() {
    for cpu_id in $1
    do
        nodes="${nodes} `expr ${cpu_id} / ${CPU_PER_SLOT}`"
    done
    echo `echo ${nodes} | sed 's/ /\n/g' | sort | uniq | sed 's/\n/ /g'`
}

if [ ! -d ${SGE_CPUSET}/${sge_id} ]
then
    mkdir ${SGE_CPUSET}/${sge_id}
fi

NODE_BINDING=`get_nodes "${SGE_BINDING}"`

cpus=`echo ${SGE_BINDING} | sed "s/ /,/g"`
echo ${cpus} > ${SGE_CPUSET}/${sge_id}/cpus

nodes=`echo ${NODE_BINDING} | sed "s/ /,/g"`
echo ${nodes} > ${SGE_CPUSET}/${sge_id}/mems

echo 1 > ${SGE_CPUSET}/${sge_id}/notify_on_release
# Move this process into the cpuset; the shepherd started below inherits it.
echo $$ > ${SGE_CPUSET}/${sge_id}/tasks

export SGE_BINDING NODE_BINDING
exec /usr/bin/sge_shepherd "$@"
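The CPU and memory node arithmetic in the wrapper can be tried in isolation. The following is only a sketch, assuming the same layout as above (SGE CPUs start at 12, 6 cores per socket) and a binding string of the form "socket,core:socket,core". The function names binding_to_cpus and cpus_to_nodes are hypothetical helpers and are not part of the wrapper script.

```bash
#!/bin/bash
# Assumed layout, copied from the wrapper script: CPUs 0-11 are reserved.
START_CPU=12
CPU_PER_SLOT=6

# Hypothetical helper: turn "socket,core:socket,core" into absolute CPU IDs.
binding_to_cpus() {
    local binding="$1" pair socket core cpus=""
    local IFS=':'
    for pair in $binding; do
        socket="${pair%%,*}"
        core="${pair##*,}"
        cpus="${cpus:+$cpus,}$(( START_CPU + socket * CPU_PER_SLOT + core ))"
    done
    echo "$cpus"
}

# Hypothetical helper: list the unique memory nodes for a CPU ID list.
cpus_to_nodes() {
    local cpu
    local IFS=','
    for cpu in $1; do
        echo $(( cpu / CPU_PER_SLOT ))
    done | sort -nu | paste -sd, -
}

cpus=$(binding_to_cpus "0,0:0,1:1,0")
echo "cpus: $cpus"
echo "mems: $(cpus_to_nodes "$cpus")"
```

For the binding "0,0:0,1:1,0" this yields the CPU list 12,13,18 and the memory nodes 2,3, which are the values the wrapper would write to the cpus and mems files of the job cpuset.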
JSV script
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: cpuset_jsv.sh
#

jsv_on_start()
{
    return
}

jsv_on_verify()
{
    cpu_per_slot="6"
    slots="6"
    serial_job="false"

    if [ "`jsv_get_param pe_name`" != "" ]; then
        # Parallel job: round the slot request up to a multiple of cpu_per_slot.
        pe_min=`jsv_get_param pe_min`
        pe_min_remainder=`expr $pe_min % $cpu_per_slot`
        pe_min_int=`expr $pe_min / $cpu_per_slot`
        pe_max=`jsv_get_param pe_max`
        pe_max_remainder=`expr $pe_max % $cpu_per_slot`
        pe_max_int=`expr $pe_max / $cpu_per_slot`

        if [ "$pe_max" == "9999999" ]; then
            # pe_max will always be set equal to pe_min
            if [ "$pe_min_remainder" != "0" ]; then
                pe_min=`expr $pe_min_int \* $cpu_per_slot + $cpu_per_slot`
                jsv_set_param pe_min $pe_min
            fi
            jsv_set_param pe_max $pe_min
            slots=$pe_min
        else
            if [ "$pe_max_remainder" != "0" ]; then
                pe_max=`expr $pe_max_int \* $cpu_per_slot + $cpu_per_slot`
                jsv_set_param pe_max $pe_max
            fi
            jsv_set_param pe_min $pe_max
            slots=$pe_max
        fi
        jsv_set_param binding_amount $slots
    else
        # Serial job: give it one full socket through the "serial" PE.
        jsv_set_param binding_amount $cpu_per_slot
        jsv_set_param pe_name "serial"
        jsv_set_param pe_min $cpu_per_slot
        jsv_set_param pe_max $cpu_per_slot
        serial_job="true"
    fi

    # Serial array jobs use the "array" PE instead.
    if [ `jsv_is_param t_max` != false ]; then
        if [ "`jsv_get_param pe_name`" == "serial" ]; then
            jsv_set_param pe_name "array"
        fi
    fi

    jsv_set_param binding_strategy "linear_automatic"
    jsv_set_param binding_type "pe"

    l_hard=`jsv_get_param l_hard`
    if [ "$l_hard" != "" ]; then
        has_h_vmem=`jsv_sub_is_param l_hard h_vmem`
        if [ "$has_h_vmem" = "true" ]; then
            jsv_set_param l_hard "h_vmem=5.2G"
        fi
    else
        jsv_sub_add_param l_hard "h_vmem=5.2G"
    fi

    jsv_correct "Job was modified before it was accepted: CPU/NODE binding added"
    return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

jsv_main
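The JSV rounds every slot request up to a whole number of sockets (6 cores each), so that a job never shares a socket with another job. A minimal sketch of that rounding, written with shell arithmetic instead of expr; the function name round_up_slots is hypothetical and not part of the JSV script:

```bash
#!/bin/bash
# Hypothetical helper: round a slot request up to the next multiple of
# cpu_per_slot (second argument, assumed to be 6 as in the JSV script).
round_up_slots() {
    local requested=$1 cpu_per_slot=${2:-6}
    echo $(( (requested + cpu_per_slot - 1) / cpu_per_slot * cpu_per_slot ))
}

for n in 1 6 7 12 13; do
    echo "requested $n -> granted $(round_up_slots "$n")"
done
```

A request for 7 slots is therefore raised to 12, while a request for exactly 6 or 12 slots is left unchanged.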
The JSV runs on the submit host, so the following line should be added to $SGE_ROOT/$SGE_CELL/common/sge_request:
-jsv /usr/share/gridengine/scripts/cpuset_jsv.sh
BLCR checkpointing
Home page: https://ftg.lbl.gov/projects/CheckpointRestart/
GPU integration
Queue creation
Command:
qconf -aq gpu.q
qname                 gpu.q
hostlist              gpu1 gpu2
seq_no                0
load_thresholds       np_load_avg=1.1,mem_free=2G
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               NONE
rerun                 FALSE
slots                 24
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
Wrapper script for sge_shepherd
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: sge_shepherd_gpu_wrapper.sh
#

# Read the number of requested GPUs (set by the JSV) from the job environment file.
while IFS== read key val
do
    case "$key" in
        GPU_NUMBER) GPU_NUMBER="$val";;
    esac
done <environment

CUDA_VISIBLE_DEVICES=""

# Walk through all GPU IDs and collect idle ones until the request is satisfied.
for gpu_id in `/usr/bin/nvidia-smi -L | cut -d ':' -f 1| awk '{printf "%s ", $2}'`
do
    /usr/bin/nvidia-smi -i $gpu_id | grep "No running compute processes found" > /dev/null
    if [ $? == 0 ]
    then
        if [ ! -z "$CUDA_VISIBLE_DEVICES" ]
        then
            CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES,$gpu_id"
            GPU_NUMBER=`expr $GPU_NUMBER - 1`
        else
            CUDA_VISIBLE_DEVICES="$gpu_id"
            GPU_NUMBER=`expr $GPU_NUMBER - 1`
        fi
    fi

    if [ "$GPU_NUMBER" == "0" ]
    then
        break
    fi
done

export CUDA_VISIBLE_DEVICES
exec /usr/bin/sge_shepherd "$@"
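The selection loop in the wrapper boils down to "take the first N idle GPUs and join their IDs with commas". The sketch below shows only that step, with the list of idle GPU IDs passed in as arguments instead of being queried through nvidia-smi; the function name pick_gpus is hypothetical and not part of the wrapper.

```bash
#!/bin/bash
# Hypothetical helper: pick_gpus <wanted> <idle gpu id>...
# Prints at most <wanted> of the given idle GPU IDs as a comma-separated
# list, in the form expected by CUDA_VISIBLE_DEVICES.
pick_gpus() {
    local wanted=$1; shift
    local picked="" id
    for id in "$@"; do
        [ "$wanted" -le 0 ] && break
        picked="${picked:+$picked,}$id"
        wanted=$(( wanted - 1 ))
    done
    echo "$picked"
}

# Example: GPUs 0, 2, 3 and 5 are idle and the job asked for 3 of them.
CUDA_VISIBLE_DEVICES=$(pick_gpus 3 0 2 3 5)
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

If fewer idle GPUs exist than were requested, the list is simply shorter; like the wrapper above, the sketch does not treat that case as an error.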
Wrapper script configuration for the gpu1 and gpu2 machines:
qconf -mconf gpu1
qconf -mconf gpu2
shepherd_cmd /usr/share/gridengine/scripts/sge_shepherd_gpu_wrapper.sh
Adding consumable resource to the GPU machines
Commands:
qconf -mc
gpu gpu INT <= YES YES 0 0
qconf -me gpu1
qconf -me gpu2
complex_values slots=24,gpu=6
Reason: gpu1 and gpu2 each contain 24 CPUs and 6 GPUs (2*24 CPUs and 2*6 GPUs in total).
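The effect of the consumable can be illustrated with a small bookkeeping sketch. This is a simplified model and not the scheduler's actual algorithm: it assumes single-slot jobs on one host with gpu=6, and it ignores job completion, priorities and the second host. The function name simulate_gpu_consumable is hypothetical.

```bash
#!/bin/bash
# Hypothetical model: simulate_gpu_consumable <capacity> <request>...
# Each request that still fits is subtracted from the remaining capacity;
# a request that does not fit is reported as waiting.
simulate_gpu_consumable() {
    local free=$1; shift
    local request
    for request in "$@"; do
        if [ "$request" -le "$free" ]; then
            free=$(( free - request ))
            echo "gpu=$request: can start, $free left"
        else
            echo "gpu=$request: must wait, only $free left"
        fi
    done
}

# One host with 6 GPUs and three jobs asking for 3, 2 and 2 GPUs.
simulate_gpu_consumable 6 3 2 2
```

In this model the first two jobs start and the third has to wait, because only one GPU remains on the host.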
JSV script
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: gpu_jsv.sh
#

jsv_on_start()
{
    return
}

jsv_on_verify()
{
    if [ "`jsv_get_param q_hard`" == "gpu.q" ]; then
        # Pass the requested GPU count (default 1) to the shepherd wrapper.
        has_h_gpu=`jsv_sub_is_param l_hard gpu`
        if [ "$has_h_gpu" = "true" ]; then
            gpu=`jsv_sub_get_param l_hard gpu`
            jsv_add_env GPU_NUMBER $gpu
        else
            jsv_add_env GPU_NUMBER 1
        fi
        # The job was modified, so report exactly one result: corrected.
        jsv_correct "GPU configuration"
    else
        jsv_accept "Job has been accepted"
    fi
    return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh

jsv_main
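The decision made by the JSV (use the requested gpu value, or fall back to 1) can be sketched without the JSV library. The example assumes that the hard resource list is available as a comma-separated string such as "h_vmem=2G,gpu=3"; the function name gpu_request is hypothetical, and the real script uses jsv_sub_is_param and jsv_sub_get_param instead.

```bash
#!/bin/bash
# Hypothetical helper: print the gpu=<n> value found in a comma-separated
# resource list, or 1 if the list contains no gpu request.
gpu_request() {
    local l_hard="$1" item
    local IFS=','
    for item in $l_hard; do
        case "$item" in
            gpu=*) echo "${item#gpu=}"; return ;;
        esac
    done
    echo 1
}

echo "GPU_NUMBER=$(gpu_request "h_vmem=2G,gpu=3")"
echo "GPU_NUMBER=$(gpu_request "h_vmem=2G")"
```

The first call yields GPU_NUMBER=3, the second falls back to GPU_NUMBER=1.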
The JSV runs on the submit host, so the following line should be added to $SGE_ROOT/$SGE_CELL/common/sge_request:
-jsv /usr/share/gridengine/scripts/gpu_jsv.sh
Job submission
Submit script (job.sh):
#!/bin/bash
#$ -N GPU_test_job
#$ -q gpu.q
#$ -l gpu=3
./MonteCarloMultiGPU -noprompt
qsub job.sh
The job will use only 3 GPUs, because the wrapper script exports only three device IDs in CUDA_VISIBLE_DEVICES.
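To check the result from inside a job, the exported CUDA_VISIBLE_DEVICES value can be counted before the application starts. This is only a sketch that could be added to a submit script; the function name count_visible_gpus is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: count the device IDs in a comma-separated
# CUDA_VISIBLE_DEVICES value; an empty value counts as 0.
count_visible_gpus() {
    local devices="${1:-}"
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    local IFS=','
    set -- $devices
    echo $#
}

echo "visible GPUs: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```

For a job submitted with -l gpu=3 on a host with enough idle GPUs, the count should be 3.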
Middleware
EMI
Home page: http://www.eu-emi.eu/middleware
gLite
Home page: http://glite.cern.ch
UNICORE
Home page: http://www.unicore.eu
ARC
Home page: http://www.nordugrid.org/arc
Open Grid Scheduler integration
Programming environments
CUDA Parallel Nsight tool
NVIDIA Nsight is the development platform for heterogeneous computing that allows efficient development, debugging and profiling of the GPU code. Nsight helps users gain a better understanding of their code - identify and analyze bottlenecks and observe the behavior of all system activities. NVIDIA Nsight is available for Windows, Linux and Mac OS users.
Nsight Development Platform, Visual Studio Edition (formerly NVIDIA Parallel Nsight) brings GPU computing into Microsoft Visual Studio. It enables users to build, debug, profile and trace heterogeneous compute and graphics applications using CUDA C/C++, OpenCL, DirectCompute, Direct3D, and OpenGL. The current version is 2.2. Nsight Visual Studio Edition 2.2 has been updated with numerous bug fixes over the previous Nsight 2.2 release, build 12160. All graphics developers are advised to update to this latest version, as the majority of the improvements and bug fixes are graphics related.
NVIDIA Nsight Eclipse Edition is a full-featured IDE powered by the Eclipse platform that provides an all-in-one integrated environment to edit, build, debug and profile CUDA C applications. Nsight Eclipse Edition supports a rich set of commercial and free plugins. It comprises the Nsight Source Code Editor, Nsight Debugger and Nsight Profiler. The latest version of NVIDIA Nsight Eclipse Edition, with support for CUDA C/C++ and the Kepler architecture, is available with CUDA 5.0 and is supported on Mac and Linux platforms. It is part of the CUDA Toolkit.