System software, middleware and programming environments
From HP-SEE Wiki
Latest revision as of 05:43, 28 April 2014
System software
Lustre file system
Lustre is a parallel distributed file system, generally used for large-scale cluster computing. Lustre is free and open-source software and provides a high-performance file system for computer clusters ranging in size from small workgroup clusters to large, multi-site installations. Lustre is widely deployed in HPC systems around the world because it is scalable, reliable, and capable of supporting large data centers (thousands of clients, petabytes of storage space).
A Lustre file system has three major functional units:
- A single metadata server (MDS) that has a single metadata target (MDT) per Lustre filesystem that stores namespace metadata, such as filenames, directories, access permissions, and file layout
- One or more object storage servers (OSSes) that store file data on one or more object storage targets (OSTs)
- Client(s) that access and use the data.
Lustre presents all clients with a unified namespace for all of the files and data in the filesystem, using standard POSIX semantics, and allows concurrent and coherent read and write access to the files in the filesystem.
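To make the description above concrete: a client typically joins the unified namespace by mounting the file system from the management server. This is only a sketch; the node name, file system name and mount point below are placeholders, not values from this document.

```shell
# Hypothetical client mount: file system "lfs01" served by the MGS on
# mgs-node, reachable over TCP.
mount -t lustre mgs-node@tcp0:/lfs01 /mnt/lustre
```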
Links:

- http://lustre.org/
Ceph file system
Ceph is a free-software unified storage platform designed to present object, block, and file storage from a single distributed cluster. Ceph's main goals are to be completely distributed without a single point of failure, scalable to the exabyte level, and freely available. Data is seamlessly replicated, making the system fault tolerant. Ceph is a software-based solution and runs on commodity hardware. The system is designed to be both self-healing and self-managing, and strives to reduce both administrator and budget overhead.
Links:

- http://ceph.com/
Local Resource Management System
PBS Professional
Version 12.0 of Altair's PBS Professional software is available. PBS Professional is a distributed workload management and job scheduling system for high-performance computing, designed to handle the management and monitoring of the computational workload on a set of one or more computers. Features include rapid-start capabilities and shrink-to-fit jobs, which allow users to run jobs or portions of jobs in the period immediately before a planned maintenance outage. Plug-ins allow users to control, modify, extend and change job lifecycle events in the execution stage. Plug-ins also allow for health checks prior to starting a job, for filtering and changing execution behavior when the job starts, and for ensuring that cleanup is performed correctly.
Open Grid Scheduler
Home page: http://gridscheduler.sourceforge.net
This is an open source fork of Sun Grid Engine.
cpuset integration on SGI UV 1000 machine
SGI UV 1000: http://www.sgi.com/products/servers/uv/specs.html
We recommend this OGS version for cpuset integration: OGS/GE 2011.11p1
Browser URL: http://gridscheduler.svn.sourceforge.net/viewvc/gridscheduler/tags/GE2011.11p1/
Details about SVN access: http://sourceforge.net/scm/?type=svn&group_id=343697
Wrapper script for sge_shepherd
Please check these variables: START_CPU, CPU_PER_SLOT, SGE_CPUSET, SGE_ADMIN.
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: sge_shepherd_cpuset_wrapper.sh
#
# 0-5   CPU cores (boot cpuset)
# 6-11  CPU cores (reserve cpuset)
# 12-   CPU cores (SGE cpuset)
#

START_CPU=12
CPU_PER_SLOT=6
SGE_CPUSET="/dev/cpuset/sge"
SGE_ADMIN="ogs-amin@example.com"

while IFS== read key val
do
    case "${key}" in
        JOB_ID) JOB_ID="${val}";;
        SGE_TASK_ID) SGE_TASK_ID="${val}";;
    esac
done <environment

if [ "${SGE_TASK_ID}" != "undefined" ]
then
    sge_id=${JOB_ID}-${SGE_TASK_ID}
else
    sge_id=${JOB_ID}
fi

if [ "`cat ${PWD}/pe_hostfile | cut -d " " -f 4`" == "UNDEFINED" ]
then
    mail -s "ERROR!!! There is no cpuset allocation by this job: ${sge_id}" $SGE_ADMIN
    exec /usr/bin/sge_shepherd $@
    exit 0
fi

SGE_BINDING=`cat ${PWD}/pe_hostfile | cut -d " " -f 4 | sed 's/:/\n/g;s/,/ /g' | awk -v cpu_per_slot=${CPU_PER_SLOT} -v start_cpu=${START_CPU} '{print start_cpu + ($1 * cpu_per_slot) + $2 }' | awk '{printf "%d ", $1}'`

function get_nodes() {
    for cpu_id in $1
    do
        nodes="${nodes} `expr ${cpu_id} / ${CPU_PER_SLOT}`"
    done
    echo `echo ${nodes} | sed 's/ /\n/g' | sort | uniq | sed 's/\n/ /g'`
}

if [ ! -d ${SGE_CPUSET}/${sge_id} ]
then
    mkdir ${SGE_CPUSET}/${sge_id}
fi

NODE_BINDING=`get_nodes "${SGE_BINDING}"`
cpus=`echo ${SGE_BINDING} | sed "s/ /,/g"`
echo ${cpus} > ${SGE_CPUSET}/${sge_id}/cpus
nodes=`echo ${NODE_BINDING} | sed "s/ /,/g"`
echo ${nodes} > ${SGE_CPUSET}/${sge_id}/mems
echo 1 > ${SGE_CPUSET}/${sge_id}/notify_on_release
echo $$ > ${SGE_CPUSET}/${sge_id}/tasks
export SGE_BINDING NODE_BINDING

exec /usr/bin/sge_shepherd $@
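The core of the wrapper is the mapping from the pe_hostfile binding field to absolute CPU ids. The standalone sketch below uses a made-up binding string (the real one comes from the 4th field of pe_hostfile) to show the same sed/awk pipeline: each socket,core pair becomes START_CPU + socket*CPU_PER_SLOT + core.

```shell
#!/bin/bash
# Standalone sketch of the CPU-id arithmetic used in the wrapper.
# "0,0:0,1:1,0" is a hypothetical pe_hostfile binding field:
# socket,core pairs separated by ':'.
START_CPU=12
CPU_PER_SLOT=6
binding="0,0:0,1:1,0"

echo "$binding" | sed 's/:/\n/g;s/,/ /g' \
    | awk -v cpu_per_slot=$CPU_PER_SLOT -v start_cpu=$START_CPU \
          '{print start_cpu + ($1 * cpu_per_slot) + $2}'
# socket 0 core 0 -> 12, socket 0 core 1 -> 13, socket 1 core 0 -> 18
```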
JSV script
Please check these variables: cpu_per_slot, h_vmem
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-10-14
#
# Filename: cpuset_jsv.sh
#

jsv_on_start()
{
    return
}

jsv_on_verify()
{
    cpu_per_slot="6"
    slots=$cpu_per_slot
    serial_job="false"

    if [ "`jsv_get_param pe_name`" != "" ]; then
        pe_min=`jsv_get_param pe_min`
        pe_min_remainder=`expr $pe_min % $cpu_per_slot`
        pe_min_int=`expr $pe_min / $cpu_per_slot`
        pe_max=`jsv_get_param pe_max`
        pe_max_remainder=`expr $pe_max % $cpu_per_slot`
        pe_max_int=`expr $pe_max / $cpu_per_slot`
        if [ "$pe_max" == "9999999" ]; then
            # pe_max will always be equal to pe_min
            if [ "$pe_min_remainder" != "0" ]; then
                pe_min=`expr $pe_min_int \* $cpu_per_slot + $cpu_per_slot`
                jsv_set_param pe_min $pe_min
            fi
            jsv_set_param pe_max $pe_min
            slots=$pe_min
        else
            if [ "$pe_max_remainder" != "0" ]; then
                pe_max=`expr $pe_max_int \* $cpu_per_slot + $cpu_per_slot`
                jsv_set_param pe_max $pe_max
            fi
            jsv_set_param pe_min $pe_max
            slots=$pe_max
        fi
        jsv_set_param binding_amount $slots
    else
        jsv_set_param binding_amount $cpu_per_slot
        jsv_set_param pe_name "serial"
        jsv_set_param pe_min $cpu_per_slot
        jsv_set_param pe_max $cpu_per_slot
        serial_job="true"
    fi

    if [ `jsv_is_param t_max` != false ]; then
        if [ "`jsv_get_param pe_name`" == "serial" ]; then
            jsv_set_param pe_name "array"
        fi
    fi

    jsv_set_param binding_strategy "linear_automatic"
    jsv_set_param binding_type "pe"

    l_hard=`jsv_get_param l_hard`
    if [ "$l_hard" != "" ]; then
        has_h_vmem=`jsv_sub_is_param l_hard h_vmem`
        if [ "$has_h_vmem" = "true" ]; then
            jsv_set_param l_hard "h_vmem=5.2G"
        fi
    else
        jsv_sub_add_param l_hard "h_vmem=5.2G"
    fi

    jsv_correct "Job was modified before it was accepted: CPU/NODE binding added"
    return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main
The JSV will be run on the submit host, therefore this line should be added to $SGE_ROOT/$SGE_CELL/common/sge_request:
-jsv /usr/share/gridengine/scripts/cpuset_jsv.sh
Additional information:
The JSV script rounds up the slot number of parallel jobs so that it is divisible by six. This makes it possible to allocate whole CPU sockets instead of individual CPU cores. NUMA CPUs are faster when they use their "local" memory.
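The rounding rule described above can be isolated into a small sketch. The helper name round_up_slots is invented for illustration and is not part of the JSV; the JSV itself does the same arithmetic with expr.

```shell
#!/bin/bash
# Round a requested slot count up to the next multiple of cpu_per_slot (6),
# so jobs get whole CPU sockets rather than individual cores.
cpu_per_slot=6

round_up_slots() {
    local requested=$1
    local remainder=$((requested % cpu_per_slot))
    if [ "$remainder" != "0" ]; then
        echo $((requested - remainder + cpu_per_slot))
    else
        echo "$requested"
    fi
}

round_up_slots 1    # -> 6
round_up_slots 6    # -> 6
round_up_slots 7    # -> 12
```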
Dummy parallel environments for non parallel (serial) jobs
Commands:
qconf -ap serial
qconf -ap array
pe_name            serial
slots              3000
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      max
accounting_summary FALSE
pe_name            array
slots              3000
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      max
accounting_summary FALSE
Adding consumable resources to the UV machine
Commands:
qconf -mc
h_vmem h_vmem MEMORY <= YES YES 5.2G 0
qconf -me uv
complex_values slots=1140,h_vmem=5929G
Additional information: 5.2G is the limit for one CPU core (this value is multiplied by the slot count for parallel jobs). The SGE cpuset contains 190 memory nodes, therefore the complex_values entry is 190*6*5.2 = 5928G. (I added +1G to this value because OGS needs it. I do not know the reason.)
One memory node contains 6 CPU cores and 32 GB of memory.
Example:
numactl --hardware | grep "node 2 "
Output:
node 2 cpus: 12 13 14 15 16 17
node 2 size: 32768 MB
node 2 free: 31759 MB
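The h_vmem sizing above can be reproduced with a one-line calculation; the awk call is only an illustration of the 190*6*5.2 + 1 arithmetic, not part of the configuration.

```shell
#!/bin/bash
# 190 memory nodes x 6 cores per node x 5.2G per core, plus the extra 1G
# margin mentioned above, gives the value used in complex_values.
total=`awk 'BEGIN { printf "%d", 190 * 6 * 5.2 + 1 }'`
echo "h_vmem=${total}G"   # prints h_vmem=5929G
```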
Checkpointing: BLCR integration
Home page: https://ftg.lbl.gov/projects/CheckpointRestart/
BLCR scripts for SGE integration
Creation of BLCR checkpointing environment
Command:
qconf -ackpt BLCR
ckpt_name          BLCR
interface          APPLICATION-LEVEL
ckpt_command       /usr/share/gridengine/scripts/blcr_checkpoint.sh $job_id $job_pid
migr_command       /usr/share/gridengine/scripts/blcr_migrate.sh $job_id $job_pid
restart_command    none
clean_command      /usr/share/gridengine/scripts/blcr_clean.sh $job_id $job_pid
ckpt_dir           none
signal             none
when               xsmr
Adding to queue
Command:
qconf -mq test.q
ckpt_list          BLCR
starter_method     /usr/share/gridengine/scripts/blcr_submit.sh
Job submission
qsub -ckpt BLCR -r yes job.sh
GPU integration
Queue creation
Command:
qconf -cq gpu.q
qname                 gpu.q
hostlist              gpu1 gpu2
seq_no                0
load_thresholds       np_load_avg=1.1,mem_free=2G
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               NONE
rerun                 FALSE
slots                 24
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY
Wrapper script for sge_shepherd
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2014-04-28
#
# Filename: sge_shepherd_gpu_wrapper.sh
#

while IFS== read key val
do
    case "$key" in
        GPU_NUMBER) GPU_NUMBER="$val";;
    esac
done <environment

CUDA_VISIBLE_DEVICES=""

for gpu_id in `/usr/bin/nvidia-smi -L | cut -d ':' -f 1 | awk '{printf "%s ", $2}'`
do
    sleep $[ ( $RANDOM % 10 ) + 1 ]s
    /usr/bin/nvidia-smi -i $gpu_id | grep "No running compute processes found" > /dev/null
    if [ $? == 0 ]
    then
        if [ ! -z "$CUDA_VISIBLE_DEVICES" ]
        then
            CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES,$gpu_id"
            GPU_NUMBER=`expr $GPU_NUMBER - 1`
        else
            CUDA_VISIBLE_DEVICES="$gpu_id"
            GPU_NUMBER=`expr $GPU_NUMBER - 1`
        fi
    fi
    if [ "$GPU_NUMBER" == "0" ]
    then
        break
    fi
done

export CUDA_VISIBLE_DEVICES
exec /usr/bin/sge_shepherd $@
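The selection logic of the wrapper can be followed without GPU hardware by stubbing out nvidia-smi. In this sketch, nvidia_smi_stub and the busy/free pattern are invented for illustration: GPUs 0 and 2 are pretended to be busy and two GPUs are requested.

```shell
#!/bin/bash
# Stub that mimics "/usr/bin/nvidia-smi -i <id>": GPUs 0 and 2 report a
# running compute process, the others report none.
nvidia_smi_stub() {
    case "$1" in
        0|2) echo "compute process running";;
        *)   echo "No running compute processes found";;
    esac
}

GPU_NUMBER=2            # number of GPUs requested by the job
CUDA_VISIBLE_DEVICES=""
for gpu_id in 0 1 2 3
do
    # A GPU is free when the stub reports no running compute processes.
    if nvidia_smi_stub $gpu_id | grep -q "No running compute processes found"
    then
        if [ ! -z "$CUDA_VISIBLE_DEVICES" ]
        then
            CUDA_VISIBLE_DEVICES="$CUDA_VISIBLE_DEVICES,$gpu_id"
        else
            CUDA_VISIBLE_DEVICES="$gpu_id"
        fi
        GPU_NUMBER=$((GPU_NUMBER - 1))
    fi
    if [ "$GPU_NUMBER" = "0" ]
    then
        break
    fi
done
echo $CUDA_VISIBLE_DEVICES   # the two free GPUs: 1,3
```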
Wrapper script configuration for gpu1, gpu2 machines:
qconf -mconf gpu1
qconf -mconf gpu2
shepherd_cmd /usr/share/gridengine/scripts/sge_shepherd_gpu_wrapper.sh
Adding consumable resource to the GPU machines
Commands:
qconf -mc
gpu gpu INT <= YES YES 0 0
qconf -me gpu1
qconf -me gpu2
complex_values slots=24,gpu=6
Reason: gpu1 and gpu2 contain 2*24 CPUs and 2*6 GPUs.
JSV script
#!/bin/bash
#
# Author: NIIF Institute, http://www.niif.hu/en
# Date: 2012-12-14
#
# Filename: gpu_jsv.sh
#

jsv_on_start()
{
    return
}

jsv_on_verify()
{
    if [ "`jsv_get_param q_hard`" == "gpu.q" ]; then
        has_h_gpu=`jsv_sub_is_param l_hard gpu`
        if [ "$has_h_gpu" = "true" ]; then
            gpu=`jsv_sub_get_param l_hard gpu`
            jsv_add_env GPU_NUMBER $gpu
        else
            jsv_sub_add_param l_hard gpu 1
            jsv_add_env GPU_NUMBER 1
        fi
    fi
    jsv_correct "GPU configuration"
    jsv_accept "Job has been accepted"
    return
}

. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main
The JSV will be run on the submit host, therefore this line should be added to $SGE_ROOT/$SGE_CELL/common/sge_request:
-jsv /usr/share/gridengine/scripts/gpu_jsv.sh
Job submission
Submit script (job.sh):
#!/bin/bash
#$ -N GPU_test_job
#$ -q gpu.q
#$ -l gpu=3

./MonteCarloMultiGPU -noprompt
qsub job.sh
It will use only 3 GPUs.
Puppet
Homepage:

- http://puppetlabs.com/
Puppet is IT automation software that helps system administrators manage infrastructure throughout its lifecycle, from provisioning and configuration to patch management and compliance. Using Puppet, you can easily automate repetitive tasks, quickly deploy critical applications, and proactively manage change, scaling from 10s of servers to 1000s, on-premise or in the cloud.
Puppet uses a declarative, model-based approach to IT automation:
- Define the desired state of the infrastructure’s configuration using Puppet’s declarative configuration language.
- Simulate configuration changes before enforcing them.
- Enforce the deployed desired state automatically, correcting any configuration drift.
- Report on the differences between actual and desired states and any changes made enforcing the desired state.
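As an illustration of the declarative workflow above, a minimal, hypothetical Puppet manifest declares the desired state of a single file (the file and content are placeholders). Running `puppet apply --noop motd.pp` simulates the change; `puppet apply motd.pp` enforces it.

```puppet
# Hypothetical manifest (motd.pp): desired state of /etc/motd.
file { '/etc/motd':
  ensure  => file,
  content => "Managed by Puppet\n",
}
```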
Middleware
rCUDA
Remote CUDA (rCUDA) is a middleware that enables Compute Unified Device Architecture (CUDA) to be used over a commodity network. This middleware allows an application to use a CUDA-compatible graphics processing unit (GPU) installed in a remote computer as if it were installed in the computer where the application is being executed. This approach is based on the observation that GPUs in a cluster are not usually fully utilized, and it is intended to reduce the number of GPUs in the cluster, thus lowering the costs related to acquisition and maintenance while keeping performance close to that of the fully equipped configuration.
rCUDA is designed following the client-server distributed architecture. Clients employ a library of wrappers to the high-level CUDA Runtime API, while the GPU network service listens for requests on a TCP port. When an application demands a GPU service, its request is passed to the client side of the architecture. The client forwards the request to one of the servers, which accesses the GPU installed in that computer and executes the request on it. Time-multiplexing (sharing) the GPU is accomplished by spawning a different server process for each remote execution over a new GPU context.
rCUDA includes highly optimized TCP and low-level InfiniBand communications. It can be useful in three different environments:
- Clusters. To reduce the number of GPUs installed in high-performance clusters. This increases GPU utilization and leads to energy savings, as well as other related savings in acquisition costs, maintenance, space, cooling, etc.
- Academia. In commodity networks, to offer access to a few high performance GPUs concurrently to many students.
- Virtual Machines. To enable the access to the CUDA facilities on the physical machine.
The current version of rCUDA is 4.0. This version implements most of the functions in the CUDA Runtime API version 4.2, excluding only those related with graphics interoperability. rCUDA 4.0 targets the Linux and Windows OSs (for 32- and 64-bit architectures) on both client and server sides. Currently, rCUDA-ready applications have to be programmed using the plain C API. In addition, host and device code need to be compiled separately. A conversion utility CU2rCU has been developed to assist code transformation.
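Client-side use of rCUDA is typically a matter of pointing the application at remote GPU servers through environment variables and the rCUDA wrapper library. The variable names, host names and install path below are assumptions for illustration only and should be checked against the rCUDA user guide.

```shell
#!/bin/bash
# Hypothetical rCUDA client setup (all names are assumptions, not taken
# from this document): expose two remote GPUs as local CUDA devices.
export RCUDA_DEVICE_COUNT=2            # assumed: number of virtual devices
export RCUDA_DEVICE_0=gpuserver1:0     # assumed: device 0 -> GPU 0 on gpuserver1
export RCUDA_DEVICE_1=gpuserver2:0     # assumed: device 1 -> GPU 0 on gpuserver2
export LD_LIBRARY_PATH=/opt/rCUDA/lib:$LD_LIBRARY_PATH
# An application linked against the rCUDA wrapper of the CUDA Runtime API
# would now send its CUDA calls to the remote servers:
# ./cuda_app
```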
Links:
EMI
The EMI (European Middleware Initiative) project provides a unified set of middleware components, formerly provided separately by ARC, dCache, gLite and UNICORE. These components are currently being developed, built and tested in collaboration, with the aim of improving interoperability and integration between grids and other computing infrastructures. EMI is developed as part of the European EGI project.
Currently, EMI has three releases:
- EMI 1 (Kebnekaise)
- EMI 2 (Matterhorn)
- EMI 3 (Monte Bianco)
EMI introduces a number of changes and new functionality in response to existing user requirements. The main improvements are defined as follows:
- Security - Replacement of GSI with SSL in the security components, most notably VOMS; REST-based interface for obtaining X.509 attribute certificates in VOMS Admin; initial integration of ARGUS with middleware services (e.g. CREAM)
- Compute - Full support for the CLUSTER service in CREAM; initial support for GLUE 2 in all CEs; integration of ARGUS in CREAM; initial implementation of common MPI methods across the different compute services
- Data - Adoption of pNFS4.1 and WebDAV standards (dCache); preview version of a messaging-based Storage Element-File Catalog synchronization service (SEMsg)
- Information Systems - ARC CE, UNICORE WS, CREAM, dCache, DPM expose a richer set of information via the experimental adoption of the GLUE2 information model standard.
Home page: http://www.eu-emi.eu/middleware
gLite
The gLite middleware distribution is an integrated set of components designed to enable resource sharing and the building of a grid. The gLite middleware was produced by the EGEE project and is currently being developed by the EMI project. The current release is gLite 3.2, which is in the process of being retired. New features will be included in the products developed under the EMI project.
Home page: http://glite.cern.ch
UNICORE
UNICORE (UNiform Interface to COmputing REsources) is a Grid computing technology that provides seamless, secure, and intuitive access to distributed Grid resources such as supercomputers or cluster systems and information stored in databases. Over the years, in various European-funded projects, UNICORE has evolved into a mature and well-tested Grid middleware system. The UNICORE technology is open source under the BSD licence and available at SourceForge, where new releases are published on a regular basis.
The architecture of UNICORE consists of three layers: the user, server, and target system tiers. The user tier is represented by various clients. The primary clients are the UNICORE Rich Client, a Graphical User Interface (GUI) based on the Eclipse framework, and the UNICORE command-line client (UCC). The clients use SOAP Web services to communicate with the server tier. XML documents are used to transmit platform- and site-independent descriptions of computational and data-related tasks, resource information, and workflow specifications between client and server. The servers are accessible only via the Secure Sockets Layer (SSL) protocol.
The security within UNICORE relies on the usage of permanent X.509 certificates issued by a trusted Certification Authority.
Home page: http://www.unicore.eu
ARC
The Advanced Resource Connector (ARC) middleware integrates computing resources (usually, computing clusters managed by a batch system or standalone workstations) and storage facilities, making them available via a secure common Grid layer. The ARC middleware uses Grid technologies to enable sharing and federation of computing and storage resources distributed across different administrative and application domains. ARC is used to create Grid infrastructures of various scope and complexity, from campus to national Grids. It is distributed under the Apache v2.0 license.
In a nutshell, ARC is:
- A general purpose, Open Source, lightweight, portable middleware solution
- A reliable production-quality and scalable implementation of fundamental Grid services
- Facilitator of cross-organisational distributed computing infrastructure solutions
- Strongly committed to open standards and interoperability
Home page: http://www.nordugrid.org/arc
Open Grid Scheduler integration
You can download from here the ARC middleware configuration that uses Open Grid Scheduler. Some parts of the server.xml file need to be modified, for example the share name (queue): test.q
ARC Linux repository: http://download.nordugrid.org/repos-11.05.html
Recommended ARC version: 1.1.1
You need to install these packages:
- nordugrid-arc-plugins-needed
- nordugrid-arc-aris
- nordugrid-arc-debuginfo
- nordugrid-arc-hed
- nordugrid-arc-client
- nordugrid-arc-plugins-globus
- nordugrid-arc-arex
- nordugrid-arc
ARC/A-REX daemon start command:
arched -c server.xml
Univa Grid Engine
Univa Grid Engine is a batch-queuing system and distributed resource management software platform, once known as Sun Grid Engine (SGE). Univa Grid Engine supports the new breed of enterprise and big data applications with features and functionality designed to improve the speed of dispatching and throughput.
Current version is 8.1.4. The major improvements in the new version include:
- Improved Load collection tool for Intel Xeon Phi coprocessors ensures jobs run on the least loaded cores.
- Extended memory usage metrics for Multi-Threaded applications.
- Scheduler performance enhancements ensuring maximum number of jobs running in the cluster while improving system responsiveness.
- Interactive Univa Grid Engine jobs can now set their memory affinity.
Programming environments
CUDA Parallel Nsight tool
NVIDIA Nsight is the development platform for heterogeneous computing that allows efficient development, debugging and profiling of the GPU code. Nsight helps users gain a better understanding of their code - identify and analyze bottlenecks and observe the behavior of all system activities. NVIDIA Nsight is available for Windows, Linux and Mac OS users.
Nsight Development Platform, Visual Studio Edition (formerly NVIDIA Parallel Nsight) brings GPU computing into Microsoft Visual Studio. It enables users to build, debug, profile and trace heterogeneous compute and graphics applications using CUDA C/C++, OpenCL, DirectCompute, Direct3D, and OpenGL. The current version is 2.2. Nsight Visual Studio Edition 2.2 has been updated with numerous bug fixes over the previous Nsight 2.2 release, build 12160. It is recommended that all graphics developers update to this latest version, as the majority of improvements and bug fixes are graphics related.
NVIDIA Nsight Eclipse Edition is a full-featured IDE powered by the Eclipse platform that provides an all-in-one integrated environment to edit, build, debug and profile CUDA-C applications. Nsight Eclipse Edition supports a rich set of commercial and free plugins. It comprises the Nsight Source Code Editor, Nsight Debugger and Nsight Profiler. The latest version of NVIDIA Nsight Eclipse Edition, with support for CUDA C/C++ and the Kepler architecture, is available with CUDA 5.0 and is supported on Mac and Linux platforms. It is part of the CUDA Toolkit.
Link: http://www.nvidia.com/object/nsight.html
NVIDIA Parallel Nsight 3.0
A new version of the Parallel Nsight tool, Visual Studio Edition, has been released. It has enhanced support for graphics debugging and profiling, as well as compute debugging and profiling.
- Improved support for OpenGL 4.2
- Improved GPU shader debugging and frame capturing
- Support for CUDA 5.0 and Kepler GK110 architecture (dynamic parallelism, concurrent kernel launches, etc.)
- CUDA Dynamic Parallelism analysis (building, debugging, running)
- Improved memory checker
- Profiler support for CPU/GPU workload analysis, active warp traces