Workflow based development environments

From HP-SEE Wiki

Revision as of 23:11, 24 April 2012 by Kozlovszky

WS-PGRADE/gUSE

Section contributed by SZTAKI (built up from previous publications)

The grid User Support Environment (gUSE) provides a collaborative, community-oriented application development environment in which developers and end-users can share sophisticated (layered and parameter-sweep-enabled) workflows, workflow graphs, workflow templates and ready-to-run workflow applications. The workflows can use a large set of virtualized, high-level Distributed Computing Infrastructure (DCI) services. The environment is capable of providing interoperation among classical service grids, desktop grids, clouds, clusters and unique web services in a scalable way. Internally, gUSE is implemented as a set of Web services that bind the different components together in flexible ways. The internal architecture contains a workflow enactor (called Zen), which provides seamless enactment of DAG-based workflows and also supports extra features such as embedded workflows, timed workflows, web-service-type inputs and database access.

WS-PGRADE [*1] is an easy-to-use, highly flexible, co-operative graphical user interface of gUSE. WS-PGRADE uses the client APIs of the gUSE services to turn user requests into sequences of gUSE-specific Web service calls, and hides the communication protocols and call sequences behind JSR 168-compliant portlets. WS-PGRADE is integrated into Liferay [*2], and end users can access it via HTTP and HTTPS with web browsers. The graph editor component is a Java Web Start based application that can be downloaded from WS-PGRADE via the browser to the user's machine. The editor is used to define the static skeleton of workflows, while the HTML pages of WS-PGRADE provide interfaces to add content to the graphs and so generate complete Grid/Web service applications. gUSE/WS-PGRADE portals [*3] currently serve numerous user communities and international projects, providing access to multi-institutional grids and grid-based virtual organizations as generic DCS Portals or eScience Gateways within Europe. gUSE and WS-PGRADE became open source in 2011, and the code is available to research communities under the GPL license on SourceForge.

Concept of WS-PGRADE and gUSE

The WS-PGRADE portal, based on the gUSE (grid User Support Environment) service set, is the second-generation P-GRADE portal. Compared to the first-generation P-GRADE portal, it introduces many advanced features at both the workflow and the architecture level.

Guse arch.png

Fig 1: Architecture of WS-PGRADE/gUSE portal framework

The type A user (see Figure 1) is the workflow developer, who develops workflows for the end-user scientists. This user understands how to use the underlying DCI and is able to develop complex workflows. This activity requires editing, configuring and running workflows in the underlying DCI, as well as monitoring and testing their execution in the DCIs. To support the work of these users, WS-PGRADE provides a Workflow Developer UI that covers all the activities required for developing workflows. Once a workflow has been developed for the scientists, it should be uploaded to a repository from which the scientists can download and execute it. To support this interaction between workflow developers and end-users, the gUSE service set provides an Application (Workflow) Repository, and the WS-PGRADE Workflow Developer UI enables workflow developers to upload and publish their workflows for end-users via this repository.

The type B user is an end-user scientist (e.g. a biologist or chemist) who is aware neither of the features of the underlying DCI nor of the structure of the workflow that realizes the type of application he/she has to run in the DCI(s). For these users WS-PGRADE provides a simplified End-User UI in which only a limited set of functions is reachable. Typically, end-user scientists can download workflows from the Application Repository, parameterize the workflows and execute them on the DCI(s). They can also monitor the progress of the running workflows. A user can log in to the portal either as a workflow developer or as an end-user, and according to this login he/she sees either the developer view or the end-user view of the WS-PGRADE portal.

In many cases even this simplified view is too complex for the scientists, or they need a special visualization tool or other application-specific portlets to tailor the portal to their work. Therefore, there is a need to customize the portal according to these application-specific requirements. To support the development of such an application-specific UI, gUSE provides the ASM API, with which such customization can be done easily and quickly. Once this has been done, the type C scientists who require such customization can run their workflow applications on various DCIs via the Application Specific UI. Note that in this case the WS-PGRADE UI is replaced by the customized UI, and this new UI can directly access the gUSE services via the ASM API.

It can also happen that a certain user community already has a favorite UI and insists on using it, although they would like to access as many DCIs as possible through it. In this case they can access all the DCIs supported by the WS-PGRADE/gUSE system via the DCI Bridge service, which was developed for the WS-PGRADE/gUSE system but can also be used as an independent service via the standard OGSA BES job submission interface. Therefore the type D users, who have an existing application-specific interface, can also benefit from the services that belong to the WS-PGRADE/gUSE system.

Finally, there are users who prefer to access the gUSE services via a direct API without any user interface. For this type of user (denoted with E in Figure 1) we provide the gUSE Remote API.

Lessons learnt from the usage of P-GRADE showed that its simple DAG-based workflow concept is not sufficient for many applications. Therefore, the WS-PGRADE/gUSE system extends the DAG-based workflow concept with advanced parameter study features through special workflow entities (generator and collector jobs, parametric files), condition-dependent workflow execution and workflow embedding support. The new portal extends the concrete workflow concept of P-GRADE with new concepts and objects such as graph, abstract workflow, workflow instance, template, application and project. WS-PGRADE/gUSE is able to handle a very large number of jobs simultaneously, even in the range of millions, without compromising the response time at the user interface.

The structure of a WS-PGRADE workflow is represented by a directed acyclic graph. An example of a workflow graph is shown in Figure 2. Big boxes represent job nodes of the workflow, whereas the smaller boxes attached to them represent the input and output file connectors (ports) of the given node. Directed edges of the graph represent data dependencies (and the corresponding file transfers) among the workflow nodes.

Workflow example.png

Fig 2: Example workflow graph

The execution of a workflow instance is data driven and determined by the graph structure: a node is activated (the associated job is submitted) when the required input data elements (usually a file or a set of files) become available at each input port of the node. Each such node execution is represented by a job instance. One node can be activated with several input sets (for example, in the case of a parameter sweep node), and each activation results in a new job instance. The job instances also contain status information, and on successful termination the results of the calculation are represented as data entities associated with the output ports of the corresponding node.
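The data-driven activation rule described above can be illustrated with a minimal Python sketch. It is not the Zen enactor: the function name, the dictionary layout and the port names (in0, out0) are assumptions made for this example only, and each node fires at most once (parameter sweeps are not modeled here).

```python
def run_workflow(nodes, edges, initial_inputs):
    """Activate each node once all of its input ports hold data.

    nodes: {name: (input_ports, func)}; func maps {in_port: value}
           to {out_port: value}. (Hypothetical layout for this sketch.)
    edges: {(src_node, out_port): [(dst_node, in_port), ...]}
    initial_inputs: {(node, in_port): value} for ports fed from outside.
    Returns the activation order and the outputs of every node that ran.
    """
    available = dict(initial_inputs)   # data currently present on input ports
    pending = set(nodes)
    order, results = [], {}
    while pending:
        # A node is ready when every one of its input ports has data.
        ready = [n for n in sorted(pending)
                 if all((n, port) in available for port in nodes[n][0])]
        if not ready:
            break                      # remaining nodes never get all inputs
        name = ready[0]
        in_ports, func = nodes[name]
        outputs = func({port: available[(name, port)] for port in in_ports})
        results[name] = outputs
        # Each edge transfers the produced data to the connected input ports.
        for out_port, value in outputs.items():
            for target in edges.get((name, out_port), []):
                available[target] = value
        order.append(name)
        pending.remove(name)
    return order, results


# A diamond-shaped graph: A feeds B and C, and D needs both of their outputs.
nodes = {
    "A": (["in0"], lambda d: {"out0": d["in0"] + 1}),
    "B": (["in0"], lambda d: {"out0": d["in0"] * 2}),
    "C": (["in0"], lambda d: {"out0": d["in0"] * 3}),
    "D": (["in0", "in1"], lambda d: {"out0": d["in0"] + d["in1"]}),
}
edges = {
    ("A", "out0"): [("B", "in0"), ("C", "in0")],
    ("B", "out0"): [("D", "in0")],
    ("C", "out0"): [("D", "in1")],
}
order, results = run_workflow(nodes, edges, {("A", "in0"): 1})
print(order, results["D"])
```

In this sketch node D is not activated until both B and C have delivered data to its two input ports, which mirrors the rule that every input port must be filled before the job is submitted.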

Workflow features and elements

  • Parameter sweep applications are supported through the following special entities: generator nodes, collector nodes, parametric inputs and condition-dependent node execution.
    • Generators - A node has the Generator property if the executed code may produce more than one output data element on given distinguished output port(s) during the lifetime of a single job instance. Such a distinguished output port is called a generator output port, and must be marked as such during the configuration of the node.
    • Collectors - A node has the Collector property if it has at least one distinguished Collector port. Collector nodes are typically used to collect several files and then process them as a single input. Therefore Collector ports delay job execution until the last file of the input file set to be collected has arrived on the Collector port. The workflow engine computes the expected number of input files on a Collector port at run-time. When all the expected inputs have arrived at the Collector port, the node becomes executable, and a single job instance is created and started to process all the incoming input files as a single input set.
    • Parametric input ports - A node may have a free parametric input port. A free input port is a port that is not associated with any output port of the workflow graph. Such ports must be associated with existing input data, i.e., data that is not produced by the workflow but was available before the workflow execution started. If the input data is not a single element but a vector of elements, then the associated port must be marked as a Parametric input port. Parametric input ports force a sequence of job submissions (job instance creations) of the associated node, in the same way as if the node were connected to the Generator output port of a Generator node.
    • Condition-dependent node execution - The user can make the execution of a workflow node dependent on the content of any of its input files. The possible test operations are equality, inequality and containment.
  • Embedded workflow support
    • Another powerful feature of WS-PGRADE workflows is the possibility to embed workflows into workflow nodes. Thus, instead of running for example an executable within a workflow node, another WS-PGRADE workflow may run inside the parent workflow node.
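As a rough illustration of the parameter sweep entities listed above, the following Python sketch models a generator output, one job instance per element, a Collector port that waits for the expected number of inputs, and the three condition tests. All names (generate, CollectorPort, run_sweep, condition_holds, and the operation labels) are hypothetical and chosen for this example; they are not part of gUSE.

```python
def generate(count):
    """Generator node: a single job instance emits several output elements."""
    return [f"part-{i}" for i in range(count)]


class CollectorPort:
    """Collector port: holds back execution until the last expected file arrives."""

    def __init__(self, expected):
        self.expected = expected       # computed by the engine at run-time
        self.received = []

    def arrive(self, item):
        """Store one input; return the whole input set once it is complete."""
        self.received.append(item)
        if len(self.received) == self.expected:
            return list(self.received)
        return None                    # collector node is not executable yet


def run_sweep(elements, job):
    """Create one job instance per element, then collect all results.

    The same loop applies whether the elements come from a generator
    output port or from a vector bound to a parametric input port.
    """
    port = CollectorPort(expected=len(elements))
    instances = 0
    collected = None
    for element in elements:
        result = job(element)          # one job instance per input element
        instances += 1
        collected = port.arrive(result)
    return instances, collected


def condition_holds(content, operation, value):
    """Condition-dependent execution: test the content of an input file."""
    if operation == "equal":
        return content == value
    if operation == "not_equal":
        return content != value
    if operation == "contains":
        return value in content
    raise ValueError(f"unknown operation: {operation}")


instances, collected = run_sweep(generate(3), str.upper)
print(instances, collected)
print(condition_holds("status=ok", "contains", "ok"))
```

The sketch runs the job instances sequentially for simplicity; in a real DCI they would typically run in parallel, and the Collector port would only need to count arrivals.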

Access methods

Application Specific Module (ASM)

The main advantage of ASM is that it hides the complexity of the inner abstraction levels and the internal calls to the different core services of gUSE. Without this component, one or more complicated web-service calls would have to be constructed each time a customized portlet needs to get information from, or pass information to, the portal. To avoid this complexity, ASM wraps all of these internal information accesses in a simple call to a well-parameterized function that requires only the strictly necessary parameters, such as the name of the user or the ID of the workflow. In usual cases this information can be retrieved easily by the portal container or by ASM.

Guse access.png

Fig 3: User scenario of ASM-based portlet

Guse asm.png

Fig 4: Functions of ASM

As shown in Figure 4, the functionality provided by ASM can be separated into three subsets: methods covering application management, methods for Input/Output manipulation, and methods to handle user activities during execution, such as aborting or rescuing applications. The application management set contains several methods for getting information about workflows stored in the local repository: for instance, getting the list of application developers, getting the list of applications for a specified developer ID, importing an application to the local user space, and getting the list of applications that have already been imported. The Input/Output manipulation set covers methods to handle different input cases, such as uploading a file to a specified port, selecting a file that already exists on the portal server, or setting command-line parameters for a job. Some methods of this set fetch the outputs of the calculations. The Execution set contains methods not only for workflow submission but also for many other activities, such as getting the workflow execution status in simple or detailed format, and aborting or rescuing a workflow.

As portlets are generally deployed in portlet containers that supervise the most common user activities and manage user sessions, ASM does not have to provide any security features, with the exception of issues relating to the underlying infrastructures and complex systems. To address these, ASM uses the inner-level solutions of gUSE, which require proxy certificates. These certificates are created and used with the help of dedicated Certificate portlets by the end-users, who, like the members of other user groups, have an account on the portal.

The Remote Access API of gUSE

Remote API - gUSE provides a remote access API with a web-service-based implementation, which makes it possible to build a connector service to gUSE and to handle job submission, job monitoring and result handling remotely. DCIs require user-level authentication, so the Remote Access API needs to support transparent certificate handling (e.g. X.509 certificates). The gUSE Remote Access API supports HTTPS and HTTP as communication channels. Every communication is initiated by the client (the client posts data to the server), and a servlet on the server side processes the incoming requests. Because of this communication mechanism, command-line solutions (curl-based access wrapped in shell scripts) or a wide range of programming and scripting languages (Java, C, Perl) can be used to realize a generic connector API on the client side. The main functions of the gUSE Remote Access API are shown in Table 1.


Each function is listed with its parameters, return value and description.

submit
  • Parameters:
    • wfdesc: standard gUSE workflow description XML file (workflow.xml)
    • inputzip: zip file which contains the input files
    • portmapping: text file which contains key-value pairs separated by newlines, in the following format:
      • if the file is an input file: inputfilename=WFname/JOBname/PORTnumber
      • if the file is an executable: exename=WFname/JOBname
    • certs.zip: zip file which contains the files needed to authenticate to a specified grid
    • pass: password for simple authentication
  • Return value: String: an ID which identifies the workflow
  • Description: Submits a workflow from the client into gUSE.

info
  • Parameters:
    • ID: the workflow runtime ID from the return value of the submit method
    • pass: password for simple authentication
  • Return value: Workflow status, one of:
    • submitted
    • running
    • finished
    • error
    • suspended
    • invalid
  • Description: Accesses status information about the workflow.

detailsinfo
  • Parameters:
    • ID: the workflow runtime ID from the return value of the submit method
    • pass: password for simple authentication
  • Return value: Workflow and job status
  • Description: Accesses status information about the workflow's jobs.

stop
  • Parameters:
    • ID: the workflow runtime ID from the return value of the submit method
    • pass: password for simple authentication
  • Return value:
    • TRUE: if the abort was successful
    • FALSE: if an error occurred
  • Description: Aborts the workflow and deletes it.

download
  • Parameters:
    • ID: the workflow runtime ID from the return value of the submit method
    • pass: password for simple authentication
  • Return value: a zip file containing the output and log files
  • Description: Downloads the results produced by the workflow.

Table 1: Main functions of the gUSE Remote Access API
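A client-side connector could prepare the portmapping file and poll the workflow status roughly as in the Python sketch below. This is only an illustration under stated assumptions: the helper names (build_portmapping, wait_for_workflow) are hypothetical, the get_status callable stands in for a real call to the info function (no network access is performed here), and treating only "submitted" and "running" as in-progress states is an assumption, not something the table specifies.

```python
import time

# Status values returned by the info function, as listed in Table 1.
WORKFLOW_STATUSES = {"submitted", "running", "finished",
                     "error", "suspended", "invalid"}
# Assumption for this sketch: only these two states mean "keep polling".
IN_PROGRESS = {"submitted", "running"}


def build_portmapping(input_files, executables):
    """Return the text of a portmapping file.

    input_files: {filename: (workflow, job, port_number)}
    executables: {exename: (workflow, job)}
    """
    lines = []
    for filename, (workflow, job, port) in input_files.items():
        lines.append(f"{filename}={workflow}/{job}/{port}")
    for exename, (workflow, job) in executables.items():
        lines.append(f"{exename}={workflow}/{job}")
    return "\n".join(lines) + "\n"


def wait_for_workflow(get_status, max_polls=10, delay=0.0):
    """Poll get_status() until the workflow leaves the in-progress states.

    get_status is a caller-supplied function standing in for the 'info'
    request; it must return one of the WORKFLOW_STATUSES strings.
    Returns the last status seen (None if max_polls is 0).
    """
    status = None
    for _ in range(max_polls):
        status = get_status()
        if status not in WORKFLOW_STATUSES:
            raise ValueError(f"unexpected status: {status}")
        if status not in IN_PROGRESS:
            return status
        time.sleep(delay)
    return status


mapping = build_portmapping(
    {"data.txt": ("WF1", "JobA", 0)},   # hypothetical workflow and job names
    {"run.sh": ("WF1", "JobA")},
)
print(mapping, end="")

replies = iter(["submitted", "running", "finished"])
print(wait_for_workflow(lambda: next(replies)))
```

Passing the status source in as a function keeps the polling logic independent of the transport, so the same loop could be reused with a curl wrapper or an HTTP library of choice.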


References

[*1] Kacsuk, P., Karoczkai, K., Hermann, G., Sipos, G., Kovacs, J.: WS-PGRADE: Supporting parameter sweep applications in workflows. In: Proc. of 3rd Workshop on Workflows in Support of Large-Scale Science, in conjunction with SC 2008, Austin, TX, USA, 17 Nov. 2008, pp. 1-10, ISBN: 978-1-4244-2827-4
[*2] Liferay - http://www.liferay.com [acc. 09.08.2011]
[*3] gUSE LPDS - http://www.guse.hu [acc. 09.08.2011]