DeepAligner

From HP-SEE Wiki

Revision as of 16:11, 20 February 2012

General Information

  • Application's name: DeepAligner
  • Virtual Research Community: Life Sciences
  • Scientific contact: Kozlovszky Miklos, Windisch Gergely; m.kozlovszky at sztaki.hu
  • Technical contact: Kozlovszky Miklos, Windisch Gergely; m.kozlovszky at sztaki.hu
  • Developers: Obuda University – John von Neumann Faculty of Informatics
  • Web site: https://biotech.nik.bmf.hu/web/

Short Description

Mapping short fragment reads to open-access eukaryotic genomes can be done with BLAST, BWA, and other sequence alignment tools. BLAST is one of the most frequently used tools in bioinformatics, and BWA is a relatively new, fast, lightweight tool for aligning short sequences. Local installations of these tools typically cannot handle problems of this size, so the procedure runs slowly, while web-based implementations cannot accept a high number of queries. The HP-SEE infrastructure provides access to massively parallel architectures, and the sequence alignment code is distributed free of charge for academic use. Because of the response time and service reliability requirements, a grid is not an option for the DeepAligner application.

Problems Solved

Recent deep sequencing techniques present a new data processing challenge: mapping short fragment reads to open-access eukaryotic genomes (animal, focusing on mouse and rat) at a scale of several hundred thousand reads.

Scientific and Social Impact

The aim of the task is threefold: first, to port the BLAST/BWA algorithms to the massively parallel HP-SEE infrastructure; second, to create a BLAST/BWA service capable of serving the short fragment sequence alignment demand of the regional bioinformatics communities; and third, to perform sequence analysis using high-throughput short fragment sequence alignments against eukaryotic genomes, in order to search for regulatory mechanisms controlled by short fragments.

Collaborations and Beneficiaries

The application aims to serve the short fragment sequence alignment demand of the regional bioinformatics communities. Researchers who use short fragment alignments will benefit from the availability of this service, which will be freely available to the Life Sciences (LS) community. We estimate that 5-15 scientific groups worldwide will use the service. Ongoing collaborations so far: Hungarian Bioinformatics Association, Semmelweis University. Planned collaboration: MoSGrid consortium (D-GRID based project, Germany).

Technical Features and HP-SEE Implementation

  • Primary programming language: C, Perl
  • Parallel programming paradigm: Master-slave, MPI, plus multiple serial jobs (data splitting, parametric studies)
  • Main parallel code: WS-PGRADE/gUSE + C/C++
  • Pre/post processing code: Perl/BioPerl (in-house development)
  • Application tools and libraries: Perl/BioPerl (in-house development)
  • Number of cores required: 128-256
  • Minimum RAM/core required: 4-8 GB
  • Storage space during a single run: 2-5 GB
  • Long-term data storage: 1-2 TB
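
The data-splitting approach listed above (multiple serial jobs working on separate parts of the input) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the DeepAligner code, which is implemented in C and Perl/BioPerl. It assumes the short reads arrive as FASTA text and deals them round-robin into a fixed number of chunks, each of which could then be handed to an independent alignment job. The function names parse_fasta, split_round_robin and to_fasta are hypothetical and chosen for this example only.

```python
from typing import Iterable, Iterator, List, Tuple

# A read is represented as (header without the leading '>', sequence).
Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(parts)


def split_round_robin(records: Iterable[Record], n_chunks: int) -> List[List[Record]]:
    """Deal records into n_chunks lists, one record at a time."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    buckets: List[List[Record]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        buckets[i % n_chunks].append(record)
    return buckets


def to_fasta(records: Iterable[Record]) -> str:
    """Serialize records back to FASTA text, one sequence line per read."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    # Tiny made-up input, for illustration only.
    demo = ">read1\nACGT\n>read2\nGGCC\nTTAA\n>read3\nTTTT\n"
    chunks = split_round_robin(parse_fasta(demo.splitlines()), 2)
    for idx, chunk in enumerate(chunks):
        print(f"--- chunk {idx} ---")
        print(to_fasta(chunk), end="")
```

Round-robin dealing keeps the chunk sizes within one read of each other without needing to know the total read count in advance, which suits streaming input. A real deployment would more likely choose the number of chunks from the available cores and write each chunk to its own file.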

Usage Example

To be filled in: text and (possibly) images.

Publications

  • M. Kozlovszky, G. Windisch, Á. Balaskó; Short fragment sequence alignment on the HP-SEE infrastructure; MIPRO 2012, accepted
  • M. Kozlovszky, G. Windisch; Supported bioinformatics applications of the HP-SEE project’s infrastructure; Networkshop 2012, accepted