C-DAC,Pune : High-Perf. Comp. Frontier Technologies Exploration Group and CMSD, University of Hyderabad, Technology Workshop hyPACK (October 15-18), 2013

hyPACK-2013 HPC GPU Cluster - Application Kernels

Bio-Informatics - String Search algorithms

Computational Biology : Computational Biology applications such as bioinformatics, and molecular modelling high performance is critical. Biological sequence comparison is a very important operation in Bioinformatics. There are several methods to compare biological sequences, and some of these have quadratic time and space complexity. To-day, a huge amount of new DNA sequences will need to be compared, in order to infer functional/structural characteristics. Time spent in each comparison, as well as the accuracy of the result obtained, will be challenging in genome projects. Sequence comparison is, a very basic and important operation in Bioinformatics. As a result of this step, one or more sequence alignments can be produced.

A sequence alignment has a similarity score associated to it that is obtained by placing one sequence above the other, making clear the correspondence between the characters and possibly introducing gaps into them. The most common types of sequence alignment are global and local. To solve a global alignment problem is to find the best match between the entire sequences. On the other hand, local alignment algorithms must find the best match between parts of the sequences. Smith-Waterman (SW) is an exact algorithm based on the longest common subsequence (LCS) concept that uses dynamic programming to find local alignments between two sequences of size m and n in O(mn) space and time. In order to accelerate these methods, many GPU algorithms were proposed in the litreature.

Smith Watermann (SW) algorithm : Parallelisation on HPC GPU Cluster

The algorithm Smith-Waterman (SW) is an exact method based on dynamic programming to obtain the best local alignment between two sequences in quadratic time and space. SW algorithm was modified by [3] in order to calculate affine gap penalties. It is divided in two phases: calculate the dynamic programming matrices and obtain the best local alignment. In the SW algorithm most of the time is spent calculating the similarity matrix and this is the part which is usually parallelized. The GPU implementation of SW is focussed on matrix calculation.

String Search algorithms

Pattern Matching or String Matching is very important subject which has application on broad domain of text processing. The core part of the algorithm are also used in many implementation of practical software existing under many application domain like "Speech Processing", "DNA analysis" etc. Moreover, they emphasize programming methods that serve as paradigms in other fields of computer science (system or software design). Theoretically pattern matching pose as a challenging problem under theoretical computer science. Sequential pattern matching algorithms refers to the class of algorithm where a single stream of execution do pattern matching matching activity at a time. Brute force Pattern Matching algorithm is considered to be most simple to understand and implement. Some time this algorithm is called as naive approach of pattern matching. characters of P and T left to right until either two unequal characters are found or until P is exhausted, in which case an the occurrence of P is reported in side T on this particular location. On the other hand if match is not found then P is shifted one place right, and the same loop of comparison is continued. This process continue up to end of sequence T. Boyer Moore algorithm is considered to be the most efficient pattern matching algorithm. The algorithm scan the characters of the pattern from right to left beginning with the right-most one. In case of mismatch it used two pre-compiled functions to shift the window to the right.