hyPACK-2013 HPC GPU Cluster - Application Kernels
Bio-Informatics - String Search algorithms
Computational Biology : Computational Biology applications such as
bioinformatics, and molecular modelling high performance is critical.
Biological sequence comparison is a very important operation
in Bioinformatics.
There are several methods to compare biological sequences, and some
of these have quadratic time and space complexity.
To-day, a huge amount of new
DNA sequences will need to be compared, in order to infer functional/structural characteristics.
Time spent in each comparison, as well as the accuracy of the result obtained, will be
challenging in genome projects.
Sequence comparison is, a very basic and important
operation in Bioinformatics. As a result of this step,
one or more sequence alignments can be produced.
A sequence alignment has a similarity score associated to it
that is obtained by placing one sequence above the other,
making clear the correspondence between the characters
and possibly introducing gaps into them. The most
common types of sequence alignment are global and local.
To solve a global alignment problem is to find the best
match between the entire sequences.
On the other hand,
local alignment algorithms must find the best match between
parts of the sequences.
Smith-Waterman (SW) is an exact algorithm based
on the longest common subsequence (LCS) concept that
uses dynamic programming to find local alignments between
two sequences of size m and n in
O(mn) space
and time. In order to accelerate these methods, many
GPU algorithms were proposed in the litreature.
Smith Watermann (SW) algorithm : Parallelisation on HPC GPU Cluster
String Search algorithms
Pattern Matching or String Matching is very important subject which has application on
broad domain of text processing. The core part of the algorithm are also used in many
implementation of practical software existing under many application domain like "Speech
Processing", "DNA analysis" etc. Moreover, they emphasize programming methods that serve as
paradigms in other fields of computer science (system or software design). Theoretically pattern
matching pose as a challenging problem under theoretical computer science.
Sequential pattern matching algorithms refers to the class of algorithm where a single
stream of execution do pattern matching matching activity at a time.
Brute force Pattern Matching algorithm is considered to
be most simple to understand and implement. Some time this algorithm is called as naive approach
of pattern matching.
characters of P and T left to right until either two unequal characters are found or until P is
exhausted, in which case an the occurrence of P is reported in side T on this particular location. On
the other hand if match is not found then P is shifted one place right, and the same loop of
comparison is continued. This process continue up to end of sequence T.
Boyer Moore algorithm is considered to be the most efficient pattern matching algorithm.
The algorithm scan the characters of the pattern from right to left beginning with the right-most
one.
In case of mismatch it used two pre-compiled functions to shift the window to the right.
|