Abstract

Efficient Computation of Long Similar Subsequences

by: Abdullah N. Arslan and Omer Egecioglu

Abstract:

Given sequences X of length n and Y of length m with n >= m, let LAt* and NLAt* denote the maximum ordinary, and maximum length normalized scores of local alignments with length at least a given threshold value t. The alignment length is defined as the sum of the lengths of the involved subsequences, and length normalized score of an alignmentis the quotient of the ordinary score by the alignment length. We develop an algorithm which finds an alignment with ordinary score >= LAt*, and length >= (1-1/r)t for a given r, in time O(rnm) and space O(rm).The algorithm can be used to find an alignment with length normalized score > lambda for a given positive lambda with the same time and space complexityand within the same approximation bounds. Thus this algorithm provides a length-approximate answer to a query such as \"Do X and Y share a (sufficiently long) fragment with more than 70% of similarity?\" We also show that our approach gives improved approximation algorithms for the normalized local alignment problem. In this case we can efficiently find an alignment with length >=(1-1/r)t which has a length nomalized score >= NLAt*.

Keywords:

Local alignment, normalized local alignment, approximation algorithm, dynamic programming, ratio maximization.

Date:

March 2002

Document: 2002-09

Department of Computer Science

University of California, Santa Barbara

Abstract

Efficient Computation of Long Similar Subsequences

by: Abdullah N. Arslan and Omer Egecioglu