ISU Electrical and Computer Engineering Archives

Algorithms for hierarchical clustering of gene expression data

Komarina, Srikanth (2004) Algorithms for hierarchical clustering of gene expression data. Masters thesis, Iowa State University.


Full text available as PDF.

Abstract

Genes are the parts of the genome that encode proteins in an organism, and proteins play an important role in many biological processes. Measuring the expression level of a gene helps biologists estimate the amount of protein produced from that gene. Microarrays can be used to measure the expression levels of thousands of genes in a single experiment, and additional techniques such as clustering can then reveal correlations among genes of interest. The most commonly used clustering technique for microarray data analysis is hierarchical clustering. Various metrics, such as Euclidean distance, Manhattan distance, and the Pearson correlation coefficient, have been used to measure (dis)similarity between genes. A commonly used software package for hierarchical clustering based on the Pearson correlation coefficient takes O(N^3) time to cluster N genes, even though algorithms exist that reduce the runtime to O(N^2). In this thesis, we show how the runtime can be reduced to O(N log N) by using a geometric interpretation of the Pearson correlation coefficient, and we show that this runtime is optimal.
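One standard way to read the Pearson correlation coefficient geometrically can be illustrated as follows: if each expression profile is centered (mean subtracted) and scaled to unit length, the Pearson coefficient r of two profiles becomes the dot product of the resulting unit vectors, and their squared Euclidean distance equals 2(1 - r). The minimal Python sketch below checks this identity on a small, made-up set of profiles. The function names (`standardize`, `pearson`, `sq_euclidean`, `closest_pair`) and the data are hypothetical, and the sketch is not the thesis's O(N log N) algorithm; it only shows why a Pearson-based dissimilarity can be treated as a Euclidean distance between points on a sphere.

```python
import math


def standardize(x):
    """Center a profile and scale it to unit Euclidean length.

    Assumes x is not constant; otherwise the Pearson coefficient is undefined.
    """
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("constant profile: Pearson correlation is undefined")
    return [c / norm for c in centered]


def pearson(x, y):
    """Pearson correlation coefficient computed directly from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sq_euclidean(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def closest_pair(points, dist):
    """Brute-force closest pair (O(N^2) distance evaluations), for checking only."""
    best = None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2]


# Hypothetical toy expression profiles (4 genes x 4 conditions), not data from the thesis.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0],
]
unit = [standardize(p) for p in profiles]

# Identity: |u - v|^2 = |u|^2 + |v|^2 - 2 u.v = 2 - 2r for centered, unit-length u, v.
for i in range(len(profiles)):
    for j in range(i + 1, len(profiles)):
        r = pearson(profiles[i], profiles[j])
        assert math.isclose(sq_euclidean(unit[i], unit[j]), 2 * (1 - r), abs_tol=1e-12)

# The most correlated pair is also the closest pair of normalized points.
print(closest_pair(profiles, lambda a, b: 1 - pearson(a, b)))  # (0, 1)
print(closest_pair(unit, sq_euclidean))                        # (0, 1)
```

Because 2(1 - r) is a monotone function of the dissimilarity 1 - r, the pair closest under the Pearson dissimilarity is also the closest pair of normalized points in Euclidean space, which suggests how geometric techniques for Euclidean point sets become applicable. The brute-force `closest_pair` is included only as a cross-check.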

EPrint Type: Thesis (Masters)
Uncontrolled Keywords: microarray, clustering, hierarchical algorithms, optimal
Subjects: Computer Engineering > SOFTWARE SYSTEMS > Parallel and Distributed Computing
Computer Engineering > SOFTWARE SYSTEMS > Computational Biology and Computational Science
ID Code: 124
Identification Number: TR-2004-11-3
Deposited By: Mr. Srikanth Komarina
Deposited On: 09 December 2004
