ISU Electrical and Computer Engineering Archives

Multi-scale genetic network inference based on time series gene expression profiles

Du, Pan (2005) Multi-scale genetic network inference based on time series gene expression profiles. PhD thesis, Iowa State University.

Full text available as:

PDF - Requires Adobe Acrobat Reader or other PDF viewer.

Abstract

This work integrates multi-scale clustering and short-time correlation to estimate genetic networks with different time resolutions and detail levels. Gene expression data are noisy and large in scale. Clustering is widely used to group genes with similar expression patterns. The cluster centers can be used to infer the genetic networks among these clusters. This work introduces the Multi-scale Fuzzy K-means clustering algorithm to uncover groups of coregulated genes and capture the networks at different levels of detail. Time series expression profiles provide dynamic information for inferring gene regulatory relationships. The major challenges of genetic network inference include inferring large-scale networks, identifying transient interactions and feedback loops, and differentiating direct from indirect interactions. Time correlation can estimate the time delay and edge direction. Partial correlation and directed-separation (d-separation) theory help differentiate direct and indirect interactions and identify feedback loops. This work introduces the constraint-based time-correlation (CBTC) network inference algorithm, which combines these methods with time correlation estimation to characterize genetic networks more fully. Gene expression regulation can happen in specific time periods and conditions instead of across the whole expression profile. Short-time correlation can capture such transient interactions. The network discovery algorithm was validated mainly using yeast cell cycle data. The algorithm successfully identified the yeast cell cycle development stages, cell cycle and negative feedback loops, and indicated how the networks change dynamically over time. The inferred networks reflect most interactions previously identified by genome-wide location analysis and match the extant literature. At the detailed network level, the inferred networks provide more detailed information about genes (or clusters) and the interactions among them.
Interesting genes, clusters and interactions were identified, which match the literature and the gene ontology information and provide hypotheses for further studies.
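
As an illustration only, the two correlation ideas mentioned in the abstract (time correlation for delay and direction, partial correlation for separating direct from indirect links) can be sketched as follows. This is not the thesis's CBTC algorithm; the function names, the lag search range, and the synthetic profiles are all assumptions made for this sketch.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def lagged_correlation(x, y, max_lag):
    """Search lags in [-max_lag, max_lag] for the strongest correlation
    between x[t] and y[t + lag]. A positive best lag means x leads y,
    which hints at an edge directed from x to y."""
    best_lag, best_corr = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 3:  # too few overlapping samples to be meaningful
            continue
        r = pearson(a, b)
        if abs(r) > abs(best_corr):
            best_lag, best_corr = lag, r
    return best_lag, best_corr

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y given z.
    A value near zero suggests the x-y link is explained by z (indirect)."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical profiles: y is x delayed by 3 time points.
t = range(40)
x = [math.sin(2 * math.pi * k / 20) for k in t]
y = [math.sin(2 * math.pi * (k - 3) / 20) for k in t]
lag, corr = lagged_correlation(x, y, max_lag=5)
print("estimated lag:", lag, "correlation:", round(corr, 3))

# Hypothetical common regulator z driving both p and q.
rng = random.Random(0)
z = [rng.gauss(0, 1) for _ in range(500)]
p = [v + 0.1 * rng.gauss(0, 1) for v in z]
q = [v + 0.1 * rng.gauss(0, 1) for v in z]
print("marginal:", round(pearson(p, q), 3),
      "partial given z:", round(partial_correlation(p, q, z), 3))
```

Restricting the same lag search to a sliding window of the profiles, rather than the full series, would be one simple way to approximate the short-time correlation idea; the window length would be a further assumption.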

EPrint Type: Thesis (PhD)
Uncontrolled Keywords: network inference, systems biology, causal inference, clustering, microarray data analysis, fuzzy clustering, d-separation, partial correlation, short-time correlation
Subjects: Electrical Engineering > COMMUNICATION & SIGNAL PROCESSING > Bioinformatics
ID Code: 208
Identification Number: TR-2005-11-7
Deposited By: Pan Du
Deposited On: 01 December 2005
