• Identification of Interesting Gene Behaviours for AD at TU

    MLRBDA (Machine Learning Research and Big Data Analytics), an MHRD-supported Centre of Excellence (CoE) at TU, has recently made a significant contribution to the healthcare sector. A group of researchers led by Prof D K Bhattacharyya, Chief Investigator of MLRBDA, together with Ms Tulika Kakati, PhD student in the Dept of CSE at TU, and Mr H J Kashyap, PhD student at the University of California, Irvine, has developed a novel gene co-expression network module extraction method, called THD-Module Extractor. Using it, they identified several interesting genes semantically associated with Alzheimer's Disease (AD), one of the deadliest diseases.
AD is a slowly progressing neurodegenerative disease that affects most areas of the human brain. It is a common cause of dementia and is mostly observed in elderly people. AD patients typically show declining abilities in thinking, judging, moving, speaking and memory. The disease starts with the development of misfolded proteins in brain tissue, which form senile plaques and neurofibrillary tangles. Senile (amyloid) plaques are clumps of β-amyloid protein that block communication between neurons and lead to neuronal death. In early-onset Alzheimer's disease, the formation of β-amyloid clumps is initiated by mutations in the genes encoding Amyloid Beta (A4) Precursor Protein (APP), Presenilin 1 (PSEN1) and Presenilin 2 (PSEN2) (nlm.nih.gov). Neurofibrillary tangles, on the other hand, are formed due to abnormal changes in a protein called Tau, which normally keeps microtubules intact so that food molecules (nutrients) can be transported along them. Defective Tau causes microtubules to disintegrate and form tangles, which obstruct the supply of nutrients to neurons and result in neuronal death.
Genes or proteins involved in molecular pathways related to AD, such as Wnt signaling, p53 signaling, Alzheimer disease-amyloid secretase, Apoptosis signaling and Glycolysis, show a close relation with the pathogenesis of AD. The group analyses both healthy and disease gene expression data for several deadly diseases, including Alzheimer's, to identify co-expressed as well as differentially expressed genes across the stages of disease progression. To identify such genes, the group uses both gene-gene expression similarity (based on its own measure, called SSSim) and gene-gene semantic similarity (based on the well-known Lin's measure) in an unsupervised framework (one that uses no prior knowledge) for (i) constructing a gene co-expression network (CEN), (ii) extracting components or modules with high functional coherence from the CEN, and (iii) validating the modules against globally accepted benchmark results. A CEN is an undirected graph in which an edge between a pair of genes may represent information such as co-expression similarity, semantic similarity, physical interaction, genetic interaction, shared protein domains, co-localization, pathway membership or predicted function. Once the gene modules have been extracted, the group rigorously validates their biological relevance using external knowledge sources. At a later stage, depending on whether genes are associated (co-expressed) or differentially expressed, a comprehensive solution for identifying the biomarkers of a given disease, such as AD, is provided. The group also uses TopoGSA, a web-based tool, to study the topological structure of each module, i.e., the connectivity of a gene with the other member genes in terms of topological parameters such as degree, node betweenness and shortest path length, and to compare the genes against the KEGG database, thereby discovering genes that exhibit interesting behaviour with high KEGG enrichment scores.
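To make steps (i) and (ii) and the topological parameters more concrete, here is a minimal sketch, not the group's implementation. It builds an unweighted CEN from a precomputed gene-gene similarity matrix by keeping the edges whose similarity reaches a threshold, treats connected components as modules, and computes degree and shortest path lengths. Everything here is an assumption for illustration: the similarity values, the threshold of 0.7 and the gene names g1 to g5 are hypothetical, and plain connected components stand in for THD-Module Extractor, whose actual procedure is not reproduced here.

```python
from collections import deque


def build_cen(genes, sim, threshold):
    """Build an undirected CEN as an adjacency map: genes i and j are linked
    when sim[i][j] >= threshold (threshold is a hypothetical cut-off)."""
    adj = {g: set() for g in genes}
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if sim[i][j] >= threshold:
                adj[genes[i]].add(genes[j])
                adj[genes[j]].add(genes[i])
    return adj


def extract_modules(adj):
    """Return connected components (sorted gene lists) as stand-in modules."""
    seen, modules = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            g = queue.popleft()
            members.append(g)
            for h in adj[g]:
                if h not in seen:
                    seen.add(h)
                    queue.append(h)
        modules.append(sorted(members))
    return modules


def shortest_path_lengths(adj, source):
    """Breadth-first search: number of edges from source to each reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        g = queue.popleft()
        for h in adj[g]:
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    return dist


# Hypothetical toy data: five genes and a symmetric similarity matrix.
genes = ["g1", "g2", "g3", "g4", "g5"]
sim = [
    [1.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 1.00, 0.20, 0.10, 0.10],
    [0.80, 0.20, 1.00, 0.85, 0.00],
    [0.10, 0.10, 0.85, 1.00, 0.20],
    [0.00, 0.10, 0.00, 0.20, 1.00],
]

cen = build_cen(genes, sim, threshold=0.7)
degree = {g: len(neighbours) for g, neighbours in cen.items()}
print(extract_modules(cen))             # [['g1', 'g2', 'g3', 'g4'], ['g5']]
print(degree)                           # {'g1': 2, 'g2': 1, 'g3': 2, 'g4': 1, 'g5': 0}
print(shortest_path_lengths(cen, "g2"))  # {'g2': 0, 'g1': 1, 'g3': 2, 'g4': 3}
```

Node betweenness, the third parameter mentioned, can be derived from the same breadth-first search by counting how many shortest paths pass through each gene. It is left out here to keep the sketch short.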
From each biologically enriched module extracted from the CEN of the AD dataset, the group identified the genes that have low gene-gene expression similarity with the core (central) genes of the module but high gene-gene semantic similarity with one another. Such genes are termed border genes. These border genes were then analyzed to find the genes differentially expressed between normal and disease samples, and the top 2000 differentially expressed genes were retained. Semantic correlations among these differentially expressed genes were computed using the mgeneSim function of the GOSemSim package in R, with Lin's semantic similarity measure over the Biological Process (BP) ontology. This analysis yielded 912 interesting genes with high semantic correlation, which were analyzed using PANTHER and validated using GeneCards. Some of these interesting border genes are APBB2, CASP2, CSNK1D, CDK5, HSD17B10, MAPT, PSEN2 and RCAN1. The pathways associated with these border genes have also been identified; a few of them, all related to AD, are the Wnt signaling pathway, the Alzheimer disease-amyloid secretase pathway, the Apoptosis signaling pathway and Glycolysis. The 912 genes were further analyzed to find GO terms related to AD. In addition, TopoGSA was used to study the topological structures of the modules. TopoGSA compares the interesting genes against the KEGG database, and the border genes extracted from the modules were found to be significantly enriched in KEGG pathways. KEGG pathways such as Wnt signaling, Apoptosis, p53 signaling, Notch signaling and Alzheimer's disease, which are associated with the interesting border genes and were validated using GeneCards and the existing literature, were found to be related to AD.
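Lin's measure scores two GO terms by the information content (IC) of their most informative common ancestor (MICA): sim(t1, t2) = 2 · IC(MICA) / (IC(t1) + IC(t2)), where IC(t) = -log p(t) and p(t) is the fraction of annotations falling under term t. The sketch below implements this on a tiny hypothetical ontology. It then combines term scores into a gene-level score with the best-match average (BMA), one of the combination strategies GOSemSim offers; whether the group used BMA is an assumption. The term names, probabilities and gene annotations are invented for illustration; GOSemSim derives IC from real GO annotation data.

```python
import math

# Hypothetical toy ontology: each term maps to its ancestors (including itself),
# and p[t] is the fraction of annotations falling under term t.
ancestors = {
    "BP": {"BP"},
    "t_a": {"t_a", "BP"},
    "t_b": {"t_b", "t_a", "BP"},
    "t_c": {"t_c", "t_a", "BP"},
    "t_d": {"t_d", "BP"},
}
p = {"BP": 1.0, "t_a": 0.5, "t_b": 0.1, "t_c": 0.1, "t_d": 0.25}


def ic(term):
    """Information content IC(t) = -log p(t), written as log(1 / p(t))."""
    return math.log(1.0 / p[term])


def lin(t1, t2):
    """Lin's similarity: 2 * IC(MICA) / (IC(t1) + IC(t2))."""
    common = ancestors[t1] & ancestors[t2]
    mica_ic = max(ic(t) for t in common)
    denom = ic(t1) + ic(t2)
    if denom == 0:  # both terms are the root
        return 1.0 if t1 == t2 else 0.0
    return 2 * mica_ic / denom


def bma(terms1, terms2):
    """Best-match average of term similarities between two genes' GO term sets."""
    best1 = [max(lin(t, u) for u in terms2) for t in terms1]
    best2 = [max(lin(u, t) for t in terms1) for u in terms2]
    return (sum(best1) + sum(best2)) / (len(terms1) + len(terms2))


# Hypothetical GO annotations for two genes.
gene_x = {"t_b"}
gene_y = {"t_b", "t_d"}
print(round(lin("t_b", "t_c"), 4))    # 0.301
print(round(bma(gene_x, gene_y), 4))  # 0.6667
```

Terms t_b and t_c share only t_a as an informative ancestor, so their similarity is log 2 / log 10 ≈ 0.30. The gene-level BMA score rewards the shared term t_b while the unmatched t_d pulls it down to 2/3.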
To perform the large, computationally intensive correlation analysis, this work leverages the parallel computing capabilities of Graphics Processing Units (GPUs) to compute the SSSim correlation matrix, implemented using NVIDIA CUDA. The work has been published in Scientific Reports (6:38046) of Nature Publishing Group (2016), and further details are available at DOI:10.1038/srep38046. The work has been widely appreciated by several Alzheimer's research groups, including Oxford ARUK (Alzheimer's Research UK) at the University of Oxford, and has also been shared on social networks by ARUK. The group is working with a professor from the University of Colorado, USA, on three other deadly diseases, viz., Parkinson's disease, HIV-1 and breast cancer, and hopes to identify several interesting genes and their behaviour in the context of these diseases as well.
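A gene-gene correlation matrix suits GPUs well because every (i, j) entry depends only on the two expression profiles involved, so each gene pair can be handled by its own thread. The plain-Python sketch below makes that decomposition explicit by mapping a flat index k, which would play the role of a CUDA global thread index, onto a pair (i, j) in the upper triangle of the matrix. Pearson correlation is used only as a stand-in, since SSSim is not defined in this article. The profiles are hypothetical, and a real CUDA kernel would typically use a closed-form index mapping and run all pairs concurrently rather than in a loop.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles
    (a stand-in for SSSim, which is not defined in this article)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # constant profile: correlation undefined
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def pair_from_index(k, n):
    """Map a flat index k in [0, n*(n-1)/2) to the upper-triangle pair (i, j),
    i < j, enumerated row by row. On a GPU, k would be the global thread id."""
    i, row_len = 0, n - 1
    while k >= row_len:
        k -= row_len
        i += 1
        row_len -= 1
    return i, i + 1 + k


def correlation_matrix(profiles):
    n = len(profiles)
    mat = [[1.0] * n for _ in range(n)]
    # Each iteration is independent of the others, which is what lets a GPU
    # run them as concurrent threads; here they simply run in sequence.
    for k in range(n * (n - 1) // 2):
        i, j = pair_from_index(k, n)
        mat[i][j] = mat[j][i] = pearson(profiles[i], profiles[j])
    return mat


# Hypothetical expression profiles for three genes across four samples.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
print([pair_from_index(k, 3) for k in range(3)])  # [(0, 1), (0, 2), (1, 2)]
print(correlation_matrix(profiles))
```

Computing only the upper triangle halves the work, because the measure is symmetric. For n genes there are n(n-1)/2 independent pairs, which is why the matrix grows quickly and benefits from massive parallelism.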