Dissertation

{en_GB=Latent factors of Multi-Omics data and Clustering} {} EVALUATED

{pt=Given the large amount of data available, machine learning algorithms have great potential for analysing it. It is important to identify hidden information and to find the characteristics that patients share. In this context, we first compare two existing tools, MOFA and iCluster, on purpose-built synthetic data to test their performance. We then apply MOFA, the better tool, to ovarian cancer data from TCGA. In the results, MOFA was able to recover several data characteristics in the synthetic samples and significant genes in the real data., en=Abstract: Cancer is a very complex disease; often, its types cannot be classified simply by location or manifested characteristics. In these circumstances, it is critical to identify groups of patients who share similar underlying biological information and to apply similar treatment to patients in the same group. The Cancer Genome Atlas (TCGA) database focuses on cancer and provides various types of omic data, such as mutations, RNA expression and DNA methylation. Often, these data have more variables than samples and therefore suffer from the curse of dimensionality. Using this multi-omic data, this work aims to discover unknown factors common to the three data types with factor analysis tools such as iCluster and MOFA; this dimensionality reduction step selects the more relevant information for further analysis. This thesis proposes a methodology that compares MOFA and iCluster by finding the underlying latent factors and clustering patients who share biological similarities.
After testing both methods on synthetic data and comparing their ability to recover the underlying factors and clusters, we apply MOFA to the Ovarian Carcinoma (OV) data extracted from TCGA to find latent factors and the relevant clustering results.}
{pt=TCGA, Factor Analysis, Principal Component Analysis, MOFA, iCluster, K-means, en=TCGA, Factor Analysis, Principal Component Analysis, MOFA, iCluster, K-means}

December 18, 2019, 10:00

Publication

Work subject to copyright

Supervision

SUPERVISOR

Susana de Almeida Mendes Vinga Martins

Department of Bioengineering (DBE)

Associate Professor

SUPERVISOR

Alexandra Sofia Martins de Carvalho

Department of Electrical and Computer Engineering (DEEC)

Assistant Professor