Data Science with Generalized Tensor Decompositions

Researcher(s)

Jesus Lorenzo Ramirez Perales, Electrical Engineering, Delaware State University

Faculty Mentor(s)

David Hong, Department of Electrical and Computer Engineering, University of Delaware

Abstract

Across science and engineering, the collection of vast amounts of data holds great promise to reveal patterns and produce new insights that can lead to breakthroughs. The question is how to discover patterns in these enormous datasets. Tensor decomposition is a powerful unsupervised technique for this task. It finds latent patterns by decomposing the data into a sum of simple rank-one components, each of which captures a single latent phenomenon. Our project explored how the Generalized Canonical Polyadic (GCP) tensor decomposition can be used for data science. We considered two datasets. The first is an excitation-emission matrix (EEM) dataset from fluorescence spectroscopy. It consists of EEM measurements for 18 samples, each of which is a mixture of three compounds (containing fluorescent amino acids) at varying relative concentrations. Using only the fluorescence measurements, GCP recovered both the compound signatures and their relative concentrations. The second dataset, obtained from the Food and Agriculture Organization of the United Nations, records the quantities of crops produced and their use (food or feed) in 174 countries over 52 years. GCP revealed shared patterns of crop usage among countries across regions of the world over the years. Overall, we found that tensor decomposition can be a powerful tool for discovering fundamental patterns in large datasets.
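
To make the idea of decomposing data into rank-one components concrete, the following is a minimal sketch of a canonical polyadic (CP) decomposition of a three-way array, fitted by alternating least squares in NumPy. It is not the project's code: GCP allows loss functions other than squared error, and this sketch covers only the squared-error special case. The function names (`khatri_rao`, `unfold`, `cp_als`, `reconstruct`) are hypothetical helpers, and the data is synthetic; only the first dimension (18) echoes the number of samples mentioned above, while the other sizes and all values are made up for illustration.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    return (a[:, None, :] * b[None, :, :]).reshape(-1, a.shape[1])


def unfold(x, mode):
    """Mode-n unfolding: chosen axis becomes rows, remaining axes are flattened in C order."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)


def reconstruct(factors):
    """Sum of rank-one components: x[i, j, k] = sum_r a[i, r] * b[j, r] * c[k, r]."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def cp_als(x, rank, n_iter=200, seed=0):
    """Fit a rank-`rank` CP model to a 3-way array with squared-error loss (plain ALS)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Hold the other two factor matrices fixed and solve a linear
            # least-squares problem for the factor matrix of this mode.
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            solution = np.linalg.lstsq(kr, unfold(x, mode).T, rcond=None)[0]
            factors[mode] = solution.T
    return factors


# Synthetic example (assumed sizes and random values, not the EEM or FAO data):
# build an exactly rank-3 tensor, then try to recover it from the data alone.
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((dim, 3)) for dim in (18, 20, 15)]
x = reconstruct(true_factors)

factors = cp_als(x, rank=3)
rel_error = np.linalg.norm(reconstruct(factors) - x) / np.linalg.norm(x)
print(f"relative reconstruction error: {rel_error:.2e}")
```

Each column triple across the three fitted factor matrices forms one rank-one component. The columns are determined only up to scaling and ordering, so recovered components should be matched and normalized before comparing them with known signatures.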