Open-Source Software for Generalized Tensor Decomposition

Researcher(s)

  • Gianna Baker, Mathematics, Washington & Jefferson College

Faculty Mentor(s)

  • David Hong, Electrical and Computer Engineering, University of Delaware

Abstract

Generalized tensor decomposition is a method for extracting simple patterns from large, complicated data. It does so by decomposing a multi-dimensional array of data (i.e., a tensor) into a sum of simple rank-one tensors, each of which captures a single latent phenomenon or pattern. Computing generalized tensor decompositions draws on techniques and ideas from applied mathematics, statistics, and electrical engineering, and its applications span the broad field of data science. Our project focused on contributing to an open-source software package for performing Generalized Canonical Polyadic (GCP) decomposition, an extension of traditional CP tensor decomposition that offers broader applicability and greater user flexibility. Our first contribution is a set of software demos that showcase the technique. We considered datasets from two very different domains: neural experiments and historical NBA game statistics. In both cases, GCP uncovered intriguing patterns, identifying condition-specific neural behavior in the neuroscience data and team dynamics in the sports analytics data. Our second contribution is a set of additional tools that streamline working with tensor decompositions. These include a function that normalizes the factors in a tensor decomposition by redistributing their scalings into the decomposition weights, and a function for visualizing tensor decompositions, which is a crucial part of analyzing and interpreting the results. Finally, we contributed to the implementation of stochastic algorithms for GCP that will enable the package to handle larger datasets at lower computational cost. In particular, we worked on implementing stochastic gradients that can be computed much faster than the full gradients currently used in the package.
Overall, our contributions aim to help users see the power and versatility of GCP in data science, improve their workflow, and reduce computational costs, making the benefits of tensor decomposition accessible to a broader audience.
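
Illustrative sketch

The factor normalization described in the abstract can be illustrated with a minimal sketch. It is written in Python with NumPy purely for illustration and is not the package's code or API. The function names normalize_factors and reconstruct, the choice of the Euclidean norm, and the restriction to three-way tensors are all assumptions made for this example. The sketch rebuilds a tensor as a weighted sum of rank-one terms, rescales every factor column to unit norm, moves the removed scalings into the weights, and checks that the represented tensor is unchanged.

```python
import numpy as np


def reconstruct(weights, factors):
    """Sum of weighted rank-one tensors for a three-way decomposition.

    Hypothetical helper: entry (i, j, k) is sum_r weights[r] * A[i, r] * B[j, r] * C[k, r].
    """
    A, B, C = factors
    return np.einsum("r,ir,jr,kr->ijk", weights, A, B, C)


def normalize_factors(weights, factors):
    """Scale each factor column to unit Euclidean norm (assumed norm choice)
    and absorb the removed scalings into the weights.

    Hypothetical helper, not taken from any package.
    """
    new_weights = np.asarray(weights, dtype=float).copy()
    new_factors = []
    for U in factors:
        U = np.asarray(U, dtype=float)
        norms = np.linalg.norm(U, axis=0)  # one norm per component (column)
        # Leave all-zero columns as they are to avoid dividing by zero.
        safe = np.where(norms > 0, norms, 1.0)
        new_factors.append(U / safe)
        new_weights *= safe  # redistribute the scaling into the weights
    return new_weights, new_factors


# Small synthetic example: a rank-2 decomposition of a 3 x 4 x 5 tensor.
rng = np.random.default_rng(0)
weights = np.array([1.0, 2.0])
factors = [rng.normal(size=(n, 2)) for n in (3, 4, 5)]

norm_weights, norm_factors = normalize_factors(weights, factors)

# The decomposition represents the same tensor before and after normalization.
same_tensor = np.allclose(
    reconstruct(weights, factors), reconstruct(norm_weights, norm_factors)
)
```

Keeping all scale information in the weights makes components directly comparable, since each weight then reflects the overall magnitude of its rank-one term.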