Latent variable models and dimensionality reduction methods for complex data

Local Research Unit: University of Padua

Scientific Lead for the Department: Alessandra R. Brazzale, Head of Local Unit

Funder: Ministry of University and Research, under the call for proposals that extends funding down the final rankings of the PRIN 2022 call

General objectives:

We are experiencing rapid and significant changes across all empirical fields, driven by the exponential growth of data generated through new collection technologies. The sheer volume and complexity of this data—recorded at different times, influenced by multiple factors, and often highly heterogeneous—render traditional models and methods insufficient. To address these challenges, dimensionality reduction techniques are frequently used to extract key insights from high-dimensional data.
This project focuses on developing innovative Latent Variable Models (LVMs) and Dimensionality Reduction Methods (DRMs) to analyze complex data, with two closely related objectives:

  • methodologically, leveraging the rich information embedded in complex and unconventional data structures (e.g., high-dimensional, multi-way, relational, multilevel, functional, and mixed-type data).
  • empirically, applying these methods to data from the fields of education and health.

To enhance accessibility, a GitHub repository will be created, providing free software tools and user manuals for those interested in utilizing these methodologies.

Research objectives of the Local Unit:

The Unit has specific expertise in LVMs (from multilevel modelling to clustering), from both the methodological and the applied viewpoint.

At the methodological level, the Unit will tackle:
1. model specification and model selection for clustering time-dependent observations in the presence of multivariate unobserved heterogeneity.
2. new developments in statistical inference for dimension reduction and feature selection in LVMs and multilevel models.
3. propensity score matching with multilevel and missing data.

The new methods will be applied to problems in education (administrative data on university student ratings, academic achievements and dropout) and health (data from the SHARE-ERIC infrastructure on health and cognitive functioning in the elderly).

Expected results:

  • Advances in LVMs (model specification and variable selection; modelling of spatial, temporal, spatio-temporal and multilevel data)
  • Advances in DRMs (for multilevel, mixed-type and network data)

Progress and accomplishments:
Ongoing activities and completed results will be shared regularly on the official GitHub repository (link available soon).

Members of the local research unit:

  • Prof. Alessandra Rosalba Brazzale
  • Prof. Francesca Bassi
  • Prof. Giovanna Menardi
  • Dr. Andrea Sottosanti

Research Network:

  • University of Rome "La Sapienza" (Project Leader)
  • University of Bologna
  • University of Florence
  • University of Naples "Federico II"
  • University of Padua
  • University of Udine

Duration: February 2025 – February 2027