Events and Meetings of Italian Statistical Society, Statistics and Demography: the Legacy of Corrado Gini

Data reduction for categorical data: an explained heterogeneity approach
Alfonso Iodice D'Enza, Michel van de Velden

Last modified: 2015-09-05


The general aim of data reduction (DR) is to synthesize the information in a data set by defining a set of homogeneous groups of observations (row-wise) and a set of linear combinations of the original attributes that approximate their relationship structure (column-wise). That is, DR combines clustering and dimension reduction techniques, which are often applied sequentially. Although such a sequential approach is straightforward (dimension reduction is applied first, and the observations projected onto the reduced space are then clustered), it may fail to retrieve the structure underlying the data: the low-dimensional solution may mask the groups of homogeneous observations. To overcome this problem, joint DR techniques have been proposed. In this paper we focus on the categorical data case and on how such approaches relate to explained heterogeneity.
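The sequential (tandem) strategy described above can be sketched as follows. This is a minimal illustration under stated assumptions: the dimension-reduction step is taken to be multiple correspondence analysis (MCA), computed as correspondence analysis of the indicator (one-hot) matrix, and the clustering step is plain k-means on the MCA row coordinates. All function names are hypothetical. The sketch shows the sequential approach that the abstract says may mask group structure; it is not the joint DR method studied by the authors.

```python
import numpy as np

def indicator_matrix(data):
    """One-hot encode categorical records (rows = observations, columns = variables)."""
    data = np.asarray(data, dtype=object)
    blocks = []
    for j in range(data.shape[1]):
        cats = sorted(set(data[:, j]))
        blocks.append(np.array([[1.0 if v == c else 0.0 for c in cats] for v in data[:, j]]))
    return np.hstack(blocks)

def mca_row_coordinates(Z, ndim):
    """Assumed MCA variant: correspondence analysis of the indicator matrix Z."""
    P = Z / Z.sum()
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column (category) masses
    # standardized residuals from the independence model
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    # principal row coordinates: D_r^{-1/2} U Sigma, truncated to ndim dimensions
    return (U[:, :ndim] * s[:ndim]) / np.sqrt(r)[:, None]

def kmeans(X, k, n_iter=100):
    """Plain k-means with a deterministic maximin (farthest-point) initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # keep the old centre if a cluster becomes empty
        new_centers = np.array([X[labels == g].mean(axis=0) if np.any(labels == g) else centers[g]
                                for g in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def tandem(data, ndim, k):
    """Sequential DR: reduce dimensions first, then cluster the reduced-space projections."""
    F = mca_row_coordinates(indicator_matrix(data), ndim)
    labels, _ = kmeans(F, k)
    return labels, F
```

For example, `tandem([["a", "x", "p"]] * 5 + [["b", "y", "q"]] * 5, ndim=2, k=2)` separates the two obvious groups. The weakness the abstract points to arises because k-means only sees the `ndim` retained dimensions, which are chosen to maximize explained inertia, not cluster separation.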
