Jan de Leeuw

Geometric Representation of Multivariate Data Frames

2011

Abstract

We discuss two classes of drawing methods for multivariate categorical data. Both are inspired by multidimensional scaling, and both are intimately linked to the notion that similarity in the data is naturally represented as distance in a low-dimensional Euclidean space. The objects that are measured, or categorized, by our variables are represented as points. Each variable defines a partition of the points into subsets corresponding to the values of the variable.

The first class consists of the clumping methods, which try to represent the objects with the same value on a variable by small, compact subsets of the space. Since there are many ways to measure the size of a point set, there are many clumping methods. The second class consists of the separation methods, which try to construct smooth surfaces from some parametric family to separate points having different values on the variable.

Clumping and separation methods can be implemented using either least squares or likelihood-based algorithms, which define the two main ways to measure and minimize badness-of-fit.
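As a rough illustration of the clumping idea in its least squares form, the sketch below scores a fixed point configuration by summing, over the categories of one variable, the squared distances from each point to the centroid of its category. The function name `clumping_loss`, the toy coordinates, and the choice of centroid-based squared distance as the measure of subset size are all assumptions made for this example; the abstract does not specify which size measures the paper uses, and an actual clumping method would also optimize the point positions rather than only score them.

```python
from collections import defaultdict


def clumping_loss(points, labels):
    """Sum of squared distances from each point to its category centroid.

    Hypothetical helper: one possible least squares measure of how
    compact the subsets defined by a single categorical variable are.
    """
    # Group the points by the value the variable takes on each object.
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    loss = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Centroid of the subset, computed coordinate by coordinate.
        centroid = [sum(p[k] for p in members) / len(members) for k in range(dim)]
        # Add the squared Euclidean distance of every member to the centroid.
        loss += sum((p[k] - centroid[k]) ** 2 for p in members for k in range(dim))
    return loss


# Four objects placed in the plane (toy coordinates, not from the paper).
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]

# A variable whose categories match the left/right split of the points.
tight = clumping_loss(points, ["a", "a", "b", "b"])

# A variable whose categories pair up diagonally opposite points.
loose = clumping_loss(points, ["a", "b", "b", "a"])
```

Under this measure the first variable is represented well by the configuration (its categories occupy small subsets), while the second is not, so `tight` is smaller than `loose`.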

Department of Statistics, UCLA
