Multidimensional Scaling – Types, Formulas and Examples

Multidimensional scaling (MDS) is a statistical technique often used in information visualization and social science research to visualize the structure of distance-like data. It’s a form of non-linear dimensionality reduction and can be used to explore similarities or differences in data.

MDS starts with a matrix of item-item similarities (or dissimilarities), then assigns a location to each item in N-dimensional space, where N is a user-defined number. In simple terms, MDS places the items in a low-dimensional space (usually 2D for visual representation, or sometimes 3D) in such a way that items with small dissimilarities end up close together and items with large dissimilarities end up far apart. When the dissimilarities are distances computed from high-dimensional data, this amounts to preserving the distances of the original space as well as possible.

Types of Multidimensional Scaling

There are a few main types of Multidimensional Scaling (MDS), and they are typically classified based on the kinds of distances they preserve and the transformations they use:

Classical or Torgerson MDS

Also known as Principal Coordinates Analysis (PCoA), Classical MDS takes the original distances between the objects and attempts to preserve them in the reduced dimensions. It is a special case of metric MDS in which the solution is computed directly by eigendecomposition rather than by iteration, and it is often used when you have an exact dissimilarity matrix (like geographic distances).

Metric MDS

Similar to Classical MDS, Metric MDS tries to preserve the metric distances as closely as possible. However, unlike Classical MDS, it uses an iterative algorithm that minimizes a stress function, and it works on both ratio data and interval data.

Non-metric MDS

Non-metric MDS tries to preserve the rank order of distances rather than actual distances. This method is beneficial when the dissimilarities are not metric or when the transformation of dissimilarities is unknown or non-linear. It attempts to model similarity or dissimilarity data as distances in geometric space.
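
To make the idea of "rank order" concrete, here is a minimal sketch in Python with NumPy. The dissimilarity values and the helper name rank_order are made up for illustration. It shows that a strictly increasing transformation changes the numbers but not their ranks, and the ranks are the only information non-metric MDS tries to preserve.

```python
import numpy as np

def rank_order(values):
    """Return the rank (0 = smallest) of each value; assumes no ties."""
    values = np.asarray(values, dtype=float)
    return np.argsort(np.argsort(values))

# Hypothetical dissimilarities between four pairs of items
d = np.array([2.0, 7.0, 3.0, 10.0])

# Any strictly increasing transformation (chosen arbitrarily here)
d_transformed = np.log1p(d) ** 3

ranks_before = rank_order(d)
ranks_after = rank_order(d_transformed)
print(ranks_before, ranks_after)
```

Because both rank vectors are identical, a non-metric solution fitted to either set of values would be judged against the same ordering.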

Generalized MDS

Generalized MDS (GMDS) extends metric MDS to target spaces that are not Euclidean, such as a curved surface. Instead of placing items in a flat N-dimensional space, it embeds the dissimilarities measured on one surface into another surface, which makes it useful for tasks such as comparing shapes.

Individual Differences Scaling (INDSCAL)

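A type of MDS designed to analyze dissimilarity data from several individuals or groups at once. It is a three-way scaling technique: the input is one dissimilarity matrix per subject, and the output is a common configuration of the items plus a set of weights describing how strongly each subject relies on each dimension.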

Ordinal MDS

Ordinal MDS is generally used as another name for non-metric MDS: only the order of the dissimilarities is treated as meaningful, and the configuration is fitted to preserve that order.

Multidimensional Scaling Formulas

The main formulas used in Multidimensional Scaling are as follows:

Euclidean Distance Calculation

In classical Multidimensional Scaling (MDS), we often start with a matrix of Euclidean distances. If we have a set of points, the Euclidean distance between two points (x1, y1) and (x2, y2) in two-dimensional space is:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$
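
As a rough illustration, the distance matrix that MDS starts from can be built like this in Python with NumPy. The function name pairwise_euclidean and the sample points are assumptions chosen so the result is easy to check by hand.

```python
import numpy as np

def pairwise_euclidean(points):
    """Return the n x n matrix of Euclidean distances between rows of `points`."""
    points = np.asarray(points, dtype=float)
    # Differences between every pair of rows, shape (n, n, dims)
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical 2-D points lying on one line, spaced 5 units apart
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = pairwise_euclidean(points)
print(D)
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form the later steps expect.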

Double Centering

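In the double-centering step, the matrix of squared distances is adjusted so that its row means and column means are all zero. This converts the squared distances into inner products between points whose centroid sits at the origin. With n items, D^{(2)} the matrix of squared distances, I the identity matrix, and \mathbf{1} a column vector of n ones, the formula is:

$$B = -\frac{1}{2} J D^{(2)} J, \qquad J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}$$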
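
A minimal sketch of double centering in Python with NumPy, assuming the standard classical-MDS form B = -1/2 J D² J with centering matrix J = I - (1/n)11ᵀ. The function name double_center and the example distances are hypothetical.

```python
import numpy as np

def double_center(D):
    """Return B = -1/2 * J @ D**2 @ J, where J = I - (1/n) * ones((n, n))."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return -0.5 * J @ (D ** 2) @ J        # D ** 2 squares each entry

# Hypothetical distances between three points on a line at positions 0, 5 and 10
D = np.array([[0.0, 5.0, 10.0],
              [5.0, 0.0, 5.0],
              [10.0, 5.0, 0.0]])
B = double_center(D)
print(B)
```

For these three points the centered positions are -5, 0 and 5, so B equals the matrix of their pairwise products, and every row and column of B sums to zero.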

Matrix Decomposition (Eigenvalue Decomposition)

The matrix B obtained from the double-centering step is then decomposed into its eigenvalues and eigenvectors. This step gives us the coordinates of the points in the new, lower-dimensional space.

If Λ_m is the diagonal matrix holding the m largest eigenvalues λ of B, and E_m is the matrix whose columns are the corresponding eigenvectors, then the coordinate matrix X in m dimensions is obtained by:

$$X = E_m \Lambda_m^{1/2}$$
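
Putting the double-centering and decomposition steps together, classical MDS can be sketched as follows in Python with NumPy. The function name classical_mds and the triangle example are assumptions for illustration. Small negative eigenvalues, which can appear when the input is not exactly Euclidean, are clipped to zero here as a simplification.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed items in `n_components` dimensions from a distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double centering
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]   # largest first
    lam = np.clip(eigvals[order], 0.0, None)           # guard against negatives
    return eigvecs[:, order] * np.sqrt(lam)            # scale each eigenvector

# Hypothetical example: corners of a 3-4-5 right triangle
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

X = classical_mds(D, n_components=2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.round(D_hat, 6))
```

The recovered coordinates X generally differ from the original points by a rotation, reflection, or shift, but the pairwise distances are reproduced because the input distances here are exactly Euclidean in two dimensions.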

Stress Function (Non-metric MDS)

Non-metric MDS uses a different approach. It begins with a guessed configuration and then iteratively updates it to reduce a quantity known as the “stress”. Stress is a measure of the difference between the distances in the high-dimensional space and the distances in the lower-dimensional space:

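$$\text{Stress} = \sqrt{\frac{\sum_{i<j} \left(d_{ij} - \hat{d}_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}}$$

Here, D represents the original distances, $d_{ij}$ is an element of D, and $\hat{d}_{ij}$ is the corresponding distance in the new, lower-dimensional space. In non-metric MDS, the original distances are first replaced by a monotone transformation of themselves (often called disparities) before this comparison is made. The goal is to find a configuration that minimizes this stress function.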
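
The stress calculation can be sketched in Python with NumPy as below. This is a simplified version that compares the original distances directly with the distances of a candidate configuration. It skips the monotone transformation that a full non-metric procedure would apply, and the function name stress and the sample configurations are hypothetical.

```python
import numpy as np

def stress(D, X):
    """Normalized stress between distances D and the pairwise distances of X."""
    D = np.asarray(D, dtype=float)
    X = np.asarray(X, dtype=float)
    D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices_from(D, k=1)            # count each pair once
    return np.sqrt(((D[iu] - D_hat[iu]) ** 2).sum() / (D[iu] ** 2).sum())

# Hypothetical distances: the sides of a 3-4-5 right triangle
D = [[0, 3, 4],
     [3, 0, 5],
     [4, 5, 0]]

X_exact = [[0, 0], [3, 0], [0, 4]]   # 2-D configuration that matches D exactly
X_line = [[0], [3], [4]]             # 1-D configuration: distances 3, 4 and 1

print(stress(D, X_exact))
print(stress(D, X_line))
```

The exact configuration has zero stress, while the one-dimensional configuration cannot reproduce the distance of 5 and therefore has a clearly positive stress.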

How to Conduct Multidimensional Scaling

Here is a step-by-step guide on how to use Multidimensional Scaling (MDS):

  • Collect and Prepare Data: Your data should represent distances, dissimilarities, or cost between pairs of objects. This is usually represented in a matrix form where each cell represents the distance between a pair of objects.
  • Choose the Type of MDS: Decide on the type of MDS that best fits your data and research question. If you are working with metric distances and you want to maintain these distances in the reduced space, use classical or metric MDS. If you only care about preserving the order of distances, use non-metric MDS. If your data has more complex structures or constraints, consider using a more general form of MDS.
  • Run the MDS Algorithm: The specific steps will depend on the type of MDS and the software you’re using. In general, you’ll start by computing a dissimilarity or distance matrix if you don’t already have one. Then, depending on the type of MDS, you might use an eigenvalue decomposition (for classical MDS) or an iterative algorithm (for metric and non-metric MDS) to find a configuration of points in a lower-dimensional space that preserves the original distances, or their rank order, as well as possible.
  • Interpret the Results: The output of MDS is usually a set of coordinates for each object in the lower-dimensional space. You can plot these points to visualize the relationships between your objects. Objects that are close together in this space are similar, and objects that are far apart are dissimilar. Look for patterns, clusters, or other structures in these plots that might answer your research question.
  • Evaluate the Fit: Finally, you should assess how well the MDS solution fits your data. One common measure of fit is “stress”, which quantifies the difference between the distances in the high-dimensional space and the distances in the MDS solution. A lower stress value indicates a better fit.
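
The iterative route in the steps above can be sketched as follows in Python with NumPy. This is a bare-bones version of the SMACOF update (the Guttman transform with equal weights) for metric MDS, not a reproduction of any particular software package. The function names, the random starting configuration, and the sample points are assumptions for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all pairs of rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def smacof_metric_mds(D, n_components=2, n_iter=200, seed=0):
    """Refine a random configuration step by step to reduce raw stress."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, n_components))       # guessed starting configuration
    history = []
    for _ in range(n_iter):
        dist = pairwise_distances(X)
        # Raw stress before this update: squared mismatch over each pair once
        history.append(((D - dist) ** 2).sum() / 2.0)
        # Guttman transform with equal weights for all pairs
        ratio = np.zeros_like(D)
        mask = dist > 0
        ratio[mask] = D[mask] / dist[mask]
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        X = B @ X / n
    return X, history

# Hypothetical 3-D points; the goal is a 2-D map of their distances
points = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 2, 3]], dtype=float)
D = pairwise_distances(points)

X, history = smacof_metric_mds(D, n_components=2)
print(history[0], history[-1])
```

Each update is guaranteed not to increase the raw stress, so the recorded history is a convenient way to check the fit. The result can still be a local minimum, which is why practical implementations usually try several starting configurations.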

Examples of Multidimensional Scaling

Some examples of how Multidimensional Scaling is used are as follows:

Example 1: Market Research

Suppose a company is interested in understanding how its product is perceived in relation to competitors’ products. They could conduct a survey asking customers to rate the similarity of different pairs of products on a numerical scale. For instance, how similar is product A to product B on a scale of 1 to 10?

Once the company collects these similarity ratings for all pairs of products, they can represent this information as a matrix and use MDS to map the products onto a two-dimensional space. The resulting plot could reveal clusters of products that are perceived as similar by customers, which could help the company position its product more effectively or identify gaps in the market.
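
As a sketch of the data preparation in this example, the similarity ratings can be turned into a dissimilarity matrix before running MDS. The product names, the ratings, and the conversion rule (dissimilarity = top of the scale minus similarity) are all assumptions for illustration. Other conversions are possible.

```python
import numpy as np

# Hypothetical products and average similarity ratings on a 1-to-10 scale
products = ["A", "B", "C", "D"]
ratings = {
    ("A", "B"): 9, ("A", "C"): 3, ("A", "D"): 2,
    ("B", "C"): 4, ("B", "D"): 2, ("C", "D"): 8,
}

def ratings_to_dissimilarity(items, ratings, scale_max=10):
    """Build a symmetric dissimilarity matrix from pairwise similarity ratings."""
    index = {name: i for i, name in enumerate(items)}
    D = np.zeros((len(items), len(items)))
    for (a, b), similarity in ratings.items():
        # Assumed conversion: higher similarity means lower dissimilarity
        D[index[a], index[b]] = D[index[b], index[a]] = scale_max - similarity
    return D

D = ratings_to_dissimilarity(products, ratings)
print(D)
```

In this made-up data, products A and B are rated as very similar, as are C and D, so an MDS map of D would be expected to show two loose groups.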

Example 2: Psychology Research

In psychology, researchers often use MDS to analyze similarity or dissimilarity data. For example, a researcher might ask subjects to rate the psychological similarity of different pairs of emotions.

After collecting these ratings, the researcher can use MDS to visualize the structure of emotional space. This might reveal that certain emotions (like joy and surprise) are often perceived as similar, while others (like anger and joy) are perceived as very different.

Example 3: Bioinformatics

MDS is often used in bioinformatics to visualize the similarity or dissimilarity between different gene or protein sequences. The distances between sequences could be based on some measure of sequence similarity (like the number of shared motifs), and MDS can help researchers visualize the relationships between many sequences at once.

When to use Multidimensional Scaling

Multidimensional scaling (MDS) is particularly useful in situations where you need to visualize and understand the structure or relationships in your data. It is suitable in the following scenarios:

  • Understanding Similarities or Differences: MDS is used when you want to understand the similarities or differences between a set of objects or individuals. This could be based on direct measures of distance or similarity (such as geographic distance or genetic similarity), or it could be based on more abstract measures (like perceptual similarity ratings).
  • Visualizing High-Dimensional Data: If your data has many dimensions (variables), it can be challenging to understand the relationships between different data points. MDS can help by reducing the dimensionality of the data to two or three dimensions that can be easily visualized.
  • Non-Metric Data: MDS is especially useful when your measures of similarity or dissimilarity are non-metric. Non-metric MDS preserves the rank order of distances, which makes it appropriate for ordinal data.
  • Pattern Discovery: MDS can be used to discover patterns or structures in your data that might not be apparent from the raw data. By visualizing the data in a lower-dimensional space, you might be able to identify clusters of similar items, outliers, or other patterns.
  • Comparison of Objects or Individuals: MDS can be used to compare and contrast different objects or individuals based on multiple measures at once. For example, you might use MDS to compare different products based on a range of product attributes.

Applications of Multidimensional Scaling

Multidimensional Scaling (MDS) has been widely applied in various fields to visualize the similarities and differences between a set of objects or individuals. Here are some applications across different domains:

  • Market Research: MDS is often used in market research to understand how different products are perceived by customers. For instance, a company might use MDS to map products onto a space based on customer ratings of product similarity, which could help identify market segments or guide product development strategies.
  • Psychology: In psychology, MDS has been used to map out perceptual spaces. For example, researchers might ask subjects to rate the similarity of different emotions, and then use MDS to visualize the structure of emotional space.
  • Bioinformatics: MDS can be used to visualize the similarities and differences between different gene or protein sequences, helping to identify clusters of similar sequences or outliers.
  • Sociology: Sociologists might use MDS to understand social networks, visualizing the relationships between individuals or groups based on measures of social distance.
  • Geography and Urban Planning: MDS can be used to create perceptual maps of geographic areas, based on people’s perceived distances between different locations, which could inform urban planning or transportation strategies.
  • Computer Science and Information Retrieval: MDS can be used to visualize the similarity between different documents or websites, which could be used to improve information retrieval algorithms or create more intuitive user interfaces.
  • Environmental Science: MDS is used to understand patterns in ecological data, for example, understanding species distribution or similarity in different environmental conditions.
  • Linguistics: In linguistics, MDS can be used to study the perceived similarity between different sounds, words, or languages.

Advantages of Multidimensional Scaling

Multidimensional Scaling (MDS) has several advantages which make it a useful tool in various fields:

  • Visualizing High-Dimensional Data: One of the key advantages of MDS is its ability to reduce high-dimensional data into a lower-dimensional space (typically 2D or 3D), which can be easily visualized and understood.
  • Handling Different Types of Data: MDS can handle a wide variety of data types. It can work with data that represents actual physical distances, but it can also work with more abstract measures of similarity or dissimilarity, making it quite versatile.
  • Non-Metric Data: Non-metric MDS does not rely on the actual numerical values of distances, but rather on the rank order of distances. This makes it useful for data that is ordinal or when the precise differences between distance values are not important or meaningful.
  • Revealing Hidden Patterns: MDS can help reveal hidden patterns or structures in data. By representing objects in a low-dimensional space, it may be easier to identify clusters of similar items, outliers, or other interesting patterns.
  • Intuitive Interpretation: The visual nature of the output from MDS can make the results more intuitive and easier to understand compared to other methods. It’s often easier to interpret a 2D or 3D plot of items than to interpret a high-dimensional data set or a complex statistical model.
  • Handling Missing Data: Some forms of MDS can deal with missing data, which is a common issue in many real-world data sets.

Disadvantages of Multidimensional Scaling

While Multidimensional Scaling (MDS) has several advantages, it also comes with certain disadvantages:

  • Subjectivity in Interpretation: One of the main challenges in using MDS is that interpreting the results can be somewhat subjective. While the proximity of points can indicate similarity, there is usually no inherent meaning to the dimensions that are produced.
  • Influence of Outliers: MDS can be sensitive to outliers. A single outlying point can distort the configuration of points in the reduced space.
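  • Difficulty with High-Dimensional Data: While MDS is useful for reducing the dimensionality of data, it works from pairwise distances, and distances computed from very high-dimensional data tend to become less informative, so the resulting map can be hard to interpret. In addition, the distance matrix grows with the square of the number of objects. In these cases, other dimension reduction techniques, like Principal Component Analysis, might be more appropriate.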
  • Metric vs Non-Metric MDS: The choice between metric and non-metric MDS can be challenging. Metric MDS preserves the original distances as closely as possible, but may not provide a good fit if the relationships in your data are non-linear. Non-metric MDS only preserves the rank order of distances, and may therefore lose some information about the original distances.
  • Computationally Intensive: Non-metric MDS can be computationally intensive, as it involves an iterative algorithm to minimize the stress function. This can be a problem with very large datasets.
  • Difficulty Handling Missing Data: Although some advanced forms of MDS can handle missing data, basic MDS requires a complete distance or dissimilarity matrix. If some distances are missing or undefined, this can cause problems.
  • Assumption of Symmetric Distances: Traditional MDS assumes that the distance from item A to item B is the same as the distance from item B to item A, which might not always be true in practice.
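
One simple workaround for the symmetry assumption is to average the two directions before running MDS. The sketch below, in Python with NumPy, uses a hypothetical helper symmetrize and made-up travel times. Averaging discards the directional information, so it is only appropriate when that information is not the point of the analysis.

```python
import numpy as np

def symmetrize(D):
    """Average D with its transpose and set the diagonal to zero."""
    D = np.asarray(D, dtype=float)
    S = (D + D.T) / 2.0
    np.fill_diagonal(S, 0.0)
    return S

# Hypothetical travel times (minutes) that differ by direction
travel = np.array([[0, 10, 30],
                   [14, 0, 20],
                   [26, 24, 0]])
S = symmetrize(travel)
print(S)
```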

About the author

Muhammad Hassan

Researcher, Academic Writer, Web developer