In today’s world, with a tremendous amount of data available on the internet and no corresponding increase in the time we have to actually read it, techniques that can extract relevant summaries quickly and efficiently have become a necessity. Text summarization can be defined as a method of preserving the most important and relevant information from a single document, or from multiple related documents, in the most concise manner. Over the past few years this problem has been approached from different perspectives using multiple paradigms.
In this blog I am going to describe how to handle text summarization using Singular Value Decomposition (SVD). There are many papers published on this topic, but most of them are difficult for a beginner to interpret because of the mathematical formulas and jargon they use. I hope that after reading this blog you will understand the theory behind text summarization using SVD.
Let’s begin by briefly going over some basic background on Singular Value Decomposition.
Consider the transformation (Amxn) shown on the left. This transformation takes us from a 2-dimensional space to a 3-dimensional space: the matrix Amxn maps a vector Bnx1 in the 2-dimensional space to a vector Cmx1 in the 3-dimensional space. Let’s look at how this transformation is achieved.
We achieve it in three steps: first we rotate to the appropriate position within the 2-dimensional input space (the V matrix), then we stretch along each direction and travel across to the other space (the S matrix of singular values), and finally we rotate to the desired position in the 3-dimensional output space (the U matrix). So any non-square matrix can be factorized as shown into U, S and V matrices: A = U S VT.
Now we can factorize the given data matrix as follows:
U matrix: Matrix which explains what is happening in the row space
V matrix: Matrix which explains what is happening in the column space
S matrix: Matrix of singular values which achieves the transformation between the two
So by doing this we have divided the input matrix into 3 components: the 1st talks only about the row space, the 2nd talks only about the column space, and the 3rd actually brings about the transformation.
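This factorization is easy to verify numerically. Here is a minimal NumPy sketch, with a made-up 3×2 matrix standing in for Amxn:

```python
import numpy as np

# A maps vectors from a 2-dimensional space (n = 2) to a 3-dimensional space (m = 3)
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# full_matrices=False gives the "thin" SVD: U is m x r, S holds r singular values, Vt is r x n
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying U * diag(S) * Vt recovers the original matrix
A_reconstructed = U @ np.diag(S) @ Vt
print(np.allclose(A, A_reconstructed))  # True
```

The singular values in `S` come back sorted in descending order, which is exactly the property the sentence-selection step later relies on.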
You will shortly understand why we looked at this matrix transformation and how this is related to text summarization. But before we apply the above intuition of SVD to text summarization context, let’s look at the different types of text summarization methods.
In this blog we will discuss the single-document, generic and extractive method of text summarization. Following are the 5 broad steps of text summarization.
Let’s look at some of the most common pre-processing techniques that can be applied for any text mining algorithm. Pre-processing converts the original unstructured text into a structured representation.
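As an illustration, a minimal pre-processing pass (lower-casing, punctuation removal, tokenization and stop-word removal) might look like the sketch below; the stop-word list here is a tiny placeholder, not the full list a real pipeline (e.g. NLTK's) would use:

```python
import re

# A small illustrative stop-word list; real pipelines use much larger ones
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and", "it"}

def preprocess(sentence):
    """Lower-case, strip punctuation, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The cat is in the garden."))  # ['cat', 'garden']
```

Stemming or lemmatization would normally follow the same pattern as another per-token step, but is left out here to keep the sketch dependency-free.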
Now that we have created the base dataset for the SVD input by preprocessing the unstructured text data, the next step is to transform the data into a format that is suitable for the SVD algorithm. The columns of the matrix represent the sentences of the input text and the rows represent the words.
The skeleton of the input matrix is as shown above, but the question at hand is how to fill the individual cell values. The cells are filled to represent the importance of words in each of the sentences. This input matrix is very sparse, since the number of words is much higher than the number of sentences. The method used to fill the input matrix affects the performance of text summarization using SVD. I have briefly explained some of the techniques for filling the input matrix below.
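As a concrete sketch of the simplest filling scheme (raw frequency, with an optional binary variant), assuming the sentences have already been pre-processed and can be tokenized on whitespace:

```python
import numpy as np

def build_term_sentence_matrix(sentences, binary=False):
    """Rows = unique words, columns = sentences.
    Cell (i, j) = count of word i in sentence j (or 0/1 if binary=True)."""
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1
    if binary:
        A = (A > 0).astype(float)
    return A, vocab

sentences = ["the cat sat", "the dog sat", "the cat ran"]
A, vocab = build_term_sentence_matrix(sentences)
print(A.shape)  # (5, 3): 5 unique words x 3 sentences
```

Swapping the count for a tf-idf or log-entropy weight only changes how each cell value is computed; the matrix shape and the rest of the pipeline stay the same.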
Once you decide on the method of filling the input matrix, run SVD on it. SVD is a method that models the relationships among words and sentences by combining different words into concepts, while also reducing noise.
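Running SVD on the term-by-sentence matrix takes a single NumPy call. The matrix below is a hypothetical 5-word × 3-sentence frequency matrix, just for illustration:

```python
import numpy as np

# Hypothetical frequency matrix: rows are words, columns are sentences
A = np.array([[1.0, 0.0, 1.0],   # word "cat"
              [0.0, 1.0, 0.0],   # word "dog"
              [0.0, 0.0, 1.0],   # word "ran"
              [1.0, 1.0, 0.0],   # word "sat"
              [1.0, 1.0, 1.0]])  # word "the"

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Each row of Vt is a "concept"; the singular values in S rank the concepts
# by importance, and Vt[i, j] scores sentence j against concept i.
print(S)  # singular values, in descending order
```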
Interpretation of SVD:
- Derives a mapping between the m-dimensional term space and the r-dimensional singular vector space (where r is the rank of the input matrix)
- Breaks down the original document into r linearly independent base vectors, or concepts
- SVD can semantically cluster words and sentences by finding salient and recurring patterns and representing each pattern by one of the singular vectors
- The magnitude of the corresponding singular value indicates the degree of importance of the pattern (salient feature/concept) within the document
- The sentence that best represents a pattern has the largest index value in the corresponding row of the VT matrix
After you run SVD and obtain the most salient concepts in the text, the next step is to select the sentences that will form the summary.
Following are some of the sentence selection algorithms:
a. Gong & Liu (Gong and Liu, 2001): From each row of the VT matrix, which represents a concept, the sentence with the highest score is selected. This is repeated until a predefined number of sentences has been collected.
b. Steinberger & Jezek (Steinberger and Jezek, 2004): For each row of the V matrix (i.e. each sentence), a vector length is calculated using the first n concepts, where n is given as an input parameter. The Σ matrix values are also used as importance weights in the length calculation, and the sentences with the greatest lengths are selected.
c. Murray & Renals & Carletta (Murray et al., 2005): From each row of the VT matrix, i.e. for each concept, one or more of the highest-scoring sentences are selected. The number of sentences collected for each concept is determined by the percentage of the related singular value over the sum of all singular values, which are held in the Σ matrix.
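As a minimal sketch of the Gong & Liu scheme (with one small practical twist, not in the original paper: sentences already picked for an earlier concept are skipped, so the summary has no duplicates):

```python
import numpy as np

def gong_liu_summary(A, num_sentences):
    """Gong & Liu (2001), sketched: for each concept (row of Vt, most
    important first), pick the unchosen sentence with the highest score."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for row in np.abs(Vt):
        for j in row.argsort()[::-1]:      # sentence indices, best score first
            if j not in chosen:
                chosen.append(int(j))
                break
        if len(chosen) == num_sentences:
            break
    return sorted(chosen)                  # restore original sentence order

# Hypothetical 5-word x 3-sentence matrix; pick a 2-sentence summary
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
summary = gong_liu_summary(A, 2)
```

The returned indices would then be used to pull the corresponding original sentences, in document order, as the final summary.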
Text summarization in itself is a vast topic, but I hope that after reading this blog you have got the gist of text summarization using Singular Value Decomposition.
References:
- Text Summarization of Turkish Texts using Latent Semantic Analysis, Ozsoy et al.
- A Survey of Text Summarization Extractive Techniques, Gupta et al.
- Using Latent Semantic Analysis in Text Summarization and Summary Evaluation, Steinberger et al.
Sharayu Rane works as a Business Analyst at ZS Associates, where she has expertise in leveraging large, complex datasets to help clients track their brand performance and develop insights into their business. Sharayu also volunteers as a Data Analyst at Safecity, a non-profit organization based out of Mumbai, India, which crowd-sources personal stories of sexual harassment and abuse in public spaces. Her current role there involves using advanced predictive modelling techniques to predict the location/city where an incident took place, since people often omit the location/city when reporting an incident.