Aspect/sentiment-aware review summarization (Recent)
Existing unsupervised opinion summarization techniques follow a two-stage framework: first creating a synthetic review-summary paired dataset and then feeding it into a generative summarization model for supervised training. However, these methods mainly focus on semantic similarity when creating the synthetic dataset, ignoring the consistency of aspects and sentiments within synthetic pairs. Such inconsistency also introduces a gap between training and inference of the summarization model. To alleviate this problem, ConsistSum [Ke 22] first extracts preliminary review/summary pairs from the raw corpus by evaluating the distance between aspect and sentiment distributions. Each preliminary summary is then refined with constrained Metropolis-Hastings sampling to produce a highly consistent synthetic dataset. In the summarization phase, the generative model T5 is fine-tuned with an additional loss for predicting the aspect/opinion distribution.
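The pair-extraction step can be sketched roughly as below. This is a toy Python approximation: the hand-written `ASPECT_WORDS` lexicon, the keyword-count distributions, and the symmetrized-KL distance with a `max_distance` threshold are all illustrative assumptions; ConsistSum itself derives aspect/sentiment distributions with trained models and uses its own distance criterion.

```python
import math

# Hypothetical aspect lexicon; ConsistSum estimates aspect/sentiment
# distributions with trained components, which keyword counts only approximate.
ASPECT_WORDS = {
    "food": ["food", "pizza", "taste"],
    "service": ["service", "staff", "waiter"],
    "price": ["price", "cheap", "expensive"],
}

def aspect_distribution(text, smoothing=1e-3):
    """Normalized aspect-keyword counts (smoothed so the KL term is finite)."""
    tokens = text.lower().split()
    counts = {a: sum(tokens.count(w) for w in words) + smoothing
              for a, words in ASPECT_WORDS.items()}
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def kl_divergence(p, q):
    return sum(p[a] * math.log(p[a] / q[a]) for a in p)

def select_pairs(reviews, max_distance=0.5):
    """Treat each review in turn as a candidate summary; keep a
    (reviews, summary) pair when the aspect distribution of the candidate
    is close (symmetrized KL) to that of the remaining reviews."""
    pairs = []
    for i, cand in enumerate(reviews):
        others = [r for j, r in enumerate(reviews) if j != i]
        pooled = aspect_distribution(" ".join(others))
        d_cand = aspect_distribution(cand)
        dist = 0.5 * (kl_divergence(pooled, d_cand) +
                      kl_divergence(d_cand, pooled))
        if dist < max_distance:
            pairs.append((others, cand))
    return pairs
```

The same scheme extends to sentiment by adding a second distribution (e.g., over polarity labels) and summing the two distances.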
Previous summarization approaches select multiple reviews and their summary based on textual similarities between reviews, resulting in an information mismatch between the review input and the summary. [Liu 22] instead converts each review into a mix of structured and unstructured data: opinion/aspect pairs (OAs) and implicit sentences (ISs). A new method synthesizes training pairs with such mix-structured data as input and a textual summary as output, and designs a summarization model with an OA encoder and an IS encoder.
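The review-to-OA/IS conversion might be approximated with a crude heuristic like the one below; the `ASPECTS`/`OPINIONS` lexicons and the opinion-precedes-aspect adjacency rule are illustrative assumptions, not [Liu 22]'s actual extractor.

```python
# Hypothetical lexicons; [Liu 22] extracts OAs with a dedicated extraction
# model, which this word-adjacency heuristic only approximates.
ASPECTS = {"food", "service", "price", "room"}
OPINIONS = {"great", "terrible", "friendly", "cheap", "slow"}

def split_review(review):
    """Split a review into opinion/aspect pairs (OAs) and implicit
    sentences (ISs): sentences yielding no pair stay unstructured."""
    oas, iss = [], []
    for sent in review.lower().split("."):
        tokens = sent.split()
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)
                 if tokens[i] in OPINIONS and tokens[i + 1] in ASPECTS]
        if pairs:
            oas.extend(pairs)
        elif tokens:
            iss.append(sent.strip())
    return oas, iss
```

The two outputs then feed the OA encoder and the IS encoder respectively.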
Semantic Autoencoder (SemAE) [Chowdhury 22] performs extractive summarization in an unsupervised manner. SemAE uses dictionary learning to implicitly capture semantic information from reviews, learning a latent representation of each sentence over semantic units, where each semantic unit is supposed to capture an abstract concept. These representations are leveraged to identify representative opinions among hundreds of reviews. SemAE can also perform controllable summarization to generate aspect-specific summaries.
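A minimal sketch of SemAE-style selection, assuming precomputed sentence embeddings `S` and a learned dictionary of semantic-unit vectors `D` are given; the softmax attention and mean-similarity ranking below are simplifications of the paper's dictionary-learning and selection machinery.

```python
import numpy as np

def latent_repr(S, D):
    """Represent each sentence as a distribution over semantic units
    via softmax attention against the dictionary D (rows sum to 1)."""
    scores = S @ D.T / np.sqrt(S.shape[1])
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def extract_summary(S, D, k=2):
    """Pick the k sentences whose unit distributions are most similar
    (cosine) to the corpus mean distribution, i.e. the most
    representative opinions (simplified selection)."""
    A = latent_repr(S, D)
    mean = A.mean(axis=0)
    sims = (A @ mean) / (np.linalg.norm(A, axis=1) * np.linalg.norm(mean))
    return np.argsort(-sims)[:k]
```

Aspect-specific summaries would replace the corpus mean with a distribution concentrated on the semantic units associated with the target aspect.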
Two simple yet effective unsupervised approaches [Shen 23] generate both aspect-specific and general opinion summaries by training on synthetic datasets constructed from aspect-related review content. Seed Words Based Leave-One-Out (SW-LOO) identifies aspect-related portions of reviews simply by exact-matching aspect seed words. Natural Language Inference Based Leave-One-Out (NLI-LOO) identifies aspect-related sentences using an NLI model, in a more general setting that requires no seed words.
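The SW-LOO construction can be sketched as follows; the `SEED_WORDS` lists are hypothetical placeholders for the paper's aspect seed words, and sentence splitting on periods is a simplification.

```python
# Hypothetical seed words; [Shen 23] supplies its own per-aspect seed lists.
SEED_WORDS = {
    "rooms": ["room", "bed", "bathroom"],
    "service": ["service", "staff", "reception"],
}

def aspect_sentences(review, aspect):
    """Keep sentences that exact-match any seed word of the aspect."""
    seeds = SEED_WORDS[aspect]
    return [s.strip() for s in review.lower().split(".")
            if any(w in s.split() for w in seeds)]

def sw_loo_pairs(reviews, aspect):
    """Leave one review out: its aspect-related sentences serve as the
    pseudo summary, and the aspect-related content of the remaining
    reviews serves as the synthetic input."""
    pairs = []
    for i, held_out in enumerate(reviews):
        summary = aspect_sentences(held_out, aspect)
        source = [s for j, r in enumerate(reviews) if j != i
                  for s in aspect_sentences(r, aspect)]
        if summary and source:
            pairs.append((source, summary))
    return pairs
```

NLI-LOO follows the same leave-one-out scheme but replaces the seed-word filter with an NLI model scoring whether a sentence entails an aspect hypothesis.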