Aspect/sentiment-aware review summarization (SOTA)
Several pipeline methods [Bhaskar 22] apply GPT-3 to summarize large collections of user reviews in a zero-shot fashion, notably recursive summarization and approaches that select salient content to summarize via supervised clustering or extraction. On two datasets (an aspect-oriented summarization dataset of hotel reviews and a generic summarization dataset of Amazon and Yelp reviews), the GPT-3 models achieve very strong performance in human evaluation. Standard automatic metrics do not reflect this, so the methods are instead contrasted using several new measures targeting faithfulness, factuality, and genericity.
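The recursive-summarization idea can be sketched as follows: pack reviews into chunks that fit a model's context budget, summarize each chunk, then recursively summarize the summaries until one remains. This is a minimal illustration, not the paper's implementation; `summarize_with_llm` is a hypothetical stand-in for a real GPT-3 call (here it simply truncates).

```python
def summarize_with_llm(text: str, max_len: int = 200) -> str:
    # Placeholder: a real pipeline would call an LLM here.
    return text[:max_len]

def chunk(texts, max_chars=500):
    """Greedily pack texts into chunks under a character budget."""
    chunks, current = [], ""
    for t in texts:
        if current and len(current) + len(t) > max_chars:
            chunks.append(current)
            current = ""
        current += " " + t if current else t
    if current:
        chunks.append(current)
    return chunks

def recursive_summarize(reviews, max_chars=500):
    """Summarize each chunk, then recurse on the chunk summaries."""
    summaries = [summarize_with_llm(c) for c in chunk(reviews, max_chars)]
    if len(summaries) == 1:
        return summaries[0]
    return recursive_summarize(summaries, max_chars)
```

The recursion terminates because each round packs multiple summaries into fewer chunks, so the summary count strictly decreases.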
Comparative decisions, such as picking between two cars or deciding between two hiking trails, require users to visit multiple webpages and contrast the choices along relevant aspects. The impressive capabilities of pre-trained large language models can help automate such analysis. This extractive aspect-based contrastive summarization task involves constructing a structured summary that compares the choices along relevant aspects. STRUM [Gunel 23] is a novel method for this task that generalizes across domains without requiring any human-written summaries or a fixed aspect list as supervision. Given a set of relevant input webpages, STRUM solves the problem using two pre-trained T5-based [11] large language models: the first fine-tuned for aspect and value extraction [14], and the second fine-tuned for natural language inference [13]. The authors showcase the method's abilities across different domains, identify shortcomings, and discuss questions they believe will be critical in this new line of research.
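STRUM's two-stage structure can be illustrated with a toy sketch: extract (option, aspect, value) triples from source sentences, then keep only those an entailment check supports, accumulating a structured comparison table. Both `extract_aspect_values` and `nli_entails` below are hypothetical stand-ins for the two fine-tuned T5 models, implemented with trivial string logic purely for illustration.

```python
def extract_aspect_values(sentence):
    # Placeholder extractor: parses toy sentences of the form
    # "<option> <aspect> is <value>". STRUM uses a fine-tuned T5 model.
    words = sentence.rstrip(".").split()
    if len(words) >= 4 and words[2] == "is":
        return [(words[0], words[1], " ".join(words[3:]))]
    return []

def nli_entails(premise, hypothesis):
    # Placeholder NLI check: accept the claim if it literally appears
    # in the premise. STRUM uses a T5 model fine-tuned for NLI.
    return hypothesis.lower() in premise.lower()

def contrastive_summary(webpages):
    """Build a {option: {aspect: value}} comparison table, keeping only
    aspect/value pairs entailed by the source text."""
    table = {}
    for page in webpages:
        for sentence in page.split(". "):
            for option, aspect, value in extract_aspect_values(sentence):
                claim = f"{option} {aspect} is {value}"
                if nli_entails(page, claim):
                    table.setdefault(option, {})[aspect] = value
    return table
```

The key design point this mirrors is the separation of concerns: one model proposes aspect/value pairs, and a second independently verifies them against the source before they enter the summary.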
Product reviews are summarized along three dimensions in [Wang 23]: a summary product verdict, pros, and cons. To summarize efficiently from a large number of reviews per product, the authors propose FARSum, which in a first stage filters reviews based on recency and customer feedback, including helpful votes and review rating. To improve generalization across product categories, they train a BART-based model on synthetic review summaries and fine-tune it on ground-truth summary labels, demonstrating competitive performance on ROUGE metrics.
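The first-stage filtering idea can be sketched as a simple scoring-and-truncation step: rank reviews by a blend of recency, helpful votes, and rating, and keep the top-k before summarization. The scoring formula and weights below are illustrative assumptions, not those of the paper.

```python
from dataclasses import dataclass

@dataclass
class Review:
    text: str
    days_old: int
    helpful_votes: int
    rating: float  # 1-5 stars

def filter_reviews(reviews, k=3, w_recency=0.5, w_votes=0.3, w_rating=0.2):
    """Score each review and keep the k highest-scoring ones."""
    max_votes = max(r.helpful_votes for r in reviews)
    def score(r):
        recency = 1.0 / (1 + r.days_old)        # newer -> closer to 1
        votes = r.helpful_votes / (1 + max_votes)
        rating = r.rating / 5.0
        return w_recency * recency + w_votes * votes + w_rating * rating
    return sorted(reviews, key=score, reverse=True)[:k]
```

This keeps the expensive summarization model's input small while biasing it toward reviews customers found useful.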
Instead of simply mining opinion ratings on a target (e.g., a restaurant) or on multiple aspects of a target (e.g., food, service), it is desirable to go deeper and mine opinions on fine-grained sub-aspects (e.g., fish). However, it is expensive to obtain high-quality annotations at such a fine-grained scale. This motivates FineSum [Ge 23], which advances the frontier of opinion analysis on three fronts: (1) minimal supervision, where no document-summary pairs are provided and only aspect names and a few aspect/sentiment keywords are available; (2) fine-grained opinion analysis, where sentiment analysis drills down to a specific subject or characteristic within each general aspect; and (3) phrase-based summarization, where short phrases are taken as the basic units of summarization and semantically coherent phrases are gathered to improve the consistency and comprehensiveness of the summary. Given a large corpus with no annotation, FineSum first automatically identifies potential spans of opinion phrases, then reduces the noise in the identification results using aspect and sentiment classifiers. It then constructs multiple fine-grained opinion clusters under each aspect and sentiment; each cluster expresses a uniform opinion toward a certain sub-aspect (e.g., "fish" within the "food" aspect) or characteristic (e.g., "Mexican" within the "food" aspect). To accomplish this, a spherical word embedding space is trained to explicitly represent different aspects and sentiments. Knowledge is then distilled from the embedding space into a contextualized phrase classifier, and clustering is performed on contextualized opinion-aware phrase embeddings.
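The clustering step above can be illustrated with a toy sketch: phrase embeddings are normalized onto the unit sphere and grouped by cosine similarity, so each cluster collects phrases about one sub-aspect. The embeddings and the greedy single-pass clustering routine below are illustrative assumptions; FineSum learns spherical embeddings from the corpus and uses its own clustering procedure.

```python
import numpy as np

def normalize(v):
    """Project a vector onto the unit sphere."""
    return v / np.linalg.norm(v)

def cluster_phrases(phrases, embeddings, threshold=0.8):
    """Greedy single-pass clustering on the unit sphere: a phrase joins
    the first cluster whose centroid it matches above `threshold` in
    cosine similarity, otherwise it starts a new cluster."""
    centroids, clusters = [], []
    for phrase, emb in zip(phrases, embeddings):
        emb = normalize(np.asarray(emb, dtype=float))
        for i, c in enumerate(centroids):
            if float(emb @ c) >= threshold:
                clusters[i].append(phrase)
                centroids[i] = normalize(c + emb)  # update running centroid
                break
        else:
            centroids.append(emb)
            clusters.append([phrase])
    return clusters
```

On the unit sphere the dot product equals cosine similarity, which is why normalization makes the threshold test directly comparable across clusters.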
[Bhaskar 22] Zero-Shot Opinion Summarization with GPT-3