Course: Hands-On Natural Language Processing

What is text summarization?

- [Narrator] Text summarization is a computational technique for generating brief, accurate, and coherent summaries of lengthier documents. When summarizing text, one or more documents are given as input, and a summary is obtained as output. Text summarization typically happens in three key steps: topic identification, topic interpretation, and summary generation.

Summarization can be either abstractive or extractive. The abstractive summarization technique leverages natural language processing algorithms to understand the meaning of the text. It is generally considered the more complex and computationally intensive of the two. The resulting summary is comparable to how humans read and then summarize text in their own words. The extractive summarization technique leverages statistical and linguistic characteristics to assign importance to sentences and paragraphs. The summary produced with this technique consists of phrases extracted from the original document.

The central principle of summarization is to reduce text size while retaining the important content. Summaries can be tuned to fit the needs of the target audience, and their accuracy is measured by Recall-Oriented Understudy for Gisting Evaluation, or ROUGE, metrics. Some of the most important libraries and frameworks for summarization are pysummarization, sumy, PyTeaser, and Gensim TextRank. All of these frameworks are based on Python.

Now, there are many practical uses of summarization, and many more are emerging every day. Summarization is used for news headline generation, book summarization, email summaries, research abstracts, question answering, study notes and flashcards, social media content generation, and summaries from transcripts. As more data is generated globally, the need to capture the most important parts of text data increases. Understanding text summarization and how to implement its methodology is an important skill in your professional toolbox.
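
To make the extractive idea and the ROUGE measurement a little more concrete, here is a minimal sketch in plain Python. It is not how pysummarization, sumy, PyTeaser, or Gensim work internally. It assumes a simple word-frequency heuristic for sentence importance (not TextRank), a naive sentence splitter, and a tiny stopword list, and it computes only ROUGE-1 recall rather than the full ROUGE family. The function names (`extractive_summary`, `rouge1_recall`, and the helpers) are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "and", "are", "as", "in", "is", "it",
             "of", "on", "the", "to", "while", "with"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text, num_sentences=1):
    """Score each sentence by the average document frequency of its
    non-stopword tokens, then return the top sentences in original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    scored = []
    for index, sentence in enumerate(sentences):
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        score = sum(freqs[w] for w in words) / len(words) if words else 0.0
        scored.append((score, index))
    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    return [sentences[i] for i in sorted(i for _, i in top)]


def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: the share of reference unigrams that also appear
    in the candidate summary, using clipped counts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())
```

As a usage example, `extractive_summary(document_text, num_sentences=2)` returns the two highest-scoring sentences as a list, and `rouge1_recall("the cat sat", "the cat sat on the mat")` returns 0.5, because three of the six reference words are covered. An abstractive summarizer would instead generate new sentences, which is beyond what this sketch attempts.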
