Process transcript with Python

Let's find out which word is the most frequent in a Donald Trump speech.

The transcript of the speech can be found by googling "deal of the century transcript". LinkedIn does not like hyperlinks pasted into articles, which is why I am not attaching a link here.

To process the text, I first want to decide what text I do NOT need. The words we keep must meet the following rules:

  #  1) only alphabetical tokens

  #  2) normalized to lowercase

  #  3) clean of punctuation and quotes

  #  4) clean of undesirable words (like names, stop words, etc...)

For this, I created a #Python method that takes two arguments: the transcript file and the destination file to write to. The method loops over the content of the input file, splits it on spaces, keeps only alphabetical words, removes punctuation according to a mapping table, normalizes the words to lowercase, removes any quotes, selects only the desirable words, and finally appends each word to a list.
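The cleaning steps above can be sketched roughly as follows. This is a minimal sketch, not the code from the repository: the names `clean_tokens`, `read_words`, `PUNCTUATION_TABLE` and the contents of `UNDESIRABLE_WORDS` are assumptions made for illustration. It also strips punctuation before the alphabetical check, so that a token like `deal,` is kept as `deal` instead of being dropped.

```python
import string

# Hypothetical exclusion set; the real list of names and stop words is not shown here.
UNDESIRABLE_WORDS = {"the", "and", "a", "of", "to", "is", "donald", "trump"}

# Mapping table that deletes ASCII punctuation (including straight quotes)
# plus the curly quotes that often appear in transcripts.
PUNCTUATION_TABLE = str.maketrans(
    "", "", string.punctuation + "\u201c\u201d\u2018\u2019"
)


def clean_tokens(text, undesirable=UNDESIRABLE_WORDS):
    """Return the lowercase alphabetical words of text, minus unwanted words."""
    words = []
    for token in text.split():
        # Remove punctuation and quotes, then normalize to lowercase.
        token = token.translate(PUNCTUATION_TABLE).lower()
        # Keep only purely alphabetical, desirable words.
        if token.isalpha() and token not in undesirable:
            words.append(token)
    return words


def read_words(source_path):
    """Read a transcript file and return its cleaned word list."""
    with open(source_path, encoding="utf-8") as source:
        return clean_tokens(source.read())
```

For example, `clean_tokens('The deal, the "Deal" of 2020 is great!')` keeps only `deal`, `deal` and `great`: the number fails the alphabetical check and the remaining tokens are in the exclusion set.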

Now, let's count the frequency of each word. How common is each word?

For this, we use a dictionary that holds each word as a key and the frequency of that word as its value. Finally, we write the result to a text file or print it. Nice!
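A dictionary-based count along these lines might look like the sketch below. The function names `count_words` and `write_frequencies`, the output format (one `word count` pair per line) and the descending sort are assumptions for illustration, not necessarily what the repository does.

```python
def count_words(words):
    """Map each word to the number of times it occurs in the list."""
    frequencies = {}
    for word in words:
        # Start at 0 for a word seen for the first time, then add one.
        frequencies[word] = frequencies.get(word, 0) + 1
    return frequencies


def write_frequencies(frequencies, destination_path):
    """Write 'word count' lines to a text file, most frequent word first."""
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    with open(destination_path, "w", encoding="utf-8") as destination:
        for word, count in ranked:
            destination.write(f"{word} {count}\n")
    return ranked
```

The standard library's `collections.Counter` does the same counting, and its `most_common()` method returns the words already ranked, so it is a reasonable alternative to the hand-written loop.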

The code can be found on GitHub:

[https://github.com/Awawdi/textProcessing]
