Process transcript with Python

Orsan Awawdi

Software Engineer | Extensive experience in automation using Python and Jenkins

Published: February 5, 2020

Let's find the most frequent word in a Donald Trump speech.

The transcript of the speech can be found by googling "deal of the century transcript". LinkedIn does not like hyperlinks pasted into an article, which is why I am not attaching a link here.

To process the text, I first want to decide what text I DO NOT need. So, we work according to the following rules:

# 1) only alphabetical tokens

# 2) normalized to lowercase

# 3) clean of punctuation and quotes

# 4) clean of undesirable words (like names, stop words, etc...)

For this, I created a #Python method that takes two arguments: the transcript file and the destination file to write to. The method loops over the content of the input file, splits it on spaces, keeps only alphabetical words, removes punctuation according to a mapping table, normalizes the words to lowercase, removes any quotes, selects only the desirable words, and finally appends each word to a list.
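The cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function name `clean_tokens`, the `UNWANTED` set, and the `PUNCT_TABLE` mapping are all assumptions made for the example, and the real list of undesirable words is not shown in this article. The sketch strips punctuation before the alphabetical check so that a token such as "peace," is kept rather than discarded.

```python
import string

# Hypothetical set of undesirable words, for illustration only.
UNWANTED = {"the", "and", "a", "to", "of"}

# Mapping table that deletes ASCII punctuation and curly quotes.
PUNCT_TABLE = str.maketrans("", "", string.punctuation + "\u201c\u201d\u2018\u2019")


def clean_tokens(text, unwanted=UNWANTED):
    """Return the list of cleaned words found in text."""
    words = []
    for token in text.split():
        # Remove punctuation and quotes, then normalize to lowercase.
        token = token.translate(PUNCT_TABLE).lower()
        # Keep only alphabetical tokens that are not in the unwanted set.
        if token.isalpha() and token not in unwanted:
            words.append(token)
    return words
```

To run it on a transcript, the file content could be read with `open(path, encoding="utf-8").read()` and passed to `clean_tokens`.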

Now, let's count the frequency of each word. How common is each word?

For this, we use a dictionary that holds each word as a key and the frequency of that word as its value. Finally, we write the result to a text file or print it. Nice!
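A minimal sketch of the counting step, assuming the cleaned words are already in a list. The names `count_words` and `write_counts`, the `word: count` output format, and the most-frequent-first ordering are choices made for this example and may differ from the code in the repository.

```python
def count_words(words):
    """Build a dictionary mapping each word to how often it appears."""
    freq = {}
    for word in words:
        freq[word] = freq.get(word, 0) + 1
    return freq


def write_counts(freq, dest_path):
    """Write one 'word: count' line per word, most frequent first."""
    ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    with open(dest_path, "w", encoding="utf-8") as out:
        for word, count in ranked:
            out.write(f"{word}: {count}\n")
```

The standard library's `collections.Counter` does the same counting and offers a `most_common()` method, so it is a reasonable alternative to the hand-written dictionary.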

The code can be found on GitHub:

[https://github.com/Awawdi/textProcessing]
