Learn Data Analysis with Python


100-Days Data Analysis challenge with Python
The image was created using AI.

Hey everyone,

It’s been a while since my last newsletter, and I want to thank all 192 of you for sticking around! I’ve decided to resume this newsletter with a fresh goal — completing a 100-day Data Analysis with Python challenge. The aim is to gain proficiency in solving Python data analysis problems and build a strong project portfolio by the end of it.

This is for anyone who is starting out for the first time. A few details are left out intentionally so that you can do some homework!

Let me take you along for this journey, starting from Day 1!

Starting 100-days of learning Data Analysis with Python challenge

Day 1: Basic Python Practice

Tasks Completed:

  • Started from total basics like installing Python and then checking the version.

import sys
print(sys.version)        

  • Wrote a Python function to check if a number is odd or even.
  • Enhanced it to ask users if they wanted to continue testing other numbers.
  • Handled edge cases like invalid inputs.
  • Used ternary operators (single-line conditional expressions) to simplify the code later.
  • Tried different approaches to writing the same code.

Here’s a snippet of the code I worked on:

def odd_even(number):
    if number % 2 == 0:
        print('Number is even')
    else:
        print('Number is odd')

    # Using a ternary operator (single-line conditional expression),
    # the if/else above can be written in a single line:
    # print('Number is even' if number % 2 == 0 else 'Number is odd')

while True:
    inp_num = int(input("Enter a number: "))  # input() returns a string, so it is converted to int
    odd_even(inp_num)

    continue_testing = input("Do you want to test another number? (yes/no): ").lower()

    if continue_testing not in ['yes', 'no']:
        print('Please enter a valid input with yes or no only')
    elif continue_testing != 'yes':
        print("Exiting the program")
        break


# Alternative loop: use strip() to handle accidental spaces.
# while True:
#     inp_num = int(input("Enter a number: "))  # Input converted to integer
#     odd_even(inp_num)
#
#     continue_testing = input("Test another number? (yes/no): ").strip().lower()
#
#     if continue_testing == 'no':
#         print("Exiting the program.")
#         break
#     elif continue_testing != 'yes':
#         print("Invalid input. Please enter 'yes' or 'no'.")

The second version of the while loop (commented out above) was suggested by ChatGPT. Let me know what you think of it.
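
Both loops still stop with a ValueError if someone types something like "abc" at the number prompt, because int() cannot convert it. Below is a minimal sketch of one way to guard against that. The helper names parse_int and describe_parity are made up for illustration, and the inputs are hard-coded so the snippet runs without prompting.

```python
def parse_int(text):
    """Return text converted to int, or None if it is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def describe_parity(number):
    # Ternary operator: a single-line conditional expression
    return 'Number is even' if number % 2 == 0 else 'Number is odd'


# Hard-coded sample inputs stand in for what a user might type
for raw in ['4', ' 7 ', 'abc', '-2']:
    number = parse_int(raw)
    if number is None:
        print(f'{raw!r} is not a valid whole number')
    else:
        print(describe_parity(number))
```

In the interactive loop, you could call parse_int(input(...)) and ask again whenever it returns None.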

2nd Task:

Square of numbers

for i in range(1, 11):
    print(f"The square of {i} is {i**2}")

An f-string (formatted string literal) is a way to embed expressions inside string literals by prefixing the string with f or F. You can include variables or expressions inside curly braces {} within the string, and Python will evaluate and replace them with their values.
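
To make that concrete, here is a small sketch of what can go inside the curly braces. The variable names and values are made up for illustration.

```python
name = 'Python'
price = 3.14159

print(f'Hello, {name}!')                          # a plain variable
print(f'{name.upper()} has {len(name)} letters')  # method and function calls
print(f'Rounded: {price:.2f}')                    # a format specifier after the colon

# f-strings work anywhere a string does, including list comprehensions
squares = [f'{i} squared is {i**2}' for i in range(1, 4)]
print(squares)
```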


Day 2: Basic File Read/Write Operations

!touch sample.txt        

Create a sample file named sample.txt

Add some sample content in the file.

with open('sample.txt', "r") as opened_file:    # open the file in read ('r') mode

    # print(opened_file.read())      # read the entire contents at once
    # print(opened_file.readline())  # read a single line

    # Using a simple `for` loop.
    for x in opened_file:    # loop through the file line by line
        print(x, end='')     # each line already ends with a newline

Once you're comfortable with this, head to W3Schools and go through the basic file read/write operations.
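
If you want a starting point for the write side, here is a minimal sketch covering the write ('w'), append ('a'), and read ('r') modes. It works in a temporary folder, which is an assumption made here so that the snippet does not touch any of your existing files.

```python
import os
import tempfile

# Work in a temporary folder so no existing file is overwritten
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'sample.txt')

# 'w' creates the file (or overwrites it) and writes text to it
with open(path, 'w') as f:
    f.write('First line\n')
    f.write('Second line\n')

# 'a' appends to the end of the file instead of overwriting
with open(path, 'a') as f:
    f.write('Third line\n')

# 'r' reads the file; strip the trailing newline from each line
with open(path, 'r') as f:
    lines = [line.rstrip('\n') for line in f]

print(lines)
```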


Day 3: Data structures and libraries

  • Create data structures like list, dictionary, series etc.

fruits = ['Apple', 'Orange', 'Kiwi', 'Banana', 'Avocado']     #list

print(fruits)

fruit_dict = {                        #dictionary
    'Apple': "Red",
    'Orange' : 'Saffron',
    'Kiwi' : 'Green',
    'Banana' : 'Yellow',
    'Avocado' : 'Dark Green'
}

print(fruit_dict)        

  • Create an array using numpy with 5 random numbers

To install numpy use pip or conda.

pip install numpy

import numpy as np

random_numbers = np.random.rand(5)  # create an array of 5 random numbers between 0 and 1

print(f"Random numbers: {random_numbers}")
print(f"Mean of numbers is: {np.mean(random_numbers)}")
print(f"Std. deviation of numbers is: {np.std(random_numbers)}")

The mean is the average of the numbers. The standard deviation measures how dispersed the data is around the mean.
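
To see what those two numbers actually measure, here is a small sketch that computes them by hand in plain Python. The sample values are made up. It divides by the number of values, which is also what np.std does by default.

```python
import math

numbers = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data

mean = sum(numbers) / len(numbers)

# Variance: the average of the squared distances from the mean
variance = sum((x - mean) ** 2 for x in numbers) / len(numbers)

# Standard deviation: the square root of the variance
std_dev = math.sqrt(variance)

print(f'Mean: {mean}')
print(f'Std. deviation: {std_dev}')
```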

  • Create a dictionary of people with attributes like name, age, and occupation.
  • Loop through the dictionary and print details of a specific person based on a condition (e.g., print the details of the person whose age is above 30).

sample_data = {
    'Name': ['Deepak', 'Onika', 'Varnika'],
    'Age' : [30, 32, 5],
    'Occupation': ['Content Creator', 'Senior Manager | ITSM', 'Pre-nursery student']
}

for name, age, occupation in zip(sample_data['Name'], sample_data['Age'], sample_data['Occupation']):
    print(name, age, occupation)

zip() is the idiomatic way to iterate over several lists together, as it clearly communicates the intent of pairing corresponding elements.

If you want to print just one person's details, say Onika's, you can use the code below:

print(f"Name: {sample_data['Name'][1]}, Age: {sample_data['Age'][1]}, Occupation: {sample_data['Occupation'][1]}")        

Here, we're referring to the 2nd element of each list, which is at index 1 because list indexing starts from 0.

Once you're comfortable with the above tasks, go ahead with the Day 4 tasks.
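
The task list for today also asks for printing a person's details based on a condition. A minimal sketch of that, reusing the same sample_data and adding an if inside the zip() loop, could look like this (the over_30 list is an extra added here to collect the matches):

```python
sample_data = {
    'Name': ['Deepak', 'Onika', 'Varnika'],
    'Age': [30, 32, 5],
    'Occupation': ['Content Creator', 'Senior Manager | ITSM', 'Pre-nursery student'],
}

over_30 = []
for name, age, occupation in zip(sample_data['Name'], sample_data['Age'], sample_data['Occupation']):
    if age > 30:  # strictly above 30, so an age of exactly 30 does not match
        over_30.append(name)
        print(f'Name: {name}, Age: {age}, Occupation: {occupation}')
```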


This is your homework. We will discuss it next week.

Day 4 Tasks

Handling Missing Data:

  • Load a CSV file into a Pandas DataFrame.
  • Check for missing values using df.isnull().sum().
  • Handle missing data by either filling with mean/median or dropping rows with missing values.

Sorting Data:

  • Sort the DataFrame by a single column (e.g., age or name).
  • Sort by multiple columns (e.g., name in descending order and age in ascending order).

Basic Mathematical Operations with Pandas:

  • Perform operations like adding a new column to the DataFrame that contains the square root of an existing column (e.g., age).
  • Convert the result of the mathematical operation to an integer.

Data Filtering and Subsetting:

  • Filter the DataFrame based on a condition (e.g., rows where age > 30).
  • Extract specific columns from the DataFrame for analysis.

Don't forget to install pandas before executing these steps.

So, if you have completed these, here's a screenshot for you covering some of them.


Day-4 Tasks


Day-4 tasks continued..
Sorting and Filtering
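
In case the screenshots don't come through, here is a minimal sketch of the Day 4 tasks in one place. The CSV content, column names, and values are made up for illustration, and an in-memory string stands in for a real file.

```python
import io

import numpy as np
import pandas as pd

# A tiny in-memory CSV stands in for a real file; the data is made up
csv_text = """name,age,city
Asha,25,Pune
Ravi,,Delhi
Meera,36,Pune
Kabir,49,Mumbai
"""
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column
print(df.isnull().sum())

# Fill the missing age with the column mean (dropna() would drop the row instead)
df['age'] = df['age'].fillna(df['age'].mean())

# Sort by one column, then by several columns with mixed directions
by_age = df.sort_values('age')
by_city_then_age = df.sort_values(['city', 'age'], ascending=[False, True])

# New column: square root of age, truncated to an integer
df['age_sqrt'] = np.sqrt(df['age']).astype(int)

# Filter rows on a condition and keep only some columns
over_30 = df.loc[df['age'] > 30, ['name', 'age']]
print(over_30)
```

Whether to fill or drop missing values depends on the dataset. Filling with the mean keeps the row but invents a value, so it is worth checking how many rows are affected first.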

Day 5: Data Visualization (Basic)

Tasks

Data Visualization (Basic):

  1. Create simple plots (like histograms or bar charts) using Matplotlib or Seaborn to visualize distributions of numerical columns.
  2. Understand how to plot categorical data.

Group By Operations:

  1. Use groupby() to aggregate data based on one or more columns (e.g., average age by occupation).
  2. Explore different aggregation functions (mean, sum, count).

Data Merging:

  1. Merge two DataFrames using merge() based on a common column.
  2. Explore different types of joins (inner, outer, left, right).

Applying Functions:

  1. Use the apply() method to apply a custom function to a DataFrame column.
  2. Understand the difference between apply(), map(), and applymap().


Here are a few code snippets from these tasks. I've decided not to share all of them, since you won't practice on your own if I do. But I'm happy to answer any questions you have.

For now, let's look at a few code examples.

groupby() function
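
A minimal sketch of a groupby(), assuming a made-up DataFrame with occupation, age, and salary columns:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    'occupation': ['Engineer', 'Teacher', 'Engineer', 'Teacher', 'Doctor'],
    'age': [30, 40, 34, 44, 50],
    'salary': [70, 50, 90, 54, 120],
})

# Average age per occupation
mean_age = df.groupby('occupation')['age'].mean()
print(mean_age)

# Number of rows per occupation
counts = df.groupby('occupation')['age'].count()
print(counts)
```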


merge two DataFrames
apply(), map() functions
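
Here is a rough sketch of merging and of apply() versus map(). The two tables, the emp_id column, and the salary_band function are all made up for illustration.

```python
import pandas as pd

# Made-up tables that share the 'emp_id' column
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Asha', 'Ravi', 'Meera']})
salaries = pd.DataFrame({'emp_id': [2, 3, 4], 'salary': [50, 90, 70]})

inner = pd.merge(employees, salaries, on='emp_id', how='inner')  # ids present in both tables
outer = pd.merge(employees, salaries, on='emp_id', how='outer')  # ids present in either table
left = pd.merge(employees, salaries, on='emp_id', how='left')    # every row of employees


# apply() runs a custom function on each value of a column
def salary_band(salary):
    return 'high' if salary >= 70 else 'low'


inner['band'] = inner['salary'].apply(salary_band)

# map() substitutes values, here through a dictionary lookup
inner['band_code'] = inner['band'].map({'high': 'H', 'low': 'L'})
print(inner)
```

applymap() does the same kind of element-wise work across a whole DataFrame. Note that pandas 2.1 and later deprecate applymap() in favour of DataFrame.map(), so which one you use depends on your installed version.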


groupby() and their aggregation using agg() with a Transpose
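
As a sketch of that last combination, assuming a made-up DataFrame, agg() takes a list of aggregation names and .T flips the resulting table:

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    'occupation': ['Engineer', 'Teacher', 'Engineer', 'Teacher', 'Doctor'],
    'age': [30, 40, 34, 44, 50],
})

# One row per occupation, one column per aggregation
summary = df.groupby('occupation')['age'].agg(['mean', 'sum', 'count'])
print(summary)

# .T (transpose) swaps rows and columns: one row per aggregation
transposed = summary.T
print(transposed)
```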


Day 6 Tasks:

  • String Manipulation in Pandas:

  1. Extract substrings from a column.
  2. Find the length of the strings in a text column.
  3. Check if a column's strings start or end with a certain character.

  • Applying Functions in Pandas:

  1. Use apply() with custom functions on DataFrame columns.
  2. Use map() to substitute values in a column.
  3. Use applymap() for element-wise operations on the entire DataFrame.

  • Handling Missing Data:

  1. Detect missing values in a DataFrame using isnull() and notnull().
  2. Drop rows/columns with missing values using dropna().
  3. Fill missing values using fillna().

  • Renaming and Replacing:

  1. Rename columns using rename().
  2. Replace specific values in a DataFrame with replace().
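
If you get stuck on Day 6, here is a minimal sketch that touches most of the tasks. The DataFrame, its column names, and the replacement values are made up for illustration.

```python
import pandas as pd

# Made-up data with one missing department
df = pd.DataFrame({
    'Full Name': ['Asha Rao', 'Ravi Kumar', 'Meera Shah'],
    'dept': ['HR', 'IT', None],
})

# String manipulation through the .str accessor
df['first3'] = df['Full Name'].str[:3]                     # substring
df['name_len'] = df['Full Name'].str.len()                 # length of each string
df['starts_with_R'] = df['Full Name'].str.startswith('R')  # prefix check

# Detect and fill missing values
missing_before = df['dept'].isnull().sum()
df['dept'] = df['dept'].fillna('Unknown')

# Rename a column and replace specific values
df = df.rename(columns={'Full Name': 'full_name'})
df['dept'] = df['dept'].replace({'HR': 'Human Resources'})
print(df)
```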

Day 7 Tasks:

  • String Operations in DataFrames:

  1. Extract specific text patterns using .str.extract() on text data columns.
  2. Split strings in columns using .str.split().
  3. Convert all text to lowercase and remove special characters from a column.

  • Handling Dates:

  1. Parse a date column into datetime format using pd.to_datetime().
  2. Extract the day, month, year, and weekday from a datetime column.
  3. Find the time difference between two date columns.

  • Working with Time Series:

  1. Resample time-series data into different frequencies (daily, monthly, etc.).
  2. Plot a time-series trend using matplotlib or seaborn.

  • Aggregations & Grouping:

  1. Group your data by a categorical column and calculate the sum, mean, and count.
  2. Use .agg() to apply multiple aggregation functions on numerical columns.
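
For Day 7, a rough sketch of the string and date tasks might look like the following. The order data is made up, and the regular expressions are just one possible choice for this particular text format.

```python
import pandas as pd

# Made-up order data for illustration
df = pd.DataFrame({
    'order': ['ID-101 Shipped!', 'ID-202 Pending?', 'ID-303 Shipped!'],
    'ordered_on': ['2024-01-05', '2024-01-20', '2024-02-10'],
    'delivered_on': ['2024-01-08', '2024-01-27', '2024-02-11'],
    'amount': [100, 250, 75],
})

# Pull the digits after "ID-" out with a regular expression capture group
df['order_id'] = df['order'].str.extract(r'ID-(\d+)', expand=False)

# Split on whitespace, keep the second piece, lowercase it,
# and remove everything that is not a letter
df['status'] = (
    df['order'].str.split().str[1]
    .str.lower()
    .str.replace(r'[^a-z]', '', regex=True)
)

# Parse the text dates, then extract parts and compute a difference
df['ordered_on'] = pd.to_datetime(df['ordered_on'])
df['delivered_on'] = pd.to_datetime(df['delivered_on'])
df['month'] = df['ordered_on'].dt.month
df['weekday'] = df['ordered_on'].dt.day_name()
df['days_to_deliver'] = (df['delivered_on'] - df['ordered_on']).dt.days

# Resample to monthly totals ('MS' means month start)
monthly = df.set_index('ordered_on')['amount'].resample('MS').sum()
print(monthly)
```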

Day 8 Tasks:

  • Datetime Manipulation:

  1. Extract year, month, day, and time from a datetime column.
  2. Perform operations like adding or subtracting days from a date.
  3. Calculate the difference between two dates and express it in days or hours.

  • Handling Time Series Data:

  1. Convert a column to a datetime format using pd.to_datetime().
  2. Resample the data by day, month, or year.
  3. Use rolling windows and shift() to calculate moving averages or sums.

  • Working with Large Datasets:

  1. Load large datasets in chunks using the chunksize parameter.
  2. Explore the memory_usage() function to understand memory consumption.
  3. Filter rows and columns efficiently on a large dataset.

  • Categorical Data Manipulation:

  1. Convert columns to categorical type to save memory.
  2. Group data by categorical columns and calculate aggregates (sum, mean).
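
And for Day 8, here is a minimal sketch under the same caveats: the sales data is generated on the spot, and a small in-memory CSV stands in for a genuinely large file.

```python
import io

import pandas as pd

# Made-up daily sales for 10 days, alternating between stores A and B
csv_text = "date,store,sales\n" + "\n".join(
    f"2024-01-{day:02d},{'A' if day % 2 else 'B'},{day * 10}" for day in range(1, 11)
)

# Read in chunks of 4 rows instead of loading everything at once
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=4)
df = pd.concat(chunks, ignore_index=True)

df['date'] = pd.to_datetime(df['date'])
df['next_week'] = df['date'] + pd.Timedelta(days=7)  # add days to a date

# 3-day moving average, and the previous day's sales via shift()
df['rolling_3'] = df['sales'].rolling(window=3).mean()
df['prev_day'] = df['sales'].shift(1)

# Compare memory use before and after converting to a categorical column;
# any saving depends on how often the labels repeat
bytes_before = df['store'].memory_usage(deep=True)
df['store'] = df['store'].astype('category')
bytes_after = df['store'].memory_usage(deep=True)
print(bytes_before, bytes_after)

# Aggregate by the categorical column
totals = df.groupby('store', observed=True)['sales'].sum()
print(totals)
```

With real large files you would usually process each chunk inside a loop (filter or aggregate it) rather than concatenating all chunks back together, since concatenating brings the whole dataset into memory again.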

That's the homework for you all.

Let's keep learning and if you've got any suggestions for me to work on, let me know in the comments. :)
