Learning Data Analytics - Sprint 1

Learning Data Analytics - Sprint 1

Those who know me well know I have this hunger to learn new things and to explore. Sometimes, it is out of FOMO :), but most of the time, it is out of interest and to align with certain personal goals.

You know how they say: if you want it to stick, build projects, right? Yeah, sometime in mid-2023, I started a pet project while learning mobile development (no code / low code) in Flutterflow. The goal was to make a substantial contribution to design and tech and to empower aspiring designers to efficiently learn and access the most relevant design materials focusing on what matters.

During this project, I started handling data and got exposed to data scrapping, SQL / relational databases, and schemas through supabase. It became a struggle when the project required that I maintain, manipulate and join related tables. I am a design guy (I think in pictures :), which means a lot of abstract and not exciting work. I needed to do something about this gap, and very fast. Fast forward to February 2025, I enrolled in a data analytics boot camp to help me understand these concepts. So far, it’s been an interesting journey with Strapp Technologies.

I don’t post that much, but to follow through and for grading purposes, I was requested to share my learnings here. In the next couple of weeks, I will share more content like this. Here we go…


SUMMARY

Sprint 1 (week 1):

Simply put, data analysis is a way of analysing past events to guide future decisions.

From a general standpoint it involves ;


1.?Primary(original) and secondary data gathering/collection

e.g., data gathered personally by distributing forms (surveys) and data sourced from an existing database or sources.


2.?Data processing/cleaning / Preparation

Data often come in their raw forms, so they need to be cleaned up and made usable for analysis.


3. Data visualisation

Investigating the distribution of the data sets, requires tools such as Excel, PowerBI, Tableau etc. Remember those bars and pie charts, yes, that’s what these tools help with and even more. As a visual person, I am looking forward to this part of the training. I will find this part very interesting (emoji).


4. Data Interpretation

here you draw insights from the data to make informed decisions. Your knowledge of mathematics and statistics will come in handy here.


5. Lastly, reports

the type of report or proposition depends on the stakeholder interested in the report (simple or complex).


We were also introduced to Excel (I am using the WPS Office — Spreadsheet, it’s free and has all the features you can find in Excel) with a clear understanding of what a “Workbook” is and the different elements that make up its interface. Elements such as; title bar, menu bar, address bar, formula bar, status bar, sheets, columns, rows, cells, labels, data sources and how to manipulate these data using statements and Excel formulas like;


EXCEL FORMULA

1. =MAX(range) & =MIN(range)

— meaning, within this column/range of values, find the maximum number and display it in this cell.


2. =IF(range of cells=” LinkedIn”,” Social media”,” Not found”)

— meaning, within this column/range of values, find the word/value “LinkedIn” and label it as “Social media” in a new column, while you label/display other values as “Not found”.


3. Nested IF

— Multiple IF statements or nested conditions and has a “FALSE” outcome.

e.g.

=IF (F2:F10 = “Salesman”, ”Sales”, IF(F2:F10 = “Accountant”, “Finance”, ”Not Relevant”)))

Here we are trying to take other roles within the column (Salesman, HR, and Accountant) into consideration and categorise them as Sales, Admin and Finance. Every other role would then be categorised as “Not Relevant”.


4. =IFS

— Takes multiple conditions and doesn’t have a false outcome. Instead, it will display #N/A (Not Available).

e.g.

=IF (F2:F10 = “Salesman”, ”Sales”, F2:F10 = “Accountant”, “Finance”)


5. =LEN

— means “Length”. Returns or calculates the number of characters in a “text” string. It is used for texts and not numerical values. If a cell in a column has the name “Receptionist” or a range of string values, you can display the number of characters in the name by using;

=LEN(F2:F10) — a use case is that it can be used to see the difference between 100s and thousands. To find bad Social Security numbers if they’re 10 digits instead of 9.


6. =PROPER

— This will clean up your text values making every text appear as title-case and consistent.


7. =LEFT(B2:B10,3) or =RIGHT(B2:B10,3)

i.e. from a range of values “B2 to B10”, display the first 3 characters from the “left” or “right”. Here, if for example one of the values is “Emmanuel”, this formula will call “Emm” as the first 3 characters or “uel” as the last 3 characters.


8. Date to Text

— as the name implies, helps with formatting date values into text values with the flexibility to rearrange how you want your dates to appear; day first, month first or year first. written as =TEXT(range of values, “dd/mm/yy”)


9. Trim

Removes the spaces within any given range of values. =TRIM (Range of values).


10. Substitute

This is used when you want to replace any given value. =SUBSTITUTE(select the cell that holds the value, “indicate the value to be replaced” , “Indicate what you want to substitute this value with”, “then state what instance you want it to happen”).

=SUBSTITUTE(L2, “/” , “-”, 2) Here you have indicated that it should take place in the second instance where the value “/” appears.

=SUBSTITUTE(L2, “/” , “-”) Here you have indicated no instance, hence this will apply to all instances where the value “/” appears.

The functions of the formula used in Excel are also “Case sensitive”.


11. SUM, SUMIF & SUMIFS =SUM(Range of values)

— This sums up the values within the column or range of values within a column.

=SUMIF(Select the range of values containing the condition to be applied, “identify the condition”, Select the range of values or column that contains the corresponding values to be used, to sum up the condition defined).

=SUMIF is used to solve a single condition.

For example =SUMIF(E2:E10,”Female”,G2:G10) — Sum of salaries of female employees

=SUMIFS(Select the range of values or column that contains the corresponding values to be used, to sum up the condition defined first, Select the range of values containing the condition to be applied, “identify the condition”, repeat).

=SUMIFS is used to solve multiple conditions. For example =SUMIFS(G2:G10,E2:E10,”Male”,D2:D10,”>30") — Sum of salaries of Male employees that are above the age of 30.


12. COUNT, COUNTIF, & COUNTIFS =COUNT(Select a range of values in a column)

— this will count the number of values within the range selected.


=COUNTIF(Select a range, “followed by the criteria”)

— This will count the number of values that meet the criteria specified in the formula (Single condition).


=COUNTIFS(Select the first range of values, “followed by the first criteria”, Select the second range of values, “followed by the second criteria”)

e.g. Find the number of female employees (first range is the column containing the sex of employees, first criterion is “female”, second range is the column containing the salaries of employees, second criteria is “<50000”) earning below 50,000 (multiple conditions).


13. CONCATENATE It means to join values together from different columns. =CONCATENATE(First value, “space”, another value you want to join to the first value)


14. DAYS & NETWORKDAYS

This is used to calculate the number of days between two dates; an End Date and a Start Date. =DAYS(End date value, Start date value)


NETWORKDAYS

— used when you want to know the actual working days that exist within a period. =NETWORKDAYS(Start date, End date, Number of holidays/off days) e.g. =NETWORKDAYS(H2,I2,20) using the table above.


Lastly, I learnt how to prepare and clean up a data set to make it suitable for use. The very step here is to ensure you make a copy of your data set before you start.


/End


要查看或添加评论,请登录

社区洞察

其他会员也浏览了