SQL Day 31/50
Vinay Reddy Vangala
Data Engineer @ BMO | MS in Data Science & Analytics @ UConn | Business, Systems, and Data Analytics Professional | SQL | Python | Tableau | AWS | Data Engineering | ML/AI | Ex Infosys
Q1: Query the list of CITY names from STATION which have vowels (i.e., a, e, i, o, and u) as both their first and last characters. Your result cannot contain duplicates.
Approach: Usage of substring is key concept over here, extracting the first and last element of the city and to avoid any further issues, lower the city name and then extract the first and last character using substring positioning, and use in function to match if the first and last letter matches the vowels.
Solution:
O/P:
Q2: Find the quarterback who threw the longest throw in 2016. Output the quarterback name along with their corresponding longest throw.
The 'lg' column contains the longest completion by the quarterback.
Approach: This query selects the quarterback (qb) and extracts the numeric part from the lg column using REGEXP_REPLACE(). It filters results to only include records from the year 2016. The subquery identifies the record with the maximum lg value for that year. Finally, it returns the quarterback and the extracted numeric value (aliased as throw) for the record with the maximum lg.
Solution:
O/P:
Q3: Find the top 3 medal-winning teams by counting the total number of medals for each event in the Rio De Janeiro 2016 olympics. In case there is a tie, order the countries by name in ascending order. Output the event name along with the top 3 teams as the 'gold team', 'silver team', and 'bronze team', with the team name and the total medals under each column in format "{team} with {number of medals} medals". Replace NULLs with "No Team" string.
Approach:
### Step-by-step explanation:
- Purpose: To calculate the number of medals each team won in each event in 2016 and rank them by medals.
- Select statement:
- event: The sporting event.
- team: The team participating in the event.
- count(medal) as total_medals: Counts the number of medals won by each team for a given event.
- Rank function:
- rank() over (partition by event order by count(medal) desc, team asc):
- This ranks the teams based on the total number of medals (`count(medal)`) for each event (`partition by event`).
- The ranking is ordered by count(medal) in descending order (the team with the most medals gets the highest rank).
- If two teams have the same number of medals, they are ordered alphabetically by team name (`team asc`).
- Where clause:
- where year = 2016: Filters results to only include data for the year 2016.
- Group by clause:
- group by event, team: Groups the results by event and team to aggregate the count of medals.
- Having clause:
- having count(medal) > 0: Ensures that only teams with at least one medal are included.
### 2. Final Query Using the CTE:
领英推荐
- Purpose: This query retrieves the gold, silver, and bronze medal-winning teams for each event from the CTE.
- Select columns:
- `event`: Selects the event name.
- `coalesce(max(case when ranking = 1 then concat(...)), "No Team") as gold_team`:
- `case when ranking = 1`: This checks if the team’s rank is 1 (i.e., gold medal).
- `concat(team, " with ", total_medals, " medals")`: Concatenates the team name with their total medals (e.g., "USA with 10 medals").
- `max()`: Since only one team can have a rank of 1 per event, max() will return that team.
- `coalesce(..., "No Team")`: If no team ranks 1, the result will be NULL. The coalesce() function replaces NULL with "No Team".
- Similar logic is applied for the silver team (ranking = 2) and the bronze team (ranking = 3).
- Group by clause:
- group by event: Groups the results by event to ensure only one row per event is returned, with the gold, silver, and bronze medal teams.
### Overall Approach:
1. In the CTE: Count the medals won by each team in each event in 2016 and assign rankings based on the medal count.
2. In the final query: Extract the gold, silver, and bronze teams for each event using MAX() and CASE statements, with fallback values (`"No Team"`) if no teams won medals for that position.
This approach efficiently ranks teams and extracts the top three teams (gold, silver, bronze) for each event.
Solution:
O/P:
Q4: You're given a dataset of health inspections. Count the number of violation in an inspection in 'Roxanne Cafe' for each year. If an inspection resulted in a violation, there will be a value in the 'violation_id' column. Output the number of violations by year in ascending order.
Link: https://platform.stratascratch.com/coding/9728-inspections-that-resulted-in-violations?code_type=3
Approach: extract the year from inspection_date and groups the results accordingly. The WHERE clause filters for only the cafe's inspections where violation_id is not null. The count(*) function provides the total inspections per year, and the results are ordered chronologically using ORDER BY 1.
Solution:
O/P:
Q5: Output share of US users that are active. Active users are the ones with an "open" status in the table.
Approach: SUM(CASE WHEN status = 'open' THEN 1 END): This counts how many users have their status as 'open' (i.e., active users).
Solution:
O/P:
GenAI Consultant | Keynote Speaker | Helping students & professionals use AI to scale, grow, and innovate.
6 个月Love your consistency man. Keep going! :)