From Excel to SQL and No Turnback: Breaking Up with Spreadsheets, One Query at a Time!
If you’re used to crunching numbers in Excel, transitioning to SQL might seem like a big leap. But once you see how similar tasks are performed in SQL, you'll realize it's not that difficult at all!
I recently simulated 10 years of oil production data, including workover events (costing between 3 and 6 million USD), and used SQL queries to calculate monthly cash flow from oil sales (at $70 per barrel). Here are some query examples to help you get started:
Aggregate Monthly Production & Revenue
This query groups daily production data into monthly buckets and calculates total oil production and monthly revenue.
SELECT
p.well_name,
DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1) AS MonthStart,
SUM(p.oil_production_stb) AS TotalOil,
SUM(p.oil_production_stb) * 70 AS MonthlyRevenue
FROM production AS p
GROUP BY
p.well_name,
DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1)
ORDER BY
p.well_name, MonthStart;
Explanation:
Calculate Cumulative Revenue Using a Common Table Expression (CTE)
CTEs allow you to create a temporary result set that you can reference within the main query, making complex queries easier to manage—similar to breaking down calculations across multiple columns in Excel.
WITH MonthlyRev AS (
SELECT
p.well_name,
DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1) AS MonthStart,
SUM(p.oil_production_stb) * 70 AS MonthlyRevenue
FROM production AS p
GROUP BY
p.well_name,
DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1)
)
SELECT
well_name,
MonthStart,
MonthlyRevenue,
SUM(MonthlyRevenue) OVER (PARTITION BY well_name ORDER BY MonthStart) AS CumulativeRevenue
FROM MonthlyRev
ORDER BY well_name, MonthStart;
Explanation:
Compute Cashflow After a Workover
This query joins the monthly revenue data with workover events and calculates the cumulative cash flow after a workover event (i.e., cumulative revenue minus the workover cost).
WITH MonthlyRev AS (
SELECT
p.well_name,
DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1) AS MonthStart,
SUM(p.oil_production_stb) * 70 AS MonthlyRevenue
FROM production AS p
GROUP BY
p.well_name,
DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1)
),
CumulativeRev AS (
SELECT
well_name,
MonthStart,
MonthlyRevenue,
SUM(MonthlyRevenue) OVER (PARTITION BY well_name ORDER BY MonthStart) AS CumulativeRevenue
FROM MonthlyRev
)
SELECT
c.well_name,
c.MonthStart,
c.CumulativeRevenue - w.workover_cost AS CashflowAfterWorkover
FROM CumulativeRev c
INNER JOIN workover w ON c.well_name = w.well_name
WHERE c.MonthStart >= w.workover_end_date
ORDER BY c.well_name, c.MonthStart;
Explanation:
Final Thoughts:
These examples show that whether you’re using Excel or SQL, the logic remains the same: aggregate your data, perform calculations, and then analyse the results. SQL just provides a more scalable and automated approach for handling large datasets. When you’re dealing with thousands of wells over decades of daily—or even 5-minute—data, Excel can slow to a crawl or simply crashes.
The best part? You can simply copy these queries and swap out the column names, table names, or aggregation functions to suit your data. Making the transition from Excel to SQL is a powerful step toward more efficient, robust data analysis. No turning back once you experience the speed and flexibility of SQL!
Let's connect if you'd like to chat more about these techniques or some more tips.
#OilAndGas #DataAnalysis #SQL #Excel #PetroleumEngineering #EngineeringTips
MSc, CPEng | Oil & Gas Res. Eng. Consultant | Founder of CrowdField, the Premiere Resource Hub for the Reservoir Engineering Crowd.
3 周Good stuff! Have you tried LLMs to generate code for these queries? If so, how would you rate them?