From Excel to SQL and No Turnback: Breaking Up with Spreadsheets, One Query at a Time!

From Excel to SQL and No Turnback: Breaking Up with Spreadsheets, One Query at a Time!

If you’re used to crunching numbers in Excel, transitioning to SQL might seem like a big leap. But once you see how similar tasks are performed in SQL, you'll realize it's not that difficult at all!

I recently simulated 10 years of oil production data, including workover events (costing between 3 and 6 million USD), and used SQL queries to calculate monthly cash flow from oil sales (at $70 per barrel). Here are some query examples to help you get started:

Aggregate Monthly Production & Revenue

This query groups daily production data into monthly buckets and calculates total oil production and monthly revenue.

SELECT 
    p.well_name, 
    DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1) AS MonthStart,
    SUM(p.oil_production_stb) AS TotalOil,
    SUM(p.oil_production_stb) * 70 AS MonthlyRevenue
FROM production AS p
GROUP BY 
    p.well_name, 
    DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1)
ORDER BY 
    p.well_name, MonthStart;        

Explanation:

  • SELECT: Specifies the columns to display. Equivalent in Excel: Choosing columns for your PivotTable.
  • FROM: Specifies the data table. AS is to give the data table an alias for convenience
  • DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1) AS MonthStart: Creates a date representing the first day of the month. Excel Equivalent: Grouping dates by month in a PivotTable or using the EOMONTH/DATE functions.
  • SUM(p.oil_production_stb): Aggregates oil production for each group. Excel Equivalent: Using SUM() in a PivotTable or a formula.
  • GROUP BY: Groups rows that share the same well name and month. Excel Equivalent: PivotTables automatically group data.
  • ORDER BY: Sorts the results. Excel Equivalent: Sorting rows in your Excel table.


Calculate Cumulative Revenue Using a Common Table Expression (CTE)

CTEs allow you to create a temporary result set that you can reference within the main query, making complex queries easier to manage—similar to breaking down calculations across multiple columns in Excel.

WITH MonthlyRev AS (
    SELECT 
        p.well_name, 
        DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1) AS MonthStart,
        SUM(p.oil_production_stb) * 70 AS MonthlyRevenue
    FROM production AS p
    GROUP BY 
        p.well_name, 
        DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1)
)
SELECT 
    well_name,
    MonthStart,
    MonthlyRevenue,
    SUM(MonthlyRevenue) OVER (PARTITION BY well_name ORDER BY MonthStart) AS CumulativeRevenue
FROM MonthlyRev
ORDER BY well_name, MonthStart;        

Explanation:

  • WITH MonthlyRev AS (...): This is a Common Table Expression (CTE). It creates a temporary result set named MonthlyRev that you can use in subsequent queries. Excel Equivalent: Creating a helper column or an intermediate table.
  • SUM(MonthlyRevenue) OVER (PARTITION BY well_name ORDER BY MonthStart): This window function calculates a running total (cumulative revenue) for each well. Excel Equivalent: A running total formula or cumulative sum in your spreadsheet.
  • PARTITION BY: Groups data by well (like separate PivotTables for each well).
  • ORDER BY in the OVER clause: Determines the order in which the running total is calculated.


Compute Cashflow After a Workover

This query joins the monthly revenue data with workover events and calculates the cumulative cash flow after a workover event (i.e., cumulative revenue minus the workover cost).

WITH MonthlyRev AS (
    SELECT 
        p.well_name, 
        DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1) AS MonthStart,
        SUM(p.oil_production_stb) * 70 AS MonthlyRevenue
    FROM production AS p
    GROUP BY 
        p.well_name, 
        DATEFROMPARTS(YEAR(p.date), MONTH(p.date), 1)
),
CumulativeRev AS (
    SELECT 
        well_name,
        MonthStart,
        MonthlyRevenue,
        SUM(MonthlyRevenue) OVER (PARTITION BY well_name ORDER BY MonthStart) AS CumulativeRevenue
    FROM MonthlyRev
)
SELECT 
    c.well_name,
    c.MonthStart,
    c.CumulativeRevenue - w.workover_cost AS CashflowAfterWorkover
FROM CumulativeRev c
INNER JOIN workover w ON c.well_name = w.well_name
WHERE c.MonthStart >= w.workover_end_date
ORDER BY c.well_name, c.MonthStart;        

Explanation:

  • INNER JOIN: Combines rows from the CumulativeRev CTE and the workover table where the well names match. Excel Equivalent: Using VLOOKUP or INDEX/MATCH to merge tables.
  • WHERE c.MonthStart >= w.workover_end_date: Filters the data to only include months after the workover is complete. Excel Equivalent: Filtering data based on date criteria.
  • c.CumulativeRevenue - w.workover_cost: Calculates the cash flow after subtracting the workover cost from the cumulative revenue. Excel Equivalent: A simple subtraction formula comparing cumulative revenue to a fixed cost.


Final Thoughts:

These examples show that whether you’re using Excel or SQL, the logic remains the same: aggregate your data, perform calculations, and then analyse the results. SQL just provides a more scalable and automated approach for handling large datasets. When you’re dealing with thousands of wells over decades of daily—or even 5-minute—data, Excel can slow to a crawl or simply crashes.

The best part? You can simply copy these queries and swap out the column names, table names, or aggregation functions to suit your data. Making the transition from Excel to SQL is a powerful step toward more efficient, robust data analysis. No turning back once you experience the speed and flexibility of SQL!

Let's connect if you'd like to chat more about these techniques or some more tips.

#OilAndGas #DataAnalysis #SQL #Excel #PetroleumEngineering #EngineeringTips

Alan Mourgues

MSc, CPEng | Oil & Gas Res. Eng. Consultant | Founder of CrowdField, the Premiere Resource Hub for the Reservoir Engineering Crowd.

3 周

Good stuff! Have you tried LLMs to generate code for these queries? If so, how would you rate them?

要查看或添加评论,请登录

Hai Hung Vu的更多文章