Unveiling Insights: A Comprehensive Exploratory Data Analysis (EDA) of Oil and Gas Well Production Data.

Exploratory Data Analysis (EDA) is one of the crucial methods data analysts use to investigate data sets and summarize their main characteristics, typically through data visualization and statistical graphics. The main objectives are:

  1. Data Familiarity - Getting familiar with the dataset's characteristics and behavior by understanding its contents and quality issues (missing values, outliers, or inconsistencies).
  2. Data Visualization - Presenting the data in plots, charts, or graphs to make it easier to identify patterns or trends that might not be apparent from the numbers alone.
  3. Data Relationship - Evaluating the relationships between variables and identifying correlations, associations, and dependencies.
  4. Hypothesis Generation - Formulating hypotheses that can then be tested for deeper insights and more accurate predictions.

In this article, we will demonstrate EDA using the publicly available oil and gas production dataset from the Volve field, released by Equinor. You may download the dataset from the link below:

Oil and Gas Production Dataset: https://www.equinor.com/energy/volve-data-sharing

By employing EDA techniques, we aim to analyze the dataset comprehensively and present a clear demonstration of the analysis process, with two goals: to identify the well types with significant production, and to look for well types with development potential after 2017.

1. Data Import

First, we import the dataset and the relevant libraries. This EDA uses pandas, matplotlib and seaborn:
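
The original code screenshot is not available, so the following is only a minimal sketch of the import step. It assumes the download is an Excel workbook; the file name and sheet in the commented usage line are placeholders, not the actual names, and `load_production` is a helper introduced here for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def load_production(path, sheet_name=0):
    """Read one sheet of the production workbook into a DataFrame."""
    # pandas needs an Excel engine such as openpyxl to read .xlsx files.
    return pd.read_excel(path, sheet_name=sheet_name)


# Hypothetical usage; replace the path and sheet with those of your download:
# df = load_production("volve_production.xlsx", sheet_name=0)
```

If the file is a CSV instead, `pd.read_csv(path)` would take the place of `pd.read_excel`.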

The dataset contains 23 attributes. Use the .head() method for a quick overview of the data.
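
A first overview might look like the sketch below. The small DataFrame is a made-up stand-in so the snippet runs on its own; its values are illustrative, not taken from the Volve file, and it has only 5 of the 23 attributes.

```python
import pandas as pd

# Made-up stand-in rows; in practice df is the frame loaded from the file.
df = pd.DataFrame({
    "DATEPRD": pd.to_datetime(["2014-04-07", "2014-04-08", "2014-04-09"]),
    "NPD_WELL_BORE_CODE": [7405, 7405, 5599],
    "BORE_OIL_VOL": [0.0, 0.0, 631.5],
    "BORE_GAS_VOL": [0.0, 0.0, 90439.1],
    "WELL_TYPE": ["OP", "OP", "OP"],
})

n_rows, n_cols = df.shape        # (number of rows, number of attributes)
print(f"{n_rows} rows, {n_cols} attributes")
print(df.columns.tolist())       # attribute names
print(df.head())                 # first five rows by default
```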

For this analysis, the relevant attributes listed below are extracted into a new DataFrame suited to our study:

  • DATEPRD: The date of production.
  • NPD_WELL_BORE_CODE: The unique well identifier code, used here to identify the type of well.
  • BORE_OIL_VOL: The volume of oil produced.
  • BORE_GAS_VOL: The volume of gas produced.
  • WELL_TYPE: The type of well (WI = injection well, OP = production well).

We specifically excluded the variables BORE_WATER_VOL and BORE_WI_VOL because the analysis focuses on producing wells only. These variables record water volume and water injection, so they are not relevant to the oil and gas output of the selected subset of actively producing wells; in particular, producing wells do not inject water.

By excluding the data related to water volume and water injection, we ensure that our analysis is focused solely on the variables directly associated with oil and gas production.
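
The column selection can be sketched as follows. The column names come from the list above; the stand-in rows are made up, and `df_prod` is a name chosen here for the reduced DataFrame.

```python
import pandas as pd

# Made-up stand-in with a few of the original attributes.
df = pd.DataFrame({
    "DATEPRD": pd.to_datetime(["2014-04-07", "2014-04-08"]),
    "NPD_WELL_BORE_CODE": [5599, 5351],
    "BORE_OIL_VOL": [631.5, 1249.9],
    "BORE_GAS_VOL": [90439.1, 174310.0],
    "BORE_WATER_VOL": [10.0, 20.0],
    "BORE_WI_VOL": [0.0, 0.0],
    "WELL_TYPE": ["OP", "OP"],
})

selected = [
    "DATEPRD",
    "NPD_WELL_BORE_CODE",
    "BORE_OIL_VOL",
    "BORE_GAS_VOL",
    "WELL_TYPE",
]

# .copy() makes df_prod independent of df, so later edits do not touch df.
df_prod = df[selected].copy()
print(df_prod.columns.tolist())
```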

Before we proceed to data cleaning, we can plot BORE_GAS_VOL against BORE_OIL_VOL to briefly review the data characteristics and distribution.
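
A plot of this kind could be produced roughly as below. The choice of a scatter plot and the stand-in values are assumptions made for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in values, not real production figures.
df_prod = pd.DataFrame({
    "BORE_OIL_VOL": [0.0, 100.0, 250.0, 400.0],
    "BORE_GAS_VOL": [0.0, 15000.0, 36000.0, 60000.0],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df_prod, x="BORE_OIL_VOL", y="BORE_GAS_VOL", ax=ax)
ax.set_title("Gas volume vs. oil volume")
fig.tight_layout()
```

In a script, call `plt.show()` afterwards to display the figure; a notebook usually renders it automatically.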

The plot gives an initial view of the relationship between produced oil and produced gas.

2. Data Cleaning and Pre-processing

First, the values in the WELL_TYPE attribute are checked and evaluated:

Since our focus is solely on production wells, we will eliminate the 6491 rows that pertain to injection wells, as they are unnecessary for our analysis. Excluding these rows streamlines the analysis to production wells and keeps the EDA targeted at the desired outcomes.
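
Checking the WELL_TYPE values and dropping the injector rows can be sketched as follows. The stand-in rows are made up, and `df_op` is a name chosen here for the production-only DataFrame.

```python
import pandas as pd

# Made-up stand-in: three production rows and one injection row.
df_prod = pd.DataFrame({
    "BORE_OIL_VOL": [631.5, 1249.9, 0.0, 210.0],
    "WELL_TYPE": ["OP", "OP", "WI", "OP"],
})

# How many rows belong to each well type?
type_counts = df_prod["WELL_TYPE"].value_counts()
print(type_counts)

# Keep only production wells (OP); reset the index after filtering.
df_op = df_prod[df_prod["WELL_TYPE"] == "OP"].reset_index(drop=True)
print(len(df_prod) - len(df_op), "injection rows removed")
```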

Now that all rows related to injection wells have been removed, we can proceed with further analysis, starting with missing values and outliers.

Checking missing values

First, we count the rows with missing values in each attribute.
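
A missing-value count can be sketched as below. The stand-in deliberately contains one gap so that the output is non-trivial; it does not reflect the actual dataset.

```python
import numpy as np
import pandas as pd

# Made-up stand-in with one deliberately missing oil value.
df_op = pd.DataFrame({
    "BORE_OIL_VOL": [631.5, np.nan, 0.0],
    "BORE_GAS_VOL": [90439.1, 1200.0, 0.0],
})

# Missing values per attribute.
missing_per_column = df_op.isnull().sum()
print(missing_per_column)

# Number of rows with at least one missing value.
rows_with_missing = int(df_op.isnull().any(axis=1).sum())
print(rows_with_missing, "row(s) with missing values")
```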

Based on the analysis, there are no missing values detected in any of the attributes. Therefore, no rows will be removed or modified, ensuring the data integrity remains intact.

Checking for outliers

Because BORE_OIL_VOL and BORE_GAS_VOL are numeric, they can be checked for outliers to improve accuracy. The outliers are identified using box plots and histograms.

The box plots indicate outliers in BORE_OIL_VOL and BORE_GAS_VOL beyond a value of approximately 460,000. It is advisable to retain these outliers, because the objective of this EDA is to determine the type of well with significant production, not to compare oil and gas volumes.

Based on the distribution plot, we can clearly see that a large proportion of the records show zero oil and gas production. These records will be removed so that the analysis focuses only on well types with a significant volume of production.

Determination of the well type

To evaluate the total number of well types and the number of production records in each well type, we perform a value count on NPD_WELL_BORE_CODE.
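
A value count per well code can be sketched as follows. The stand-in uses three of the codes mentioned in this article with made-up row counts.

```python
import pandas as pd

# Made-up stand-in: row counts per code are illustrative only.
df_op = pd.DataFrame({
    "NPD_WELL_BORE_CODE": [5599, 5599, 5599, 5351, 5351, 5769],
})

# Rows per well code, sorted from most to fewest.
code_counts = df_op["NPD_WELL_BORE_CODE"].value_counts()
print(code_counts)

# Number of distinct well codes.
n_codes = df_op["NPD_WELL_BORE_CODE"].nunique()
print(n_codes, "distinct well codes")
```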

Based on the analysis, a total of 6 well types were identified, with codes #5599, #5351, #7078, #7289, #7405, and #5769. Wells #5599 and #5351 hold the largest number of production records, whereas well #5769 holds the fewest.

Next, we observe the distribution within each well type using Empirical Cumulative Distribution Function (ECDF) plots from the seaborn library for both oil and gas volume. An ECDF plot aids direct comparison between multiple distributions by showing the proportion or count of observations falling below each unique value in a dataset.

From the plotted results, well #7405 has the largest share of records with zero oil and gas production, up to 40%. Wells #7289 and #5769 each have around 18-22% of records with no oil and gas. We expect the number of zero-production records for these 3 wells to drop sharply once all zero-production rows are removed.

With the removal of the rows where BORE_OIL_VOL is equal to 0, our dataset is now prepared for further analysis aimed at determining the best production strategies.
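
Dropping the zero-production rows can be sketched as follows, again on made-up stand-in rows; `df_nonzero` is a name chosen here.

```python
import pandas as pd

# Made-up stand-in with two zero-production rows.
df_op = pd.DataFrame({
    "NPD_WELL_BORE_CODE": [7405, 7405, 5599, 5351],
    "BORE_OIL_VOL": [0.0, 120.0, 631.5, 0.0],
    "BORE_GAS_VOL": [0.0, 18000.0, 90439.1, 0.0],
})

# Keep only rows where oil volume is not zero.
df_nonzero = df_op[df_op["BORE_OIL_VOL"] != 0].reset_index(drop=True)
print(len(df_op) - len(df_nonzero), "zero-production rows removed")
```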

3. Evaluation of the oil and gas production for each well

To evaluate well type performance, seaborn scatter plots are used to review the oil and gas production of each well type from 2008 to 2017.
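
A production-over-time scatter plot coloured by well code might be drawn as in this sketch. The stand-in dates and volumes are made up, and the `well` column is a helper introduced here for categorical colouring.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up stand-in rows for two well codes.
df_nonzero = pd.DataFrame({
    "DATEPRD": pd.to_datetime(
        ["2009-01-01", "2012-01-01", "2015-01-01", "2010-06-01", "2013-06-01"]
    ),
    "NPD_WELL_BORE_CODE": [5599, 5599, 5599, 5351, 5351],
    "BORE_OIL_VOL": [900.0, 700.0, 400.0, 1200.0, 800.0],
    "BORE_GAS_VOL": [1.3e5, 1.0e5, 6.0e4, 1.7e5, 1.1e5],
})
df_nonzero["well"] = df_nonzero["NPD_WELL_BORE_CODE"].astype(str)

# One panel per volume, sharing the date axis.
fig, axes = plt.subplots(2, 1, figsize=(10, 8), sharex=True)
for ax, col in zip(axes, ["BORE_OIL_VOL", "BORE_GAS_VOL"]):
    sns.scatterplot(data=df_nonzero, x="DATEPRD", y=col, hue="well", s=15, ax=ax)
    ax.set_title(f"{col} over time by well code")
fig.tight_layout()
```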

Referring to the scatter plot results, we can clearly see that wells #5599 and #5351 are the well types with the most significant oil and gas production over the years. For gas production, we can expect the same results as for oil production, given the relationship between BORE_OIL_VOL and BORE_GAS_VOL plotted previously.

4. Potential Significant Well for Oil and Gas Production after 2017

Upon analyzing the scatter plot, it is evident that well #7078 has emerged as the prominent well, exhibiting substantial growth in production volume since 2014. Furthermore, an intriguing trend is observed in well #5769, where there has been a noticeable increase in oil and gas production. To gain a deeper understanding, let's delve into the specifics of both wells.

Based on the observed trendline in the distributed data, it is possible to predict an increase in oil and gas production for well #5769 after 2017. The upward trajectory suggests a positive growth trend. However, when it comes to well #7078, it is challenging to conclusively determine its potential to become the most significant well (that is, the well accounting for the largest number of records with relatively high oil and gas production volume). A more comprehensive analysis is necessary, incorporating additional data beyond 2017, to gain a better insight into the future trajectory of well #7078 and its significance in terms of oil and gas production volume.
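
The article does not state how its trendline was fitted. As one possible approach, the sketch below fits a straight line to volume against elapsed days with `np.polyfit`. The function name `linear_trend` and the stand-in values are assumptions made for illustration; a positive slope would correspond to the upward trend described above.

```python
import numpy as np
import pandas as pd


def linear_trend(frame, value_col):
    """Least-squares slope (volume per day) and intercept over elapsed days."""
    elapsed = frame["DATEPRD"] - frame["DATEPRD"].min()
    days = elapsed.dt.days.to_numpy(dtype=float)
    values = frame[value_col].to_numpy(dtype=float)
    slope, intercept = np.polyfit(days, values, deg=1)
    return slope, intercept


# Made-up stand-in: volume rises by 10 units per day.
well = pd.DataFrame({
    "DATEPRD": pd.to_datetime(
        ["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-04"]
    ),
    "BORE_OIL_VOL": [100.0, 110.0, 120.0, 130.0],
})

slope, intercept = linear_trend(well, "BORE_OIL_VOL")
print(f"slope: {slope:.2f} per day, intercept: {intercept:.2f}")
```

A straight-line fit only summarises the direction of past data; extrapolating it beyond the observed dates is a rough guide rather than a production forecast.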

EDA Outcome Summary

1. Objective of the Analysis

  • To identify the types of production well with significant production, and to look for well types with development potential after 2017.


2. Dataset Overview

  • The dataset consists of 23 attributes related to oil and gas production.
  • Selected crucial attributes for analysis: DATEPRD, NPD_WELL_BORE_CODE, BORE_OIL_VOL, BORE_GAS_VOL, and WELL_TYPE.


3. Data Cleaning and Pre-processing

  • Injection wells were removed from the analysis to focus on production wells.
  • No missing values were found in the dataset.
  • Outliers beyond approximately 460,000 were retained, as the aim is to find well types with significant production.
  • Six well types were identified based on the NPD_WELL_BORE_CODE value count.
  • Wells #5599 and #5351 had the highest number of production records, while well #5769 had the fewest.
  • Analysis of the ECDF plots showed that well #7405 had the largest share of records with zero production.
  • Wells #7289 and #5769 also had a significant share of records with zero production.


4. Evaluation of the Significant Oil and Gas Production

  • Scatter plots revealed that wells #5599 and #5351 consistently exhibited the most significant oil and gas production from 2008 to 2017.
  • Well #7078 showed substantial growth in production volume since 2014.
  • Well #5769 exhibited an increasing trend in oil and gas production after 2016.


5. Predictions

  • Based on the trends observed, it is predicted that well #5769 may experience further increases in oil and gas production after 2017.
  • The potential of well #7078 to become the most significant well requires additional data beyond 2017 for a more accurate assessment.

Conclusion

In conclusion, the comprehensive exploratory data analysis (EDA) of the oil and gas well production data provided valuable insights into the field production dataset. By leveraging various EDA techniques, we gained a deeper understanding of the characteristics, trends, and potential of the wells.

By conducting this comprehensive EDA, crucial steps were taken towards understanding the underlying patterns and trends in the oil and gas well production data. It sets the stage for further analysis and serves as a foundation for data-driven decision-making in the oil and gas industry. The importance of EDA cannot be overstated, especially as we move into a more digitalised world full of AI and deep learning algorithms. EDA plays a vital role in uncovering valuable insights and patterns hidden within vast amounts of field data, enabling robust solutions informed by previous decisions and helping to optimize oil and gas production strategies in both the upstream and downstream sectors.
