Unveiling Insights: A Comprehensive Exploratory Data Analysis (EDA) of Oil and Gas Well Production Data.
Kenneth Chong Yih Haur
Instrument Engineer | Global Top 10 Student 2023 | National Energy Globe Award Winner
Exploratory Data Analysis (EDA) is a crucial method that data analysts use to investigate data sets and summarize their main characteristics, typically through data visualization and statistical graphics. The main objectives are:
In this article, we demonstrate EDA on the publicly available oil and gas production dataset from the Volve field, released by Equinor. You can download the dataset from the link below:
Oil and Gas Production Dataset: https://www.equinor.com/energy/volve-data-sharing
By applying EDA techniques, we aim to analyze the dataset comprehensively and present the analysis process clearly, in order to identify the type of production well with significant production and to look for well types with development potential from 2018 onward.
1. Data Import
First, the dataset and the relevant libraries are imported. This EDA uses pandas, matplotlib and seaborn:
The dataset contains 23 attributes. Use the .head() function for a quick overview of the data.
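As a minimal sketch of this first look, the snippet below builds a tiny stand-in table and inspects it. The values are made up for illustration, and the column names DATEPRD and NPD_WELL_BORE_CODE and the file name in the comment are assumptions about the download, not something stated in this article.

```python
import pandas as pd

# Toy stand-in for the production table; the values are made up for illustration.
# With the real download, a call such as
#   df = pd.read_excel("Volve production data.xlsx")
# would replace this block (the file name is an assumption).
df = pd.DataFrame({
    "DATEPRD": pd.to_datetime(["2014-01-01", "2014-01-02", "2014-01-03"]),
    "NPD_WELL_BORE_CODE": [5599, 5599, 5351],   # assumed name of the well code column
    "BORE_OIL_VOL": [1200.0, 0.0, 950.0],
    "BORE_GAS_VOL": [180000.0, 0.0, 140000.0],
    "WELL_TYPE": ["OP", "OP", "OP"],
})

print(df.shape)    # (number of rows, number of attributes)
print(df.head())   # first five rows (here: all three)
df.info()          # data type and non-null count per attribute
```

On the real file, df.shape is a quick way to confirm the attribute count quoted above.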
The relevant attributes shown below are then extracted into a new DataFrame suited to our study:
We specifically excluded the variables BORE_WATER_VOL and BORE_WI_VOL from our analysis because we focus on the producing wells only. These variables describe water volume and water injection, so they are not relevant to the subset of wells that are actively producing: a producing well has no water injection volume.
By excluding the data related to water volume and water injection, we ensure that our analysis is focused solely on the variables directly associated with oil and gas production.
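One way to sketch this column selection in pandas is shown below. The exact list of attributes kept in the original study is not reproduced here, so the keep list (including DATEPRD and NPD_WELL_BORE_CODE) is an assumption, and the rows are made-up toy values.

```python
import pandas as pd

# Toy rows with made-up values; only the column handling matters here.
raw = pd.DataFrame({
    "DATEPRD": pd.to_datetime(["2014-01-01", "2014-01-02"]),
    "NPD_WELL_BORE_CODE": [5599, 5351],      # assumed name of the well code column
    "BORE_OIL_VOL": [1200.0, 950.0],
    "BORE_GAS_VOL": [180000.0, 140000.0],
    "BORE_WATER_VOL": [300.0, 410.0],
    "BORE_WI_VOL": [0.0, 0.0],
    "WELL_TYPE": ["OP", "OP"],
})

# Hypothetical selection: keep the production-related attributes and
# leave out the water volume and water injection columns.
keep = ["DATEPRD", "NPD_WELL_BORE_CODE", "BORE_OIL_VOL", "BORE_GAS_VOL", "WELL_TYPE"]
df = raw[keep].copy()   # .copy() avoids modifying the raw table by accident

print(df.columns.tolist())
```

Selecting by an explicit keep list, rather than dropping two columns, makes the retained attributes visible in one place.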
Before we proceed to data cleaning, BORE_OIL_VOL can be plotted against BORE_GAS_VOL to briefly review the data characteristics and distribution.
The plot gives an initial view of the relationship between the produced oil and the produced gas.
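A quick oil-versus-gas check of this kind could look like the sketch below. The numbers are toy values chosen to be nearly proportional, so the printed correlation says nothing about the real field data; the Pearson correlation is an addition for illustration and is not part of the original analysis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, deliberately close to a straight line.
df = pd.DataFrame({
    "BORE_OIL_VOL": [0.0, 500.0, 1000.0, 1500.0, 2000.0],
    "BORE_GAS_VOL": [0.0, 76000.0, 149000.0, 226000.0, 301000.0],
})

fig, ax = plt.subplots()
ax.scatter(df["BORE_OIL_VOL"], df["BORE_GAS_VOL"], s=10)
ax.set_xlabel("BORE_OIL_VOL")
ax.set_ylabel("BORE_GAS_VOL")

# A single number to accompany the picture: Pearson correlation of the two columns.
corr = df["BORE_OIL_VOL"].corr(df["BORE_GAS_VOL"])
print(f"Pearson correlation: {corr:.3f}")
```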
2. Data Cleaning and Pre-processing
First, the values in the WELL_TYPE attribute were checked:
Since our focus is solely on production wells, we eliminate the 6491 rows that pertain to injection wells, as they are unnecessary for our analysis. Excluding these rows focuses the EDA on the production wells and keeps it targeted at the desired outcomes.
With all rows related to injection wells removed, we can proceed with further analysis, starting with missing values and outliers.
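The filtering step can be sketched as follows. The labels "OP" (producer) and "WI" (water injector) are assumptions about how WELL_TYPE is coded; check the value counts on the real data before filtering.

```python
import pandas as pd

# Toy rows; the "OP"/"WI" labels are assumed, not taken from this article.
df = pd.DataFrame({
    "WELL_TYPE": ["OP", "OP", "WI", "OP", "WI"],
    "BORE_OIL_VOL": [100.0, 0.0, 0.0, 250.0, 0.0],
})

# Inspect the categories first, then keep only the producers.
print(df["WELL_TYPE"].value_counts())
producers = df[df["WELL_TYPE"] == "OP"].reset_index(drop=True)

print(f"Dropped {len(df) - len(producers)} injection rows, kept {len(producers)}")
```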
Checking missing values
First, we count the rows with missing values in each attribute.
Based on the analysis, there are no missing values detected in any of the attributes. Therefore, no rows will be removed or modified, ensuring the data integrity remains intact.
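A check like this is typically a one-liner in pandas. In the sketch below the toy table deliberately contains one missing value so that the output is not all zeros; on the real data the analysis above found none.

```python
import numpy as np
import pandas as pd

# Toy table with one deliberately missing gas volume.
df = pd.DataFrame({
    "BORE_OIL_VOL": [1200.0, 950.0, 0.0],
    "BORE_GAS_VOL": [180000.0, np.nan, 0.0],
})

missing_per_column = df.isna().sum()              # missing cells per attribute
rows_with_missing = int(df.isna().any(axis=1).sum())  # rows with at least one gap

print(missing_per_column)
print("Rows with any missing value:", rows_with_missing)
```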
Checking for outliers
Since BORE_OIL_VOL and BORE_GAS_VOL are numeric, they can be checked for outliers to improve accuracy. The outliers are identified using box plots and histograms.
The box plot flags outliers in BORE_OIL_VOL and BORE_GAS_VOL beyond a value of approximately 460,000. It is advisable to retain these outliers, because the objective of this EDA is to determine the type of well with significant production, not to compare oil and gas volumes.
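For readers who want the threshold as a number rather than reading it off the figure, the sketch below applies the 1.5 x IQR rule that box plots conventionally use to flag points. The helper name upper_fence and the sample values are hypothetical; the 460,000 figure above comes from the real data, not from this toy series.

```python
import pandas as pd

def upper_fence(values: pd.Series, k: float = 1.5) -> float:
    """Return Q3 + k * IQR, the usual upper limit beyond which a box plot marks outliers."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    return q3 + k * (q3 - q1)

# Made-up gas volumes with one obviously large value.
gas = pd.Series([100.0, 120.0, 130.0, 150.0, 160.0, 900.0], name="BORE_GAS_VOL")

limit = upper_fence(gas)
outliers = gas[gas > limit]

print(f"Upper fence: {limit}")
print(outliers)
```

Flagging the points is separate from deciding what to do with them; as argued above, high-volume rows are kept in this study.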
Based on the distribution plot, we can clearly see that a large proportion of the records show zero oil and gas production. These rows will be removed so that the analysis focuses on the well types with significant production volume.
Determination of the well type
To evaluate the total number of well types and the number of production well records for each well type, we perform a value count on the data.
Based on the analysis, six well types were identified, with codes #5599, #5351, #7078, #7289, #7405 and #5769. Wells #5599 and #5351 hold the largest number of production well records, whereas well #5769 holds the fewest.
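A value count of this kind can be sketched as below. The column name NPD_WELL_BORE_CODE is an assumption about where the well codes live, and the row counts are toy values.

```python
import pandas as pd

# Toy rows: three records for #5599, two for #5351, one for #5769.
df = pd.DataFrame({"NPD_WELL_BORE_CODE": [5599, 5599, 5599, 5351, 5351, 5769]})

counts = df["NPD_WELL_BORE_CODE"].value_counts()   # rows per well code, largest first
n_wells = df["NPD_WELL_BORE_CODE"].nunique()       # number of distinct well codes

print(counts)
print("Number of distinct well codes:", n_wells)
```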
Next, we examine the proportions for each well type using Empirical Cumulative Distribution Function (ECDF) plots from the seaborn library, for both oil and gas volume. An ECDF plot aids direct comparison between multiple distributions by showing the proportion or count of observations falling below each unique value in the dataset.
From the plotted results, well #7405 has the largest share of records with zero oil and gas production, up to 40%. Wells #7289 and #5769 follow with around 18-22% each. We expect the number of zero-production records for these three wells to drop sharply once all zero-production rows are removed.
With the rows where BORE_OIL_VOL equals 0 removed, our dataset is now ready for further analysis aimed at determining the best production strategies.
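The zero-production filter can be sketched in a couple of lines. The rows below are toy values and the well code column name is an assumption.

```python
import pandas as pd

# Toy rows: one zero-production record per well.
df = pd.DataFrame({
    "NPD_WELL_BORE_CODE": [7405, 7405, 5599, 5599],
    "BORE_OIL_VOL": [0.0, 320.0, 0.0, 1250.0],
    "BORE_GAS_VOL": [0.0, 48000.0, 0.0, 190000.0],
})

# Keep only the rows with non-zero oil volume.
producing = df[df["BORE_OIL_VOL"] != 0].reset_index(drop=True)

print(f"Removed {len(df) - len(producing)} of {len(df)} rows")
```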
3. Evaluation of the oil and gas production for each well
To evaluate the performance of each well type, a seaborn scatter plot is used to review the oil and gas production of each well type from 2008 to 2017.
From the scatter plot, wells #5599 and #5351 clearly show the most significant oil and gas production over the years. For gas production, we can expect the same pattern as for oil production, given the relationship between BORE_OIL_VOL and BORE_GAS_VOL plotted earlier.
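A production-over-time scatter plot of this kind could be drawn as in the sketch below. The dates and volumes are made up, and DATEPRD as the date column and the helper column well are assumptions rather than names confirmed by this article.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy records for four wells; values are illustrative only.
df = pd.DataFrame({
    "DATEPRD": pd.to_datetime(["2008-06-01", "2010-06-01", "2012-06-01",
                               "2014-06-01", "2016-06-01", "2016-09-01"]),
    "well": ["5599", "5599", "5351", "5351", "7078", "5769"],
    "BORE_OIL_VOL": [4200.0, 3100.0, 3800.0, 2500.0, 1400.0, 600.0],
})

fig, ax = plt.subplots(figsize=(8, 4))
sns.scatterplot(data=df, x="DATEPRD", y="BORE_OIL_VOL", hue="well", s=15, ax=ax)
ax.set_title("Oil volume over time by well (toy data)")
fig.tight_layout()
```

The same call with y="BORE_GAS_VOL" would give the gas counterpart.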
4. Potential Significant Well for Oil and Gas Production after 2017
Upon analyzing the scatter plot, it is evident that well #7078 has emerged as the prominent well, exhibiting substantial growth in production volume since 2014. Furthermore, an intriguing trend is observed in well #5769, where there has been a noticeable increase in oil and gas production. To gain a deeper understanding, let's delve into the specifics of both wells.
Based on the trendline observed in the data, an increase in oil and gas production can be predicted for well #5769 after 2017, as the upward trajectory suggests a positive growth trend. For well #7078, however, it is difficult to determine conclusively whether it will become the most significant well (that is, the one accounting for the largest number of records with relatively high oil and gas production volume). A more comprehensive analysis, incorporating additional data beyond 2017, is needed to gain better insight into the future trajectory of well #7078 and its significance in terms of oil and gas production volume.
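The article does not state how its trendline was obtained. As one simple possibility, the sketch below fits a straight line to oil volume against time with numpy.polyfit and reads the sign of the slope. The monthly values are made up, DATEPRD is an assumed column name, and a linear fit is a swapped-in technique, not necessarily the method used for the original figure.

```python
import numpy as np
import pandas as pd

# Toy monthly records for a single well with a rising trend.
df = pd.DataFrame({
    "DATEPRD": pd.date_range("2016-01-01", periods=6, freq="MS"),
    "BORE_OIL_VOL": [100.0, 130.0, 150.0, 190.0, 210.0, 240.0],
})

# Express time as days since the first record so the slope has readable units.
days = (df["DATEPRD"] - df["DATEPRD"].min()).dt.days.to_numpy()
slope, intercept = np.polyfit(days, df["BORE_OIL_VOL"].to_numpy(), deg=1)

direction = "increasing" if slope > 0 else "flat or decreasing"
print(f"Trend: {slope:.3f} volume units per day ({direction})")
```

A positive slope only describes the observed window; extrapolating it past the last record is a prediction that needs later data to confirm, which is the caveat raised above for well #7078.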
EDA Outcome Summary
1. Objective of the Analysis
2. Dataset Overview
3. Data Cleaning and Pre-processing
4. Evaluation of the Significant Oil and Gas Production
5. Predictions
Conclusion
In conclusion, the comprehensive exploratory data analysis (EDA) of the oil and gas well production data provided valuable insights into the field production dataset. By leveraging various EDA techniques, a deeper understanding was gained of the characteristics, trends and potential of the wells.
This EDA is a crucial step towards understanding the underlying patterns and trends in the oil and gas well production data. It sets the stage for further analysis and serves as a foundation for data-driven decision-making in the oil and gas industry. The importance of EDA cannot be overstated, especially as we move into a more digitalised world full of AI and deep learning algorithms. EDA plays a vital role in uncovering valuable insights and patterns hidden within vast amounts of field data, enabling robust solutions informed by past decisions and helping to optimize oil and gas production strategies in both the upstream and downstream sectors.