Can Exploratory Data Analysis (EDA) for ML and DL Work on Seismic Data as Effectively as It Does on Non-seismic Data? Part I

By Nicolas Martin

Seismic and ML/DL Services Consultant and co-owner at CLS GeoSolutions LLC


At first glance this could be considered a naive question, as I myself did, though without any way to prove it on a quantifiable basis. After all, seismic data is just another kind of structured data, and EDA works on data to extract meaning from it. I therefore hypothesized that EDA algorithms should work on seismic data and be as effective as they have proven to be on countless non-seismic datasets. But is my thinking 100% correct? How can I convince myself that it is the best answer? What can I do to test my hypothesis on a quantitative battlefield?

Fortunately for me, I am in the process of updating my skills from geophysics to geosciences. This means I know a great deal about the key role of seismic data in O&G exploration and production, and have a comparatively smaller but fast-growing expertise in ML and DL. Having both skill sets gives me a chance to build a bridge between them and find a better answer to my prime question rather than simply assuming one. The goal of this article (presented in two separate parts) is to show my learning path into the fascinating and limitless ML and DL worlds, in search of a rational and plausible answer (one that can really convince me, and hopefully also the reader) to the question heading this article.

I will follow a narrative in which the final conclusion of this study is stated first, followed by the exposition of potential clues that support it, and ending with the execution of some quantifiable "experiments" aimed at measuring how conditioning seismic data before EDA improves what ML and DL can learn from it.

Main conclusion: what I learned from this study is that while my initial hypothesis had a reasonable justification from the perspective of what EDA does, it was not entirely correct if the intrinsic structure of the seismic data is ignored before EDA. In other words, simply applying EDA algorithms to seismic data to extract meaningful patterns to feed the ML and DL stages is not enough; a seismic-oriented EDA approach is needed.

But what is the point here? For decades, the O&G industry has recognized data conditioning in seismic processing as a critical stage for improving seismic image quality, ensuring better interpretability, and inverting the data into meaningful dynamic properties. So what is the relevance of my conclusion? To justify it, a deeper insight into the seismic data is needed first.

Following this reasoning, I consider it convenient to insert a short recap of why seismic data is acquired, how it is recorded, and what information it carries. A summary of what EDA means and which tasks EDA performs in ML is also presented here.

In my opinion, seismic data is acquired to meet three basic goals: 1) to obtain the most plausible geological (structural) model of the subsurface; 2) to obtain a quantitative mapping of subsurface heterogeneity and rock property distributions after seismic inversion; and 3) to have a regional, regularly sampled, and densely populated areal and in-depth cube of information about the subsurface, imaging it across several thousands of feet in depth, a capability still hardly matched by other current surveying techniques (Figure 1).


Figure 1 (left) A typical acquired and processed seismic cube containing dense information about the subsurface geology (free online image from the Sleipner field). (upper right) A map showing the areal distribution of clay content from seismic inversion (author's intellectual property). (lower right) A typical 3D seismic interpretation showing layers and faults (free online image courtesy of Paradigm).

But what does seismic data mean? In essence, seismic data represents a particular response of the subsurface, mainly an elastic response, to stimulus sources regularly placed at or just below the ground surface (Figure 2). When a downward-travelling wavefront generated by the source hits an elastic interface (an acoustic impedance contrast between two contiguous rock materials), an energy-splitting phenomenon occurs at that point, and secondary wavefronts are generated from the original down-going one. Some of these secondary wavefronts travel back to the ground surface as reflected waves, while others keep travelling downward as transmitted waves until they again hit deeper elastic interfaces and eventually travel back to the surface later as new reflected waves. It is a dynamic process that also involves inelastic, dispersive processes contributing to a gradual loss of seismic energy with depth.
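The amplitude split at each interface can be quantified with the standard normal-incidence reflection coefficient (not given explicitly in this article, but the textbook form of the idea described above): the contrast in acoustic impedance between the two rocks controls the sign and strength of the reflected wave. A minimal sketch, using a purely hypothetical impedance profile:

```python
import numpy as np

def reflection_coefficients(impedance):
    """Normal-incidence reflection coefficients from an acoustic impedance log.

    At each interface: R = (Z_lower - Z_upper) / (Z_lower + Z_upper),
    so a positive R means the lower rock has the higher impedance.
    """
    z = np.asarray(impedance, dtype=float)
    return (z[1:] - z[:-1]) / (z[1:] + z[:-1])

# Hypothetical three-layer impedance profile (density * velocity)
z = np.array([4.5e6, 7.2e6, 5.1e6])
r = reflection_coefficients(z)
# r[0] > 0: positive contrast at the first interface (impedance increases)
# r[1] < 0: negative contrast at the second interface (impedance decreases)
```

The sign convention here matches the amplitude behavior discussed later: only the contrasts between layers, never the absolute impedances themselves, appear in the reflected energy.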


Figure 2 Two sketches of how seismic energy is generated by surface sources in a land seismic acquisition (left) and a marine one (right). The down-going wavefronts are reflected back to the surface and transmitted downward through the subsurface when they hit an elastic contrast between in-contact materials.

Now, let me talk a bit about EDA. In the literature, Exploratory Data Analysis, or EDA for short, is understood as a set of algorithms designed to gain intuition about the data before building ML models with it. According to Wikipedia, EDA differs conceptually from traditional statistical analysis: EDA is primarily focused on seeing what the data can tell us (usually aided by visual methods) without the preconceived model formulation or hypothesis-testing tasks of classical statistical analysis (Figure 3). In other words, classical statistical analysis follows this sequence:

Problem >> Data >> Model (hypothesis) >> Analysis (validation) >> Conclusions (model accuracy)

But for EDA oriented to ML and DL tasks, the following sequence applies:

Problem >> Data >> Analysis (insight) >> Model (prediction) >> Conclusions (model accuracy)

The main EDA tasks as currently referred in the literature are:   

– Maximize insight into the data structure (i.e., handle missing values, remove duplicates, characterize the inner distribution, and so on)

– Visualize and extract instance relationships or hidden feature patterns from the data

– Detect and treat outliers (commonly penalizing departures from the mean or median)

– Normalize and scale numerical variables

– Encode categorical variables

– Perform bivariate analysis (e.g., covariance)

In the present study, I will deal only with the first three EDA objectives (data structure insight, hidden data information, and detection of outliers) and how they work with seismic data.
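To make those three objectives concrete, here is a minimal sketch of what a conventional, non-seismic-aware EDA pass looks like. The data here is purely synthetic (a random zero-mean array standing in for a trace gather, with one injected dead sample and one injected spike); the 5-sigma threshold is an illustrative choice, not a recommendation from this article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a gather: 500 time samples x 20 traces of
# zero-mean amplitudes, with one missing value and one spike injected.
data = rng.normal(0.0, 1.0, size=(500, 20))
data[10, 3] = np.nan     # simulate a dead (missing) sample
data[100, 5] = 25.0      # simulate a noise spike

# Objective 1: data structure insight -- missing values and distribution
n_missing = int(np.isnan(data).sum())
mean, std = np.nanmean(data), np.nanstd(data)

# Objective 3: outlier detection -- flag samples far from the mean (z-score)
z_score = np.abs((data - mean) / std)
outliers = np.argwhere(z_score > 5)   # NaN comparisons evaluate False
```

This generic recipe happily flags the spike, but, as argued in the rest of the article, it knows nothing about travel paths or wave types, which is exactly why ground roll and multiples escape it.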


Figure 3 Examples of graphical EDA on numerical data to gain insight into its internal structure (author's intellectual property)

Finally, I am ready to start building my bridge between my experience with seismic data and my new skills in EDA. To build it, I will use some clues as building blocks. The first block relates to what seismic data tells us about the subsurface. When a seismic image (Figure 4a) is observed, it gives the observer a perception of the subsurface as a collage of diverse visual patterns, called seismic reflections in oil-industry jargon, arranged in such a way that, after appropriate interpretation, a meaningful geological model emerges, formed by layers defining structures (Figure 4b). How do we get geological sense from seismic data? The answer is simple. Because seismic data is usually recorded in time, what we see on a seismic image is just a record of the arrival times at the surface of all the reflected seismic waves travelling across the subsurface. As explained previously, each reflected seismic amplitude is a response to a particular elastic contrast at depth. Thus, each recorded arrival time is only a transformed representation of the seismic response in depth. For this reason, the coherent distribution of all those reflected amplitudes, arranged as a function of their associated travel times, is what gives us the geological perception of the image. But this fact also introduces, as we will see later, a bias in the DL training stage which must be removed or mitigated before EDA.


Figure 4 (a) A typical seismic image after processing. (b) The interpreted seismic image with a geological overlay.

The second clue relates to the fact that seismic data is relative data (Figure 5). In other words, it lacks the volumetric character that other subsurface properties intrinsically have. For example, rock properties such as density, velocity, and lithology share a common characteristic: they are volumetric properties associated with absolute values when measured. In contrast, seismic energy, as previously explained, originates from a transient perturbation propagating through the subsurface (locally pushing and pulling the rock's particles as it travels) and reacting only to elastic contrasts. As a result of this dynamic process, the recorded seismic data is likewise a series of timed perturbations (recorded as electrical pulses) pushing geophones planted in the ground up, down, and horizontally.


Figure 5 (left) An example of a typical seismic trace and its correlation with the elastic impedance contrast profile in depth. The impedance is defined as the rock's density times its velocity and increases from left to right. (right) A file listing the downloaded trace amplitudes; the rows are time samples and the columns are trace numbers (from MATLAB 2019).

Consequently, the seismic trace shows amplitudes ranging from negative to positive values, representing oscillations around a zero-amplitude line. Here, positive seismic amplitudes represent a positive elastic contrast (the lower rock material having the higher elastic impedance at the interface), while negative seismic amplitudes reveal a negative elastic contrast (the upper rock material having the higher elastic impedance at the interface). Finally, the zero-amplitude line is the locus of all depths where the upper and lower impedances across an interface are equal, giving a zero elastic contrast (Figure 5). This behavior of the seismic amplitude corresponds to a relative property.
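This relative, oscillate-around-zero character can be reproduced with the classic convolutional model of a seismic trace: a reflectivity series (nonzero only at the interfaces) convolved with a zero-phase wavelet. The blocky impedance log and the Ricker wavelet below are my own illustrative assumptions, not data from this article:

```python
import numpy as np

def ricker(f, dt, n):
    """Zero-phase Ricker wavelet of peak frequency f (Hz), n samples at dt (s)."""
    t = (np.arange(n) - n // 2) * dt
    a = (np.pi * f * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

# Hypothetical blocky impedance log: four layers of 100 samples each
z = np.repeat([4.5e6, 7.2e6, 5.1e6, 6.0e6], 100)

# Reflectivity series: nonzero only where the impedance changes
r = np.zeros_like(z)
r[1:] = (z[1:] - z[:-1]) / (z[1:] + z[:-1])

# Synthetic trace = reflectivity convolved with the wavelet; it swings
# both positive and negative around the zero-amplitude line
trace = np.convolve(r, ricker(30.0, 0.002, 81), mode="same")
```

Note what the trace does not contain: the absolute impedances of the layers. Two very different impedance logs with the same contrasts would produce the same trace, which is the "relative data" point of this clue.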

The last clue is associated with a by-product of seismic wavefront propagation through the subsurface. We have been talking only about primary waves (incident, reflected, and refracted), but they are not the only waves generated in the subsurface when a wavefront travels down. Some waves follow a near-horizontal path close to the ground surface and are recorded as ground-roll waves; others follow a multiply bouncing path between two adjacent layers and are recorded as multiples; still others are incoherent noise, like the static from an untuned radio station. If we consider the primary waves as the desired signals to be recorded, any other wave with a different travel path or a different source than the primaries can be considered an outlier for EDA purposes, or noise from a seismic processing perspective. Figure 6 shows two examples of a seismic image with primaries (laterally continuous reflections defining the tops and bases of layers) and seismic "outliers" overlapping them in some zones. A cautionary note here: the reader should know that although the primaries and outliers coincide in time in those zones, it does not necessarily mean that the secondary sources (elastic-contrast interfaces at depth) which generated them are spatially close. The reason is that we are just recording their corresponding arrival times, but signals and outliers can come from zones very far apart in depth and follow different travel paths.


Figure 6 Two examples of noise contamination of the seismic primaries, or signals. (left) Seismic parabolic outliers interfering with the signals (horizontal events) on an NMO-corrected AVA gather. Note that the main effects of the outliers are to disrupt the horizontal alignment of the signal at central times and to distort the signal amplitudes at far angles. (right) Another example of noise contamination produced by a type of wave running along the ground surface (named ground roll). This noise is dispersive in nature and generates a noise cone that strongly degrades the signals around its center, as shown. These types of seismic "outliers" cannot be effectively removed using traditional EDA applications designed for non-seismic data.

At this point, I think the reader has a better understanding of how complex the seismic data structure can be; the geoscientist must therefore take extra caution with this type of data for EDA, ML, and DL purposes. But there is still an additional aspect of the signal and outlier data to consider, one that is not so obvious and has a significant impact on the performance of ML and DL after EDA when seismic data is concerned. It will be discussed in Part II of this article.

In Part II, I will also introduce the reader to a kind of EDA that I call seismic-driven EDA, which takes into account all the above-discussed characteristics defining seismic data and seismic "outliers". Some quantitative "experiments" will also be included in Part II to defend my conclusion. Thanks for your attention.

José Manuel C.

Depth Imaging Geophysicist


I will be waiting for the second part of your article, Nicolas. Thanks for sharing it. You show clearly how complex seismic reflection data analysis can be. The way we work to gain insight into it while trying to model the subsurface geological structures and "extract" rock properties from it should guide us to apply these new technologies wisely. Otherwise, we will be hoping for algorithms to "magically" give us a useful output. We may have to act as "coaches" of a very well-built athlete who has never done gymnastics: to "train" these algorithms with the proper routines and "feed" them with the best "nourishment" to make them "competitive"...
