Data science and machine learning in petroleum geostatistics - Part 1

Data science and machine learning in petroleum geostatistics - Part 1

"A journey of a thousand miles begins with a single step"

Lao Tzu

Introduction

We start this topic with a simple problem taken from the book Applied Geostatistics for Reservoir Characterization, written by Mohan Kelkar and Godofredo Perez. It is on page 18 in Chapter 2, Principles of Statistics. By solving this problem, you will learn the most elementary lesson about how to interact with data in Microsoft Excel and Python. At the same time, you will also understand and know how to calculate frequency and plot a histogram. These are the most fundamental concepts in statistics and data science. In addition, although it is assumed that you already have some knowledge of descriptive statistics, I will also review the basic concepts and how to calculate them in the problem-solving method section. Now let's begin. ?

Problem 1: Analyze porosity values using a frequency-distribution analysis

????The following porosity samples are measured in a wellbore: 0.141, 0.124, 0.152, 0.156, 0.113, 0.167, 0.194, 0.142, 0.133, 0.149, 0.106, 0.137, 0.147, 0.159, 0.174, 0.129, 0.153, 0.173, 0.189, 0.16, 0.193, 0.156, 0.149, 0.135, 0.145, 0.171, 0.101, 0.151, 0.176, 0.191, 0.121, 0.148, 0.153, 0.171, 0.183, 0.108, 0.123, 0.169, 0.185, 0.153, 0.117, 0.127, 0.145, 0.141, 0.165, 0.14, 0.143, 0.178, 0.179, 0.157.

??? Analyze these porosity, ?, values using a frequency-distribution analysis.

Problem-solving method

For the convenience of comparing the calculation results, I will follow the problem-solving method in the book cited above. We also divide the values into 5 classes with the following range:

1. 0.10 < ? ≤ 0.12

2. 0.12 < ? ≤ 0.14

3. 0.14 < ? ≤ 0.16

4. 0.16 < ? ≤ 0.18

5. 0.18 < ? ≤ 0.2

Next, we summarize the basic concepts related to frequency and how to calculate them.

??Class intervals: the range of the data is divided into intervals. For this problem, we have five class intervals.

??Class frequency, fi: the number of measurements falling within a particular class, i. For example, the class 5 has six measurements falling within a range of 0.18 < ? ≤ 0.2.

??Relative frequency, fRi

No alt text provided for this image

where,

n = total number of samples. In this problem, we have 50 samples.

For example, the class 2 has nine measurements falling within a range of 0.12 < ? ≤ 0.14 (fi = 9). As a result, the relative frequency of class 2 is 9/50, or 0.18.?

Note that the sum of all relative frequencies is one.

No alt text provided for this image

??Cumulative relative class frequency, Fj

No alt text provided for this image

where,

Fj = cumulative frequency for class j that results from the sum of all the relative frequencies from class 1 to class j.

Problem-solving by using Microsoft Excel

In this section, we will learn how to use Microsoft Excel, a world-famous spreadsheet developed by Microsoft, to solve this problem. Through this work, we'll learn some useful little tips and tricks on how to work with data.

The first is data entry. With the usual way of thinking, we will enter each porosity value into each cell according to a certain column. But this is quite time-consuming. The fastest way is to take advantage of some tricks in Excel. Here, I would like to give you step-by-step instructions so that you can input data quickly and solve the problem.

Step 1: Enter the data in column form.

Copy all porosity values. Open a new Excel file, in cell A1, right-click and select Match Destination Formatting (M) in Paste Special.

No alt text provided for this image

The result in Excel is shown as follows:

No alt text provided for this image

In the Data tab, select Text to Columns. Click Delimited.

No alt text provided for this image

Then, in the box, click Comma to separate values into cells. Click Next and Finish.

No alt text provided for this image

The result in Excel is shown as follows:

No alt text provided for this image

Use the key combination Ctrl+A and Ctrl+C to select and copy all porosity values in cells. In cell A2, right-click and select Paste Special.

No alt text provided for this image

In Paste Special box, click Transpose.

No alt text provided for this image

All porosity values from Row 1 are transposed to one column. The result in Excel is shown as follows:

No alt text provided for this image

Right-click Row 1 and select Delete.

No alt text provided for this image

Now, all porosity values are in column A.

No alt text provided for this image

Step 2: Prepare the spreadsheet and use “Define Name”.

Before starting to demonstrate the implementation of Step 2, I have a few words about a way of thinking and working in Excel. That is naming a cell or array as a variable in Excel with Define Name. With this approach, we can save a lot of time checking cell references and almost completely eliminate the use of the F4 key to fix cells or arrays. For example, we can count the total number of samples, n, by using “Define Name”.

Select all porosity values in cells, right-click, and select Define Name.

No alt text provided for this image

Enter the text "Data" into the Name cell and click OK.

No alt text provided for this image

The result in Excel is shown as follows. You can see “Data” in the “Name Box”.

No alt text provided for this image

Enter n in B1 cell. In C1 cell, enter the function:

=count(Data)

The result is shown to be that n, the total number of samples, is equal to 50.

No alt text provided for this image

In C1 cell, right-click, and select Define Name. Click OK.

No alt text provided for this image

?You can see “n” in the “Name Box”.

No alt text provided for this image

Note that we have 5 classes divided into intervals of values as mentioned above. Prepare the spreadsheet as follows:

No alt text provided for this image

Step 3: Calculate the parameters related to frequency.

In F2 cell, enter the function:

=COUNTIFS(Data,">0.1",Data,"<=0.12")

The result is shown to be that fi1, the class 1 frequency, is equal to 5.

No alt text provided for this image

Do the same for cells F3, F4, F5, and F6, we have the values fi2, fi3, fi4, and fi5 of 9, 20, 10, and 6, respectively.

No alt text provided for this image

Set the variable names from cells F2 to F6, respectively, fi1_, fi2_, fi3_, fi4_, and fi5_. Now you can calculate the relative class frequency.

In G2 cell, enter the function:

=fi1_/n

The result is shown to be that fRi1, the class 1 relative frequency, is equal to 0.1.

No alt text provided for this image

Do the same for cells G3, G4, G5, and G6, we have the values fRi2, fRi3, fRi4, and fRi5 of 0.18, 0.4, 0.2, and 0.12, respectively.

No alt text provided for this image

Set the variable names from cells G2 to G6, respectively, fRi1_, fRi2_, fRi3_, fRi4_, and fRi5_. Now you can calculate the cumulative relative class frequency.

In H2 cell, enter the function:

=SUM(fRi1_)

The result is shown to be that Fi1, the class 1 cumulative relative frequency, is equal to 0.1.

In H3 cell, enter the function:

=SUM(fRi1_,fRi2_)

The result is shown to be that Fi2, the class 1 to class 2 cumulative relative frequency, is equal to 0.28.

Do the same for cells H4, H5, and H6. By adding cumulatively, we have the values Fi3, Fi4, and Fi5 of 0.68, 0.88, and 1.0, respectively.

No alt text provided for this image

Step 4: Plot the relative class frequency and cumulative relative class frequency.

For convenient plotting of relative class frequency and cumulative relative class frequency, you can copy the Class and fRi?columns into columns J and K and the Class and Fi?columns into columns M and N.

No alt text provided for this image

Select array J2:K6. In the Insert tab, select Clustered column.

No alt text provided for this image

You can edit the title plot name with “Relative class frequency”. The result is shown as follows:

No alt text provided for this image

Do the same for the cumulative relative class frequency plot, and the result is shown as follows:

No alt text provided for this image

Part 1 ends here. I hope you enjoy it, and the first drum is not so low. I will upload this spreadsheet to my personal Github page. You can download it to practice if you want. If the article has errors or omissions, I hope to receive feedback and suggestions to make it better. I'll see you in the next post with a solution to Problem 1 using Python.

Note: You are free to share the content, but please cite the source. Thank you so much for this kindness.

要查看或添加评论,请登录

Tam Tran的更多文章

社区洞察

其他会员也浏览了