Basic statistical functions in Excel (for geologists)
Beibarys Bakytzhan
Consultant - Geology and Data Specialist(MSc, MSEG, AAusIMM, APONEN)
I'm thinking about starting to share my geological data skills via LinkedIn. My experience may not be as great as many of the people I connect with, but at least it can be useful.
I'm going to try to publish this kind of material as often as possible.
Basic statistical functions
I will start with Excel and then go further and further in complexity.
I think it's worth start with the basic functions that are often used in working with data, these are:
=SUM(Range/s or Cells)
=COUNT(Range/s or Cells)
=AVERAGE(Range/s or Cells)
=MIN(Range/s or Cells)
=MAX(Range/s or Cells)
These functions are quite simple to use. So you enter the SUM function and get the sum for a cell, or for a range.
As an example:
=SUM(A1:A5)
This formula will give you the sum for column A from rows 1 to 5. But there are derivatives of these functions that are more interesting. If you add a logical expression to these statistical functions, you get functions like:
=SUMIF(Range, Criteria, [Sum Range])
=COUNTIF(Range, Criteria)
=AVERAGEIF(Range, Criteria, [Average Range])
And if you have more than one logical expression, you can apply:
=SUMIFS(Sum Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=COUNTIFS(Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=AVERAGEIFS(Average Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=MINIFS(Min Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=MAXIFS(Max Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
A little clarification. Statistical functions with one logical expression differ (“first group”) from statistical functions with many logical expressions (“second group”) not only by the number of conditions, but also by their structure. If for the “first group” the range for calculation (Sum range, Average range...) is the last argument, then for the “second group” it is moved to the beginning of the function, because the number of conditions can be many and in order not to make an error the range for calculation is the first argument in the function.
So how do these functions work?
Let's represent an example data set in the form of a collar, which contains a standard set of information.
Note: All data is artificially generated and has no relation to reality.
The goal is to find:
The volume of drilling in meters for each zone
In order to find the volume of drilling in meters for each zone, you can use the SUMIF function. Why SUMIF?
First of all, we have the depth of holes that need to be summarized, so we use the SUM function, but we also need to make a split by zones, in this case we have a logical expression that will filter the data for summarizing. Therefore, in this case we need to use the SUMIF function.
=SUMIF(H2:H11,"Zone 1",E2:E11)
As a result, we have the volume of drilling for each zone.
The drilling start date and drilling end date for each zone
As in the first case, so for the second and third tasks there will be one logical expression. There are 2 basic functions used in this problem: MIN and MAX. However, there are no functions for MIN and MAX with one logical expression, but there are functions for multiple logical expressions. This does not mean that we necessarily need to use more than one logical expression, i.e. we can limit ourselves to just one.
Fun fact. You must have accidentally changed the data type for dates and you got values like 45233 etc. This number has its own meaning, it represents the number of days from January 1, 1900. By default, Microsoft Excel for Windows uses the 1900 date system.
To find the required dates, we need to get the minimum date value by drilling start dates and the maximum by drilling end dates with a logical expression by zone.
=MINIFS(F2:F11,H2:H11,"Zone 1")
=MAXIFS(G2:G11,H2:H11,"Zone 2")
As a result, we have the start and end dates for each of the zones.
The number of holes for each zone
The easiest of all is finding the number of holes for each zone. In this situation, there is no range to calculate, only a range of conditions, which we will calculate based on the conditions.
=COUNTIF(H2:H11,"Zone 1")
The counting function may vary depending on the task.
Hope this was helpful, I'll try to post more of this kind of interesting stuff.
P.S. Cover picture generated by AI.
Mineral Consultant| Mineral Economy/ Head of Secretariat of PONEN Executive Committee
9 个月As always- very useful!
Consultant
10 个月Very helpful!
Finance & Accounting
10 个月Thank you for sharing, will definitely be following the series ????
Consultant Geotechnical Engineer (Tailings) (AAusIMM)
10 个月I'll read this regularly! :D