Basic statistical functions in Excel  (for geologists)

Basic statistical functions in Excel (for geologists)

I'm thinking about starting to share my geological data skills via LinkedIn. My experience may not be as great as many of the people I connect with, but at least it can be useful.

I'm going to try to publish this kind of material as often as possible.

Basic statistical functions

I will start with Excel and then go further and further in complexity.

I think it's worth start with the basic functions that are often used in working with data, these are:

=SUM(Range/s or Cells)
=COUNT(Range/s or Cells)
=AVERAGE(Range/s or Cells)
=MIN(Range/s or Cells)
=MAX(Range/s or Cells)        

These functions are quite simple to use. So you enter the SUM function and get the sum for a cell, or for a range.

As an example:

=SUM(A1:A5)         

This formula will give you the sum for column A from rows 1 to 5. But there are derivatives of these functions that are more interesting. If you add a logical expression to these statistical functions, you get functions like:

=SUMIF(Range, Criteria, [Sum Range])
=COUNTIF(Range, Criteria)
=AVERAGEIF(Range, Criteria, [Average Range])        

And if you have more than one logical expression, you can apply:

=SUMIFS(Sum Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=COUNTIFS(Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=AVERAGEIFS(Average Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=MINIFS(Min Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)
=MAXIFS(Max Range, Criteria Range 1, Criteria 1, Criteria Range 2, Criteria 2,....)        
A little clarification. Statistical functions with one logical expression differ (“first group”) from statistical functions with many logical expressions (“second group”) not only by the number of conditions, but also by their structure. If for the “first group” the range for calculation (Sum range, Average range...) is the last argument, then for the “second group” it is moved to the beginning of the function, because the number of conditions can be many and in order not to make an error the range for calculation is the first argument in the function.

So how do these functions work?

Let's represent an example data set in the form of a collar, which contains a standard set of information.

Note: All data is artificially generated and has no relation to reality.
Collar table
Hole location scheme

The goal is to find:

  1. the volume of drilling in meters for each zone
  2. the drilling start date and drilling end date for each zone
  3. the number of holes for each zone

The volume of drilling in meters for each zone

In order to find the volume of drilling in meters for each zone, you can use the SUMIF function. Why SUMIF?

First of all, we have the depth of holes that need to be summarized, so we use the SUM function, but we also need to make a split by zones, in this case we have a logical expression that will filter the data for summarizing. Therefore, in this case we need to use the SUMIF function.

=SUMIF(H2:H11,"Zone 1",E2:E11)        

As a result, we have the volume of drilling for each zone.

Result of task 1

The drilling start date and drilling end date for each zone

As in the first case, so for the second and third tasks there will be one logical expression. There are 2 basic functions used in this problem: MIN and MAX. However, there are no functions for MIN and MAX with one logical expression, but there are functions for multiple logical expressions. This does not mean that we necessarily need to use more than one logical expression, i.e. we can limit ourselves to just one.

Fun fact. You must have accidentally changed the data type for dates and you got values like 45233 etc. This number has its own meaning, it represents the number of days from January 1, 1900. By default, Microsoft Excel for Windows uses the 1900 date system.

To find the required dates, we need to get the minimum date value by drilling start dates and the maximum by drilling end dates with a logical expression by zone.

=MINIFS(F2:F11,H2:H11,"Zone 1")
=MAXIFS(G2:G11,H2:H11,"Zone 2")        

As a result, we have the start and end dates for each of the zones.

Result of task 2

The number of holes for each zone

The easiest of all is finding the number of holes for each zone. In this situation, there is no range to calculate, only a range of conditions, which we will calculate based on the conditions.

=COUNTIF(H2:H11,"Zone 1")        
Result of task 3

The counting function may vary depending on the task.

  1. If you want to count numerical values, you should use COUNT
  2. If you want to count any filled cells, you should use COUNTA
  3. If you need to count empty cells, you must use COUNTBLANK (If you select the entire range instead of a limited range, it will give you the number of empty cells up to row 1,048,576 and the value will be incorrect).


Hope this was helpful, I'll try to post more of this kind of interesting stuff.

P.S. Cover picture generated by AI.


Zaure Kunanbayeva-Yerkhanova, MAusIIM APONEN

Mineral Consultant| Mineral Economy/ Head of Secretariat of PONEN Executive Committee

9 个月

As always- very useful!

Very helpful!

Akerke Malikova

Finance & Accounting

10 个月

Thank you for sharing, will definitely be following the series ????

Francois du Toit

Consultant Geotechnical Engineer (Tailings) (AAusIMM)

10 个月

I'll read this regularly! :D

要查看或添加评论,请登录

Beibarys Bakytzhan的更多文章