Did you receive a database and need to check it for errors? Let's do it together.

Did you receive a database and need to check it for errors? Let's do it together.

Attention! Your data format will probably be different, but the principles of data validation are similar.

Frequently we find that the data provided to us contains errors. And before we load it into Leapfrog Geo (or any other software) we need to go through it and find out if there is any data that has any errors.

Let's go through the generated data as an example (also you can find there solved Excel spreadsheet).

Collar Data
Survey Data
Assay and QAQC Data

We have Collar, Survey and Assay + QAQC data.

Collar is the same one that I dealt with in my last post.

Under the Assay&QAQC tab, we have data on the survey interval by interval with content. Just like in the last post, the data was generated.

The check in this case will consist of 3 parts.

  1. Checking the depth from and depth to
  2. Checking intersecting intervals
  3. Checking if SAMPLE_TYPE, QAQC_TYPE and grades are full

Checking the depth from and depth to

In order to check the depth from and depth to for errors, the simplest subtraction will do.

That is, subtract the Depth to from the Depth from and get the values. If the values are negative, then the interval is written incorrectly. For this purpose we can add a new column between the depth to and the interval (between columns D and E).

Where to add new column (optional)
How to check validity of intervals per row

You can get fancy and make a more understandable formula. Here you need to use the logical function IF, which will allow you to output a custom value based on your requests.

The IF logical function uses a logical condition. The IF function has 2 basic outcomes, True and False. An error may also occur, depending on the conditions.

Let's take the following values as an example:

  • If subtraction from depth from depth to is higher than 0
  • If true - No error
  • If false - Depth is not written correctly

As a result we have the following formula and it needs to be extended for the whole column.

=IF(D2-C2>0,"No error", "Depth is not written correctly")        
Formula at Column E


Result of Validation

After pulling through the entire formula, we have the following result. A total of 81 intervals with errors were found.

Result of validation

Checking intersecting intervals

Let's get to the fun part, the part that's harder to define. Intersecting intervals. Yes, of course Leapfrog Geo (assay data excluded QAQC) can easily define all intersecting intervals for us (and not only that), but let's consider a situation where we need to do it with Excel only

A file contains information for all holes and we need to identify intersecting intervals for the entire dataset. However, the solution to this problem is surprisingly easy. In our dataset (in most cases) intervals and holes are collected sequentially. If not, just sort out the intervals and then sort out the holes and everything becomes consistent.

In some cases it is undesirable to sort the data as it may have errors that are already present. Use sorting only if you are sure that it will not cause more problems.

Let's look at the sample data not as samples, but as a stream of data itself. Let's focus only on Hole Number and intervals. Each sample comes sequentially after the other. And the beginning of each interval is the end of the previous interval.

Now let's start the validation.

In order to determine the intersected intervals:

  • we need to use the logical function IF
  • compare the hole number of the previous interval with the hole number of the current interval.
  • If true subtract from the previous interval to the current interval from
  • If false write as a comment “Hole's first interval” because the intervals false in this case are the beginning of the data for another hole.

The formula will look like this:

=IF(A2=A1,D1-C2,"Hole's first interval")        

This resulted in the identification of intervals that overlap with each other (highlighted in red). They are distinguished by the fact that they are not equal to 0, but are greater or lesser.

Result of validation
The values that are presented are the result of subtracting the previous interval from the current interval.

Checking if SAMPLE_TYPE, QAQC_TYPE and grades are full

In some cases we also need to check for some columns to be populated. As an example in my case this is data by sample type, by QAQC type and by grade.

The formula is going to be pretty simple:

#SAMPLE_TYPE Check
=IF(G2="","Sample Type is Empty","")
#QAQC_TYPE Check
=IF(H2="","QAQC Type is Empty","")
#Au grades Check
=IF(J2="","Au grades result is Empty","")        

All comments in one cell for verification? Is this possible?

Yes we can put all of these checks in one cell and it will give us the result for all types of checks. However, it should be based on whether you want everything to be written in one cell or in different cells:

  • If you want it in one cell, then apply the formula below (It is slightly modified and uses the TEXTJOIN function, which allows you to join the cells into one. In our case, all the formulas are in one cell)

=TEXTJOIN(" | ",TRUE,IF(D2-C2>0,"","Depth is not written correctly"),IF(A2=A1,IF(D1-C2<>0,"Overlapping interval or unsampled interval",""),""),IF(F2="","Sample type is empty",""),IF(AND(F2<>"Routine",F2<>"",F2<>"BLK",G2=""),"QAQC Type is empty",""),IF(I2="","Au grade is empty",""))        
Formula usage

  • If you want them in separate cells, then you need to use the formula below (This formula is also slightly modified and uses the inverse of the TEXTJOIN function TEXTSPLIT in combination with the same TEXTJOIN function. All values will be written in separate cells to the right of the cell where you enter this formula). In this case, IFERROR is used to ignore errors in cases where there are no delimiters and it just outputs the value “”.

=IFERROR(TEXTSPLIT(TEXTJOIN(" | ",TRUE,IF(D51-C51>0,"","Depth is not written correctly"),IF(A51=A50,IF(D50-C51<>0,"Overlapping interval or unsampled interval",""),""),IF(F51="","Sample type is empty",""),IF(AND(F51<>"Routine",F51<>"",F51<>"BLK",G51=""),"QAQC Type is empty",""),IF(I51="","Au grade is empty",""))," | "),"")        
Formula usage
Note! The TEXTJOIN function is available for Excel version 2019 and later only. The TEXTSPLIT function is available for Microsoft 365 only.

Hope this was helpful, I'll try to post more of this kind of interesting stuff.

P.S. Cover picture generated by AI (again).



Moses Masingati

Geological Engineer/Exploration geologist

1 个月

I have found this article very helpful. Thanks for taking the time to put it out.

Francois du Toit

Consultant Geotechnical Engineer (Tailings) (AAusIMM)

9 个月

Nice post, Beibarys! Data validation is a must!

要查看或添加评论,请登录

Beibarys Bakytzhan的更多文章

社区洞察

其他会员也浏览了