Did you receive a database and need to check it for errors? Let's do it together.
Beibarys Bakytzhan
Consultant - Geology and Data Specialist(MSc, MSEG, AAusIMM, APONEN)
Attention! Your data format will probably be different, but the principles of data validation are similar.
Frequently we find that the data provided to us contains errors. And before we load it into Leapfrog Geo (or any other software) we need to go through it and find out if there is any data that has any errors.
Let's go through the generated data as an example (also you can find there solved Excel spreadsheet).
We have Collar, Survey and Assay + QAQC data.
Collar is the same one that I dealt with in my last post.
Under the Assay&QAQC tab, we have data on the survey interval by interval with content. Just like in the last post, the data was generated.
The check in this case will consist of 3 parts.
Checking the depth from and depth to
In order to check the depth from and depth to for errors, the simplest subtraction will do.
That is, subtract the Depth to from the Depth from and get the values. If the values are negative, then the interval is written incorrectly. For this purpose we can add a new column between the depth to and the interval (between columns D and E).
You can get fancy and make a more understandable formula. Here you need to use the logical function IF, which will allow you to output a custom value based on your requests.
The IF logical function uses a logical condition. The IF function has 2 basic outcomes, True and False. An error may also occur, depending on the conditions.
Let's take the following values as an example:
As a result we have the following formula and it needs to be extended for the whole column.
=IF(D2-C2>0,"No error", "Depth is not written correctly")
After pulling through the entire formula, we have the following result. A total of 81 intervals with errors were found.
Checking intersecting intervals
Let's get to the fun part, the part that's harder to define. Intersecting intervals. Yes, of course Leapfrog Geo (assay data excluded QAQC) can easily define all intersecting intervals for us (and not only that), but let's consider a situation where we need to do it with Excel only
领英推荐
A file contains information for all holes and we need to identify intersecting intervals for the entire dataset. However, the solution to this problem is surprisingly easy. In our dataset (in most cases) intervals and holes are collected sequentially. If not, just sort out the intervals and then sort out the holes and everything becomes consistent.
In some cases it is undesirable to sort the data as it may have errors that are already present. Use sorting only if you are sure that it will not cause more problems.
Let's look at the sample data not as samples, but as a stream of data itself. Let's focus only on Hole Number and intervals. Each sample comes sequentially after the other. And the beginning of each interval is the end of the previous interval.
Now let's start the validation.
In order to determine the intersected intervals:
The formula will look like this:
=IF(A2=A1,D1-C2,"Hole's first interval")
This resulted in the identification of intervals that overlap with each other (highlighted in red). They are distinguished by the fact that they are not equal to 0, but are greater or lesser.
The values that are presented are the result of subtracting the previous interval from the current interval.
Checking if SAMPLE_TYPE, QAQC_TYPE and grades are full
In some cases we also need to check for some columns to be populated. As an example in my case this is data by sample type, by QAQC type and by grade.
The formula is going to be pretty simple:
#SAMPLE_TYPE Check
=IF(G2="","Sample Type is Empty","")
#QAQC_TYPE Check
=IF(H2="","QAQC Type is Empty","")
#Au grades Check
=IF(J2="","Au grades result is Empty","")
All comments in one cell for verification? Is this possible?
Yes we can put all of these checks in one cell and it will give us the result for all types of checks. However, it should be based on whether you want everything to be written in one cell or in different cells:
=TEXTJOIN(" | ",TRUE,IF(D2-C2>0,"","Depth is not written correctly"),IF(A2=A1,IF(D1-C2<>0,"Overlapping interval or unsampled interval",""),""),IF(F2="","Sample type is empty",""),IF(AND(F2<>"Routine",F2<>"",F2<>"BLK",G2=""),"QAQC Type is empty",""),IF(I2="","Au grade is empty",""))
=IFERROR(TEXTSPLIT(TEXTJOIN(" | ",TRUE,IF(D51-C51>0,"","Depth is not written correctly"),IF(A51=A50,IF(D50-C51<>0,"Overlapping interval or unsampled interval",""),""),IF(F51="","Sample type is empty",""),IF(AND(F51<>"Routine",F51<>"",F51<>"BLK",G51=""),"QAQC Type is empty",""),IF(I51="","Au grade is empty",""))," | "),"")
Note! The TEXTJOIN function is available for Excel version 2019 and later only. The TEXTSPLIT function is available for Microsoft 365 only.
Hope this was helpful, I'll try to post more of this kind of interesting stuff.
P.S. Cover picture generated by AI (again).
Geological Engineer/Exploration geologist
1 个月I have found this article very helpful. Thanks for taking the time to put it out.
Consultant Geotechnical Engineer (Tailings) (AAusIMM)
9 个月Nice post, Beibarys! Data validation is a must!