2024: Are you still using Excel (for Geotech data)?
Bounding boxes for inter-percentile ranges from 5% to 0.01% (seismic data points)

2024: Are you still using Excel (for Geotech data)?

Spoiler alert … for most of you, the answer is probably ‘yes’. A recent poll in Mining Geomechanics showed these results:

Poll in Mining Geomechanics (Dec 2023). Undoubtedly, it would have been nice (and more relevant) to have better participation in a group with over 5000 members, but that is the nature of LinkedIn.

Now, there wouldn’t be a single person reading this article who has not used Excel – a software that has been around for almost 40 years and is used by over half a billion people. It allows for highly complex operations, and limitations on rows or formulas hardly matter anymore. It is usually safe to assume you can share an Excel file with anyone else without raising an eyebrow.

Power BI

Next, Power BI is probably the most overlooked and undervalued tool currently in the market on the commercial side. Deriving its origins in Excel's (loved or hated) pivot tables a decade ago, it became more than a spin-off. It has fascinating capabilities to get data from various sources and process these with simple-to-use rules on import. You can combine results from different files or different sources altogether into a dashboard with sophisticated interactive graphics and refresh all of this when the sources change. The PRO edition (currently $10US/month) allows sharing these dashboards via websites and other collaboration features.

Combining data sources in a dashboard in Power BI (left) and interactive drill-down on the web version (right).
Sustainable. Platform. Cloud. Dashboard.

Let’s look at mining software providers. Despite many New Year messages from their CEOs with their vision and throwing buzzwords around for what is coming in the year ahead, they are mostly asleep at the wheel (I'd set 2024's recent appointees exempt; let's give them a chance). It seems the larger companies can’t keep up with innovation at the level of open-source and cross-industry solutions. Using proprietary software has several downsides. Costs, lock-in to specific data formats, and workflows that are not practical or flexible enough for small, quick operations on data.

/?pa?θ?n/

That leaves us with Python, a high-level, general-purpose programming language that is open source (Python Software Foundation). "Its design philosophy emphasizes code readability with the use of significant indentation" (Wikipedia). It's great to see the Python camp in the poll is very visible. If you’ve been reluctant to start programming so far, here are some useful resources to get started. All of these deserve their own article and documentation and courses can dive deep into the functionality. A simple search will often point to some good resources.

https://www.python.org/ - home of the Python community, providing downloads and documentation.

https://www.anaconda.com/ - the Anaconda distribution has become the go-to solution to manage the various packages that accelerate code development. Python libraries like NumPy, SciPy, Pandas, Matplotlib and so many more allow us to reuse solutions for common workflows like efficiently solving math problems or handling data in arrays (and tables). Matplotlib is a great tool to quickly plot some data, especially in code development, not necessarily replacing all your visualisation needs.

iPython and Jupyter Notebooks are mind-blowingly accessible tools.

Visual Studio Code - a great contribution from Microsoft as a free development tool that integrates well with above.

Why Python?

20 years ago the value of Python programming might have come from the ability to write code to read files and process data in a rather detailed manner. Now, most tasks have been done before and are available through libraries. Objects like arrays in NumPy make complex operations so simple they can often fit into a single statement. Another step is Pandas, with the DataFrame object essentially a table competing with Excel.

The example below determines the bounding box of (x,y,z) data with additional columns of values (it can be a block model, it can be seismic data or results from a simulation in Flac3D or Abaqus).

Bounding Box of np.array() as ((x0,y0,z0),(x1,y1,z1))

Another typical application might be filtering the data inside a bounding box. Not its own, obviously, but rather one determined by other criteria such as an offset from the mine geometry (stl can be processed easily; more about that in a future article). More interesting might be the inter-percentile range, as this might be useful for dealing with extreme values, outliers or even typos where a missing decimal point or comma results in coordinates in millions (still passing other tests as a valid float point number, though):

Filter in inter-percentile range of np.array() of shape (N, 3 or more)

Simple and fast

The practical application of this to a cloud of data points is shown below, using the published data set of Tasmania Mine seismic events. By itself, this is not very clever. However processing some 40,000 lines in seconds allows experimenting with the data for analysis, not just producing a static outcome.

Bounding boxes for inter-percentile ranges from 5% to 0.01% (seismic data points)

Least Squares

To show a few more examples, here is a quick regression in Python in 5 lines of code, using LinAlg in NumPy and Pandas' DataFrame CSV reader:

Validating Python LinAlg regression with Excel

KDTree

Mapping millions of points to their closest counterpart in another dataset sounds like a familiar problem? KDTree in scipy.spatial does it (in seconds):

Validating KDTree by just looking at one sample point (#12)

  • data[points[n][0]] is closest to sample[n]
  • dist[n][m] is distance sample[n] to data[points[m]]

What's next?

You might have to overcome some reluctance to start coding if you're new to this. You can try the above examples to experiment and have some fun.

Play!

In the next article, I will:

  • Show some visualisation tools
  • Work with quick and simple geometry tools for STL
  • Look at numbers that are not numbers (‘123’ instead of 123) and missing data (' ,, ')

geotechTools

As the code becomes more complex there will be an open source repository for access to the examples on GitHub for some common geotech tools.

Have a great 2024!


Stephan ARNDT

Head of Technology & Innovation | Numerical Modelling Expert

1 年
Muhammad Hafeez Osman

Senior Lecturer | Graduate Member IEM | Lifetime Member of CSM | Hobbyist Photographer | Amatur Drone Pilot

1 年

thank you for the encouragement ????

Stephan ARNDT

Head of Technology & Innovation | Numerical Modelling Expert

1 年

Thanks Stephan Arndt. Great suggestions. I have played with python but power BI looks interesting. For me the great leap of AI is to be able to help in developing more of these tools. I wrote an article recently on trying to get AI to write an excel macro. To be able to get basic code frameworks with a click of the button is incredible.

要查看或添加评论,请登录

Stephan ARNDT的更多文章

  • EAGCG Recap: IUCM & Abaqus

    EAGCG Recap: IUCM & Abaqus

    This year's EAGCG workshop was spectacular. There was an extensive line-up of speakers and sponsors, about 145…

  • Slope Stability (Part 3): Pore Pressure

    Slope Stability (Part 3): Pore Pressure

    Why is pore pressure important for slope stability? Part 1 of the Slope Stability articles outlined the importance of…

    6 条评论
  • Slope Stability - Shear Strength Reduction Method (Part 2)

    Slope Stability - Shear Strength Reduction Method (Part 2)

    "It has always been done this way" The five monkeys experiment. I doubt there's reliable information to confirm this…

    4 条评论
  • Slope Stability - Shear Strength Reduction Method (Part 1)

    Slope Stability - Shear Strength Reduction Method (Part 1)

    Ok, up front: There's nothing new here; you'll have to wait for Part 2. There, I intend to present some challenging…

    9 条评论
  • Monkeys on Typewriters

    Monkeys on Typewriters

    AI will take your job - AI will make it easier until it does - AI in Python and C++ using Copilot - AI does not know…

    7 条评论
  • Principal Stress Transformations

    Principal Stress Transformations

    In this article, we're going off on a tangent from the theme of block models, kdTrees and grid data in the GitHub…

    6 条评论
  • Don't stress, use Python

    Don't stress, use Python

    Any discussion about in-situ stress for an underground mine can turn into a heated argument quickly - just from my…

    2 条评论
  • Python gridData in geotechTools

    Python gridData in geotechTools

    Traditionally, there are endless applications for arranging points in a regular grid. In practical terms, this means we…

    2 条评论
  • geotechTools mapData (kdTree)

    geotechTools mapData (kdTree)

    Introducing the Python geotechTools last week (and the repository on GitHub), the example did little more than robustly…

    2 条评论
  • Python geotechTools on GitHub

    Python geotechTools on GitHub

    So, you're still using Excel for geotech data. Here are some examples of using Python and libraries like Numpy to…

    37 条评论

社区洞察

其他会员也浏览了