Can AI help in palynological taxonomy?

Mike Stephenson

Director at Stephenson Geoscience Consulting

Published: 26 May 2024

Introduction

If you have read my article ‘Taxonomic keys in palynology: use and benefits’, you’ll know that I’ve been interested for many years in structured approaches to taxonomic decisions in determining species. Recently I had the chance to work with an International Union of Geological Sciences (IUGS) Deep-time Digital Earth (DDE) team and its partner Alibaba, who are developing large language models for geology and even for the International Olympic Committee. I wanted to know if the team could develop artificial intelligence methods to augment a traditional taxonomic key. Could a palynologist sitting by their microscope get some help from a Large Language Model (LLM) in determining a species? It turns out that palynology very much lends itself to developing a professional LLM that might be very useful in teaching taxonomy to apprentice palynologists, or to professionals in environments that require expertise in many areas of palynology. It could also be particularly useful in the Global South, where access to reference materials may be difficult.

I have to say at the start that I am far from being an information technologist. My work with the team has been as a humble palynologist providing advice on the ways that taxonomic keys work, and the team are translating this guidance into tools to develop an LLM. It’s worth noting that this LLM-aided taxonomy is text based, and will probably be delivered through a series of structured questions and answers that converge on a determination, rather than being based on image recognition. There are already some quite sophisticated image-based methods of identification in palynology (e.g. Mahmood et al. 2023; Chronosurveys 2024; Barnes et al. 2023). A text-based system could have some big advantages over an image-recognition system.

Taxonomic keys

In palaeontology, an identification key or taxonomic key can be in the form of a printed series of notes and instructions that help you identify a palynomorph (or any fossil), perhaps at genus or species level. Identification keys are also used in many other scientific and technical fields to identify or diagnose diseases, minerals, or archaeological artifacts.

Most keys provide a fixed sequence of identification steps, each with multiple alternatives, the choice of which determines the next step (Fig. 1). When a stage in the key has two alternatives, the stage is dichotomous; when it has more than two, the stage is polytomous.

Fig. 1. A taxonomic key for bisaccate pollen.

At each step, the user has to make a choice about the characters of the fossil being identified. In palynology there is a range of possible starting points, so the person using the key has to know something about the morphology of the range of palynomorphs that might be encountered.
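To make the idea of stages and alternatives concrete, here is a minimal sketch of a key held as a small tree and followed one answer at a time. It is only an illustration under stated assumptions: the `Step` class and `run_key` function are names invented here, the characters are ordinary spore terms chosen for the example, the taxa are placeholders, and none of it reproduces the key in Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Step:
    """One stage of a key: a question plus the alternatives it offers."""
    question: str
    # Each alternative leads either to another Step or to a determination (str).
    choices: Dict[str, Union["Step", str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if len(self.choices) < 2:
            raise ValueError("a key stage needs at least two alternatives")

    @property
    def kind(self) -> str:
        # Two alternatives: dichotomous. More than two: polytomous.
        return "dichotomous" if len(self.choices) == 2 else "polytomous"


def run_key(start: Step, answers: List[str]) -> str:
    """Follow the key from `start`, consuming one answer per stage."""
    node: Union[Step, str] = start
    for answer in answers:
        if isinstance(node, str):
            break  # a determination has already been reached
        if answer not in node.choices:
            raise ValueError(f"{answer!r} is not an alternative for: {node.question}")
        node = node.choices[answer]
    if isinstance(node, Step):
        raise ValueError(f"Key not finished; next question: {node.question}")
    return node


# Hypothetical key: placeholder taxa, illustrative characters only.
amb = Step("Amb shape?", {"circular": "Taxon A", "triangular": "Taxon B"})
key = Step("Laesura?", {"trilete": amb, "monolete": "Taxon C", "alete": "Taxon D"})

print(key.kind, amb.kind)
print(run_key(key, ["trilete", "circular"]))
```

The fixed sequence of a printed key corresponds to the fixed shape of the tree: the user's choice at each stage decides which stage comes next.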

Palynological taxonomy

Discussions with specialists in LLMs have shown that palynology and other newer branches of palaeontology may have some special advantages for the development of LLMs because they can provide ‘learning materials’ that are clearly and consistently structured. The hierarchical structure of descriptions of palynological species also provides signposts for the LLM in developing the order of questions.

So what might an LLM-aided taxonomic key look like? Well, perhaps the system might ask the palynologist a series of questions such as: what is the shape of the palynomorph, and what is its size? In other words, the LLM key would ask a series of relevant, intelligent questions in a particular order, prompting the user to make sensible taxonomic decisions that get closer and closer to a determination. This imitates the traditional key, only a computer is asking the questions. The value of this method over identification using images (e.g. Mahmood et al. 2023) is that the taxonomist is guided through a series of steps that help them understand how taxonomy works. In other words, it is not a ‘black box’.

Imagine that the LLM key was helping you identify a car that you had seen in a car park. It might ask: what’s the badge on the car? What’s the shape of the headlights? How many headlights does the car have? What is the shape of the radiator grille? After a series of relevant, intelligent questions and ‘taxonomic’ decisions you might arrive at an answer at ‘species level’, so the car is perhaps identified as a Tesla Model S. The same might be true of the palynological taxonomy LLM.
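The question-and-answer process can be sketched in a few lines of code. This is not the DDE team's method, just a toy under stated assumptions: the `narrow` function, the record layout and the three ‘cars’ are all invented for illustration, and the attribute values are placeholders rather than real specifications. Each answer removes the candidates that do not match, and the trail of questions and answers is kept so that the route to the determination stays visible.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


def narrow(
    candidates: Dict[str, Record],
    question_order: List[str],
    ask: Callable[[str, List[str]], str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Ask questions in order, keeping only candidates that match each answer.

    Returns the remaining candidate names and the trail of (question, answer)
    pairs, so the user can see how the determination was reached.
    """
    remaining = dict(candidates)
    trail: List[Tuple[str, str]] = []
    for character in question_order:
        if len(remaining) <= 1:
            break  # nothing left to decide
        options = sorted({rec[character] for rec in remaining.values()})
        if len(options) < 2:
            continue  # this character no longer separates the candidates
        answer = ask(character, options)
        trail.append((character, answer))
        remaining = {n: r for n, r in remaining.items() if r[character] == answer}
    return sorted(remaining), trail


# Hypothetical records following the car-park analogy; not real specifications.
cars = {
    "Car A": {"badge": "X", "headlight shape": "round", "grille": "oval"},
    "Car B": {"badge": "X", "headlight shape": "narrow", "grille": "oval"},
    "Car C": {"badge": "Y", "headlight shape": "narrow", "grille": "none"},
}

# Scripted answers stand in for the person answering the questions.
scripted = {"badge": "X", "headlight shape": "narrow"}
result, trail = narrow(
    cars,
    ["badge", "headlight shape", "grille"],
    lambda character, options: scripted[character],
)
print(result)
print(trail)
```

In this toy run the grille question is never asked, because only one candidate is left after the first two answers.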

I mentioned earlier that palynology lends itself to an LLM because the ‘learning materials’ are clearly and consistently structured. What I mean by this is that palynology, being a relatively young branch of palaeontology, is ‘written down’ in a fairly consistent way, i.e. the descriptions and diagnoses of its species are generally consistent in form. This is because most of the descriptions and diagnoses have been written in the last few decades, rather than over a century or more, as is the case with older branches of palaeontology. Nomenclature has mostly remained consistent because descriptive terms mean the same as they have always meant, and measurements are in consistent units. Another advantage is that palynological descriptions and ‘diagnoses’ are remarkably similar in structure and content. Most descriptions follow a particular pattern. For example, for a spore, the description may begin ‘spores, radial, trilete; amb circular; laesurae distinct, with narrow lips’. For a pollen grain (in this case a monosaccate pollen grain), it might begin ‘pollen, monosaccate, radially symmetrical, trilete; amb circular’. The description goes on to become more detailed. But in a way, the description is offering a hierarchy, in the sense that it begins with the big elements (things like shape and symmetry), and then delves deeper. So as ‘learning material’ for the LLM, it has a built-in step-by-step process.
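The built-in order of such descriptions can be shown with a small sketch that splits the example phrases above into ordered pairs. It rests on an assumed convention that will not hold for every published description: semicolons separate clauses, the first clause lists the general attributes, and each later clause begins with the character it describes. The function name `parse_description` and the label ‘general’ are invented for this example.

```python
from typing import List, Tuple


def parse_description(text: str) -> List[Tuple[str, str]]:
    """Split a description into ordered (character, value) pairs.

    Assumed convention: semicolons separate clauses; the first clause lists
    general attributes; each later clause starts with the character it describes.
    """
    clauses = [c.strip() for c in text.strip().rstrip(".").split(";") if c.strip()]
    if not clauses:
        return []
    pairs = [("general", clauses[0])]
    for clause in clauses[1:]:
        # The first word names the character; the rest is its description.
        character, _, value = clause.partition(" ")
        pairs.append((character, value.strip()))
    return pairs


spore = "spores, radial, trilete; amb circular; laesurae distinct, with narrow lips"
for character, value in parse_description(spore):
    print(f"{character}: {value}")
```

The order of the pairs follows the order of the description, from the general attributes down to the finer characters, which is the hierarchy an LLM could use to sequence its questions.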

Looking deeper, you can analyse each species description in terms of ‘keys’ and ‘values’. Keys are character types, and values are the descriptors, the adjectives. Figs 2 and 3 show a set of ‘keys’ and ‘values’ for a species of spore and a species of bisaccate pollen. The keys are signposts for parts of the decision tree, and the values are possible ‘answers’ (as in a drop-down menu). Some keys will have a limited number of ‘values’; other keys will have values that are more variable and descriptive, with a very large number of possible ‘answers’, and perhaps will need to accommodate continuous variation or subtle shades of difference. For these keys it would be very difficult and restrictive to apply a small number of value ‘choices’.

Fig. 2. A set of ‘keys’ and ‘values’ for a species of spore.

Fig. 3. A set of ‘keys’ and ‘values’ for a species of bisaccate pollen.
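One way to picture the difference between keys with a short list of values and keys that need open-ended or continuous values is a small schema check. This is a sketch under assumptions, not the structure used in Figs 2 and 3: the `Character` class, the `check` function, the key names, the list of amb shapes and the numbers are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

Value = Union[str, Tuple[float, float]]


@dataclass(frozen=True)
class Character:
    """A 'key' in the sense used above: a character type."""
    name: str
    allowed: Optional[frozenset] = None  # None means open-ended free text
    numeric_range: bool = False          # True for (min, max) measurements


def check(record: Dict[str, Value], schema: Dict[str, Character]) -> List[str]:
    """Return a list of problems found in one species record."""
    problems: List[str] = []
    for name, value in record.items():
        char = schema.get(name)
        if char is None:
            problems.append(f"unknown key: {name}")
        elif char.numeric_range:
            ok = isinstance(value, tuple) and len(value) == 2 and value[0] <= value[1]
            if not ok:
                problems.append(f"{name}: expected (min, max)")
        elif char.allowed is not None and value not in char.allowed:
            problems.append(f"{name}: {value!r} not in {sorted(char.allowed)}")
    return problems


# Hypothetical schema: one drop-down key, one measurement, one free-text key.
schema = {
    "amb": Character("amb", allowed=frozenset({"circular", "subcircular", "triangular"})),
    "diameter_um": Character("diameter_um", numeric_range=True),
    "exine": Character("exine"),  # too variable for a short list of choices
}

# Placeholder record: the numbers are not real measurements.
record = {"amb": "circular", "diameter_um": (40.0, 60.0), "exine": "thin, finely ornamented"}
print(check(record, schema))
print(check({"amb": "square"}, schema))
```

Keys like `amb` behave like a drop-down menu, while a key like `exine` accepts any descriptive text, which is where an LLM rather than a fixed menu may earn its keep.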

As I mentioned in my previous article, taxonomic keys are not perfect because they depend on the knowledge or judgment of the person who designed them. They also depend on the quality of the descriptions and diagnoses, but once an appropriate description structure (as in Figs 2 and 3) is created, the LLM will do the rest of the work, assuming that it is given plenty of high-quality material to ‘learn’ with. In the cases above, the descriptions are from my PhD thesis; these are not ‘official’ diagnoses or descriptions (i.e. those of the original authors), and they could be flawed. Much more material will be needed (perhaps descriptions of thousands of species), probably from other PhD theses and databases, so permissions and approvals will have to be gained. Even with these shortcomings, taxonomic keys could be used in teaching and learning, helping students to quickly gain a grasp of the basics of taxonomy. They could be particularly useful in the Global South, where even now information is hard to come by (e.g. Nobes and Harris 2019).

Could an LLM-guided taxonomic key be even better than a manual key? Whatever the answer to that question, neither a traditional key nor an LLM key should replace the need for students to use taxonomic literature (for example published diagnoses and descriptions). But a well-designed LLM key would have considerable pedagogic value. In a commercial environment, for example in companies that use stratigraphic palynologists, LLM-guided keys could help to standardise taxonomic procedures, making taxonomy, and therefore correlation and stratigraphy, more reliable.

If you’re interested in getting involved in the project, please contact me through LinkedIn.

Prof Mike Stephenson is available for palynological consultancy and training.

References

Barnes et al. 2023. Deductive automated pollen classification in environmental samples via exploratory deep learning and imaging flow cytometry. New Phytologist, 240: 1305-1326. https://doi.org/10.1111/nph.19186

Chronosurveys 2024. https://www.chronosurveys.com/research/

Mahmood et al. 2023. Artificial intelligence-based classification of pollen grains using attention-guided pollen features aggregation network. Journal of King Saud University - Computer and Information Sciences, 35: 740-756.

Nobes, A., & Harris, S. 2019. Open Access in developing countries – attitudes and experiences of researchers. https://doi.org/10.5281/zenodo.3464868
