Is it time to get your legacy data in order?
Elsevier for Engineering R&D
If your company is like many R&D-driven organizations, you may be sitting on reams of research data stashed in countless silos and formats — and laced with ever-evolving jargon. So where do you begin when you want to set your data free?
That was the challenge facing Johnson Matthey, a global company that started as a gold assayer for the Bank of England in 1817. (Check out our recent webinar featuring Johnson Matthey.) Now focused on sustainable technologies that are catalyzing the net zero transition, “JM” has just achieved a milestone in safeguarding its intellectual property and making it accessible to researchers and algorithms to trigger future innovation.
Using Elsevier’s SciBite, JM is taking control of its “unstructured data problem” with data science and AI technologies unlocking — and interconnecting — all this knowledge.
JM has been innovating for over 200 years. Today, it ranks as a global leader in sustainable technologies that help some of the world’s leading energy, chemicals and automotive companies decarbonize, reduce emissions and achieve their sustainability goals.
“Our expertise in precious metals still underpins our technologies: it’s about making every milligram count,” explained Principal Information Analyst Ed Wright. “And with every application being quite niche, these all require their own development and innovation. And we’re still very tightly linked to this idea of questioning how we can make the most of everything. How can we adjust it, tweak it and keep moving it forward?”
Given that culture, employees were very open to trying something new if it meant streamlining their research.
“Chemists have been filling up notebooks — both paper and digital versions — for decades and decades,” noted Dr Nathan Barrow, R&D Digital Transformation Lead. “And since this was highly valuable intellectual property, they were sent off-site to a locked container. This, of course, makes it very difficult to actually go back and find the information that you’re looking for. And when your colleagues retire, it’s almost impossible to actually go back and find the important information they captured so diligently 20 years ago — but that’s still relevant for today.
“To avoid replicating all of that clever work that happened before, my job is to digitalize the chemistry and science that happens at JM. I am bringing in new tools and software so that the data is all captured not by the chemists but automatically by the instruments collecting the data. Then, the chemists can add extra information in terms of context and whether they thought the experiment worked or not.”
But while researchers switched to this new electronic lab notebook, there were still two obsolete electronic lab notebooks with legacy information on them. “These databases represent over 16 years and countless millions worth of research,” Nathan said. “So we needed a way for people to search and find the documents they needed — while going beyond a simple search on title or possibly abstract. Other solutions just didn’t do the job and lacked semantic search capabilities.”
Meanwhile, the problem went beyond just two obsolete electronic lab notebook systems. “When it comes down to it, JM has got a huge wealth of knowledge stored in a few individuals,” said Ed. “And the way you work is you go and have a chat with that person. And then they'll point you in the right direction — perhaps towards a certain report in a certain filing cabinet.
“But when the (COVID) lockdown happened, either you could no longer get to that person, or they could not get into their filing cabinet. Suddenly, there was this realization: ‘Hold on, we can't work this way anymore. And, actually, these people are also moving steadily toward retirement. What happens then?’”
Happily, it soon became apparent that SciBite had the solution they needed. “I was actually convinced when I first saw SciBite’s demo video,” said Data Science Strategy Lead Owen Jones. “It was easy and straightforward to use. And I must say, since we have 1,300 researchers, it was appealing that we could license it for the whole department and not by user.”
“And with the central R&D problem around replacing old electronic lab notebook systems, we quickly realized SciBite could also solve our problems,” he said. “Now it’s been rolled out for six months with over 300,000 documents. People are using it and finding stuff they couldn’t find before. And we want to keep adding new data sources.”
Owen is also eyeing the “orange notebooks” of lore — those notebooks that documented all the experiments from the pre-digital age. “As you can imagine, these handwritten notebooks can be a mess, but we’ve already done some extraction experiments, and I am hopeful we can get there.”
The project’s biggest challenge was, and remains, building the ontologies — the actual codification of all of JM’s facts. And while the SciBite team helped lay the groundwork for this process, it became a largely in-house effort.
“It’s really part of my larger job as Principal Information Analyst,” Ed noted. “I work in a team that essentially provides intelligence for the company. My role is to see how we can use all of the new data becoming available through government and other open-source resources. I also see how we can use digital tools to better work with more conventional sources such as patents.”
In this case, the intelligence gathering is happening inside the company. “And to move forward, you need to do the standardization; that’s where you get into the ontologies, for which SciBite’s CENtree Ontology Manager is very useful,” Ed explained. “This in turn moves you into the world of knowledge graphs, where you can connect equivalent concepts across different data sources.”
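To make the idea of connecting equivalent concepts concrete, here is a minimal sketch in Python. It is not SciBite's or CENtree's API. The concept IDs, synonyms and document names are all hypothetical, invented purely for illustration. It assumes an ontology can be reduced to a preferred label plus synonyms per concept, and it shows how documents from different sources end up linked once their differing terms resolve to the same concept ID.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology: concept ID -> preferred label and synonyms.
# These IDs and terms are illustrative assumptions, not JM's real ontology.
ONTOLOGY = {
    "CHEM:0001": {"label": "platinum", "synonyms": ["Pt", "platina"]},
    "CHEM:0002": {"label": "palladium", "synonyms": ["Pd"]},
}


def build_lookup(ontology):
    """Map every label and synonym (lowercased) to its concept ID."""
    lookup = {}
    for concept_id, entry in ontology.items():
        for term in [entry["label"], *entry["synonyms"]]:
            lookup[term.lower()] = concept_id
    return lookup


def tag_document(text, lookup):
    """Return the set of concept IDs whose terms appear as words in the text."""
    words = re.findall(r"[A-Za-z]+", text)
    return {lookup[w.lower()] for w in words if w.lower() in lookup}


def link_documents(docs, lookup):
    """Build a concept ID -> document IDs index across all sources."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept_id in tag_document(text, lookup):
            index[concept_id].add(doc_id)
    return index


# Hypothetical documents from three different sources.
docs = {
    "eln_old/42": "Pt catalyst on alumina support",
    "report/7": "Platinum loading reduced by 10%",
    "eln_new/3": "Pd membrane test",
}

index = link_documents(docs, build_lookup(ONTOLOGY))
print(sorted(index["CHEM:0001"]))  # ['eln_old/42', 'report/7']
```

The notebook entry that says “Pt” and the report that says “Platinum” share no search keyword, yet both land under the same concept. A production system would also need to handle multi-word terms, ambiguity and context, which this simple word match ignores.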
But how do you avoid the relative drudgery of embedding this metadata (the data that organizes your data) after the fact in the future?
“People who are writing reports today need to think more about how someone reads and uses their report in the future,” said Owen. “How are these readers going to find it? How are they going to reuse your data and your knowledge?”
“That’s the tricky part,” said Nathan. “In our new system, we are asking our scientists to add more metadata and context to their experiments. Once we’ve got a critical mass of information in the system, we can start using that structured data and layering it with AI. And then their lives are going to be a lot faster and easier.”
To learn more, read the full article “Worth its weight in gold: getting your legacy data in order” on Elsevier Connect.
Watch our recent webinar for a deeper dive into how Johnson Matthey is leveraging data, technology and tooling to pursue sustainable technologies using SciBite.
#Elsevier #Data #LegacyData #Research