Data, data everywhere… far too much to think

The volume of scientific data published each year has exploded. Wisdom is out there, but how can you find it? It is getting harder and harder even to establish whether your own current research question has already been addressed. One frequent reason journals reject work is that reviewers feel the data duplicate work that has already been published. For this reason alone, it is important to keep up with published works in your field and use a structured search strategy to make sure you don't overlook similar publications.

It seems obvious to assume that every scientist is proficient at staying on top of the literature. However, not everyone works at the cutting edge, and from time to time we are all asked to look outside our field of specialisation. We also tend to be creatures of habit, employing techniques we learned as young scientists, whereas the field of literature management, retrieval and searching keeps changing. Who knows where innovations like ChatGPT and other AIs will take us?

The scale of the literature itself is something the sane mind finds difficult to comprehend:

  • Over 50 million journal articles are already published online
  • Another 2.5 million scholarly articles are added each year
  • At that rate, a scientific paper is added to the literature roughly every 13 seconds (and let’s not mention preprints)

If you find you’re spending more and more time and resources searching for relevant content, you’re not alone. A friend recently asked me about the best way to conduct a structured literature search (obviously I pointed her to our Insider’s Insight on literature searching). And that is the nub. Whether you work in business or academia, you need fast, cost-effective access to scientific content to succeed in your research efforts. But what you do and how you do it is possibly more important than the articles you identify.

Get your search strategy right and it will help you avoid duplicating work, provide a clear search pathway that can be reworked when possible gaps or omissions are identified, and allow you to report on your methodology. With the increasing repurposing of existing drugs for new targets, we are seeing greater use of reported trial data in regulatory documents, which raises the requirement for a structured rationale for how the data you include were sourced.

Equally, there are a variety of search engines and techniques which, when used smartly alongside research technology to access and filter your results, will dramatically simplify your scientific literature search. Bear in mind that databases tend to have their own sources of data and their own processes for deciding which journal articles to index. Databases and software tools, unlike books or articles, also change constantly, as do their metadata descriptions. This dynamism stands in contrast to the literature itself, where standards and best practices exist to ensure the stability of article content and links, and it is one more reason to take care over the reproducibility of your search findings.

As I said to my friend, the best way to ensure a successful search session is to write a protocol. The utility of your output depends on the reproducibility of your search strategy, so it is important to minimise any potential subjectivity. Managing a multi-factorial process benefits from a system that clearly defines as many of the variables as possible. This is best addressed in the form of an objective protocol (you can get our search methodology template from the ‘Resources’ page on the Niche website) that outlines the brief, your proposed search strategy and the criteria for review. Record the precise search terms, details of any filters and the search engine(s) used, so that the search is reproducible.
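
One way to keep such a protocol machine-readable is sketched below. This is only an illustration, not the Niche template: the `SearchProtocol` class, its field names and the example values are all assumptions made for the sketch.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class SearchProtocol:
    """Hypothetical record of one planned search; field names are illustrative."""
    brief: str
    databases: List[str]
    search_terms: List[str]
    filters: List[str] = field(default_factory=list)
    inclusion_criteria: List[str] = field(default_factory=list)
    exclusion_criteria: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys keep saved copies stable, so two versions can be compared.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; substitute the details of your own search.
protocol = SearchProtocol(
    brief="Example brief: repurposing of an existing drug for a new target",
    databases=["Database A", "Database B"],
    search_terms=['"drug repurposing" AND "target X"'],
    filters=["English language", "published 2015 onwards"],
    inclusion_criteria=["reports clinical trial data"],
    exclusion_criteria=["conference abstract only"],
)

print(protocol.to_json())
```

Saving the printed JSON alongside your results gives you a dated, repeatable description of exactly what was searched and how.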

Describe how outputs of the literature search are to be captured or stored and what information on each citation you will keep. The protocol should also describe how you will review and score each citation. Manually reviewing the titles and/or abstracts helps ensure that all the results adhere to the search criteria and that all literature pertaining to the topic is collected. Employing stringent methods for selecting studies will limit bias, which in turn improves the reliability and accuracy of your conclusions. You can benefit further by preparing a formal report that details the process of paper selection, giving the number of studies screened, the reasons for exclusion and the final number of articles included.
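
The numbers for such a report can be tallied automatically if each screening decision is recorded as you go. The sketch below assumes a simple convention that is not from any particular tool: each citation is paired with an exclusion reason, or `None` if it was included. The citation IDs and reasons are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Each decision pairs a citation ID with an exclusion reason (None = included).
Decision = Tuple[str, Optional[str]]


def summarise_screening(decisions: List[Decision]) -> Dict[str, object]:
    """Count screened, included and excluded citations, with exclusion reasons."""
    reasons = Counter(reason for _, reason in decisions if reason is not None)
    included = sum(1 for _, reason in decisions if reason is None)
    return {
        "screened": len(decisions),
        "included": included,
        "excluded": sum(reasons.values()),
        "exclusion_reasons": dict(reasons),
    }


# Illustrative decisions only.
decisions = [
    ("cit-001", None),
    ("cit-002", "duplicate"),
    ("cit-003", "wrong population"),
    ("cit-004", None),
    ("cit-005", "duplicate"),
]
summary = summarise_screening(decisions)
print(summary)
```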

Remember to:

  • Define your keywords: Break up the topic being researched into specific components and define keywords for each. Expand the list by writing down synonyms and alternative phrasings for each keyword. Also, use terms that you plan or hope to include in your work; the results may give you some idea of how relevant they are.
  • Create a checklist for defining keywords: What alternative vocabulary is used in discussion of the topic? Are there American and British variants of spelling or vocabulary? Are there any word stems suitable for truncation (e.g., child$ to find child, children or childish)? What common abbreviations, acronyms or formulae are there? Are there any categories to exclude?
  • Interrogate relevant citations: Once you have identified some relevant journal articles, an easy way to find more studies is to look through their reference lists (backward searching, but make sure you document it). Studies referenced in the article may be quite relevant. Also, look at which papers have cited the articles since they were published (forward searching).
  • Record everything: There is no excuse in the electronic age NOT to keep a record of all your searches, the strategy you used and the results they produced. Things were different in the days of Index Medicus. Keeping track will allow you to refine your search strategy logically, enriching the final data set and reducing the amount of manual confirmation you will need to do.
  • Rank: At some point you will need to review individual publications to determine their relevance and scientific ‘value.’ This can be challenging when content you want to review is locked behind paywalls. If an article seems relevant you can always let your peers guide you: keep the simple Altmetric widget (https://www.altmetric.com/solutions/free-tools/) on your desktop and review an article's score to see whether other researchers felt the work made a useful contribution to the science.
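
The keyword and record-keeping steps above can be combined in a few lines of code. This is a minimal sketch under stated assumptions: that your database accepts AND/OR operators and the `$` truncation symbol (the symbol varies between search engines), and that a plain list of dictionaries is an adequate search log. The function names, the example concepts, the engine name and the hit count are all hypothetical.

```python
from datetime import date
from typing import Dict, List


def build_query(concepts: Dict[str, List[str]]) -> str:
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts.values():
        # Quote multi-word phrases so they are searched as a unit.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


def record_search(log: List[dict], engine: str, query: str, hits: int) -> None:
    """Append one search to the running log so it can be repeated later."""
    log.append({
        "date": date.today().isoformat(),
        "engine": engine,
        "query": query,
        "hits": hits,
    })


# Illustrative concepts: British and American spellings plus truncated stems.
concepts = {
    "population": ["child$", "paediatric", "pediatric"],
    "condition": ["asthma", "wheez$"],
}
query = build_query(concepts)
print(query)
# (child$ OR paediatric OR pediatric) AND (asthma OR wheez$)

search_log: List[dict] = []
record_search(search_log, "Database A", query, hits=128)  # placeholder values
```

Keeping each concept as its own list makes it easy to add a synonym later and regenerate the query, while the log preserves the exact string that was run on a given day.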


Considering the huge number of publications you will probably have to sift through and track, manual compilation of references is no longer an option. You can use a reference manager to help manage your search outputs, in some cases allowing you to download and save papers directly to the library on your computer.

Books are becoming less and less relevant, though they can be a useful starting point if you are new to a subject, giving you a general overview of your topic. Similarly, the ‘grey’ literature is becoming less relevant; it consists of information that is not easily searchable through conventional search engines, databases and library catalogues. Grey literature searches can still yield valuable information indicating current hot topics for research. Conference proceedings can provide the latest findings and discussions on the topic you're studying and give you clues about forthcoming papers. Unpublished clinical trials on registries like ClinicalTrials.gov will inform you of trials already undertaken (and possibly their results). Theses, dissertations and working papers can alert you to similar work being undertaken by other researchers. However, the art of interrogating a library has mostly been replaced by desk-based (online) research. One note of caution to those who do wander into the library: you need to be extra careful when citing grey literature, because, unlike most database content, it has not necessarily had a level of peer consideration prior to inclusion.

In today’s fast-paced digital landscape, researchers who can successfully exploit the goldmine of content published online, via the skilful use of search tools and automated retrieval solutions, will enjoy a significant advantage in the race to scientific invention and discovery. Who knows where AI will take us in the future? But, as the late, great and sadly missed Carl Sagan famously put it, “You have to know the past to understand the present.”

Tim Hardman is Managing Director of Niche Science & Technology Ltd., a UK-based CRO, Chairman of the Association of Human Pharmacology in the Pharmaceutical Industry, President of the European Federation for Exploratory Medicines Development and occasional commentator on science, business and drug development.
