Happy January 60!

We owe the Babylonians our counts of sixty (seconds and minutes). Why not days? Today could have been 1/60/2024, had humans decided to follow the sixty-count for days.
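The arithmetic behind "January 60" is easy to check. Here is a minimal sketch, assuming the date in question is Leap Day 2024 (February 29) and using Python's standard `datetime` module; the helper name `day_of_year` is only illustrative.

```python
from datetime import date


def day_of_year(d: date) -> int:
    """Return the ordinal day within the year (January 1 is day 1)."""
    return d.timetuple().tm_yday


# Assumption: "today" is Leap Day 2024.
leap_day = date(2024, 2, 29)

# January has 31 days, so February 29 is day 31 + 29 = 60.
print(f"1/{day_of_year(leap_day)}/{leap_day.year}")  # 1/60/2024
```

In a non-leap year, the same day number would fall on March 1 instead.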

Surprised? Think of the culturally different ways we count time. Different cultures celebrate #NewYear on different dates: Gregorian (used in most parts of the world); Jewish (Rosh Hashanah); Chinese (Xinnian); Gujarati (Diwali); Muslim (Muharram); Julian (Russian); Tibetan (Losar); Mayan (this was an interesting one); and so forth. Is it 2024? Or is it 4722 (in China)? Or 5784 (in Israel)? In Iran, it's 1402 AP (Anno Persico). According to some interpretations of the Mayan calendar, the world ended on December 21, 2012. Thus, our grasp of time, illustrated by the diverse calendars around the world, is merely a human construct. What about space?

Traditional world map (Mercator convention), with Greenland circled.
The surface-corrected upside-down map (it's not a Eurocentric world)

The two images above illustrate the impact of stereotypes. We're so used to thinking that "Greenland is up and New Zealand is down" that we often forget that the world is, in fact, round. The upper figure illustrates a Eurocentric view of the world, which the upside-down map corrects. More importantly, the bottom image also corrects the surface representation of the world. Because the Mercator projection stretches a globe onto a two-dimensional plane, Greenland appears about as large as Africa (top image). It isn't: Africa is roughly fourteen times larger (bottom image). Try to find Greenland. No offense to the people of Greenland or the Kingdom of Denmark.
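The stretching described above can be quantified. Here is a minimal sketch, assuming the standard property of the Mercator projection that areas at latitude φ are inflated by a factor of 1/cos²(φ) relative to the equator. The function name and the 72°N latitude chosen for central Greenland are illustrative assumptions, not figures taken from the images.

```python
import math


def mercator_area_inflation(latitude_deg: float) -> float:
    """Area inflation of a Mercator map at a given latitude, relative to the equator.

    Mercator stretches both east-west and north-south distances by 1/cos(latitude),
    so areas are stretched by the square of that factor.
    """
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Assumed representative latitudes (illustrative only).
places = [("equator (central Africa)", 0.0), ("central Greenland", 72.0)]

for name, latitude in places:
    print(f"{name}: areas drawn x{mercator_area_inflation(latitude):.1f}")
```

Under these assumptions, land near 72°N is drawn at roughly ten times its true relative area, while land near the equator is not inflated at all, which is why Greenland and Africa look comparable on the top map.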

These variations in how we navigate time and space serve as a metaphor for the broader concept that our understanding of the world relies on agreed-upon conventions. Just as every culture has its own New Year, data science, too, requires a set of conventions to navigate the vast seas of information effectively.

The construction of knowledge, akin to navigating through time and space, is predicated on consistently developing and using such conventions. Geography and map-making have long been subject to such agreements, where representations of our world vary dramatically based on the map's purpose and the perspective of its creators. This analogy extends into the realm of informatics and data science, where, without a shared language or agreed standards, progress is stifled. Our understanding of the world, and indeed the universe, is framed by the conventions we adopt, underscoring the importance of establishing clear, universal standards in data science akin to the Mercator projection in geography.

These conventions influence our perception of the physical world and our understanding of scientific concepts, such as the distinction between correlation and causation (see below). The challenge in biological research, as highlighted by Lazebnik, lies in the absence of a quantitative language, leading to a paradox where an increase in facts may actually diminish our understanding. John Ioannidis went further, arguing that most published research findings are false.

"Sir, there is a concern in West Germany over the falling birth rate. The accompanying graph might suggest a solution that every child knows makes sense." Letter by H. Sies, published in

This paradox is further exacerbated by the prevalence of confirmation bias in scientific research, where the pursuit of knowledge is 99% skewed towards positive results, obscuring the full spectrum of truth. We need negative data and negative results, particularly for machine learning models of target-disease associations.

The reliance on popular tools such as Google and ChatGPT further complicates this landscape. As Google, ChatGPT, and similar tools become gatekeepers of information, certain facts may be prioritized over others based on algorithms rather than truth. This dominance of digital tools in shaping our access to knowledge calls for a critical evaluation of the sources we rely on, and for direct consultation of original research where possible (a piece of advice I first learned from reading Mircea Eliade). Most textbooks favor answers over questions, and critical thinking is rarely encouraged. Yet, "When little is known, don’t expect knowledge to accumulate quickly" (cf. Lenat & Feigenbaum, Artif. Intell. 1991, 1173-1182).

The distinction between data, information, and knowledge is crucial in understanding the limitations and potential of data science. Consider the following scenario:

  • You have the phone number and address of every living person. This does not tell you much about them: although you have lots of data, there is little information.
  • Having their phone numbers & addresses, plus their complete genomic profiles, may inform you about disease vs. health patterns. Now you may have a lot of information, but still little knowledge.
  • Having their phone numbers, addresses & sequenced genomes, plus complete medical records & detailed psychological profiles, may help us know them.

Knowledge comes in many forms. Possessing vast amounts of data does not inherently provide us with meaningful insights unless we can transform this data into actionable knowledge. The path to knowledge comes from acting on data & information to build scientific hypotheses (perhaps via machine learning models) and then validating (or falsifying) them.

However, data & information & knowledge have a limited shelf life. Truth has an expiration date. In 1948, two adrenergic receptors were known; now, there are nine. The journey from data to information and knowledge is not static: Truth evolves over time as our understanding deepens.

Philosophy and science grapple with fundamental everyday concepts such as truth and disease, confronting the limitations of our binary view of the world (true/false). In a world where truths are relative and constantly evolving, the task of distinguishing between health and disease becomes a reflection of our broader struggle with understanding the universe. The challenges inherent in defining disease and health underscore the complexity of translating data into knowledge. There is danger ahead, given the potential for misunderstanding and oversimplification.

Informatics and data science stand as foundational building blocks in this quest. As artificial intelligence efforts grow (they are embiggened), they must rely on a structured pathway from data to information and knowledge. We, the community, must strive to offer true (reliable) data, proper vocabularies, and harmonized ontologies, and to avoid misinformation. This process is not merely academic but has real-world implications for improving human health and understanding diseases at a molecular level.

Will this, ultimately, lead to wisdom? Only time will tell.

Happy Leap Day. And happy Rare Disease Day.

And if you're Romanian, happy Mărțișor.


Carsten Kettner

Head of Funding and Conferences Department at Beilstein-Institut

11 months ago

I really like the distinction between data, information and knowledge, as these three terms are often used synonymously but they are far from being synonyms. These three terms also gain more impact in times of AI, and vice versa, AI is now challenged to prove how intelligent it is.

Graham Timmins

Quantum biologist

11 months ago

Tudor, I commend your many efforts to create large data sets with the best possible curation. Intelligent curation seems to be a limiting step in the development of these kinds of things.

Ivan Cornella-Taracido

CSO and Co-Founder of Covant Therapeutics

1 year ago

Thank you, Tudor Oprea, MD PhD, for sharing your wisdom and humor. An enjoyable and thought-provoking read for the 6th or 7th day of the Babylonian week…

Ioana Ungureanu

Senior Research Investigator at Givaudan

1 year ago

Happy Martisor!

Mark Haynes

Drug Discovery Immunologist, University of New Mexico Department of Pathology

1 year ago

I don't know why, but your happy January thoughts made me think of Cloud Atlas by David Mitchell and one of the quotes I can remember: "Truth is singular. Its 'versions' are mis-truths"

