Happy January 60!
We owe the Babylonians our counts of sixty (seconds and minutes). Why not days? Today could have been 1/60/2024, had humans decided to follow the sixty-count for days as well.
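For the curious, the arithmetic behind "January 60" is easy to check. The snippet below is a minimal sketch using Python's standard `datetime` module; the helper names `day_of_year` and `as_january_count` are made up for illustration and simply keep counting days without ever rolling over to February.

```python
from datetime import date


def day_of_year(d: date) -> int:
    """Return the ordinal day within the year (1 for January 1)."""
    return d.timetuple().tm_yday


def as_january_count(d: date) -> str:
    """Format a date as if January never ended: 1/<day of year>/<year>.

    Hypothetical notation, used only to illustrate the "January 60" joke.
    """
    return f"1/{day_of_year(d)}/{d.year}"


leap_day = date(2024, 2, 29)
print(day_of_year(leap_day))       # 31 days of January + 29 of February = 60
print(as_january_count(leap_day))  # 1/60/2024
```

In a non-leap year the same count of 60 would land on March 1, which is one more reminder that the labels are conventions.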
Surprised? Think of the culturally different ways we count time. Different cultures celebrate #NewYear on different dates: Gregorian (used in most parts of the world); Jewish (Rosh Hashana); Chinese (Xinnian); Gujarati (Diwali); Muslim (Muharram); Julian (Russian); Tibetan (Losar); Mayan (this was an interesting one); and so forth. Is it 2024? Or is it 4722 (in China)? Or 5784 (in Israel)? In Iran, it's 1402 AP (Anno Persico), or 2582 by the former imperial reckoning. According to some interpretations of the Mayan calendar, the world ended on December 21, 2012. Thus, our grasp of time, illustrated by the diverse calendars around the world, is merely a human construct. What about space?
The two images above illustrate the impact of stereotypes. We're so used to thinking that "Greenland is up and New Zealand is down" that we often forget that the world is, in fact, round. The upper figure illustrates a Eurocentric view of the world, corrected in the upside-down map. More importantly, the bottom image also corrects the surface representation of the world. Because a globe has to be stretched to fit a two-dimensional plane, Greenland appears about as large as Africa (top image). It isn't (bottom image): Africa is roughly 14 times larger. Try to find Greenland. No offense to the people of Greenland or the Kingdom of Denmark.
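How much stretching are we talking about? Assuming the top image is a Mercator-style map (an assumption on my part, since the projection is not named on the image), the local area inflation at a given latitude is the square of the secant of that latitude. The sketch below applies that factor to rounded area figures; the "representative latitudes" are illustrative guesses, not computed centroids, so treat the output as an order-of-magnitude check only.

```python
import math


def mercator_area_scale(latitude_deg: float) -> float:
    """Local area inflation of the Mercator projection: sec^2(latitude)."""
    return 1.0 / math.cos(math.radians(latitude_deg)) ** 2


# Approximate true areas in million km^2 (rounded reference values).
TRUE_AREA = {"Greenland": 2.17, "Africa": 30.37}

# Assumed representative latitudes in degrees (illustrative only).
REPRESENTATIVE_LAT = {"Greenland": 72.0, "Africa": 0.0}

for name, area in TRUE_AREA.items():
    scale = mercator_area_scale(REPRESENTATIVE_LAT[name])
    print(f"{name}: true area {area:.2f}, inflation x{scale:.1f}, "
          f"apparent area {area * scale:.1f} (million km^2)")
```

A single latitude per landmass is a crude shortcut: Greenland spans more than twenty degrees of latitude and its northern parts are inflated far more than its southern tip. Still, the factor alone explains why a modest island can rival a continent on paper.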
These differences in how we navigate time and space serve as a metaphor for a broader idea: our understanding of the world relies on agreed-upon conventions. Just as every culture has its own New Year, data science, too, requires a set of conventions to navigate the vast seas of information effectively.
The construction of knowledge, akin to navigating through time and space, is predicated on consistently developing and using such conventions. Geography and map-making have long been subject to such agreements, where representations of our world vary dramatically based on the map's purpose and the perspective of its creators. This analogy extends into the realm of informatics and data science, where without a shared language or agreed standards, progress is stifled. Our understanding of the world, and indeed the universe, is framed by the conventions we adopt, underscoring the importance of establishing clear, universal standards in data science akin to the Mercator projection in geography.
These conventions influence our perception of the physical world and our understanding of scientific concepts, such as the distinction between correlation and causation (see below). The challenge in biological research, as highlighted by Lazebnik, lies in the absence of a quantitative language, leading to a paradox where an increase in facts may actually diminish our understanding. John Ioannidis went further, arguing that most published research findings are false.
This paradox is further exacerbated by the prevalence of confirmation bias in scientific research, where the pursuit of knowledge is 99% skewed towards positive results, obscuring the full spectrum of truth. We need negative data and negative results, particularly for machine learning models of target-disease associations.
The reliance on popular tools like Google and ChatGPT further complicates this landscape. As Google, ChatGPT, and similar tools become gatekeepers of information, certain facts may be prioritized over others based on algorithms rather than truth. This dominance of digital tools in shaping our access to knowledge calls for a critical evaluation of the sources we rely on, and for direct consultation of original research where possible (a piece of advice I first learned from reading Mircea Eliade). Most textbooks favor answers over questions, and critical thinking is rarely encouraged. Yet, "When little is known, don’t expect knowledge to accumulate quickly" (cf. Lenat & Feigenbaum, Artif. Intell. 1991, 1173-1182).
The distinction between data, information, and knowledge is crucial in understanding the limitations and potential of data science. Consider the following scenario:
Knowledge comes in many forms. Possessing vast amounts of data does not inherently provide us with meaningful insights unless we can transform this data into actionable knowledge. The path to knowledge comes from acting on data & information to build scientific hypotheses (perhaps via machine learning models) and then validating (or falsifying) them.
However, data & information & knowledge have a limited shelf life. Truth has an expiration date. In 1948, two adrenergic receptors were known; now, there are nine. The journey from data to information and knowledge is not static: Truth evolves over time as our understanding deepens.
Philosophy and science grapple with fundamental daily concepts such as truth and disease, confronting the limitations of our binary view of the world (true/false). In a world where truths are relative and constantly evolving, the task of distinguishing between health and disease becomes a reflection of our broader struggle with understanding the universe. The challenges inherent in defining disease and health underscore the complexity of translating data into knowledge. There is danger ahead, given the potential for misunderstanding and oversimplification.
Informatics and data science stand as foundational building blocks in this quest. As artificial intelligence efforts grow (they are embiggened), they must rely on a structured pathway from data to information and knowledge. We, the community, must strive to offer true (reliable) data, proper vocabularies, and harmonized ontologies, and to avoid misinformation. This process is not merely academic but has real-world implications for improving human health and understanding diseases at a molecular level.
Will this, ultimately, lead to wisdom? Only time will tell.
Happy Leap Day. And happy Rare Disease Day.
And if you're Romanian, happy Mărțișor.
Head of Funding and Conferences Department at Beilstein-Institut
11 months ago: I really like the distinction between data, information and knowledge, as these three terms are often used synonymously but are far from being synonyms. These three terms also gain more impact in times of AI, and conversely, AI is now challenged to prove how intelligent it is.
Quantum biologist
11 months ago: Tudor, I commend your many efforts to create large data sets with the best possible curation. Intelligent curation seems to be a limiting step in the development of these kinds of things.
CSO and Co-Founder of Covant Therapeutics
1 year ago: Thank you, Tudor Oprea, MD PhD, for sharing your wisdom and humor. An enjoyable and thought-provoking read for the 6th or 7th day of the Babylonian week…
Senior Research Investigator at Givaudan
1 year ago: Happy Martisor!
Drug Discovery Immunologist, University of New Mexico Department of Pathology
1 year ago: I don't know why, but your happy January thoughts made me think of Cloud Atlas by David Mitchell and one of the quotes I can remember: "Truth is singular. Its 'versions' are mis-truths"