
Crazy, But Reinforcement Learning is No Different than Building a Startup

Pankaj Mishra, PhD

Founder & CTO - Future Therapeutics | Building proprietary AI infrastructure to find new cures | Co-founder - Neovarsity

Published: February 12, 2024

I work with reinforcement learning (RL), and lately I've spent enough time on it that I can't help drawing parallels between RL and building and leading startups.

And as crazy as it may sound, RL isn't all that different from running a startup, especially from the perspective of founders.

For context, I am currently Founder and CTO at Future Therapeutics.

It's a pharmatech company based in Berlin, focused on developing and utilizing state-of-the-art proprietary AI infrastructure to discover new cures and treatments for life-threatening diseases.

Additionally, I co-founded Neovarsity, a Berlin-based venture aimed at educating individuals in data-driven drug discovery and other allied deep tech domains. Its goal is to address the talent shortage in these fields.

Before these ventures, I also founded two other companies: Uresearcher (2021-2023), an advanced STEM education venture, and Medgenera (2016-2018), which marked my first foray into entrepreneurship.

Medgenera was a leading digital healthcare media platform. While it provided tremendous learning experiences, it ultimately failed.

The sum of these experiences has prompted me to draw this seemingly odd yet fitting analogy between RL and startup entrepreneurship.

So here it goes :) But first, a brief overview of RL:

RL is a type of machine learning where an agent learns to make decisions by interacting with an environment.

A notable example of reinforcement learning is AlphaGo from DeepMind.

AlphaGo is a computer program that uses deep reinforcement learning to play the board game Go. It was first trained on games played by human experts and then improved its strategies over time by playing against itself.

AlphaGo was the first computer program to defeat a professional human Go player in an even game on a full-sized 19×19 board, showcasing the prowess of RL in tackling complex decision-making tasks.

If you’ve been following developments in AI, you've likely heard of this achievement.

Reinforcement Learning 101. Image Credit: Shweta Bhatt

When discussing RL, it's crucial to have an understanding of its key components. Let's illustrate these components using the example of AlphaGo for better clarity:

Agent: AlphaGo, the computer program designed to play the game of Go, acts as the learner or decision-maker that interacts with the virtual Go board.
Environment: The virtual Go board serves as the external system with which AlphaGo interacts and learns from.
Action: In AlphaGo, actions represent the moves made by AlphaGo on the Go board, such as placing a stone in a specific position.
State: The state in AlphaGo corresponds to the current arrangement of stones on the Go board, providing information for AlphaGo to make decisions about its next move.
Reward: Feedback from the game environment in AlphaGo comes in the form of winning or losing the game, where AlphaGo seeks to maximize its wins and minimize its losses.
Risk: In AlphaGo, the risk may refer to uncertainty about the outcome of a move and the potential negative consequences of making suboptimal moves.
Policy: AlphaGo's policy refers to the strategy or set of rules it uses to make decisions about which moves to make on the Go board, aiming to maximize its chances of winning the game.
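
To make these components concrete, here is a minimal Python sketch of the loop in which an agent and an environment interact. It is a toy illustration and has nothing to do with how AlphaGo is actually built: the `CoinGuessEnv` class, the `always_heads_policy` function, and the +1/-1 reward values are all made up for this example.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess a biased coin flip each step."""

    def __init__(self, p_heads=0.7, episode_length=10, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.t = 0
        return self.t  # state: the current step index

    def step(self, action):
        """Apply an action (0 = tails, 1 = heads); return state, reward, done."""
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1 if action == outcome else -1  # reward: feedback signal
        self.t += 1
        done = self.t >= self.episode_length
        return self.t, reward, done


def always_heads_policy(state):
    """Policy: a rule mapping a state to an action (here, always guess heads)."""
    return 1


def run_episode(env, policy):
    """Agent-environment loop: observe state, act, receive reward, repeat."""
    state = env.reset()
    total_reward, done = 0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


print(run_episode(CoinGuessEnv(), always_heads_policy))
```

A learning agent would go one step further and change its policy based on the rewards it collects, rather than following a fixed rule as this one does.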

This brief overview should set the stage for the rest of this article. For further information on reinforcement learning (RL), feel free to explore additional resources.

Now, let's hear this founder's perspective on the analogy between RL and a startup:

1. Exploration vs. Exploitation:

In RL, exploration and exploitation are two fundamental concepts.

Exploration involves trying out different actions to discover new information about the environment or to find better strategies for maximizing rewards.

Exploitation, on the other hand, involves taking advantage of known information or strategies to maximize immediate rewards.


In RL, balancing the exploration and exploitation trade-offs is crucial to achieving optimal performance over time.

This is similar to our dilemma of making everyday choices: should I stick to my favorite restaurant, or venture out and try a new one today?
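
The restaurant dilemma can be sketched in a few lines of Python using epsilon-greedy, one simple and widely used way to balance the two. The function name `epsilon_greedy_dinner`, the restaurant quality scores, and the noise level are assumptions made up for this illustration.

```python
import random


def epsilon_greedy_dinner(true_quality, epsilon=0.1, nights=1000, seed=0):
    """Pick a restaurant each night: explore with probability epsilon,
    otherwise exploit the one with the best estimated quality so far."""
    rng = random.Random(seed)
    n = len(true_quality)
    visits = [0] * n
    estimates = [0.0] * n  # running average of the enjoyment at each place
    total = 0.0
    for _ in range(nights):
        if rng.random() < epsilon:
            choice = rng.randrange(n)  # explore: try a random restaurant
        else:
            choice = max(range(n), key=lambda i: estimates[i])  # exploit
        # Enjoyment is noisy: the true quality plus some random variation.
        reward = rng.gauss(true_quality[choice], 1.0)
        visits[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / visits[choice]
        total += reward
    return visits, estimates, total


# Three hypothetical restaurants; the diner does not know these scores.
visits, estimates, total = epsilon_greedy_dinner([3.0, 3.5, 4.5])
print(visits)
```

With `epsilon=0` the diner never deliberately tries anything new, and with `epsilon=1` every choice is random. Values in between trade some enjoyment tonight for better information about where to eat tomorrow.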

Now, when it comes to startups, isn't that precisely what startups are all about?

Isn't it what startup founders do every day, constantly balancing between trying out new strategies (exploration) and leveraging known successful strategies (exploitation)?

2. Risk and Reward:

In RL, risk and reward refer to the trade-off between taking actions that may have uncertain outcomes (risk) and the potential benefits or gains (reward) associated with those actions.

Rewards provide feedback to the learning agent, signaling the effectiveness of chosen actions. Actions with higher risk may result in undesirable outcomes or lower rewards.

In RL, the primary objective is typically to maximize cumulative reward. But balancing risk and reward is essential for an RL agent to make good decisions in dynamic and uncertain environments.

Agents must learn to navigate this trade-off effectively to achieve their goals while minimizing potential negative outcomes.
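
One simple way to express this trade-off in code is to score an option by its average reward minus a penalty for how much that reward varies. This is a sketch of one possible risk-sensitive criterion, not a description of what standard RL algorithms do by default. The function `risk_adjusted_score`, the `risk_aversion` parameter, and the two reward lists are assumptions chosen for illustration.

```python
import statistics


def risk_adjusted_score(rewards, risk_aversion=1.0):
    """Mean reward minus a penalty proportional to its standard deviation."""
    return statistics.mean(rewards) - risk_aversion * statistics.pstdev(rewards)


# Hypothetical outcomes of two strategies over four attempts.
safe_bet = [1.0, 1.0, 1.0, 1.0]      # steady, modest payoff
moonshot = [-3.0, -3.0, -3.0, 15.0]  # mostly losses, one big win

for risk_aversion in (0.0, 1.0):
    safe = risk_adjusted_score(safe_bet, risk_aversion)
    bold = risk_adjusted_score(moonshot, risk_aversion)
    print(risk_aversion, "prefers", "moonshot" if bold > safe else "safe bet")
```

A risk-neutral scorer (`risk_aversion=0.0`) looks only at the average and favors the moonshot, while a risk-averse one favors the steady option, even though the rewards are unchanged.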

Isn't it the daily task of a founder to balance the trade-off between risk and reward?

3. Learning and Adaptation:

RL algorithms continually learn and adapt by processing feedback from the environment, iteratively adjusting their actions to optimize performance in dynamic and uncertain conditions.

Doesn't it resonate with the journey of every startup founder?

Just as founders respond to changes, refine strategies, and seek ways to enhance success, RL algorithms iterate and learn, navigating ever-changing environments to achieve their objectives.
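
A minimal sketch of this kind of adaptation is the incremental update many RL methods share: nudge the current estimate toward each new piece of feedback. The function `track_estimate`, the step size, and the reward sequence below are assumptions made up for this example.

```python
def track_estimate(rewards, step_size=0.1, initial=0.0):
    """Move the estimate a fraction of the way toward each observed reward."""
    estimate = initial
    history = []
    for r in rewards:
        estimate += step_size * (r - estimate)  # new = old + step * error
        history.append(estimate)
    return history


# Hypothetical environment that changes halfway: the payoff jumps from 1 to 5.
rewards = [1.0] * 50 + [5.0] * 50
history = track_estimate(rewards)
print(round(history[49], 2), round(history[-1], 2))
```

Because the step size is constant, recent feedback always carries weight, so the estimate settles near the old payoff and then moves to the new one after the change. This is much like a founder updating assumptions when the market shifts.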

4. Long-term Vision vs. Short-term Gain:

While RL algorithms prioritize long-term rewards, they must also consider short-term gains to sustain exploration and learning.

Similarly, startups face the challenge of balancing their long-term vision with the imperative of achieving short-term revenue and growth.

For founders, this means striking a delicate balance between pursuing their overarching vision for the company and meeting immediate milestones.
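
In RL, this balance is commonly controlled by a discount factor, usually written gamma, which sets how much a future reward counts relative to an immediate one. The sketch below computes the discounted return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for two reward sequences. The sequences and the names `quick_win` and `slow_build` are assumptions invented for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical payoffs over five periods.
quick_win = [5, 0, 0, 0, 0]    # revenue now, nothing later
slow_build = [0, 0, 0, 0, 10]  # nothing now, a bigger payoff later

for gamma in (0.5, 0.95):
    better = max(("quick_win", quick_win), ("slow_build", slow_build),
                 key=lambda item: discounted_return(item[1], gamma))
    print(gamma, "favors", better[0])
```

A low gamma makes the agent short-sighted and favors the quick win, while a gamma close to 1 makes the larger, later payoff worth waiting for.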

Concluding Remarks

While the analogy between RL and running a startup may seem unconventional at first glance, the parallels between the two are indeed remarkable. At least I see it that way!

In my opinion, both require a combination of exploration and exploitation, a willingness to take risks, continuous learning and adaptation, and a delicate balance between long-term vision and short-term gains.

What are your thoughts on this comparison?
