How I built a compbio project in my free time to land a biotech job

Dean Lee

Figure One Lab: A Gateway Computational Biology Experience | 1931 Code-Enabled Biologists and Counting

Published: September 3, 2024

I have written a lot about the power of industry-relevant computational biology projects to strengthen your biotech/pharma job search. So I figured I might as well give a detailed account of how I built such a project in my spare time to land my first biotech compbio job. It is not a trivial undertaking, but it can be done in parallel to your day job with a degree of discipline.

Pre-Job Search Positioning

In the fall of 2018, I was working full-time in a neuroscience lab at Harvard while studying part-time for a master's in applied statistics at Penn State. I did not get a solid statistics foundation in college, so I wanted to use this master's to shore up that knowledge gap. Plus, I had a tuition reimbursement benefit from my job that helped to pay for this degree.

At that time I was also scrolling through LinkedIn daily to educate myself about jobs I might want. After studying LinkedIn job posts for many months, I noticed that many biotech/pharma employers wanted computational biologists who could analyze single-cell RNA-seq (scRNA-seq) data. So I thought it would help my future job search to produce a computational analysis demonstrating that skill. I aimed to complete that project before I allowed myself to apply for industry jobs.

That semester I took STAT 555: Statistical Analysis of Genomics Data, taught by Dr. Naomi Altman, as part of my master's coursework. STAT 555 required a final project, which was convenient, because I co-opted it as an opportunity to demonstrate my scRNA-seq data analysis ability to biotech/pharma employers.

The Project: Sleep Neurons

For STAT 555's final project I was free to propose any statistical analysis of any omics dataset. So, in true Figure One Lab fashion, I proposed to analyze a published scRNA-seq dataset from a Nature paper about sleep-promoting neurons. My goal for this project was to recapitulate and then extend some of the authors' observations using their data. If I could do that, it would mean I was analyzing data at a level appropriate for a peer-reviewed Nature publication. I wanted to challenge myself to do that.

Some context: I had some background in neuroscience, but had no exposure to neural systems controlling sleep. I was also fairly new to scRNA-seq data analysis. I was not in my comfort zone for this project.

Instead of walking you through the entire project, I will post just its introduction and discussion sections as an overview of how I set up the project and what I found.

Here is the introduction section of my project report:

Chung et al. are interested in a specific population of neurons in the ventrolateral preoptic area (VLPO) and the median preoptic area (MnPO) of the hypothalamus that controls sleep in mice. They found that these neurons are active during sleep and that activating these neurons induces sleep. While the general location of these neurons is known to be the preoptic area (POA) of the hypothalamus, it is not clear which genetic markers distinguish these neurons from other types of neurons in the POA, which also contains wake-active neurons. Knowing these markers is crucial for accurate genetic targeting and circuit analysis of these sleep-inducing neurons. Chung et al. used single-cell RNA-seq on a purified population of sleep-inducing neurons from the POA. They found that Tac1 and Pdyn are two strong markers for this population of neurons. However, they did not analyze their scRNA-seq dataset further to ask whether there might be subtypes of neurons that subdivide this population. Because sleep is a finely regulated state with distinct stages (NREM and REM), it may involve more than a single type of neuron. Therefore, it is important to ask whether there are subtypes of sleep-inducing neurons that might coordinate different stages of sleep, and what the subtype-specific markers are.

Here is the discussion section of my project report:

I have reanalyzed Chung et al.’s scRNA-seq dataset mainly with the use of the Seurat package in R. My analysis showed that these neurons could be subdivided into at least three groups according to their gene expression patterns. The largest group, as previously described by Chung et al., consists of neurons expressing Tac1 and Pdyn. These two markers capture the majority of the sleep-inducing neurons in this study. My further analysis, however, revealed that at least two other groups of neurons, distinct from the Tac1/Pdyn neurons, can be defined from the same dataset. One of these groups is marked by Gpx3/Ngb (Cluster 3), and another group is marked by Chat/Cd44/Slc5a7 (Cluster 4). Furthermore, the Tac1/Pdyn population has a subset of neurons that is marked by Col19a1/C1ql2 (Cluster 1/2). These results suggest that within the population of sleep-inducing neurons in the POA, transcriptionally distinct subpopulations exist. Having the markers for each of these subpopulations paves the way to validation steps in vivo to show whether these genetically distinct cells are also anatomically distinct in the POA and functionally distinct in the fine regulation of sleep. Initial data from the Allen Brain Institute already suggest that these are anatomically distinct populations.

You can see the full project report here, which includes many more technical details of my analysis.
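The reanalysis described above comes down to two steps: grouping cells into clusters by expression pattern, then asking which genes set each cluster apart. The project itself used Seurat in R. Purely as an illustration of the second step, here is a minimal Python sketch that ranks candidate markers by the log2 fold change of mean expression inside versus outside each cluster. The expression matrix, the gene names (GeneA, GeneB, GeneC) and the cluster labels are all hypothetical, and this simple fold-change ranking is a stand-in, not Seurat's FindAllMarkers, which by default uses a Wilcoxon rank-sum test.

```python
"""Toy sketch of cluster-marker ranking (NOT Seurat's FindAllMarkers).

Assumes a hypothetical cells x genes matrix of normalized expression
and cluster labels that were already computed upstream.
"""
import math
from statistics import mean

# Hypothetical data: rows are cells, columns are genes (made-up values).
GENES = ["GeneA", "GeneB", "GeneC"]
EXPR = [
    [5.0, 0.1, 0.0],  # cell 0
    [4.0, 0.2, 0.1],  # cell 1
    [0.1, 3.0, 0.0],  # cell 2
    [0.2, 4.0, 0.2],  # cell 3
    [0.0, 0.1, 6.0],  # cell 4
]
LABELS = [0, 0, 1, 1, 2]  # hypothetical cluster assignment for each cell


def log2_fold_change(in_vals, out_vals, pseudocount=1.0):
    """log2 ratio of mean expression inside vs. outside a cluster.

    The pseudocount avoids dividing by zero for genes absent outside.
    """
    return math.log2((mean(in_vals) + pseudocount) / (mean(out_vals) + pseudocount))


def rank_markers(expr, labels, genes):
    """Return {cluster: [(gene, log2FC), ...]} sorted by descending log2FC."""
    markers = {}
    for cluster in sorted(set(labels)):
        scores = []
        for g, gene in enumerate(genes):
            inside = [row[g] for row, lab in zip(expr, labels) if lab == cluster]
            outside = [row[g] for row, lab in zip(expr, labels) if lab != cluster]
            scores.append((gene, log2_fold_change(inside, outside)))
        markers[cluster] = sorted(scores, key=lambda s: s[1], reverse=True)
    return markers


if __name__ == "__main__":
    for cluster, ranked in rank_markers(EXPR, LABELS, GENES).items():
        top_gene, lfc = ranked[0]
        print(f"cluster {cluster}: top marker {top_gene} (log2FC={lfc:.2f})")
```

On real data the clusters would come from an upstream step such as graph-based clustering, and a fold-change ranking like this is usually paired with a statistical test and a minimum fraction of expressing cells before a gene is called a marker.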


I was pretty happy with how this project turned out. I reproduced the authors' main observations in their scRNA-seq data, and then made new observations about neuronal subtypes that the authors had not mentioned. I validated those observations using the Allen Brain Institute's in situ hybridization data from mouse brain tissue.

The Job Search

When I finally felt ready to start applying for biotech/pharma compbio jobs in 2019 (almost a year after I had completed the project above), I went into my job search armed with this project and a few others I had done by then. At that point I was by no means a conventional applicant. I did not have a PhD. I did not have an extensive history of programming or statistics. I did not work in a lab that focused on computational biology. I had no internal referrals. I was just groping along in the dark and applying for jobs cold.

My project-centric strategy, which took me a full year to execute, worked. Armed with industry-relevant projects, I eventually got several interviews and a job offer. I had convinced one hiring manager that I could do the job, and that was all the opportunity I needed.

The Takeaway

Don't underestimate the power of well-executed projects to open doors in the world of computational biology. All the data and methods you need to put together compelling projects are freely available online. The rest is up to you.

Relevant LinkedIn Posts

Here I present three of my past LinkedIn posts that further explain my thoughts on building good compbio projects inspired by in-demand skills in biotech/pharma.



