How I built a compbio project in my free time to land a biotech job
I have written a lot about the power of industry-relevant computational biology projects to strengthen your biotech/pharma job search. So I figured I might as well give a detailed account of how I built such a project in my spare time to land my first biotech compbio job. It is not a trivial undertaking, but it can be done in parallel with your day job if you have a degree of discipline.
Pre-Job Search Positioning
In the fall of 2018, I was working full-time in a neuroscience lab at Harvard while studying part-time for a master's in applied statistics at Penn State. I did not get a solid statistics foundation in college, so I wanted to use this master's to shore up that knowledge gap. Plus, I had a tuition reimbursement benefit from my job that helped to pay for this degree.
At that time I was also scrolling through LinkedIn daily to educate myself about jobs I might want. After studying LinkedIn job posts for many months, I noticed that many biotech/pharma employers wanted a computational biologist who could analyze single-cell RNA-seq (scRNA-seq) data. So I thought it might be good for my future job search to produce a computational analysis that demonstrated that skill. I aimed to complete that project before I allowed myself to apply for industry jobs.
That semester I took STAT 555: Statistical Analysis of Genomics Data, taught by Dr. Naomi Altman, as part of my master's coursework. STAT 555 required a final project, which was convenient, because I co-opted it as an opportunity to demonstrate my scRNA-seq data analysis ability to biotech/pharma employers.
The Project: Sleep Neurons
For STAT 555's final project I was free to propose any statistical analysis of any omics dataset. So, in true Figure One Lab fashion, I proposed to analyze a published scRNA-seq dataset from a Nature paper about sleep-promoting neurons. My goal for this project was to recapitulate and then extend some of the authors' observations using their data. If I did that, it meant I was analyzing data at a level appropriate for a peer-reviewed Nature publication. I wanted to challenge myself to do that.
Some context: I had some background in neuroscience, but had no exposure to neural systems controlling sleep. I was also fairly new to scRNA-seq data analysis. I was not in my comfort zone for this project.
Instead of walking you through the entire project, I will post just its introduction and discussion sections as an overview of how I set up the project and what I found.
Here is the introduction section of my project report:
Chung et al. are interested in a specific population of neurons in the ventrolateral preoptic area (VLPO) and the median preoptic area (MnPO) of the hypothalamus that controls sleep in mice. They found that these neurons are active during sleep and that activating these neurons induces sleep. While the general location of these neurons is known to be the preoptic area (POA) of the hypothalamus, it is not clear which genetic markers distinguish these neurons from other types of neurons in the POA, which also contains wake-active neurons. Knowing these markers is crucial for accurate genetic targeting and circuit analysis of these sleep-inducing neurons. Chung et al. used single-cell RNA-seq on a purified population of sleep-inducing neurons from the POA. They found that Tac1 and Pdyn are two strong markers for this population of neurons. However, they did not analyze their scRNA-seq dataset further to ask whether there might be subtypes of neurons that subdivide this population. Because sleep is a finely regulated state with distinct stages (NREM and REM), it may involve more than a single type of neuron. Therefore, it is important to ask whether there are subtypes of sleep-inducing neurons that might coordinate different stages of sleep, and what the subtype-specific markers are.
Here is the discussion section of my project report:
I have reanalyzed Chung et al.’s scRNA-seq dataset mainly with the use of the Seurat package in R. My analysis showed that these neurons could be subdivided into at least three groups according to their gene expression patterns. The largest group, as previously described by Chung et al., consists of neurons expressing Tac1 and Pdyn. These two markers capture the majority of the sleep-inducing neurons in this study. My further analysis, however, revealed that at least two other groups of neurons, distinct from the Tac1/Pdyn neurons, can be defined from the same dataset. One of these groups is marked by Gpx3/Ngb (Cluster 3), and another group is marked by Chat/Cd44/Slc5a7 (Cluster 4). Furthermore, the Tac1/Pdyn population has a subset of neurons that is marked by Col19a1/C1ql2 (Cluster 1/2). These results suggest that within the population of sleep-inducing neurons in the POA, transcriptionally distinct subpopulations exist. Having the markers for each of these subpopulations paves the way to validation steps in vivo to show whether these genetically distinct cells are also anatomically distinct in the POA and functionally distinct in the fine regulation of sleep. Initial data from the Allen Brain Institute already suggest that these are anatomically distinct populations.
You can see the full project report here, which includes many more technical details of my analysis.
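For readers new to the idea of a cluster "marker" mentioned in the discussion section, here is a minimal Python sketch of the underlying logic. It is not the Seurat workflow I used and not the statistical test Seurat applies: it simply log-normalizes each cell and picks the gene with the largest mean difference between one cluster and all other cells. The counts, the cluster labels "A"/"B"/"C", and the helper names are all invented for illustration. Only the gene names come from the report.

```python
import math

# Toy counts, invented purely for illustration (NOT values from Chung et al.).
# Rows are cells; columns follow the order of GENES.
GENES = ["Tac1", "Pdyn", "Gpx3", "Chat"]
COUNTS = [
    [30, 25, 0, 1],
    [28, 31, 1, 0],
    [2, 1, 40, 0],
    [1, 0, 35, 2],
    [0, 2, 1, 50],
    [1, 1, 0, 45],
]
# Hypothetical cluster labels; in a real analysis these come from a clustering step.
CLUSTERS = ["A", "A", "B", "B", "C", "C"]


def log_normalize(row, scale=10_000):
    """Scale a cell to a fixed total count, then apply log(1 + x)."""
    total = sum(row)
    return [math.log1p(count / total * scale) for count in row]


def top_marker(cluster, normalized, labels, genes):
    """Return the gene with the largest mean difference (cluster vs. all other cells)."""
    inside = [row for row, label in zip(normalized, labels) if label == cluster]
    outside = [row for row, label in zip(normalized, labels) if label != cluster]
    diffs = []
    for j, gene in enumerate(genes):
        mean_in = sum(row[j] for row in inside) / len(inside)
        mean_out = sum(row[j] for row in outside) / len(outside)
        diffs.append((mean_in - mean_out, gene))
    return max(diffs)[1]


NORMALIZED = [log_normalize(row) for row in COUNTS]
MARKERS = {
    cluster: top_marker(cluster, NORMALIZED, CLUSTERS, GENES)
    for cluster in sorted(set(CLUSTERS))
}
print(MARKERS)
```

A real analysis would use thousands of genes, a proper clustering step, and a statistical test with multiple-testing correction, which is what dedicated packages handle for you.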
I was pretty happy with how this project turned out. I reproduced the authors' main observations of neuronal subtypes in their scRNA-seq data, and then I made new observations the authors had not mentioned. I validated those observations using in situ hybridization data from mouse brain tissue from the Allen Brain Institute.
The Job Search
When I finally felt ready to start applying for biotech/pharma compbio jobs in 2019 (almost a year after I had completed the project above), I went into my job search armed with this project and a few others I had done by then. At that point I was by no means the conventional applicant. I did not have a PhD. I did not have an extensive history of programming or statistics. I did not work in a lab that focused on computational biology. I had no internal referrals. I was just groping along in the dark and applying for jobs cold.
My project-centric strategy, which took me a full year to execute, worked. Armed with industry-relevant projects, I eventually got several interviews and a job offer. I had convinced one hiring manager that I could do the job, and that was all the opportunity I needed.
The Takeaway
Don't underestimate the power of well-executed projects to open doors in the world of computational biology. All the data and methods you need to put together compelling projects are freely available online. The rest is up to you.
Relevant LinkedIn Posts
Here I present three of my past LinkedIn posts that further explain my thoughts on building good compbio projects inspired by in-demand skills in biotech/pharma.
CS + Bio @ Simon Fraser University | Passionate about Systems Modelling for Engineering Biology
2 months ago: This is great, thanks Dean! For an adjacent resource, perhaps for the "next step" like a second project for those interested in getting experience in other analyses, skills, etc., I found this guide providing advice on taking an idea to a(n initial) tangible project plan quite informative: https://doi.org/10.1371/journal.pcbi.1010786 It is written by a professor and veteran of computational biology at the University of Washington.
It’s a great inspiration, thanks for the advice!
BIOCHEMIST
6 months ago: Very informative, thank you.
Operations Data Administrator @ Los Angeles Convention Center | Data Analysis, Asset Management
6 months ago: Brilliant share. Thanks for the motivation.
Helping Professionals Land Jobs & Businesses Find Top Talent | LinkedIn Optimization | Professional Resume Writer | ATS Expert | Data Analyst | QC/QA Chemist
6 months ago: Great, Dean Lee.