How to pursue a data science career in the geoscience field?
Ali Ahmadalipour
Research Scientist at Google[X] | Geospatial, AI, Climate Change, Sustainability
About myself: I have been studying and working in the field of Earth sciences (i.e. satellite remote sensing, hydrology, climate change, natural hazards, and sustainability) for over a decade (since 2010). I got my PhD in 2017 and then worked as a postdoc researcher for over a year. For the past ~2 years, I have had the pleasure of working at early-stage startup companies, which has been a great experience for me. That being said, I am sure that my experience is quite limited and does not cover the entire perspective. Yet, I regularly receive messages asking questions and seeking advice, and I thought it would be good to share my personal experience and takeaways, hoping that someone finds them useful.
The main questions that I focus on here are as follows:
1. Which companies (and in what sectors) employ geospatial data scientists?
There are a lot of companies that appreciate applicants with a geospatial data science background (e.g. people with MSc/PhD degree in a field related to Earth sciences such as meteorology, hydrology, or climatology that have a decent data science background). I have listed some of the primary sectors and a few companies for each case:
The list can go on with many more interdisciplinary or narrowly focused companies, from major worldwide firms (e.g. Bayer, McKinsey, SwissRe) to small startups. In my research thus far, I have found more than 70 companies (mostly startups and early-stage companies) across the US that are working in at least one of the above-mentioned sectors.
2. What are the essential skills an applicant should have?
Different companies require distinct skills (obviously!) that are in line with their focus. For instance, a company focusing on climate change impacts would probably like to hire data scientists who have analyzed climate models (e.g. CMIP5 and CMIP6 models) and know the terminology and most recent methodologies. You would need to have a decent understanding of the subject (and probably a track record of publications on it). In any case, you should have strong programming and data analysis skills with Python, and you need to showcase your ability to handle large datasets (multiple terabytes in size). Experience with multidimensional data formats (e.g. GRIB, NetCDF, or Zarr) and relevant Python libraries (e.g. xarray and zarr) is a must-have. You also need to be proficient with functions such as "groupby" and "select" for visualizing and performing statistical analyses on chunks of data. Additionally, experience with dask and similar high performance computing tools as well as cloud computing (using Google Cloud or Amazon Web Services) is either a requirement or a positive add-on. Both Google and Amazon offer free trials of these services. So, if you are a student and your research is performed on local machines or computer clusters at your university, you can try these resources for free and gain experience with them (check out some of Google’s resources that I previously listed here).
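To make the "groupby" and "select" point concrete, here is a minimal sketch of what those operations look like in xarray (where label-based selection is spelled `sel`). The dataset is synthetic and the variable name `t2m`, the grid, and the dates are all made up for illustration; in practice the dataset would come from a NetCDF or Zarr store rather than random numbers.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily temperature on a small lat/lon grid (hypothetical data,
# standing in for a real NetCDF/Zarr dataset).
time = pd.date_range("2020-01-01", "2020-12-31", freq="D")
lat = np.arange(30.0, 50.0, 5.0)      # 30, 35, 40, 45
lon = np.arange(-120.0, -100.0, 5.0)  # -120, -115, -110, -105
rng = np.random.default_rng(0)
data = 15 + 10 * rng.standard_normal((time.size, lat.size, lon.size))

ds = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), data)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Label-based selection: the grid cell nearest to a point, and a time slice.
point = ds["t2m"].sel(lat=41.0, lon=-112.0, method="nearest")
summer = ds["t2m"].sel(time=slice("2020-06-01", "2020-08-31"))

# Split-apply-combine: mean over all days that fall in each calendar month.
monthly_mean = ds["t2m"].groupby("time.month").mean("time")

print(point.sizes, summer.sizes, monthly_mean.sizes)
```

The same calls generally carry over to much larger, dask-backed datasets, where the work is evaluated lazily in chunks; that setup is not shown here.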
For companies that focus on location-based services or supply chain management, the requirements can be a bit different: these companies usually require strong SQL experience as well as data visualization and dashboard preparation with Tableau or similar tools.
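As a rough idea of the kind of SQL that tends to come up in such roles, here is a small sketch of a grouped aggregation, run through Python's built-in sqlite3 module so it is self-contained. The `deliveries` table, its columns, and the values are entirely hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table of delivery records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries (region TEXT, distance_km REAL, late INTEGER)"
)
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?)",
    [
        ("north", 12.0, 0),
        ("north", 30.0, 1),
        ("south", 8.0, 0),
        ("south", 10.0, 0),
        ("south", 25.0, 1),
    ],
)

# Per-region summary: number of deliveries, mean distance, share of late ones.
rows = conn.execute(
    """
    SELECT region,
           COUNT(*)         AS n,
           AVG(distance_km) AS avg_km,
           AVG(late)        AS late_rate
    FROM deliveries
    GROUP BY region
    ORDER BY region
    """
).fetchall()
conn.close()

for region, n, avg_km, late_rate in rows:
    print(f"{region}: n={n}, avg_km={avg_km:.1f}, late_rate={late_rate:.2f}")
```

A summary table like this is typically what ends up feeding a dashboard in Tableau or a similar tool.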
In all cases, a data scientist is expected to have a decent understanding of, and solid experience with, the most recent AI and machine learning tools (e.g. TensorFlow, Keras, and PyTorch). In most job interviews, a small exercise or a pair coding session is carried out in the initial rounds to assess the applicant’s qualifications. TensorFlow and PyTorch are currently the two primary AI tools in the industry, and you are expected to be able to work with at least one of them. In other words, no one cares if you wrote a genetic algorithm or an ANN code in Matlab 10 years ago. If you want to be considered for data science positions, you should be able to use the common tools and collaborate with other team members on them.
In addition to the above-mentioned tools, experience with version control (e.g. Git, with hosting platforms such as GitHub or Bitbucket) is also beneficial.
3. What is the expected salary range?
It is good to have an estimate of what you can expect (on average) working as a data scientist in a geoscience field. It can motivate one to practice and work on the required skills. The salary range obviously differs among regions, and it can depend on company size/revenue/resources and on how interesting your work is to a company. For example, consider a PhD candidate who is experienced with various data sources, has a deep understanding and a decent track record of agriculture and climatology subjects, and has developed a low-cost machine learning model that can forecast frost events a couple of weeks in advance with high accuracy at farm level and beyond. This is an example of a case that is likely interesting to several companies in this sector, and those companies may well reach out to him/her even before graduation.
Normally (before the COVID-19 pandemic), the starting base salary for geoscience data scientist positions in startups was somewhere between ~$90K (outside the San Francisco Bay Area) and ~$140K per year. In addition, companies provide extra perks and benefits such as stock options, retirement plans, flexible and generous paid time off (PTO), daily (or weekly) lunch, and snacks. Once a candidate has passed the interviews (and probably a coding assignment and a presentation/site interview), the company knows that the candidate has the required technical skills, so it comes down to the founders to establish a healthy and productive work environment and motivate the team to work collectively and grow the business.
4. Job hunting amid COVID-19 and the job market in the coming years
The pandemic has surely impacted the US job market (as a whole) and geoscience careers (specifically). Many early- to mid-stage startups have laid off some of their staff or adjusted their salaries, and they have mostly halted new hiring. Notably, the job market is now quite competitive and the salaries offered can be lower than the norm, especially with the possibility of a remote workforce. Nonetheless, many companies have raised funding (seed or series A/B) within the past few months, and they have had open positions. In fact, a good way to find upcoming job positions is to stay tuned to fundraising updates. Meanwhile (and unfortunately), some companies have taken the “fake position” route, posting redundant positions or keeping the old ones (and even interviewing for those fake positions and asking for proposals or research ideas), assuming (and hoping) that having open positions on their website makes them seem prestigious or growing.
In my opinion, the pandemic was not the only reason for the geoscience job market slowdown; it was in fact more of a catalyst. The decay in interest in these companies (from either customers or investors) dates back a few years, to when several startups raised multi-million dollar seed funding and tens of millions of dollars in series A, and there was a huge hype around climate-focused industries. This was concurrent with the AI and cloud computing boom, which convinced VCs to pour money into this sector, trying to stay ahead of the game. Extravagant claims from a few companies exacerbated this hype and helped them (and several other startups) raise unprecedented funds in this sector. In some cases, however, the approach turned out to be unsustainable.
But, the future is bright (hopefully)! The Biden-Harris administration taking office will probably reignite climate positivity, which can bring back funding and create new opportunities. Currently, it seems to me that decarbonization platforms and emission monitoring companies are already getting attention and raising decent funding (e.g. several seed rounds in the past year ranging from $4 million to $6 million). I am positive about this, and I hope these companies can tackle climate change from different angles and mitigate its impacts and risks.
5. Final suggestions
5.1. Connection, connection, connection!
Learning about open positions is an advantage, but not everyone knows about all these startup companies. So, try to connect with the people in these industries and keep an eye on the growing businesses. See who is doing what and where, and leverage your insights for job hunting. In addition, there are multiple forums and websites that share specific positions for geospatial jobs, and you can easily subscribe to them and get informed about openings.
5.2. Practice and learn
Data science is a rapidly changing subject, and there are new advances in the field every few months or so. An outstanding data scientist, in my opinion, devotes a certain portion of his/her schedule to learning new methods and features. Be patient and get out of your comfort zone.
If you feel you are not proficient in Python or machine learning, online courses are invaluable resources that you can benefit from. I took a couple of courses on Udemy for data analysis & visualization and deep learning, and I highly recommend them.
5.3. Data matters! Collect a repository of various datasets
Machine learning models depend heavily on data, and knowing about different available data sources (e.g. weather and climate datasets, land data, hydrological variables, satellite observations, and reanalysis products) and where/how to access them can substantially help you in your career. I started gathering information about various datasets during my PhD and have kept adding to that repository ever since; it has become a great resource in various projects.
5.4. Interview skills matter too!
Having all the required skills, you still need to nail that initial interview (/phone call) and make a good impression of yourself to proceed to the next steps. It requires practice and preparation. Therefore, you need to apply to various positions and learn from those conversations.
5.5. Stay positive and keep up the good work
Easier said than done, but try to stay positive and do not underestimate yourself just because an arrogant (/ignorant) company or founder does not value your skills (/accomplishments, potentials).
Good luck!
R&D Engineer | Focused on PFAS Emerging Contaminants
3 years ago: Thanks for sharing this! It's been a great insight into what I am trying to accomplish. The fact that you suggest online courses is amazing and likely helps people get motivated to use these resources. Just wondering if you've applied deep learning in any projects? Just wanted to see how far we've gone in the application of DL in our field.
Climate Adaptation With Data
3 years ago: Ali, thank you so much for writing this up! I found it to be hugely helpful. In your opinion, should companies like UrbanFootprint, SidewalkLabs, or Kevala make it onto this list too?
Analytics, Blockchain, Geospatial Developer
3 years ago: Related: https://www.dhirubhai.net/posts/kipling-crossing-a3882215a_nogdal-activity-6751341907999363072-q4eh
Co-founder @ Regrow Ag | TIME 100NEXT | MIT 35u35 | Resilient Ag and Nature Based Solutions
3 years ago: Nice article Ali Ahmadalipour! Maybe worth including FluroSat in the ag list ;)