The Value of Mentorship in the World of Data
I have been fortunate to work on applications of data analytics in the energy industry through both internships and academia. Even after four corporate internships, the amount there is to learn in this field never fails to amaze me, and much of what I have learned so far is due to the great mentorship I have had access to. I began my first official job as a software engineer at Schlumberger two weeks ago. Though I was without a company laptop for the first week, I still learned a great deal just by shadowing other developers.
In the past few years, several industries have focused on making the most of big data and statistical learning techniques. This is extremely relevant in the oil and gas industry because, as my internship mentor told me two years ago, there is an "explosion of data" coming from so many sensors, especially with unconventional resources. Furthermore, Schlumberger builds software products and platforms for clients to analyze this large amount of data. Both as an intern and as a full-time employee, I have been in a role at the intersection of software development and data analytics.
I worked on a large smart grid data analytics project at the end of my undergraduate program and the beginning of my graduate program. This project introduced me to important topics such as system architecture for big data, machine learning, and statistical quality control. I would like to share some of the simple but valuable insights my mentor at Duke, Kyle Bradbury, imparted to our project team:
1) Plot your data - before any analysis, plot the data so you can see what it looks like and know exactly what you are working with. This can tell you whether your approach is on the right path. For example, with time series data, a plot immediately shows the range of values you are working with as well as the shape and behavior of the trends of interest.
2) More data is not always better - the quality of the data matters, and feeding all of your data into a machine learning algorithm will more often than not produce unsatisfactory results. This comes up a lot in the feature extraction step of a machine learning pipeline. For example, facial recognition in image processing uses certain features, or qualities of each individual, to tell people apart. One feature is certainly not enough - if you only looked at the size of each face in a picture, the results would all get mixed up. At the other extreme, using every pixel of the face to identify a person will only confuse the algorithm. The goal is to find the sweet spot: a number of features that provides separability between the events you are trying to classify. Another example of too much data being a problem is forecasting - rather than trying to forecast from the entire time series, you might have to narrow down to a recent region that is representative of the desired trend.
3) Understand the context - just as the quality of the data matters, so does its meaning. It is easy to take a dataset and come up with an endless number of conclusions using the many toolkits available today. Do not look at data blindly. Know where it came from and how it fits into the bigger picture of the domain. This will help you draw meaningful conclusions.
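To make the first two points concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the series is synthetic (flat, then rising), the helper names (make_series, summarize, fit_line, forecast, plot_series) are hypothetical, and a straight-line fit stands in for whatever forecasting method a real project would use. It is not the method used on the smart grid project.

```python
import random
import statistics


def make_series(n_old=80, n_recent=20, seed=0):
    """Synthetic, made-up data: flat near 10, then rising 0.5 per step."""
    rng = random.Random(seed)
    old = [10.0 + rng.gauss(0, 0.2) for _ in range(n_old)]
    recent = [10.0 + 0.5 * (i + 1) + rng.gauss(0, 0.2) for i in range(n_recent)]
    return old + recent


def summarize(values):
    """Quick look at the range of values before any modeling."""
    return {"min": min(values), "max": max(values), "mean": statistics.fmean(values)}


def plot_series(values):
    """Point 1: look at the data first. Optional; requires matplotlib."""
    import matplotlib.pyplot as plt

    plt.plot(values)
    plt.xlabel("time step")
    plt.ylabel("value")
    plt.show()


def fit_line(values, start_index=0):
    """Least-squares line y = slope * t + intercept for t = start_index, ..."""
    ts = range(start_index, start_index + len(values))
    t_mean = statistics.fmean(ts)
    y_mean = statistics.fmean(values)
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, values))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def forecast(values, horizon, window=None):
    """Extrapolate `horizon` steps past the last point.

    Point 2: if `window` is given, fit only the most recent `window` points.
    """
    start = 0 if window is None else len(values) - window
    slope, intercept = fit_line(values[start:], start_index=start)
    t_future = len(values) - 1 + horizon
    return slope * t_future + intercept


series = make_series()
print(summarize(series))
print("forecast from the full series:  ", forecast(series, horizon=5))
print("forecast from the last 20 points:", forecast(series, horizon=5, window=20))
```

Calling plot_series(series) shows the change in behavior around step 80 at a glance, which is what suggests restricting the fit to the recent region. On this synthetic data, the full-series fit is dragged down by the long flat stretch, while the recent-window fit follows the current trend much more closely. Choosing the window still depends on knowing what the data represents, which is where the third point comes in.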
I have already referred back to these points of advice in the first two weeks, and I look forward to learning so much more in my career!
Data Science Architect at Schlumberger
8 years ago: Very good article. I think we have just started to touch data in oil and gas. There is so much to explore and learn. I couldn’t agree more with your point (2). Noise removal is a very crucial step for feature extraction. Keep it up!
Communications @ HashiCorp
8 years ago: Succinct and widely applicable advice, thanks!
Chief Digital Officer at Big Data Energy Services
8 years ago: You have great insight just two weeks into your first job. You are going to have an awesome career!!!
Lead Production Engineer @ Chevron | Petroleum Engineering Professor | 40 Under 40 | YouTuber & Podcaster (PetroPapers) | CrossFit Level 1 Trainer
8 years ago: I highly suggest you check out the Digital Oilfield Energy Conference if you have not yet this year. The conference touches upon everything you mention in this article. I think the 2016 conference has passed, but go next year!