writtencast 004 Fernando López Velasco - Machine Learning Engineer
Fernando López Velasco is the Head of Data Science at Hitch, which uses machine learning to improve the recruiting process. At Hitch, Fernando has worked on projects that involve evaluating candidates from videos using deep learning. In this interview, we explore his use of machine learning at Hitch and how he uses Amazon Web Services (AWS) as part of his machine learning stack, among other topics. Fernando also tells us how mentorship played a role in his career and which skill he considers the most underrated in data science and machine learning.
How did you transition from a software engineer to a machine learning engineer?
In fact, my transition was more in the other direction, from mathematical theory to software engineering. When I graduated and started my first job in the industry, I realized the need for the skills and knowledge involved in software development. Although my specialization was oriented towards data analysis and modeling, I think it is highly important and necessary to expand our skills in order to integrate better and more effectively with other teams.
What is the most interesting project you have worked on at Hitch?
The project to evaluate applicants' video interviews with AI models. And I must say that it has been quite a challenge. I have been fortunate to participate in the different stages of this project, from the design of the strategy for data collection and labeling to the selection and deployment of the model.
As I mentioned in the previous question, expanding your knowledge and skills in software engineering and systems design has been vital to the impact of the results.
You have numerous AWS certifications. How important are these certifications in the life of a machine learning engineer or Head of Data Science?
I think that the certification is a reflection that you at least know the tool or platform and some of its components. However, true experience and knowledge are acquired as you face and solve problems.
Tell us about how you are currently using AWS as part of your machine learning stack.
Of course. I currently use several AWS services for the different tasks we have to tackle, where each task belongs to a specific area (for instance, Data Engineering or Machine Learning). For example, we use AWS Glue to extract data from the third-party integrations we have. We use AWS Lambda and Step Functions for some data transformation pipelines, which are triggered on certain events. On the other hand, in the area purely focused on ML, we use SageMaker to experiment, prototype, and deploy models. In short, there are several services that we use and orchestrate to address specific tasks. It is very interesting!
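To make this concrete, here is a minimal, hypothetical sketch of the kind of event-triggered transformation pipeline described above: an AWS Lambda handler that starts a Step Functions execution when a new object lands in S3. The state machine ARN, environment variable, and payload shape are illustrative assumptions, not Hitch's actual setup.

```python
# Hypothetical example: a Lambda handler that starts a Step Functions
# data-transformation pipeline when a new object is uploaded to S3.
import json
import os

import boto3

sfn = boto3.client("stepfunctions")


def handler(event, context):
    # S3 put events carry the bucket and key of the newly uploaded object.
    record = event["Records"][0]["s3"]
    payload = {
        "bucket": record["bucket"]["name"],
        "key": record["object"]["key"],
    }
    # Kick off the transformation pipeline defined as a Step Functions state machine.
    response = sfn.start_execution(
        stateMachineArn=os.environ["PIPELINE_STATE_MACHINE_ARN"],  # placeholder ARN
        input=json.dumps(payload),
    )
    return {"executionArn": response["executionArn"]}
```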
What type of hardware do you use to train emotion recognition models from videos?
Given the size of the dataset and the type of architecture we use, we opted for AWS P2 instances, which provide NVIDIA K80 GPUs, each with 2,496 parallel processing cores and 12 GiB of GPU memory.
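For illustration only, a training job on one of these GPU instances could be launched through the SageMaker Python SDK roughly as follows; the script name, IAM role, framework versions, and hyperparameters are placeholders, not the actual configuration.

```python
# Illustrative only: launching a training job on a GPU-backed P2 instance
# (NVIDIA K80) with the SageMaker Python SDK. Script, role, and versions are placeholders.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                                # hypothetical training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder IAM role
    instance_count=1,
    instance_type="ml.p2.xlarge",   # 1x NVIDIA K80: 2,496 CUDA cores, 12 GiB GPU memory
    framework_version="1.13",
    py_version="py39",
    hyperparameters={"epochs": 20, "batch-size": 32},
)

# Point the job at a (hypothetical) S3 prefix holding the prepared training data.
estimator.fit({"train": "s3://example-bucket/emotion-dataset/train"})
```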
What are the key factors to consider when training models to recognize emotion in videos? What type of models do you use for emotion recognition projects?
When it comes to implementing this type of model (that is, emotion recognition), it is important to consider:
In the case of transfer learning: the data domain on which the model was trained.
Data representative of the problem to be modeled. It is important to collect data fairly and correctly so that it is a representative approximation of the problem you intend to solve.
In the end, when it comes to ML model training, the data is the most important thing.
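As a rough illustration of the transfer-learning point above, here is a minimal PyTorch sketch of a frame-level emotion classifier built on an ImageNet-pretrained backbone; the label set and training details are assumptions, not the model Hitch uses.

```python
# A minimal transfer-learning sketch: reuse an ImageNet-pretrained backbone and
# train only a new classification head for (hypothetical) emotion labels.
import torch
import torch.nn as nn
from torchvision import models

NUM_EMOTIONS = 7  # assumed label set (e.g., anger, joy, sadness, ...)

# The backbone's source domain (natural images) differs from face crops,
# which is exactly the "data domain" consideration mentioned above.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained features and replace the final layer.
for param in backbone.parameters():
    param.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_EMOTIONS)

# Only the new head's parameters are optimized.
optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```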
You have written data science and machine learning articles. What role has this played in your data science and machine learning career?
I have been writing articles on ML, DS, and statistics topics for a couple of years. I have noticed that writing has led me to reinforce knowledge, understand or visualize certain topics from different perspectives, and, of course, learn new topics. Likewise, constantly learning has had a positive impact on my development as a professional in the industry. There have been times when I needed an answer to a problem and remembered that I had already written something about it, so I just revisited my own material. It is something very interesting.
What role has mentorship played in your career?
A very important and far-reaching role. In my first experience as an ML Engineer, I was fortunate to have a leader who, over time, became a mentor and left me with great lessons.
I believe that for professional development, it is important to have a mentor who provides a vision that, in my case, I could not yet see due to my lack of experience, and who guides you to discover and explore it.
From your working experience, what are the differences between the day-to-day tasks of a machine learning engineer and that of a data scientist?
Interestingly, in my experience, there has been no difference between the role of a Data Scientist and that of an ML Engineer. I have the impression that, in the industry, it is still not possible to differentiate the scope of each of these roles. However, a Data Scientist is a professional capable of taking data (preprocessed or almost preprocessed) and applying methods to extract insights and patterns and, if necessary, build a model. By contrast, an ML Engineer is a professional who is capable of taking a model and/or its requirements and adapting it for deployment to its endpoint.
What would your learning journey look like today if you were just starting your journey to becoming a machine learning engineer? Which resources would you use, and where would you find them?
Well, I am currently focused on strengthening my software development skills, so for the moment I am working on data structures and topics related to microservices, concurrency, synchronous and asynchronous programming, etc.
Commonly, the first sources I consult are blogs (Medium, TDS) and YouTube videos. In my case, it is not common for me to pick up a book; I feel more comfortable exploring blogs and watching videos.
In your opinion, which are the most underrated skills in data science and machine learning?
Software engineering, of course!
Apart from mainstream machine learning packages such as TensorFlow and PyTorch, which other little-known tools do you use in your work?
Tell us more about the link prediction from the static and dynamic knowledge graphs project.
Of course.
The link prediction problem is very interesting. The idea in a nutshell is to predict hidden links that are not found in a given graph. This problem is based on the premise that, given the topology of the graph, a pattern can be detected that allows one to infer where a new link could exist.
On the other hand, the link prediction problem has an extension that starts from the premise that links can be predicted as long as we know how the graph evolved; this makes it a dynamic graph problem.
In the project in which I participated, we took both approaches to try to solve the link prediction problem applied to a specific domain. Precisely, modeling the problem as a dynamic graph was the approach that yielded the best results.
It was a great experience.
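For readers new to the problem, here is a toy example of static link prediction using networkx and a topology-based heuristic (Adamic-Adar); it is illustrative only and unrelated to the project's actual domain or models.

```python
# Toy static link prediction: score non-existent edges with the Adamic-Adar
# heuristic, which relies purely on the graph's topology.
import networkx as nx

# Stand-in graph; in practice this would be the domain-specific knowledge graph.
G = nx.karate_club_graph()

# adamic_adar_index scores all node pairs that are not yet connected;
# higher scores suggest a likelier hidden link.
candidates = nx.adamic_adar_index(G)
top = sorted(candidates, key=lambda triple: triple[2], reverse=True)[:5]

for u, v, score in top:
    print(f"predicted link ({u}, {v}) with Adamic-Adar score {score:.3f}")
```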
In your previous role, you used AutoML. Is there a strong case for using AutoML instead of building machine learning models from scratch?
In fact, we built an AutoML platform. The idea was to provide a platform in which a user would upload tabular data and automatically extract all possible insights to show an interpretable ML model. It was a very big challenge because behind the scenes, I had to design the ML system in such a way that it was capable of self-training and validating itself to show results to the end user.
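A heavily simplified sketch of that self-training and self-validation idea, using scikit-learn on a stand-in dataset (the candidate models, data, and selection criterion are illustrative assumptions, not the platform's internals):

```python
# Simplified sketch of an AutoML-style loop: try several candidate models on
# tabular data and keep the one with the best cross-validated score.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # stand-in for user-uploaded tabular data

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200),
}

# Validate each candidate automatically and report the best performer.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(f"best model: {best_name} (cross-validated accuracy ~ {scores[best_name]:.3f})")
```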
You have previously used Kubeflow for the development, synchronization, and deployment of ML pipelines. Why did you choose to use Kubeflow for these tasks?
We used Kubeflow because, after exploring various alternatives, we found that Kubeflow's pipelines covered many of our needs. One of the advantages of Kubeflow is that it works with orchestrated microservices in the form of containers, which solved the problem of autoscaling and integration with backend APIs. Although another alternative was to use Kubernetes directly, we decided to use Kubeflow because, in a certain way, it facilitates the work of interacting with Kubernetes.
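As a minimal illustration of what such a pipeline can look like with the Kubeflow Pipelines SDK (kfp v2), where each step runs in its own container, here is a hedged sketch; the components and paths are placeholders, not the pipeline described above.

```python
# Minimal Kubeflow Pipelines (kfp v2) sketch: each component runs as its own
# container, and the pipeline wires their inputs and outputs together.
from kfp import compiler, dsl


@dsl.component
def preprocess(raw_path: str) -> str:
    # Placeholder transformation step; a real component would read/write storage.
    return raw_path + "/clean"


@dsl.component
def train(clean_path: str) -> str:
    # Placeholder training step.
    return "model trained on " + clean_path


@dsl.pipeline(name="example-training-pipeline")
def training_pipeline(raw_path: str = "s3://example-bucket/raw"):
    cleaned = preprocess(raw_path=raw_path)
    train(clean_path=cleaned.output)


# Compile to a spec that the Kubeflow Pipelines backend can schedule and autoscale.
compiler.Compiler().compile(training_pipeline, package_path="training_pipeline.yaml")
```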
Where can people find you online?
LinkedIn.
GitHub.
Medium.
Enjoy this newsletter?
Forward to a friend and let them know where they can subscribe (hint: it's here).
Anything else? Hit reply to send us feedback or say hello.
Join the conversation: Got more questions or comments? Join the conversation in the comments section.