Should we learn programming to future-proof ourselves?
Gareth Nicholson
Chief Investment Officer (CIO) and Head of Managed Investments for Nomura International Wealth Management
A roadmap for a future where you cannot ignore data science
I was fortunate to receive feedback from over 75 data scientists in our community on where data science is heading. Do note, however, that this article is not written for data scientists.
The goal of this article is to share a roadmap for finance professionals who know that data is something they cannot ignore if they want to progress, but who are unlikely to become data scientists themselves. I outline the differences between coding and Drag&Drop software as solutions, and provide a clear roadmap for tackling data science, with different routes depending on your time and interest.
Summary of Feedback on Coding vs. Drag&Drop software solutions
The 77 data scientists who contributed feel that:
Coding will remain king, and Python is the preferred language to start with (many suggest pairing it with SQL). Drag&Drop has evolved in leaps and bounds recently, but still remains a complementary tool for most. In the future, some believe AutoML (automated machine learning) software will enhance pre-built software to the point where only the very specialized need to code. And further into the future, some expect quantum computing to answer our questions without the nitty-gritty of coding.
Further feedback below:
A Step back: Can we ignore the data revolution?
"The collective CEO future vision is one where technology and data enabled capabilities will be the future of business decision making"
So, if we agree that some form of data-management skill is essential in our evolving digital future, is programming required, or is Drag&Drop software sufficient?
To start answering this question, it is best to get a little technical and break down the primary utility of a "data scientist" into two broad functions:
a) Insight discovery and hypothesis formulation/confirmation
b) Creating machine learning models and deploying them into business operations
The two activities are not the same; treating them as one is an unfortunately common misconception, even among data practitioners. (a) is a "humanistic" process involving both bottom-up (insight discovery) and top-down (hypothesis forming, or "trying to explain the phenomenon", and statistical confirmation) activities that go beyond just analytics; communication and business acumen are key success factors. (b) is more of a "machine" process that develops automated solutions to generate consistent business impact by enabling speedy (even real-time) action on data; taking human biases, errors, and delays out of the loop is part and parcel of the end goal.
For (a), there are visual analytics tools such as Tableau that can be used in combination with data-manipulation tools like Alteryx. They provide a visual, no-code way of wrangling and exploring the data. For (b), coding is absolutely required. Even if the machine learning and modelling can be done with tools such as DataRobot, when it comes time to "operationalise" the models, there is no escaping coding.
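To make the "no escaping coding" point concrete, here is a minimal sketch of what operationalising can look like once a model has been trained elsewhere, for instance in a tool such as DataRobot. It assumes a hypothetical linear model whose coefficients were exported as JSON; the feature names and the helpers `load_model`, `score` and `score_batch` are illustrative only, not any vendor's API.

```python
import json
from typing import Iterable

# Hypothetical model export: coefficients of a linear model trained elsewhere.
MODEL_JSON = '{"intercept": 0.5, "weights": {"spend": 0.002, "tenure_years": 0.1}}'


def load_model(raw: str) -> dict:
    """Parse the exported model parameters and check the expected keys exist."""
    model = json.loads(raw)
    if "intercept" not in model or "weights" not in model:
        raise ValueError("model export is missing intercept or weights")
    return model


def score(model: dict, record: dict) -> float:
    """Apply the linear model to one record; raise if a feature is missing or non-numeric."""
    total = model["intercept"]
    for feature, weight in model["weights"].items():
        if feature not in record:
            raise KeyError(f"missing feature: {feature}")
        total += weight * float(record[feature])  # float() raises ValueError on bad input
    return total


def score_batch(model: dict, records: Iterable[dict]) -> tuple[list[float], list[int]]:
    """Score many records, collecting the indices of rows that could not be scored
    instead of letting one bad row stop the whole run."""
    scores: list[float] = []
    rejected: list[int] = []
    for i, record in enumerate(records):
        try:
            scores.append(score(model, record))
        except (KeyError, ValueError):
            rejected.append(i)
    return scores, rejected


if __name__ == "__main__":
    model = load_model(MODEL_JSON)
    incoming = [
        {"spend": 1000, "tenure_years": 3},
        {"spend": 250},                        # missing tenure_years -> rejected
        {"spend": "n/a", "tenure_years": 1},   # non-numeric spend -> rejected
    ]
    print(score_batch(model, incoming))
```

The model itself is one line of arithmetic; most of the code is input checking and error handling, which is typically where the engineering effort of deployment goes.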
Whilst there is general agreement with the above distinction, many believe that it is highly beneficial to have a basic understanding of programming for both functions, including (a) insight discovery and hypothesis formulation/confirmation.
Why is coding essential?
Very simply, a single tool cannot fit all solutions. Overall, code is more flexible and powerful than pre-built software.
But possibly an even more important reason to code lies in its fundamental objective: being able to solve complex problems and develop solutions that streamline business processes. To become more future-proof, the consensus is that the key is to stay agile and develop critical-thinking and problem-solving skills. And numerous sources state that computational thinking and coding experience do boost problem-solving and logical-thinking skills.
The question remains: is it necessary to pick up hardcore, in-depth programming skills? Feedback suggests not, unless you are planning to be a framework or operating-system developer, or a true data scientist who researches and writes cutting-edge algorithms (text, speech, image, etc.). Anyone with a sufficient programming foundation is now able to access all of the open-source resources out there and do really advanced work with them.
Programming is a transferable skill: once you learn one programming language, you more or less get the gist and can code in other languages. Drag&Drop programming, by contrast, usually offers less skill transferability.
If coding is still king, why is Drag&Drop software in such demand?
First, let's recognize that a lot of what we do today, such as making a website, can be done super easily with a drag-and-drop website builder. Yet probably just a decade ago, building an interactive website would have required serious mastery of web development.
We are seeing the same trend with basic machine learning today, where a data scientist can use fantastic tools like BigML to build basic regression models and the like.
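For a sense of what such a tool automates, here is a minimal sketch of the simplest case: an ordinary least-squares fit of one variable on another, written in plain Python. The data are made up for illustration, and the closed-form formulas used here apply only to a single predictor; this is not how BigML works internally.

```python
from typing import Sequence


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Closed-form least-squares fit of y = slope * x + intercept (one predictor only)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x: Sequence[float], y: Sequence[float], slope: float, intercept: float) -> float:
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y is constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up illustrative data, e.g. marketing spend (x) vs. new accounts (y)
    spend = [1.0, 2.0, 3.0, 4.0, 5.0]
    accounts = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept = fit_simple_ols(spend, accounts)
    print(f"accounts ~ {slope:.2f} * spend + {intercept:.2f}, "
          f"R^2 = {r_squared(spend, accounts, slope, intercept):.3f}")
```

A point-and-click tool reports essentially the same slope, intercept and fit statistic; in code you can see, and change, every step.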
Given the high market demand in the field and the accessibility of these tools, I believe Drag&Drop software will soon explode, and I see Tableau and Power BI (Microsoft) leading the way forward.
Google and Amazon won't be left behind either: each is working to democratize machine learning without coding, with cloud-based offerings that train a model on your data directly within their model templates. The goal is clear: to achieve advanced data science without all the nitty-gritty of programming.
Experts also believe it's worth keeping an eye on quantum machine learning. There are still limits to what classical computing can do, particularly given the massive amounts of data held by large corporations and financial markets. Quantum computing and its associated algorithms might be key to solving this in the coming years, and this is potentially Big Tech's way to build a bridge to the banks.
Getting practical - Where should I start?
Different needs, different paths: the suggested roadmap for a future where you cannot ignore data science recognizes that people's paths will vary depending on their available time, interests and opportunities.
As such, we summarize three paths: first, the most meaningful change for the least time input; second, a committed path for those not yet ready for coding; and finally, the most ambitious path, for those who strongly want the skills to disrupt their career or business and are willing to embrace the challenge of finding a meaningful way to harness a data-driven future.
Roadmap
a) most meaningful change with the least time - follow only Phase 3
b) committed but not ready for coding - follow Phase 2 and then Phase 3
c) want skills to disrupt my career/business in a meaningful way by harnessing data - follow Phase 1, 2 and then 3
Phase 1 - Programming
The experts' key recommendation for "personal disrupters" is to start playing with Python (a programming language) and SQL (a database query language). These provide a very good bridging experience for people who do not come from a tech background.
The easiest and most accessible open-source language would have to be Python, for sure (especially in the realm of data science). It's a clear winner here.
The most efficient method our experts find for learning Python is the following:
1) Start with platforms such as Udemy and Coursera, where you can learn Python from top instructors such as Jose Marcial Portilla. Great instructors create great courses and make learning a lot more interesting. First of all, it is good to get the very basics right: the syntax of the language, how to write loops, check logical conditions, write functions and manipulate strings. Secondly, I would suggest looking into the Python library pandas, which is the best and most accessible data-manipulation tool in Python. Lastly, matplotlib is an important Python library for visualizing your results.
Useful online courses in order of preference:
· Python for Everybody [coursera.org]
· Introduction to Python: Absolute Beginner [edx.org]
· Introduction to Computer Science and Programming Using Python [edx.org]
· Python for Data Science and Machine Learning Bootcamp [udemy.com - Marcial Portilla]
· AI Programming with Python [udacity.com]
· Python I: Essentials [quickstart.com]
2) But don't spend too much time trying to learn everything. Use courses to pick up the fundamentals, then quickly move on to project-based learning. Find things that interest you and work on them. In my opinion, the best approach is to find a small automation problem in your daily work and try to solve it using Python.
In finance, an interesting and highly recommended project is to develop a basic investment algorithm. For great Python programming tutorials in finance, go to PythonProgramming.net and YouTube.com/sentdex; all tutorials are free in both text and video form. But honestly, focusing on solving a daily task is the most rewarding. Here are a few examples if you're stuck for ideas.
3) I think learning how to Google the answers to your questions is also very powerful, and many believe it is more useful than any course you can take (in other words, don't try to learn everything and then start working; instead, learn what you need to know project by project).
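As an illustration of the step 1 basics (loops, conditions, functions, string manipulation) applied to the kind of small finance project suggested in step 2, here is a minimal sketch of a moving-average crossover signal. It uses only the Python standard library, so it runs without pandas. The crossover rule is a common textbook toy chosen for illustration, not a strategy from our contributors or from the tutorials above; the 3- and 5-day windows and the prices are made up.

```python
from typing import Optional


def parse_prices(raw: str) -> list[float]:
    """String manipulation: turn '101.2, 102.5, 99.8' into [101.2, 102.5, 99.8]."""
    return [float(token) for token in raw.split(",") if token.strip()]


def moving_average(values: list[float], window: int) -> list[Optional[float]]:
    """Trailing simple moving average; None until `window` points are available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages: list[Optional[float]] = []
    for i in range(len(values)):  # a plain loop, as covered in beginner courses
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window : i + 1]) / window)
    return averages


def crossover_signals(prices: list[float], short: int = 3, long: int = 5) -> list[str]:
    """BUY when the short average crosses above the long one, SELL when it
    crosses below, otherwise HOLD (including days without enough history)."""
    if not 1 <= short < long:
        raise ValueError("need 1 <= short < long")
    fast = moving_average(prices, short)
    slow = moving_average(prices, long)
    signals: list[str] = []
    for i in range(len(prices)):
        # Need both yesterday's and today's long average to detect a cross.
        if i == 0 or slow[i - 1] is None:
            signals.append("HOLD")
            continue
        was_above = fast[i - 1] > slow[i - 1]
        is_above = fast[i] > slow[i]
        if is_above and not was_above:
            signals.append("BUY")
        elif was_above and not is_above:
            signals.append("SELL")
        else:
            signals.append("HOLD")
    return signals


if __name__ == "__main__":
    # Made-up closing prices, e.g. pasted from a daily report
    raw_prices = "10, 10, 10, 10, 10, 9, 8, 7, 8, 9, 10, 11"
    prices = parse_prices(raw_prices)
    for day, (price, signal) in enumerate(zip(prices, crossover_signals(prices)), start=1):
        print(f"Day {day:2d}: {price:6.2f}  {signal}")
```

Once this works, rewriting `moving_average` with pandas' `rolling(...).mean()` is a natural next exercise, in line with the project-by-project approach above.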
The ability to write SQL queries to extract data from a database will also be useful for getting your data from your systems.
The learning curve for SQL basics is not steep, and the best and most accessible place to start is online with w3schools.
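To give a flavour of those basics, here is a minimal sketch of extracting data with a SQL query from Python, using the standard library's `sqlite3` module and an in-memory database. The `trades` table, its columns and the helper names are hypothetical; against a real company database you would connect with that system's own driver, but the `SELECT ... WHERE ... GROUP BY` pattern is the same.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up trades table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (trade_date TEXT, ticker TEXT, side TEXT, quantity INTEGER, price REAL)"
    )
    rows = [
        ("2021-03-01", "AAA", "BUY", 100, 10.0),
        ("2021-03-01", "BBB", "BUY", 50, 20.0),
        ("2021-03-02", "AAA", "SELL", 40, 11.0),
        ("2021-03-02", "BBB", "BUY", 25, 19.0),
    ]
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def notional_by_ticker(conn: sqlite3.Connection, side: str) -> list[tuple[str, float]]:
    """Total traded value per ticker for one side, largest first.
    The side is passed as a query parameter (?) rather than pasted into the SQL string."""
    query = """
        SELECT ticker, SUM(quantity * price) AS notional
        FROM trades
        WHERE side = ?
        GROUP BY ticker
        ORDER BY notional DESC
    """
    return conn.execute(query, (side,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(notional_by_ticker(conn, "BUY"))  # [('BBB', 1475.0), ('AAA', 1000.0)]
    conn.close()
```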
Actually, my all-time favorite resource for data science is Nabih Ibrahim Bawazir (LinkedIn: https://www.dhirubhai.net/in/nabihbawazir/)
Here is a great source of SQL summaries from Nabih's library to help you start.
Phase 2 - Drag&Drop
Tableau is a great place to start, as the software is both powerful and intuitive. There is a free public version to learn with, and the full product is affordable in a corporate context.
The most efficient method to start learning Tableau, suggested by experts, is the following:
I. Download the free public version.
II. Work through the demos to understand worksheets and dashboards.
III. Create a dashboard for yourself or modify an existing one. The library of dashboards is impressive and a great source of ideas.
"What's really important in the Drag&Drop space is to continue to be curious, as things are changing rapidly."
The experts say this not just because no-code/Drag&Drop software like Alteryx and Tableau is already incredibly powerful, but also because of the explosion of open-source development frameworks (e.g. Bootstrap by Twitter, React by Facebook, Flask for Python), development packages (e.g. TensorFlow by Google), APIs (e.g. IBM Watson, software-as-a-service) and microservices architecture.
Join our community for regular posts on useful software to explore in this space.
Phase 3 - Excel
And finally, I feel that we cannot overlook Microsoft Excel, which has served us so well for so long.
Understandably, it is unable to process big data properly. However, for visualizing the final output, I feel that Microsoft Excel does a great job and deserves its title as the go-to analytics tool for business users across every industry since its inception.
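One way to combine the two, offered as an illustrative sketch rather than a contributor recommendation, is to do the heavy lifting in code and hand Excel only the summarised output. An Excel worksheet holds at most 1,048,576 rows, so the sketch streams a large CSV row by row and writes a small per-group summary that opens comfortably in Excel. The file layout, the column names `region` and `amount`, and the helper names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import TextIO

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in modern Excel versions


def summarise(source: TextIO, group_col: str, value_col: str) -> dict[str, tuple[int, float]]:
    """Stream rows and accumulate (count, total) per group without loading the whole file."""
    counts: dict[str, int] = defaultdict(int)
    totals: dict[str, float] = defaultdict(float)
    for row in csv.DictReader(source):
        key = row[group_col]
        counts[key] += 1
        totals[key] += float(row[value_col])
    return {key: (counts[key], totals[key]) for key in sorted(counts)}


def write_summary(summary: dict[str, tuple[int, float]], target: TextIO) -> int:
    """Write the summary as CSV for Excel; return the number of data rows written."""
    if len(summary) + 1 > EXCEL_MAX_ROWS:  # +1 for the header row
        raise ValueError("summary is still too large for one Excel worksheet")
    writer = csv.writer(target)
    writer.writerow(["group", "count", "total", "average"])
    for key, (count, total) in summary.items():
        writer.writerow([key, count, round(total, 2), round(total / count, 2)])
    return len(summary)


if __name__ == "__main__":
    # In practice `source` would be open("big_file.csv", newline="") on a file far
    # too large for Excel; a tiny in-memory sample stands in for it here.
    sample = io.StringIO("region,amount\nEMEA,100\nAPAC,50\nEMEA,300\n")
    out = io.StringIO()
    write_summary(summarise(sample, "region", "amount"), out)
    print(out.getvalue())
```

The resulting summary file can then be charted in Excel as usual.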
How to accelerate your Excel skills?
To sharpen your Excel skills to a level where you can meaningfully take advantage of reasonably sized data sets with limited time and input, I suggest the following great resources:
https://github.com/abhat222/Data-Science--Cheat-Sheet/tree/master/Excel
VBA within Excel is also a very practical skill to have, not just for data analysis but for general efficiency.
I will share more in our community, "Steer your Finance Career".
Feel free to join here: https://www.dhirubhai.net/groups/10411845/
Thank you to all the contributors! There are some really interesting insights, many of which I still need to digest and explore further. Your time is much appreciated.