To succeed as a data scientist

Some of my dear students, younger (and brighter) team members :) and colleagues ask me how to succeed as a data scientist. Before anything else, I would like to thank and salute the European Commission teams of experts for all the cooperation they provided to make all previous projects successful: our experts from Spain, Romania, England, Italy, and of course Austria and Germany. Your continuous cooperation added to every project and to me personally all the time, even if you might feel the opposite. “Success is a team expression”, and I was so lucky to work with excellent teams all the time. Thank you.

But actually my answer here may be a little bit disappointing: "there is no single path to achieve this".

If you take my word, I can only give you two kinds of hints: what to do, and what not to do :)

[ I reserve the right to update this later on; you all know how weak my memory is :) ]

To Do:

-Solve your critical problems as early as possible: find the right resources, make sure you have suitable infrastructure, and above all, find the right field experts (unless you are doing a basic, common mission).

-Build your own toolbox and let it grow wisely across projects. By “wisely” I mean: please follow object-oriented design (OOD) as you think. Nobody needs to waste the team's time testing blocks of solutions that were already tested before. Abstract and generalize.

-For each new project or mission, make sure you understand the objectives and discuss them with context experts, whether you are doing a simple classification to suggest a new pricing scheme or forecasting the movements of financial market assets. Do not panic when you get a huge mission; with proper understanding and suitable resources, (almost) anything is doable.

-Show fake (mock) outputs as examples, to make sure you will meet the stakeholders' expectations.

-Read as much as you can, and do your best to understand what is behind the words. What matters is not what people achieved before, but how they thought to get there. Your best tool set is the abstracted problem-solving techniques you extract. I am not listing all the good sources here, but you can find very good experts among my LinkedIn connections; I suggest you follow people who share your interests.

-Re-introduce the problem context to yourself first. Then get some of the context experts and present your new perspective to them (to validate your understanding, and to make sure you understand it well enough to describe it easily). Only then are you ready to introduce it to the "Model".

-From time to time, remind yourself of the mission objective(s). Most newcomers were slow or even failed because the tools swallowed their eyes and souls. They ended up producing a fancy representation of the same thing they started with, answering shallow questions that could be answered without any data science at all.

-In data science it is either success or failure. True, but isn't everything? Yes, but take care: do not expect the signs of success to appear later than 35%-40% of the mission time. A mission plan should allot 35%-40% of the time to the actual problem solving; the rest is for validation and representation.

-Those who have not practised algorithm design, or at least implementation, should try it (on a very simple scale). It not only helps you understand the internals of different algorithms, but also how an algorithm's parameters affect its behaviour. At the very least, you will start to value the tools and libraries you are using. I understand that some of you did not go that deep after university, but once you try, it will all come back to you.
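
As a tiny illustration of the last hint, here is a hand-written gradient descent on f(x) = x**2. It is only a sketch under my own assumptions: the function name gradient_descent, the target function, the starting point and the learning rates are all made up for the example, and are not tied to any particular library. The point is simply to watch how one parameter changes the behaviour of the algorithm.

```python
def gradient_descent(learning_rate, start=1.0, steps=20):
    """Minimise f(x) = x**2 by plain gradient descent.

    Returns the list of every x visited, starting with `start`.
    (Hypothetical helper written for this illustration only.)
    """
    x = start
    path = [x]
    for _ in range(steps):
        gradient = 2 * x                  # derivative of x**2
        x = x - learning_rate * gradient  # standard update rule
        path.append(x)
    return path


# Try a few learning rates and compare where each run ends up.
for lr in (0.01, 0.1, 0.5, 1.1):
    final = gradient_descent(lr)[-1]
    print(f"learning_rate={lr:<4}  final x={final:.4g}")
```

Each step multiplies x by (1 - 2 * learning_rate), so a small rate crawls towards the minimum at 0, a rate of 0.5 lands on it in one step, and a rate above 1 makes every step overshoot further than the last, so the run diverges. Seeing this once in your own code makes the "learning rate" knob in any library much less mysterious.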

Do not do any of the following:

-Do not feel that your understanding is complete (we are human, which means there is usually more to know).

-Do not feel that your work is perfect. Whenever you feel that everything is perfect, be sure that you have forgotten something really big. Machine learning and deep learning are usually used with problems that are too complex for normal techniques (unless you are doing comparative research), and most of these problems involve a huge hidden process that is very hard to follow and track.

-Do not try to learn by reading only; practice is the only proof that you have started to understand a new topic.

-Do not read about data science only; most of the shortcuts that work are out there in other sciences and in everyday practice (especially in physics, by the way).

-Do not believe everything you read (read to learn and validate, not just to know)

-Do not jump to new missions or projects before you rethink what you did, and why you succeeded or failed.

-Do not evaluate your results by yourself. Context experts have the last word here.

-Everyone will tell you “Beware of overfitting”. In real implementations this is right, but during the initial data assessment you need to see what overfitting looks like in your specific mission. The 80/20 numbers are not fixed; invest some more time exploring the area above 80. Sometimes the progress behaviour within this area is the golden key (the chase-away phenomenon; I guess I wrote about this in 2005 or 2006, on using Genetic Algorithms for autonomous control systems [I will write about this again soon, I guess]).
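
One possible way to "see what overfitting looks like" during an initial assessment is to sweep a model from very flexible to very rigid and compare the training error with the validation error. The sketch below is only an illustration and not the method I used in my own projects: the synthetic sine data, the k-nearest-neighbours regressor, and the names make_data, knn_predict, mse, overfitting_profile and train_fraction are all assumptions, chosen so that the example needs nothing beyond the Python standard library.

```python
import math
import random


def make_data(n=60, noise=0.3, seed=0):
    """Synthetic 1-D data (an assumption): a sine wave plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]  # distinct, evenly spaced inputs
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, noise) for x in xs]
    points = list(zip(xs, ys))
    rng.shuffle(points)             # shuffle before splitting
    return points


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, points, k):
    """Mean squared error of the k-NN model on `points`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in points) / len(points)


def overfitting_profile(points, train_fraction=0.8, ks=(1, 3, 5, 10, 20)):
    """Return {k: (training MSE, validation MSE)} for each k.

    Small k means a very flexible model; large k means a very smooth one.
    train_fraction is a knob to vary, not a fixed rule.
    """
    cut = int(len(points) * train_fraction)
    train, valid = points[:cut], points[cut:]
    return {k: (mse(train, train, k), mse(train, valid, k)) for k in ks}


profile = overfitting_profile(make_data())
for k, (train_err, valid_err) in profile.items():
    print(f"k={k:2d}  train MSE={train_err:.3f}  validation MSE={valid_err:.3f}")
```

With k=1 the model simply memorises the training points, so its training error is exactly zero while its validation error is not; that gap is overfitting in its purest form. Re-running the sweep with different values of train_fraction, noise or ks shows how the gap moves for your own data, which is the kind of exploration the hint above is asking for.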

Although these hints can work for software engineering in general, they are much more important in data science. Finally, manage expectations: the best way to do this is to keep all stakeholders updated (even if they do not have time, invite them for lunch and let them know your road map and the different outputs across phases, and keep them involved and aware of progress).

Rasha Fahim

Executive Director - Textile Export Council of Egypt (TEC)

6 years ago

Thank you for such a valuable article. Although I'm not in the field (& I really hope to be one day), all the techniques and guidance points mentioned are very valid for any scope of management. All the best, our mentor.

