Dear Data Palawan 2 - Padawans in Data / Master Jedis in Tech

Yes, my friend, I know you’ve been in tech for a long time. Many of you have been working in tech for almost as long as I’ve been alive. My goal is not to claim I’m smarter than you. It’s to give you the tools to better understand the world I call my own: the world of data.

Data Science and its surrounding areas are simpler than they look, yet they can bring more problems than they seem to on the surface.


So let’s talk about the practical and ethical aspects of data projects, and how to make them thrive while keeping them financially sound and ethical at the same time.

First, let’s keep it quite simple. Data Science, Machine Learning and Deep Learning are fancy names for advanced mathematics that help us understand the world a bit better through data. ChatGPT, that fancy thing that looks like it knows all your thoughts, that’s a complex statistical machine =)

Weird, I know. ChatGPT doesn’t grasp what it’s giving as output; it’s only calculating which words are most probable to come next, tuned towards answers that users tend to accept.

A simple example of ChatGPT not fully understanding the depths of a question can be observed here. While Rosenblatt did in fact make the first hardware implementation of a perceptron, it was Warren McCulloch and Walter Pitts who invented the original mathematical concept behind it. Niche AI knowledge, you might think? That’s precisely why ChatGPT doesn’t get the full answer right the first time.



Only when we press it a bit more do we see that the algorithm’s bias leads it to attribute the perceptron to Rosenblatt because of his physical implementation. Maybe because most of ChatGPT’s data is tech related and not maths related? Who knows?
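To make the "statistical machine" idea a bit more tangible, here is a toy sketch of how raw scores for candidate answers become probabilities. It is not how ChatGPT is actually implemented: the candidate names and the scores are invented for illustration, and `softmax` here is just the standard formula applied to them.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - top) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}

# Hypothetical scores: pretend the training data mentions Rosenblatt
# far more often next to the word "perceptron" than anyone else.
candidate_scores = {
    "Rosenblatt": 2.0,
    "McCulloch and Pitts": 1.0,
    "Minsky": 0.0,
}

probabilities = softmax(candidate_scores)
most_likely = max(probabilities, key=probabilities.get)

for name, prob in probabilities.items():
    print(f"{name}: {prob:.2f}")
print("Answer given first:", most_likely)
```

The candidate that shows up most in the data gets the highest score, and so it wins, whether or not it is the full answer.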

So, you see, even the most complex algorithms are nothing but fancy maths dependent on the data fed to them.

As you can see, any algorithm is highly dependent on the data fed into it. So, here’s your first step to a good data strategy, Master Jedi: lay the groundwork with good data and methods to assure its quality. This can make or break the money and time you invest in data science.

Even with a full description of what the concept of garbage in, garbage out is, Microsoft Designer created this mess, which perfectly illustrates my point.


That means data is at the core of everything in data science. You can have the Ferrari of AI models, but if the data it’s running on is trash, all it will spit out is trash.

Advice 1: Spend Money and Time and Manpower on Data Quality

From this we can now grasp my very first piece of advice, dear Master Jedi in tech. A sizeable portion of your budget, time and team should be dedicated to data quality.

Be sure to set good metrics right from the start (Precision and Recall are not the only ones [https://www.analyticsvidhya.com/blog/2019/08/11-important-model-evaluation-error-metrics/]), keep bias in your data in mind (use [https://www.aequitas-project.eu/], it’s a good framework) and be aware of model drift (I’m biased towards [https://www.nannyml.com/], they’re awesome).
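As a rough illustration of what "setting metrics" and "keeping bias in mind" can look like, here is a minimal sketch in plain Python. It computes precision, recall and F1 for a binary classifier, then breaks recall down by group. The labels, predictions and group names are made up, and `recall_by_group` is a hypothetical helper for this example, not the API of Aequitas or NannyML.

```python
from collections import defaultdict

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def recall_by_group(y_true, y_pred, groups):
    """Recall computed separately for each group (hypothetical helper)."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: precision_recall_f1(ts, ps)[1] for g, (ts, ps) in buckets.items()}

# Invented toy data: true labels, model predictions, and a group per row.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "b", "b", "a", "a", "b", "b"]

print(precision_recall_f1(y_true, y_pred))
print(recall_by_group(y_true, y_pred, groups))
```

In this toy data the overall numbers look acceptable, but recall for group "b" is half that of group "a". That kind of gap is exactly what a single headline metric hides.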

Always strive for the best available data and the simplest way to access it, and keep an eye on it. Invest in good data engineers and treat them well; they will teach valuable things to your precious “unicorns”, the data scientists. Old school data engineers are fine; they will keep the data scientists on a good code quality leash ;)
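"Keeping an eye on it" can start very small. Below is a minimal sketch of a data quality report that counts rows with missing values, out-of-range values and exact duplicates. The column names, the valid age range and the `quality_report` function are all assumptions made for this example; a real pipeline would use rules agreed with whoever owns the data.

```python
def quality_report(rows, required, ranges):
    """Count basic quality problems in a list of dict rows.

    required: column names that must not be None
    ranges:   {column: (low, high)} inclusive bounds for numeric columns
    """
    report = {"rows": len(rows), "missing": 0, "out_of_range": 0, "duplicates": 0}
    seen = set()
    for row in rows:
        # A row counts as "missing" if any required column is absent or None.
        if any(row.get(col) is None for col in required):
            report["missing"] += 1
        # A row counts once as "out of range", however many columns fail.
        for col, (low, high) in ranges.items():
            value = row.get(col)
            if value is not None and not (low <= value <= high):
                report["out_of_range"] += 1
                break
        # Exact duplicates: same columns with the same values.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
    return report

# Invented sample data.
customers = [
    {"id": 1, "age": 34, "country": "PT"},
    {"id": 2, "age": None, "country": "PT"},   # missing age
    {"id": 3, "age": 212, "country": "ES"},    # impossible age
    {"id": 1, "age": 34, "country": "PT"},     # duplicate of the first row
]

report = quality_report(
    customers,
    required=["id", "age", "country"],
    ranges={"age": (0, 120)},
)
print(report)
```

Running a report like this on every new batch, and tracking the counts over time, is a cheap first alarm before any model sees the data.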

Data is your cornerstone in all of this, don’t screw it up.

Advice 2: K.I.S.S (Keep it simple, Simian?)

Seriously, Master Tech Jedi, stop overcomplicating the models. The simpler the solution, the easier it will be to maintain.

“Oh but it’s very difficult to sell clustering models to clients instead of AI!” I hear the complaints. First, that’s a sales problem, not yours :P

Second, why over-engineer something? To give yourself a headache on support?

Sales should be worried about selling good Data Science pipelines that solve the customer’s problems. A good linear regression which solves the issue and is easy to maintain is ten times better than an over-engineered LLM that barely grasps the problem.
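To show how little there is to maintain in a simple model, here is a minimal sketch of a one-variable linear regression fitted with ordinary least squares in plain Python. The data is invented, and `fit_line` and `predict` are names chosen for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Apply the fitted line to a new value."""
    return slope * x + intercept

# Invented example: months of usage vs. support tickets.
months = [1, 2, 3, 4, 5]
tickets = [3, 5, 7, 9, 11]

slope, intercept = fit_line(months, tickets)
print(slope, intercept)
print(predict(6, slope, intercept))
```

Two numbers, a slope and an intercept, are the whole model. Anyone on the team can read it, explain it to the client and debug it at 3 a.m.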

And with over-engineered stuff come problems. Huge AI models waste a lot of energy for little result ([https://dl.acm.org/doi/10.1145/3442188.3445922]) and they pose serious risks to people’s safety and privacy ([https://issuu.com/jhulawreview/docs/spring_21_journal/s/14124452] yes, deepfakes of child SA are a thing, take your child’s photos off social media, now). And let’s not even get into the very thin line they walk on intellectual property [https://www.nytimes.com/2024/01/08/technology/openai-new-york-times-lawsuit.html]. Do you really want to be stealing art, music or writing from their authors just in the name of an algorithm?

In sum:

Sell solutions, not algorithms. Sell genuinely data-driven methods, not AI. Sell ethical responses to real problems, not ethically grey myths.

Advice 3: Your Junior Data Scientists Will Teach You as Much as You Teach Them (so let them)

Mentor/mentee relationships in data science are the best for both parties. The kids fresh out of university, eager to work and keep learning, bring novelty and a lot of new tools with them.

Treat them well and patiently teach them the ropes of good coding practices (yes, data scientists are notoriously bad at commenting code) and good code versioning, and you will get boundless energy and innovative ideas out of them. They will sharpen your data science mind at the same time as you sharpen their coding, teamwork and project management skills.

Let them sit at the decision table and let them express their opinions. They will make mistakes (as you did when you were their age), but out of those mistakes diamonds in the rough will appear and your team will be able to build great stuff.

Be sure to help them keep their spark of initiative and curiosity from dulling. Make them leaders, not pieces of a machine.




In the end, dear tech Jedi, I believe that it’s between both of our generations (the old tech and the new tech) that we can build a better tech world. You’re far from obsolete and we have a lot to learn from you.

I hope I've imparted a nugget of wisdom in this piece. May the data force be with you!


