Program = Data + Algorithm
In my experience, we developers spend most of our working lives writing programs. But I think there is a deep philosophy inside even a simple program. A program is basically about data and processing: we have data structures and algorithms. Data also needs to be stored, so we need a schema or data type, and we need to manage it so that we can access the data easily.
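To make the "data + algorithm" split concrete, here is a minimal sketch in Python. The `Order` schema and the `total_per_customer` function are hypothetical names invented for illustration. The point is only that the schema describes the data and the algorithm works against that schema.

```python
from dataclasses import dataclass


# Data: a schema (hypothetical Order record) that says what we store.
@dataclass(frozen=True)
class Order:
    order_id: int
    customer: str
    amount_cents: int


# Algorithm: processing that relies only on the schema above.
def total_per_customer(orders):
    totals = {}
    for order in orders:
        totals[order.customer] = totals.get(order.customer, 0) + order.amount_cents
    return totals


orders = [
    Order(1, "ana", 1500),
    Order(2, "budi", 700),
    Order(3, "ana", 250),
]
totals = total_per_customer(orders)
print(totals)
```

Change the schema and the algorithm has to follow, which is why the two are worth thinking about together.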
It doesn't matter how complex your software is or how big your system is. It still consists of data and algorithms. Let's talk about big data.
A lot of jargon tries to define big data, like 3V, 4V, or VXX. But at the technology level, from an engineering perspective, it is just scalable data storage and scalable processing, arranged with efficiency in mind.
Given the size of the data, you need to think about how to minimize data movement. You can't just pull huge amounts of data from multiple nodes and then process it all on a single one. That would defeat the divide-and-conquer paradigm. Instead of bringing the data to us and then doing the computation, we push the code closer to the data. After all, the code is far smaller than the data itself, right?
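The idea of pushing code to the data can be sketched with a toy simulation. Everything here is an assumption made for illustration: the `Node` class stands in for a machine holding one partition, and "network traffic" is just a count of the values returned to the caller. It is not how any particular framework does it.

```python
# Hypothetical cluster: each Node holds its own partition of the data.
class Node:
    def __init__(self, rows):
        self.rows = rows

    def fetch_all(self):
        # Data moves to the caller: every row crosses the "network".
        return list(self.rows)

    def run(self, func):
        # Code moves to the data: only the small result crosses the "network".
        return func(self.rows)


def partial_sum_count(rows):
    # Runs locally on a node and returns a tiny summary of its partition.
    return (sum(rows), len(rows))


nodes = [Node(range(0, 1000)), Node(range(1000, 2000)), Node(range(2000, 3000))]

# Approach 1: pull all the data to one place, then compute the mean.
pulled = [x for node in nodes for x in node.fetch_all()]
mean_pull = sum(pulled) / len(pulled)
values_moved_pull = len(pulled)

# Approach 2: ship the function to each node, then combine the partial results.
partials = [node.run(partial_sum_count) for node in nodes]
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
mean_push = total / count
values_moved_push = 2 * len(partials)  # one (sum, count) pair per node

print(mean_pull, values_moved_pull)
print(mean_push, values_moved_push)
```

Both approaches give the same mean, but the second moves a couple of numbers per node instead of every row. The gap grows with the size of the data, while the size of the code stays the same.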
So when you are working on a big data problem, always ask yourself how the network will affect your data processing system. Remember the eight fallacies of distributed computing. It all gets scary when there's a network in between.
So data needs storage. You also need a schema so that it is easier to access. You can also choose a binary representation for the data, so don't think JSON, XML, or CSV is the only way to represent it. I hear you mention tables. But in the end, a database is all about managing files on a file system, whether it runs on a single node or on multiple nodes.
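As a small illustration of choosing a binary representation, the sketch below packs the same records with Python's standard `struct` module and with `json`, then compares the sizes. The sensor schema (`sensor_id`, `timestamp`, `reading`) is hypothetical, and real systems usually reach for a dedicated format rather than hand-rolled structs.

```python
import json
import struct

# Hypothetical schema: sensor_id (uint32), timestamp (uint32), reading (float64).
# "<" means little-endian with no padding, so each record is 16 bytes.
RECORD = struct.Struct("<IId")

records = [
    (1, 1700000000, 21.5),
    (2, 1700000060, 22.25),
    (3, 1700000120, 19.0),
]

# Binary: fixed-width records written back to back. The schema lives in code.
binary_blob = b"".join(RECORD.pack(*r) for r in records)

# Text: the field names are repeated inside every single record.
json_blob = json.dumps(
    [{"sensor_id": s, "timestamp": t, "reading": v} for s, t, v in records]
).encode("utf-8")

# Reading the binary form back requires knowing the schema up front.
decoded = list(RECORD.iter_unpack(binary_blob))

print(len(binary_blob), len(json_blob))
print(decoded == records)
```

The binary form is smaller and lets you jump straight to record `i` at byte offset `i * RECORD.size`, but it is unreadable without the schema. That trade-off is exactly why the schema matters.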
If you love looking at the data more than reading the code, then you should pick the data science path. There, the code is just a toy that lets you massage, wrangle, and squeeze the last drop of insight out of your data.
But if you love seeing how your algorithm works, along with debugging, messaging, monitoring, production pipelines, networking, distributed systems, and all the complexity of managing systems in production, then data or AI engineering is the best bet for you.
So which one do you prefer, engineer or scientist? Both should work together in harmony, as they were created to love each other from day one. Learn to be loyal to your profession.
Cheers
Dev