
Tech Lessons for CEOs and Boards: To Build a Data Pipeline, Remove the People

Andrew Tahvildary

CTO | Tech & Strategy Advisor | Scaling Engineering Teams for Growth, IPOs & M&A | Startup Mentor/Advisor

Published: January 18, 2023

A successful Series B startup that I worked with in the past had an ambitious business goal: to triple revenue for the year.

To achieve that business goal, we had to:

consider additional innovation to expand the total addressable market (TAM),
set ambitious product development goals to launch new features faster,
double the size of the engineering team and scale up its skill sets,
raise the bar on product development processes to improve time to market,
and many, many more …

This ambitious plan also required us to accommodate third-party data in addition to the company's own data sources.

So we needed to evaluate the company’s data platform to make sure it could handle the new sources of data. Was there any technical debt built up in the platform that needed to be addressed? We started by evaluating the platform's data pipelines. These are automated processes that move data from one location or format to another. A basic example is collecting data in one system and then moving it to another system for further analysis. Data pipelines are an essential component of any business that performs or provides detailed, high-volume data analytics. We needed to pull apart the data pipeline to ensure it could accommodate third-party sources.
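To make the definition concrete, here is a minimal sketch in Python of a pipeline that moves data from one format (CSV text) to another (JSON). The field names, sample values, and function names are illustrative assumptions, not details of the company's actual platform.

```python
import csv
import io
import json


def extract(csv_text):
    """Read raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Normalize each row: trim whitespace and convert revenue to a number."""
    return [
        {"customer": row["customer"].strip(), "revenue": float(row["revenue"])}
        for row in rows
    ]


def load(records):
    """Serialize the cleaned records to JSON for a downstream system."""
    return json.dumps(records)


def run_pipeline(csv_text):
    """Run every step in order, with no manual handling in between."""
    return load(transform(extract(csv_text)))


# Hypothetical sample input, invented for illustration.
SAMPLE_CSV = "customer,revenue\n Acme ,1200.50\nGlobex,980\n"
output = run_pipeline(SAMPLE_CSV)
```

Each step takes the previous step's output as its input, so the whole run can be scheduled and repeated without anyone copying data by hand.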

In fast-growing companies, teams sometimes make short-term decisions that later turn into technical debt. Technical debt is a term technologists and engineers use to describe software, hardware, and any other element of technology that will require further investment and modification at a later date. For example, an engineering team might choose Google Maps as its initial source of map data with the understanding that later on it will become too expensive and need to be switched out for something more economically scalable. Or an infrastructure team might elect to write code that quickly solves a problem but will be harder to maintain and will require a rip-and-replace at a later date.

There were a few technology upgrades that we needed to address across various technical debt buckets. But the biggest source of product risk was “people in the middle of a data pipeline”. This is a common but very serious example of technical debt: the use of people to do jobs that should be done by technology. The “people in the middle of a data pipeline” problem affected everything in our product and technology expansion plans.

People in the middle inject many serious risks into both data pipelines and technology products as a whole. Those include:

Human error: People are prone to making mistakes, and this is especially true when they are handling large volumes of data. If someone is manually manipulating data as it moves through the pipeline, there is a risk that they will make mistakes. This can lead to data loss or corruption.
Inefficiency: Having people manually handle data as it moves through the pipeline can also be time-consuming and inefficient. Automating the process can help ensure that data can move quickly and accurately.
Lack of scalability: If the data pipeline needs to process a large volume of data, it may be unfeasible to have people manually handling the data. Automating the process allows the pipeline to scale more easily.
Security concerns: If people are handling sensitive data as it moves through the pipeline, there is a risk that they may inadvertently expose the data to unauthorized parties. And there is an increased risk of data breaches due to human error or malicious intent. This risk can be reduced via automation.
Lack of standardization and documentation: If people are manually handling data as it moves through the pipeline, it is more difficult to ensure that the process is consistent and follows established standards. It can get difficult to keep track of the various steps in the process and to document them accurately. This can make it difficult to understand what has happened to the data and to troubleshoot any issues that arise. Automating the process allows for more standardization and consistency.
Dependence on specific individuals: If the data pipeline relies on specific individuals to handle the data, it can be disrupted if those individuals are unavailable or leave the organization. Automating the process can help ensure that the pipeline is not dependent on specific individuals.
Difficulty in monitoring and auditing: If people are manually handling data, it can be difficult to monitor and audit the process to ensure that it is being carried out correctly. Better monitoring and auditing can be achieved via automation.
Difficulty in maintaining data integrity: If people are manually handling data, it can be difficult to maintain the integrity of the data. Automating the process can help ensure that the data remains accurate and consistent.
Limited flexibility: If the data pipeline involves manual processes, it can be more difficult to adapt to changing requirements. Automating the process allows for more flexibility and the ability to quickly make changes as needed.
Higher costs: Having people manually handle data as it moves through the pipeline can be more expensive than automating the process. Automation can increase efficiency and help reduce labor costs.
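Several of the risks above (human error, inconsistent standards, weak auditing, and data integrity) can be reduced by replacing a manual check with an automated one that applies the same rules to every record and logs each decision. The following is a minimal Python sketch under stated assumptions: the required fields, the "no negative amounts" rule, and all names are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("id", "amount")  # hypothetical schema, for illustration only


def validate(record):
    """Return a list of problems found in one record (empty if clean)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        problems.append("negative amount")
    return problems


def process(records):
    """Split records into accepted and rejected, logging every decision."""
    accepted, rejected, audit_log = [], [], []
    for record in records:
        problems = validate(record)
        (rejected if problems else accepted).append(record)
        audit_log.append({
            "id": record.get("id"),
            "status": "rejected" if problems else "accepted",
            "problems": problems,
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })
    return accepted, rejected, audit_log


# Hypothetical batch: one clean record and two with problems.
batch = [
    {"id": 1, "amount": 25.0},
    {"id": 2, "amount": -4.0},
    {"id": 3, "amount": None},
]
accepted, rejected, audit_log = process(batch)
```

Because every record passes through the same rules and leaves an audit entry, the process no longer depends on who happened to be handling the data that day.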

For all these reasons, we made it a Priority-1 project to fix our data pipeline and make it entirely automated. Yes, people still need to monitor the data pipeline to make sure it's working and to periodically verify the accuracy of the data. But the pipeline can now take raw data inputs from third parties and run them through workflows, using tools we built or configured, to generate outputs that can be shared directly with customers, with no human interaction required.
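One way to picture this arrangement is a workflow whose steps all run automatically, with people checking a small sample of the outputs afterward rather than touching the data in flight. The Python sketch below assumes hypothetical step functions and field names; it is not the tooling described in this post.

```python
import random


def run_workflow(raw_records, steps):
    """Apply each automated step, in order, to every record."""
    results = raw_records
    for step in steps:
        results = [step(record) for record in results]
    return results


def sample_for_review(results, sample_size, seed=0):
    """Pick a small random sample that a person can spot-check afterward."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(results, min(sample_size, len(results)))


# Hypothetical steps standing in for tools a team might build or configure.
def normalize_name(record):
    return {**record, "name": record["name"].strip().title()}


def cents_to_dollars(record):
    return {**record, "price": record["price_cents"] / 100}


# Hypothetical third-party input, invented for illustration.
raw = [
    {"name": "  widget a ", "price_cents": 1050},
    {"name": "GADGET b", "price_cents": 2000},
]
outputs = run_workflow(raw, [normalize_name, cents_to_dollars])
review_queue = sample_for_review(outputs, sample_size=1)
```

The outputs go to customers without waiting on anyone; the review queue is a side channel for periodic verification, so a person's absence slows down checking but not delivery.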

The lesson of this story is simple. When your company needs to grow quickly and make changes to its product rapidly, technical debt tends to come due. Yes, the earlier decisions that accrued this debt might have helped you get to that moment. But often a short-term decision and its associated debt become a major blocker to progress.

That debt can create bottlenecks and complexity when you're trying to scale. CEOs and non-technical managers looking to build products and companies that scale should always ask three simple questions:

How much technical debt are we carrying?
Where is this technical debt in our product?
How can we address this technical debt?

Every competent CTO and VP of Engineering has asked these questions and thinks about these issues constantly. CEOs and board members can and should work with the engineering team to track any buildup of technical debt. Not all technical debt needs to be fixed immediately. Some of it can go on for many years. All technical debt, however, should be tracked and considered because, at the end of the day, all debts must be paid, sooner or later.

Note: Thanks to Alex Salkever for helping edit this post

#technology #cto #techdebt #founders

Lydia Varmazis

Head of Venture Portfolio | Chief Product Officer | Board Director

2 years ago

These are great insights, Andrew Tahvildary. Thank you for sharing.

Alex Salkever

Techquity.ai / Vionix Biosciences / Product + GTM Advisor (focus on Open Source, AI, and where they meet) / Author of books about Technology, AI and Society / Strong Opinions, Gently Argued

2 years ago

So I wonder, Andrew Tahvildary, what do you recommend for building pipelines on data sources that frequently change their structure? Curious, b/c I know web scraping is MISERABLE. Most CEOs think you can just point a scraper at stuff and it works!
