Why Was Compute.AI Founded?
Vikram Joshi
Founder: Compute.AI, Xcalar, ioTurbine, PixBlitz Studios | Entrepreneur | Programmer | Author | Advocate: Ending Violence Against Women
Introduction
Compute.AI was founded with a twofold vision: (1) to be the compute platform for machine-generated code, and (2) to leverage AI to transform the landscape of compute. Machine-generated code is inherently complex and requires high concurrency to function effectively, especially in processes like prompt engineering, context building, and LLM-driven code generation, which are often resource-intensive and interaction-heavy.
This complexity doesn’t stem from limitations in LLM capabilities—these models are evolving quickly, supporting sophisticated, platform-specific optimizations. Rather, the challenge lies in how machine-generated code replaces human effort and interacts with the rapidly changing nature of enterprise data. Data is fluid and evolving: schemas shift in real-time, data streams like Kafka require immediate querying, and the rigid structures of the past—where highly denormalized tables served to avoid complex JOINs—are becoming obsolete. Today’s demands include filtering data from vast tables and executing complex JOINs and GROUP BY operations that were once minimized.
Legacy compute architectures are struggling to keep pace. Cloud nodes that take minutes to start, memory that scales only horizontally, and frequent OOM (Out of Memory) errors are simply insufficient for the modern enterprise landscape. CIOs are pushing for ROI on GenAI-enabled applications where LLMs, trained on vast public data, must extract actionable insights from enterprise-specific data—much of it locked in relational formats. Adding multimodal data (e.g., streaming from external platforms like Marketo, Salesforce, and numerous other sources) further compounds the challenge, as enterprises seek insights that were previously impossible.
In this context, imagine a chatbot that fields user queries and retrieves answers from enterprise data locked in databases or tabular formats (e.g., Parquet, Iceberg). Meeting these demands requires a compute infrastructure with unparalleled reliability and responsiveness, akin to the reliability of a Google search. Enterprise solutions cannot afford downtime or lag; they demand robust, high-performance compute.
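To make that pattern concrete, here is a minimal sketch of what such a chatbot implies: an LLM turns a natural-language question into SQL, and an engine executes that SQL directly against tabular files. This is an illustration, not Compute.AI's implementation; the file path, column names, and the generate_sql helper are hypothetical placeholders, and DuckDB stands in for whichever engine actually serves the queries.

```python
import duckdb

def generate_sql(question: str) -> str:
    """Placeholder for an LLM call that turns a user question into SQL.
    In a real system this would be a prompt to a code-generating model."""
    # Hypothetical generated query for: "What was revenue by region last quarter?"
    return """
        SELECT region, SUM(amount) AS revenue
        FROM read_parquet('data/orders/*.parquet')  -- hypothetical local path
        WHERE order_date >= DATE '2024-07-01'
          AND order_date <  DATE '2024-10-01'
        GROUP BY region
        ORDER BY revenue DESC
    """

def answer(question: str) -> list:
    sql = generate_sql(question)
    # DuckDB queries Parquet files in place; any engine with a SQL
    # interface could sit behind this call instead.
    return duckdb.sql(sql).fetchall()

print(answer("What was revenue by region last quarter?"))
```

The point of the sketch is the reliability bar: every user question turns into a fresh query against live enterprise data, so the compute layer behind that last call has to answer quickly and never fall over.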
High-concurrency, complex SQL-based compute requires a powerful engine. Unfortunately, current open-source solutions fall short: Spark is designed for batch processing, Presto for single-session querying, and even proprietary databases struggle with true concurrency at scale. Recognizing these limitations, we assembled a world-class team to build a unique compute platform, rigorously tested by one of the world's top three blue-chip technology companies and by leading Wall Street banks.
From an investment perspective, databases may seem like a solved problem, but machine-generated, high-concurrency, complex compute puts us in a new era, with demands on infrastructure up to 1000x higher than before (for perspective, BI tools and apps such as Tableau and Power BI already generated more than 100x the SQL that humans wrote by hand). SQL is a declarative language: fairly straightforward, human-readable text that must be converted into something a machine can execute. The days of human-written SQL are behind us. Can Spark, Presto, Trino, or even Snowflake meet these demands for the deeper, ROI-driven applications that CIOs seek from GenAI? The answer is clear: these new requirements call for a complete reimagination of compute infrastructure, particularly for enterprises where relational and multimodal data must be processed by LLM-generated code.
In this new paradigm, accuracy is non-negotiable. Responses need to be fact-based, reliable, and latency-optimized. Enterprises need perfect answers—not approximations or hallucinations—delivered without the frequent failures of legacy infrastructure. High-performance compute is no longer just about power; it’s about meeting the critical demands of the modern AI-driven enterprise.
Compute.AI: The Mission
At Compute.AI, we harness the power of AI to achieve unprecedented levels of compute efficiency and performance, continually advancing toward our greater vision. A core technical mission is to make CPU and GPU compute both abundant and infinitely scalable by unlocking the full potential of billions of underutilized processors.
Our founding team—four experts in AI/ML, databases, system software, DevOps, and cloud operations—has a strong history of collaboration, having worked together in a previous startup. From the outset, we focused on addressing the inefficiencies in processor utilization caused by legacy data processing architectures to make compute truly abundant.
Back to Tech...
Optimal utilization of CPUs and GPUs presents distinct challenges, requiring specialized analysis. This article zeroes in on CPU utilization within analytics and relational databases, a primary area where CPUs remain underutilized. In a future article, we will explore our work on GPU efficiency. While certain algorithms in relational compute (FILTER, HASH JOIN) can be accelerated on GPUs, we will not focus on them here: GPUs are expensive, not well suited to general-purpose compute, and CPUs handle even these specialized operations well.
In-memory databases and elastic clusters contribute to the <30% CPU utilization (often <10% in clusters with 5+ nodes) observed in leading analytics platforms, including open-source systems like Spark, Presto, and Trino, as well as proprietary cloud data warehouses operating as elastic clusters. We'll discuss why CPU utilization drops as cluster size grows and how compute efficiency can be improved.
Ironically, in-memory analytics platforms use cloud compute infrastructure like AWS EC2 more for memory than for CPU, despite EC2 standing for 'Elastic Compute Cloud,' not the 'Elastic Memory Cloud' that most relational software effectively turns it into. As data and working sets grow, horizontal scaling is required to meet in-memory demands, resulting in significant data movement across clusters for operations like JOINs and GROUP BYs. This data movement, or shuffling, leads to high network I/O wait times, which drastically impacts performance.
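As an illustration of where that shuffle appears, the PySpark sketch below joins two tables and aggregates; the Exchange operators in the printed physical plan mark the points where rows are repartitioned across executors over the network. The table paths are hypothetical, and with small inputs the optimizer may choose a broadcast join instead.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shuffle-demo").getOrCreate()

# Hypothetical tables; in practice these would be large Parquet/Iceberg datasets.
orders = spark.read.parquet("data/orders")
customers = spark.read.parquet("data/customers")

result = (
    orders.join(customers, "customer_id")   # hash join: both sides repartitioned by key
          .groupBy("region")                # another shuffle to regroup by region
          .sum("amount")
)

# Every 'Exchange' node in this plan is data leaving one executor for another
# over the network -- the shuffle traffic discussed above.
result.explain()
```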
Demand paging, or 'spill-to-disk,' further compounds the inefficiency by pushing memory pages to slower NVMe-based SSDs when data exceeds available memory. Today’s architectures do not handle this efficiently, as paging algorithms lack awareness of data semantics, memory residency, and optimal scatter-gather scheduling. Traditional paging methods fail to handle modern, complex computational workloads, often leaving CPUs ~70% idle due to I/O wait times that cannot be masked by other computations. This is exacerbated by the growing core-to-memory ratio in modern processors, which stalls cores as they wait for data.
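One rough way to observe this effect on a Linux host is to compare the CPU time spent doing work against the time spent waiting on I/O, read straight from /proc/stat. This is only a coarse, system-wide view (not a profile of any particular engine), and the five-second window is an arbitrary choice during which you would run your query workload.

```python
import time

def cpu_times():
    # First line of /proc/stat: "cpu user nice system idle iowait irq softirq ..."
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]
    user, nice, system, idle, iowait = (int(x) for x in fields[:5])
    return user + nice + system, idle, iowait

b0, i0, w0 = cpu_times()
time.sleep(5)                      # run the workload of interest during this window
b1, i1, w1 = cpu_times()

total = (b1 - b0) + (i1 - i0) + (w1 - w0)
print(f"busy   {100 * (b1 - b0) / total:5.1f}%")
print(f"idle   {100 * (i1 - i0) / total:5.1f}%")
print(f"iowait {100 * (w1 - w0) / total:5.1f}%")
```

When spill and shuffle dominate, the idle and iowait shares climb while the busy share stays low, which is the underutilization pattern described above.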
The heavy reliance on in-memory systems leads to extensive infrastructure deployment, with memory often over-provisioned to prevent Out-of-Memory errors during peak operations. Consequently, most workloads see grossly underutilized resources as memory is scaled for worst-case scenarios rather than actual needs.
The CIO Dilemma: Navigating the GenAI Landscape Without Losing Sight of ROI
Let’s talk about the CIO dilemma. After more than a decade of experiments with Hadoop—only to realize that the cost of failure was too high—CIOs are tired of science experiments that don’t deliver. The stakes are even higher with generative AI (GenAI). This time, we can’t afford a similar outcome.
GenAI is here, and it’s undeniable that foundational large language models (LLMs), trained on vast amounts of public data, hold enormous potential. But here’s the critical issue: these LLMs, built on “knowledge of the universe,” aren’t inherently equipped to provide insights specific to a company’s private data. Yet, that’s exactly what executives expect. If, as a CIO, I don’t jump on the GenAI bandwagon, I could be betting against the future—not just for myself but possibly for the entire company.
The Fine-Tuning Challenge: Is It Worth the Cost?
One of the first questions is whether to fine-tune the LLM on our proprietary data. However, fine-tuning can be extraordinarily expensive, and it’s not as simple as just feeding data into the model. Realistic AI applications are complex and require thorough core knowledge preparation: semantic knowledge identification, causal relationship mapping, and distillation of core knowledge. But here’s the catch: extracting this deep, nuanced understanding of a business isn’t something any vendor can hand over. This requires specialized skills, serious upskilling of existing teams, and a deep familiarity with our unique business context.
The question becomes, how do I get my team to uncover this vital business knowledge and translate it into something the LLM can learn from? Fine-tuning seems increasingly unfeasible when we consider the expertise and resources required.
Note: The insights above are drawn from conversations with dozens of CIOs and industry leaders. It’s worth mentioning that some exceptionally talented teams—such as Lamini.ai, which the author holds in high regard—are addressing the fine-tuning challenge at a foundational level. While this topic is beyond the scope of this article, it’s recommended reading for those deeply involved in this space.
The Rise of RAG: A Viable Alternative?
Given the challenges of fine-tuning, a CIO starts to consider Retrieval-Augmented Generation (RAG) as an alternative. Or perhaps, an advanced RAG model—where a learning module establishes an ongoing, “chatty” interaction between a new “brain” and the LLM. Over time, this dynamic could help the LLM generate code that can operate on an enterprise’s tabular + multi-modal data, delivering precise answers tailored to our needs.
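For readers less familiar with the pattern, the sketch below shows RAG at its simplest: embed the question, retrieve the most relevant snippets of private data, and hand them to the LLM alongside the question. Every component here is a stand-in; the embed, vector_store, and llm objects are placeholders for whichever embedding model, vector database, and LLM an enterprise actually deploys.

```python
# A minimal, schematic RAG loop; all dependencies are injected placeholders.

def rag_answer(question: str, embed, vector_store, llm, k: int = 5) -> str:
    # 1. Turn the question into a vector with the enterprise's embedding model.
    query_vector = embed(question)

    # 2. Retrieve the k most similar chunks of private data from the vector store.
    chunks = vector_store.search(query_vector, top_k=k)

    # 3. Build a grounded prompt so the LLM answers from retrieved enterprise
    #    context rather than from its general "knowledge of the universe".
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # 4. The LLM's job shrinks to reading and reasoning over retrieved facts.
    return llm(prompt)
```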
But the complexity only deepens. The engineers are already overwhelmed, grappling with decisions on which vector database to use and how to integrate it with search, secondary indices, database catalogs, knowledge graphs, and ML/NLP components. Instead of tackling true business challenges, engineering managers are stuck hiring cloud operations, DevOps, and MLOps engineers to handle the infrastructure—when what is really needed are specialists focused on solving the core problems that GenAI is meant to address.
The Real Stakes: ROI and the CIO's Neck on the Line
At the end of the day, this isn’t just a tech project; it’s a business imperative. A CIO's neck is on the line, and so is the return on investment (ROI) that the board and shareholders expect from GenAI initiatives. While the promise of GenAI is transformative, the execution must be anchored in results that justify the investment.
The challenge is clear: how do we make GenAI work for us in a way that is practical, scalable, and profitable? The answers are still emerging, but one thing is certain: as a CIO, I must stay focused on building GenAI applications that deliver tangible value, rather than getting lost in the infrastructure weeds. The pressure is on, but with the right strategy, we can navigate this complex landscape and deliver the ROI that GenAI has promised.
You have reached the well-overdue TL;DR
Memory & Over-Provisioning
Subtitle: The Limitations of Spill-to-Disk and the Ongoing Challenge of Memory Management
Some open-source solutions have incorporated spill-to-disk mechanisms to avoid provisioning memory for worst-case workloads. However, these mechanisms aren’t universally applied across all relational operations. Core relational tasks like FILTER, AGGREGATE, SORT, and JOIN often require follow-up operations—PROJECT, GROUP BY, MERGE, UNION, and WINDOW. Making each of these functions run efficiently on minimal memory with reliable spill-to-disk is exceptionally challenging, so spill-to-disk support is often selective. Data skew, which adds another layer of complexity, usually demands additional data movement across the cluster, leading to network I/O delays and idle CPU time that could otherwise be put to productive use.
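As a concrete illustration of selective spill support, the sketch below caps an engine's memory and points it at a spill directory; whether a given query survives the cap depends entirely on whether every operator in its plan (the join, the aggregate, the sort) can spill. DuckDB is used purely as an example engine because it exposes these settings directly, and the dataset paths are hypothetical.

```python
import duckdb

con = duckdb.connect()
con.execute("SET memory_limit = '2GB'")           # force the engine to work under a memory cap
con.execute("SET temp_directory = '/tmp/spill'")  # where operators may spill, if they support it

# A join plus aggregation over large (hypothetical) datasets. If every operator
# in this plan can spill, the query completes, just more slowly; if one cannot,
# it fails with an out-of-memory error instead -- the selectivity problem above.
rows = con.execute("""
    SELECT c.region, COUNT(*) AS orders, SUM(o.amount) AS revenue
    FROM read_parquet('data/orders/*.parquet')    AS o
    JOIN read_parquet('data/customers/*.parquet') AS c USING (customer_id)
    GROUP BY c.region
""").fetchall()
print(rows)
```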
When additional memory is needed to execute a relational algorithm without spill-to-disk support, in-memory databases are prone to failure, resulting in an Out-of-Memory (OOM) error. In such cases, the software throws an exception, and the only workaround is to rerun the job with more memory. Resource management systems like YARN and Kubernetes can help by launching new clusters with over-provisioned memory to handle worst-case scenarios. While operationally feasible, this approach introduces significant trade-offs.
If memory requirements can be estimated through trial and error or empirical methods, some OOM failures may be avoided. However, unpredictable data compressions and rarefactions within the working sets of a SQL plan can still lead to memory overloads that are hard to predict. In these cases, the system's Out-of-Memory (OOM) killer steps in, sending a fatal SIGKILL signal that terminates the job. These OOM-kills, although disruptive, have become an accepted industry reality due to the lack of easy solutions. When jobs fail due to memory constraints, developers are often forced to either expand cluster resources (a process that can take ~30 minutes) or reduce data size to fit within the available memory. In production, YARN-managed policies typically reschedule the job on a larger cluster.
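The failure mode is easy to recognize programmatically: a process terminated by the kernel's OOM killer dies from SIGKILL, which surfaces as a negative return code in Python's subprocess module (or exit code 137 inside containers). The retry-with-more-memory loop below is a sketch of the workaround described above; run_query.sh and the memory sizes are hypothetical stand-ins for a real job launcher and cluster configurations.

```python
import signal
import subprocess

def run_job(memory_gb: int) -> subprocess.CompletedProcess:
    # Hypothetical job launcher; imagine it submits a Spark/Presto job to a
    # cluster provisioned with `memory_gb` of memory per executor.
    return subprocess.run(["./run_query.sh", str(memory_gb)])

for memory_gb in (64, 128, 256):                 # escalate memory after each failure
    result = run_job(memory_gb)
    if result.returncode == 0:
        break
    if result.returncode in (-signal.SIGKILL, 137):
        # SIGKILL from the OOM killer: rerun on a bigger, over-provisioned cluster.
        print(f"OOM-killed at {memory_gb} GB, retrying with more memory...")
    else:
        raise RuntimeError(f"job failed with code {result.returncode}")
```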
In summary, these “solutions” come with high costs: reduced developer productivity, unreliable production workloads, and infrastructure over-provisioned to handle worst-case memory demands. The resulting overhead turns what should be an asset into a costly liability for CIOs, CDOs, and CFOs.
The Cost of Compute
Over-provisioning memory for worst-case scenarios means buying more cloud infrastructure than is typically needed, just to avoid Out-of-Memory (OOM) failures in production workloads. This often translates to purchasing additional memory bundled with extra CPU cores—resources that are then underutilized as more nodes are added, reducing overall CPU efficiency. This inefficiency drives up the 'cost of compute,' and software licensing costs from database vendors only add to the expense.
By examining the cost structure of cloud compute, it becomes clear that memory is the most expensive component (even though our focus is on CPUs in this article, HBM is generally more expensive than other components in a GPU, including the die itself). Compute.AI tackles this "memory" challenge by addressing analytics software's heavy reliance on memory, a primary cause of both failure (from OOM errors) and underutilized CPUs.
Compute.AI’s advanced algorithms leverage AI and ML for intelligent memory management, core scheduling, and data movement across memory tiers, optimizing CPU usage. Our neural network characterizes compute infrastructure—across memory, network, and cores—using both supervised and unsupervised learning to deliver highly efficient compute. We believe this technology redefines the landscape for the largest open analytics platforms, achieving unparalleled efficiency and performance through the power of AI.
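Compute.AI's actual models and policies are proprietary, but as a deliberately simplified illustration of cost-aware tiering in general, the sketch below decides where to keep each working-set partition based on its estimated reuse and a relative access cost per tier. Every number, name, and heuristic here is invented for the sketch; it is not Compute.AI's algorithm.

```python
from dataclasses import dataclass

# Purely illustrative relative access costs per memory tier (arbitrary units).
TIER_COST = {"dram": 1.0, "local_ssd": 20.0, "shared_ssd": 60.0}

@dataclass
class Partition:
    name: str
    size_gb: float
    expected_reuses: float   # e.g. estimated by a model from the plan and history

def place(partitions, dram_budget_gb: float):
    """Greedy placement: keep in DRAM the partitions whose re-reads would hurt most."""
    placement, used = {}, 0.0
    # Value of DRAM residency ~ re-read cost avoided per GB held resident.
    for p in sorted(partitions, key=lambda p: p.expected_reuses / p.size_gb, reverse=True):
        if used + p.size_gb <= dram_budget_gb:
            placement[p.name] = "dram"
            used += p.size_gb
        else:
            placement[p.name] = "local_ssd"   # colder data moves down the hierarchy
    return placement

demo = [Partition("fact_hash_table", 48, 12.0),
        Partition("dim_customers", 4, 30.0),
        Partition("sort_run_3", 64, 1.0)]
print(place(demo, dram_budget_gb=64))
```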
The CPU-Memory Paradox
CPU and memory are two sides of the same coin—counterintuitive as it may seem. When a core stalls waiting for data from memory, no work is actually getting done. Unlike idle CPU or an I/O wait, a memory stall appears as busy CPU time. So, even the 10-30% CPU utilization seen in Distributed Shared Memory (DSM) systems often includes hidden memory stalls. This is a serious inefficiency! Eliminating these stalls can unlock highly efficient compute, especially as data volumes and job complexity increase.
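One practical way to spot this "busy but stalled" pattern is instructions per cycle: a core reporting 100% utilization while retiring very few instructions per cycle is mostly waiting on memory. The snippet below shells out to Linux perf stat (which must be installed and permitted) around an arbitrary command; the workload at the bottom is a placeholder, and perf's output format can vary slightly across kernels and CPUs.

```python
import re
import subprocess

def instructions_per_cycle(cmd: list[str]) -> float:
    """Run `cmd` under `perf stat` and return instructions retired per CPU cycle."""
    # perf stat writes its counter summary to stderr.
    out = subprocess.run(
        ["perf", "stat", "-e", "cycles,instructions", "--"] + cmd,
        capture_output=True, text=True,
    ).stderr
    counts = {}
    for line in out.splitlines():
        m = re.match(r"\s*([\d,]+)\s+(cycles|instructions)\b", line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    # Raises KeyError if the events were not supported on this machine.
    return counts["instructions"] / counts["cycles"]

# Placeholder workload: substitute the query-engine process you want to inspect.
ipc = instructions_per_cycle(["python3", "-c", "print(sum(range(10**7)))"])
print(f"IPC = {ipc:.2f}  (well below ~1 usually indicates memory-stalled cores)")
```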
Put simply, complex SQL workloads on large datasets perform best with a vertically tiered memory hierarchy on a single node. For smaller, simpler tasks, current architectures work well. But as data sizes and complexity grow, clusters reach their limits, particularly with added concurrency. Compute.AI's solution, combining vertically integrated nodes with elastic clusters, scales better for complex workloads than typical distributed clusters. With NVMe SSDs offering higher bandwidth than network transfers, Compute.AI achieves impressive performance in throughput-heavy analytics by distributing at the SSD tier rather than the DSM level, resulting in major cost savings regardless of workload. Overall, Compute.AI delivers superior price/performance, and often higher absolute performance, without compromise.
Now, back to memory stalls. While the causes of memory stalls are numerous (worthy of a separate article), data movement during memory thrashing is a key culprit. Ideally, data should reach processor cores without causing them to idle, eliminating CPU starvation—essentially a 'busy wait' for data.
Compute.AI minimizes dependence on memory and network resources, collapsing cluster-sized workloads into a single node while improving performance for complex, memory-intensive tasks. By achieving >300x memory overcommitment to cooler SSDs, it significantly reduces power consumption, maximizing both efficiency and performance.
Our Approach To Changing the World
Compute.AI uses ANSI SQL. Its technology is set to make a transformative impact on the relational compute ecosystem by making compute both abundant and efficient. Without such improvements, today's analytics tools, now driven by GenAI, will struggle to meet the rising complexity and concurrency that come with the integration of AI and analytics: AI-driven BI, semantic layer creation, high-concurrency complex SQL, and low/no-code applications. Our platform thrives at this intersection of relational compute and AI/ML.
As massive amounts of auto-generated SQL come into play, computation needs become more pressing than in typical BI platforms, where caching can mask poor concurrency in cloud data warehouses. For instance, returning stale data from a cache to a Tableau dashboard is often acceptable. But with the new era of AI-generated SQL, use cases now demand both high SQL complexity and concurrency. To meet this, relational compute must be as abundant as oxygen—charging premium prices for compute, as some cloud warehouses do, is like charging for every breath. This is why we advocate for open standards, no walled gardens, and accessible compute.
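To make "high concurrency" tangible, the toy harness below fires many machine-generated queries at an engine at once and measures how throughput holds up. It is only a sketch: DuckDB again stands in for whichever engine is under test, and the generated queries are trivial placeholders for real LLM-produced SQL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import duckdb

con = duckdb.connect()

# Stand-ins for machine-generated SQL; real generated queries vary far more.
queries = [
    f"SELECT COUNT(*) FROM range(1000000) AS t(x) WHERE t.x % 200 = {i}"
    for i in range(200)
]

def run(sql: str):
    # Each worker gets its own cursor; a DuckDB cursor is a separate connection
    # to the same database, the documented pattern for multi-threaded use.
    return con.cursor().execute(sql).fetchall()

start = time.time()
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(run, queries))
elapsed = time.time() - start
print(f"{len(queries)} concurrent queries in {elapsed:.1f}s "
      f"({len(queries) / elapsed:.1f} queries/s)")
```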
Compute.AI delivers a ~50x improvement in price/performance for complex, high-concurrency SQL and supports the widely adopted Spark, Presto, and Trino SQL dialects to target a broad range of use cases.
Leveraging Memory Tiers
Subtitle: Compute.AI and the Future of High-Efficiency Compute
Leveraging memory tiers, aka the memory hierarchy, has long been a goal at Compute.AI to maximize value for our customers. Our AI-driven technology has shown that high-complexity, 5–8 node cluster workloads can often be collapsed into a single node by managing memory hierarchy in a highly sophisticated, fine-grained way. Unlike traditional in-memory architectures that rely on compute nodes primarily for memory, Compute.AI’s architecture leverages compute tiers (e.g., AWS EC2) for true computational power. We've demonstrated that running large-scale, complex workloads on clusters with Distributed Shared Memory (DSM) is far less efficient than Compute.AI’s CPU-optimized, memory-dense algorithms—backed by an innovative Distributed Shared SSD architecture. Our approach eliminates the “network tax” on node-to-node data transfers, instead facilitating sharing at the memory tier (such as through SSD-based file systems like AWS EFS).
Compute.AI harnesses commodity hardware to deliver non-linear gains in performance, efficiency, and reliability, leveraging SSDs for both caching (virtualizing DRAM) and persistence (through distributed shared tables and warehouses across an elastic cluster). This aligns perfectly with the advancements in memory tiering that are now commercially viable.
Our elastic Compute.AI cluster brings tremendous value to open data ecosystems, enabling unlimited concurrency for complex compute with superior performance, reliability, and efficiency. This reduces cloud infrastructure costs by 2–5x, and by more as workload complexity increases. Simply download the container, develop with datasets of any size, and run unlimited concurrent jobs on terabyte-scale data, with no management or tuning required. Compute.AI is serverless and operates autonomously, like a well-oiled OS.
If you made it this far, you deserve more than just a TL;DR: you've earned the icing on the cake! Here's to smarter compute, less waste, and maybe even a slice of ROI.
CPU compute might not be 'sexy' anymore—just take a look at the stock of companies making processors. But can GPUs alone carry the load? As appealing as the world of GPUs and tensors may be, real-world applications require a balanced ecosystem: CPUs, GPUs, vast memory resources, memory tiering across CPU-GPU boundaries, and overcoming vendor lock-ins that often stem from CPU-GPU competition. This rivalry can hinder enterprise progress and impact productivity.
The way forward? Let’s build a future that prioritizes technology and innovation, transcending short-term business gains to deliver systems that genuinely empower enterprises.