Scaleout Systems: Shaping the Future of Federated Learning
Benjamin Wolba
eurodefense.tech | Fostering Defense Innovation for European Sovereignty | Blogging at future-of-computing.com
Training machine learning models demands significant computational resources, so it’s typically done in large data centers where data is aggregated centrally.
However, in many real-world scenarios, data is generated in a decentralized manner, such as by sensor networks, or it is too sensitive to be shared or stored centrally. Processing this data calls for a new approach to model training.
Scaleout Systems was founded in 2018 by Andreas Hellander, Salman Toor, Daniel Zakrisson, Ebba Kraemer, Jens Frid, Morgan Ekmefjord, and Ola Spjuth. The startup enables data scientists to adopt federated learning quickly and smoothly to train models across decentralized data sources while maintaining high standards for data privacy and security. In late 2023, Scaleout Systems raised $1.5M from Navigare Ventures, Almi Invest, and Uppsala University Invest.
Learn more about the future of federated learning from our interview with the co-founder and CEO, Andreas Hellander:
Why Did You Start Scaleout Systems?
I’m an academic by training and have pursued a long career at university. For ten years, I ran a research group with one of my now co-founders, dedicated to researching large-scale distributed compute infrastructure and how best to leverage it for scientific computing.
At the same time, we were looking into managing cloud computing workloads for systems biology and bioinformatics applications using federated learning. The more we dug into it, the more we fell in love with the method itself rather than just its applications.
It seemed like the perfect combination of machine learning and distributed systems orchestration. We realized that this problem, how to juggle machine learning workloads across multiple compute nodes, was far too interesting and impactful to be pursued solely as an academic endeavor.
It’s at the intersection of two big trends: one is edge computing, where compute is shifting from data centers and centralized clouds to edge devices like your smartphone or a chip placed next to a sensor at the edge of a network. The other one is machine learning, where, in 2018, if you had talked to the right people and followed the early breakthroughs, you could see that this field would explode.
While there were practical questions about how best to orchestrate machine learning across distributed data sources, there were also concerns about data privacy and protecting sensitive data.
We were sure that nothing would stop AI’s growth, certainly not data privacy concerns, so we’d better find technological solutions to keep data private before serious privacy and security problems could emerge.
In 2017, Salman Toor, then an academic collaborator and now Scaleout’s CTO, and I longed for an easier way to pursue applied R&D and direct collaboration with industry. We originally started Scaleout as a vehicle for faster, more applied collaboration with enterprises; collaboration processes at a top-tier university can be slow, and they rightfully focus on basic research.
Later that year, I had a coffee to catch up with my old friend Daniel, who had pursued a career in the startup world. We ended up talking about federated learning and how it would become a foundational layer in edge AI. Half an hour later, we had joined forces, and with our other co-founders, we started to work on Scaleout in its current form.
How Does Federated Learning Work?
Federated learning is conceptually simple: the old way of doing machine learning was to bring data to compute, i.e., pool vast amounts of data in a data center to train models. Federated learning instead brings compute to the data, coordinating the training directly at the edge.
We extract the parameters from a neural network, send them to edge devices where the training happens, update the parameters there, and then send them back to be aggregated into a global model. Then, rinse and repeat until we have a model as capable as one trained centrally. It is still early, but we think this is how machine learning will be done at the edge in the future.
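The loop described above can be sketched in a few lines. The following is a minimal illustration of one federated averaging (FedAvg-style) round, with a toy linear model and gradient step standing in for real local training; the function names and the training objective are illustrative assumptions, not Scaleout's actual APIs.

```python
def local_update(weights, data, lr=0.1):
    # Hypothetical local training: per-sample gradient descent on a
    # mean-squared-error objective for a linear model (illustrative only).
    new = list(weights)
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(new, x))
        err = pred - y
        new = [w - lr * err * xi for w, xi in zip(new, x)]
    return new

def federated_round(global_weights, clients):
    # Each client trains locally on its own data; only the updated
    # parameters (never the raw data) are sent back for aggregation.
    updates, sizes = [], []
    for data in clients:
        updates.append(local_update(global_weights, data))
        sizes.append(len(data))
    total = sum(sizes)
    # Weighted average (FedAvg): clients with more data count more.
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(len(global_weights))]
```

Repeating `federated_round` drives the global weights toward what centralized training on the pooled data would produce, while each client's data stays local.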
We can train models from scratch and update them in real time, and we support various forms of machine learning. In particular, we support representation learning, which focuses on automatically discovering useful features from raw data and, in contrast to supervised learning, can process large volumes of data at the edge without requiring labels.
For example, if you train a vision or perception model for autonomous vehicles using representation learning, you can get a base model that is robust to various scenarios, like different weather and geographies.
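To make the label-free idea concrete, here is a toy sketch (my own illustration, assuming nothing about Scaleout's stack): a linear autoencoder that learns to reconstruct 2-D inputs through a 1-D code, so a useful feature direction emerges from the raw data alone, with no labels involved.

```python
def train_linear_autoencoder(data, lr=0.01, epochs=500):
    # Minimal linear autoencoder: a 1-D code between a 2-D input and
    # its 2-D reconstruction. Trained only to reproduce its inputs,
    # so no labels are needed (illustrative sketch only).
    enc = [0.1, 0.1]  # encoder weights (small fixed init)
    dec = [0.1, 0.1]  # decoder weights
    for _ in range(epochs):
        for x in data:
            z = enc[0] * x[0] + enc[1] * x[1]            # encode
            err = [dec[i] * z - x[i] for i in range(2)]  # reconstruction error
            g = err[0] * dec[0] + err[1] * dec[1]        # backprop through decoder
            for i in range(2):
                dec[i] -= lr * err[i] * z  # decoder gradient step
                enc[i] -= lr * g * x[i]    # encoder gradient step
    return enc, dec

def reconstruction_error(enc, dec, data):
    # Mean squared reconstruction error over the dataset.
    total = 0.0
    for x in data:
        z = enc[0] * x[0] + enc[1] * x[1]
        total += sum((dec[i] * z - x[i]) ** 2 for i in range(2))
    return total / len(data)
```

After training on points that lie along one direction, the learned code captures that direction; the same principle, scaled up, is what lets self-supervised pretraining exploit unlabeled edge data.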
What Are the Challenges of Federated Learning?
Of course, it’s complex to train models in a decentralized way. We have built an entire framework to scale the training with the number of edge devices, leveraging our know-how and insights from our research on distributed compute systems to make federated learning resilient, well-distributed, and robust to attacks.
In a data center, you have control over your bandwidth and load balancing, so you can optimize for parallel computation. We don’t get that luxury in federated learning: we have no control over how the data is distributed across edge devices, where it is often unbalanced and non-identically distributed, so we need to adjust our machine learning accordingly.
One of the main challenges at the edge is networking—you’re typically constrained by weak internet connections, so it’s hard to transfer large parameter updates. Training large models, like a large language model, at the edge is still very hard, while training smaller models (up to a couple of GB or so) is easier.
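A quick back-of-the-envelope calculation shows why bandwidth dominates at the edge. The link speed and model size below are illustrative assumptions, not measurements from Scaleout's deployments:

```python
def update_transfer_seconds(n_params, bytes_per_param=4, uplink_mbps=5.0):
    # Time to upload one full model update over a constrained edge link.
    # float32 parameters by default; uplink_mbps is megabits per second.
    payload_bits = n_params * bytes_per_param * 8
    return payload_bits / (uplink_mbps * 1_000_000)

# A 10M-parameter float32 model over a 5 Mbit/s uplink:
# 10e6 * 4 * 8 / 5e6 = 64 seconds per round, before any compression —
# and that cost is paid on every communication round.
```

This is why parameter compression and smaller models matter so much more at the edge than in a data center, where the same transfer is near-instant.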
How Does Federated Learning Compare to Other Privacy-Enhancing Technologies?
There are many so-called privacy-enhancing technologies used to protect sensitive data. Federated learning can be one of them, particularly in addressing input privacy, i.e., protecting the training data that goes into training a model, as it stays with the data owner. It does not address output privacy, i.e., what can be learned about the original training data from the trained model.
We see federated learning as a foundational technology layer that enables training without moving data—in that sense, it’s also complementary to what other privacy-enhancing technologies offer. It accomplishes data protection with comparatively low complexity and is easier to put into production than, say, multi-party computation.
As the transition to more distributed compute infrastructure continues, we think the potential of federated learning can be much bigger than enabling privacy-preserving machine learning. In five years, we won’t be talking about federated learning anymore. It will be the standard way to do machine learning at the edge, and the opportunity is for us to become one of the pivotal solution providers for this new machine learning era.
What’s the Opportunity in Federated Learning?
We bootstrapped Scaleout for five years before raising venture capital, which gave us plenty of time to learn about the market. Enterprise customers feel the data ownership problem in practice, so they’re already aware of it, and we can start the discussion around how we can help them solve it.
We have a freemium model: a free version for personal and research use and small projects, and a paid version for enterprise clients. As we strongly support open source, the federated learning core and all the client-side APIs are Apache 2 licensed.
We did several focused pilot projects to learn how enterprises would integrate our technology. Also, we only do paid pilots, as it’s important to prove that we’re developing a valuable product and not just an academic project. We measure progress not by GitHub stars but by the number and quality of enterprise pilots and revenue.
What Advice Would You Give Fellow Deep Tech Founders?
Founding a deep tech startup is not something you do for three years. You need to have a vision and commit to it for a long time, as it takes time to develop deeply technical solutions and convince enterprises that they will work and that they should buy them.
You need to build resilience and patience. Yet, you can’t be fully prepared for the journey in advance because you’d never get started. And it’s important to get started, as you’ll learn a lot once you commit to a long-term goal and do something great.
Find ways not to rely solely on venture capital. As you’re developing deep tech, you also need to find soft money such as grants; we had some prior experience with this coming from academia. At the same time, it is crucial to have a clearly defined R&D roadmap for how you will develop the technology over the long term into products that eventually address real-world use cases.
Otherwise, you risk doing fragmented, random R&D projects that won’t contribute to your roadmap. Align soft funding with your vision: don’t get distracted by grants into pursuing random side projects; use them to drive your R&D roadmap forward.