Utilizing Federated Machine Learning and Privacy-Enhancing Technologies in Multi-Cloud Environments
Graphic by CPO Magazine

Over the past several years, machine learning (ML) and artificial intelligence (AI) have evolved rapidly and are now leveraged across many cloud services and applications. As a result, massive amounts of data are collected, processed, and used to train ML models. A significant challenge, however, is data privacy: customers want to leverage their data to gain business insights and power personalized experiences, but they also want to maintain privacy and control over that data.

Federated learning is a technique that trains an aggregated global model from decentralized local data without directly sharing data samples. Local data stays local; only model updates are exchanged between participants. This approach allows organizations to build machine learning models without compromising data privacy.
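The round-trip described above is often implemented with federated averaging (FedAvg), where a server combines each participant's locally trained weights, weighted by dataset size. The sketch below is a minimal, self-contained illustration using a linear model and plain NumPy; the client data, learning rate, and round counts are illustrative, not taken from any particular platform.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One participant's local training step: gradient descent on a
    linear model. Only the updated weights leave the device."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server-side FedAvg: weight each client's model by its dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Hypothetical demo: two clients with private datasets that are never pooled.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X1, X2 = rng.normal(size=(50, 2)), rng.normal(size=(80, 2))
y1, y2 = X1 @ true_w, X2 @ true_w

global_w = np.zeros(2)
for _ in range(20):  # communication rounds
    w1 = local_update(global_w, X1, y1)
    w2 = local_update(global_w, X2, y2)
    global_w = federated_average([w1, w2], [len(y1), len(y2)])
```

Note that the server only ever sees `w1` and `w2` (the model updates), never the raw `X`/`y` data held by each client.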

Some public cloud providers have started to offer federated learning capabilities within their platforms. For example, Google announced Federated Learning of Cohorts (FLoC) in Chrome, an approach to interest-based advertising intended to avoid tracking individuals across sites. With FLoC, an on-device machine learning model analyzes browsing history and assigns the user to a cohort of people with similar interests; no individual browsing data is shared with other sites or with Google.

Snowflake, a popular cloud data warehouse, provides native federated learning capabilities. Data scientists can build and train machine learning models on encrypted data using Snowflake's Secure Data Sharing (SDS) and federated learning features; the global model is aggregated across participants without exposing the underlying data. BigQuery, Google's enterprise data warehouse, supports a similar pattern through its BigQuery ML product: data scientists build models on encrypted data and exchange model updates between participants to train a single aggregated model.

Some companies are also leveraging foundation and custom large language models (LLMs) trained on huge datasets for various NLP tasks while addressing privacy concerns. Anthropic, an AI safety company based in San Francisco, developed a technique called Constitutional AI to train foundation models to be helpful, harmless, and honest. The models are aligned against a written set of principles (a "constitution"), a process that steers them away from questionable or dangerous behavior on sensitive data. Companies can then fine-tune these models on their private data using federated learning to obtain models customized to their needs.

Federated learning and privacy-enhancing technologies (PETs) provide a path forward for organizations to leverage data and machine learning while maintaining privacy and control. As cloud providers continue to build more advanced and robust federated learning capabilities into their platforms, more opportunities will open up for companies to use data and AI in a responsible and ethical way.
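One PET commonly layered onto federated learning is differential privacy: each participant clips its model update to a bounded norm and adds calibrated noise before sharing it, so the server learns the aggregate without being able to reconstruct any one participant's contribution. The sketch below shows the core clip-and-noise step (a Gaussian mechanism); the `clip_norm` and `noise_multiplier` values are illustrative assumptions, not recommendations from any provider named above.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip an update's L2 norm to clip_norm, then add Gaussian noise
    scaled by noise_multiplier * clip_norm. A participant would run this
    locally before sending its model update to the aggregation server."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = update * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise
```

Clipping bounds each participant's influence on the aggregate, which is what lets the added noise translate into a formal privacy guarantee; the privacy/utility trade-off is tuned through `noise_multiplier`.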

Shivangi Singh

Operations Manager in a Real Estate Organization

5 months

Great article. Currently, researchers and practitioners are pursuing the following four methodologies for addressing limitations in AI systems: Active Learning, Transfer Learning, Federated Learning, and Meta Learning. Active Learning optimizes labeled data usage by iteratively selecting data subsets for labeling based on model confidence. Transfer Learning repurposes knowledge from one task to another, enhancing efficiency. Federated Learning decentralizes model training across devices with private data, addressing privacy concerns. Meta Learning aims to reduce training time and costs by teaching AI systems to learn from diverse data, allowing adaptation to various tasks. However, Meta Learning faces challenges due to the brittle nature of complex AI systems and their sensitivity to noise, requiring extensive data for effective implementation. These methodologies offer potential benefits, such as cost savings, improved model accuracy, and privacy preservation, but each comes with its own set of challenges and considerations. More about this topic: https://lnkd.in/gPjFMgy7
