Privacy-Preserving Machine Learning with Fully Homomorphic Encryption
Most LLMs (Large Language Models) today are trained primarily on publicly available data, which limits their applicability in domains with strict data privacy requirements. While training data shapes these models' capacity, other factors such as model architecture, training techniques, and computational resources are significant as well.
The ability to operate on private data is crucial for many use cases, and unfortunately the one-size-fits-all LLM approach may not always be suitable.
However, techniques exist to adapt LLMs to work with private data securely. These include:
- Federated learning
- Differential privacy
- Zero-knowledge machine learning (ZKML)
- Fully homomorphic encryption (FHE)
Applications of Fully Homomorphic Encryption (FHE) for on-chain use cases have already emerged, thanks to organizations like Zama and Inco.
Now there's potential to bring these primitives to privacy-preserving machine learning (PPML) without compromising data privacy. This benefits model training on private, exclusive, proprietary data, as well as inference on encrypted weights.
What is FHE?
FHE allows:
- Computations to be performed directly on encrypted data, without ever decrypting it
- Encrypted results that, once decrypted by the key holder, match the output of the same computation on the plaintext
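To make this property concrete, here is a minimal sketch of a homomorphic computation using Zama's Concrete library (pip install concrete-python). The function, input ranges, and values are illustrative assumptions for this post, not a definitive implementation:

from concrete import fhe

# Mark both inputs as encrypted; the compiled circuit operates on ciphertexts
@fhe.compiler({"x": "encrypted", "y": "encrypted"})
def weighted_sum(x, y):
    return 2 * x + y

# An inputset of representative values lets the compiler size the circuit
inputset = [(3, 1), (7, 4), (0, 9), (15, 15)]
circuit = weighted_sum.compile(inputset)

# Encrypt, evaluate homomorphically, decrypt; the evaluator never sees 5 or 2
result = circuit.encrypt_run_decrypt(5, 2)
assert result == 2 * 5 + 2  # identical to the plaintext computation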
FHE is particularly relevant for areas where data protection is crucial, including:
- Genomic LLMs for inference or training foundation models
- Patient-specific medical data
- Tailored customer support solutions using personal data
- Secure R&D using proprietary information and intellectual property
- Secure multi-party computation for jointly developing models without sharing individual datasets
As the demand for generative AI in sensitive domains continues to grow, the importance of these techniques cannot be overstated. By allowing computations on encrypted data, FHE enables the development of powerful ML models that can leverage private and proprietary data without compromising privacy. This opens up a wide range of possibilities for industries such as healthcare, finance, and government, where data privacy is of utmost importance. As research in this field advances, we can expect to see more widespread adoption of privacy-preserving machine learning, unlocking the full potential of AI while ensuring the protection of sensitive information.
The drawback, however, is that FHE is inherently compute-intensive, and coupling it with AI/ML could really exacerbate the compute requirements for training or fine-tuning on encrypted data. But there is promise here. For example, until recently most zk proofs were too intensive to run comfortably on local hardware; ICICLE and other GPU acceleration libraries have since emerged (though they're CUDA-based and today lack support for the Apple chipset). More on that here: https://blog.ezkl.xyz/post/acceleration/ Last, FHE FPGAs are also scheduled to hit the market in 2025, and several companies are working on them, including Cornami, Intel, Duality, Fabric, etc.
For a much more detailed overview of privacy-preserving machine learning (PPML), check out this great post from Bagel.
This series of posts by Daniel Huynh of Mithril Security is definitely worth digging into: https://towardsdatascience.com/homomorphic-encryption-intro-part-1-overview-and-use-cases-a601adcff06c
Also check out Zama's Concrete ML: https://docs.zama.ai/concrete-ml
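To give a feel for the workflow, here is a minimal, hedged sketch of encrypted inference with Concrete ML's scikit-learn-style interface (pip install concrete-ml). The dataset, model choice, and parameters are illustrative assumptions, and the exact API may vary between versions, so treat the docs above as authoritative:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from concrete.ml.sklearn import LogisticRegression

# Toy stand-in for sensitive data (e.g., the medical use cases above)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train in the clear, quantize (FHE requires integer arithmetic),
# then compile the model into an FHE circuit
model = LogisticRegression(n_bits=8)
model.fit(X_train, y_train)
model.compile(X_train)

# Inference runs on encrypted inputs; only the key holder can read the result
y_pred = model.predict(X_test[:5], fhe="execute")
print(y_pred)

Note the asymmetry: training here happens on plaintext, and only inference is encrypted. Training directly on encrypted data remains far more expensive, which is exactly the compute challenge discussed above.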
Special thanks to Remi Gai, Bidhan R., Sree Duggirala, and Shrey J. for helping me with this post!
Passionate, strategic, business development servant leader for technology platform partnerships & GTMs.
11 months ago: OpenFHE has made good strides toward unifying FHE libraries and schemes for some time now. I agree that a HAL for acceleration via Intel, new FPGAs, and other compute improvements is needed. Perhaps those new to this world (crypto, AI) and the old-timers (academia, Duality, Microsoft Research…) should come together more often?
Cofounder @ Async Labs | Glean | Facebook | Uber
1 year ago: While I'm familiar with all of the techniques listed, what's unclear to me is whether operating in an encrypted space truly hides identity, which is crucial to privacy. For example, can a chat over medical records avoid matching the patient name when the inference is also happening in encrypted space? In my understanding, FHE is great at data security while keeping the data directly analyzable by those with the private key; hence the use cases in the medical industry. Differential privacy's goal revolves more toward privacy (the inability to identify a unique feature) but comes at the cost of significant quality deterioration. ZKML is great in an adversarial environment, and federated learning for keeping raw data on the edge while sharing learnt patterns, although in the case of LLMs the weights memorize lots of raw data. Lots of nuances here... interesting to think about more, though.
Machine Learning Intern | AI, Cloud Computing, Python Programming | Leveraging tech skills for solving complex problems facing mankind.
1 year ago: The capacity of Fully Homomorphic Encryption (FHE) to facilitate computations on encrypted data without the need for decryption, thereby preserving data privacy throughout computational procedures such as model training and inference, is undeniably revolutionary. The prospective uses of FHE in areas such as genomic LLMs, analysis of patient-specific medical data, and secure multi-party computation offer great potential for advancing these crucial sciences. Indeed, as you correctly highlighted, the computationally demanding nature of FHE presents a notable obstacle, especially when combined with AI/ML workloads. However, the development of GPU acceleration libraries such as ICICLE is a positive advancement in addressing these difficulties, though there are currently restrictions in terms of support for the Apple chipset.
Principal PM Lead at Microsoft with expertise in Machine Learning and AI
1 year ago: Great read, thanks. Never thought of encrypted weights; it makes a lot of sense. It surely comes at a computational cost, but that should shrink with advancements in silicon and local compute for inference.
Strategic Solution Architect in Healthcare – Leadership, Innovation, and Sustainable Partnerships for Success
1 year ago: Acknowledging the complexities of integrating Fully Homomorphic Encryption for LLMs. It's indeed cutting-edge but challenging. How do you see technologies like Secure Multi-Party Computation fitting with these priorities?