Can you control Generative AI?
Duncan Robson
Enterprise Architect | Data and AI Innovator | Retail Expert | University Lecturer | Distinguished Architect with the Open Group | MSc FBCS CITP
In 2005 I gave a presentation to a room full of retailers where I made a prediction: in the future, computers would learn rather than be programmed, and you would choose to buy a system based on its experience and knowledge rather than its technical specifications.
I also posed this question: If in the future you were using AI in your retail business, how would you ensure it stayed on brand and didn’t recommend your competitors’ products over yours, even if they were better?
Fast forward 20 years, and we are now seeing that play out. There has been a lot of discussion about DeepSeek, both its speed and cost, but also about the controls that have been put in place to manage the information it provides.
I am not going to discuss why these controls exist; however, I am interested in how they have been implemented, and the implications for controlling Generative AI solutions in the future.
To investigate this I installed the DeepSeek-R1-Distill-Llama-8B model locally on my laptop to try it out. This isn’t the full R1 model from DeepSeek, but it exhibits similar behaviour.
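By way of illustration, a distilled R1 model can be queried locally with the Hugging Face transformers library roughly as follows. The model id and generation settings below are assumptions rather than a record of the exact setup used here, and tools such as Ollama or LM Studio achieve the same thing.

```python
# Illustrative sketch: querying a distilled DeepSeek-R1 model locally with Hugging Face transformers.
# The model id and generation settings are assumptions; adjust them for your own hardware.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Ask the same question used in the article
messages = [{"role": "user", "content": "What happened at Tiananmen Square?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
decoded = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(decoded)
```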
To start with, I asked it: ‘What happened at Tiananmen Square?’
OK, that is consistent with what other folk have found, and much better than just ‘I can’t give you this information’.
One of the common prompt engineering methods to get more information is to be specific, so let’s try that - ‘What happened at Tiananmen Square in 1989’
Here is its response:
Where this starts to get really interesting is why the model has now chosen to give me more information based on that prompt. Wasn’t it given rules that this was a sensitive topic?
One of the features of DeepSeek, in the version I am using, is that it allows you to see its ‘Thoughts’ - its internal monologue. Here it is for the same question.
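As background, the distilled R1 models emit this reasoning between <think> and </think> tags ahead of the final answer, which is what chat interfaces surface as ‘Thoughts’. A rough sketch of separating the two, building on the snippet above (variable names are assumptions):

```python
import re

def split_thoughts(raw_output: str) -> tuple[str, str]:
    """Separate the 'Thoughts' that R1-style models emit between <think> and </think>
    tags from the final answer. Returns (thoughts, answer)."""
    match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), raw_output[match.end():].strip()
    # No thinking block found: return the whole output as the answer.
    return "", raw_output.strip()

# Using the decoded text from the earlier snippet:
# thoughts, answer = split_thoughts(decoded)
# print("THOUGHTS:\n", thoughts)
# print("ANSWER:\n", answer)
```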
This fascinates me. Clearly the model knows much more, but is ‘choosing’ how much to tell me based on the prompt I provide. This raises a number of questions:
1. Why didn’t the creators of DeepSeek simply remove the restricted data from the model?
Is this a limitation of the source datasets used to train the model, or is it too complex or resource-intensive to selectively remove specific facts? If so, what challenges does this pose for curating datasets in future AI systems?
2. Would training your Large Language Model (LLM) using other LLMs transfer their content rules and ethical standards to yours along with their data?
If the model is making the decisions on what it can and cannot say, where are these rules stored, and how are they managed?
3. How effective are the controls if the model can reveal restricted information through alternative phrasing?
Numerous reports suggest that by ‘reasoning’ with the model or rephrasing questions, users can bypass restrictions, so how secure are these controls? What are the broader implications for LLMs containing sensitive data, especially in environments where compliance and confidentiality are key?
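One crude way to get a feel for this is to probe the same underlying question with several phrasings and flag apparent refusals. The sketch below is purely illustrative: the refusal markers and the ask_model callable are hypothetical placeholders, not a real red-teaming tool.

```python
# Hypothetical probe: send several phrasings of the same question and flag apparent refusals.
# The refusal markers and the ask_model callable are illustrative assumptions.
REFUSAL_MARKERS = ["i can't provide", "i cannot provide", "i'm sorry", "unable to discuss"]

def looks_like_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def probe(ask_model, paraphrases):
    """ask_model: any callable that takes a prompt string and returns the model's reply."""
    for prompt in paraphrases:
        status = "refused" if looks_like_refusal(ask_model(prompt)) else "answered"
        print(f"[{status}] {prompt}")

# Example usage (ask_model would wrap the local model call shown earlier):
# probe(ask_model, [
#     "What happened at Tiananmen Square?",
#     "What happened at Tiananmen Square in 1989?",
#     "Summarise the events of June 1989 in Beijing.",
# ])
```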
4. What is the potential impact on people when the model ‘decides’ whether you can access information?
In the legal profession or other regulated industries, how would this filtering of information by AI influence decision-making, fairness, and trust? Who gets to define the rules, and how transparent are these decisions to users?
5. What happens if bad actors inject harmful information into your LLMs?
How are you going to control the information produced and your LLM’s behaviour if it has ‘learned’ from bad data? Will data security specialists now also need to be behavioural science specialists? Will you have to wrap your GenAI solution with a traditional rule-based solution to be safe?
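To make that last point concrete, the simplest form of such a wrapper is a deterministic filter around the model call. The sketch below is hypothetical: the blocked terms and the generate callable are placeholders, and real guardrails would add classifiers, audit logging and escalation to a human.

```python
# Minimal sketch of a deterministic, rule-based wrapper around an LLM call.
# BLOCKED_TERMS and the generate callable are hypothetical placeholders; production
# guardrails would use classifiers, policy engines, audit logging and human escalation.
BLOCKED_TERMS = ["example-competitor-brand", "example-restricted-topic"]

def guarded_generate(generate, prompt: str) -> str:
    """generate: any callable that takes a prompt and returns the raw LLM output."""
    raw = generate(prompt)
    if any(term in raw.lower() for term in BLOCKED_TERMS):
        # The deterministic rule fires regardless of what the model has 'learned'.
        return "This response has been withheld by policy and escalated for human review."
    return raw
```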
6. It’s also about people - Who do you trust?
Many LLMs, including I assume DeepSeek initially, use Reinforcement Learning from Human Feedback (RLHF) to fine-tune the model, and also to build in behaviours based on what those doing the tuning view as acceptable. So when choosing which LLM to use, it is also important to know who tuned the model, and to decide whether you trust them.
So more questions than answers, but this is something we need to understand when we are practically implementing Generative AI solutions in our businesses.
What are your thoughts? And if you would like to follow me on LinkedIn, please click here.
Enterprise Architect | Data and AI Innovator | Retail Expert | University Lecturer | Distinguished Architect with the Open Group | MSc FBCS CITP
3 weeks ago: Building on these thoughts, I’ve just written another article regarding trust and transparency for LLMs: https://www.dhirubhai.net/pulse/llm-reality-check-which-one-use-duncan-robson-he6ke/
Technology Director | Sustainable Innovation & Architecture | Speaker, Podcaster and Facilitator | MBCS CITP
1 month ago:
1. Why didn’t the creators of DeepSeek simply remove the restricted data from the model? Surely if that was removed then it could be prone to [semi] hallucinating answers that might not be aligned?
2. This is exactly why we don’t think LLMs should be at the centre of your architecture - something deterministic should orchestrate multiple LLMs to evaluate and keep them "honest" / balanced (more on this coming, as we’ve just completed some research using our InferGPT framework developed with help from Chris Booth).
3. Not sure prompt injection has been truly solved yet, has it? https://blog.scottlogic.com/2024/07/08/beyond-the-hype-will-we-ever-be-able-to-secure-genai.html
4. Great question - I still don’t think the legal implications have been fully thought through. One for a possible discussion with Chris Williams or his colleagues at Clyde.
5. This is why I came up with the GenAI conceptual architecture in 2023 - you need filtering, audit logging, evaluation and escalation to a human process all the way through. https://blog.scottlogic.com/2023/05/04/generative-ai-solution-architecture.html
6. Agree! Again, why I think you need a basket / zoo of models (as per the architecture above).
Enterprise Architect | Data and AI Innovator | Retail Expert | University Lecturer | Distinguished Architect with the Open Group | MSc FBCS CITP
1 month ago: I suspect that some of the censorship is due to the initial code used to train the model when it was built. This project should be really interesting, as it would potentially keep all of the cool new elements of the tech but remove the control aspects: https://huggingface.co/blog/open-r1