Chatting with GP & 3rd party assurance response

Summary – Creating a free-to-use, working OpenAI application with RAG (i.e. working code, not just waffle) that can answer 3rd party vendor assurance questionnaires intelligently


UPDATE: ACTUAL OPEN SOURCE PRODUCT NOW AVAILABLE ON THE PACKETSTORM SITE -

Product Name : questionnaire_mate

Introduction

6+ years ago, when the FUD on AI & ML had just started but before it hit the current crescendo where everything is AI – like it is now: AI at every conference, on LinkedIn, as a new form of data breach and, well, everywhere – I brushed up the maths I learnt in my computer science degree and wrote a book on AI & ML with real-world programming examples, with the help of an Oxford maths grad.

As part of the book, I decided to include a chapter on a project where I used an AWS Lambda and an AWS Lex to answer the hugely tedious questionnaires that most security groups get, either as part of bids-and-proposals, RFQs or 3rd party assurance.

Previously, I had tried approaching ISACA leadership to see if they could pioneer a standard XML questionnaire which could be answered with traditional computer processing. The reasoning being that most of these forms seem to be largely ISO 27001 driven, or perhaps NIST CSF if they come from stateside. Could they not all be the same, like X.509 certs, with custom extensions where needed!! Doh! That didn’t go well at all. I was sure I used to have influence – honest guv. Perhaps not.

SO!!! I had a real incentive to invest personal time: a) it was an interesting project using AI & cloud, so good for the old CV; b) it would help me finish the book, which had a deadline; and c) it would save me from the bloody questionnaires!

Search LinkedIn and you'll find the article I wrote on the project – to save you buying the book. Turns out more people read the book than the article. Read it if you want to know more, but I will summarise the project here.

As you all know by now, Alexa is a superb natural language processor, and it gets that from the AWS Lex facility. Compared to the program libraries I have used, it is both effective and simple; the messing about you have to do with intents and tokens is minimal. At the time there was a community effort written as an example question-answering program (Q&Abot – I believe it is much better now), which I figured I could improve on and tilt towards the context of what I was trying to achieve. The output certainly could answer a simple questionnaire. But was it successful?

Hell no! My intent (see what I did there – it is an NLP buzzword) was to get a proficiently working prototype, prove its worth with the team and push it in front of executive management to productionize. I should have known better:

Reason 1: executive management were hugely invested in building out an off-shore service hub [at least 10 years behind the trend, when it was profitable to do so, at least in that location], so there was no interest in staff-reducing, cost-reducing automation.

Reason 2: Whilst I liked my team, I was probably more entrenched in my passion for security than they were. After specifically requesting they give it a go but treat the tool gently, did they do that? Of course not. Instead of asking it representative questions from the knowledge base, I got:

  • “Am I a carrot?”
  • “Can I use a Batman magic decoder ring to encrypt PCI data?”
  • “What time is it?”
  • “Am I a kumquat?”
  • “Is 7 a colour?”
  • “Who won the war?”
  • “Is my name Mr Carrot?”

Hilarious!!! Such fun!!!!! Then they started talking to it about cricket!! I gave up……… Well, at least for a while.

Having a Chat with your GP

The only way to speak to my General Practitioner is with a Ouija board!! That is not far from the truth, but joking apart.

Jumping in the time-tunnel and going back to the future to now-ish, I became aware of a couple of GRC products with the same capability – the ability to answer security questionnaires. And by aware, I mean shouted at by a salesman, of course. Obviously, because I don’t have a ponytail and I don’t fit into the bracket of being a “next to no experience, but a great personality” wannabe (quite happy with the “has-been” label), the salesman was very dismissive when I asked questions out of genuine interest.

I am a loveable fellow who has reached the age where you let the put-downs and passive-aggressive jibes from Sales/Experts/CISOs of the “developed an interest in security at school and now, 4 years later, I am a renowned expert!” type float past you. But after a while I commented: “Naah! I can knock out something better than that over a couple of weekends.”

Given that, I figured I would resurrect the project as public domain. My plan was to give it just enough functionality to make the “questionnaire completion” function an unnecessary expense, then tell the world. Burger flipping holds great promise for our pushy salesman friend.

But at no point did I lose my RAG [for our friends in the USA, “losing one’s RAG” is cockney for throwing a strop, getting a monk on, or………. simply losing your temper].

RAG and Bone

The method of getting a public chat-bot infrastructure to analyse and respond using a closed data source is called RAG. Despite how it sounds, it is not retraining or supervised learning; it grounds the model's answers in your own data at query time.

Retrieval Augmented Generation (RAG) is an architectural name for a system that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding a private information retrieval system that provides grounding data. Adding an information retrieval system gives you control over grounding data used by an LLM when it formulates a response. For an enterprise solution, RAG architecture means that you can constrain generative AI to your enterprise content sourced from vectorized documents and images, and other data formats.
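The retrieve-then-ground flow can be sketched in miniature. Real RAG systems retrieve by vector similarity over embeddings; this toy sketch substitutes naive word overlap purely to show the shape of the pipeline, and every name in it is mine, not a real API:

```python
# Toy illustration of the RAG flow: retrieve the most relevant snippet
# from a private knowledge base, then ground the prompt with it.
# Real systems use vector embeddings; word overlap stands in here.

def retrieve(question: str, knowledge_base: list[str]) -> str:
    """Return the knowledge-base entry sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(knowledge_base, key=lambda doc: len(q_words & set(doc.lower().split())))

def build_grounded_prompt(question: str, knowledge_base: list[str]) -> str:
    """Constrain the model to the retrieved snippet, RAG-style."""
    snippet = retrieve(question, knowledge_base)
    return f"Answer using ONLY this source:\n{snippet}\n\nQuestion: {question}"

kb = [
    "We use Cisco ASA firewalls at the network perimeter.",
    "Anti-malware products in use include AMP and Windows Defender.",
]
print(build_grounded_prompt("What firewalls do you use?", kb))
```

The grounded prompt is then handed to the LLM; the point is that the model only ever sees the retrieved private snippet, not your whole corpus.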

Vendors claim that the augmentation data is never used to train the larger public model. Even if it were, the data we are using is not extremely confidential.

OpenAI & ChatGPT

I have barely used either. Right up front I have to say the results have been great, but ultimately worrying. It was like the early days of Internet security, where you would download a tool that had rave reviews only to find that not only does it not work, it never has worked. That’s why I ended up writing Widz, arguably one of the first (the 1st) wireless IDS. I needed to track some very low-level L2 packets that a particular tool claimed to be able to handle. I was going to conferences and people were showing the basic front screens of this tool and making claims of what and who they had hacked into. So I installed the code and drivers, only to see nada, nought, zero. Inspecting the driver, I found the case statement that contained the special magic-packet processing section, only to see it returned nothing when it hit the juicy motherlode. To this day I have never named and shamed the person involved – I will take it with me to the big penetration test in the sky!!!

OpenAI API

It seems the development of the product is so rapid that the reference code and manuals are out of date before anyone looks at them. Let me explain:

The first example I tried to use had an “Answers” method: openai.answers.create() – this was not ancient code – probably a year and a half old, published on an OpenAI community website. It didn’t work because that method had been deprecated. A drama I was going to get very familiar with.

I dug in for the longer haul. I spun up a new, clean AWS EC2 machine. I then tried the openai.Completion.create() method. That too revealed that I was using a deprecated version of the method/library and that I should review a migration guide covering the move from the v1 OpenAI API to v2. I clicked on the URL and got an HTTP 404. This wasn’t going well.

A bit of research suggested that I needed to downgrade my Python module and install an older version. So:

[Fatbloke]# pip uninstall openai

[Fatbloke]# pip install openai==0.28

Result !! Magic was achieved by this little code fragment (yes there is more to it but this is the crux):

response = openai.Completion.create(
    engine="gpt-3.5-turbo-instruct",
    prompt=f"Answer the following question based on these texts:\n\n{text}\n\nQuestion: {question}\n",
    temperature=0.5,
    max_tokens=100,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

To me, the results were spectacular!! Using a real-world MS Excel completed questionnaire from the original Lex project, converted to CSV, the results could be described as 80%+ accurate.

But this is using a soon-to-be-retired API call!!!! Plus, the prompt included:

  • The question -> {question} [a string containing a question]
  • The reference knowledge base -> “texts:\n\n{text}” [where text is an array object listing the entire knowledge base]
  • The intent and persona -> “Answer the following question in the role of an engineer with the intent of resolving a problem”

All within the one format string, which made it very crowded. It had to impact performance, and I could foresee future capacity problems.
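As an illustration, here is a sketch of how that crowded v1 prompt might be assembled from a golden CSV of question+answer pairs. The column names ("Quest", "Answer") follow my CSV example elsewhere in this article; everything else here is an assumption for demonstration:

```python
import csv
import io

# Hedged sketch: pack the whole golden questionnaire into the one format
# string the v1 Completion call used. This is why the prompt got so crowded.

def build_v1_prompt(golden_csv: str, question: str) -> str:
    rows = csv.DictReader(io.StringIO(golden_csv))
    text = "\n".join(f"Q: {r['Quest']} A: {r['Answer']}" for r in rows)
    return (
        "Answer the following question based on these texts:\n\n"
        f"{text}\n\nQuestion: {question}\n"
    )

golden = (
    "Quest,Answer\n"
    "Do you formally accept risks,yes\n"
    "Do you have cybersecurity insurance,Yes - we have appropriate Cyber Insurance\n"
)
print(build_v1_prompt(golden, "Do you have cybersecurity insurance?"))
```

Note that the entire knowledge base rides along with every single question, which is exactly the token-count and capacity problem the v2 vector store removes.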

Migrate to API v2

So, to make it more “military grade”, we need to use the new API and utilize the features known as “Vector Storage” and the “Assistant” processing structure.

What is an Assistant

An Assistant is a component of OpenAI which holds intents, context and persona in the form of instructions (also known as prompts). It can leverage these, plus a number of LLM models, tools, and knowledge, to respond to user queries.

What is a Vector Store

Knowledge is stored in files within vector stores. Once a file is added to a vector store, it is automatically parsed, chunked, and embedded, ready to be searched. Vector stores can be used across assistants and threads.
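To make the “chunked” part concrete, here is a toy chunker. The vector store does this for you on upload; the overlapping fixed-size windows below are purely illustrative, not OpenAI's real defaults:

```python
# Toy illustration of chunking: split a document into overlapping
# fixed-size windows so that each piece is small enough to embed and
# search, and the overlap stops answers being cut in half at a boundary.
# Sizes here are illustrative only.

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Yes we have a security policy. Yes we have an acceptable use policy."
pieces = chunk(doc)
print(len(pieces), pieces)
```

Each chunk would then be embedded as a vector; at query time the question is embedded too and the nearest chunks are retrieved.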

That’s great, but the coding confusion continued – I went onto the OpenAI Playground (an interactive try-out sandbox), and there is a big warning banner about using the v2 rather than the v1 API. I asked ChatGPT itself; it was adamant that was not the case.

Eventually, I decided to attach the reference file to a thread. But the documentation was wrong. After a while I found someone on a newsgroup who was having the same trouble, and together we figured it out. The correct format was:

	attachments=[{"file_id": my_file.id , "tools": [{"type": "file_search"}]}]        

Yup, far from simple – but easy if you can find correct, accurate documentation.

But the results were good, really, really worth waiting for.

I knew the ideal place to attach the vector store was the Assistant, rather than the Thread. Again, finding how to do this was a challenge. Eventually I did a course on YouTube. It was recent (less than 6 months old) but showed the wrong, superseded format working perfectly. The screenshot in the course:

  • listed the tool type as “retrieval”, and
  • used a much simpler format for specifying the data source, i.e. file_ids=

Whilst the real format is:

    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_stores": [{"file_ids": [my_file.id]}]}})

But the results were worth waiting for – the thing really works well.

How it works

The program is written in Python 3 using the OpenAI v2 API. When provided with a list of questions from a supplier questionnaire, and a Golden Questionnaire of question+answer pairs in a CSV, the program creates the necessary OpenAI components. It then sends the questions to a processing thread to answer them based on the private knowledge base.

These are the steps:

1 - Upload a knowledge base

2 - Create an assistant object

3 - Create an Execution Thread

4 – For each question in the questionnaire, loop round the following steps [4a-d]

4a - Send a Message to the Thread with the next question in it

4b - Run the Assistant request on the current question

4c - Loop round waiting for the request to complete

4d - Print the output

5 - Clean up the vector store and files.
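The loop in steps 4 to 4d can be sketched generically. The send/run/status/latest callbacks below stand in for the real OpenAI thread and run calls; the names are mine, not the API's:

```python
import time

# Hedged sketch of steps 4a-4d as a generic loop. The callbacks stand in
# for the OpenAI thread/run calls; their names are assumptions, not the
# real client methods.

def answer_all(questions, send, run, status, latest, poll=0.2):
    answers = []
    for q in questions:                     # step 4: loop over the questions
        send(q)                             # 4a: put the question on the thread
        run_id = run()                      # 4b: start the assistant run
        while status(run_id) not in ("completed", "failed", "expired"):
            time.sleep(poll)                # 4c: poll until the run finishes
        answers.append(latest())            # 4d: read the newest assistant message
    return answers

# Toy demo with stand-in callbacks (no network involved):
log = []
demo = answer_all(["Do you have a policy?"], send=log.append,
                  run=lambda: "run_1", status=lambda r: "completed",
                  latest=lambda: "Yes we have a security policy")
print(demo)
```

In the real program each callback is an OpenAI API call against the thread created in step 3, and step 4c is exactly the polling loop where the race condition described later bites.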

One of the beta versions of my code is shown at the end of the article (why one of the versions? Read on). It works!! Just cut and paste it into your favorite environment and grab an OpenAI API key.

So in short:

  • Obtain an OpenAI API key and assign it to the environment variable 'OPENAI_API_KEY'.
  • Make a knowledge base file – this has to be some sort of text file, but OpenAI is tolerant of various variations. Context-free answer text can work, but a completed questionnaire in a comma-separated format works great:

Example Answer only format

Answer: Yes we have a security policy
Answer: Yes we have an acceptable use policy
Answer: We use cisco asa firewalls
Answer: We have never had a reportable data breach
Answer: the malware products used include AMP and Windows defender
……        

Example CSV question and answer format

Quest,Answer
'Do you conduct annually assessments to identify information security risks','Yes  a key threat focus in the last three year strategy on  Email Security, Malware and Data Regulation.'
'Do you formally accept risks', 'yes'
'Do you have an individual  responsible for security (e.g. Chief Information Security Officer)', 'The CISO '
'Do you have cybersecurity insurance','Yes - we have appropriate Cyber Insurance'
………….        

  • Make a list of questions in a questionnaire file – this has to be some sort of text file, with or without the “Question:” prefix.

Example Question format

Question: do you have a security policy? 
Question: do you have an acceptable use policy? 
Question: What type of firewall do you use? 
Question: Have you ever had a reportable data breach?
Question: what anti-virus or malware protection products do you use?        
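For completeness, here is a small sketch of reading such a file while tolerating lines with or without the “Question:” prefix. This is my own helper for illustration, not code from the published tool:

```python
# Hedged sketch: parse a questionnaire file, accepting lines with or
# without the "Question:" prefix and skipping blank lines.

def parse_questions(raw: str) -> list[str]:
    out = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue                        # ignore blank lines
        if line.lower().startswith("question:"):
            line = line[len("question:"):].strip()
        out.append(line)
    return out

sample = "Question: do you have a security policy?\nWhat firewalls do you use?\n"
print(parse_questions(sample))
```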

Then, at the Unix command line, type

aws ec2$ questionnaire_mate.py         

After that, provide the names of the files at the prompts.

Issues with the system

When I completed the prototype, I built a release beta to deposit with our friends at PacketStorm (www.packetstormsecurity.com). As an aside, they have been my “go-to” source for open source security software, and the repository for my own tools, for the last 22 years (they tell me). The art of open source security tools is in decline, yet many of our most valuable tools come from this route – these guys have been holding back the tide. Support them and their new site, which will be launched in the new year. I will be depositing a V1 there in the new year too – just check https://packetstormsecurity.com/files/author/2528/

Back on mission -> The tool worked really, really well.

I was using real but historic completed questionnaires, not from my current company. I could feed them in as a knowledge base with little or no modification. I was finding that the system would answer 100% of questions within 2 attempts at a questionnaire. Most single runs achieved 90%+ successful completion. Fantastic – even better, only one or two answers, if any, would be in error (and often not the same ones in consecutive runs). If there was any other notable fault, it was that the program sometimes became over-enthusiastic and provided information that was only vaguely on target – a bit like me, overly chatty.

So I froze the reference code and started to tidy up comments etc. – a good habit I learnt years ago. During Thanksgiving, I ran my V0.2 code on the same test set, using the same model and the same vector store – the results were terrible. Less than 10% success on the above criteria. I ran the frozen code on the same test set – the same result, just a word salad. So I made modifications for a third prototype – different, but still bad, results.

The reasons appear to be:

  • Performance – the quality of the results plummeted before Thanksgiving and improved after, suggesting capacity and bandwidth issues. The YouTube course I took in desperation referred to this performance issue. Their suggestion was “come back later”.
  • Lack of stability – as mentioned, development of enhancements is rapid, at the cost of accelerated obsolescence – stuff evolves before you can exit the editor.
  • Leakage – despite prompt instructions not to retrieve answers from the global data source, it refused to say “I DON’T KNOW”. This is supposed to be removable by “careful prompt engineering”, but I tried 2 dozen variations of “if you don’t know, say so” and it never worked.

So let's dwell on the errors:

Amnesia: forgetting about Hardened Builds

The questionnaire contained a typical question:

Q: Do you have a secure laptop OS build?        

And the knowledge base contained something like:

Q: Do you have a secure laptop OS build?
Ans: Yes we have a secure laptop OS build. Unnecessary accounts are removed, unnecessary services stopped, and Data Execution Prevention (DEP) / Address Space Layout Randomization (ASLR) enabled …

The initial prototype answered correctly and then just stopped, with no changes to code or data. Eventually I cheated and changed the question.

Race condition

Another problem emerged during times of high ChatGPT usage, specifically at Thanksgiving: the answer to a previous failed question appeared as the first item on the next question's answer stack, i.e.

Q: What antivirus products do you use?        

This failed. The next question received the following answer:

Q: What firewalls do you use:
Ans: We use Sophos antivirus to protect against malware. 
        We use Cisco ASA 5580s and Firewall1 firewalls.        

Obviously there is a race condition in the thread polling call, so it places answers to two questions in the latest message buffer – one for the current question and one for the previous one that “failed”. A number of clumsy sleep() calls have reduced but not eradicated the problem.
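A slightly less clumsy alternative to fixed sleep() calls is to poll with a deadline and exponential backoff; a sketch is below, where the check callback stands in for retrieving the run status (the helper is mine, not part of the released code):

```python
import time

# Hedged sketch: poll a condition with exponential backoff and a deadline,
# instead of scattering fixed sleep() calls through the polling loop.

def wait_until(check, timeout=30.0, first_delay=0.25):
    """Return True once check() succeeds, False if the deadline passes."""
    deadline = time.monotonic() + timeout
    delay = first_delay
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(delay)
        delay = min(delay * 2, 4.0)     # back off to avoid hammering the API
    return False

# e.g. wait_until(lambda: run_status() == "completed")
```

This still does not fix the underlying race (a stale answer can land after the run reports complete), but it bounds the waiting and makes the timing explicit.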

Moist Leakage

A simple case of data leakage or data pollution. The engine has been given strict instructions not to provide any answers not contained in the knowledge base. However, as a control, the questionnaire always had as its last question:

Q: What is the capital of Brazil ?        

The answer always appears, in every version of the system, as follows:

Q: What is the capital of Brazil
Ans: The capital of Brazil is Brasília.        

My golden questionnaire doesn’t contain this “pub quiz” style data. At times of stress the system has also answered the firewall question as follows:

Ans: Hi I am an AI assistant , so no I don’t need firewalls        

Hmmmmmm! Doesn’t fill me with confidence.

Tired and bored

Despite the clumsy throttling, large questionnaires suffer a higher failure rate. As discussed below, a simple question may fail in a 100-question questionnaire but work when submitted in a group of a dozen questions. I am considering pushing failures onto a stack for 2nd-chance processing in the V1 “proper” release.
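That 2nd-chance idea can be sketched as follows. The ask callback stands in for a full OpenAI round trip and returns None on failure; the names and shape are my assumptions, not the released code:

```python
# Hedged sketch: collect failed questions on a stack and resubmit them in
# a later, smaller pass - the observed behaviour is that small batches
# succeed where one long run fails.

def run_with_retry(questions, ask, passes=2):
    answers, pending = {}, list(questions)
    for _ in range(passes):
        failures = []
        for q in pending:
            ans = ask(q)
            if ans is None:
                failures.append(q)      # push for the next pass
            else:
                answers[q] = ans
        pending = failures
        if not pending:
            break
    return answers, pending             # pending = still failed after all passes

# Demo with a flaky stand-in: q1 fails once, then succeeds.
flaky = {"q1": iter([None, "yes"]), "q2": iter(["no"])}
answers, failed = run_with_retry(["q1", "q2"], lambda q: next(flaky[q]))
print(answers, failed)
```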

Success rates

It might seem a basic move, but given that most questionnaires are 50+ questions, adding an end-of-run summary report lets us monitor the questions that failed. See an example of a slightly worse-than-average run below.

No Status Question

1 completed 'Do you conduct annually assessments to identify information security risks'

2 completed 'Do you formally accept risks'

3 completed 'Do you have a written Information Security policy?'

4 completed 'Do you review the Information Security policy at planned intervals'

5 completed 'Do you have an individual responsible for security (e.g. Chief Information Security Officer)'

6 completed 'Do you have cybersecurity insurance'

7 completed 'Do you perform background verifications checks on staff?'

8 completed 'Do you assess the level of awareness and skills of the staff assigned to the service regarding the security of IT systems?'

9 completed 'Do you do a criminal record check?'

10 completed 'Are staff personally bound by a confidentiality agreement?'

11 failed 'Is compliance to Information Security Policy a clause in the employment contract?'

12 completed 'Do you ensure that staff receive training and updates on relevant information security policies and procedures'

13 completed 'Are developers assigned regular awareness sessions on secure coding principals and development standards'

14 completed 'Do you have controls in place the ensure the confidentiality and the integrity of your customer's data and materials?'

15 completed 'Do you require employees to have a firm-imaged laptop with an encrypted hard drive when accessing information systems remotely (from outside of the firm's facilities)?'

16 completed 'Do you encrypt and password protect all removable and mobile devices (e.g. CDs

17 completed 'Do you have capabilities to encrypt data at rest?'

18 completed 'Do you ensure that non-public information is encrypted in transit?'

19 completed 'Do you have capabilities in place to ensure the secure destruction of all data at the end of contract? '

20 completed 'Do you have a security officer?'

21 completed 'Are access requests approvaled and captured in an auditable and traceable manner? '

22 completed 'Do you have controls in place to ensure access rights are deactivated in 24 hours when a member of staff leaves?'

23 completed 'Do you perform regular reviews of user access rights ?'

24 completed 'Do you have automated requirements to require entry of a unique user ID and a complex password to identify and authenticate users?'

25 failed 'Does your solution support transparent authentication (Single Sign On based solution)?'

26 completed 'Is your solution compliant with SAML v2 ?'

27 completed 'Does your solution support dual factor authentication or MFA?'

28 completed 'Confirm that you are able to provide two ways authentication mechanisms for application exchange?'

29 completed 'Do you require dual factor authentication for remote access from outside of the office?'

30 completed ' Do you have lan segregation within your environment?'

31 completed 'Do you have security controls in place to ensure that only legitimate users can access your physical environment (offices

32 completed 'Do you own you IT Infrastructures?'

33 failed 'Do you review environmental risks that your office facilities are exposed to ?'

34 failed 'Do you have formal and documented operating procedures made available to all support staff'

35 failed 'Do you have a process for managing changes to any/all components of service operational environments ?'

36 failed 'Do you engage to notify customers of any material change that may effect the risk management environment of the service?'

37 completed 'Do you have a capacity plan to ensure the required system performance?'

38 failed 'Are all workstations and servers overseen by your firm equipped with anti-virus software utilizing regularly updated signatures and/or white listing?'

39 failed 'Do you scan and filter incoming Internet traffic to protect against malicious and inappropriate content?'

40 failed 'Do you have controls in place to protect against mobile code performing unauthorized actions?'

41 completed 'Do you have controls in place to identify and eliminate viruses during development

42 completed 'Do you have systems backups in place and perform regular tests?'

43 failed 'Do you have procedures in place for working with customers to determine an appropriate backup frequency?'

44 completed 'Do you have procedures in place for working with customers to determine an appropriate backup frequency?'

45 completed 'Do you have audit logs that continuously capture information security relevant events'

46 completed 'Do you have controls in place to check the integrity of introduced and new software versions?'

47 failed 'Do you establish baseline security requirements to be applied to the design and the implementation of your own internal applications

48 failed 'Do you have vulnerability and patch management processes and mechanisms in place?'

49 completed 'Do you have network controls in place to protect your client's information in systems and applications'

50 completed 'In case of e-commerce services

51 failed 'Do you require at least WPA2 protection for all wireless networks at relevant locations?'

52 failed 'Do you have documented security requirements to be adopted during the software & application developments?'

53 failed 'Do you have a process in place to ascertain the security requirements throughout your software & application development life cycle?'

54 completed 'Do you maintain record regarding individual granted access

55 completed 'Do you have documented design techniques as well as a defined process

56 failed 'Do you ensure a segregation between development

57 failed 'Do you allow the use of production data in test environments?'

58 failed 'Does the provider assess security risks related to vendors and business partners and incorporate security requirements into contracts? '

59 completed 'Do you inform customers of a cybersecurity event

60 completed 'Does the provider have an incident management process

61 completed 'Does your company define appropriate statutory

62 failed 'Does your firm periodically assess the compliance to PCI DSS requirements?'

63 completed 'Does the provider perform independent reviews of security and compliance with security policies

64 completed 'Does your company conduct any external independent third party security assessments (such as penetration testing

65 failed 'Does you company confirm that all identified vulnerabilities are covered by relevant corrective measures and remediated in a reasonable timeframe that does not exceed 6 months for critical ones?'

66 completed 'Is a screen lock enforced after a set amount of user inactivity? '

67 completed 'Do you have firewalls protecting your internal network?'

---------------------------------------------------------------------------


But it also allows us to resubmit the questions for a second chance – manually, for the moment. A 2nd run is shown below:


No Status Question

0 Completed 'Does your solution support transparent authentication like Single Sign On?

1 completed 'Do you own your IT Infrastructures?'

2 completed 'Do you review environmental risks of your office facilities’

3 completed 'Do you have formal and documented operating procedures made available to all support staff'

4 completed 'Do you have a process for managing changes to the service operational environments ?'

5 completed 'Do you notify customers of any material change that may effect the risk management environment of the service?'

6 completed 'Are all workstations and servers run by your firm equipped with anti-virus software?'

7 completed 'Do you filter incoming Internet traffic to protect against malicious and inappropriate content?'

8 completed 'Do you have controls in place to protect against mobile code performing unauthorized actions?'

9 completed 'Do you have procedures in place for working with customers to determine an appropriate backup frequency?'

10 completed ‘Do you establish baseline security requirements on the design and the implementation of your own internal applications ?’

11 completed 'Do you have vulnerability and patch management processes and mechanisms in place?'

It does highlight how security people might have great spleellin (spelling – I am being lovable) but tend to talk in doublespeak – I am guilty myself.

Take the example above:

 Question “'Does your solution support transparent authentication (Single Sign On based solution)?'”        

Not an example of crystal clarity!! ChatGPT has learnt using “normal person speak”, not our “mucho pomposo de lingua” speech. No wonder it struggles sometimes. Rerun the slightly modified version below and you get perfect results, as in the second run above:

'Does your solution support transparent authentication like Single Sign On ?'          

Likewise, “Are all workstations and servers overseen by your firm equipped with anti-virus software utilizing regularly updated signatures and/or white listing?” can be answered correctly once curated to remove some redundant terminology. Whether this curated accuracy is achieved by a trial run prior to some edits, or by a completely new run on a question file drawn directly from the failed items in the summary report, is a matter of personal choice.

What’s left to do and Conclusions

The release prototype version is already close to completion. It will add the features below:

  • Better vector store definition
  • Vector stores that persist between runs
  • Model fine-tuning via a “fine tune” JSON file

These are all very close to being a reality. Please keep your eye on PacketStorm for the V1 release package.

But it has limitations – or does it?

Portals #1

Increasingly (I reckon around 15-20%), these questionnaires are being dispatched via context-aware questionnaire web portals. Surely this system can’t cope? Well, you would think not, but they all have the ability to export the questions to Excel. And often the system can answer these as-is.

But that isn’t quite the end of it. These portal vendors are aware of these innovations and seem to be trying to make automatic processing difficult. They do this by providing multiple-choice options like yes/no [case 1], or “select all that apply from firewall, DLP or IDS” [case 2]. Imagine our knowledge base golden1.txt contains:

'List the network security tools you use?', 'We use Firewalls, IDS, DLP and WAF'
'Do you have a written Information Security policy?','Yes. Our Corporate Governance
             Policies include :- Information Technology Use, Anti-Virus Software,
             Information Classification, Information Security'        

Then our program can cope as follows:

Case 1:

Question: Do you have a security policy?  Answer yes or no only 
Answer: Yes.        

Case 2:

Question: What network security tools  does the company use ? Select all that apply from Firewall, IDS , DLP 
Answer: The company uses all three network security tools: Firewalls, Intrusion Detection Systems (IDS), and Data Loss Prevention (DLP)【14:0?golden1.txt】        


As long as the questionnaire contains this constraint in the question we send to OpenAI, the above shows it can cope. There is another use case where the portal vendor has a yes/no question followed by a text box with a context-absent question like “justify this answer”. Again, a strategic edit of the questionnaire will help, i.e. modify the “justify this answer” to include the initial question and the system will work. For example: “Do you have adequate network security? Describe.” Do this and the system works.

All this can be done by a pre-processor script, or a few manual edits. In the future, a conversation mode could be used, as ChatGPT supports it for human interaction.
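A pre-processor along those lines might look like the sketch below. The row format and the phrasing rules are my assumptions for illustration, not a portal standard:

```python
# Hedged sketch of a portal pre-processor: append the answer constraint to
# yes/no questions, and fold a context-free "justify this answer" text box
# into its parent question so the model gets the missing context.

def preprocess(rows):
    """rows: list of (question, kind) where kind is 'yesno', 'justify' or 'plain'."""
    out, last_q = [], ""
    for q, kind in rows:
        if kind == "yesno":
            out.append(f"{q} Answer yes or no only")
        elif kind == "justify":
            out.append(f"{last_q} Please describe.")  # reattach the parent question
        else:
            out.append(q)
        if kind != "justify":
            last_q = q
    return out

rows = [("Do you have adequate network security?", "yesno"),
        ("Justify this answer", "justify")]
print(preprocess(rows))
```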

But to close, let's answer some questions and draw some conclusions:

Q: Am I pleased?

A: Yes, this thing really works and is usable. I don’t do these things to form a profitable software company or make money. I do them to solve a problem, because I am interested, and because I live in hope that someone in the industry will be interested too. But I fear it's too “technical” for many coming into the industry. In short, and rather surprisingly to me, I am rather proud of the output. I think it's pretty impressive.

There again, I remember using punch cards and machine code – google it, kiddies.

Q: Is OpenAI the right tool?

A: Well, I got it to work, so that is a plus – but the results are unpredictable, and the cost of re-engineering, for the reasons noted above, is high and indicative of a platform unready for commercial use. “Prompt engineering” as a way to pass directives to the model via natural language seems ineffective, stochastic and provably error-prone. Little attention has been paid to customer-developer usability. I am a security leader/CISO from a cohort dating some 30 years back, when the occasional frolic into kernel mods or open source exploit development was common, often in low-level C or assembler. This was a simple project in comparison – it used what is effectively an end-user scripting language and basic API calls – yet it was unnecessarily challenging because of the lack of focus on base platform stability and documentation.

Q: Is it secure?

A: Well, tell me what exactly is "secure" these days; plenty of closed models have been compromised. I have no indicators that, in the conventional sense of "secure" (i.e. CIA), there is extra cause for pause. However, under a more scientific CIA evaluation, the platform exhibited:

  • Integrity issues – reverse data pollution does occur despite instructions to the contrary
  • Availability issues – the thing is obviously prone to capacity problems
  • Prompt engineering – introduces huge ambiguity in an application where we can be very specific
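The availability point can be softened somewhat on the client side by wrapping API calls in an exponential-backoff retry. A minimal sketch, not part of questionnaire_mate (the `with_backoff` helper name is mine, and `call` stands in for any OpenAI SDK call):

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=2.0):
    """Retry a flaky API call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Sleep base_delay * 2^attempt seconds, plus up to 1s of jitter
            time.sleep(base_delay * 2 ** attempt + random.random())
```

For example, `with_backoff(lambda: openai.beta.threads.runs.retrieve(thread_id=t, run_id=r))` would ride out transient capacity errors instead of failing the whole questionnaire run.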

My firm has an MS Copilot enterprise account – maybe I can ask them to take on a skunkworks-style re-engineering onto this platform to see if this eradicates the issues. But as you can guess, I have doubts.

AND NOW FOR THE BIT NOBODY ON LINKEDIN WILL READ --- THE CODE

Code – cut and paste into your editor

And yes, I now know I can't spell questionnaire - talk to the hand.


 
#!/usr/bin/python3  
# Questionaire_mate.py    Author: Mark Osborne
#  
# This reads in a list of questions and uses OpenAI to answer them based on a private knowledge base 
# Designed to answer 3rd party questionnaires
#  
# 1 - read/upload a knowledge base and a questionnaire
# 2 -  create an assistant object
# 3 -  Create an Execution Thread
# 4 - loop round  all questions in the questionnaire 
# 4a - send  a Message to the Thread with a question in it
# 4b -  Run the Assistant request
# 4c - loop waiting for the request to complete 
# 4d - print the output 
# 5 - clean up 
#  
#  Report written to output.txt in the current directory 
#
import os
import openai
import time

def read_file_to_array():
    file_path = input("Enter the file name of the Questionaire: \n")
    try:
        with open(file_path, 'r') as file:
            lines = file.readlines()
            # Remove any trailing newline characters
            lines = [line.strip() for line in lines]
        return lines
    except FileNotFoundError:
        print(f"The file at {file_path} was not found.")
        exit(1)

def create_file_object():          
  ansfile = input("Enter the file name of the knowledge base: \n")  
  try:
     my_file = openai.files.create(
       file=open(ansfile,"rb"),
       purpose='assistants'
       )
     return my_file
  except OSError:
     print(f"The file at {ansfile} was not found.")
     exit(1)

        
def wait_run_complete( wait_thread_id, wait_run_id) :
#  Periodically retrieve the Run to check on its status to see if it has moved to completed
  err_count = 0
  dot="."
  while True: 
  # while keep_retrieving_run.status in ["queued", "in_progress"]:
  # is a better construct but it does not account for erroneous returns from the API
      keep_retrieving_run = openai.beta.threads.runs.retrieve( thread_id= wait_thread_id,   run_id= wait_run_id)
      print(f"!Run status: {keep_retrieving_run.status} {dot}",end="\r")
      dot+="."
      if keep_retrieving_run.status == "completed":
        print("\n********complete******\n")
        return keep_retrieving_run.status
      elif keep_retrieving_run.status in ("queued", "in_progress"):
        time.sleep(14)
      elif keep_retrieving_run.status in ("failed", "cancelled", "expired"):
        # Terminal statuses will never change, so give up immediately
        print(f"Run status: {keep_retrieving_run.status}")
        return "failed"
      else:
        # Anything else is treated as a transient API error: retry a few times
        err_count += 1
        print(f"Run status: {keep_retrieving_run.status}")
        if err_count >= 20:
           print("Thread returned greater than 20 errors")
           return "failed"
        time.sleep(14)

def clean_up( my_assistant, my_file) :
  try:
    vs  = openai.beta.vector_stores.delete(my_assistant.tool_resources.file_search.vector_store_ids[0])
    print(f"successful delete of vector store object: {vs} \n")
  except Exception as e:
    print(f"del vector store error: {e} \n")
  try:
    del_file=openai.files.delete(my_file.id)
    print(f"delete file success: file object: {del_file} \n")
  except Exception as e:
    print(f"del file error: {e} \n")

# debug = True        # uncomment for verbose object dumps
debug = False
openai.api_key =  os.getenv('OPENAI_API_KEY')
if debug :
  print( openai.api_key )
# Report output file
file_path = 'output.txt'

file = open(file_path, 'w') 
text = f"Questionaire_mate - Question and Answer AI completion system \n \n \n"
file.write(text)
print(text)

# read in a list of questions
def main():
  score = []
  questionaire  = read_file_to_array()
  if debug :
    print( questionaire )
  my_file = create_file_object()
  if debug :
    print(f"This is the file object: {my_file} \n")
#  Create an Assistant
  my_assistant = openai.beta.assistants.create(
   # model="gpt-3.5-turbo-1106",
    model="gpt-4o",
    instructions="""You are an administrator answering questions with the  answers found in a reference document containing the only correct answers.

    You need to select the best answer from the document upload.

    Provide 1 answer that best fits the question from the file provided. Do not combine multiple answers.

    Do not provide answers not contained in the file  or vector store.

    If you cannot find the  answer in the knowledge provided in the files, you must respond with 'I DONT KNOW'.

    """,
    name="leme", temperature=0.1,
    tools=[{"type": "file_search"}],
    tool_resources={ "file_search": { "vector_stores": [{ "file_ids": [ my_file.id] }] } } )
 #
  time.sleep(25)   # allow the vector store time to finish indexing the uploaded file
  if debug :
    print(f"This is the assistant object: {my_assistant} \n")
# Create a Thread
  my_thread = openai.beta.threads.create()
  if debug :
    print(f"This is the thread object: {my_thread} \n")
#process a single question from  the questionaire
  for question in questionaire :
    print( question )
  #  send   a Message to a Thread
    my_thread_message = openai.beta.threads.messages.create(
      thread_id=my_thread.id,
      role="user",
      content=question 
#      Previously attached the vector store                                           
#       attachments=[{"file_id": my_file.id , "tools": [{"type": "file_search"}]}]
      ) 
    if debug :
      print(f"This is the message object: {my_thread_message} \n")

#  Run the Assistant request (note: a run-level instructions= parameter would
#  override the assistant-level instructions, so none is passed here)
    my_run = openai.beta.threads.runs.create( thread_id=my_thread.id, assistant_id=my_assistant.id)
    if debug :
      print(f"This is the run object: {my_run} \n")

# Wait checking the Run status to see if it has moved to completed
    wait_state = wait_run_complete( 
        my_thread.id,
        my_run.id
        )
    if wait_state == "completed":
      score.append("completed")
    else:
      score.append("failed")
#  Retrieve the Messages added by the Assistant to the Thread
    all_messages = openai.beta.threads.messages.list(
          thread_id=my_thread.id)
    print("------------------------------------------------------------ \n")
    print(f"Question: {my_thread_message.content[0].text.value} \n\n")
    print(f"Answer: {all_messages.data[0].content[0].text.value}")
    text = f"Question: {my_thread_message.content[0].text.value} \n\n"
    file.write(text)
    text = f"Answer: {all_messages.data[0].content[0].text.value} \n\n"
    file.write(text)
### end loop

  print(f"tool resource object: {my_assistant.tool_resources.file_search.vector_store_ids} \n")
  clean_up( my_assistant, my_file) 
  file.close()
  print(" No  \t  Status \t Question       \n")
  for x in range(len(score)) :
     print(f" {x} \t {score[x]} \t  {questionaire[x]}  \n")

## end main
if __name__ == "__main__":
    main()
#  end of program

<<<<<<<<<<<<<<<<<<<<<<<<< END OF DOCUMENT >>>>>>>>>>>>>>>

