SparkCognition AI Studio - A Test Drive
Sanil Pillai
Bridging Human Potential and AI Innovation | Coaching for the Future of Work
Recently, Deven Samant and I had the chance to explore SparkCognition's AI Studio by building a no-code AI pipeline. The opportunity arose thanks to Amir Husain and Jarred Capellman, who initiated a competition that made this experience possible. Our goal for the pipeline was to evaluate how easily Large Language Models (LLMs) could extract insights from resumes and job descriptions, and find correlations between the two. Fortunately, AI Studio enabled us to achieve this effortlessly, without writing any code. Here's an overview of our approach.
5. Connected the text file to a Document node.
6. Connected the Document node to an AI Studio Question Prompt node.
7. Added an LLM node (OpenAI, in this case) and configured it with my OpenAI API key.
8. Connected the Question Prompt node and the LLM node to a langchain node.
9. Ran the pipeline. That's it!
This is what our pipeline looked like on the AI Studio canvas.
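For readers curious what those nodes correspond to in code, here is a rough, illustrative sketch (not AI Studio's actual internals) of the same Document, Question Prompt, LLM, and langchain flow. It assumes the langchain-openai package is installed and an OpenAI key is set in the environment; the file name and question text are placeholders.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# "Document node": load the text the prompt will see
with open('everything.txt') as f:
    document_text = f.read()

# "Question Prompt node": a template that injects the document and a question
prompt = ChatPromptTemplate.from_template(
    "Context:\n{document}\n\nQuestion: {question}"
)

# "LLM node": an OpenAI chat model (reads OPENAI_API_KEY from the environment)
llm = ChatOpenAI(model="gpt-3.5-turbo")

# "langchain node": compose prompt and model into a single runnable chain
chain = prompt | llm

answer = chain.invoke({
    "document": document_text,
    "question": "Which resume best matches this job description?",
})
print(answer.content)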
We managed to run a variety of queries against the pipeline, all successfully.
In a very short amount of time, we had a working Retrieval-Augmented Generation (RAG) setup for LLMs. By contrast, I attempted to build similar functionality in Python, which, as a beginner, took me a few hours and plenty of debugging. The code is below:
#Code to create a vector store in Pinecone from everything.txt
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key='******')

index_name = '384index'
# The index was created ahead of time with dimension=384, matching the
# 384-dimensional embeddings produced by all-MiniLM-L6-v2, e.g.:
#   pc.create_index(name=index_name, dimension=384, metric='cosine')

# Connect to the index
index = pc.Index(name=index_name)

# Read the corpus: one document per non-empty line
file_path = '/Users/sanilpillai/Downloads/everything.txt'
with open(file_path, 'r') as file:
    documents = [line.strip() for line in file if line.strip()]

# Generate one embedding per document
model = SentenceTransformer('all-MiniLM-L6-v2')
document_embeddings = model.encode(documents)

# Upsert each document with a unique ID and its text as metadata
for i, (text, embedding) in enumerate(zip(documents, document_embeddings)):
    document_id = f"doc_{i}"
    index.upsert(vectors=[(document_id, embedding.tolist(), {"text": text})])

# Sanity check: query the index with the first document's vector
query_result = index.query(vector=document_embeddings[0].tolist(), top_k=1)
print(query_result)
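One detail worth double-checking before upserting: the index dimension must match the embedding model, or Pinecone rejects the vectors. A quick sanity check using the same model (this snippet is just an illustration):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
# Should print 384, matching the dimension of '384index'
print(model.get_sentence_embedding_dimension())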
#Code to retrieve relevant documents and generate a query response
from openai import OpenAI
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

# Initialize the OpenAI client
client = OpenAI(api_key='*****')

# Initialize Pinecone and connect to the existing index
pc = Pinecone(api_key='*****')
index_name = '384index'
index = pc.Index(index_name)

# The embedding model must match the one used to build the index
model = SentenceTransformer('all-MiniLM-L6-v2')

def retrieve_documents(query, top_k=3):
    """
    Retrieve the top_k most relevant documents from Pinecone.
    """
    query_vector = model.encode(query).tolist()
    response = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    return [match['metadata']['text'] for match in response['matches']]

def generate_response(document_snippets, prompt, model_name="gpt-3.5-turbo"):
    """
    Generate a response using OpenAI's GPT, grounded in the retrieved documents.
    """
    augmented_prompt = "\n\n".join(document_snippets) + "\n\n" + prompt
    response = client.chat.completions.create(
        model=model_name,
        messages=[{'role': 'user', 'content': augmented_prompt}],
        max_tokens=150,
        temperature=0.7,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    return response.choices[0].message.content

# Example usage: note that the question itself is appended to the prompt
query = "who was an executive between 2005 and 2013?"
document_snippets = retrieve_documents(query)
response = generate_response(document_snippets, "Given the context above, " + query)
print(response)
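With these two helpers in place, the resume-versus-job-description comparisons from our original goal come down to prompt wording. A hypothetical example (the query and prompt text here are illustrative):

snippets = retrieve_documents("senior software engineer job description", top_k=5)
answer = generate_response(
    snippets,
    "Given the context above, which resume is the strongest match for this role, and why?"
)
print(answer)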
The capabilities and benefits of AI Studio became abundantly clear even in this brief exploration. We could iterate much faster without having to worry about the intricacies of vector stores and embeddings. I look forward to its evolution and to the ways it will simplify complex processes for technical and non-technical users alike, making advanced AI tools accessible and user-friendly for everyone!