SparkCognition AI Studio - A Test Drive
Credit: SparkCognition

Recently, Deven Samant and I had the chance to explore SparkCognition's AI Studio and test how easily a no-code AI pipeline can be built. The opportunity came through a competition initiated by Amir Husain and Jarred Capellman. Our goal was to evaluate how easily Large Language Models (LLMs) can extract insights from resumes and job descriptions and find correlations between the two. AI Studio let us do this without writing any code. Here's an overview of our approach.

  1. Downloaded a dataset of PDF resumes from Kaggle and bulk-converted them to text using pdftotext.com
  2. Downloaded a dataset of CSV job descriptions from Kaggle and bulk-converted them to text using convertio.co/csv-txt/
  3. Combined both text files into everything.txt (lazy with the naming here :-))
  4. Uploaded the text file to the AI Studio repository
  5. Connected the text file to a Document node
  6. Connected the Document node to an AI Studio Question Prompt node
  7. Added an LLM node (OpenAI in this case) and configured it with my OpenAI key
  8. Connected the Question node and the LLM node to a LangChain node
  9. Ran the pipeline. That's it!
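
For readers who would rather script steps 2 and 3 than use the online converters, here is a minimal Python sketch. It is not what we did; the function names, the "column: value" line format, and the paths in `build_corpus` are all hypothetical and only illustrate one way to flatten a CSV and merge it with already-converted resume text.

```python
import csv
import io
from pathlib import Path


def csv_rows_to_lines(csv_text):
    """Flatten each CSV row into a single line of 'column: value' pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        # Collapse whitespace (including newlines inside quoted cells)
        # so that every row ends up on exactly one line.
        parts = [
            f"{key}: {' '.join(value.split())}"
            for key, value in row.items()
            if key and value
        ]
        if parts:
            lines.append(" | ".join(parts))
    return lines


def combine(resume_texts, job_lines):
    """Join resume texts and job-description lines into one corpus string."""
    chunks = [text.strip() for text in resume_texts if text.strip()]
    return "\n".join(chunks + job_lines) + "\n"


def build_corpus(resume_dir, csv_path, out_path):
    """Hypothetical driver: read converted resumes and a CSV, write one text file."""
    resumes = [p.read_text(encoding="utf-8") for p in sorted(Path(resume_dir).glob("*.txt"))]
    jobs = csv_rows_to_lines(Path(csv_path).read_text(encoding="utf-8"))
    Path(out_path).write_text(combine(resumes, jobs), encoding="utf-8")
```

Calling `build_corpus("resumes_txt", "job_descriptions.csv", "everything.txt")` (placeholder paths) would produce a single file comparable to the one we uploaded to the repository.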

This is what our pipeline looked like:

Our TalentAI pipeline in SparkCognition AI Studio

We managed to conduct various successful queries, as illustrated by the examples below:



In a very short amount of time, we had a working Retrieval-Augmented Generation (RAG) setup for LLMs. By contrast, when I tried to build similar functionality in Python, it took me, as a beginner, a few hours and numerous rounds of debugging. The code is below:


# Code to create a vector store from everything.txt
from pinecone import Pinecone

# Client for the Pinecone service (API key redacted)
pc = Pinecone(api_key='******')

index_name = '384index'

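# Create an index if it doesn't already exist. The dimension must match the
# embedding model: all-MiniLM-L6-v2 produces 384-dimensional vectors, not 768.
# (Recent Pinecone clients also require a spec argument, e.g. a ServerlessSpec.)
#if index_name not in pc.list_indexes().names():
#    pc.create_index(name=index_name, dimension=384, metric='cosine')
#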
# Connect to your index
index = pc.Index(name=index_name)

file_path = '/Users/sanilpillai/Downloads/everything.txt'

with open(file_path, 'r') as file:
    documents = [line.strip() for line in file.readlines() if line.strip()]

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
document_embeddings = model.encode(documents)

for i, (text, embedding) in enumerate(zip(documents, document_embeddings)):
    document_id = f"doc_{i}"  # Generate a unique ID for each document
    index.upsert(vectors=[(document_id, embedding.tolist(), {"text": text})])

# Sanity check: query with the first document's own embedding; it should come back as the top match
query_result = index.query(vector=document_embeddings[0].tolist(), top_k=1)
print(query_result)
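
One thing worth noting about the script above: every non-empty line of everything.txt becomes its own document, so a single resume is spread across many small vectors. A possible refinement, which I did not try, is to group consecutive lines into larger chunks before embedding. The sketch below is only an illustration; `chunk_lines` and the `max_chars` default are hypothetical choices, not anything Pinecone or sentence-transformers prescribes.

```python
def chunk_lines(lines, max_chars=500):
    """Group consecutive non-empty lines into chunks of roughly max_chars.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        candidate = f"{current} {line}" if current else line
        if not current or len(candidate) <= max_chars:
            # Still room in the current chunk (or the chunk is empty)
            current = candidate
        else:
            # Current chunk is full: store it and start a new one
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks
```

The resulting list could be passed to `model.encode` in place of `documents`; whether that improves retrieval for this dataset is untested.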

# Code to retrieve query response
import openai
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

# Initialize OpenAI (API key redacted)
openai.api_key = '*****'

# Initialize Pinecone (API key redacted)
pc = Pinecone(api_key='*****')

# Connect to the index populated by the script above
index_name = '384index'
index = pc.Index(index_name)

# Load the embedding model once; it must be the same model used at indexing time
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

def retrieve_documents(query, top_k=3):
    """
    Retrieve the top_k most relevant document snippets from Pinecone.
    """
    # Embed the query with the same model used for the documents
    query_vector = embedding_model.encode(query).tolist()
    response = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    # Each match carries the original text in its metadata
    document_snippets = [match['metadata']['text'] for match in response['matches']]
    return document_snippets

from openai import OpenAI

def generate_response(document_snippets, prompt, model="gpt-3.5-turbo"):
    """
    Generate a response using OpenAI's GPT based on retrieved documents.
    """
    # Put the retrieved snippets first so the model reads them as context for the prompt
    augmented_prompt = "\n\n".join(document_snippets) + "\n\n" + prompt
    client = OpenAI(api_key=openai.api_key)
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': augmented_prompt}],
        max_tokens=150,
        temperature=0.7,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    return response.choices[0].message.content

# Example usage
query = "who was an executive between 2005 and 2013?"
document_snippets = retrieve_documents(query)
# Include the question itself in the prompt; otherwise the model only sees the snippets
response = generate_response(document_snippets, f"Given the context above, {query}")
print(response)
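
The retrieval step in the script above hands the similarity search to Pinecone. To show roughly what that step computes, here is a small pure-Python sketch of a brute-force cosine-similarity lookup. It is an illustration only: `cosine_similarity` and `top_k_matches` are hypothetical helpers, the toy vectors are made up, and a hosted vector store uses far more efficient indexing than this linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_matches(query_vector, stored, k=3):
    """Return the k best (score, doc_id, text) tuples.

    stored is a list of (doc_id, vector, text) tuples.
    """
    scored = [
        (cosine_similarity(query_vector, vector), doc_id, text)
        for doc_id, vector, text in stored
    ]
    # Highest similarity first
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real ones from all-MiniLM-L6-v2 have 384 dimensions.
stored = [
    ("doc_0", [1.0, 0.0], "executive, 2005 to 2013"),
    ("doc_1", [0.0, 1.0], "junior analyst"),
    ("doc_2", [1.0, 1.0], "director of operations"),
]
matches = top_k_matches([1.0, 0.0], stored, k=2)
```

The texts of the best matches are what get joined into the augmented prompt before the LLM call.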

The capabilities and benefits of AI Studio became clear even during our brief exploration. We could iterate much faster without having to worry about the intricacies of vector stores and embeddings. I look forward to its evolution and to the ways it will simplify complex processes for technical and non-technical users alike, making advanced AI tools accessible and user-friendly for everyone!

Lillian Liang Emlet, MD MS CPC ELI-MP

Energy Leadership Coach for Healthcare Professionals | Founder & CEO, Transforming Healthcare Coaching | Contact us for Signature 1:1 & Group Coaching Programs for Healthcare Clinicians | Academic Intensivist | MedEd

9 months ago

This is super interesting: and I appreciate the breaking down of the thought process and output. Thank you for sharing!

Khurram Mahmood

Co-Founder, Ensemble | UT Austin, CMU and Ex-Workday, Oracle, Veeva

9 months ago

Super cool! AI Studio is revolutionary. Thanks for sharing your experience Sanil Pillai

Amir Husain

Founder: Avathon (prev SparkCognition), SkyGrid, Navigate | Author: The Sentient Machine, Gen AI for Leaders, Hyperwar | Board: UT Austin PAIB & CS, WorldQuant Predictive, SpecFive, Global Venture Bridge

9 months ago

Excellent project!
