Using Generative AI for Machine Learning: A Ticket Prioritization Example

With the rise of Large Language Models (LLMs), everyone seems to be exploring ways to solve problems using Generative AI. Some of these use cases are quite fascinating, while others feel more like gimmicks. As a data scientist, I began thinking about how Generative AI could enhance machine learning processes. But before we dive into that, let’s step back a little for those new to the field.

Gen AI vs. Machine Learning: What’s the Difference?

Generative AI and classical Machine Learning are related but distinct areas within AI. Machine Learning involves learning from data, whether supervised or unsupervised, and training algorithms to predict outcomes. The learning aspect is vital here because the models improve as they are trained on more relevant data.

Generative AI, on the other hand, is exemplified by LLMs, which are pre-trained on vast amounts of data. Re-training these models on new data can be highly resource-intensive in both time and cost.

Now, how can we combine the strengths of these two fields to improve machine learning results?

Using Generative AI in Machine Learning: Feature Engineering

In machine learning, feature selection is crucial. This step involves removing irrelevant or redundant inputs, which helps algorithms make better predictions. For instance, if you are predicting the price of a home, you might choose features like the number of rooms, total area, and proximity to the beach.

But to make even better predictions, you would likely create additional features through feature engineering, for example by calculating "cost per square foot". For time series data, libraries like tsfresh are incredibly useful for feature engineering with numerical data. But what about textual data?
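As a quick illustration of the numerical case, here is a minimal sketch of deriving a "cost per square foot" column with pandas. The listings, column names, and values are made up for illustration only.

```python
import pandas as pd

# Hypothetical listings; column names and values are assumptions for this sketch.
homes = pd.DataFrame({
    "price": [300000, 450000, 520000],
    "area_sqft": [1500, 1800, 2600],
    "rooms": [3, 4, 5],
})

# Engineered feature: price divided by total area.
homes["cost_per_sqft"] = homes["price"] / homes["area_sqft"]
print(homes)
```

Note that a ratio built from the price itself would leak the target into a price model, so in practice such a feature would usually be computed from past or neighbouring sales.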

This is where LLMs and embeddings come into play.

Example: Using LLMs for Textual Feature Engineering

Imagine you have a large dataset of Zillow reviews for homes in a specific area, and you want to use these reviews to predict home prices. LLM embeddings can convert these text reviews into vectors. Without going too deep into the technicalities of vectors today, let's focus on their application in machine learning.
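To give a rough intuition for why vectors are useful, the sketch below compares three made-up 3-dimensional vectors with cosine similarity. Real embeddings have hundreds or thousands of dimensions and come from a model; the numbers and variable names here are hand-picked assumptions, chosen so that the two similar reviews point in a similar direction.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for embeddings of three reviews.
review_quiet_street = [0.9, 0.1, 0.2]
review_calm_neighborhood = [0.8, 0.2, 0.3]
review_leaky_roof = [0.1, 0.9, 0.4]

sim_close = cosine_similarity(review_quiet_street, review_calm_neighborhood)
sim_far = cosine_similarity(review_quiet_street, review_leaky_roof)

# Reviews with similar meaning should score higher than unrelated ones.
print(sim_close > sim_far)
```

Because texts with similar meaning end up close together, a downstream model can treat each vector component as an ordinary numerical feature.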

Use Case: Ticket Prioritization with Text Embeddings

Consider another example: You have customer support ticket data and want to prioritize these tickets based on their content. This is a great opportunity to combine machine learning with LLM embeddings.


Prioritizing Support Tickets with Embeddings: Step-by-Step Guide

Here is how embeddings can be applied to prioritize customer support tickets based on textual data. (This is a simplified example, not an actual project.)

Step 1: Generating Dummy Ticket Data

The first step is to create a small dummy dataset of support tickets. In a real project, this is also where you would clean the ticket data by removing any unnecessary noise from the text.

import pandas as pd
import re

# Sample dataset of support tickets
data = {
    'ticket_id': [101, 102, 103, 104],
    'description': [
        'Server is down, critical issue needs immediate attention!',
        'Password reset request for new user.',
        'High latency on our website causing customer complaints.',
        'Minor issue with email notifications, not urgent.'
    ],
    'priority': [1, 3, 2, 4]  # 1 is highest priority, 4 is lowest
}

df = pd.DataFrame(data)
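The dummy descriptions above are already tidy, but real tickets often carry markup, links, and stray symbols. A minimal cleaning sketch using `re` might look like the following; the `clean_text` helper and its specific rules are assumptions for illustration, not part of the original walkthrough.

```python
import re

def clean_text(text):
    """Lowercase, strip HTML-like tags and URLs, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)             # drop HTML-like tags
    text = re.sub(r"https?://\S+", " ", text)        # drop URLs
    text = re.sub(r"[^a-z0-9\s.,!?'-]", " ", text)   # drop unusual symbols
    text = re.sub(r"\s+", " ", text)                 # collapse runs of whitespace
    return text.strip()

# Hypothetical raw ticket text with markup and a link.
raw = "<p>Server is DOWN!!   See https://status.example.com</p>"
print(clean_text(raw))
```

It could be applied to the DataFrame with df['description'].apply(clean_text). Embedding models tend to cope well with natural text, so heavy cleaning is often unnecessary; removing obvious markup and links is usually enough.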
        

Step 2: Generating Text Embeddings

Next, we convert the ticket descriptions into embeddings using a pre-trained model, such as OpenAI's embedding model.

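from openai import OpenAI

# Requires openai>=1.0; the older openai.Embedding.create call was removed in that release
client = OpenAI(api_key='YOUR_API_KEY')

def get_embedding(text, model="text-embedding-ada-002"):
    # Request an embedding for a single text and return it as a list of floats
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding

# One API call per ticket; each cell holds the vector for that description
df['embedding'] = df['description'].apply(get_embedding)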
        

Step 3: Training a Model to Predict Ticket Priority

Once we have the embeddings, they can be used as features to train a machine learning model to predict ticket priority.

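import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Stack the per-ticket embedding lists into a 2-D feature matrix
X = np.array(df['embedding'].tolist())
y = df['priority']

# With only four dummy tickets this split is purely illustrative
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fix the seed so the trained model is reproducible
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)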
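Four tickets are too few to say anything about model quality. The sketch below shows how the same train and evaluate loop might look on a larger set, using synthetic vectors as a stand-in for embeddings so it runs without an API key. The cluster sizes, dimensions, and seeds are arbitrary assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for embeddings: 4 priority classes, 50 tickets each,
# 16-dimensional vectors scattered around a class-specific centre.
n_per_class, dim = 50, 16
centres = rng.normal(size=(4, dim)) * 3
X = np.vstack([c + rng.normal(size=(n_per_class, dim)) for c in centres])
y = np.repeat([1, 2, 3, 4], n_per_class)

# Stratify so every priority level appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Score on tickets the model has not seen during training.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

The synthetic clusters are deliberately easy to separate, so the score here says nothing about real tickets; the point is only the shape of the workflow.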

Step 4: Automating Ticket Prioritization

With the model trained, new support tickets can now be automatically prioritized based on their text content. This automation reduces the manual workload for support teams, allowing them to focus on resolving critical issues first.
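A minimal sketch of that last step is shown below. To keep it runnable offline, it uses scikit-learn's `HashingVectorizer` as a stand-in for the embedding call; this is an assumption for illustration and is not an LLM embedding. The `embed` and `prioritize` helpers and the new ticket texts are hypothetical as well.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer

# Stand-in for the embedding API so the sketch runs without a key.
# In practice, replace embed() with calls to a real embedding model.
vectorizer = HashingVectorizer(n_features=64, alternate_sign=False)

def embed(texts):
    return vectorizer.transform(texts).toarray()

train_texts = [
    "Server is down, critical issue needs immediate attention!",
    "Password reset request for new user.",
    "High latency on our website causing customer complaints.",
    "Minor issue with email notifications, not urgent.",
]
train_priority = [1, 3, 2, 4]  # 1 is highest priority, 4 is lowest

model = RandomForestClassifier(random_state=42)
model.fit(embed(train_texts), train_priority)

def prioritize(new_tickets):
    """Return (priority, text) pairs sorted so the most urgent come first."""
    predicted = model.predict(embed(new_tickets))
    return sorted(zip(predicted.tolist(), new_tickets))

queue = prioritize([
    "Database server is down for all users.",
    "Please reset my password.",
])
for priority, text in queue:
    print(priority, text)
```

With so little training data the predicted priorities are not meaningful; a real deployment would need many labelled tickets and a review step before routing on the model's output.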

Text embeddings are a powerful tool for data scientists. They turn textual data into features that machine learning algorithms can use, which can lead to more accurate and context-aware predictions. While this blog explored ticket prioritization as an example, there are countless other use cases where embeddings can make a significant impact. From sentiment analysis to product recommendations, the possibilities are vast.
