Creating Your Own Chatbot with Amazon Bedrock
David Carnahan MD MSCE
Healthcare IT expert leading Veteran and Military Health initiatives at AWS
The opinions expressed in this blog are my own, and do not necessarily represent the opinions of Amazon, Amazon Web Services, or any other entity not named David Carnahan.
This project was A LOT of fun.
My goal was to use Amazon Bedrock and LangChain for the backend and Streamlit for the front end ... and then to test out two or more different Large Language Models.
Source of Inspiration: Udemy Course by Rahul Trisal
Version 1: Streamlit Chatbot with Bedrock & Llama2
High Level Architecture
For this app ... there is a frontend script and a backend script.
Front end
The front end contains the following:
- Title
- Memory & Chat history
- Input text
- Session state
# import libraries
import streamlit as st
import chatbot_backend as demo

# parameters
st.title(":star-struck: Amazon Bedrock Chatbot")  # title

# add langchain memory to session state
if 'memory' not in st.session_state:
    st.session_state.memory = demo.demo_memory()

# add chat history to session
if 'chat_history' not in st.session_state:
    st.session_state.chat_history = []

# render chat history
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.markdown(message["text"])

# input text box for chatbot
input_text = st.chat_input("Powered by Amazon Bedrock & Claude 2")
if input_text:
    with st.chat_message("user"):
        st.markdown(input_text)

    # Append user input to chat history
    st.session_state.chat_history.append({"role": "user", "text": input_text})

    # Generate chat response using the chatbot instance
    chat_response = demo.demo_conversation(input_text=input_text, memory=st.session_state.memory)

    # Display the chat response
    with st.chat_message("assistant"):
        st.markdown(chat_response["response"])

    # Append assistant's response to chat history
    st.session_state.chat_history.append({"role": "assistant", "text": chat_response["response"]})
I adjusted a few lines in the code so that you always display the "response" key of the chat_response ... rather than the whole history, as coded in the original course.
Backend
The backend is straightforward as well. It contains three functions that mimic the high level architecture shown above.
- The Bedrock connection (using langchain) + model parameters for API
- Chat message (conversation) and response
- Memory & chat history
# import modules
import os
import boto3
from langchain.llms.bedrock import Bedrock
from langchain_anthropic import AnthropicLLM
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

# function that invokes the bedrock model (Llama 2 version, commented out)
# def demo_chatbot():
#     demo_llm = Bedrock(
#         credentials_profile_name="default",
#         model_id="meta.llama2-70b-chat-v1",
#         model_kwargs={
#             "temperature": 0.9,
#             "top_p": 0.5,
#             "max_gen_len": 512
#         }
#     )
#     return demo_llm

# function that invokes the bedrock model (Claude 2 version)
def demo_chatbot():
    demo_llm = Bedrock(
        credentials_profile_name="default",
        model_id="anthropic.claude-v2:1",
        model_kwargs={
            "temperature": 0.9,
            "top_p": 0.5,
            "max_tokens_to_sample": 512
        }
    )
    return demo_llm

# function for conversation memory
def demo_memory():
    llm_data = demo_chatbot()
    memory = ConversationBufferMemory(
        llm=llm_data,
        max_token_limit=512
    )
    return memory

# function for conversation
def demo_conversation(input_text, memory):
    llm_chain_data = demo_chatbot()
    llm_conversation = ConversationChain(
        llm=llm_chain_data,
        memory=memory,
        verbose=True
    )
    # chat response using invoke (prompt template)
    chat_reply = llm_conversation.invoke(input=input_text)
    return chat_reply
There are several things I want to point out:
- The first Bedrock function is commented out ... it's the one that used Llama 2 -- we'll talk about why I did that, and why I decided to use Claude 2, in a little bit.
- You can switch between a verbose and a non-verbose version of the chatbot by toggling that parameter between True and False; verbose=True prints the full prompt and chain activity to the console. The max token limit is a separate knob that controls how long the responses can be.
- This script instantiates the LLM more than once (in demo_memory and again on every call to demo_conversation) rather than creating it a single time, which we will also talk about later. That likely hurts the chatbot's performance, but for such a small project it doesn't matter much. Still, it is better to instantiate the model once and then reuse it for every step that needs it -- a minimal sketch of that approach follows this list.
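To make that last point concrete, here is a minimal sketch (not the course's code, just an illustration that assumes it lives in the same backend module as demo_chatbot above) of creating the Bedrock LLM once and reusing it; it also shows where the verbose toggle lives:

# create the Bedrock LLM once at module load and reuse it everywhere
shared_llm = demo_chatbot()

def demo_memory():
    # reuse the shared LLM instead of building a new one
    return ConversationBufferMemory(llm=shared_llm, max_token_limit=512)

def demo_conversation(input_text, memory):
    llm_conversation = ConversationChain(
        llm=shared_llm,   # reuse the shared instance
        memory=memory,
        verbose=False,    # flip to True to print the full prompt for debugging
    )
    return llm_conversation.invoke(input=input_text)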
Llama 2, 3, 4, ...
When I used Llama 2, there was a lot of verbosity ... some of it took the form of a conversation between human and AI, but the problem is that I wasn't the one writing the human prompts. Look at the image below to see what I mean.
I provided the initial question ... but Llama 2 went on to generate additional 'human' questions and then answered them itself. I'm not sure why ... but the behavior disappeared when I switched the model from Llama 2 to Claude 2.
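One workaround I did not end up needing (a hedged sketch, not part of the course code) is to truncate the reply at the first point where the model starts writing its own 'Human:' turn before rendering it in Streamlit:

def trim_hallucinated_turns(reply_text):
    """Hypothetical helper: cut the reply off where the model begins
    inventing a new 'Human:' turn on its own."""
    for marker in ("\nHuman:", "\nhuman:"):
        idx = reply_text.find(marker)
        if idx != -1:
            return reply_text[:idx].rstrip()
    return reply_text

# usage in the front end:
# st.markdown(trim_hallucinated_turns(chat_response["response"]))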
Version 2: Streamlit Chatbot with Bedrock & Claude 2
I took the code for the frontend and backend and asked an AI assistant to improve its efficiency; the following is the code it provided.
Front end
import streamlit as st
from efficient_backend import DemoChatbot  # Adjusted import to match the new backend structure

# Initialize chatbot instance & history if not already in session
if 'chatbot' not in st.session_state:
    st.session_state.chatbot = DemoChatbot()
if 'chat_history' not in st.session_state:
    st.session_state.chat_history = []  # Ensuring chat_history is initialized

# Title chatbot
st.title("?? Efficient Chatbot")  # Title with an emoji

# Render chat history
for message in st.session_state.chat_history:
    with st.chat_message(message["role"]):
        st.markdown(message["text"])

# Input text box for chatbot
input_text = st.chat_input("Powered by Claude 2")
if input_text:
    with st.chat_message("user"):
        st.markdown(input_text)

    # Append user input to chat history
    st.session_state.chat_history.append({"role": "user", "text": input_text})

    # Generate chat response using the chatbot instance
    chat_response = st.session_state.chatbot.demo_conversation(input_text)

    # Display the chat response
    with st.chat_message("assistant"):
        st.markdown(chat_response["response"])

    # Append assistant's response to chat history
    st.session_state.chat_history.append({"role": "assistant", "text": chat_response["response"]})
Backend
import os
import boto3
from langchain.llms.bedrock import Bedrock
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

class DemoChatbot:
    def __init__(self, model_id="anthropic.claude-v2:1", model_kwargs=None, max_token_limit=512):
        self.model_id = model_id
        self.model_kwargs = model_kwargs if model_kwargs is not None else {
            "temperature": 0.9,
            "top_p": 0.5,
            "max_tokens_to_sample": 512
        }
        self.max_token_limit = max_token_limit
        self.llm = self.init_llm()
        self.memory = self.init_memory()

    def init_llm(self):
        try:
            demo_llm = Bedrock(
                credentials_profile_name="default",
                model_id=self.model_id,
                model_kwargs=self.model_kwargs
            )
            return demo_llm
        except Exception as e:
            print(f"Failed to initialize the LLM: {e}")
            return None

    def init_memory(self):
        if self.llm is not None:
            try:
                memory = ConversationBufferMemory(
                    llm=self.llm,
                    max_token_limit=self.max_token_limit
                )
                return memory
            except Exception as e:
                print(f"Failed to initialize memory: {e}")
        return None

    def demo_conversation(self, input_text):
        if self.llm is not None and self.memory is not None:
            try:
                llm_conversation = ConversationChain(
                    llm=self.llm,
                    memory=self.memory,
                    verbose=True
                )
                chat_reply = llm_conversation.invoke(input=input_text)
                return chat_reply
            except Exception as e:
                print(f"Error during conversation: {e}")
        return None

# Usage example
if __name__ == "__main__":
    chatbot = DemoChatbot()
    user_input = "Hello, how can you help me today?"
    response = chatbot.demo_conversation(input_text=user_input)
    print(response)
I really like the object-oriented code for the backend of the 'efficient chatbot'. I deleted the usage example and then tested it locally with the Streamlit CLI:
> streamlit run your_frontend_code.py
Finally, a couple of things I ran into that I want to make you aware of ...
- Each model has its own parameters ... if you look at the Llama 2 code above, you'll notice the parameter "max_gen_len": 512, which caps the length of the generated response; Claude 2 names the equivalent parameter differently -- "max_tokens_to_sample": 512. You'll need to check the model's API parameters in Bedrock (or other documentation) to know which parameters the model of interest expects (see the side-by-side sketch after this list).
- Additionally, you'll need to look up the model ID in the same place in order to call the model (and you'll need to have been granted access to that model for the API call to work).
- Finally, you'll need to configure your AWS credentials in your IDE of choice, as always. There is a lot of documentation out there on how to do this.
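To make the parameter differences concrete, here is a minimal side-by-side sketch (the model IDs and parameter names are the ones used earlier in this post; confirm them in the Bedrock console for your region and model version):

from langchain.llms.bedrock import Bedrock

# Llama 2 on Bedrock caps response length with "max_gen_len"
llama2_llm = Bedrock(
    credentials_profile_name="default",
    model_id="meta.llama2-70b-chat-v1",
    model_kwargs={"temperature": 0.9, "top_p": 0.5, "max_gen_len": 512},
)

# Claude 2.1 on Bedrock calls the same idea "max_tokens_to_sample"
claude2_llm = Bedrock(
    credentials_profile_name="default",
    model_id="anthropic.claude-v2:1",
    model_kwargs={"temperature": 0.9, "top_p": 0.5, "max_tokens_to_sample": 512},
)

The "default" profile referenced above is whatever you set up with aws configure (or your IDE's AWS toolkit).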
Where do you find these API details?
You go to the Bedrock Providers section ... click on the LLM provider (in my case Anthropic) and then the model version (Claude 2.1).
Then scroll down to the API section and you'll find the modelId and the different parameters in the request body ... notice the 'max_tokens_to_sample' parameter?
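For reference, this is roughly what that API request looks like if you call Bedrock directly with boto3 instead of going through LangChain (a sketch based on the request body shown on that page; the Human/Assistant framing is the prompt format Claude 2 expects):

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# request body with the Claude 2 parameter names from the Bedrock API page
body = {
    "prompt": "\n\nHuman: What can you help me with?\n\nAssistant:",
    "max_tokens_to_sample": 512,
    "temperature": 0.9,
    "top_p": 0.5,
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-v2:1",
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["completion"])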
The Results
Original with Claude 2
Efficient with Claude 2
Conclusion
Using Amazon Bedrock is pretty straightforward once you have credentials. Adding LangChain for the memory capability and Streamlit for a basic UI turned out to be fairly simple as well. I know there can be as much complexity as the use case demands, but hopefully this shows you what is possible with some elbow grease and time to read the docs.