GenAI n00b, Part 2
This article is Part 2 of the multi-part series “GenAI n00b”. Part 1 is here - https://www.dhirubhai.net/pulse/genai-n00b-part-1-dmitry-grinberg-cdzye/
I decided to give it a bit of space after Part 1 to iterate on a few of the concepts I started with and see what would matter more to share in detail. I ended up building a Chain-of-Thought / Reason-Act Proof of Concept (POC) with a few very interesting takeaways. Though I am compelled to follow through on Part 1’s promise to unwrap all the briefly described points in a lot more detail, I believe it is more valuable for me to share some of the nuances of my POC. Most of these nuances build on the previous topics, so it's not entirely a deviation.
For the POC, I wanted to build a bot that can answer complex questions by relying on a mix of a structured dataset, the ability to look up/search the internet where needed, and a few simple functions where appropriate, all decided by the LLM itself (not a programmed algorithm).
I wanted to be able to use either OpenAI or a custom LLM, so the route I took had to work for both, so I could compare, etc. Since this was a POC, I didn’t want to build any façades or a polymorphic implementation to generate different representations based on the model used.
Here is what the setup looks like for the POC, simplified to a single "tool" that looks at a specific dataset for answers or falls back on the LLM's own knowledge. Adding more tools is just a matter of enumerating them (use your imagination reading the template in Step 4 below) - I've implemented half a dozen, and the limit is obvious (max tokens).
Step 1. Using Colab, get things installed. Make sure to pick the correct GPU up front, since switching resets everything (everything has to be reinstalled and redownloaded). I mostly used the V100, as the A100 burns through credits too fast; though if you need to iterate through a lot of open-LLM chatter, the A100 will save a ton of time (it responds much faster). Use the cheapest option if you're only calling OpenAI (it doesn't need a GPU).
# in order to use the GPU, you need this command for the llama-cpp-python wrapper
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
# OpenAI API
!pip install openai
# using Jinja2 to templatize prompts. Concatenating or
# formatting strings is pathetic, even for POC.
!pip install Jinja2
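A quick sanity check is worth doing before anything else, since the GPU you get determines both speed and credit burn; nvidia-smi is available in Colab GPU runtimes and shows which card you were allocated.
# confirm which GPU Colab allocated (V100, A100, T4, ...)
!nvidia-smi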
Step 2. I’ve settled on llama-cpp-python for a couple of reasons. I’ve tried many options and determined this is the best way to go: partially because it supports streaming (not much else does), and it just works (I ended up not really needing the kitchen sink that many libs offer). Going beyond a POC might change my mind, though that's unlikely (I'll describe it in later parts, if I ever get that far).
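To illustrate the streaming support that tipped the scale, here is a minimal sketch; the model path is a placeholder, and the call pattern is the standard llama-cpp-python completion API.
from llama_cpp import Llama

# placeholder path; any GGUF model works the same way
llm = Llama(model_path='some-model.gguf', n_ctx=4000)

# stream=True yields chunks as they are generated instead of one final blob
for chunk in llm('### Instruction:\nSay hello.\n### Response:\n',
                 max_tokens=64, stream=True):
    print(chunk['choices'][0]['text'], end='', flush=True)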
Step 3. Get the dataset that you want to make sure the bot understands. It could be of many different kinds. I’ve tested mostly with “drgilermo/nba-players-stats” from Kaggle, which has 3 CSVs (the name explains what’s in there). I also loaded Insurance Claims, Stocks, Housing Data, and many others. It works equally well, as long as you don't go nuts with the number of CSVs and the columns in each.
import os

dataset_name = 'some name of the dataset from Kaggle'
dest_path = 'destination folder, where dataset will be unzipped'
# set credentials before importing the kaggle package
# (some versions authenticate at import time)
os.environ['KAGGLE_USERNAME'] = 'your Kaggle username'
os.environ['KAGGLE_KEY'] = 'your Kaggle key'

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()
# Now, we exercise our power to summon the dataset from the Kaggle dimension.
api.dataset_download_files(dataset_name, path=dest_path, unzip=True)
# then code, that I'm omitting here, which iterates over the
# downloaded csv datasets from dest_path and loads each as a
# pandas DataFrame using pd.read_csv into a local dict
# called 'sandbox_locals'.
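That omitted loader might look roughly like this. It's only a sketch: the naming convention (Seasons_Stats.csv becoming Seasons_Stats_csv) and exposing each frame as a top-level variable are my inference from the example output further down.
import glob
import os
import pandas as pd

sandbox_locals = {}
frames = {}
for path in glob.glob(os.path.join(dest_path, '*.csv')):
    # Seasons_Stats.csv -> Seasons_Stats_csv, a valid python identifier
    name = os.path.basename(path).replace('.', '_')
    frames[name] = pd.read_csv(path)
    # expose each frame under its own name so generated python can reference it
    sandbox_locals[name] = frames[name]
# 'df' is what the Jinja2 template below expects: a single dataframe or a dict of them
sandbox_locals['df'] = next(iter(frames.values())) if len(frames) == 1 else frames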
Step 4. Set up the prompt for the DataFrame data as a Jinja2 template:
prompt = """
### Instruction:
You should respond in valid JSON format only.
{{history}}
You should use the tools below to answer the question posed of you:
python_repl_ast: A Python shell. Use this to execute python commands.
Input should be a valid python command, that does not return errors. When
using this tool, sometimes output is abbreviated - make sure it does not
look abbreviated before using it in your answer.
Always use the following format for response:
{
"Question": "the input question you must answer",
"Thought": "you should always think about what to do",
"Actions":
[
{
"Function": "The action to take, should be one of
[python_repl_ast]",
"Input": "The input to the action.",
},
# you can have multiple Actions
],
"Answer": "The final answer to the original input question. Provide
details.",
"Is Answer Final": "Yes if the answer is final, No if not."
}
{% if df is mapping %}{% for key, value in df.items() %}
You are working with a pandas dataframe in Python. The name of the
dataframe is `{{key}}`.
This is the result of `print({{key}}.head(2))`:
{{value.head(2)}}
{% endfor %}
{% else %}
You are working with a pandas dataframe in Python. The name of the
dataframe is `df`.
This is the result of `print(df.head(2))`:
{{df.head(2)}}
{% endif %}
Begin!
Question:
{{user_question}}
### Response:
"""
When the actual text prompt needs to be generated, all that needs to be done is this:
from jinja2 import Template

template = Template(prompt)
# the sandbox_locals dict should have everything needed to render;
# in this POC, the dataframe is loaded under the key 'df' as either a single
# DataFrame or a dict of DataFrames
use_prompt = template.render(sandbox_locals)
Step 5. Load either OpenAI or the custom LLM.
# download model if using local model
model = 'deepseek-coder-33b-instruct.Q4_K_M.gguf'
model_path = 'TheBloke/deepseek-coder-33B-instruct-GGUF'
# this will use huggingface-cli to download the proper model from huggingface
!huggingface-cli download {model_path} {model} --local-dir . --local-dir-use-symlinks False
import os
from llama_cpp import Llama
# GPU_LAYERS is the number of layers offloaded to the GPU; more layers
# increase performance, but it depends on available memory
llm = Llama(model_path=model, n_ctx=4000, n_threads=4,
            n_gpu_layers=int(os.environ['GPU_LAYERS']))
# -----------------------------
# OR
# -----------------------------
from openai import OpenAI
llm = OpenAI()
# the logic elsewhere does the switching, etc.; omitting here for brevity
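That switching logic is omitted from the POC description, but a minimal sketch of what such a dispatcher could look like is below; the function name, the flag, and the OpenAI model name are mine, not the actual POC code.
def complete(use_prompt, use_openai=False, max_tokens=1500):
    # single entry point regardless of backend; returns the raw text response
    if use_openai:
        # llm is the OpenAI() client here; model name is just an example
        resp = llm.chat.completions.create(
            model='gpt-4',
            messages=[{'role': 'user', 'content': use_prompt}],
            max_tokens=max_tokens,
        )
        return resp.choices[0].message.content
    # llm is the llama-cpp-python Llama instance here
    out = llm(use_prompt, max_tokens=max_tokens, temperature=0)
    return out['choices'][0]['text']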
Step 6. Implement the function that takes the user's input and passes it to the LLM, which follows the instructions from the prompt. If the response is not a final answer, the call is repeated with the results included (as 'history'). You can't repeat too many times before the token limit is reached; I've gone up to 10 iterations (as long as the datasets are not too complex). This implementation is a few pages long, so I'm skipping it here, but I might share it later as a link to the source.
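Since the full implementation is a few pages long, here is a heavily condensed sketch of the loop instead, assuming the complete() dispatcher sketched above and a run_python() helper (sketched further down, under point B) that executes a string as python against sandbox_locals:
import json
from jinja2 import Template

MAX_TURNS = 10  # stop before the context window fills up

def print_answer(user_question):
    history = ''
    for _ in range(MAX_TURNS):
        sandbox_locals['user_question'] = user_question
        sandbox_locals['history'] = history
        use_prompt = Template(prompt).render(sandbox_locals)
        raw = complete(use_prompt)
        # a real implementation needs more robust JSON extraction / retries here
        response = json.loads(raw)
        if response.get('Is Answer Final') == 'Yes':
            print(response['Answer'])
            return
        # run every action the model asked for and feed the output back as history
        results = []
        for action in response.get('Actions', []):
            if action['Function'] == 'python_repl_ast':
                results.append(run_python(action['Input']))
        history += raw + '\n' + '\n'.join(results) + '\n'
    print('No final answer within the turn limit.')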
Here is a prompt example for the NBA Stats dataset, using the deepseek-coder-33b-instruct.Q4_K_M.gguf model.
print_answer("Take player stats that determine MVP, then use the stats to
figure out who is MVP in 2015.")
This is what it responds with:
{
"Question": "Take player stats that determine MVP, then use the stats
to figure out who is MVP in 2015.",
"Thought": "To find the MVP of a specific year, we need to filter the
dataframe by the year and sort it based on some criteria. The most common
criteria are points (PTS), assists (AST), rebounds (TRB) etc. For
simplicity, let's use PTS.",
"Actions": [
{
"Function": "python_repl_ast",
"Input": "Seasons_Stats_csv['Year'] == 2015"
},
{
"Function": "python_repl_ast",
"Input": "MVP_2015 =
Seasons_Stats_csv[Seasons_Stats_csv['Year'] == 2015].sort_values('PTS',
ascending=False)"
}
],
"Answer": "The player with the most points in 2015 is MVP. The name of
the player can be found by accessing the 'Player' column of the first row
of the sorted dataframe: MVP_2015['Player'].iloc[0].",
"Is Answer Final": "No"
}
So now a few things will happen:
A. The tool it thinks should be used for the 2 actions above is python_repl_ast, which in both cases should execute the python code that it thinks will get to the proper result.
B. The code is executed, with the end result being the MVP_2015 dataframe it created (printed to show it, and also to be used by the LLM subsequently). More specifically, I have written a bunch of code that takes a string (say "Seasons_Stats_csv['Year'] == 2015") and executes it as python code (I might go into the nuances of how this is done some other time).
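The run_python() helper referenced in the loop sketch above could be as simple as this; it's a sketch without the guardrails a real version needs, and it assumes sandbox_locals is the dict holding the dataframes.
import contextlib
import io

def run_python(code):
    # capture anything printed while the generated code runs
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        try:
            # try eval() first so bare expressions return a value we can print
            result = eval(code, {}, sandbox_locals)
            if result is not None:
                print(result)
        except SyntaxError:
            # assignments and other statements go through exec();
            # new variables (like MVP_2015) land in sandbox_locals
            exec(code, {}, sandbox_locals)
    return buf.getvalue()
The MVP_2015 dataframe it created looks like this: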
print(MVP_2015)
Unnamed: 0 Year Player Pos Age Tm G GS \
23114 23114 2015.0 James Harden SG 25.0 HOU 81.0 81.0
23000 23000 2015.0 Stephen Curry PG 26.0 GSW 80.0 80.0
23480 23480 2015.0 Russell Westbrook PG 26.0 OKC 67.0 67.0
23154 23154 2015.0 LeBron James SF 30.0 CLE 69.0 69.0
23215 23215 2015.0 Damian Lillard PG 24.0 POR 82.0 82.0
... ... ... ... .. ... ... ... ...
22999 22999 2015.0 Seth Curry PG 24.0 PHO 2.0 0.0
23476 23476 2015.0 David Wear PF 24.0 SAC 2.0 0.0
22916 22916 2015.0 Jerrelle Benimon PF 23.0 UTA 2.0 0.0
23222 23222 2015.0 Kalin Lucas PG 25.0 MEM 1.0 0.0
23210 23210 2015.0 Malcolm Lee SG 24.0 PHI 1.0 0.0
MP PER ... FT% ORB DRB TRB AST STL BLK \
23114 2981.0 26.7 ... 0.868 75.0 384.0 459.0 565.0 154.0 60.0
23000 2613.0 28.0 ... 0.914 56.0 285.0 341.0 619.0 163.0 16.0
23480 2302.0 29.1 ... 0.835 124.0 364.0 488.0 574.0 140.0 14.0
23154 2493.0 25.9 ... 0.710 51.0 365.0 416.0 511.0 109.0 49.0
23215 2925.0 20.7 ... 0.864 49.0 329.0 378.0 507.0 97.0 21.0
... ... ... ... ... ... ... ... ... ... ...
22999 8.0 -11.4 ... NaN 0.0 2.0 2.0 1.0 0.0 0.0
23476 7.0 2.4 ... NaN 2.0 0.0 2.0 1.0 0.0 0.0
22916 3.0 4.7 ... NaN 1.0 2.0 3.0 0.0 0.0 0.0
23222 6.0 -0.7 ... NaN 0.0 0.0 0.0 0.0 1.0 0.0
23210 2.0 -19.7 ... NaN 0.0 0.0 0.0 0.0 0.0 0.0
TOV PF PTS
23114 321.0 208.0 2217.0
23000 249.0 158.0 1900.0
23480 293.0 184.0 1886.0
23154 272.0 135.0 1743.0
23215 222.0 164.0 1720.0
... ... ... ...
22999 0.0 2.0 0.0
23476 0.0 1.0 0.0
22916 1.0 0.0 0.0
23222 0.0 1.0 0.0
23210 0.0 0.0 0.0
If you look at the response, the Answer is not final, because the LLM doesn't know what the result is yet (we need to pass it back), but its own hint for the Answer is to look at the first position of the MVP_2015 dataframe it created (MVP_2015['Player'].iloc[0]).
So we take the printout above and pass it back as 'history', appended after its original response (omitted for brevity; all the details above are enough to figure out what it looks like).
After this call, the LLM responds with the final answer: James Harden was the MVP in 2015. It basically understands that functions were run on this data and the results were provided back to it to answer the question.
A few interesting bits:
The key to this approach, if you understand what this thing is doing, is that you basically give it multiple ways it can choose to get to the answer using different means (tools), by explaining what these tools can do - and it is very clever about how it picks which tools to use and for what. I've played around with complex asks where it used multiple tools for different parts of the question, and also where it used one tool and then fed its results into another (with the above example, I asked for the MVP as above and then to write it out using ascii art, which I had added as another tool, and it did).
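One way to keep that enumeration manageable as the tool count grows is to describe the tools as data and let Jinja2 render them into the prompt. A sketch follows; python_repl_ast is the tool from this POC, while the other names are made up for illustration.
from jinja2 import Template

# hypothetical tool registry; the descriptions are what the LLM sees
tools = {
    'python_repl_ast': 'A Python shell. Use this to execute python commands.',
    'web_search': 'Looks up current information on the internet.',
    'ascii_art': 'Renders a short piece of text as ascii art.',
}

tool_section = Template("""
You should use the tools below to answer the question posed of you:
{% for name, description in tools.items() %}
{{name}}: {{description}}
{% endfor %}
""").render(tools=tools)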
Finally, what's next:
For Part 3, I'm going to go into fine-tuning, to improve results for smaller models. I will also likely experiment with long-term memory (which I'd add to this setup; basically a database) and use it to continuously and automatically fine-tune as well (based on successful task completion, etc.).