Redash chatbot add-on: LLM-based chatbot for Data Analytics, Visualisation, and Automated Insight Extraction
Aaron Gebremariam
Data Scientist | Generative AI Engineer | Machine Learning Engineer | Python Developer
Introduction
To expand our data analysis capabilities, our company is exploring the broad landscape of YouTube data. At the heart of this initiative is the development of a Redash chat add-on: a tool that lets the organization extract valuable insights from interconnected Redash dashboards and databases using natural language.
The chat add-on creates an interactive space for conversations in a question-and-answer format, opening the door to autonomous knowledge discovery. Queries span a spectrum, from questions about information already displayed on dashboards to questions that require generating SQL with Large Language Models (LLMs); the generated queries are then executed against our connected databases. The end-to-end system we are building is meant to help the company extract meaningful, actionable insights from our Business Intelligence (BI) platforms.
Our company's BI dashboards serve a dual purpose: they monitor our business processes, and they turn data collected from YouTube into actionable insights. These insights steer strategic decisions and give us an edge in understanding digital content consumption trends. As we take on this data-driven journey, we aim for an organization that navigates the data analytics landscape with ease and turns it into strategic advantage.
Tech-stack used
Data source
Initially, raw data is gathered from YouTube, meticulously curated, and stored in CSV format. Before it is integrated into the database, a crucial phase of the data analytics journey unfolds: the data is cleaned and refined in a Python script. This careful data-cleaning process ensures the dataset is well-prepared, providing a reliable foundation for subsequent analysis.
import pandas as pd

class DataFrameManipulator:
    def __init__(self, df):
        self.df = df

    def rename_columns(self, new_column_names):
        """
        Rename columns in the Pandas DataFrame.

        Parameters:
        - new_column_names: Dictionary of old-to-new column names

        Returns:
        - Modified DataFrame
        """
        return self.df.rename(columns=new_column_names)

    def drop_rows(self, rows_to_drop, by_index=True):
        """
        Drop rows from the Pandas DataFrame.

        Parameters:
        - rows_to_drop: List of index labels or row numbers to drop
        - by_index: If True, drop rows by index labels; if False, drop rows by row numbers

        Returns:
        - Modified DataFrame
        """
        if by_index:
            return self.df.drop(rows_to_drop)
        else:
            return self.df.drop(self.df.index[rows_to_drop])

    def drop_columns(self, columns_to_drop):
        """
        Drop columns from the Pandas DataFrame.

        Parameters:
        - columns_to_drop: List of column names to drop

        Returns:
        - Modified DataFrame
        """
        return self.df.drop(columns=columns_to_drop)
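As a quick illustration of how this helper might be used in the cleaning step, here is a minimal sketch; the CSV file name and column labels are hypothetical, not the project's actual schema.
import pandas as pd

# Hypothetical raw export of YouTube analytics data
raw_df = pd.read_csv("youtube_views_raw.csv")

manipulator = DataFrameManipulator(raw_df)

# Standardize column names before loading into the database
cleaned_df = manipulator.rename_columns({"City name": "city_name", "Views": "views"})

# Drop helper columns that are not needed downstream
cleaned_df = DataFrameManipulator(cleaned_df).drop_columns(["Geography"])
Each method returns a new DataFrame rather than modifying the original in place, so the result is re-wrapped before the next operation.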
Designing the tables of the database schema with a Python script is a precise process: the script configures the structured database and constructs well-crafted tables tailored to the specific requirements of the data model.
from sqlalchemy import create_engine, Column, Integer, String, Float, Date
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import sessionmaker
from postgres_conn import ConnectToPostgres

Base = declarative_base()

class Views(Base):
    __tablename__ = 'view_chart'

    date = Column('date', Date)
    city_id = Column('city_id', String, primary_key=True)
    city_name = Column('city_name', String)
    views = Column('views', Integer)

    def __init__(self, date, city_id, city_name, views):
        self.date = date
        self.city_id = city_id
        self.city_name = city_name
        self.views = views

class Device_type(Base):
    __tablename__ = "device_type"

    # SQLAlchemy ORM models need a primary key; date + device_type form a composite key here
    date = Column('date', Date, primary_key=True)
    device_type = Column("device_type", String, primary_key=True)
    Views = Column('Views', Integer)

    def __init__(self, date, device_type, Views):
        self.date = date
        self.device_type = device_type
        self.Views = Views
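A minimal sketch of how these models might be created in Postgres and populated with a cleaned row follows; the connection string and sample values are assumptions, since the project obtains its connection through the ConnectToPostgres helper.
from datetime import date

# Assumed connection string; the project wraps the connection in ConnectToPostgres
engine = create_engine("postgresql://user:password@localhost:5432/youtube_db")

# Create the view_chart and device_type tables defined above
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()

# Illustrative row; real values come from the cleaned YouTube CSV
session.add(Views(date=date(2023, 11, 1), city_id="NYC001", city_name="New York", views=1520))
session.commit()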
Generating SQL Query with OpenAI
We employ OpenAI's text-davinci-003 model to generate SQL queries in response to user questions. By providing the question as a prompt to the model, we extract the resulting SQL query. The model's operation is rooted in the Transformer architecture, a deep learning model utilizing attention mechanisms to capture relationships between different words or tokens in a given text. Leveraging self-attention, the model focuses on various parts of the input text throughout the generation process, enabling a nuanced understanding of context and the generation of accurate, contextually relevant responses.
For SQL query responses based on user questions, we utilize OpenAI's text completion API, specifying the parameters in the openai.Completion.create() function.
import openai

prompt = "Generate a SQL query to retrieve data from the youtube_data table."

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt=prompt,
    max_tokens=50,
    n=1,
    stop=None,
    temperature=0.5,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)

sql_query = response.choices[0].text.strip()
Upon acquiring the SQL query, we execute it with the cursor object, specifically crafted to fetch data from the device_type_chart table in response to the user's inquiry. Subsequently, we collect the query's result using the fetchall() method.
In the concluding phase, we present the user's question alongside the generated SQL query, the obtained SQL result, and the response generated by OpenAI. This comprehensive display provides insight into the interaction among the user's question, the generated query, and the corresponding result.
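A minimal sketch of this execute-and-display step, assuming a standard psycopg2 connection; the connection parameters and the example question are illustrative rather than the project's exact code.
import psycopg2

user_question = "the total number of devices"  # example user question

# Assumed connection details; the project loads these from its own configuration
conn = psycopg2.connect(host="localhost", dbname="youtube_db", user="user", password="password")
cursor = conn.cursor()

# Run the LLM-generated SQL and collect the rows
cursor.execute(sql_query)
sql_result = cursor.fetchall()

print(f"Question: {user_question}")
print(f"SQL Query: {sql_query}")
print(f"SQL Result: {sql_result}")
The output below shows one such interaction as it appears to the user.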
Question: the ttal number of device
SQL Query: SELECT COUNT(*) FROM device_type_chart;
SQL Result: [(5116,)]
Answer: SELECT COUNT(*) FROM device_type
Redash installation
Redash boasts a robust API, offering users a programmatic means to interact with their data. Following RESTful principles, the API facilitates operations like query execution, dashboard management, and integration with external tools. To streamline data exploration, Python has been employed to automate the conversion of SQL queries into Redash queries. Below is a simple example demonstrating how you can effortlessly retrieve the list of devices from the "device_chart_table" using the query "SELECT * from device_chart_table."
import requests

api_url = "https://your-redash-url/api/queries"
headers = {"Authorization": "Key your_api_key"}

query_data = {
    "query": "SELECT * from device_chart_table",
    "data_source_id": your_data_source_id,
}

response = requests.post(api_url, headers=headers, json=query_data)
query_id = response.json()["id"]
2. Querying
This Python script takes care of the complex tasks, generating a new Redash query and assigning a unique ID for future use. It's a quick and efficient process that makes our interaction with Redash smoother.
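To actually run the stored query and pull its rows back into Python, one option is the query-results endpoint of the Redash REST API; the sketch below assumes the same base URL and API key as above, and the exact response shape can vary between Redash versions.
# Trigger execution of the stored query
result_url = f"https://your-redash-url/api/queries/{query_id}/results"
response = requests.post(result_url, headers=headers, json={"max_age": 0})
payload = response.json()

if "query_result" in payload:
    # Fresh or cached result returned directly
    rows = payload["query_result"]["data"]["rows"]
    print(rows)  # e.g. one dict per row from device_chart_table
else:
    # Redash may instead return a job object while the query is still executing,
    # which then needs to be polled until it completes
    print("Query queued as job:", payload.get("job", {}).get("id"))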
3. Query on Redash
Upon visiting your Redash dashboard, you'll find the freshly generated query ready for execution. Redash displays the query text, its current status, and the result set, including details like the list of devices sourced from the device_type table.
4. Creating the visual
With a Python script, we seamlessly transform query results into engaging visualizations. The script leverages the POST endpoint of the Redash API, specifically targeting the /api/visualizations path.
api_url = "https://your-redash-url/api/visualizations"
headers = {"Authorization": "Key your_api_key"}

visualization_data = {
    "query_id": query_id,          # attach the visualization to the query created above
    "name": "Your Visualization",
    "type": "TABLE",               # or "CHART"
    "options": {},                 # chart options (axes, series, ...) go here
    # other necessary details...
}

response = requests.post(api_url, headers=headers, json=visualization_data)
visualization_id = response.json()["id"]
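To surface the new visualization on a dashboard, Redash also exposes a widgets endpoint; the sketch below is only an assumed example in which the dashboard ID is hypothetical and the layout options are left minimal.
# Hypothetical dashboard ID; in practice it would be looked up via the dashboards API
dashboard_id = 1

widget_data = {
    "dashboard_id": dashboard_id,
    "visualization_id": visualization_id,
    "options": {},   # layout options (position, size) usually go here
    "width": 1,
    "text": "",
}

response = requests.post("https://your-redash-url/api/widgets", headers=headers, json=widget_data)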
Challenge
As we automate our data exploration, an intriguing challenge presents itself: creating visualizations automatically through the Redash API. This requires Python to generate visual representations directly from our queries, and the real difficulty lies in interpreting key elements such as the x-axis, the y-axis, and other chart parameters from instructions written in plain English; the sketch below shows the kind of payload this interpretation ultimately has to produce.
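For illustration, here is the kind of visualization definition such an interpreter would need to fill in; the option keys and column names are assumptions based on a typical Redash chart visualization, not the project's finished solution.
# Hypothetical output of parsing "plot views per device type as a bar chart"
chart_spec = {
    "query_id": query_id,
    "name": "Views per device type",
    "type": "CHART",
    "options": {
        "globalSeriesType": "column",   # bar-style chart
        "columnMapping": {
            "device_type": "x",         # x-axis column inferred from the instruction
            "views": "y",               # y-axis column inferred from the instruction
        },
    },
}

response = requests.post("https://your-redash-url/api/visualizations", headers=headers, json=chart_spec)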
Future Work
The project was completed on schedule, and I'm eager to pursue further enhancements, starting with refining automatic visualization creation on Redash. Contributions from the GitHub community are greatly appreciated; feel free to explore the repository through the provided link. GitHub
#DataScience #MachineLearning #Python #SQL #DeepLearning #TechSkills #CareerDevelopment #DataEngineering #DataAnalyst