Deploying SingleStore on Kubernetes for GenAI and RAG Applications
In today's AI-driven world, where terms like RAG (Retrieval-Augmented Generation) and GenAI (Generative AI) are becoming increasingly popular, I've realized how critical it is to manage data effectively. The rise of vector databases has shown that the right database choice isn't just about storage; it's about empowering AI applications to deliver real-time insights and advanced capabilities. From my experience, picking the right database can make or break an AI project, especially given the complex data needs of modern AI solutions.
Databases are foundational components of computing environments, enabling efficient data storage and management. The choice of a database often depends on an application's specific needs, such as real-time analytics, horizontal scaling requirements, or the complexity of transactions it must support. In this blog, I will discuss one such database, SingleStore, and demonstrate how to deploy it in a cloud-native environment.
Challenges Faced with Traditional Databases
Traditional databases present several challenges that can affect performance and scalability in modern applications: performance degradation under heavy load, costly and limited vertical scaling, restricted deployment options, and high operational costs for enterprise solutions.
How SingleStore Solves Common Database Challenges
SingleStore is a versatile database designed to handle complex SQL, JSON, and vector workloads. It helps developers overcome the challenges of traditional databases through distributed SQL execution, horizontal scale-out, and native support for semi-structured and vector data.
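One of those workloads, vector search, is worth making concrete: it ranks rows by the similarity between embedding vectors, which SingleStore can compute natively with functions like DOT_PRODUCT. Here is a pure-Python sketch of that ranking (toy vectors; my own example, not SingleStore code):

```python
def rank_by_dot_product(query, rows):
    """Rank (id, vector) rows by dot-product similarity to the query vector, highest first."""
    score = lambda vec: sum(a * b for a, b in zip(query, vec))
    return sorted(rows, key=lambda row: score(row[1]), reverse=True)

# Three toy "documents" with 2-dimensional embeddings.
docs = [("a", [1.0, 0.0]), ("b", [0.6, 0.8]), ("c", [0.0, 1.0])]

# A query vector closest in direction to "b" ranks it first.
rank_by_dot_product([0.5, 0.9], docs)
```

In a real RAG application the vectors would be model-generated embeddings and the ranking would run inside the database over millions of rows, but the scoring logic is the same.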
Key Features of SingleStore
SingleStore combines several capabilities that address these limitations: real-time analytics, hybrid transactional/analytical processing (HTAP), horizontal scaling by adding nodes, and deployment flexibility across on-premises, cloud, and Kubernetes environments.
Leverage SingleStore Kubernetes Operator
In this section, I will demonstrate how to set up SingleStore in your Kubernetes cluster and connect to it with the standard MySQL client (SingleStore speaks the MySQL wire protocol).
Step 1: Create a Kubernetes Cluster on Civo
I will be using Civo's Kubernetes offering to set up a Kubernetes cluster. You can visit Civo Documentation to create a Kubernetes cloud cluster. We are going to create a 4-node medium-performance cluster.
Once the cluster is up and running, download the kubeconfig file and export it on your local machine (or use the Civo CLI) to access the Kubernetes cluster created on Civo.
To verify the setup, run the following command, which lists the nodes in your Kubernetes cluster:
kubectl get nodes
NAME                                                    STATUS   ROLES    AGE     VERSION
k3s-singlestore-demo-ece0-845f22-node-pool-7077-c3aod   Ready    <none>   4h53m   v1.28.7+k3s1
k3s-singlestore-demo-ece0-845f22-node-pool-7077-lx7zr   Ready    <none>   4h53m   v1.28.7+k3s1
k3s-singlestore-demo-ece0-845f22-node-pool-7077-fixzt   Ready    <none>   4h53m   v1.28.7+k3s1
k3s-singlestore-demo-ece0-845f22-node-pool-7077-ifcsi   Ready    <none>   4h53m   v1.28.7+k3s1
To list the running pods, use the following command:
kubectl get pods
NAME                                 READY   STATUS      RESTARTS   AGE
install-traefik2-nodeport-si-qnpgr   0/1     Completed   0          4h58m
NOTE: SingleStore Kubernetes Operator requires a cluster with 4 nodes. Each node should have 4 CPU cores and a minimum of 16GB RAM.
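Before proceeding, it can help to double-check the sizing arithmetic from the note above. This is my own illustrative sketch; the function and node-dict shape are not part of any Civo or SingleStore API:

```python
def meets_operator_minimums(nodes):
    """Check a list of node specs ({'cpu': ..., 'ram_gb': ...}) against the
    SingleStore Operator minimums: at least 4 nodes, each with 4 CPU cores
    and 16 GB of RAM."""
    return len(nodes) >= 4 and all(n["cpu"] >= 4 and n["ram_gb"] >= 16 for n in nodes)

# A 4-node cluster sized exactly at the minimums passes the check:
meets_operator_minimums([{"cpu": 4, "ram_gb": 16}] * 4)
```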
Step 2: Pull SingleStore Docker Image
Once the cluster setup is complete, you can pull the SingleStore Operator image from Docker Hub using the following command:
docker pull singlestore/operator:3.258.0-f5ba0d6a
3.258.0-f5ba0d6a: Pulling from singlestore/operator
c129dd76092a: Pull complete
23034e1e5ba6: Pull complete
6718c92c55bf: Pull complete
Digest: sha256:a733109632b968b8ef9b77e4f805088a631ed9cfba26dec0b84815f10404a386
Status: Downloaded newer image for singlestore/operator:3.258.0-f5ba0d6a
docker.io/singlestore/operator:3.258.0-f5ba0d6a
NOTE: Before performing the above step, ensure Docker is up and running.
Step 3: Create the Object Definition Files
First, store the following values in a sdb-rbac.yaml file. This Role-Based Access Control (RBAC) manifest creates the ServiceAccount, Role, RoleBinding, ClusterRole, and ClusterRoleBinding objects the Operator needs, plus a separate ServiceAccount and Role for backups.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sdb-operator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sdb-operator
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  - endpoints
  - persistentvolumeclaims
  - events
  - configmaps
  - secrets
  verbs:
  - '*'
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - cronjobs
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - get
- apiGroups:
  - apps
  - extensions
  resources:
  - deployments
  - daemonsets
  - replicasets
  - statefulsets
  - statefulsets/status
  verbs:
  - '*'
- apiGroups:
  - memsql.com
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  verbs:
  - get
  - watch
  - list
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sdb-operator
subjects:
- kind: ServiceAccount
  name: sdb-operator
roleRef:
  kind: Role
  name: sdb-operator
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sdb-operator
rules:
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  verbs:
  - get
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sdb-operator
subjects:
- kind: ServiceAccount
  name: sdb-operator
  namespace: default
roleRef:
  kind: ClusterRole
  name: sdb-operator
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backup
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: backup
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backup
subjects:
- kind: ServiceAccount
  name: backup
roleRef:
  kind: Role
  name: backup
  apiGroup: rbac.authorization.k8s.io
Create the resource using the following command:
kubectl create -f sdb-rbac.yaml
serviceaccount/sdb-operator created
role.rbac.authorization.k8s.io/sdb-operator created
rolebinding.rbac.authorization.k8s.io/sdb-operator created
clusterrole.rbac.authorization.k8s.io/sdb-operator created
clusterrolebinding.rbac.authorization.k8s.io/sdb-operator created
serviceaccount/backup created
role.rbac.authorization.k8s.io/backup created
rolebinding.rbac.authorization.k8s.io/backup created
After creating the RBAC objects, create a Custom Resource Definition (CRD) for the MemsqlCluster resource by saving the following code in a sdb-cluster-crd.yaml file.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: memsqlclusters.memsql.com
spec:
  group: memsql.com
  names:
    kind: MemsqlCluster
    listKind: MemsqlClusterList
    plural: memsqlclusters
    singular: memsqlcluster
    shortNames:
    - singlestore
    - singlestoredb
    - memsql
  scope: Namespaced
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        description: Schema for the SingleStore Cluster
        properties:
          apiVersion:
            description: 'APIVersion defines the versioned schema of this representation
              of an object. Servers should convert recognized schemas to the latest
              internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
            type: string
          kind:
            description: 'Kind is a string value representing the REST resource this
              object represents. Servers may infer this from the endpoint the client
              submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
            type: string
          metadata:
            type: object
          spec:
            description: Spec defines the desired state of Cluster
            type: object
            x-kubernetes-preserve-unknown-fields: true
          status:
            description: Status defines the observed state of Cluster
            type: object
            x-kubernetes-preserve-unknown-fields: true
        type: object
    subresources:
      status: {}
    additionalPrinterColumns:
    - name: Aggregators
      type: integer
      description: Number of Aggregators
      jsonPath: .status.expectedAggregators
    - name: Leaves
      type: integer
      description: Number of Leaf Nodes (per availability group)
      jsonPath: .status.expectedLeaves
    - name: Redundancy Level
      type: integer
      description: Redundancy level of the Cluster
      jsonPath: .spec.redundancyLevel
    - name: Age
      type: date
      jsonPath: .metadata.creationTimestamp
Create the resource using the following command:
kubectl create -f sdb-cluster-crd.yaml
customresourcedefinition.apiextensions.k8s.io/memsqlclusters.memsql.com created
Once the CRD and RBAC are configured, create an Operator Deployment object that will spawn and maintain the Operator by saving the following code in a sdb-operator.yaml file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sdb-operator
  labels:
    app.kubernetes.io/component: operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: sdb-operator
  template:
    metadata:
      labels:
        name: sdb-operator
    spec:
      serviceAccountName: sdb-operator
      containers:
      - name: sdb-operator
        image: <singlestore/operator docker image>
        imagePullPolicy: Always
        args: [
          # Cause the operator to merge rather than replace annotations on services
          "--merge-service-annotations",
          # Allow the process inside the container to have read/write access to the `/var/lib/memsql` volume.
          "--fs-group-id", "5555",
          "--cluster-id", "sdb-cluster" ]
        env:
        - name: WATCH_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: OPERATOR_NAME
          value: "sdb-operator"
Change the value of the container image under the spec section to the SingleStore Operator Docker image, e.g., image: singlestore/operator:3.258.0-f5ba0d6a.
Create the resource using the following command:
kubectl create -f sdb-operator.yaml
deployment.apps/sdb-operator created
The above resource should create a pod with a name starting with sdb-operator. You can verify the pod creation using the following command:
kubectl get pods
Wait for the new pod to be created, and proceed once its status is Running.
After creating all the above resources, define a MemsqlCluster object (an instance of the custom resource) that spawns the cluster and all associated objects based on the configuration it contains. Save the following code in a sdb-cluster.yaml file.
apiVersion: memsql.com/v1alpha1
kind: MemsqlCluster
metadata:
  name: sdb-cluster
spec:
  license: license_key
  adminHashedPassword: "hashed_password"
  nodeImage:
    repository: singlestore/node
    tag: node_tag
  redundancyLevel: 2
  serviceSpec:
    objectMetaOverrides:
      labels:
        custom: label
      annotations:
        custom: annotations
  aggregatorSpec:
    count: 2
    height: 0.5
    storageGB: 256
    storageClass: standard
    objectMetaOverrides:
      annotations:
        optional: annotation
      labels:
        optional: label
  leafSpec:
    count: 2
    height: 0.5
    storageGB: 1024
    storageClass: standard
    objectMetaOverrides:
      annotations:
        optional: annotation
      labels:
        optional: label
Replace license_key with your license from the Cloud Portal, and node_tag with the tag of the singlestore/node image you want to deploy.
Replace hashed_password with a hashed version of a secure password for the admin database user on the cluster.
The following Python script shows how to generate the hashed password. The format is MySQL-style: an asterisk (*) followed by the uppercase hex digest of a double SHA-1, and the leading asterisk must be included in the final value.
from hashlib import sha1
# '*' + uppercase hex of SHA1(SHA1(password)) -- the MySQL-style hashed password format
print("*" + sha1(sha1('secretpass'.encode('utf-8')).digest()).hexdigest().upper())
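As a quick sanity check before provisioning, the persistent storage requested by the cluster spec above can be tallied. This is my own illustrative sketch; it assumes that leafSpec.count is the number of leaves per availability group and that redundancyLevel sets the number of groups:

```python
def total_storage_gb(agg_count, agg_gb, leaf_count_per_ag, leaf_gb, redundancy_level):
    """Total persistent volume storage requested by an Operator cluster spec (illustrative)."""
    # Aggregator volumes, plus one volume per leaf in every availability group.
    return agg_count * agg_gb + leaf_count_per_ag * redundancy_level * leaf_gb

# The spec above: 2 aggregators x 256 GB, 2 leaves per AG x 1024 GB, redundancyLevel 2.
total_storage_gb(2, 256, 2, 1024, 2)  # 4608 GB requested in total
```

Make sure the storage class backing the cluster can actually provision that much.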
Create the resource using the following command:
kubectl create -f sdb-cluster.yaml
memsqlcluster.memsql.com/sdb-cluster created
The above resource should create several pods with names starting with node-sdb-cluster. You can verify the pod creation using the following command:
kubectl get pods
NAME                                 READY   STATUS      RESTARTS   AGE
install-traefik2-nodeport-si-qnpgr   0/1     Completed   0          5h3m
sdb-operator-6c5f67f5b6-2tc12        1/1     Running     0          4m6s
node-sdb-cluster-aggregator-0        0/1     Running     0          3m4s
node-sdb-cluster-leaf-ag1-1          0/1     Running     0          3m4s
node-sdb-cluster-master-0            2/2     Running     0          3m32s
node-sdb-cluster-leaf-ag1-0          1/1     Running     0          3m4s
Step 4: Connect to Your Cluster
After all the pods are up and running, check the services created using the following command:
kubectl get services
NAME                  TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)             AGE
kubernetes            ClusterIP      10.43.0.1       <none>          443/TCP             5h36m
sdb-operator          ClusterIP      10.43.222.191   <none>          9090/TCP,6060/TCP   35m
svc-sdb-cluster       ClusterIP      None            <none>          3306/TCP            35m
svc-sdb-cluster-ddl   LoadBalancer   10.43.157.50    74.220.16.228   3306:30487/TCP      35m
svc-sdb-cluster-dml   LoadBalancer   10.43.124.20    74.220.16.107   3306:30483/TCP      35m
The Operator creates two services, svc-sdb-cluster-ddl and svc-sdb-cluster-dml, for use by clients and database users. From inside the cluster, connect using the addresses in the CLUSTER-IP column. If the cluster runs on a cloud service provider, as here, use the externally reachable addresses in the EXTERNAL-IP column instead.
Step 5: Connect with the MySQL Client
To connect to the SingleStore cluster, run the following command from a computer that can access the Kubernetes cluster. Use the admin database user with the password you defined in the sdb-cluster.yaml definition file.
mysql -u admin -h 74.220.16.228 -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 346
Server version: 5.7.32 SingleStoreDB source distribution (compatible; MySQL Enterprise & MySQL Commercial)
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
If you used the above Python code to generate the hashed password, you can log in to the MySQL Client by entering secretpass as the password.
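The same credentials work from application code with any MySQL-compatible driver. As a small illustration, here is a hypothetical helper (singlestore_url is my own name, not an official API) that builds a MySQL-protocol connection URL for the DDL endpoint shown above:

```python
def singlestore_url(host, user="admin", port=3306, database=None):
    """Build a MySQL-protocol connection URL for a SingleStore endpoint (illustrative)."""
    url = f"mysql://{user}@{host}:{port}"
    return f"{url}/{database}" if database else url

# The external DDL endpoint from the service listing above:
singlestore_url("74.220.16.228", database="memsql_example")
# "mysql://admin@74.220.16.228:3306/memsql_example"
```

A driver such as PyMySQL or the singlestoredb package would take these same parameters.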
Step 6: Run MySQL Queries
You can now run queries and perform various database tasks. Visit SingleStore Docs to explore further steps.
Below are the outputs of a few queries that create tables and store data in them.
mysql> CREATE DATABASE memsql_example;
Query OK, 1 row affected (3.21 sec)
mysql> use memsql_example;
Database changed
mysql> CREATE TABLE departments (
-> id int,
-> name varchar(255),
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.33 sec)
mysql> CREATE TABLE employees (
-> id int,
-> deptId int,
-> managerId int,
-> name varchar(255),
-> hireDate date,
-> state char(2),
-> PRIMARY KEY (id)
-> );
Query OK, 0 rows affected (0.23 sec)
mysql> CREATE TABLE salaries (
-> employeeId int,
-> salary int,
-> PRIMARY KEY (employeeId)
-> );
Query OK, 0 rows affected (0.22 sec)
mysql> INSERT INTO departments (id, name) VALUES
-> (1, 'Marketing'), (2, 'Finance'), (3, 'Sales'), (4, 'Customer Service');
Query OK, 4 rows affected (0.24 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> INSERT INTO employees (id, deptId, managerId, name, hireDate, state) VALUES
-> (1, 2, NULL, "Karly Steele", "2011-08-25", "NY"),
-> (2, 1, 1, "Rhona Nichols", "2008-09-11", "TX"),
-> (3, 4, 2, "Hedda Kent", "2005-10-27", "TX"),
-> (4, 2, 1, "Orli Strong", "2001-07-01", "NY"),
-> (5, 1, 1, "Leonard Haynes", "2011-05-30", "MS"),
-> (6, 1, 5, "Colette Payne", "2002-10-22", "MS"),
-> (7, 3, 4, "Cooper Hatfield", "2010-08-19", "NY"),
-> (8, 2, 4, "Timothy Battle", "2001-01-21", "NY"),
-> (9, 3, 1, "Doris Munoz", "2008-10-22", "NY"),
-> (10, 4, 2, "Alea Wiggins", "2007-08-21", "TX");
Query OK, 10 rows affected (0.23 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> INSERT INTO salaries (employeeId, salary) VALUES
-> (1, 885219), (2, 451519), (3, 288905), (4, 904312), (5, 919124),
-> (6, 101538), (7, 355077), (8, 900436), (9, 41557), (10, 556263);
Query OK, 10 rows affected (0.23 sec)
Records: 10 Duplicates: 0 Warnings: 0
mysql> SELECT COUNT(*) from employees;
+----------+
| COUNT(*) |
+----------+
|       10 |
+----------+
1 row in set (0.19 sec)
mysql> SELECT id, name FROM employees ORDER BY id;
+----+-----------------+
| id | name            |
+----+-----------------+
|  1 | Karly Steele    |
|  2 | Rhona Nichols   |
|  3 | Hedda Kent      |
|  4 | Orli Strong     |
|  5 | Leonard Haynes  |
|  6 | Colette Payne   |
|  7 | Cooper Hatfield |
|  8 | Timothy Battle  |
|  9 | Doris Munoz     |
| 10 | Alea Wiggins    |
+----+-----------------+
10 rows in set (0.29 sec)
mysql> SELECT id, name FROM employees WHERE state = 'TX' ORDER BY id;
+----+---------------+
| id | name          |
+----+---------------+
|  2 | Rhona Nichols |
|  3 | Hedda Kent    |
| 10 | Alea Wiggins  |
+----+---------------+
3 rows in set (0.30 sec)
mysql> SELECT id, name FROM employees WHERE state = 'NY' ORDER BY id;
+----+-----------------+
| id | name            |
+----+-----------------+
|  1 | Karly Steele    |
|  4 | Orli Strong     |
|  7 | Cooper Hatfield |
|  8 | Timothy Battle  |
|  9 | Doris Munoz     |
+----+-----------------+
5 rows in set (0.15 sec)
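The filtered queries above are easy to sanity-check outside the database. This pure-Python sketch mirrors the inserted rows and the WHERE state = ... queries (my own illustration, not part of SingleStore):

```python
# The employee rows inserted above, as (id, deptId, managerId, name, hireDate, state):
employees = [
    (1, 2, None, "Karly Steele", "2011-08-25", "NY"),
    (2, 1, 1, "Rhona Nichols", "2008-09-11", "TX"),
    (3, 4, 2, "Hedda Kent", "2005-10-27", "TX"),
    (4, 2, 1, "Orli Strong", "2001-07-01", "NY"),
    (5, 1, 1, "Leonard Haynes", "2011-05-30", "MS"),
    (6, 1, 5, "Colette Payne", "2002-10-22", "MS"),
    (7, 3, 4, "Cooper Hatfield", "2010-08-19", "NY"),
    (8, 2, 4, "Timothy Battle", "2001-01-21", "NY"),
    (9, 3, 1, "Doris Munoz", "2008-10-22", "NY"),
    (10, 4, 2, "Alea Wiggins", "2007-08-21", "TX"),
]

def ids_in_state(rows, state):
    """Pure-Python equivalent of: SELECT id FROM employees WHERE state = ? ORDER BY id"""
    return sorted(r[0] for r in rows if r[5] == state)

ids_in_state(employees, "TX")  # [2, 3, 10], matching the query output above
```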
SingleStore vs. Traditional Databases
The following comparison outlines the key differences between SingleStore and traditional databases:
Speed and Performance
SingleStore: Ideal for real-time analytics and hybrid transactional/analytical processing (HTAP).
Traditional Databases: Systems like Oracle and MySQL often struggle with performance under heavy load.
Scalability
SingleStore: Supports horizontal scalability by adding nodes to handle large datasets.
Traditional Databases: Many traditional databases rely on vertical scaling, which can be expensive and limited.
Flexibility
SingleStore: Offers deployment flexibility across on-premises, cloud, and containerized environments (e.g., Kubernetes).
Traditional Databases: Some databases have limited deployment options.
Cost Efficiency
SingleStore: Delivers high performance at a lower cost than enterprise solutions like Oracle.
Traditional Databases: Enterprise solutions typically come with high operational costs. Open-source databases like PostgreSQL or MySQL are more cost-effective but may require additional features to meet application demands.
Ideal Market
SingleStore is ideally suited for markets and industries that require high-performance, flexible database solutions, particularly teams building real-time analytics and GenAI/RAG applications like those discussed at the start of this post.
Conclusion
Databases are the backbone of any application, ensuring efficient data management. The choice of a database varies depending on specific use cases. This blog has covered SingleStore as an effective solution for deployment in cloud-native environments, particularly using a platform like Civo. SingleStore addresses the limitations of traditional databases by offering performance, security, scalability, and flexibility.
Shoutout to SingleStore for collaborating with me on this post.