My evaluation of Elastic Cloud Serverless on Microsoft Azure (Technical Preview)
Arnold Van Wijnbergen
Independent Consultant | Microsoft MVP | MCT Trainer | Speaker | Empowering Global Clients with DevSecOps, Reliable Architectures, Observability Insights & Cybersecurity Acceleration through Threat Intelligence
Introduction
Last week I decided to try out the Technical Preview of Elastic Cloud Serverless on Azure. After reading through the announcement and forming some first impressions, I spun up a serverless project to give it a try.
Throughout this blog we will try to verify the claims made in that announcement.
Let's see what we can conclude.
Project creation
First I selected the use case to try out: "Elastic for Security".
It took around 3 minutes to spin up a fully functional instance with a Kibana-like console ready to go. The big difference is that the console is fully focused on the chosen use case; only under the hood is it powered by Kibana.
Awesome, that's already an A+ for project deployment, but I want to see more.
Elastic Cloud
In the Elastic Cloud portal you immediately notice the two separate sections for Hosted deployments and Serverless projects.
We have just created the serverless project called "My Security project", which is launched on Azure in the East US (Virginia) region.
When we compare the two, Serverless projects no longer let you choose a hardware profile or version. You just select the project type, such as Security, and go.
Serverless project Management
Now we go to Manage our first Serverless project called “My Security project“.
Here we look at the Overview section. You can find information such as the project ID and connection details like URLs. Big tiles are available for managing Data and Integrations, managing API keys and adding project Membership.
The first two are native project features within the console; Membership is managed through the well-known organization feature in the Elastic Cloud portal.
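API keys can also be created from Developer Tools instead of the tile; below is a minimal sketch using the standard API key endpoint (the name and expiration are placeholder values):

POST /_security/api_key
{
  "name": "my-ingest-key",
  "expiration": "30d"
}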
Also notice that we can only change the Project name or features. Project features are all related to the project type, in my case Security. Below you see the configuration options available to us.
Important things to keep in mind are that you can't change the project type or the cloud provider/region; that would require recreating the project. Data migration seems to be left to you as a customer, since only a Delete project action is available.
One remaining question is how to handle snapshots or duplication when migrating or cloning a project. That would be a desirable feature request.
Renewed Storage Architecture
It took a while, but Elastic did a good job creating an object-storage-based storage architecture, which they call the Search AI Lake architecture.
It combines object storage capabilities, such as Azure Blob Storage, to ensure performance and durability while minimising latency. As an Elastic user you only choose the cloud region where your data resides, which is helpful for compliance reasons such as GDPR.
And, as many would expect, there is no Hot-Warm architecture anymore.
Everything is abstracted into the Search AI Lake. Another benefit is that you don't have to manage the storage layer yourself. We can configure a default retention setting that fits all, set a maximum retention, or optimise retention per data stream. Everything seems straightforward, and we assume our data is stored in the Azure East US region.
Below is how Data Retention is configured in the Elastic Cloud portal.
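If you prefer the API route, retention per data stream can also be set from Developer Tools with the data stream lifecycle endpoint; a minimal sketch, assuming a hypothetical data stream name:

PUT _data_stream/logs-myapp-default/_lifecycle
{
  "data_retention": "30d"
}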
If you want to know more about Search AI Lake Architecture, read the blog here.
Exploring the Project Console
When we open the project console we are taken directly to the Elastic Security UI. Everything is focused on the type of project you have chosen. At the time of writing I didn't spot any limitations or missing features. Core functionality like Alerts, Cases, Findings, Rules and Attack Discovery is available. This includes the Security AI Assistant, which requires you to configure your Azure OpenAI instance first.
We also have Stack Management. Most options are familiar, but I'm curious about the index creation process. Let's look into this.
Index Creation
Index management is still available, but technically everything has evolved into the Search AI Lake architecture. This makes creating an index simple: no need to set number_of_replicas or number_of_shards anymore. Only a name and Index mode are required.
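As an illustration, here is a hedged sketch of what that looks like via the API (the index name is hypothetical, and the index.mode setting is my assumption for how the UI's Index mode option maps to the request):

PUT /crimes-test
{
  "settings": {
    "index.mode": "standard"
  }
}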
Luckily we still have Developer Tools available; let's go over this in a new section.
Developer Tools
Developer Tools is a crucial and powerful console for managing your Elastic environment. It provides an API-driven management approach, for example through the Compact and Aligned Text (CAT) APIs. I'm curious what the differences are when using the famous CAT APIs here.
Exploring the basics, we immediately find differences, which are expected since we no longer manage the infrastructure components. Familiar insights like shards, segments, allocation and recovery are gone. I also miss things like health, thread_pool and snapshot insights, so again the question remains how we could troubleshoot or handle data recovery.
From a pure platform perspective only the index- and data-related APIs are still available. See the list below, followed by a small example request; try it yourself using "GET _cat".
=^.^=
/_cat/indices
/_cat/indices/{index}
/_cat/count
/_cat/count/{index}
/_cat/aliases
/_cat/aliases/{alias}
/_cat/component_templates
/_cat/ml/anomaly_detectors
/_cat/ml/anomaly_detectors/{job_id}
/_cat/ml/datafeeds
/_cat/ml/datafeeds/{datafeed_id}
/_cat/ml/trained_models
/_cat/ml/trained_models/{model_id}
/_cat/ml/data_frame/analytics
/_cat/ml/data_frame/analytics/{id}
/_cat/transforms
/_cat/transforms/{transform_id}
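As far as I could tell, these endpoints still accept the usual CAT query parameters; for example, verbose headers and sorting:

GET _cat/indices?v&s=store.size:desc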
Data Ingestion
Now that we have explored most features, it's time to do some actual data ingestion. As input I'm using a publicly available data set from Kaggle, an interesting community website for sharing Machine Learning and Data Science data.
I chose a CSV data set that includes crime data from 2020 to the present. This data set is available here. Credits to Avis02 for publishing it.
CSV Integration
For uploading file contents like CSV, TSV and JSON there is a helpful integration available. Let's add this integration first.
Now provide the index name, keep the defaults and start the import.
After the import completed, 1,004,847 documents had been created. It took around 5 minutes, so the ingest rate was roughly 3,350 documents per second.
Normally I would look at the index stats, which show statistics about indexing and search. Unfortunately they are unavailable for serverless projects. Again, for troubleshooting and monitoring purposes such insights are still helpful.
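As a lightweight alternative, the CAT count endpoint from the list above can at least confirm how many documents landed in the index:

GET _cat/count/crimes?v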
Query performance
Now let's look further into query performance. To validate this we are going to use the bool query below. Let's try it in the Developer Console first.
POST crimes/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Crm Cd Desc": "VEHICLE"
          }
        },
        {
          "range": {
            "AREA": {
              "gte": 10
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "Status": "IC"
          }
        }
      ]
    }
  }
}
The first query response took 35 ms, but subsequent responses averaged around 4 ms. The caching layer seems to kick in here.
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 2.8923671,
    "hits": [
      ...
    ]
Search Profiler
Now let's analyse the bool query above using the Search Profiler, which is still available to us, next to other great tools like the Grok Debugger and Painless Lab.
Select the crimes index, copy over the query part and send the request.
The first execution takes around 22 ms. See below.
After the first query, which again seems to warm up the caching layer, we are consistently around 12-13 ms. We could slightly optimise the range query by setting an 'lte' value, such as 50; see the adjusted clause below.
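For reference, the adjusted range clause would look like the snippet below; the upper bound of 50 is just the example value mentioned above:

"range": {
  "AREA": {
    "gte": 10,
    "lte": 50
  }
}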
Benchmarking Performance with Rally
Now let's burn up some of that serverless power. For this I'm going to use Rally, an open source tool for benchmarking Elasticsearch environments. There is a dedicated section explaining its serverless capabilities here.
Setting up Rally
Installation on Ubuntu is simple once Python 3 (the default nowadays) is present. Just follow the steps below.
sudo apt install python3-pip -y
sudo apt install git -y
sudo apt install pbzip2 -y
python3 -m pip install --user --upgrade pip
sudo apt install python3.10-venv -y
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install esrally
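To verify the installation before racing, a quick version check should suffice (run inside the virtual environment):

esrally --version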
Now have a first look at the various tracks available to race. Ensure you are in the virtual environment (venv).
esrally list tracks
The most applicable track for us is the security track. Take note of the requirements, such as local storage for Rally and the required API key.
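The commands below assume the project endpoint and API key are exported as environment variables; the values here are placeholders for your own project details:

export ES_HOST=<your-project-elasticsearch-endpoint-host>
export ES_API_KEY=<your-api-key>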
Let’s execute and enjoy the ride!
Security Track
Below is an example that hits the project in test mode.
esrally race --track=elastic/security --target-hosts=${ES_HOST}:443 --pipeline=benchmark-only --client-options="use_ssl:true,api_key:${ES_API_KEY}" --on-error=abort --test-mode
Follow the logging as shown below. Notice that serverless mode is detected and the cluster health check is skipped.
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
[INFO] Race id is [c0f06d40-dc46-48f6-a794-a5f465fab00b]
[INFO] Detected Elasticsearch Serverless mode with operator=[False].
[INFO] Installing track dependencies [geneve==0.2.0, pyyaml, elastic-transport==8.4.1, elasticsearch==8.6.1]
[INFO] Treating parallel task in challenge [security-querying] as public.
[INFO] Excluding [check-cluster-health] as challenge [security-querying] is run on serverless.
[INFO] Downloading track data (97.2 kB total size) [100.0%]
During the Rally run I immediately missed the insights from Stack Monitoring. Looking for an equivalent, I opened the Elastic Cloud portal and checked the Usage and Performance metrics. It seems the Ingest rate is shown with a delay, so my ingestion (caused by Rally) was not visible yet. Something to keep in mind. Below is a screenshot.
After the 'elastic/security' track I looked for a smaller track to execute that includes nested documents.
Nested Track
This track is good for verifying performance, especially since nested documents can be complex and can cause performance bottlenecks.
Let’s start the nested track.
esrally race --track=nested --target-hosts=${ES_HOST}:443 --pipeline=benchmark-only --client-options="use_ssl:true,api_key:${ES_API_KEY}"
Looking at the indices (using the CAT API) I can see an index called sonested growing.
green open sonested erb3My1MRoya0_cXHHhXvw 1 1 25567891 0 1.1gb 1.1gb 2.8gb
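For reference, that line came from a CAT indices request along these lines (add ?v if you want column headers):

GET _cat/indices/sonested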
After a SUCCESSFUL run that took 2,226 seconds the following report was returned.
____ ____
/ __ \____ _/ / /_ __
/ /_/ / __ `/ / / / / /
/ _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
/____/
[INFO] Race id is [8255f112-e75d-48ee-9337-90a66cb03f75]
[INFO] Detected Elasticsearch Serverless mode with operator=[False].
[INFO] Excluding [check-cluster-health], [force-merge], [wait-until-merges-finish] as challenge [nested-search-challenge] is run on serverless.
[INFO] Racing on track [nested], challenge [nested-search-challenge] and car ['external'] with version [serverless].
Running delete-index [100% done]
Running create-index [100% done]
Running index-append [100% done]
Running refresh-after-index [100% done]
Running refresh-after-force-merge [100% done]
Running randomized-nested-queries [100% done]
Running randomized-term-queries [100% done]
Running randomized-sorted-term-queries [100% done]
Running match-all [100% done]
Running nested-date-histo [100% done]
Running randomized-nested-queries-with-inner-hits_default [100% done]
Running randomized-nested-queries-with-inner-hits_default_big_size [100% done]
------------------------------------------------------
_______ __ _____
/ ____(_)___ ____ _/ / / ___/_________ ________
/ /_ / / __ \/ __ `/ / \__ \/ ___/ __ \/ ___/ _ \
/ __/ / / / / / /_/ / / ___/ / /__/ /_/ / / / __/
/_/ /_/_/ /_/\__,_/_/ /____/\___/\____/_/ \___/
------------------------------------------------------
| Metric | Task | Value | Unit |
|-------------------------------:|-----------------------------------------------------------:|-----------:|-------:|
| Min Throughput | index-append | 19304.2 | docs/s |
| Mean Throughput | index-append | 20042.7 | docs/s |
| Median Throughput | index-append | 20093.6 | docs/s |
| Max Throughput | index-append | 20262.3 | docs/s |
| 50th percentile latency | index-append | 797.295 | ms |
| 90th percentile latency | index-append | 1069.43 | ms |
| 99th percentile latency | index-append | 2194.52 | ms |
| 99.9th percentile latency | index-append | 2760.21 | ms |
| 100th percentile latency | index-append | 2965.42 | ms |
| 50th percentile service time | index-append | 797.295 | ms |
| 90th percentile service time | index-append | 1069.43 | ms |
| 99th percentile service time | index-append | 2194.52 | ms |
| 99.9th percentile service time | index-append | 2760.21 | ms |
| 100th percentile service time | index-append | 2965.42 | ms |
| error rate | index-append | 0 | % |
| Min Throughput | randomized-nested-queries | 17.17 | ops/s |
| Mean Throughput | randomized-nested-queries | 18.52 | ops/s |
| Median Throughput | randomized-nested-queries | 18.65 | ops/s |
| Max Throughput | randomized-nested-queries | 19.23 | ops/s |
| 50th percentile latency | randomized-nested-queries | 3647.95 | ms |
| 90th percentile latency | randomized-nested-queries | 4602.93 | ms |
| 99th percentile latency | randomized-nested-queries | 5026.31 | ms |
| 99.9th percentile latency | randomized-nested-queries | 5058.38 | ms |
| 100th percentile latency | randomized-nested-queries | 5066.09 | ms |
| 50th percentile service time | randomized-nested-queries | 95.8041 | ms |
| 90th percentile service time | randomized-nested-queries | 101.063 | ms |
| 99th percentile service time | randomized-nested-queries | 103.555 | ms |
| 99.9th percentile service time | randomized-nested-queries | 115.023 | ms |
| 100th percentile service time | randomized-nested-queries | 235.051 | ms |
| error rate | randomized-nested-queries | 0 | % |
| Min Throughput | randomized-term-queries | 23.59 | ops/s |
| Mean Throughput | randomized-term-queries | 23.65 | ops/s |
| Median Throughput | randomized-term-queries | 23.65 | ops/s |
| Max Throughput | randomized-term-queries | 23.68 | ops/s |
| 50th percentile latency | randomized-term-queries | 2541.45 | ms |
| 90th percentile latency | randomized-term-queries | 3384.52 | ms |
| 99th percentile latency | randomized-term-queries | 3539.69 | ms |
| 100th percentile latency | randomized-term-queries | 3557.58 | ms |
| 50th percentile service time | randomized-term-queries | 82.8646 | ms |
| 90th percentile service time | randomized-term-queries | 83.9267 | ms |
| 99th percentile service time | randomized-term-queries | 87.8161 | ms |
| 100th percentile service time | randomized-term-queries | 109.894 | ms |
| error rate | randomized-term-queries | 0 | % |
| Min Throughput | randomized-sorted-term-queries | 11.77 | ops/s |
| Mean Throughput | randomized-sorted-term-queries | 11.91 | ops/s |
| Median Throughput | randomized-sorted-term-queries | 11.92 | ops/s |
| Max Throughput | randomized-sorted-term-queries | 12.02 | ops/s |
| 50th percentile latency | randomized-sorted-term-queries | 24935.3 | ms |
| 90th percentile latency | randomized-sorted-term-queries | 27700.2 | ms |
| 99th percentile latency | randomized-sorted-term-queries | 28473.5 | ms |
| 100th percentile latency | randomized-sorted-term-queries | 28597.1 | ms |
| 50th percentile service time | randomized-sorted-term-queries | 148.491 | ms |
| 90th percentile service time | randomized-sorted-term-queries | 201.326 | ms |
| 99th percentile service time | randomized-sorted-term-queries | 226.093 | ms |
| 100th percentile service time | randomized-sorted-term-queries | 227.999 | ms |
| error rate | randomized-sorted-term-queries | 0 | % |
| Min Throughput | match-all | 5 | ops/s |
| Mean Throughput | match-all | 5 | ops/s |
| Median Throughput | match-all | 5 | ops/s |
| Max Throughput | match-all | 5 | ops/s |
| 50th percentile latency | match-all | 82.9413 | ms |
| 90th percentile latency | match-all | 84.1061 | ms |
| 99th percentile latency | match-all | 86.3922 | ms |
| 100th percentile latency | match-all | 101.242 | ms |
| 50th percentile service time | match-all | 81.2109 | ms |
| 90th percentile service time | match-all | 82.1217 | ms |
| 99th percentile service time | match-all | 84.0972 | ms |
| 100th percentile service time | match-all | 98.8632 | ms |
| error rate | match-all | 0 | % |
| Min Throughput | nested-date-histo | 1 | ops/s |
| Mean Throughput | nested-date-histo | 1 | ops/s |
| Median Throughput | nested-date-histo | 1 | ops/s |
| Max Throughput | nested-date-histo | 1 | ops/s |
| 50th percentile latency | nested-date-histo | 740.46 | ms |
| 90th percentile latency | nested-date-histo | 745.375 | ms |
| 99th percentile latency | nested-date-histo | 757.155 | ms |
| 100th percentile latency | nested-date-histo | 774.359 | ms |
| 50th percentile service time | nested-date-histo | 737.498 | ms |
| 90th percentile service time | nested-date-histo | 742.6 | ms |
| 99th percentile service time | nested-date-histo | 753.964 | ms |
| 100th percentile service time | nested-date-histo | 771.635 | ms |
| error rate | nested-date-histo | 0 | % |
| Min Throughput | randomized-nested-queries-with-inner-hits_default | 17.92 | ops/s |
| Mean Throughput | randomized-nested-queries-with-inner-hits_default | 17.95 | ops/s |
| Median Throughput | randomized-nested-queries-with-inner-hits_default | 17.96 | ops/s |
| Max Throughput | randomized-nested-queries-with-inner-hits_default | 17.97 | ops/s |
| 50th percentile latency | randomized-nested-queries-with-inner-hits_default | 98.135 | ms |
| 90th percentile latency | randomized-nested-queries-with-inner-hits_default | 103.348 | ms |
| 99th percentile latency | randomized-nested-queries-with-inner-hits_default | 107.32 | ms |
| 99.9th percentile latency | randomized-nested-queries-with-inner-hits_default | 132.372 | ms |
| 100th percentile latency | randomized-nested-queries-with-inner-hits_default | 143.779 | ms |
| 50th percentile service time | randomized-nested-queries-with-inner-hits_default | 96.3666 | ms |
| 90th percentile service time | randomized-nested-queries-with-inner-hits_default | 101.587 | ms |
| 99th percentile service time | randomized-nested-queries-with-inner-hits_default | 104.37 | ms |
| 99.9th percentile service time | randomized-nested-queries-with-inner-hits_default | 112.876 | ms |
| 100th percentile service time | randomized-nested-queries-with-inner-hits_default | 117.065 | ms |
| error rate | randomized-nested-queries-with-inner-hits_default | 0 | % |
| Min Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 15.88 | ops/s |
| Mean Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 15.93 | ops/s |
| Median Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 15.94 | ops/s |
| Max Throughput | randomized-nested-queries-with-inner-hits_default_big_size | 15.96 | ops/s |
| 50th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 114.945 | ms |
| 90th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 120.141 | ms |
| 99th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 123.643 | ms |
| 99.9th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 148.916 | ms |
| 100th percentile latency | randomized-nested-queries-with-inner-hits_default_big_size | 153.608 | ms |
| 50th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 113.26 | ms |
| 90th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 118.335 | ms |
| 99th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 121.079 | ms |
| 99.9th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 123.385 | ms |
| 100th percentile service time | randomized-nested-queries-with-inner-hits_default_big_size | 125.117 | ms |
| error rate | randomized-nested-queries-with-inner-hits_default_big_size | 0 | % |
----------------------------------
[INFO] SUCCESS (took 2226 seconds)
----------------------------------
After going through the Final Score, we can conclude that throughput stayed stable across the tasks and every task finished with a 0% error rate. See the Ingest rate below.
Again, this is just a benchmark, but it can give a good view of stability and performance.
Conclusion
Now it's time to wrap up and draw our conclusions. We see Elastic transforming into a true SaaS platform. From a user perspective this is great. I'm impressed by the work and the current offering, with some minor remarks.
Let’s go through the promises below.
No compromise on speed or scale
I can agree with this. The performance tests show that some warm-up is still needed, but after that performance is excellent.
Hassle-free operations
Indeed, the infrastructure challenges are gone, but of course other challenges will pop up. For troubleshooting, recovery and monitoring I do miss some functionality.
Purpose-built product experience
Fully agree here. Elastic Serverless projects are built to provide the full product experience. No infrastructure complexity anymore, since it's all abstracted away. Just ingest, store and use your data.
Simplified pricing model
Thanks to the usage-based pricing model, pricing becomes flexible and truly data-centric. This helps teams make valuable business decisions about data storage, but Elastic should be careful not to scare customers away from storing large volumes of data. Luckily there are pricing packages available for that. Read more about this here.
Security and compliance certified
This is an important topic, especially when you process and store PII or PHI data. In Europe, for example, we have the GDPR. If you want to assess the service, you can download most documentation from the Trust portal.
Still, some questions came up while writing this blog. Below is a list of findings that I would like to see answered in the future.
Next steps
Ready to Unlock the Full Potential of Elastic Cloud Serverless?
Are you looking to streamline your search, observability, and security workflows with Elastic Cloud Serverless but unsure where to start? Let’s connect!
Whether you need guidance on implementation, architectural design, or best practices for optimization, I'm available to help you along the journey.
Drop me a message or book a consultation to discuss how you can get the most out of Elastic Cloud.
Let’s build secure, scalable, and efficient solutions together!