Making Magic
I get to be the best kind of magician. You see, most magicians agree that "a true magician never reveals their secrets." This is because people love the allure of seeing what they cannot believe. The truth is, it's all sleight of hand and misdirection, and as soon as you know how the trick works, the magic is gone. One of the greatest films about the art of stage magic, The Prestige, follows a prolonged feud between two magicians obsessed with creating the best stage illusion, and it lays out the three steps of an illusion: the Pledge, the Turn, and the Prestige. The Pledge: the magician shows you something relatively ordinary, like a large, complex dataset. Then comes the Turn: the magician takes the dataset and makes it do something extraordinary, like execute with high concurrency and low latency. Finally, there's the Prestige: the magician tops that performance and shows you it all ran on data sitting in your data lake. You might be thinking, how is any of that possible? It defies the conventional wisdom we hold about cloud technology. Well, the reason I feel like I get to be the best kind of magician is that I get to show you the tricks behind the scenes. That's right, I am going full Masked Magician here, and I am going to expose how these modern marvels operate.
While I was in Las Vegas, NV for re:Invent this past week, I took some time toward the end of my trip to see a magic show. I had heard about Shin Lim from America's Got Talent, and the Mirage was just down the street from where I was staying. His show was EPIC. He has collaborated with another magician, Colin Cloud, a mental magician with a Sherlock persona. If you haven't seen Shin, he is a master of sleight of hand, especially with cards, and has an aloof persona. Together, their show was an amazing collaboration with a twist at the end I never saw coming. In The Prestige, the feud between the two magicians drove them to create some of the most amazing magic we have seen on screen. Spoiler alert if you haven't seen the movie: it's all about a special trick called the Transported Man, where you see a man at one location, then immediately see him appear at another. The magician played by Christian Bale, Borden, was able to do the Transported Man trick, but his rival Angier could not figure out how he did it. It drove him to madness, and ultimately to his death. Borden was really one of a pair of twins, a fact kept from the characters as well as the viewers until the end; that was the trick. The twins shared one life so they could execute the best magic trick in the world. Angier, on the other hand, went the science route via a fictional narrative in which Nikola Tesla himself built him a cloning machine. Angier was finally able to execute the Transported Man, but at the cost of his life, every night. What I took from this tale is that you can either be a one-trick pony, or you can do things the hard way and scale to unimaginable heights. Let's check on the current feud in the headlines, between Databricks and Snowflake.
I first heard of Snowflake at a Tableau Conference expo hall in 2017. That was the year Tableau announced they had acquired HyPer, a column-store database technology that sped up Tableau Extracts. I was doing research on the new BI tools for the cloud, and if you wanted a data warehouse in the cloud, Snowflake seemed great. They billed their architecture as separated compute and storage: you could resize a warehouse, separate warehouses could use the same datasets, and you could pause warehouses when you weren't using them and not be charged. It seemed like magic to me at the time. As an analyst and manager in data analytics, having a flexible, scalable, cost-effective solution was the holy grail. As a side note, though, you cannot read the data in Snowflake without their warehouse, only via JDBC/ODBC or UNLOAD. I went back to my corporate world and shared the new tricks I had learned, and we successfully validated Snowflake's magic. If you look at the engineering, what Snowflake did was clever: they use object storage as a persistence layer, with ephemeral cloud compute, to provide a warehouse service that is quite simple. But data must be ETL'd into Snowflake before you can use it, and that involves other tooling. Most people were ELTing data into Snowflake, but when I ran the numbers on what that would cost at an enterprise level, it was unviable. At roughly $3 per credit for every hour the warehouse is consuming them, even if you only pay while it is working, ELT is generally neither performant nor cost effective at that scale. This is why tools like Informatica were used for ETL; they performed transformations faster and more efficiently. Although I did not know it at the time, Databricks is excellent at ETL, and is commonly used alongside Snowflake, with Snowflake serving as the presentation layer for warehouse workloads. Unfortunately, Apache Spark was never good at low latency and high concurrency, and that led to some interesting innovation at Databricks over the years.
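To make that back-of-the-envelope math concrete, here is a minimal sketch of the kind of cost estimate I ran. The credit price, warehouse consumption rate, and job duration are all illustrative assumptions, not actual pricing for any specific account or warehouse size.

```python
# Rough, illustrative cost model for running ELT transformations inside a
# cloud warehouse. Every number below is an assumption for the sake of the
# sketch, not a quoted price.

CREDIT_PRICE_USD = 3.00            # assumed price per credit
WAREHOUSE_CREDITS_PER_HOUR = 16    # assumed burn rate for a large warehouse

def warehouse_cost(hours_running: float) -> float:
    """Dollar cost of keeping the warehouse busy for the given hours."""
    return hours_running * WAREHOUSE_CREDITS_PER_HOUR * CREDIT_PRICE_USD

# Hypothetical nightly ELT job: 3 hours of in-warehouse transformations,
# run 365 nights a year.
nightly_hours = 3
annual_elt_cost = warehouse_cost(nightly_hours) * 365
print(f"Annual in-warehouse ELT cost: ${annual_elt_cost:,.0f}")
```

With these assumptions the transformations alone land well into six figures per year, before anyone has run a single BI query, which is the wall I hit when I modeled it at enterprise scale.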
At Databricks, we get to work with some of the most amazing engineers in the world; it's almost like working with master magicians every day on the most amazing data magic you have ever seen. You see, the magic of watching the unbelievable unfold before your very eyes has a lot to do with engineering and perspective. The magic show stage is a maze of technical brilliance that, viewed from the audience's perspective, looks like magic. But if you were one of the technicians behind the curtain, from your perspective it's all engineering. I could go into the gory details of the new polymorphic vectorized execution engine that speeds up Spark SQL query execution by 3x or more. We call it Photon. But let's look at the magic first. Photon-accelerated SQL queries on Delta Lake tables can execute within seconds, some sub-second once cached. Some people might be thinking: a data warehouse can execute queries that fast, so what's the big deal? Well, the Prestige here is that the size of the data has little impact on the execution time. That means that as data scales from gigabytes to terabytes to exabytes, Databricks scales the infrastructure efficiently. Data warehouses, on the other hand, were created to handle data in the hundreds of gigabytes; they start to struggle in the hundreds of terabytes range, and exabytes are unheard of. This means that in demos I can ETL hundreds of gigabytes of data within minutes and have SQL queries executing in Tableau or Power BI on fresh data within seconds. Most of the time those queries actually spend executing goes to metadata. Spark scales to big data very well, and Photon executes Spark far faster than ever before. Because of the amazing engineering that has gone into the Databricks SQL stack, enhanced async I/O to fetch small files faster from object storage, and Cloud Fetch to enable high-bandwidth, BI-style ODBC/JDBC connections by proxying results through object storage, Databricks can execute queries and transfer data faster than any other platform in the public hyperscale clouds. This was recently validated by an officially audited submission to TPC.org and by an independent third party, the Barcelona Supercomputing Center.
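To ground that in something runnable, here is a minimal sketch of the kind of demo query I mean, written with PySpark against a Delta Lake table. The catalog, table, and column names are hypothetical, and Photon itself is enabled in the cluster or SQL warehouse configuration rather than in the code; the same query simply runs faster when it is on.

```python
# Minimal sketch: a BI-style aggregation over a Delta Lake table on Databricks.
# Photon acceleration is a cluster/SQL-warehouse setting, not an API call.
# The catalog, table, and column names below are hypothetical.

from pyspark.sql import SparkSession

# On Databricks a `spark` session already exists; this is for self-containment.
spark = SparkSession.builder.getOrCreate()

daily_revenue = spark.sql("""
    SELECT order_date,
           SUM(order_amount) AS revenue
    FROM   demo_catalog.sales.orders        -- a Delta Lake table
    WHERE  order_date >= '2021-01-01'
    GROUP  BY order_date
    ORDER  BY order_date
""")

daily_revenue.show(10)
```

The same statement works unchanged from Tableau or Power BI over the ODBC/JDBC connector, which is where Cloud Fetch earns its keep by streaming the result set back through object storage instead of the driver node.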
Most of the time I feel like a street magician, demoing cheap tricks on the mean e-streets of Zoom, Meet, and Teams. As a field engineer, I get a front-row seat to the master magicians at work on the engineering side. I take their great feats of engineering and package them up to show prospects and customers the modern marvels at work. Quite often we will introduce ourselves, and they will explain what they need from their data platform in the cloud. I listen as they describe their use cases and the technical pain they have experienced trying to take their experiments to production. While listening, I pull up a relevant deck and demo that covers their use case. I pledge that I have a dataset similar to what they work with; then I turn to show them how they can apply their use case to that data without all the friction they previously experienced; and the prestige is that not only can we solve their problems with Databricks, it also comes with a complete ETL, ML/AI, and SQL platform at an economical cost. Most of the time, people just can't believe their eyes. That's when we collaborate on their data, and I help them understand the marvelous engineering behind the scenes. While the magic disappears, the engineering value is appreciated, and most go on to implement this magic at their organization.
Arthur C. Clarke came up with one of the most cited laws about the amazement of technology: "Any sufficiently advanced technology is indistinguishable from magic." Imagine if you could go back 100 years and show someone an iPhone playing a recorded video of a Tesla driving in Ludicrous Mode with the accelerator floored. The observers would only be able to interpret what they saw as magic, until they were informed of all the sufficiently advanced technology developed in the intervening 100 years. I believe the amazing team of engineers that Databricks has hired, and will hire, will push the boundaries of what is possible in the cloud with data. These great feats of engineering would certainly look like magic to us now, but they will simply be the amazing, hard engineering work of the best engineers in the world over the next 5 to 10 years. Funnily enough, while doing research for this article, I stumbled upon snowclones of this law: "Any sufficiently advanced troll is indistinguishable from a genuine kook," and "the viewpoints of even the most extreme crank are indistinguishable from sufficiently advanced satire" (Poe's law). Snowclone is an interesting term: a cliché or phrasal template that can be adapted to create variants in amusing ways.
As the data feud between Snowflake and Databricks plays out, customers are the ones who will gain, in the form of better technology to suit their needs. No matter the winner, we are sure to be in for some amazing displays of magic. If you are interested in putting the best data magic in the cloud to work at your company, let's collaborate! If you want to join Databricks and bring the most amazing modern-marvel engineering to the world, we are hiring!