100 Days at Databricks

As I hit the 100-day mark at Databricks, I want to review the journey so far with some of the bigger themes that stood out.

The Beauty of Creative Destruction

I'm very technical, but I'm ultimately a field engineer, a role that sits under the sales organization. It's been an eye-opener. To make things very clear for everyone: Databricks is a consumption-based business. We get paid when you use things. You use things when you are happy and they deliver value. Being customer obsessed is one of the big values at Databricks, and it is a daily reality. For example, I was able to help a customer move from 10+ endpoints to a single one by leveraging PyFunc flavors and a meta model wrapper that pulls other models from Unity Catalog into a single endpoint that they keep live. This does zero to help our consumption, but it delivers real value for them. We celebrated this, despite losing consumption as a result.
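For anyone curious what a meta model wrapper can look like, here is a minimal sketch of the idea. It is not the customer's actual code. It assumes each request carries a `model_name` in `params` to pick the sub-model, and the `MetaModel` class, the `loader` argument, the `StubModel` stand-in, and the `main.demo.*` Unity Catalog names are all hypothetical. The only real MLflow pieces are `mlflow.pyfunc.PythonModel` and `mlflow.pyfunc.load_model`, and the import is optional so the sketch still runs without MLflow installed.

```python
from typing import Any, Dict

try:
    import mlflow.pyfunc

    _Base = mlflow.pyfunc.PythonModel
    _default_loader = mlflow.pyfunc.load_model
except ImportError:  # keeps the sketch runnable where MLflow is not installed
    _Base = object
    _default_loader = None


class MetaModel(_Base):
    """Hypothetical wrapper: one endpoint that routes to several sub-models."""

    def __init__(self, model_uris: Dict[str, str], loader=None):
        # model_uris maps a short routing name to a model URI,
        # e.g. "models:/<catalog>.<schema>.<model>@<alias>" for Unity Catalog.
        self.model_uris = model_uris
        self._loader = loader or _default_loader
        self._models: Dict[str, Any] = {}

    def load_context(self, context):
        # Called once when the model is loaded: fetch every sub-model up front.
        self._models = {
            name: self._loader(uri) for name, uri in self.model_uris.items()
        }

    def predict(self, context, model_input, params=None):
        # Assumption: the caller selects the sub-model via params["model_name"].
        name = (params or {}).get("model_name")
        if name not in self._models:
            raise ValueError(
                f"Unknown model_name {name!r}; expected one of {sorted(self._models)}"
            )
        return self._models[name].predict(model_input)


class StubModel:
    """Stand-in for a loaded sub-model so the example runs anywhere."""

    def __init__(self, scale):
        self.scale = scale

    def predict(self, model_input):
        return [self.scale * x for x in model_input]


# Hypothetical registry contents; a real deployment would omit `loader`
# and let mlflow.pyfunc.load_model resolve the URIs instead.
stub_registry = {
    "models:/main.demo.double@prod": StubModel(2),
    "models:/main.demo.triple@prod": StubModel(3),
}

meta = MetaModel(
    {
        "double": "models:/main.demo.double@prod",
        "triple": "models:/main.demo.triple@prod",
    },
    loader=stub_registry.__getitem__,
)
meta.load_context(None)
result = meta.predict(None, [1, 2, 3], params={"model_name": "triple"})
print(result)
```

The design choice is simply to pay the loading cost once in `load_context` and keep `predict` as a thin router. In a real workspace you would point the MLflow registry at Unity Catalog, log the wrapper as a PyFunc model, and serve that single model instead of one endpoint per sub-model.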

The pace of innovation here is breathtaking. New features are rolling out at an incredible rate, but our core philosophy remains refreshingly straightforward:

- Focus on use cases

- Ship things that deliver real value as fast as possible

- Prioritize open data, open frameworks, and rock-solid governance

This approach ensures we only succeed when our customers derive real value from our products. It is smart in my opinion, because the scale of creative destruction in technology is insane. In oil and gas, we plan projects for 50 years. Some projects in technology only last for 50 days. Take, for example, the DBRX Instruct model. It was a state-of-the-art model ... for several weeks. It took millions of dollars to train and develop. No one views that project as a failure - it is just another example of the culture of innovation and research.

Adapting to Rapid Change

The evolution of our products, particularly in the Generative AI space, is nothing short of mind-blowing. It's both exciting and challenging to keep up with the pace, but it's a testament to our ability to pivot and respond to market demands quickly. For example, since I started, a vast three months ago, we've changed how models are served, introduced an agent framework, and revamped the MLflow evaluation framework. Oh, and LangChain completely changed their framework and deprecated every solution accelerator we built on it.

To survive in this world, focusing on getting things done pragmatically is essential. Our serverless features are a prime example of how we're simplifying complex processes for users by abstracting away the intricacies of Spark execution and cluster management. For every unicorn out there, there are 100 non-experts who are just as smart but need a simple platform to put their ideas into action. They don't want to understand Spark execution, tracking server file structures, or even the statistics behind machine learning models. Worried about what packages are in DBR 14.3? Who cares. Worried about execution plans under the hood? F(orget) them. Scared about how you federate data sources? Let the platform admin deal with it in a consistent way, mount to volumes, and get to work. Even if you are a unicorn, you probably should be focusing on higher-value things - which is why the SaaS / PaaS ecosystem is one of the biggest economic drivers in the world today.

In this fast-paced environment, prioritization becomes crucial. There are a million things you can do. Trust me, I'm skilled but not exceptional, and you can do anything I post about if you give yourself some time to learn it. Yes, I'm talking to you, non-analytics people. Don't let the jargon break you down - we have AI for that. But you need to pick your battles. I may be the worst at this - I want to say yes to everything. I want to learn everything. And I'm always optimistic about how long things will take. So one of the key skills I'm working on at Databricks is prioritizing which technical streams are going to be most impactful for our customers. Help me with this, and then help yourself. Pick something and understand it deeply if possible. Time Series, Bayesian Optimization, Geospatial, Generative AI - whatever floats your boat. Tie it together and go for a float.

Small Teams, Big Impact

Despite Databricks' rapid growth to over 8,000 employees, the company has maintained a culture that empowers individuals and small teams to make significant contributions. This approach has led to impressive outcomes, such as the development of Databricks Apps by a small but talented team. This development was done in months, not years. And it works nearly flawlessly with a simple philosophy. I don't think this would be possible if you tripled the team.

Databricks runs quarterly hackathons and technical training weeks to provide opportunities for individual creativity and skill development. These initiatives, along with the formation of self-organized "tribes" of experts, foster an environment where passionate individuals can thrive and drive innovation. A great example of this is how quickly I was able to get involved with the geospatial and time series teams. Everyone is passionate and wants to improve things, and the biggest challenge is honestly saying no. That's something I need to get better at.

Closing

It's been a lot and I will probably have to be strategic about what battles I pick for the next three months. But my energy is still really high, mostly as a result of the awesome people I get to talk with, both at Databricks and our customers. I'm planning on getting back to time series and generative AI for the next couple of months and would love to hear things you're interested in. I'll likely focus on some smaller code-based examples with a bit of poetic waxing - so consider yourself warned if you don't speak up.

Additional Reading

Serving endpoints

Databricks Apps

Databricks Agent Evaluation

LLM Model Serving

Serverless

Cory Prostebby

Subsurface Geologist - Field Development - Expert Petrel Modeler

4 months ago

So very well written. Have you considered journalism?

Muhammad Abbas

Geotechnical Engineer at Stantec

4 months ago

Hey Scott, do you have an AI Scott with a blogger persona who writes these articles with such discipline? Good read though

Nick McKerrall

Field Engineering Director at Databricks

4 months ago

I can’t believe you’ve already been here 100 days Scott, seems like you just joined a few months ago. Your blog posts are always great and I love your perspective. Glad to have you on the team!

David Morton

Data Scientist | Data Engineer | Machine Learning Engineer | Software Developer | Workforce Analyst | People Scientist | HR | Python | Spark | PostgreSQL | Pandas | Strategic Collaborator | 2X Databricks Certified

4 months ago

This is an excellent post (and congrats on 100 days!) I was just telling someone the other day that Databricks seems to release something new almost every day, and while that’s overwhelming, coming from the perspective of someone who has built products that consistently added new features, I can’t agree more with your statement that much of the speed of development is related to smaller, self organizing teams.

Alastair Muir, PhD, BSc, BEd, MBB

Data Science Consultant | @alastairmuir.bsky.social | Risk Analysis and Optimization | Causal Inference

4 months ago

It’s always challenging to keep up with you. Lead the way
