Does your Dev Team have the skills for AI Development? Ask Simon Last, co-founder and CTO of Notion

This article summarizes insights from a podcast interview with Simon Last, co-founder and CTO of Notion, about the development and implementation of Notion AI. Key themes include the iterative and experimental nature of building AI products, the importance of robust evaluation and testing, and the shift in the skill sets required for AI product development. Notion's philosophy centers on viewing AI as a "primitive" or building block that lets users create custom workflows, and it emphasizes a user-centered approach to adopting this new technology.

Key Themes & Ideas

AI as a Primitive:

Notion views AI as a fundamental building block, similar to a relational database or table view. The goal is to integrate AI into the existing Notion framework, allowing users to create their own custom software and workflows.

"for notion we think a lot about like Primitives or building blocks our goal as a company is to try to break the pattern of these like rigid vertical SaaS tools and instead reconstruct them from their underlying Primitives and allow users to make their own custom software so things like like a relational database or a table view or different blocks within a page I think of AI as another primitive in the toolbox lets you do really useful automations on top"

Iterative Product Development:

Notion’s approach to AI development has been highly iterative, starting with a small team that moved quickly to ship products and learned along the way. They began with an AI writing assistant, then added AI autofill for databases and later Q&A.

"I'm a big fan of starting like a tiger team for a new initiative I think it's good to have like small group of people that can move really fast and have a clear Focus mandate so that's how it started"

Organizational Structure & Democratization:

The initial AI team was a centralized "tiger team," but Notion is now working on democratizing AI usage by embedding AI specialists into other teams. This is proving challenging because the mental model behind best practices for AI development is difficult to share. Having new people join the core AI team temporarily to get hands-on experience has been more successful.

"what we've had most success with actually is when someone new wants to work on AI we've had a lot of success with just having them join the AI team temporarily and then they just join our stand-ups and get in the weeds with us every day and then quickly pick up all the context around like how to do evals how to iterate on prompts"

The Challenge of Evaluation and Testing:

Evaluating and testing AI is much more difficult than testing traditional software. It’s crucial to have systems for logging, reproducing failures, collecting failures into datasets, and running evaluations to determine whether a change is an improvement.

"it's super hard it's a really different experience than the pre-AI world... in the AI world I can often have an idea and and then be surprised that it didn't work in some some way I didn't expect"

Logging and Reproducibility:

Reproducing failures exactly is extremely important, and this requires thorough logging. Notion lets users opt in to sharing data, which is used for evaluation purposes (not training). This shared data is crucial for creating regression datasets.

"extremely important to be able to exactly reproduce the the failure situation the log has to be an exact reproduction of the error and then you should be able to just rerun that inference with the exact same input right that's really important if you can't do that you're totally screwed"

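As a minimal sketch of what "exact reproduction" can mean in practice, the snippet below logs every input of a model call alongside its output so the call can be rerun from the log alone. The record fields, the model name, and the helper functions are hypothetical illustrations, not Notion's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InferenceRecord:
    """Everything needed to rerun one model call (hypothetical schema)."""
    model: str
    prompt: str
    temperature: float
    max_tokens: int
    output: str

    def request_fingerprint(self) -> str:
        # Hash only the inputs, so identical requests share a fingerprint.
        inputs = {k: v for k, v in asdict(self).items() if k != "output"}
        payload = json.dumps(inputs, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def to_log_line(record: InferenceRecord) -> str:
    # One JSON object per line keeps the log easy to append to and replay.
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)


def from_log_line(line: str) -> InferenceRecord:
    return InferenceRecord(**json.loads(line))


record = InferenceRecord(
    model="example-model",  # placeholder name, not a real model
    prompt="Summarize: the meeting moved to Friday.",
    temperature=0.0,
    max_tokens=64,
    output="The meeting is now on Friday.",
)
restored = from_log_line(to_log_line(record))
```

The point of the round trip is that `restored` carries the same inputs as the original call, so a failure found in the log can be replayed with an identical request.
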
Focus on Regression Testing:

Notion prioritizes identifying and fixing regressions over collecting "good" cases. The focus is on continuously improving and ensuring that previously fixed errors do not resurface.

"I tend not to care about good cases that much I think the the main focus is because what's the actionable value of that...to me the main goal is to collect concrete regressions and then fix them and then make sure that they stay fixed"

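One way to picture "collect regressions and make sure they stay fixed" is a small dataset of past failures that is rerun on every change. The sketch below is an assumption about how such a check might look; the case format, the substring check, and the `stub_model` stand-in are all hypothetical.

```python
from typing import Callable

# Each case records an input that once failed and a string the fixed
# output is expected to contain (hypothetical format).
REGRESSION_CASES = [
    {"id": "launch-date", "input": "When is the launch?", "must_contain": "2024-05-01"},
    {"id": "roadmap-owner", "input": "Who owns the roadmap?", "must_contain": "the platform team"},
]


def run_regressions(cases: list[dict], generate: Callable[[str], str]) -> list[str]:
    """Return the ids of cases whose output no longer meets the expectation."""
    failed = []
    for case in cases:
        output = generate(case["input"])
        if case["must_contain"].lower() not in output.lower():
            failed.append(case["id"])
    return failed


def stub_model(prompt: str) -> str:
    # Stand-in for a real model call so the example runs offline.
    canned = {
        "When is the launch?": "The launch is on 2024-05-01.",
        "Who owns the roadmap?": "I am not sure.",
    }
    return canned.get(prompt, "")


failed_ids = run_regressions(REGRESSION_CASES, stub_model)
```

Here `failed_ids` names the one case that regressed, which is the actionable output: a concrete failure to fix and then keep in the dataset.
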
Deterministic and Non-Deterministic Evaluation:

Notion uses both deterministic and non-deterministic (model-graded) evaluations, and prefers deterministic evals whenever possible. Deterministic evals include constraining the output to a structured format (e.g., XML, JSON) so that invalid output is ruled out, as well as using classifiers to label results and comparing those labels with ground-truth values. Model-graded evals work best when the grading criteria are specific and concrete; otherwise, the grader introduces a new layer of problems.

"you can have deterministic or non-deterministic evals definitely best use deterministic when you can and then you can use like a model-graded eval when not for model-graded evals they work best when it's as concrete and specific as possible"

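A deterministic eval of this kind can be sketched in a few lines: check that each output parses as the expected structure, then score it against a ground-truth label. The label set, the JSON shape, and the sample outputs below are assumptions made for illustration.

```python
import json

ALLOWED_LABELS = {"bug", "feature", "question"}  # hypothetical label set


def is_valid_output(raw: str, allowed_labels: set[str]) -> bool:
    """True if raw is a JSON object whose "label" is one of the allowed strings."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, dict):
        return False
    label = parsed.get("label")
    return isinstance(label, str) and label in allowed_labels


def accuracy(outputs: list[str], truths: list[str], allowed_labels: set[str]) -> float:
    """Fraction of outputs that are valid and match the ground-truth label."""
    if not truths or len(outputs) != len(truths):
        raise ValueError("outputs and truths must be non-empty and the same length")
    correct = 0
    for raw, truth in zip(outputs, truths):
        if is_valid_output(raw, allowed_labels) and json.loads(raw)["label"] == truth:
            correct += 1
    return correct / len(truths)


outputs = ['{"label": "bug"}', '{"label": "feature"}', "not json", '{"label": "question"}']
truths = ["bug", "bug", "feature", "question"]
score = accuracy(outputs, truths, ALLOWED_LABELS)
```

Because both checks are plain code, the same inputs always produce the same score, which is what makes this kind of eval easy to trust and to rerun.
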
Fine-Tuning vs. In-Context Learning:

Notion initially experimented heavily with fine-tuning, but now prefers in-context learning because new models are released so rapidly and fine-tuned models are difficult to debug. Notion tries to use the best available model at any given time, and finds that prompt engineering matters more than switching models.

"I'm honestly not that bullish on companies outside of the foundation model companies doing fine tuning and if I like meet a startup and they say they're doing fine tuning I I actually now think of that as like a negative update a lack of experience"

Importance of Task Specificity:

Clearly defining the task and the desired output is crucial for getting good results from AI models. The evolution of Notion's Q&A feature highlights the importance of having a detailed rubric explaining what constitutes a "good" answer within their product experience.

"in general yeah like the model doesn't know what you want until you tell it and right yeah I think a big one Q&A is probably a really big example"

New Skill Sets for AI Product Development:

People building AI products need to be comfortable with uncertainty and able to iterate quickly. They need to be empirical in their approach and not overly attached to ways of thinking from the machine learning era that preceded language models.

"especially on the more producty side if you looking for someone that's going to be able to lead an AI product team you definitely need someone that's going to be very okay with a lot of uncertainty and really pushing to Think Through every day are we doing the right thing"

Trust and Transparency:

Notion focuses on building user trust by making AI actions visible (e.g., diffs before accepting edits), providing citations in Q&A, and having the AI communicate uncertainty rather than provide incorrect information.

"the most obvious tools are one if the AI is taking an action have the user confirm that and be able to see in a nicely visualized way what is action being performed"

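The "show a diff, then confirm" pattern can be illustrated with Python's standard `difflib` module. This is only a sketch of the idea under simple assumptions (plain-text content, a boolean confirmation); the function names are hypothetical and say nothing about how Notion renders its diffs.

```python
import difflib


def preview_edit(original: str, proposed: str) -> str:
    """Return a unified diff the user can inspect before accepting an AI edit."""
    diff = difflib.unified_diff(
        original.splitlines(),
        proposed.splitlines(),
        fromfile="current",
        tofile="proposed",
        lineterm="",
    )
    return "\n".join(diff)


def apply_if_confirmed(original: str, proposed: str, confirmed: bool) -> str:
    # The edit only takes effect after explicit user confirmation.
    return proposed if confirmed else original


original_text = "Notes\nShip on Monday."
proposed_text = "Notes\nShip on Friday."
preview = preview_edit(original_text, proposed_text)
```

The preview marks the removed line with `-` and the added line with `+`, and nothing changes unless `apply_if_confirmed` is called with `confirmed=True`.
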
Garbage In, Garbage Out:

Notion takes a multi-pronged approach to poor-quality data in its knowledge base. A ranking and filtering step prioritizes documents that are more up to date, more frequently viewed, and written by higher-authority users. All of the relevant data is then given to the model to evaluate. Notion prefers not to show the model opaque, black-box scores, because these can easily lead to incorrect answers.

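A rough sketch of this idea follows: filter out stale documents, order the rest by the signals mentioned above, and pass the raw metadata to the model instead of a single opaque score. The fields, the sort order, the age cutoff, and the use of an admin flag as a proxy for authority are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    title: str
    last_edited: date
    views: int
    author_is_admin: bool  # hypothetical stand-in for "authority"
    text: str


def select_docs(docs: list[Doc], today: date, max_age_days: int, top_k: int) -> list[Doc]:
    """Drop stale documents, then prefer recent, popular, authoritative ones."""
    fresh = [d for d in docs if (today - d.last_edited).days <= max_age_days]
    fresh.sort(key=lambda d: (d.last_edited, d.views, d.author_is_admin), reverse=True)
    return fresh[:top_k]


def render_context(docs: list[Doc]) -> str:
    # Expose readable metadata to the model rather than a black-box score.
    return "\n\n".join(
        f"Title: {d.title}\nLast edited: {d.last_edited.isoformat()}\n"
        f"Views: {d.views}\n{d.text}"
        for d in docs
    )


docs = [
    Doc("Old policy", date(2021, 1, 1), 900, True, "Remote work on Fridays only."),
    Doc("Current policy", date(2024, 3, 1), 120, True, "Remote work any day."),
    Doc("Draft policy", date(2024, 2, 10), 15, False, "Proposal under review."),
]
selected = select_docs(docs, today=date(2024, 4, 1), max_age_days=365, top_k=2)
context = render_context(selected)
```

Giving the model the edit date and view count in plain text lets it weigh conflicting documents itself, which matches the preference for transparent signals over hidden scores.
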
User-Centered Design:

Notion is working to create AI experiences that are embedded in existing user habits. They aim to create products that do not require a change in behavior to be adopted and also to provide users with a journey to more complex workflows and automations.

Future Vision:

Notion envisions AI automating tedious tasks so that users can focus on higher-level, more strategic activities, with Notion serving as the host for these workflows. They also want databases to become an implementation detail that AI manages in the background, leaving the human user free to set high-level strategy and observe outputs.

"the North Star for me for AI is basically can you automate all the things that people don't want to be doing right all the cumbersome tedious tasks that people do every day and ultimately lift humans up into a high level of abstraction where where they're doing like High leverage more fulfilling activities"

AI in Development:

Notion engineers actively use AI tools such as Cursor for autocomplete and other code-editing tasks. They are also experimenting with tools from various coding-agent startups to further automate development. Simon uses Claude for coding and ChatGPT (GPT-4) for chat tasks.

Conclusion

Notion’s experience building Notion AI underscores the significant shift in product development introduced by AI. The focus on rapid iteration, robust evaluation, and user-centered design highlights the challenges and opportunities in this evolving landscape. Notion’s vision for the future involves AI not only enhancing but fundamentally transforming how users interact with the platform and approach knowledge work. They believe that AI is key to abstracting away tedious work and empowering users to focus on higher-level strategic thinking.
