FOD#69: Why NotebookLM is blowing everyone’s minds – a year after launch

We discuss the evolution of NotebookLM from a mere AI assistant to a breakthrough, the tech behind it, and how to use it + a carefully curated list of the best news and papers

This Week in Turing Post:

  • Tuesday, Guest Post: Your infrastructure shouldn’t live in a “black box”
  • Wednesday, AI 101: What is DoRA, QLoRA, QDoRA?
  • Friday, Agentic Workflows series: The History of Agents


The main topic

These last few days, everyone has been marveling at a not-so-novel AI assistant: NotebookLM. It’s been around since July 2023, but chances are you haven’t heard much about it until recently. Since it’s intriguing from both the technological and the user-experience angle, let’s explore what NotebookLM is about, where it comes from, and why it’s gaining traction.

Tailwind becomes NotebookLM

Initially developed in Google Labs under the name Tailwind, the project was renamed NotebookLM, a name more reflective of its goal: helping users manage large volumes of information by organizing, summarizing, and generating insights from user-uploaded documents. You can feed it Google Docs, PDFs, and, more recently, YouTube links and audio files, and it will provide grounded responses complete with citations and relevant quotes. While this isn’t completely groundbreaking in the AI world, its seamless execution has caught the attention of many who deal with information overload.

To try it out, I uploaded about 50 files from my book project on Citizen Diplomacy. These included audio interviews in two languages, articles in PDFs, annual reports in docs, and links to Google Docs with drafts. I’m currently working on the seventh chapter, and since the narrative spans over 40 years, it’s crucial to have a concise overview of how the ideas connect and flow. Within seconds, NotebookLM generated a perfect brief, even helping me rediscover a point I wanted to include in this chapter but had forgotten. There's still plenty to explore, but that was already quite impressive.

Okay, that’s convenient but not mind-blowing.

What is mind-blowing about NotebookLM?

It actually is amazing. The feature turning heads lately is its ability to generate AI-driven podcasts called Deep Dives. It doesn’t just read the text aloud: NotebookLM creates a conversation between two AI hosts who discuss the material, banter, laugh, and, remarkably, make sense. This offers a fresh, passive way to consume information and a welcome alternative to reading dense material.

Examples

Thomas Wolf suggested a self-care hack: download your LinkedIn profile and let the AI hosts dive deep into how amazing you are. Andrej Karpathy turned C code that trains GPT-2 into a podcast, and while he noted he might have framed and emphasized some things differently, he found the result entertaining and surprisingly coherent. I uploaded Alan Turing’s article "Computing Machinery and Intelligence," and you can listen to the result. It’s super interesting and makes the information easier to digest. However, it does make it sound as if Ada Lovelace and Alan Turing were contemporaries, so, as always, fact-checking is essential with GenAI.

Tech behind NotebookLM

The tool is powered by Google’s long-context Gemini 1.5 Pro, a Transformer model utilizing a sparse Mixture-of-Experts (we explain MoE here) architecture, which keeps inference efficient by activating only the relevant parts of the model. This allows NotebookLM to process up to 1,500 pages of information at once, making it suitable for those tackling large datasets or complex topics. It digests an enormous amount of information and, so far, doesn’t seem to get lost in it.
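Gemini’s internals aren’t public, so here is only a toy sketch of the sparse-MoE routing idea described above: a router scores every expert for a token, but only the top-k experts actually run. All shapes, the routing function, and the expert definitions are assumptions made up for illustration.

```python
# Toy sketch of sparse Mixture-of-Experts routing (illustrative only,
# not Gemini 1.5 Pro's actual implementation).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token, experts, router_weights, top_k=2):
    """Route a token vector to the top_k experts and mix their outputs."""
    scores = softmax(router_weights @ token)     # one score per expert
    chosen = np.argsort(scores)[-top_k:]         # only the top_k experts run
    output = np.zeros_like(token)
    for i in chosen:
        output += scores[i] * experts[i](token)  # weighted sum of expert outputs
    return output

# Usage with 8 toy "experts", each a random linear map over a 16-dim token.
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(16, 16)): W @ x for _ in range(8)]
router_weights = rng.normal(size=(8, 16))
token = rng.normal(size=16)
print(moe_layer(token, experts, router_weights).shape)  # -> (16,)
```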

NotebookLM uses:

  • Retrieval-Augmented Generation (RAG) to process content from multiple user-provided sources.
  • Text-to-Speech (TTS) to generate the voices of the AI podcast hosts, creating a convincing conversational experience.
  • SoundStorm to turn scripts into realistic audio conversations with natural dialogue and high-quality, engaging output.
  • Disfluency injection to add human-like pauses, filler words, and natural speech patterns, making the dialogue sound more realistic (see the toy sketch after this list).
  • Prompt engineering to structure AI interactions and ensure the hosts maintain a natural, conversational tone.
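NotebookLM’s audio pipeline isn’t open source, so the following is only a minimal sketch of the disfluency-injection idea: take a clean script line and probabilistically sprinkle in fillers and pauses before it goes to TTS. The filler list, rate, and function name are invented for the example.

```python
import random

# Toy illustration of disfluency injection (not NotebookLM's actual code).
FILLERS = ["um,", "you know,", "I mean,", "right,"]
PAUSE = "..."

def inject_disfluencies(line, filler_rate=0.15, seed=None):
    """Randomly insert fillers and pauses so a scripted line sounds more spoken."""
    rng = random.Random(seed)
    words = line.split()
    out = []
    for i, word in enumerate(words):
        # Occasionally hesitate before a word (never before the very first one).
        if i > 0 and rng.random() < filler_rate:
            out.append(rng.choice(FILLERS) if rng.random() < 0.7 else PAUSE)
        out.append(word)
    return " ".join(out)

script_line = "Turing proposed the imitation game as a practical test for machine intelligence."
print(inject_disfluencies(script_line, seed=42))
# Prints the line with a few fillers/pauses sprinkled in before TTS would voice it.
```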

Compelling UIUX exploration and evolving ways it’s being used

As Karpathy puts it “That's what I think is ultimately so compelling about the 2-person podcast format as a UIUX exploration. It lifts two major "barriers to enjoyment" of LLMs. 1 Chat is hard. You don't know what to say or ask. In the 2-person podcast format, the question asking is also delegated to an AI so you get a lot more chill experience instead of being a synchronous constraint in the generating process. 2 Reading is hard and it's much easier to just lean back and listen.”

Who can use it?

It offers features for all audiences, both technical and non-technical, and is immediately useful for students, researchers, and writers. It balances practicality with experimentation, offering a novel way to interact with personal data.

Maybe we are all overreacting, and it’s certainly not perfect, as no AI tool is. But if we’re being practical, tools like ChatGPT and now NotebookLM are like a lift to a different dimension of productivity. It’s like having an inflated external brain that doesn’t necessarily think but certainly processes.


We recommend: a free ebook, Mastering RAG

Galileo just released a new free eBook: Mastering RAG - A Developer's Guide to Enterprise-Grade RAG Systems

Download Mastering RAG now to access 200 pages of technical content covering:

  • Chunking strategies
  • Embedding and reranking model selection
  • Vector database comparisons
  • RAG architecture best practices
  • Testing and evaluation methods

GET YOUR COPY TODAY
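To make the first few topics above concrete, here is a minimal, self-contained sketch of the chunk → embed → retrieve loop that any RAG system is built around. The bag-of-words "embedding" and fixed-size chunker are toy stand-ins; a real system would use an embedding model, a vector database, and a reranker, which is exactly what the ebook covers.

```python
# Minimal chunk -> embed -> retrieve sketch (toy stand-ins, not production RAG).
import math
from collections import Counter

def chunk(text, size=40):
    """Naive fixed-size chunking by words; real systems use smarter strategies."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: a word-count vector. Swap in a real embedding model in practice."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query; these become LLM context."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

sources = [
    "NotebookLM grounds every answer in the documents you upload and cites them.",
    "SoundStorm turns a dialogue script into natural-sounding audio.",
    "Chunking strategies decide how source text is split before embedding.",
]
chunks = [c for doc in sources for c in chunk(doc, size=8)]
print(retrieve("How are answers grounded in my documents?", chunks))
```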


Twitter library

Weekly recommendation from an AI practitioner:

Crawl4AI – an open-source web crawler and scraper. Think of it as a go-to engine for automating web scraping and scaling up AI-driven projects with minimal setup.
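For a quick feel of how it’s used, here is a minimal sketch based on the project’s quickstart at the time of writing; the API changes between versions, so treat the class and method names (AsyncWebCrawler, arun, result.markdown) as assumptions and check the current README.

```python
# Minimal Crawl4AI sketch (names follow the project's quickstart at the time of
# writing; verify against the current README, as the API evolves).
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        # Crawl a page and get an LLM-friendly markdown version of its content.
        result = await crawler.arun(url="https://www.turingpost.com")
        print(result.markdown[:500])

asyncio.run(main())
```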


News from The Usual Suspects

  • California’s AI Bill Hits a Wall

Governor Gavin Newsom vetoed California’s landmark AI safety bill SB 1047, citing concerns that it would stifle innovation and prompt AI firms to relocate. The bill, aimed at regulating powerful AI models with mandatory safety tests and "kill switch" mechanisms, faced strong opposition from tech giants like OpenAI and Google. Supporters argue it’s essential to prevent unchecked AI risks.

  • OpenAI’s Revolving Door: Who’s Next?

OpenAI faces another leadership shake-up as Chief Research Officer Bob McGrew and VP Barret Zoph exit, following CTO Mira Murati’s abrupt departure. CEO Sam Altman downplays the resignations and says he will now be the one focusing more on technical matters (me! me! me!). But with whispers of a $150B valuation and a potential 7% stake for Altman himself, the real question is whether the company’s shift toward a for-profit model is driving the talent exodus.

Also, according to The New York Times, OpenAI is projecting a $5 billion loss for 2024 despite 1,700% revenue growth since the beginning of 2023. The company is targeting $11.6 billion in revenue next year and is raising $7 billion in a funding round that could value it at $150 billion. Rising computational and operational costs are contributing to its financial challenges. Thrive Capital leads this round, Microsoft is involved, and Apple just exited talks to invest.

Meanwhile, a new, mysterious image-generation model named "Blueberry" has surfaced on the leaderboards, beating FLUX.1. We don’t usually go in for speculation, but it sounds like OpenAI to us.

  • Hugging Face Hits 1 Million Models!

Hugging Face now hosts over 1 million public models, from big names like LLaMA to countless specialized, custom AI models. With a new repository created every 10 seconds, the platform is proving that tailored AI is the future.

  • Meta’s Orion Glasses: The Future is Holographic

At Meta Connect 2024 last week, Project Orion stole the show. These futuristic AR glasses feature holographic displays and a neural interface that responds to wrist gestures, bringing sci-fi tech into reality. While Orion is still in development, the potential for blending the digital and physical worlds promises to push augmented reality to new heights. The best description of the demo experience is in Stratechery.

  • Nvidia Gobbles Up OctoAI: Acquisition Fever Continues

OctoAI is the fifth startup Nvidia has acquired in 2024. Before it, Nvidia brought Run:ai, Deci AI, Shoreline, and Brev.dev under its roof. As Nvidia tightens its grip on the AI infrastructure market, concerns about regulatory scrutiny and competition intensify.

  • Microsoft's AI Trust Plan: Locking It Down

Microsoft unveils its latest push for "Trustworthy AI," emphasizing robust security, safety, and privacy. New capabilities like confidential inferencing and content-safety measures keep AI outputs clean and compliant. A key player in AI, Microsoft is going all-in on responsible AI, aiming to protect users while unlocking the full potential of AI-driven innovation.

  • AI adoption: Insights

The NBER Working Paper on generative AI adoption reveals rapid growth in the U.S., with 39.4% of adults aged 18-64 using the technology by August 2024. AI adoption is particularly high among younger, educated, and higher-income individuals, with men using it more than women. Usage is widespread across occupations, especially in management and tech roles, though notable adoption exists even among blue-collar workers. Generative AI primarily assists with writing, administrative tasks, and data interpretation. An estimated 0.5-3.5% of work hours are now supported by AI, suggesting its growing influence on productivity and economic impact.



The freshest research papers, categorized for your convenience.

