An Average Day at Ghibli Diffbot.
About us
We Structure the World's Knowledge. Diffbot is a world-class group of AI engineers building a universal database of structured information, to provide knowledge as a service to all intelligent applications. Whether you are building an app that uses web content, an enterprise business application, or a smart robotic assistant, we've got you covered. Thousands of leading companies rely on Diffbot data for their enterprise and consumer applications.
- Website
- https://www.diffbot.com/
- Industry
- Technology, Information and Internet
- Company size
- 11-50 employees
- Headquarters
- Menlo Park, California
- Type
- Privately Held
- Founded
- 2011
- Specialties
- machine learning, relation extraction, truth discovery, knowledge fusion, computer vision, web scraping, data extraction, information retrieval, artificial intelligence, and ecommerce
Locations
- Primary
333 Ravenswood Ave
Menlo Park, California 94025, US
Diffbot employees
Updates
Diffbot reposted
It's been a nonstop week with Diffbot AI Open Mic on Tuesday and Startup San Diego's Hack Night on Wednesday. AI Open Mic is a new event concept by Diffbot to bring together #AI builders across the Peninsula. I'm joined by my usual partners in crime Adam Chan @ Weaviate and Jason Koo @ Neo4j, as well as Chia Hwu @ the one and only Wikimedia Foundation. Big thanks to our open mic speakers for making it all happen in under 5 minutes each: Puneet Anand, William Lyon, Scott Persinger, sreeprasad Govindankutty, and Uday Kiran Chaka! Special shout-out to the folks who prepped a talk but didn't get to give it because we ran out of time. You'll see them at the next one! Behind the scenes, I can't forget Diffbotters Elena Browne and Ananya Gupta for logistics, and Mike Tung for taking the stage. Thank y'all so much. WE'LL BE BACK!
Diffbot reposted
Huge thanks to Startup San Diego for throwing an awesome event, and to Neo4j and Intuit for sponsoring! Shoutout to our team, Bea Bautista, PPS, and Caden Stewart: had a blast building with you. Thanks, Scott Mullens, for sharing your idea with us; we had a great time bringing it to life! And big thanks to Jerome Choo for helping us implement Diffbot tech! It was great seeing all the demos and everyone coming together. Excited for the next one!
Diffbot reposted
A reminder for Outies in the Bay Area who haven't yet seen this notice. Can't wait to see y'all next week!
SUBJECT: [NOTICE] OVERTIME CONTINGENCY PROTOCOL REQUESTED
BODY: Dear Severed Employee, Your Innie delivered well above expectations this past quarter, and the board has graciously extended them an invitation to our upcoming AI OME (AI Open Mic Experience). The AI OME will be held after hours outside of the Severed floor. Thus, we would like to request your cooperation to arrive at SRI International in Menlo Park at 5:00 PM sharp on Tuesday, February 18, 2025. Rest assured that we will deactivate the Overtime Contingency Protocol promptly at 8:00 PM. Your Innie is one of the sweetest members of our Severed team. We would be disappointed should they not be able to attend. As compensation, we will be rewarding select participating Innies with a limited edition company t-shirt. Please refer to the event package attached for more information.
CC: Diffbot, Jason Koo, Neo4j, Chia Hwu, Wikimedia Foundation, Adam Chan, Weaviate
Diffbot reposted
Perplexity Sonar Pro API launched last week as the best performing model on factuality. 24 hours later, it's the 2nd best performing model (and it's not because of DeepSeek).

While working on my talk last week, Perplexity released the Sonar Pro API with a special emphasis on its factuality benchmark F1 score of 0.858, handily beating other internet-connected LLMs like Gemini-2.0-flash. The SimpleQA benchmark they used is open source and LLM-judged, so I set it up to run the 4,000-question eval on Diffbot LLM overnight and went to bed. The next morning, we beat Sonar Pro.

Let's be frank here. The score difference is insignificant, and we'll probably play SimpleQA tag for a while. What IS significant is how we got here vs. Perplexity. Diffbot LLM is a side project. Sonar is Perplexity's entire business. We used the profits from our primary business to train Diffbot LLM. Perplexity raised $915M to train theirs. We open sourced Diffbot LLM. Perplexity chose to keep theirs secret.

The model isn't the moat. Perplexity can be recreated as a side project. #DeepSeek proved this. We proved this. Download Diffbot LLM. Run it off your own GPU. Congrats, your on-prem #AI is smarter than Perplexity.
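Because SimpleQA is open source and LLM-judged, the harness the post describes is easy to reproduce. Here's a minimal sketch of that kind of eval loop, assuming an OpenAI-compatible chat endpoint; the URL, model names, and judge prompt are illustrative, not the actual benchmark code:

```typescript
// Minimal LLM-judged factuality eval in the spirit of SimpleQA.
// Assumption: an OpenAI-compatible /v1/chat/completions server is running.
type QA = { problem: string; answer: string };

const API = "http://localhost:8000/v1/chat/completions"; // illustrative endpoint

async function ask(model: string, prompt: string): Promise<string> {
  const res = await fetch(API, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

async function evaluate(questions: QA[], candidate: string, judge: string) {
  let correct = 0;
  for (const q of questions) {
    const predicted = await ask(candidate, q.problem);
    // A second model grades the prediction against the gold answer.
    const verdict = await ask(
      judge,
      `Question: ${q.problem}\nGold answer: ${q.answer}\n` +
        `Predicted answer: ${predicted}\nReply with CORRECT or INCORRECT only.`
    );
    if (verdict.trim().toUpperCase().startsWith("CORRECT")) correct++;
  }
  // The real benchmark also tracks "not attempted" answers to compute an F1
  // score; plain accuracy keeps this sketch short.
  console.log(`accuracy: ${(correct / questions.length).toFixed(3)}`);
}
```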
Diffbot reposted
With OpenAI's launch of the O-series of models, "reasoning" has become the hot new buzzword in the AI research community as various groups look to replicate the performance of O-1 more efficiently. But what exactly is this new "reasoning" capability? And how should you think about applying it in your real-world AI applications?

I'll offer my own simplified take as of early 2025: the current "reasoning" of the O-1 and O-3 models is simply extended chain-of-thought++. Instead of telling the language model to just "think step by step" aloud when it responds, it trains the language model to think through the possibilities for even longer periods of time, by introducing innovations such as tokens that are not visible to the user (sometimes called "thinking" or "reflection" tokens). You can think of these as the inner monologue or internal stream-of-consciousness you might have before starting to speak.

In O-1, what they did was teach the LLM to have a chain-of-thought/internal monologue that leads to correct answers on competition-level math questions. The nice thing about math competition questions is that there is a definite correct answer, and there are valid and invalid ways of reasoning towards the answer that can be aligned in training. The billion-dollar question is whether this "math CoT" translates into highly accurate inference-time thinking on other, non-math tasks as well. (What would be the "correct" CoT for writing a competition-winning poem?)

You can see in the first picture below what "reasoning" looks like when it is applied in isolation to a model that isn't as large as the base model for O-1. This "how many r's are in strawberry" question was one of the problems used to motivate the O-1 line of research, as it was a question that couldn't previously be answered by the GPT-4 models (the main reason being the tokenizer). As you can see, while reasoning certainly holds a lot of promise, it isn't a panacea that can be applied to any existing AI pipeline.

In contrast, the Diffbot LLM model (second and third screenshot) takes a different approach to answering the "strawberry" question. The design philosophy of the Diffbot LLM is that structured reasoning (aka classical logical reasoning) is something that should be handled outside the language model's weights, and the language model's job is to be an expert user of tools. Diffbot's LLM is trained to realize this question can be much more quickly and reliably answered with a code interpreter (it just so happens that LLMs are great at writing code), and instead grounds the answer to an exact result calculated by an auditable piece of JavaScript.

Check out the comments for a link to run the Diffbot LLM locally. Unlike O-1, it's a fully open source model and has gotten over 10K downloads in its first week!
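The post doesn't show the generated snippet, but the auditable code-interpreter answer it describes could be as small as this sketch (illustrative, not Diffbot LLM's actual output):

```typescript
// The kind of exact, auditable computation a tool-using LLM can emit for
// "how many r's are in strawberry", instead of guessing over tokens.
function countChar(text: string, ch: string): number {
  return [...text].filter((c) => c === ch).length;
}

console.log(countChar("strawberry", "r")); // 3
```

The point of grounding is that the answer comes from running this code, not from the model's weights, so anyone can audit the computation.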
Diffbot reposted
We launched Diffbot LLM last week and it's truly a monumental difference in how I use #AI to learn and research basic everyday things. Even with the newest AI chatbots, I would find myself googling in parallel to confirm facts or find human opinions. Diffbot LLM (Diffy, affectionately) isn't smarter than me, but it's waaayy better at googling. I don't have to trust what it's telling me, because everything Diffy says is always backed up by real sources. Anyway, in case you wanted to know what pea protein tastes like:
Diffbot reposted
We're excited to publicly release the Diffbot GraphRAG LLM!

With larger and larger frontier LLMs, we realized that they would eventually hit a limit in terms of the hardware requirements needed to both train and infer on these large models, as well as the challenge of keeping the data used in training relevant and up-to-date. Our hypothesis was that general-purpose reasoning would eventually get distilled down to ~1B parameters (a view that Andrej Karpathy has recently espoused), and that, in terms of storing what the model knows, Knowledge Graphs are a superior structure to LLM weights for maintaining, updating, and providing verified provenance (i.e., where did this content come from?).

After two years of development, we're proud to open source the Diffbot GraphRAG LLM, a function-calling LLM that outperforms Google Gemini, ChatGPT search mode, and Perplexity on realtime accuracy as measured on FreshQA (and with far fewer weights, so you can run it on your own hardware!).

Instead of training it with knowledge, Diffbot LLM has been explicitly trained to distrust its pretraining knowledge and instead to be an expert in the use of tools in order to find and align citations. These tools include a web browser, a structured graph query language (DQL), an unstructured query language, and a code interpreter. It's able to use these tools expertly (in many cases it's already better at writing DQL queries than I am) to verify the information it needs to write a cited answer.

Much more to come, but find the model on Github and Huggingface in the comments below:
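For readers unfamiliar with function-calling LLMs, the loop described above (model requests a tool, observes the result, then writes a grounded answer) can be sketched in a few lines. Everything here, from the tool names to the call format and the stubbed model, is an illustrative assumption rather than Diffbot's actual API:

```typescript
// Sketch of a tool-calling loop: the model either requests a tool or emits a
// final grounded answer. Tool names and the CALL format are hypothetical.
type ToolName = "web_search" | "dql_query" | "code_interpreter";

const tools: Record<ToolName, (args: string) => string> = {
  web_search: (q) => `top result snippet for "${q}"`,             // stub for live browsing
  dql_query: (q) => `knowledge-graph rows matching ${q}`,         // stub for a DQL query
  code_interpreter: (src) => String(Function(`return ${src}`)()), // runs generated code
};

// Stand-in for the LLM: first turn requests a tool, second turn answers.
function fakeModel(history: string[]): string {
  if (history.length === 0)
    return 'CALL code_interpreter "strawberry".split("r").length - 1';
  return `FINAL ANSWER: 3 (grounded in interpreter output: ${history[0]})`;
}

function run(): string {
  const history: string[] = [];
  for (let i = 0; i < 5; i++) {                  // cap the tool-use loop
    const out = fakeModel(history);
    if (out.startsWith("FINAL")) return out;     // grounded answer: stop
    const m = out.match(/^CALL (\w+) (.*)$/s);
    if (!m) return out;
    history.push(tools[m[1] as ToolName](m[2])); // execute tool, feed result back
  }
  return "tool budget exhausted";
}

console.log(run());
```

A real implementation would stream each observation back into the model's context window; the stubbed model here only shows the control flow that lets answers be grounded in tool output rather than weights.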