An Average Day at Ghibli Diffbot.
About us
We Structure the World's Knowledge. Diffbot is a world-class group of AI engineers building a universal database of structured information, to provide knowledge as a service to all intelligent applications. Whether you are building an app that uses web content, an enterprise business application, or a smart robotic assistant, we've got you covered. Thousands of leading companies rely on Diffbot data for their enterprise and consumer applications.
- Website
- https://www.diffbot.com/
- Industry
- Technology, Information and Internet
- Company size
- 11-50 employees
- Headquarters
- Menlo Park, California
- Type
- Privately Held
- Founded
- 2011
- Specialties
- machine learning, relation extraction, truth discovery, knowledge fusion, computer vision, web scraping, data extraction, information retrieval, artificial intelligence, and ecommerce
Locations
- Primary
333 Ravenswood Ave
Menlo Park, California 94025, US
Diffbot employees
Updates
Diffbot reposted
It's been a nonstop week with Diffbot AI Open Mic on Tuesday and Startup San Diego's Hack Night on Wednesday. AI Open Mic is a new event concept by Diffbot to bring together #AI builders across the Peninsula. I'm joined by my usual partners in crime Adam Chan @ Weaviate and Jason Koo @ Neo4j, as well as Chia Hwu @ the one and only Wikimedia Foundation. Big thanks to our open mic speakers for making it all happen in under 5 minutes each: Puneet Anand, William Lyon, Scott Persinger, sreeprasad Govindankutty, and Uday Kiran Chaka! Special shout-out to the folks who prepped a talk but didn't get to give it because we ran out of time. You'll see them at the next one! Behind the scenes, I can't forget Diffbotters Elena Browne and Ananya Gupta for logistics, and Mike Tung for taking the stage. Thank y'all so much. WE'LL BE BACK!
Diffbot reposted
Huge thanks to Startup San Diego for throwing an awesome event, and to Neo4j and Intuit for sponsoring! Shoutout to our team, Bea Bautista, PPS, and Caden Stewart: had a blast building with you. Thanks, Scott Mullens, for sharing your idea with us; we had a great time bringing it to life! And big thanks to Jerome Choo for helping us implement Diffbot tech! It was great seeing all the demos and everyone coming together. Excited for the next one!
Diffbot reposted
A reminder for Outies in the Bay Area who haven't yet seen this notice. Can't wait to see y'all next week!
SUBJECT: [NOTICE] OVERTIME CONTINGENCY PROTOCOL REQUESTED
BODY: Dear Severed Employee, Your Innie delivered well above expectations this past quarter, and the board has graciously extended them an invitation to our upcoming AI OME (AI Open Mic Experience). The AI OME will be held after hours outside of the Severed floor. Thus, we would like to request your cooperation to arrive at SRI International in Menlo Park at 5:00 PM sharp on Tuesday, February 18, 2025. Rest assured that we will deactivate the Overtime Contingency Protocol promptly at 8:00 PM. Your Innie is one of the sweetest members of our Severed team. We would be disappointed should they not be able to attend. As compensation, we will be rewarding select participating Innies with a limited edition company t-shirt. Please refer to the event package attached for more information.
CC: Diffbot, Jason Koo, Neo4j, Chia Hwu, Wikimedia Foundation, Adam Chan, Weaviate
Diffbot reposted
Perplexity Sonar Pro API launched last week as the best performing model on factuality. 24 hours later, it's the 2nd best performing model (and it's not because of DeepSeek).

While working on my talk last week, Perplexity released the Sonar Pro API with a special emphasis on its factuality benchmark F1 score of 0.858, handily beating other internet-connected LLMs like Gemini-2.0-flash. The SimpleQA benchmark they used is open source and LLM-judged, so I set it up to run the 4,000-question eval on Diffbot LLM overnight and went to bed. The next morning, we beat Sonar Pro.

Let's be frank here. The score difference is insignificant, and we'll probably play SimpleQA tag for a while. What IS significant is how we got here vs. Perplexity. Diffbot LLM is a side project. Sonar is Perplexity's entire business. We used the profits from our primary business to train Diffbot LLM. Perplexity raised $915M to train theirs. We open sourced Diffbot LLM. Perplexity chose to keep theirs secret.

The model isn't the moat. Perplexity can be recreated as a side project. #DeepSeek proved this. We proved this. Download Diffbot LLM. Run it off your own GPU. Congrats, your on-prem #AI is smarter than Perplexity.
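Because SimpleQA is open source and LLM-judged, the harness the post describes is easy to reproduce. Here's a minimal sketch of that kind of eval loop, assuming an OpenAI-compatible chat endpoint; the URL, model names, and judge prompt are illustrative, not the actual benchmark code:

```typescript
// Minimal LLM-judged factuality eval in the spirit of SimpleQA.
// Assumption: an OpenAI-compatible /v1/chat/completions server is running.
type QA = { problem: string; answer: string };

const API = "http://localhost:8000/v1/chat/completions"; // illustrative endpoint

async function ask(model: string, prompt: string): Promise<string> {
  const res = await fetch(API, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

async function evaluate(questions: QA[], candidate: string, judge: string) {
  let correct = 0;
  for (const q of questions) {
    const predicted = await ask(candidate, q.problem);
    // A second model grades the prediction against the gold answer.
    const verdict = await ask(
      judge,
      `Question: ${q.problem}\nGold answer: ${q.answer}\n` +
        `Predicted answer: ${predicted}\nReply with CORRECT or INCORRECT only.`
    );
    if (verdict.trim().toUpperCase().startsWith("CORRECT")) correct++;
  }
  // The real benchmark also tracks "not attempted" answers to compute an F1
  // score; plain accuracy keeps this sketch short.
  console.log(`accuracy: ${(correct / questions.length).toFixed(3)}`);
}
```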
Diffbot reposted
With OpenAI's launch of the O-series of models, "reasoning" has become the hot new buzzword in the AI research community as various groups look to replicate the performance of O-1 more efficiently. But what exactly is this new "reasoning" capability? And how should you think about applying it in your real-world AI applications?

I'll offer my own simplified take as of early 2025: the current "reasoning" of the O-1 and O-3 models is simply extended chain-of-thought++. Instead of telling the language model to just "think step by step" aloud when it responds, it trains the language model to think through the possibilities for even longer periods of time, by introducing innovations such as tokens that are not visible to the user (sometimes called "thinking" or "reflection" tokens). You can think of these as the inner monologue or internal stream-of-consciousness you might have before starting to speak.

In O-1, what they did was teach the LLM to have a chain-of-thought/internal monologue that leads to correct answers on competition-level math questions. The nice thing about math competition questions is that there is a definite correct answer, and there are valid and invalid ways of reasoning towards the answer that can be aligned in training. The billion-dollar question is whether this "math CoT" translates into highly accurate inference-time thinking on other, non-math tasks as well. (What would be the "correct" CoT for writing a competition-winning poem?)

You can see in the first picture below what "reasoning" looks like when it is applied in isolation to a model that isn't as large as the base model for O-1. This "how many r's are in strawberry" question was one of the problems used to motivate the O-1 line of research, as it was a question that couldn't previously be answered by the GPT-4 models (the main reason being the tokenizer). As you can see, while reasoning certainly holds a lot of promise, it isn't a panacea that can be applied to any existing AI pipeline.

In contrast, the Diffbot LLM model (second and third screenshot) takes a different approach to answering the "strawberry" question. The design philosophy of the Diffbot LLM is that structured reasoning (aka classical logical reasoning) is something that should be handled outside the language model's weights, and the language model's job is to be an expert user of tools. Diffbot's LLM is trained to realize this question can be much more quickly and reliably answered with a code interpreter (it just so happens that LLMs are great at writing code), and instead grounds the answer to an exact result calculated by an auditable piece of JavaScript.

Check out the comments for a link to run the Diffbot LLM locally. Unlike O-1, it's a fully open source model and has gotten over 10K downloads in its first week!
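The post doesn't show the generated snippet, but the auditable code-interpreter answer it describes could be as small as this sketch (illustrative, not Diffbot LLM's actual output):

```typescript
// The kind of exact, auditable computation a tool-using LLM can emit for
// "how many r's are in strawberry", instead of guessing over tokens.
function countChar(text: string, ch: string): number {
  return [...text].filter((c) => c === ch).length;
}

console.log(countChar("strawberry", "r")); // 3
```

The point of grounding is that the answer comes from running this code, not from the model's weights, so anyone can audit the computation.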
Diffbot reposted
We launched Diffbot LLM last week and it's truly a monumental difference in how I use #AI to learn and research basic everyday things. Even with the newest AI chatbots, I would find myself googling in parallel to confirm facts or find human opinions. Diffbot LLM (Diffy, affectionately) isn't smarter than me, but it's waaayy better at googling. I don't have to trust what it's telling me, because everything Diffy says is always backed up by real sources. Anyway, in case you wanted to know what pea protein tastes like:
Diffbot reposted
We're excited to publicly release the Diffbot GraphRAG LLM!

With larger and larger frontier LLMs, we realized that they would eventually hit a limit in terms of the hardware requirements needed to both train and infer on these large models, as well as the challenge of keeping the data used in training relevant and up-to-date. Our hypothesis was that general-purpose reasoning would eventually get distilled down to ~1B parameters (a view that Andrej Karpathy has recently espoused), and that, in terms of storing what the model knows, Knowledge Graphs are a superior structure to LLM weights for maintaining, updating, and providing verified provenance (i.e., where did this content come from?).

After two years of development, we're proud to open source the Diffbot GraphRAG LLM, a function-calling LLM that outperforms Google Gemini, ChatGPT search mode, and Perplexity on realtime accuracy as measured on FreshQA (and with far fewer weights, so you can run it on your own hardware!).

Instead of training it with knowledge, Diffbot LLM has been explicitly trained to distrust its pretraining knowledge and instead to be an expert in the use of tools in order to find and align citations. These tools include a web browser, a structured graph query language (DQL), an unstructured query language, and a code interpreter. It's able to use these tools expertly (in many cases it's already better at writing DQL queries than I am) to verify the information it needs to write a cited answer.

Much more to come, but find the model on Github and Huggingface in the comments below:
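For readers unfamiliar with function-calling LLMs, the loop described above (model requests a tool, observes the result, then writes a grounded answer) can be sketched in a few lines. Everything here, from the tool names to the call format and the stubbed model, is an illustrative assumption rather than Diffbot's actual API:

```typescript
// Sketch of a tool-calling loop: the model either requests a tool or emits a
// final grounded answer. Tool names and the CALL format are hypothetical.
type ToolName = "web_search" | "dql_query" | "code_interpreter";

const tools: Record<ToolName, (args: string) => string> = {
  web_search: (q) => `top result snippet for "${q}"`,             // stub for live browsing
  dql_query: (q) => `knowledge-graph rows matching ${q}`,         // stub for a DQL query
  code_interpreter: (src) => String(Function(`return ${src}`)()), // runs generated code
};

// Stand-in for the LLM: first turn requests a tool, second turn answers.
function fakeModel(history: string[]): string {
  if (history.length === 0)
    return 'CALL code_interpreter "strawberry".split("r").length - 1';
  return `FINAL ANSWER: 3 (grounded in interpreter output: ${history[0]})`;
}

function run(): string {
  const history: string[] = [];
  for (let i = 0; i < 5; i++) {                  // cap the tool-use loop
    const out = fakeModel(history);
    if (out.startsWith("FINAL")) return out;     // grounded answer: stop
    const m = out.match(/^CALL (\w+) (.*)$/s);
    if (!m) return out;
    history.push(tools[m[1] as ToolName](m[2])); // execute tool, feed result back
  }
  return "tool budget exhausted";
}

console.log(run());
```

A real implementation would stream each observation back into the model's context window; the stubbed model here only shows the control flow that lets answers be grounded in tool output rather than weights.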