OpenAI’s o1 model recently scored 120 on an IQ test. What’s the first thought on your mind? (The average human IQ is about 100, and its nearest competitor, Claude, scored 90!) If your immediate reaction is awe or concern, read on.
Updates from Proximity Works
-
OpenAI has recently released an o1 prompting guide. It focuses on keeping prompts simple, avoiding chain-of-thought instructions, and using delimiters. Here’s the full guide for you:
-
Interesting blog post by Greptile's CEO! At Ubicloud, we run both quantitative and qualitative tests to understand how AI models perform. Since AI models can overfit to benchmark data, they can shine in AI benchmarks but fail in qualitative tests. For example, our qualitative tests left us disappointed with Alibaba's QwQ-32B-Preview. Daksh Gupta evaluated OpenAI o1 and DeepSeek-R1 on real-world PR data, which is his company's specialty. The results are stunning. I'd love to save you the click, but I don't want Daksh to hate me. https://lnkd.in/ejX2PsiV
New blog post! OpenAI o1 vs. DeepSeek R1: which one can catch more bugs in a pull request? We gave both models the same prompt and the same diff and asked them to find issues in a series of buggy pull requests. One model caught nearly every known bug in the PRs. The other caught almost *none*. Full post on our website!
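An evaluation like the one described above is typically scored by comparing each model's reported issues against a hand-labeled set of known bugs per PR. The post doesn't publish its scoring code, so the helper, bug labels, and exact-match rule below are invented for illustration:

```python
# Hedged sketch of scoring a bug-catching eval: fraction of
# hand-labeled known bugs that a model's review mentioned.
# All names and data here are illustrative, not from the post.

def catch_rate(known_bugs: set[str], reported: set[str]) -> float:
    """Fraction of known bugs that appear in the model's report."""
    if not known_bugs:
        return 0.0
    return len(known_bugs & reported) / len(known_bugs)

known = {"off-by-one in pagination", "unclosed file handle"}
model_a = {"off-by-one in pagination", "unclosed file handle", "style nit"}
model_b = {"style nit"}

print(catch_rate(known, model_a))  # 1.0 (caught every known bug)
print(catch_rate(known, model_b))  # 0.0 (caught none)
```

A real harness would need fuzzier matching (models rarely phrase a bug exactly as the label does), but the per-PR recall metric is the same.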
-
The communication around OpenAI’s latest model release has been abysmal. This was supposed to be the complete version of their o1 inference-scaling reasoning model, their path to a general-intelligence platform. All I’m seeing are vague references to benchmark scores and anecdotal stories that range from “this model feels dumber than previous releases” to “this feels like AGI.” My view remains unchanged: multi-turn reasoning models, as they stand, are too slow, too expensive, and produce answers that are too long and still need thorough auditing. Can they get better in the future? Sure, but that remains in the future. I need to evaluate the options available today, and right now that means o1 is pretty useless for the vast majority of people.
-
OpenAI launches o3-mini, its latest reasoning model, which it says is largely on par with o1 and o1-mini in capability but runs faster and costs less (Kyle Wiggers/TechCrunch)
-
OpenAI released a new o1 prompting guide. Use these tips to get the most accurate results from OpenAI o1:
- Keep it simple.
- Be very clear in your instructions.
- Don't use fancy chain-of-thought prompts.
- Only give it relevant information so it doesn't get distracted.
Source: Superhuman
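The tips above can be sketched as a tiny prompt-building helper: a short, direct instruction, the relevant context fenced in delimiters, and no chain-of-thought scaffolding. The `build_prompt` function and the example text are illustrative, not part of any OpenAI SDK or the guide itself:

```python
# Sketch of a prompt that follows the o1 guide: simple instruction,
# clear delimiters around the input, no chain-of-thought phrasing.
# build_prompt is an illustrative helper, not an OpenAI API.

def build_prompt(instruction: str, context: str) -> str:
    """Keep the instruction short and direct, and wrap only the
    relevant context in triple-quote delimiters."""
    return (
        f"{instruction}\n\n"
        f'"""\n{context}\n"""'
    )

prompt = build_prompt(
    "Summarize the change in one sentence.",
    "Refactored the retry loop to use exponential backoff.",
)
print(prompt)
```

The resulting string would then be sent as a single user message; note there is no “think step by step” suffix, per the guide’s advice.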
-
LawDroid is incorporating OpenAI's latest o1 model into our Copilot and Builder products for complex reasoning tasks! By the way, I think OpenAI is using OpenAI to answer its email replies. What do you think?
-
Important limitations of OpenAI's new o1 model:
- No multi-modality: limited to text only, with no image or file analysis.
- Slow: it takes a minute, often several, to respond.
- No internet browsing, so no access to external knowledge.
- Knowledge cutoff date of October 2023.
- Rate-limited to 20 API calls per minute; the limit is on calls, not tokens as with other models.
-