Fordi reposted
Spent a bunch of time over the past couple of weeks comparing how OpenAI's new GPT-4-1106 Preview model ("Turbo") performs against the older GPT-4 in our context of team feedback. OpenAI is clearly trying to incentivize migration to Turbo with far better pricing and far larger context windows. I care because Fordi uses a variety of OpenAI's #AI models, both to ingest free-form feedback (e.g. handling unstructured text feedback at the time of submission) and for analytics later on (e.g. analyzing a team's feedback after submission).

Anecdotally, I'm seeing that the new Turbo model doesn't respond to nuance as well and doesn't follow complex prompt instructions as reliably as GPT-4. Don't get me wrong, its capabilities are still monumentally impressive; I'm just noticing a marginal performance degradation versus the outgoing GPT-4. I suspect this will be remedied in future updates, as is often the case.

This also feels like a bit of a funny moment in the early days of AI #LLMs. With so many other technologies, we take for granted that subsequent product releases are almost universal improvements over the outgoing version. E.g. new iPhone models have the higher-res camera, brighter screen, bigger battery, more storage, etc. But benchmarking LLMs isn't as straightforward (yet), and subsequent releases aren't always universal improvements. If I'm missing something, let me know!
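For what it's worth, one way to make the "follows complex prompt instructions" comparison concrete is a simple compliance check: give both models the same structured-output instruction and test whether each reply actually obeys it. This is just an illustrative sketch, not Fordi's actual evaluation; the required keys and the model replies below are made up for the example.

```python
import json

# Hypothetical prompt constraint: the model must return only JSON with
# exactly these keys -- a stand-in for the kind of structured feedback
# extraction described above. The keys are invented for this example.
REQUIRED_KEYS = {"sentiment", "themes", "summary"}

def follows_instructions(raw: str) -> bool:
    """True if a model reply obeys the 'return only JSON with these
    exact keys' instruction; False on extra prose or missing keys."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(parsed) == REQUIRED_KEYS

# Made-up replies: the first obeys the format; the second wraps the JSON
# in conversational prose, the kind of drift the post describes.
reply_a = '{"sentiment": "positive", "themes": ["communication"], "summary": "Clear updates."}'
reply_b = 'Sure! Here is the JSON: {"sentiment": "positive"}'

compliance = {name: follows_instructions(r)
              for name, r in [("gpt-4", reply_a), ("gpt-4-1106-preview", reply_b)]}
print(compliance)  # {'gpt-4': True, 'gpt-4-1106-preview': False}
```

Run over a batch of real prompts, a compliance rate like this gives you a crude but repeatable number to compare across model versions, instead of relying purely on vibes.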