Claude can now view images within a PDF, in addition to text. Enable the feature preview to get started: https://claude.ai/new?fp=1. This helps Claude 3.5 Sonnet more accurately understand complex documents, such as those laden with charts or graphics. The Anthropic API now also supports PDF inputs in beta: https://lnkd.in/emvau9Ez
We recently published a blog post showing how you can use Claude's computer use capability to help automate SEO and marketing tasks! https://starterseoaudit.com/blog/using-anthropic-claude-35-computer-use-for-seo/
Anthropic This is great. Internally combining text and images before passing them to the LLM is exactly what we have done multiple times in projects, as it gave us better results. But I am confused about how it works. The docs say: 1. First, the document is converted to an image. 2. Second, text is extracted and combined with the image. So far we have been using OCR services like Textract to extract the text as the first step in every IDP pipeline, and to do that we always rasterize the document to an image first (I would say that is internal to Textract). But how do you do it? In your second step, when you extract the text, do you use the image from the first step, or do you use native PDF functionality? I would assume it is through the image, since one of the most common use cases is scanned PDFs. Either way, this is great: for one of our use cases, users won't have to do it on their own. But from a design angle, would you say it is still better to extract the text first and store it? It has many other downstream uses, and we would no longer have to pass the text along to Claude, since it extracts it anyway.
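The pattern the commenter describes, pairing each rasterized page with its extracted text in a single request, can be sketched as below. This is a minimal, hypothetical helper (the function name and prompt wording are my own); it only builds the content-block payload shape used by the Anthropic Messages API and stands in toy bytes for real rasterizer/OCR output, with no actual API call.

```python
import base64

def build_page_message(page_png: bytes, ocr_text: str) -> dict:
    """Combine a rasterized page image and its extracted text into one
    Claude-style user message using the Messages API content-block format."""
    return {
        "role": "user",
        "content": [
            {
                # The rendered page, so the model sees charts and layout.
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(page_png).decode("ascii"),
                },
            },
            {
                # The OCR/native-extraction text for the same page.
                "type": "text",
                "text": "Extracted text for this page:\n" + ocr_text,
            },
        ],
    }

# Toy inputs standing in for real pdf2image/Textract output.
msg = build_page_message(b"\x89PNG...", "Invoice #123\nTotal: $40.00")
print([block["type"] for block in msg["content"]])  # ['image', 'text']
```

Storing the extracted text separately, as the commenter suggests, also keeps it available for search, indexing, and other downstream steps without re-extracting.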
"?Totalmente de acuerdo! En el ámbito de document intelligence, estamos viendo cómo los OCR tradicionales están siendo rápidamente superados por modelos multimodales avanzados. Estos modelos no solo reconocen texto, sino que también comprenden el contexto visual, estructuras y patrones de los documentos, lo cual eleva considerablemente la precisión y el valor de la extracción de datos. Los modelos multimodales pueden interpretar tablas, gráficos y otros elementos visuales de los documentos, algo que los OCR convencionales no alcanzan a hacer bien sin una preprocesamiento extenso. Así, en lugar de una simple lectura de texto, obtenemos una 'comprensión' profunda, ideal para aplicaciones empresariales complejas. ?? ?El futuro? Document intelligence será un entorno de ‘comprensión’ total, donde los multimodelos ofrecerán un análisis detallado en tiempo real, optimizando procesos y facilitando una toma de decisiones más inteligente. ????"
Is it also in the API? Come share what you build and learn with us in the AI Agents group on linkedin: https://www.dhirubhai.net/groups/6672014
Woah, that is very cool. PDF is a printer language, and text can be pulled directly out of the file, whereas an image of text requires OCR to recognize and extract it. The ability to recognize the content of images as quickly and effectively as PDF text is huge for working with large document repositories.
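The point about text living directly inside a PDF can be illustrated with a toy example. A PDF page's content stream draws text with explicit operators like `Tj`, which is why extractors can pull it out without OCR. This sketch uses a hand-written, uncompressed stream; real PDFs usually compress streams (e.g. FlateDecode), so real tools like pdftotext or pypdf decode them first, and scanned PDFs contain only images, which is where OCR comes back in.

```python
import re

# A minimal, uncompressed PDF content stream: text is drawn with the
# Tj ("show text") operator between BT/ET (begin/end text) markers.
content_stream = b"""
BT
/F1 12 Tf
72 720 Td
(Hello,) Tj
72 700 Td
(PDF world!) Tj
ET
"""

def extract_text(stream: bytes) -> str:
    # Pull every string literal passed to the Tj operator.
    parts = re.findall(rb"\((.*?)\)\s*Tj", stream)
    return " ".join(p.decode("latin-1") for p in parts)

print(extract_text(content_stream))  # Hello, PDF world!
```

A regex is enough here only because the stream is toy-sized and uncompressed; production extractors also handle escapes, encodings, and font-specific character maps.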
Claude is just the best for text generation and summarization; we are using it as the last step in several pipelines.
I wonder if the new model can transcribe a tech journal of sorts in a "for dummies" edition
I love Anthropic