The Music Index is a publication that indexes music-related articles, news, reviews, and obituaries. It was first published in print in 1949 by Harmonie Park Press, and is now published electronically by EBSCO Information Services. The online version of The Music Index covers 1973 to the present, and includes full text for over 170 journals. The print versions look like the sample below. Is it possible to process digital copies (images) of past publications and convert them into highly structured records that can be ingested into a database? Margaret Richter
About us
Historical records, whether in special collections of libraries or basements of companies, hold a wealth of information waiting to be extracted. However, mere digitization is not enough; the data has to be extracted in a structured manner. Doxie is unlocking this rich data source by building custom pipelines to extract information from the most challenging data sources.
- Industry
- IT Services and Consulting
- Company size
- 2-10 employees
- Headquarters
- San Jose
- Type
- Privately Held
- Founded
- 2021
- Specialties
- Digital Data Extraction
Locations
- Primary
San Jose, US
Doxie AI employees
Posts
-
EBSCO Information Services and Doxie AI have been working on an exciting project that is producing some great results. We will be releasing more information about this over the next several weeks. Stay tuned. Margaret Richter
-
Backstage Library Works Thank you for inviting us! It was a great meeting at #NAGARA2024, showcasing the possibilities of using #AI and #ML to extract and bring structure to unstructured data.
What in the world could an 80’s children’s toy have to do with #NAGARA2024 and #digitization? If you couldn’t catch Thomas Forsythe’s lunchtime comments today at NAGARA, stop by our table to learn more. We’ll also be presenting later this afternoon at 4:25; sit in on the presentation by Beth Brevik and Anna Newman in Block 3, Session 14 to discuss preparing your archival collections for digitization.
-
Doxie AI will be at #ALAAC24 in San Diego along with our partners Backstage Library Works. https://lnkd.in/exdwe3fu Digitization initiatives can take a big step towards providing both access and insight by extracting highly structured data (XML, CSV, etc.) from their digital assets. Doxie AI uses cutting-edge #AI models to extract domain-specific, highly accurate data and metadata that go well beyond simple OCR. Learn why Bancroft Library (Berkeley), EBSCO Information Services, U of W and Backstage have used Doxie.AI's custom data extraction services. Beth Ann Goodwill Casey Cheney, PMP Alexandra Parran
-
Doxie AI reposted
The Bancroft Library worked with Doxie.AI, a company started by MIDS alums, to extract fielded data from Japanese American internment records. “There is great potential for machine learning and AI in libraries. There is a lot of discussion right now in library forums around what AI [and machine learning] can do to help us work better and faster,” said Mary Elings, the Interim Deputy Director and Head of Technical Services at the Bancroft Library. More: https://lnkd.in/gHPrpqud
Data Science Helps Bancroft Library Organize Historic Japanese-American Confinement Records
ischool.berkeley.edu
-
Doxie AI reposted
Here's a nice use of ML to capture historical data from over 100,000 Americans thanks to UC Berkeley School of Information, Bancroft Library and Doxie AI https://lnkd.in/gxpg75-Q
Expanding Access to WWII Japanese American Incarceree Data Using Machine Learning
https://www.youtube.com/
-
The Library Corporation's mission is "... providing the latest technology for your libraries and embrace a service mission that breaks through the software and hardware." Doxie AI specializes in performing highly accurate and customized data and metadata extraction using the latest computer vision and natural language #ai #technology. Based in the heart of Silicon Valley, Doxie AI can convert unstructured data such as images, PDFs, and audio into curated information ready for research. We would be honored to become a partner. Kindly reach out. Below is a small sample of our work. Justin Duewel-Zahniser Annette Harwood Murphy Sam Brenizer
-
Doxie AI reposted
UDOP, a new generative model by Microsoft useful for document intelligence tasks, is now available in the Transformers library. See below for more info:
There's a new, powerful document AI model in the Transformers library: UDOP (short for Universal Document Processing) by Microsoft Research.
A recent trend in document AI is the move towards generative (GPT-like) models, which are trained to generate structured text given an image of a PDF or similar document. This is more general and end-to-end compared to the BERT-like models like LayoutLM, as they just take in an image as input and produce text as output, without relying on any OCR engine or painful subword token-level classification. Hence they can be trained to generate JSON given a document image, answer questions users might have, or generate whatever useful text people may want given an image. Examples of these are Donut and Pix2Struct which are already available in Hugging Face, as well as models like GPT-4V, Gemini Pro Vision, or Claude-3 which came out yesterday. Document AI is going to move more and more towards these end-to-end Transformer models.
UDOP is similar in the sense that it also has this vision encoder, text decoder architecture, but it extends this with the use of a traditional OCR engine to combine the best of LayoutLMv3 and Donut in a single model. The model is pre-trained with both a text and vision decoder, allowing it to learn the layout structure of documents.
Docs: https://lnkd.in/eVG_dHJv Checkpoints: https://lnkd.in/efgx4MR4 Demo notebooks: https://lnkd.in/e-tsdUrw #documentai #microsoft #huggingface #artificialintelligence
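For anyone who wants to try UDOP on their own scans, below is a minimal sketch of loading the model from the Transformers library. It follows the documented usage pattern for the microsoft/udop-large checkpoint; the image path and the question string are placeholders, and the processor's built-in OCR step assumes Tesseract is installed.

```python
# Minimal sketch: document question answering with UDOP via Hugging Face
# Transformers, using the "microsoft/udop-large" checkpoint. The image file
# and the question below are placeholders for illustration.
from PIL import Image
from transformers import UdopProcessor, UdopForConditionalGeneration

processor = UdopProcessor.from_pretrained("microsoft/udop-large")
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

image = Image.open("scanned_page.png").convert("RGB")  # placeholder document image
prompt = "Question answering. What is the title of this article?"

# The processor runs OCR on the image by default, tokenizes the recognized
# words together with the prompt, and prepares the bounding boxes and pixel
# values the model expects.
encoding = processor(images=image, text=prompt, return_tensors="pt")

# UDOP is an encoder-decoder model, so the answer is generated as free text
# rather than read off token-level classifications.
predicted_ids = model.generate(**encoding, max_new_tokens=32)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```

Because the output is generated text, the same loop can be prompted to emit structured strings (for example field/value pairs) instead of answers to questions, which is closer to the fielded-extraction use cases described in the posts above.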