I read your PDF
Alex Bruskin
Bespoke Generative AI for Engineering & Manufacturing (PLM, MES, ERP) | Cloud Native | Air Gapped | System Integration | Concepts, Technologies, Execution
In 1944, inspired by Rommel's book, George Patton embarked on his own French adventure. Hollywood responsibly supported the national war spirit in various ways. The WW2 defense effort was bolstered by high-quality paper drawings, statistical process control, and strict enforcement of industry standards.
In 1984, Ronald Reagan won in a landslide. Tom Clancy published "The Hunt for Red October," and Orwell's nightmare was brought to the screen, each demonstrating the value of information in its own fashion. Adobe released PostScript, making printing both author-controlled and device-agnostic.
In 1994, the world was supposedly getting "flat," while Hollywood was definitely shifting left. Adobe had just released PDF (in 1993), a descendant of PostScript. PDF guaranteed consistent, author-controlled, and device-agnostic client-side viewing and printing, with the added capability of secure signing. Around the same time, S1000D emerged to enforce rules for technical documentation layout.
In 2004, the USA was deeply engaged in the Global War on Terror. Hollywood's role in generating partisan consent began to overshadow its original entertainment purpose. The concepts of the digital thread started to proliferate through the engineering and manufacturing domains, encompassing CAD, simulations, and requirements. Meanwhile, PDF continued to conquer the world.
It is November 2024, and I suddenly feel quite optimistic about many topics. Hollywood's very existence is under threat. In the ongoing quest to connect the engineering and manufacturing puzzle to the digital thread, the PDF consumption process might be up for disruption. In one such case, recipients either print PDFs authored in various PLM ecosystems onto paper (sic!) or manually retype and copy-paste their content into MES. They do this because, due to the 40-year-old system architecture, PDF data cannot be readily extracted into JSON or XML. This issue applies to PDFs generated from authoring tools like MS Word via an API, as well as to those created from scans.
Patton's exploits in France in 1944 originated from British conceptual musings circa 1924, German and Russian experiments around 1934, and the subsequent German blitzkrieg successes. He hardly invented anything; instead, he orchestrated an already mature stack of technologies, battlefield techniques, and overwhelming American industrial and logistical advantages in the most creative and consistent manner. We can learn a lot from Patton as we think about the next phase of the industrial revolution.
We still expect to see PDF being used on a grand scale in the MES/MRO domain in 2034, as it will remain extremely cost-efficient, especially in the context of AI. The Senticore team has experimented extensively with several LLMs and a number of public GitHub projects to extract data from PDFs, and we would like to share our conclusions.
Feeling exhausted from plowing through the avalanche of inbound PDF files? Would you like to integrate the engineering and manufacturing data trapped inside these files into your digital thread ecosystem reliably and at a reasonable cost? Talk to us; like General Patton, we may have a solution for you.
Originally published on the Senticore blog.
Digital Business/IT Strategist | IT Director | Program Management | Enterprise Architectures | CRM-ERP-PLM-SCM Consultancy |
2 days ago
https://www.dhirubhai.net/posts/nvenkatraman_five-levels-of-intelligent-textbooks-activity-7266198141845094404-o5Pu?utm_source=share&utm_medium=member_ios
Bespoke Generative AI for Engineering & Manufacturing (PLM, MES, ERP) | Cloud Native | Air Gapped | System Integration | Concepts, Technologies, Execution
4 days ago
Also, Duff Johnson: https://pdfa.org/30-years-of-change-30-years-of-pdf/
Adjunct Professor @ Oakland University | Product Lifecycle Management (PLM) | Speaker, Consultant, Expert Witness | Advocate for Workforce Development | Ex-Siemens PLM
5 days ago
I gave a lecture last week where I showed the following slide, hypothesizing the nature of a Digital Thread which might have avoided the Boeing 737 Max crashes. There are 4 different organizations in this thread, making decisions across 7 years, and the impact of the failure was 346 lives and $20B lost. (At least. Boeing issued $20B in bonds for the Max crashes, but recently issued another $19B for the variety of problems which have come to light since these crashes.) So, there are lives and dollars which might be saved by a comprehensive digital thread that truly spans the lifecycle, but as I point out in my lecture, I don't see either a business model or an ethical model which can address this. 10 and 15 years ago I was doing defense work, and was told: "we use the 3D to create prints, and then we pitch the 3D". In the past couple of years I've heard: "if it currently floats, flies, or drives, the only info we have on it is 2D PDF." We are where we are due to decades-long business and cultural models. I wouldn't bother with the tech until we address incentives and culture.
Bespoke Generative AI for Engineering & Manufacturing (PLM, MES, ERP) | Cloud Native | Air Gapped | System Integration | Concepts, Technologies, Execution
5 days ago
James Allen Regenor, PhD, Col USAF (ret), is Veritex somehow a fit here?
Nice read. I think PDF is here to stay; however, some trends will gradually become visible.
- Knowledge-intensive PDFs, like standards and specs, will be converted semi-automatically to knowledge graphs that serve in parallel with hybrid RAGs. These PDFs are maybe 2% of the unstructured data in companies.
- Less critical PDFs or content, like instructions, tutorials, and minutes, will be used mainly to build different RAG systems. These PDFs are probably 10% of the unstructured data in companies.
- The remaining content, together with the above (i.e. 100%), will be used to fine-tune LLMs.
And PDFs will stay in use, as they are not only relevant for legacy systems and approaches but also legally binding almost everywhere in the world. And you know how quickly legal terms change ;).