Adobe Partners with NVIDIA to Harness the Power of PDF Intelligence with Next-Gen LLMs
Image credit: Adobe Stock / aamir.
Today, we announced an expansion of our longstanding partnership with NVIDIA to explore new ways for enterprises to harness the business intelligence stored in the world’s more than 3 trillion portable document format (PDF) documents. Together, we’re combining the power of Adobe’s PDF services with NVIDIA technologies to advance what’s possible with large language models (LLMs) and PDFs on three fronts: first, we will draw on Adobe’s PDF technology and expertise to support NVIDIA in training and tuning next-generation LLMs; second, we will help companies apply generative AI to the data stored in PDFs using NVIDIA NIM and NVIDIA NeMo microservices together with Adobe PDF services; and third, we will develop open datasets to advance LLM research and development.
Our recent introduction of AI Assistant in Acrobat leveraged the intelligence in PDFs to accelerate individual productivity, and now we’re looking to scale that value to the enterprise. NVIDIA’s innovation with LLMs and retrieval augmented generation (RAG), combined with Adobe’s PDF Extract technologies, represents a step toward a new era of document intelligence.
“PDF documents store immense volumes of valuable information that can be used to customize the intelligence of every enterprise’s generative AI applications,” said Manuvir Das, vice president of Enterprise Computing at NVIDIA. “This new chapter of NVIDIA and Adobe’s partnership will help businesses tap into the value of their PDF data using NVIDIA microservices to make their generative AI agents and copilots powerful productivity tools.”
Training and tuning next-gen LLMs
Adobe and NVIDIA are working together to train new NVIDIA LLMs. Pairing NVIDIA AI foundry services with NVIDIA AI Enterprise software and leveraging Adobe’s PDF Extract services, we are building datasets to train and tune next-generation NVIDIA AI foundation models, including NVIDIA Nemotron LLMs. These models, alongside open-source and commercial LLMs, run on NVIDIA NIM inference microservices in the NVIDIA AI Enterprise software suite.
PDF documents house some of the world’s most valuable information. However, transforming that data into usable intelligence is often difficult or impossible because it is unstructured. As the inventor and ongoing innovator of PDF, Adobe is the global authority on PDF structure and content. Our PDF Extract services deliver highly accurate data extraction across document types, both native and scanned PDFs, without any customization or setup. The technology leverages the same artificial intelligence and machine learning models behind the award-winning Acrobat Liquid Mode to transform unstructured data in PDFs into rich, structured data ready for efficient analysis.
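To make the unstructured-to-structured idea concrete, here is a minimal, illustrative sketch. The element layout below only mimics the kind of flat JSON an extraction service might emit; the field names and grouping logic are assumptions for illustration, not Adobe’s actual PDF Extract API schema.

```python
# Illustrative only: "type"/"text" fields are hypothetical stand-ins for a
# real extraction service's output, not Adobe's PDF Extract schema.

def structure_elements(elements):
    """Group a flat stream of extracted elements into titled sections."""
    sections, current = [], {"title": None, "paragraphs": []}
    for el in elements:
        if el["type"] == "heading":
            # A new heading closes out the section being built.
            if current["title"] or current["paragraphs"]:
                sections.append(current)
            current = {"title": el["text"], "paragraphs": []}
        elif el["type"] == "paragraph":
            current["paragraphs"].append(el["text"])
    if current["title"] or current["paragraphs"]:
        sections.append(current)
    return sections

extracted = [
    {"type": "heading", "text": "Q3 Results"},
    {"type": "paragraph", "text": "Revenue grew 12% year over year."},
    {"type": "heading", "text": "Outlook"},
    {"type": "paragraph", "text": "We expect continued growth."},
]
sections = structure_elements(extracted)
```

Once document content is in a structured form like this, it can be chunked, indexed, and fed to downstream analytics or LLM pipelines.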
Scaling generative AI to global enterprises
We are also looking at new ways to scale high-performance generative AI capabilities to global enterprises. Retrieval augmented generation (RAG) capabilities can scale the usefulness of LLMs for enterprises by combining generated responses with proprietary data and external knowledge bases to provide more up-to-date and reliable answers.
We are exploring how Adobe’s PDF services can extract proprietary information from inside organizations’ PDFs into specialized knowledge bases that pre-trained NVIDIA LLMs can draw on, enabling enterprises to talk to their data in natural language. We are also investigating ways to help enterprises use Adobe Document Cloud products and services with NVIDIA NIM inference and NVIDIA NeMo Retriever microservices to enable synchronous use of PDFs in production apps.
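The RAG pattern described above can be sketched in a few lines. This is a toy illustration: the word-overlap scorer stands in for a real embedding-based retriever (such as a NeMo Retriever microservice), and the prompt assembly shows how retrieved PDF passages ground an LLM’s answer; none of the names below come from any Adobe or NVIDIA API.

```python
# Toy retrieval-augmented generation (RAG) sketch; the scorer is a
# stand-in for a real embedding model, not a production retriever.

def score(query, passage):
    """Relevance as the fraction of query words found in the passage."""
    q = {w.strip("?.,!").lower() for w in query.split()}
    p = {w.strip("?.,!").lower() for w in passage.split()}
    return len(q & p) / max(len(q), 1)

def retrieve(query, knowledge_base, k=2):
    """Return the k passages most relevant to the query."""
    ranked = sorted(knowledge_base, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble a grounded prompt for the LLM from retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Passages of this kind would come from PDF extraction in practice.
kb = [
    "The 2023 contract renews automatically every 12 months.",
    "Invoices are due within 30 days of receipt.",
    "The warranty covers manufacturing defects for two years.",
]
query = "When are invoices due?"
prompt = build_prompt(query, retrieve(query, kb))
```

In a production pipeline, the knowledge base would be built from extracted PDF content, retrieval would use learned embeddings, and the assembled prompt would be sent to an LLM served behind an inference microservice.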
In addition, we’re exploring using NVIDIA accelerated computing and enterprise AI software, like NVIDIA NIM, with Adobe AI-enabled products, including Adobe Firefly, to speed time to market and provide high-performance, interactive customer experiences.
Building open datasets for research
Adobe and NVIDIA are also bringing their collective expertise to bear to accelerate research on LLMs and the role of digital documents. We are collaborating on a jointly curated dataset to advance the research and development of LLMs that use PDFs as pre-training data. Together, we will enable open research into post-processing techniques that make PDF data maximally useful for LLM and vision-language model (VLM) training. We plan to publish the dataset’s findings, methodologies, and impact, and to make it available for research purposes, ensuring easy access and widespread distribution among researchers, developers, and enthusiasts in artificial intelligence and machine learning.