Enhancing GPT-3.5 and GPT-4 Document Data Extraction with Advanced OCR
How layout detection and table detection can improve data extraction with GPT
By Abizer
March 25th, 2024

In today’s fast-paced business environment, extracting information from documents is essential for efficiency, from streamlining invoice processing in finance to digitizing medical forms in healthcare. Each type of document has its own format and layout, yet all of them must be processed so that their information flows efficiently into data management systems.

Given the complexity of extracting data from so many types of documents, DocumentPro aims to provide a flexible solution that offers accurate data extraction for a wide range of documents, helping businesses meet their automation and efficiency goals.

In this article, we cover some recent updates to our document parsers that give you more control and accuracy when handling complex document processing use cases.

Combining Advanced OCR with GPT

DocumentPro’s document parsers combine OCR with GPT to accurately capture information. OCR (optical character recognition) scans an entire document to recognize all of its text content. GPT (Generative Pre-trained Transformer) opens up a new dimension in data extraction. With its broad knowledge base and ability to understand specific context and instructions, GPT is excellent at interpreting document text to retrieve the information we query.
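To make the OCR + GPT combination concrete, here is a minimal sketch of how OCR text and a list of requested fields could be combined into one extraction prompt, and how a JSON reply could be read back. This is an illustration only, not DocumentPro's actual implementation: the function names, the prompt wording, and the field names are all hypothetical, and the call to the model itself is left out.

```python
import json


def build_extraction_prompt(ocr_text: str, fields: dict[str, str]) -> str:
    """Combine OCR text and field descriptions into a single prompt.

    Hypothetical helper: the prompt wording is an assumption, not a
    documented DocumentPro or OpenAI format.
    """
    field_lines = "\n".join(f"- {name}: {desc}" for name, desc in fields.items())
    return (
        "Extract the following fields from the document text below.\n"
        "Reply with a JSON object using exactly these keys; "
        "use null for missing values.\n\n"
        f"Fields:\n{field_lines}\n\n"
        f"Document text:\n{ocr_text}"
    )


def parse_model_reply(reply: str, fields: dict[str, str]) -> dict:
    """Keep only the requested keys from the model's JSON reply."""
    data = json.loads(reply)
    # Missing keys become None; unexpected keys are dropped.
    return {name: data.get(name) for name in fields}


# Hypothetical field definitions for an invoice parser.
fields = {
    "invoice_number": "The invoice identifier",
    "total": "The total amount due",
}
prompt = build_extraction_prompt("Invoice INV-1\nTotal due: 40.00", fields)
```

Filtering the reply down to the requested keys keeps downstream systems from receiving fields nobody asked for.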

Updates to our OCR + GPT Document Parsers

We’ve been working hard on improving our parsers in response to the complexities and challenges highlighted by our users. These enhancements make parsers more customizable and improve overall accuracy.

Detecting Layout with Advanced OCR

Our enhanced OCR engine now extracts document content in reading order, a critical improvement that maintains the relational integrity of the data. This update addresses a key challenge in document parsing: ensuring that the extracted text reflects the document's original layout and logical flow. By accurately capturing the reading order, the GPT models can better understand and interpret the document's content, leading to more accurate data extraction and structuring.

Here's an example of document extraction without layout detection.

[Image: document extraction output without layout detection]

Here's an example of document extraction with layout detection.

[Image: document extraction output with layout detection]
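To illustrate what extracting content in reading order means, the sketch below takes OCR words with their positions and rebuilds lines from top to bottom and left to right. It is a deliberately simple heuristic under stated assumptions (single-column page, a hypothetical `Word` type with top-left coordinates, a fixed line tolerance) and is not the layout detection our OCR engine uses; real layout detection also has to handle columns, headers, and skewed scans.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR word: text plus the top-left corner of its box."""
    text: str
    x: float  # left edge
    y: float  # top edge


def to_reading_order(words: list[Word], line_tolerance: float = 5.0) -> list[str]:
    """Group words into lines by vertical position, then sort each line by x."""
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: w.y):
        # A word joins the current line if it sits close to that line's first word.
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


# Words arrive in arbitrary order, as raw OCR output often does.
words = [
    Word("Total:", 10, 101),
    Word("Invoice", 10, 20),
    Word("$40", 200, 100),
    Word("#123", 80, 22),
]
ordered = to_reading_order(words)
# ordered == ["Invoice #123", "Total: $40"]
```

Without this step, the label "Total:" and its value "$40" could end up far apart in the text passed to GPT, which is the kind of broken relationship layout detection is meant to prevent.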

Detecting Tables with Advanced OCR

One of the complexities of automating data extraction is the variation in table formats across a wide range of documents. While layout detection alone can be sufficient for GPT to extract information from simple tables, most documents benefit greatly from identifying the exact cells and headers of each table.

For example, longer text within a table cell often wraps onto the next line, and when cell borders haven’t been printed, OCR struggles to distinguish the wrapped text from a separate cell. Table detection can make a better guess about where the text ends rather than separating each line of text into its own cell.
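The wrapped-cell problem can be sketched in a few lines. The example below assumes each table row is already a list of cell strings of equal length, and treats a row whose key column is empty as a continuation of the row above. That rule is our simplifying assumption for illustration; it is not how DocumentPro's table detection works internally, and the function name is hypothetical.

```python
def merge_wrapped_rows(rows: list[list[str]], key_column: int = 0) -> list[list[str]]:
    """Merge rows that look like wrapped text into the row above them.

    Assumption: a row with an empty key column (e.g. no line number)
    is a continuation line, not a new table row.
    """
    merged: list[list[str]] = []
    for row in rows:
        is_continuation = bool(merged) and not row[key_column].strip()
        if is_continuation:
            previous = merged[-1]
            for i, cell in enumerate(row):
                if cell.strip():
                    # Append the wrapped text to the matching cell above.
                    previous[i] = f"{previous[i]} {cell.strip()}".strip()
        else:
            merged.append([cell.strip() for cell in row])
    return merged


# Line-by-line OCR output: the description of item 1 wrapped onto a second line.
raw_rows = [
    ["1", "Consulting services for", "500.00"],
    ["", "March 2024", ""],
    ["2", "Travel", "120.00"],
]
clean_rows = merge_wrapped_rows(raw_rows)
# clean_rows[0] == ["1", "Consulting services for March 2024", "500.00"]
```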

In addition, when table detection is enabled in DocumentPro, you also have the option of specifying what formatting you want to apply to the document. Enabling formatting helps structure the table by applying spacing between columns and rows and drawing borders around cells to clearly distinguish them for GPT.
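As a rough picture of what such formatting can look like, the sketch below renders detected cells as a plain-text grid with padded columns and borders around every cell. The exact output format DocumentPro produces may differ; the `format_table` function and its border style are assumptions made for this example, and it expects every row to have the same number of cells.

```python
def format_table(rows: list[list[str]]) -> str:
    """Render rows as a bordered text grid so cell boundaries are explicit."""
    # Column width is the longest cell in that column.
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [border]
    for row in rows:
        cells = " | ".join(cell.ljust(w) for cell, w in zip(row, widths))
        lines.append(f"| {cells} |")
        lines.append(border)
    return "\n".join(lines)


table_text = format_table([["Item", "Qty"], ["Pen", "2"]])
print(table_text)
# +------+-----+
# | Item | Qty |
# +------+-----+
# | Pen  | 2   |
# +------+-----+
```

Explicit borders cost a few extra characters per row, but they remove the ambiguity about which value belongs to which column.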

Here’s an example of the difference between enabling and disabling table detection.

[Image: extraction output with table detection enabled and disabled]

Introducing GPT-4 Turbo for Document Parsing

Shifting our focus to parsing information from your document, we’ve recently introduced the ability to parse your documents with GPT-4 Turbo. Choosing between GPT-3.5 and GPT-4 models for document parsing offers users tailored options to match their specific needs. GPT-3.5 provides robust performance and speed, making it suitable for straightforward parsing tasks. On the other hand, GPT-4 shines in handling longer documents, complex tables, documents with rows that aren't clearly delineated, and tables that span multiple pages or have irregular gaps. Its enhanced understanding of content relationships and domain-specific knowledge allows for more nuanced data interpretation. However, it's important to note that the advanced capabilities of GPT-4 come with slower response times compared to GPT-3.5.

Limitations of Models

While these updates significantly enhance document parsing capabilities, it's important to be aware of both models' limitations at this stage. Both GPT-3.5 and GPT-4 can output responses up to 4096 tokens, which may restrict the amount of data that can be extracted by a single parser. Planning document processing tasks with this constraint in mind is essential for maximizing efficiency and effectiveness.
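One way to plan around the output limit is to split a long table into batches that each stay within the token budget. The sketch below is a hedged illustration: it uses the common rule of thumb of roughly four characters per token instead of a real tokenizer, assumes the extracted output is about as long as the input rows, and uses hypothetical function names and a hypothetical safety margin.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; an approximation, not a real tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def batch_rows(
    rows: list[str],
    max_output_tokens: int = 4096,
    safety_margin: float = 0.8,
) -> list[list[str]]:
    """Split rows into batches whose estimated size fits the output budget."""
    budget = int(max_output_tokens * safety_margin)
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for row in rows:
        cost = estimate_tokens(row)
        # Start a new batch when adding this row would exceed the budget.
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(row)
        used += cost
    if current:
        batches.append(current)
    return batches


# Five rows of about 10 tokens each, with a tiny budget to show the split.
rows = ["x" * 40] * 5
batches = batch_rows(rows, max_output_tokens=25, safety_margin=1.0)
# Batch sizes: [2, 2, 1]
```

Each batch can then be sent as its own extraction request and the results concatenated afterwards.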

Conclusion: Integrating AI with Seamless Workflows

To be effective in data extraction, it’s important to have a toolkit that is adaptable to the huge range of document types and layouts that you will come across. DocumentPro’s AI parsing methods aim to ensure both flexibility and accuracy to cover the widest range of use cases. However, it’s important to note that DocumentPro is not just a stand-alone AI parser. For it to be effective in your document management toolkit, DocumentPro has to integrate seamlessly with various platforms and systems, allowing you to create customized workflows and extract data to your preferred applications. DocumentPro supports options such as importing documents using our API and exporting data to webhooks, Excel, CSV and more. With our Zapier integration, you can connect to 5000+ apps that you use in your business today.