In this research, we have analysed the five most popular AI document workflow automation tools to test how well they work “out-of-the-box” on a set of digital documents and to find the best tools for processing scanned invoices. Read on to learn which services performed best.
Document workflow automation and Intelligent Document Processing (IDP) services play a critical role in modern business operations by enhancing efficiency, reducing errors, and streamlining workflows.
As organizations handle an increasing volume of documents, from financial reports to customer data, the need for efficient processing becomes paramount. Intelligent document processing systems utilize advanced technologies such as optical character recognition (OCR), machine learning, and natural language processing to automate the extraction, interpretation, and management of data from various document formats.
This document workflow automation not only speeds up processing times but also significantly reduces the likelihood of human error, ensuring data accuracy and reliability.
The role of IDP services extends beyond just efficiency into strategic business transformation.
By automating routine tasks, document automation tools allow employees to focus on higher-value activities that require human insight and creativity.
This shift not only boosts productivity but also enhances job satisfaction by removing mundane tasks from daily routines.
Intelligent document processing solutions support compliance and risk management by ensuring that documents are processed and stored according to regulatory standards, thus protecting businesses from potential legal and financial penalties.
The integration of document automation through intelligent document processing services is essential for businesses looking to optimize their operations and maintain a competitive edge in the digital age.
We have worked on dozens of intelligent document processing projects, and have seen time and time again the need to fine-tune and train AI models to extract data from documents with high enough accuracy.
For many projects, extracting data from documents is one of the many moving parts, so putting a lot of time in tinkering with an AI model may not be the best idea. Choosing a document automation tool best suited for the task at hand and using it out-of-the-box is often the best course of action.
However, here is the tricky part: it’s often unclear which AI service will work best for a particular document type. Testing each service against a dataset of documents is time-consuming and costly, which defeats the purpose of the “no fine-tuning” approach.
This need to know how well different AI services work at extracting data from different documents is the main purpose of this research. We have put together a dataset of digital documents and prepared a list of popular AI services to test how well document workflow automation tools work at detecting and extracting relevant data fields.
We have decided to use invoices in this analysis as they are among the most commonly utilized documents in AI data extraction projects.
As for the AI services, we have developed a set of criteria to select the optimal options to keep the research well-rounded and informative.
For any analysis, it’s important to pay close attention to the selection criteria, as they can greatly influence the results and conclusions. We have carefully evaluated dozens of AI models and services for document processing to choose the options best suited to processing invoices in real-world projects:
Given these criteria, we have chosen five AI models able to recognise invoices. We’ve given each one a nickname for ease of understanding:
All five models are specialised in analysing invoices, have API integration capabilities and are very popular in the field of smart document analysis.
We have put together a dataset containing scanned digital invoices in the following formats: JPG, PNG, PDF (without a text layer). All scans are of high quality and contain minimal distortions and visual noise.
Each invoice contains tabular data, and the dataset itself contains at least 3 different types of layouts, which allows us to test the models across a variety of document designs.
Another important aspect is the year of the document: the dataset includes invoices issued from 1971 to 2020, allowing us to see how well modern AI services handle older document formats.
Given our experience of working on document processing automation applications, we have chosen the following criteria for evaluating AI model performance:
To start with, we have compiled all fields extracted by all models into one list:
| No | Resulting Field Name | AWS | Azure | Google |
|---|---|---|---|---|
| 1 | Invoice Id | INVOICE_RECEIPT_ID | InvoiceId | invoice_id |
| 2 | Invoice Date | INVOICE_RECEIPT_DATE | InvoiceDate | invoice_date |
| 3 | Net Amount | SUBTOTAL | SubTotal | net_amount |
| 4 | Tax Amount | TAX | TotalTax | total_tax_amount |
| 5 | Total Amount | TOTAL | InvoiceTotal | total_amount |
| 6 | Amount Due | AMOUNT_DUE | AmountDue | - |
| 7 | Amount Paid | AMOUNT_PAID | - | - |
| 8 | Total Discount | DISCOUNT | TotalDiscount | - |
| 9 | VAT | - | vat | vat |
| 10 | Due Date | DUE_DATE | DueDate | due_date |
| 11 | Purchase Order | PO_NUMBER | PurchaseOrder | purchase_order |
| 12 | Payment Terms | PAYMENT_TERMS | PaymentTerm | payment_terms |
| 13 | Billing Address | - | BillingAddress | - |
| 14 | Billing Address Recipient | - | BillingAddressRecipient | - |
| 15 | Customer Address | RECEIVER_ADDRESS | CustomerAddress | receiver_address |
| 16 | Customer Address Recipient | - | CustomerAddressRecipient | - |
| 17 | Customer Name | RECEIVER_NAME | CustomerName | receiver_name |
| 18 | Customer Tax Id | - | CustomerId | receiver_tax_id |
| 19 | Customer Phone | RECEIVER_PHONE | - | - |
| 20 | Shipping Address | - | ShippingAddress | ship_to_address |
| 21 | Shipping Address Recipient | - | ShippingAddressRecipient | ship_to_name |
| 22 | Vendor Address | VENDOR_ADDRESS | VendorAddress | supplier_address |
| 23 | Vendor Address Recipient | - | VendorAddressRecipient | - |
| 24 | Vendor Name | VENDOR_NAME | VendorName | supplier_name |
| 25 | Vendor Tax Id | - | - | supplier_tax_id |
| 26 | Vendor Phone | VENDOR_PHONE | - | supplier_phone |
| 27 | Vendor Email | - | - | supplier_email |
| 28 | Vendor Iban | - | - | supplier_iban |
| 29 | Remittance Address | - | RemittanceAddress | remit_to_address |
| 30 | Remittance Address Recipient | - | RemittanceAddressRecipient | remit_to_name |
| 31 | Service Start Date | - | ServiceStartDate | - |
| 32 | Service End Date | - | ServiceEndDate | - |
| 33 | Tax Details | - | TaxDetails | - |
| 34 | Payment Details | - | PaymentDetails | - |
| 35 | Currency | - | - | currency |
| 36 | Account Number | ACCOUNT_NUMBER | - | - |
| 37 | Items | Items parsed | - | Items as strings |
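Since each service returns its own field names, comparing results requires mapping them to a single canonical schema. A minimal sketch of such a mapping in Python, using a small excerpt of the names from the table above (the full mapping would cover all 37 fields):

```python
# Map provider-specific field names to the canonical names used in the report.
# Only an excerpt of the full table is shown here for illustration.
CANONICAL = {
    "aws": {
        "INVOICE_RECEIPT_ID": "Invoice Id",
        "INVOICE_RECEIPT_DATE": "Invoice Date",
        "SUBTOTAL": "Net Amount",
        "TOTAL": "Total Amount",
    },
    "azure": {
        "InvoiceId": "Invoice Id",
        "InvoiceDate": "Invoice Date",
        "SubTotal": "Net Amount",
        "InvoiceTotal": "Total Amount",
    },
    "google": {
        "invoice_id": "Invoice Id",
        "invoice_date": "Invoice Date",
        "net_amount": "Net Amount",
        "total_amount": "Total Amount",
    },
}

def normalize(provider, raw_fields):
    """Rename provider-specific keys to the canonical report names."""
    mapping = CANONICAL[provider]
    return {mapping[k]: v for k, v in raw_fields.items() if k in mapping}
```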
We have edited down the fields due to inconsistencies in the recognition results:
1. Google outputs “invoice_type” elements which are always empty, so these elements are not included in the report:
{
  "type": "invoice_type",
  "mention_text": "",
  "confidence": 0.8378010392189026
}
2. Azure detects additional fields called CustomerAddressRecipient and VendorAddressRecipient which always coincide with CustomerName and VendorName and are specific to Azure only. We have decided to omit these results to avoid data duplication.
3. AWS extracts address sub-fields, such as Zip Code, Country, and City, which are often not usable because they contain multiple values. These fields were not included in the report:
"ZIP_CODE": [
  { "Text": "94134", "Confidence": 99.93 },
  { "Text": "94134", "Confidence": 99.95 },
  { "Text": "94535", "Confidence": 99.93 }
],
"COUNTRY": [
  { "Text": "France", "Confidence": 99.89 },
  { "Text": "France", "Confidence": 99.95 }
]
4. AWS does not distinguish between vendor and customer tax IDs, so these values were not included in the report.
5. Google extracts currency symbols separately from the numeric values; it is the only service that does this. We have omitted the Currency field and compared the numeric values instead:
{
  "type": "currency",
  "mention_text": "$",
  "confidence": 0.6610121726989746
}
6. GPT (both variants) was asked to extract all fields from the list above in the following format: (Resulting Field Name).
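For the GPT variants, the request needs a prompt listing the target fields. The exact prompt used in this study is not published; the sketch below is purely illustrative of how such a prompt could be assembled from the resulting field names:

```python
def build_extraction_prompt(field_names):
    # Illustrative only: the actual prompt used in the study is not published.
    fields = "\n".join(f"- {name}" for name in field_names)
    return (
        "Extract the following fields from the invoice and return them "
        "as a JSON object keyed by these exact field names. "
        "Use null for any field that is absent.\n" + fields
    )
```

The returned string would then be sent as the user message of a GPT-4o request, together with either the OCR output (GPTt) or the invoice image (GPTi).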
We have decided to evaluate how each model processes product lists, as this is often the core functionality required from an invoice processing system. Knowing which model is better at extracting product lists in particular is important for applications where line-item data is the primary output.
We have evaluated models’ performance in two ways:
AI model notes:
Full-row recognition and the Description attribute were evaluated using a ‘strict’ match criterion: results were considered correct only if they matched the source 100%.
We have used two field lists for the purposes of comparing automated invoice processing recognition results:
We have excluded Customer Address Recipient and Vendor Address Recipient from both lists as they duplicate the customer and vendor values respectively.
In the short list, we used the Billing Name and Billing Address values to populate the Customer Name and Customer Address fields when the latter were missing or empty. Different services treat these fields interchangeably, so this approach helps keep the results consistent.
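This fallback can be expressed as a small normalization step. A sketch in Python, assuming the extracted fields are held in a dict with hypothetical snake_case keys:

```python
def fill_customer_fields(record):
    """Populate customer fields from billing fields when missing or empty.

    The key names here are assumptions for illustration, not the actual
    schema used in the study.
    """
    out = dict(record)
    if not out.get("customer_name"):
        out["customer_name"] = out.get("billing_name")
    if not out.get("customer_address"):
        out["customer_address"] = out.get("billing_address")
    return out
```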
The fields marked as “Exact” require the field value to be the exact match to the data from an invoice. Fields marked as “Non-strict” require the field value to be correct and readable and may include a minimal level of typos.
| Field | Match |
|---|---|
| Invoice Id | Exact |
| Invoice Date | Exact |
| Net Amount | Exact |
| Tax Amount | Exact |
| Total Amount | Exact |
| Due Date | Exact |
| Purchase Order | Exact |
| Payment Terms | Non-strict |
| Customer Address | Non-strict |
| Customer Name | Non-strict |
| Vendor Address | Non-strict |
| Vendor Name | Non-strict |
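One way to implement the “Non-strict” criterion is a string-similarity threshold. A sketch using Python’s standard-library `difflib` (the 0.9 threshold is our assumption, not a value from the study):

```python
from difflib import SequenceMatcher

def field_matches(expected, actual, mode):
    """Compare an extracted field value against the source value.

    mode: "Exact" requires a character-for-character match;
    "Non-strict" tolerates a minimal level of typos via a similarity
    ratio (threshold of 0.9 is an assumption for illustration).
    """
    if mode == "Exact":
        return expected == actual
    return SequenceMatcher(None, expected, actual).ratio() >= 0.9
```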
For the purposes of assessing the models’ performance, we have allocated each field value with one of four states:
Using these states, we have applied the following formula to assess recognition efficiency:
[efficiency] = SUM(fields with a positive outcome) / ( SUM(fields with a positive outcome) + SUM(fields with a negative outcome) )
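The formula above can be computed directly from per-field outcome labels. A minimal sketch, assuming each field has been labelled with one of the four states and only the "positive" and "negative" labels enter the ratio:

```python
def recognition_efficiency(outcomes):
    """Efficiency = positives / (positives + negatives).

    outcomes: iterable of per-field state labels; the label names
    ("positive"/"negative") are assumptions for illustration, and
    fields in other states are excluded from the ratio.
    """
    positive = sum(1 for o in outcomes if o == "positive")
    negative = sum(1 for o in outcomes if o == "negative")
    return positive / (positive + negative)
```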
As GPTt and GPTi have a pay-per-token pricing model, we have used the following formulas to calculate costs:
For GPTt: [total_cost] = [input token cost] * ([prompt token count] + [OCR input JSON token count]) + [output token cost] * [result JSON token count]
For GPTi: [total_cost] = [input token cost] * ([prompt token count] + [input image token count]) + [output token cost] * [result JSON token count]
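Both formulas share the same shape: the only difference is whether the second input term counts OCR JSON tokens (GPTt) or image tokens (GPTi). A sketch of the calculation, using GPT-4o’s published per-token rates in the example:

```python
def gpt_request_cost(prompt_tokens, input_payload_tokens, result_tokens,
                     input_cost_per_token, output_cost_per_token):
    """Cost of one GPT request.

    input_payload_tokens is the OCR JSON token count for GPTt,
    or the image token count for GPTi.
    """
    return (input_cost_per_token * (prompt_tokens + input_payload_tokens)
            + output_cost_per_token * result_tokens)

# Example with GPT-4o rates: $2.50 / 1M input tokens, $10.00 / 1M output tokens.
cost = gpt_request_cost(1000, 4000, 500, 2.50 / 1_000_000, 10.00 / 1_000_000)
```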
We have used our invoice dataset to test each model and evaluate how efficient they are at extracting data without any additional fine-tuning. We have evaluated the detection quality using the aforementioned field lists:
Based on our efficiency calculations, Azure is the leading service (86% efficiency), GPTi and GPTt provide comparable efficiency, and Google and AWS are the least efficient:
These results include invoices created before the year 2000. As most modern AI models for invoice processing were trained on modern invoices, we’ve decided to test each service against a dataset of more modern invoices.
GPTt leads here with a 97% score, while AWS shows the worst results with 78% efficiency. GPTi and Azure are comparable to the leader:
We have tested the services against a dataset composed of invoices created before 2000. Azure shows the most promising results when extracting data from older invoices, providing 74% efficiency on average. Google shows the worst results with efficiency of only 43%:
When it comes to comparing recognition results that include all fields detected by each service, GPTt turned out to be the most effective with 57.5% overall recognition accuracy. AWS and GPTi show comparable recognition quality at around 54%, with Google showing the worst results at 39%. Azure, despite its great performance in extracting fields from the short list, fails at detecting any other fields, so we’ve placed it near the bottom of the overall score.
Product list extraction efficiency:
| AWS | Azure | Google | GPTt | GPTi |
|---|---|---|---|---|
| 82% | 97% | 40% | 63% | 57% |
Google is the least efficient, while Azure is the best at extracting products’ list line by line.
Both GPT AIs failed to correctly recognize invoices with a large list of items (23 in total), which resulted in very low efficiency.
Product attributes extraction efficiency:
| AWS | Azure | GPTt | GPTi |
|---|---|---|---|
| 94% | 94% | 56% | 59% |
Both GPT models failed to process invoices with a long list of products (23 in total), resulting in a low overall score.
| Service | Link | Cost, per 1000 pages | Cost, additional |
|---|---|---|---|
| AWS | Intelligently Extract Text & Data with OCR - Amazon Textract Pricing - Amazon Web Services | $10 | $0.008 per page after one million per month |
| Azure | - | $10 | - |
| Google | - | $10 | - |
| GPT-4o using 3rd party OCR (Prebuilt Layout model by Azure AI) | Pricing - Azure AI Document Intelligence \| Microsoft Azure : Prebuilt Layout | $10 + est. $10 ($2.50 / 1M input tokens, $10.00 / 1M output tokens) | - |
| GPT-4o only | - | est. $10 ($2.50 / 1M input tokens, $10.00 / 1M output tokens) | - |
| Service | Processing duration, s |
|---|---|
| AWS | 2.9 ± 0.2 |
| Azure | 4.3 ± 0.2 |
| Google | 3.8 ± 0.2 |
| GPT-4o using 3rd party OCR | 33.0 ± 2.3 |
| GPT-4o only | 16.9 ± 1.9 |
The duration for GPT-4o using 3rd party OCR comprises both the OCR step and the GPT-4o request.
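Durations reported as “mean ± spread” like those above are typically obtained by timing repeated requests. A sketch of such a measurement harness, assuming the spread is the sample standard deviation (the study does not specify how its spread was computed):

```python
import statistics
import time

def measure(fn, runs=10):
    """Time repeated calls to fn and return (mean, stdev) in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # e.g. one document-processing API request
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.stdev(durations)
```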