Research: Best AI Services For Automatic Invoice Processing

In this research, we have analysed the five most popular AI document workflow automation tools to test how well they work “out-of-the-box” on a set of digital documents and to find the best tools for processing scanned invoices. Read on to learn:

  • Which AI model shows 97% accuracy and which shows only 37%,
  • Which models to avoid when you need to process documents fast,
  • How well modern AI models work on older document formats.

Importance Of AI In Document Processing

Document workflow automation and Intelligent Document Processing (IDP) services play a critical role in modern business operations by enhancing efficiency, reducing errors, and streamlining workflows.

As organizations handle an increasing volume of documents, from financial reports to customer data, the need for efficient processing becomes paramount. Intelligent document processing systems utilize advanced technologies such as optical character recognition (OCR), machine learning, and natural language processing to automate the extraction, interpretation, and management of data from various document formats.

This document workflow automation not only speeds up processing times but also significantly reduces the likelihood of human error, ensuring data accuracy and reliability.

The role of IDP services extends beyond just efficiency into strategic business transformation.

By automating routine tasks, document automation tools allow employees to focus on higher-value activities that require human insight and creativity.

This shift not only boosts productivity but also enhances job satisfaction by removing mundane tasks from daily routines.

Intelligent document processing solutions support compliance and risk management by ensuring that documents are processed and stored according to regulatory standards, thus protecting businesses from potential legal and financial penalties.

The integration of document automation through intelligent document processing services is essential for businesses looking to optimize their operations and maintain a competitive edge in the digital age.

The Goal Of This Analysis

We have worked on dozens of intelligent document processing projects, and have seen time and time again the need to fine-tune and train AI models to extract data from documents with high enough accuracy.

For many projects, extracting data from documents is only one of many moving parts, so putting a lot of time into tinkering with an AI model may not be the best idea. Choosing the document automation tool best suited for the task at hand and using it out-of-the-box is often the best course of action.

However, here is the tricky part: it’s often unclear which AI service will work best for a particular document type. Testing each service against a dataset of documents is time-consuming and costly, which defeats the purpose of the whole “no fine-tuning” approach.

This need to know how well different AI services work at extracting data from different documents is the main purpose of this research. We have put together a dataset of digital documents and prepared a list of popular AI services to test how well document workflow automation tools work at detecting and extracting relevant data fields.

We have decided to use invoices in this analysis as they are among the most commonly utilized documents in AI data extraction projects.

As for the AI services, we have developed a set of criteria to select the optimal options to keep the research well-rounded and informative.

AI Model Selection Criteria

For any analysis, it’s important to pay close attention to the selection criteria, as they can greatly influence results and conclusions. We have carefully evaluated dozens of AI models and services for document processing and selected the ones best suited to processing invoices in real-world projects, based on the following criteria:

  • Popularity: Popular services and AI models tend to have better support and more extensive documentation, making them easier to implement,
  • Invoice Processing Capability: Generic document processing AI models always require adjustments and fine-tuning to process specific documents, which is why we’ve focused on testing models and services that already specialize in processing invoices,
  • API Integration: A model or service that exposes an API can be easily integrated into applications, resulting in highly performant, secure, and scalable products.

Tested Document Automation Tools

Given these criteria, we have chosen five AI models able to recognise invoices. We’ve given each one a nickname for ease of understanding:

  • Amazon Analyze Expense API, or “AWS”,
  • Azure AI Document Intelligence - Invoice Prebuilt Model, or “Azure”,
  • Google Document AI - Invoice Parser, or “Google”,
  • GPT-4o API - text input with 3rd party OCR, or “GPTt”,
  • GPT-4o API - image input, or “GPTi”.

All five models are specialised in analysing invoices, have API integration capabilities and are very popular in the field of smart document analysis.

Invoice Dataset

We have put together a dataset containing scanned digital invoices in the following formats: JPG, PNG, PDF (without a text layer). All scans are of high quality and contain minimal distortions and visual noise.

Each invoice contains tabular data, and the dataset itself contains at least 3 different types of layouts, which allows us to test the models across a variety of document designs.

Another important aspect is the year of the document: the dataset includes invoices issued from 1971 to 2020, allowing us to see how well modern AI services handle older document formats.

AI Model Evaluation Criteria

Given our experience of working on document processing automation applications, we have chosen the following criteria for evaluating AI model performance:

  • Recognition Accuracy: How accurately an AI model recognises invoice field names and their content, like Invoice Id, Invoice Date, Total Amount, Customer Address,
  • Processing Duration: How long it takes a model to process each invoice on average,
  • Cost: The processing cost per 1000 pages and any additional costs.

List of Invoice Fields

To start with, we have compiled all fields extracted by all models into one list:

| No | Resulting Field Name | AWS | Azure | Google |
|---|---|---|---|---|
| 1 | Invoice Id | INVOICE_RECEIPT_ID | InvoiceId | invoice_id |
| 2 | Invoice Date | INVOICE_RECEIPT_DATE | InvoiceDate | invoice_date |
| 3 | Net Amount | SUBTOTAL | SubTotal | net_amount |
| 4 | Tax Amount | TAX | TotalTax | total_tax_amount |
| 5 | Total Amount | TOTAL | InvoiceTotal | total_amount |
| 6 | Amount Due | AMOUNT_DUE | AmountDue | - |
| 7 | Amount Paid | AMOUNT_PAID | - | - |
| 8 | Total Discount | DISCOUNT | TotalDiscount | - |
| 9 | VAT | - | vat | vat |
| 10 | Due Date | DUE_DATE | DueDate | due_date |
| 11 | Purchase Order | PO_NUMBER | PurchaseOrder | purchase_order |
| 12 | Payment Terms | PAYMENT_TERMS | PaymentTerm | payment_terms |
| 13 | Billing Address | - | BillingAddress | - |
| 14 | Billing Address Recipient | - | BillingAddressRecipient | - |
| 15 | Customer Address | RECEIVER_ADDRESS | CustomerAddress | receiver_address |
| 16 | Customer Address Recipient | - | CustomerAddressRecipient | - |
| 17 | Customer Name | RECEIVER_NAME | CustomerName | receiver_name |
| 18 | Customer Tax Id | - | CustomerId | receiver_tax_id |
| 19 | Customer Phone | RECEIVER_PHONE | - | - |
| 20 | Shipping Address | - | ShippingAddress | ship_to_address |
| 21 | Shipping Address Recipient | - | ShippingAddressRecipient | ship_to_name |
| 22 | Vendor Address | VENDOR_ADDRESS | VendorAddress | supplier_address |
| 23 | Vendor Address Recipient | - | VendorAddressRecipient | - |
| 24 | Vendor Name | VENDOR_NAME | VendorName | supplier_name |
| 25 | Vendor Tax Id | - | - | supplier_tax_id |
| 26 | Vendor Phone | VENDOR_PHONE | - | supplier_phone |
| 27 | Vendor Email | - | - | supplier_email |
| 28 | Vendor Iban | - | - | supplier_iban |
| 29 | Remittance Address | - | RemittanceAddress | remit_to_address |
| 30 | Remittance Address Recipient | - | RemittanceAddressRecipient | remit_to_name |
| 31 | Service Start Date | - | ServiceStartDate | - |
| 32 | Service End Date | - | ServiceEndDate | - |
| 33 | Tax Details | - | TaxDetails | - |
| 34 | Payment Details | - | PaymentDetails | - |
| 35 | Currency | - | - | currency |
| 36 | Account Number | ACCOUNT_NUMBER | - | - |
| 37 | Items | Items parsed | - | Items as strings |
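Mapping each service's own field names onto the resulting field names can be sketched as a simple lookup. The snippet below is a minimal illustration covering only a few rows of the field list above; `FIELD_MAP` and `normalize_fields` are hypothetical names introduced here, not part of any service SDK, and the input dictionary is a simplified stand-in for a real API response.

```python
# Illustrative subset of the field list: resulting name -> per-service name.
# None means the service does not provide the field.
FIELD_MAP = {
    "Invoice Id":   {"AWS": "INVOICE_RECEIPT_ID",   "Azure": "InvoiceId",    "Google": "invoice_id"},
    "Invoice Date": {"AWS": "INVOICE_RECEIPT_DATE", "Azure": "InvoiceDate",  "Google": "invoice_date"},
    "Total Amount": {"AWS": "TOTAL",                "Azure": "InvoiceTotal", "Google": "total_amount"},
    "Amount Due":   {"AWS": "AMOUNT_DUE",           "Azure": "AmountDue",    "Google": None},
    "Vendor Name":  {"AWS": "VENDOR_NAME",          "Azure": "VendorName",   "Google": "supplier_name"},
}


def normalize_fields(service: str, raw: dict) -> dict:
    """Rename a service's raw field names to the resulting field names."""
    normalized = {}
    for resulting_name, per_service in FIELD_MAP.items():
        service_name = per_service.get(service)
        # Skip fields the service does not offer or did not return.
        if service_name is not None and service_name in raw:
            normalized[resulting_name] = raw[service_name]
    return normalized


# Hypothetical, simplified Google output for one invoice.
google_raw = {"invoice_id": "INV-001", "supplier_name": "Acme Ltd", "invoice_type": ""}
print(normalize_fields("Google", google_raw))
# {'Invoice Id': 'INV-001', 'Vendor Name': 'Acme Ltd'}
```

Fields that are not in the map, such as `invoice_type`, are dropped, which keeps the later comparison limited to the shared vocabulary.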

 

We have edited down the fields due to inconsistencies in recognition results:

1. Google outputs “invoice_type” elements which are always empty, so these elements are not included in the report:

{ "type": "invoice_type", 
"mention_text": "",
"confidence": 0.8378010392189026 }

2. Azure detects additional fields called CustomerAddressRecipient and VendorAddressRecipient which always coincide with CustomerName and VendorName and are specific to Azure only. We have decided to omit these results to avoid data duplication.

3. AWS extracts address fields, like Zip Code, Country, City, etc., which are often not usable as they contain multiple values. These fields were not included in the report:

"ZIP_CODE": [
   {"Text": "94134",
       "Confidence": 99.93},
   {"Text": "94134",
       "Confidence": 99.95},
   {"Text": "94535",
       "Confidence": 99.93}],
"COUNTRY": [
   {"Text": "France",
       "Confidence": 99.89},
   {"Text": "France",
       "Confidence": 99.95}]

4. AWS does not distinguish between vendor and customer tax IDs, so these values were not included in the report:

"TAX_PAYER_ID": [
   { "Text": "981-94-7235",
       "Confidence": 99.96 },
   { "Text": "996-81-8911",
       "Confidence": 99.96 }]

5. Google extracts currency symbols separately from the number values; in fact, it’s the only service that does this. We have omitted the Currency field and compared the number values instead.

{ "type": "currency",
   "mention_text": "$",
   "confidence": 0.6610121726989746 }

6. GPT (both variants) was asked to extract all fields from the list above in the following format: (Resulting Field Name).
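Some of the clean-up rules above can be sketched in a few lines. This is a minimal illustration, assuming the service output has already been reduced to the shapes shown in the snippets above (Google entities as dictionaries with `mention_text`, AWS fields as lists of `Text`/`Confidence` candidates). The helper names are hypothetical. Note that `ambiguous_fields` is a narrower check than the blanket exclusion used in this report: it only flags fields whose candidates actually disagree.

```python
def drop_empty_entities(entities: list[dict]) -> list[dict]:
    """Keep only entities whose mention_text is non-empty (rule 1)."""
    return [e for e in entities if e.get("mention_text", "").strip()]


def ambiguous_fields(fields: dict[str, list[dict]]) -> list[str]:
    """Return names of fields that carry more than one distinct value (rules 3 and 4)."""
    return [
        name
        for name, candidates in fields.items()
        if len({c["Text"] for c in candidates}) > 1
    ]


google_entities = [
    {"type": "invoice_type", "mention_text": "", "confidence": 0.8378010392189026},
    {"type": "currency", "mention_text": "$", "confidence": 0.6610121726989746},
]
aws_fields = {
    "ZIP_CODE": [
        {"Text": "94134", "Confidence": 99.93},
        {"Text": "94134", "Confidence": 99.95},
        {"Text": "94535", "Confidence": 99.93},
    ],
    "COUNTRY": [
        {"Text": "France", "Confidence": 99.89},
        {"Text": "France", "Confidence": 99.95},
    ],
}

print([e["type"] for e in drop_empty_entities(google_entities)])  # ['currency']
print(ambiguous_fields(aws_fields))                               # ['ZIP_CODE']
```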

Product List Recognition

We have decided to evaluate how each model processes product lists as it’s often the core functionality required from an invoice processing system. Knowing which model is better at extracting product lists in particular is important for apps like these.

We have evaluated models’ performance in two ways:

  • Full row recognition,
  • Product attributes recognition: Description, Quantity, Unit Price, Total Price.

AI model notes:

  • Azure detected additional attributes, but they weren’t taken into consideration in this comparison.
  • Google AI does not provide any item attributes, just full rows, so it was excluded from the product attributes comparison.

The full row recognition and the Description attribute were evaluated using a strict match criterion, where results were considered correct only if they matched the source 100%.
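A 100% match criterion for full rows might look like the sketch below. It assumes each product row has been reduced to a tuple of strings (Description, Quantity, Unit Price, Total Price); that representation, the function name, and the sample rows are illustrative assumptions rather than the exact procedure used in this research.

```python
def strict_row_efficiency(expected_rows: list[tuple], extracted_rows: list[tuple]) -> float:
    """Share of expected rows that appear verbatim among the extracted rows."""
    if not expected_rows:
        return 0.0
    remaining = list(extracted_rows)
    matched = 0
    for row in expected_rows:
        if row in remaining:
            remaining.remove(row)  # each extracted row can match only once
            matched += 1
    return matched / len(expected_rows)


expected = [
    ("Widget A", "2", "10.00", "20.00"),
    ("Widget B", "1", "5.50", "5.50"),
    ("Service fee", "1", "15.00", "15.00"),
]
extracted = [
    ("Widget A", "2", "10.00", "20.00"),
    ("Widget B", "1", "5.50", "5.50"),
    ("Servlce fee", "1", "15.00", "15.00"),  # one wrong character fails the strict match
]
print(f"{strict_row_efficiency(expected, extracted):.2f}")  # 0.67
```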

Fields Used For Comparative Analysis

We have used two field lists for the purposes of comparing automated invoice processing recognition results: 

  • Full list: This list contains all fields detected by all AI models,
  • Short list: This list is an intersection of each field list produced by every AI model, meaning it includes only fields detected by every AI model.

We have excluded Customer Address Recipient and Vendor Address Recipient from both lists as they duplicate the Customer Name and Vendor Name values respectively.

In the short list, we used Billing Name and Billing Address field values to populate the Customer Name and Customer Address fields when the latter are missing or empty. Different services treat these fields interchangeably, so this approach helps keep results consistent.
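The short list and the billing fallback can be expressed compactly. The sketch below assumes each service's supported fields are available as a set of resulting field names and that a recognition result is a flat dictionary. The function names and the three-field sample sets are hypothetical and far smaller than the real lists.

```python
def short_list(fields_by_service: dict[str, set[str]]) -> set[str]:
    """Fields provided by every service: the intersection of all field sets."""
    return set.intersection(*fields_by_service.values())


def apply_billing_fallback(result: dict) -> dict:
    """Fill missing or empty customer fields from the corresponding billing fields."""
    filled = dict(result)
    for target, source in (("Customer Name", "Billing Name"),
                           ("Customer Address", "Billing Address")):
        if not filled.get(target) and filled.get(source):
            filled[target] = filled[source]
    return filled


fields_by_service = {
    "AWS": {"Invoice Id", "Total Amount", "Amount Paid"},
    "Azure": {"Invoice Id", "Total Amount", "Billing Address"},
    "Google": {"Invoice Id", "Total Amount", "Currency"},
}
print(sorted(short_list(fields_by_service)))  # ['Invoice Id', 'Total Amount']

result = {"Customer Name": "", "Billing Name": "Jane Doe", "Billing Address": "1 Main St"}
print(apply_billing_fallback(result)["Customer Name"])     # Jane Doe
print(apply_billing_fallback(result)["Customer Address"])  # 1 Main St
```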

Validity Criteria

The fields marked as “Exact” require the field value to be the exact match to the data from an invoice. Fields marked as “Non-strict” require the field value to be correct and readable and may include a minimal level of typos.

| Field | Match |
|---|---|
| Invoice Id | Exact |
| Invoice Date | Exact |
| Net Amount | Exact |
| Tax Amount | Exact |
| Total Amount | Exact |
| Due Date | Exact |
| Purchase Order | Exact |
| Payment Terms | Non-strict |
| Customer Address | Non-strict |
| Customer Name | Non-strict |
| Vendor Address | Non-strict |
| Vendor Name | Non-strict |
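If the two validity levels were to be checked automatically, one possible sketch is shown below. The "Non-strict" rule is not quantified in this article, so the similarity threshold of 0.9 and the use of Python's `difflib.SequenceMatcher` are assumptions made purely for illustration; a manual review could just as well serve as the non-strict check.

```python
from difflib import SequenceMatcher

EXACT_FIELDS = {"Invoice Id", "Invoice Date", "Net Amount", "Tax Amount",
                "Total Amount", "Due Date", "Purchase Order"}

# Assumed threshold: the article does not quantify "a minimal level of typos".
NON_STRICT_THRESHOLD = 0.9


def is_valid(field: str, extracted: str, expected: str) -> bool:
    """Apply the exact or non-strict rule depending on the field."""
    if field in EXACT_FIELDS:
        return extracted == expected
    # Non-strict: ignore case and repeated whitespace, tolerate small differences.
    a = " ".join(extracted.lower().split())
    b = " ".join(expected.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= NON_STRICT_THRESHOLD


print(is_valid("Invoice Id", "INV-0O1", "INV-001"))                     # False
print(is_valid("Vendor Name", "Acme Corporatoin", "Acme Corporation"))  # True
```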

 

Efficiency Assessment

To assess the models’ performance, we have assigned each field value one of four states:

  • State 1, or Positive outcome: The field value is correct and matches the data in a document (a positive outcome),
  • State 2, or Negative outcome: The opposite of State 1 (a negative outcome),
  • State 3, or Field not provided by service: A field is not populated by any value (a negative outcome),
  • State 4, or False positive outcome: A field is populated with irrelevant data (a negative outcome).

Based on these states, we have used the following formula to assess the recognition efficiency:

[efficiency] = SUM(all fields@[positive outcome]) / ( SUM(all fields@ [positive outcome]) + SUM(all fields@[negative outcome]) )
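As a worked example of the formula, the sketch below counts the four states over a list of field outcomes. The state labels are hypothetical identifiers chosen here; States 2, 3 and 4 all count as negative outcomes, as defined above.

```python
from collections import Counter

POSITIVE = "positive"                                            # State 1
NEGATIVE_STATES = {"negative", "not_provided", "false_positive"}  # States 2, 3, 4


def efficiency(states: list[str]) -> float:
    """positive / (positive + negative), taken over all evaluated field values."""
    counts = Counter(states)
    positive = counts[POSITIVE]
    negative = sum(counts[s] for s in NEGATIVE_STATES)
    total = positive + negative
    return positive / total if total else 0.0


# Twelve field values: nine correct, one wrong, one missing, one false positive.
states = ["positive"] * 9 + ["negative", "not_provided", "false_positive"]
print(efficiency(states))  # 0.75
```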

Cost Assessment

As GPTt and GPTi are billed per token, we have used the following formulas to calculate costs:

  • GPT + text formula, for single invoice:

[total_cost] = [input token cost] * ([prompt token count] + [OCR input json token count]) + [output token cost] * [result json token count]

  • GPT + image formula, for single invoice:

[total_cost] = [input token cost] * ([prompt token count] + [input image token count]) + [output token cost] * [result json token count]
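Both formulas have the same shape and differ only in what is sent alongside the prompt (OCR JSON or an image). The sketch below uses the per-token rates quoted in the cost comparison table of this article ($2.50 per 1M input tokens, $10.00 per 1M output tokens). The token counts in the example are made-up illustrative values, not measurements from this research.

```python
INPUT_COST_PER_TOKEN = 2.50 / 1_000_000    # USD per input token
OUTPUT_COST_PER_TOKEN = 10.00 / 1_000_000  # USD per output token


def gpt_invoice_cost(prompt_tokens: int, payload_tokens: int, result_tokens: int) -> float:
    """Cost of one invoice; payload is the OCR JSON (GPTt) or the image (GPTi)."""
    input_cost = INPUT_COST_PER_TOKEN * (prompt_tokens + payload_tokens)
    output_cost = OUTPUT_COST_PER_TOKEN * result_tokens
    return input_cost + output_cost


# Hypothetical token counts for a single one-page invoice.
cost = gpt_invoice_cost(prompt_tokens=500, payload_tokens=2500, result_tokens=250)
print(f"per invoice: ${cost:.4f}")              # per invoice: $0.0100
print(f"per 1000 invoices: ${cost * 1000:.2f}")  # per 1000 invoices: $10.00
```

For GPTt, the OCR service's own per-page price has to be added on top of this figure.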

Automated Invoice Processing Results

We have used our invoice dataset to test each model and evaluate how efficient they are at extracting data without any additional fine-tuning. We have evaluated the detection quality using the aforementioned field lists:

Short List Results: Essential Invoice Fields Only

Based on our efficiency calculations, Azure is the leading service (86% efficiency), GPTi and GPTt provide comparable efficiency, and Google and AWS are the least efficient.

These results include invoices created before the year 2000. As most modern AI models for invoice processing were trained on modern invoices, we’ve decided to test each service against a dataset of more modern invoices.

Modern Dataset Recognition Results

GPTt leads the list with a 97% score, while AWS shows the worst results with 78% efficiency. GPTi and Azure are comparable to the leader.

Older Dataset Recognition Results

We have tested the services against a dataset composed of invoices created before 2000. Azure shows the most promising results when extracting data from older invoices, providing 74% efficiency on average. Google shows the worst results, with an efficiency of only 43%.

Full List Results: All Invoice Fields

When comparing recognition results across all fields detected by each service, GPTt turned out to be the most effective with 57.5% overall recognition accuracy. AWS and GPTi show comparable recognition quality at around 54%, while Google shows the worst results with 39% recognition quality. Azure, despite its strong performance on the short list, fails to detect any other fields, so we’ve placed it near the bottom of the overall score.

Product List Detection Results: Full Rows

Product list extraction efficiency:

| AWS | Azure | Google | GPTt | GPTi |
|---|---|---|---|---|
| 82% | 97% | 40% | 63% | 57% |

Google is the least efficient, while Azure is the best at extracting product lists line by line.

Both GPT variants failed to correctly recognize invoices with a long list of items (23 in total), which resulted in very low efficiency.

Product List Detection Results: Product Attributes

Product attributes extraction efficiency:

| AWS | Azure | GPTt | GPTi |
|---|---|---|---|
| 94% | 94% | 56% | 59% |

Both GPT models failed to process invoices with a long list of products (23 in total), resulting in a low overall score.

Invoice Recognition: Detection Comparison

Invoice Recognition: Services Cost Comparison

| Service | Link | Cost, per 1000 pages | Cost, additional |
|---|---|---|---|
| AWS | Intelligently Extract Text & Data with OCR - Amazon Textract Pricing - Amazon Web Services | $10 | $0.008 per page after one million per month |
| Azure | Pricing - Azure AI Document Intelligence - Microsoft Azure | $10 | - |
| Google | Pricing - Document AI - Google Cloud | $10 | - |
| GPT-4o using 3rd party OCR (Prebuilt Layout model by Azure AI) | Pricing - Azure AI Document Intelligence - Microsoft Azure (Prebuilt Layout); Pricing - OpenAI | $10 + est. $10 ($2.50 / 1M input tokens, $10.00 / 1M output tokens) | - |
| GPT-4o only | Pricing - OpenAI | est. $10 ($2.50 / 1M input tokens, $10.00 / 1M output tokens) | - |

 

Invoice Recognition: Comparing Processing Duration

| Service | Processing duration, s |
|---|---|
| AWS | 2.9 ± 0.2 |
| Azure | 4.3 ± 0.2 |
| Google | 3.8 ± 0.2 |
| GPT-4o using 3rd party OCR | 33.0 ± 2.3 |
| GPT-4o only | 16.9 ± 1.9 |

 

GPT-4o + text processing duration comprises OCR duration as well as GPT-4o request duration.
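A per-invoice timing of this kind could be collected roughly as in the sketch below. The article does not state what the ± values represent, so reporting the sample standard deviation is an assumption. `measure_duration` and its `clock` parameter are hypothetical names, and `process` stands for whatever function sends one document to a service (for GPTt it would wrap both the OCR call and the GPT-4o request).

```python
import statistics
import time


def measure_duration(process, documents, clock=time.perf_counter):
    """Return (mean, sample standard deviation) of per-document processing time in seconds."""
    durations = []
    for document in documents:
        start = clock()
        process(document)
        durations.append(clock() - start)
    # statistics.stdev needs at least two measurements.
    return statistics.mean(durations), statistics.stdev(durations)


# Deterministic demo: a fake clock stands in for real service calls.
ticks = iter([0.0, 2.0, 10.0, 14.0])
mean, spread = measure_duration(lambda doc: None, ["a.pdf", "b.pdf"], clock=lambda: next(ticks))
print(f"{mean:.1f} ± {spread:.1f}")  # 3.0 ± 1.4
```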

Conclusion

  • The most effective services for detecting invoice fields are Azure AI Document Intelligence and GPT-4o API with 3rd party OCR, with Azure being more efficient at processing older invoices (created before 2000) and extracting essential fields, and GPT-4o being more effective at processing modern invoices (created after 2000) and extracting all fields,
  • Google Document AI - Invoice Parser shows the worst recognition results in almost every test, except for processing newer invoices where AWS shows the weakest detection accuracy results,
  • Most services cost about $10 per 1000 pages; GPT-4o with 3rd party OCR costs roughly twice that, since the OCR step adds another $10 per 1000 pages,
  • GPT-4o using 3rd party OCR and GPT-4o with image input are the slowest by far: about 33 and 17 seconds per page respectively, while AWS, Azure and Google show similar processing durations of roughly 3 to 4 seconds per page,
  • The maximum extraction quality of an out-of-the-box invoice processing solution is 97% shown by GPT-4o API with 3rd party OCR. These results include newer invoices only,
  • Modern AI services show decent results when extracting data from invoices without fine-tuning. However, none of them can be used in a real invoice processing system without extra configuration, as recognition results are either limited to a short list of fields or very low in quality. GPT-4o API with 3rd party OCR and Azure AI Document Intelligence are the most promising.