
Configuring AI Powered Data Extraction in WebApps

1. The Evolution of AI

Before AI, programmers had to write code for each and every task an application was going to perform. This often took hundreds, thousands, even tens of thousands of hours of programming to get those applications to operate in the desired way. Those applications could only do what they had been programmed to do; they were unable to perform new tasks unless more code was added.

The 'Holy Grail' of software development has always been to harness the power of AI to make it easier for humans to use and interact with computers. Early attempts fell well short of this goal, and early AI features presented to software users were often of limited usefulness.

AI as a concept has been around in various shapes and forms since the 1940s, but it was difficult to bring AI solutions to market because for decades computers remained expensive and slow.

Intelligent, accurate AI responses are only possible when AI has been trained on a large amount of input data. For a long time there was little data available for training, and training an AI engine took a very long time and was extremely expensive.

In the late 1990s these limiting factors started to change. PCs and servers were becoming cheaper and faster, with more hard disk capacity and more memory. Google began downloading data from across the World Wide Web onto its servers.

Useful AI applications started to creep into our lives in 1998 with the launch of Google. It was the first browser-based application where you could ask it to search for something and receive a ranked list of results it thought were relevant to your search criteria. It was essentially a search engine that had categorised an enormous amount of information gathered by crawling millions and millions of web sites. Google added its own artificial intelligence on top of that data so it could be easily searched, with results ranked from most likely to least likely.

This was a giant step forward in how humans could access information that would otherwise have been impossible to find. Search results were returned quickly, the application was delivered through the web browser and driven by the keyboard, and it was free, resulting in quick uptake amongst users.

In 2011 another big step forward in AI came when Apple bought Siri and released it as an inbuilt app on the iPhone. You could say 'Hey Siri' followed by the task you wanted it to perform, like "Call John" or "Set an alarm called car service for 7am". It came bundled on the iPhone for free: if you owned an iPhone 4S or later you could just use it, and Siri went on to become available across Apple devices, anytime, anywhere.

In 2015 OpenAI was founded and started work on a new AI technology based on prediction and text generation. In 2022 it released a chat bot called ChatGPT, and in 2023 the GPT-4 version followed. This was the point at which AI was actually able to create new content, not just recite existing content.

The ability of generative AI to create new textual content quickly made it popular with 'content creators'. Creating version 1 of new content by typing a few text commands is much easier than handcrafting it word by word, and that content can then be finessed through the chat bot over and over until you end up with the final product, with the heavy lifting done by AI and oversight provided by a human.

In 2020 Microsoft released its AI-powered Form Recognizer solution. It came with pre-trained AI models and also allowed users to train their own custom models, and was promoted as a way to "accelerate your business processes by automating information extraction". Then in 2024, Microsoft released its OpenAI-based chat bot.

Chat bots have now made their way into most web browsers and onto most mobile devices, putting them within reach of a large portion of the world's population.

The chat bots from OpenAI and Microsoft are built on models trained on very large datasets; the resulting models are referred to as Large Language Models (LLMs). Because of the size of the training data sets (in 2024 sitting at around 140 TB), the resulting large language model files are now around 380 GB in size. LLMs are currently growing at somewhere between 4 and 5 times in size per iteration, and larger models usually deliver better performance.

A key part of the user acceptance of chat bots is that they are commanded to perform tasks in natural language (i.e. how humans speak to each other), not in a programming language. With their large training models they are able to respond intelligently to a vast range of questions.

The discussion from here on is focused on how EzeScan is leveraging these latest innovations in AI to improve extraction of data from unstructured documents within its WebApps application.

We will be focussing on how WebApps is using 2 of Microsoft’s AI offerings:

a) Azure Document Intelligence

b) Azure OpenAI

These were chosen because they both offer our customers the benefits of AI whilst ensuring the highest levels of data privacy and data security. Neither platform retains any customer data for training purposes; each conversation with the AI is one-off and is destroyed once it completes.

These 2 AI options will be discussed in greater detail below.


2. Using AI when WebApps is deployed ‘on premise’

  • For ‘On Premise’ EWA installs, the customer must have their own Microsoft Azure Portal account, and their own subscription access to Azure Document Intelligence, Azure OpenAI, or both, depending on which AI method(s) need to be used.

https://portal.azure.com/#home

image-20250416-044851.png

On your Azure portal you should be able to see your available Azure Portal AI Resources listed here:

image-20250416-045744.png

The 2 Microsoft AI options use different pricing models.

  • Microsoft Azure Document Intelligence charges a fee per page processed. The pre-trained default models provided ‘out of the box' are the cheapest to use. Custom models trained by the customer cost 3–4 times more per page than the default models.

  • Microsoft Azure OpenAI charges a fee based on token usage. In a nutshell, tokens are consumed based on a combination of the following (a small token-estimate sketch follows this list):

    • Input tokens (total number of words + punctuation in the incoming OCR layer)

    • Command tokens (total number of words + punctuation in the chat bot command).

    • Output tokens (total number of words + punctuation in the generated output text)
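For a rough sense of scale before submitting anything, token counts can be estimated from the text itself. The sketch below uses the open-source tiktoken Python library (the tokeniser family used by the gpt-3.5/gpt-4 models); the OCR file name and command text are placeholders and not part of WebApps.

# Rough token estimate for an OCR text layer plus a chat bot command.
# Requires the open-source 'tiktoken' package (pip install tiktoken).
import tiktoken

def count_tokens(text: str) -> int:
    # cl100k_base is the encoding used by the gpt-3.5-turbo / gpt-4 model family
    encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

ocr_layer = open("sample_page_ocr.txt", encoding="utf-8").read()  # hypothetical OCR text file
command = ("You will be supplied with the text layer of a complaint letter. "
           "You are to respond with the type of complaint.")

print("Input tokens (OCR layer):", count_tokens(ocr_layer))
print("Command tokens:", count_tokens(command))
# Output tokens depend on the length of the generated response and are only known afterwards.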

Specific model pricing for either AI option is available online and is best found by searching for something like ‘Azure Document Intelligence model pricing in Australia' or ‘Azure OpenAI gpt-3.5-turbo pricing in Australia’. Make sure you are looking at the region you are in (i.e. Australia, UK or USA).

Customers are solely responsible for paying their Azure usage fees to Microsoft for the Microsoft AI services they subscribe to.


3. Using AI when WebApps is deployed in EzeScan Cloud

For ‘EzeScan Cloud’ installs, AI is available on our Cloud Platform.

However, your subscription will only include AI if it was specifically included in our sales quote to you.

The 2 Microsoft AI options use different pricing models.

  • Microsoft Azure Document Intelligence charges a fee per page processed. The pre-trained default models provided ‘out of the box' are the cheapest to use. Custom models trained by the customer cost 3–4 times more per page than the default models.

  • Microsoft Azure OpenAI charges a fee based on token usage. In a nutshell tokens are consumed based on a combination of:

    • Input tokens (total number of words + punctuation in the incoming OCR layer)

    • Command tokens (total number of words + punctuation in the chat bot command).

    • Output tokens (total number of words + punctuation in the generated output text)

Please talk to our sales team, if you’d like to upgrade your EzeScan Cloud subscription to include AI for an agreed number of document pages/month.


4. What version of WebApps supports AI?

  • The 2 Microsoft AI pre-processors (Azure Document Intelligence, Azure OpenAI) are only enabled in EWA 3.11 or later.


5. AI requires a text layer

There is one important thing you need to know.

For AI to be able to find and extract data from each page of an input document, the input document submitted to AI must include a text layer for each page that AI is going to process.

The file format that works best with AI is PDF, especially text searchable PDF (because it already contains a text layer for each page in the document).

In WebApps it's possible to process other file formats by adding a File Converter pre-processor to the RIA page (e.g. to convert TIF to PDF, or JPG to PDF).
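If you are unsure whether an existing PDF already contains a text layer, a quick way to check it outside WebApps is sketched below, assuming the open-source pypdf Python library and a placeholder file name:

# Check each page of a PDF for a text layer.
# Requires the open-source 'pypdf' package (pip install pypdf); the file name is a placeholder.
from pypdf import PdfReader

reader = PdfReader("sample_invoice.pdf")
for page_number, page in enumerate(reader.pages, start=1):
    text = (page.extract_text() or "").strip()
    status = "has a text layer" if text else "no text layer (image only)"
    print(f"Page {page_number}: {status}")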

What if your PDF does not yet include a text layer? Refer to the next section.


6. How can WebApps generate a text layer?

There are 2 types of pre-processors in WebApps that can create this OCR text layer.

  1. Azure Document Intelligence pre-processor (uses Azure inbuilt OCR+ICR engine)

a. The Read model - generates OCR/ICR text layer (i.e. raw text layer)

b. The other models such as ‘General Document’ and ‘Invoice’ - generate the raw OCR/ICR text layer and provide the OCR/ICR data as Key Value Pairs (KVP).

or

  2. OCR pre-processor (uses EWA’s inbuilt OCR engine)

a. Only generates the raw OCR text layer.


7. Tips for setting up a quick AI demo

If you are trying to set up a RIA page that uses AI to populate its fields for a demo:

  • When creating a new field that will be populated from AI, consider just using a Custom text field (text box or text area)

    • By using a custom field instead of a numeric field, data found by the AI process such as an ABN (e.g. 23 101 456 898) will be displayed in the field without first having to remove the spaces from it.

    • By using a custom field instead of a numeric field, data found by the AI process such as an Invoice Total (e.g. $1,634.31) will be displayed in the field without having to remove the commas from it.

    • By using a custom field instead of a date field, data found by the AI process such as an Invoice Date (e.g. 24th September, 2024) will be displayed in the field without having to first convert the date into DD/MM/YYYY format.

    • This also helps to improve the usability of the Find Value in viewer on focus option, available on the field Values tab. As you click in the field, the area of the image where the AI found the data will be highlighted in green.

image-20250429-034754.png

Click in the field

image-20250429-053009.png

The viewer displays this green highlighting over the image where the data is located.

image-20250429-053121.png
  • When using custom fields for AI data, you may need to use an Update Metadata stage to clean up the data (e.g. remove spaces, remove commas, convert dates to a specific format) as the data is written out during the Submit button action.

  • For example, the screenshots below show the Mapping Values needed to (a sketch of the equivalent string transformations follows this list):

    • remove all spaces from abn_number field output

    • convert long spaces to short spaces in supplier_name field output

    • remove all spaces from invoice_amount field output

    • replace comma with nothing in gst_amount field output

    • replace comma with nothing in invoice_total field output
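To make the intent of those mappings concrete, the sketch below performs the same clean-up (plus the date conversion mentioned earlier) in plain Python. The field names mirror the list above, the sample values are hypothetical, and inside WebApps this is done with placeholders rather than Python.

# Illustrative clean-up equivalent to the Update Metadata mappings above.
# Sample values are hypothetical; WebApps applies these rules via placeholders.
import re
from datetime import datetime

abn_number = "23 101 456 898"
supplier_name = "Acme   Office   Supplies"   # hypothetical value containing long runs of spaces
invoice_amount = "1 485.74"                  # hypothetical value containing a space
gst_amount = "1,148.57"                      # hypothetical value containing a comma
invoice_total = "$1,634.31"
invoice_date = "24th September, 2024"

abn_number = abn_number.replace(" ", "")            # remove all spaces
supplier_name = re.sub(r"\s+", " ", supplier_name)  # convert long spaces to short spaces
invoice_amount = invoice_amount.replace(" ", "")    # remove all spaces
gst_amount = gst_amount.replace(",", "")            # replace comma with nothing
invoice_total = invoice_total.replace(",", "")      # replace comma with nothing

# convert '24th September, 2024' into DD/MM/YYYY
stripped = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", invoice_date)
invoice_date = datetime.strptime(stripped, "%d %B, %Y").strftime("%d/%m/%Y")

print(abn_number, supplier_name, invoice_amount, gst_amount, invoice_total, invoice_date)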

image-20250429-052719.png
image-20250429-061537.png

These source placeholders were built using the Placeholder Creator tool. Each placeholder value created in the tool was cut and pasted into the applicable source placeholder cell. The tool is visible when using the Admin panel in WebApps.

image-20250429-062528.png


8. Using AI to generate a text layer and extract data

There are 3 methods:

Method 1: If you add an Azure Document Intelligence pre-processor using the ‘General Document' or ‘Invoice’ model, it will generate an OCR text layer and include Key Value Pairs derived from that OCR data. You will also need to add a Map Metadata pre-processor to map the Key Value Pairs into the relevant RIA page fields.

 

Method 2: If you add an Azure Document Intelligence pre-processor using the ‘Read’ model, it will generate an OCR text layer, but you will also need to add an Azure OpenAI pre-processor to find and extract the data from that OCR layer. You will command the OpenAI chat bot to read data from the page OCR layers and output it with field names that match the RIA page field ids.

 

Method 3: If you add the OCR pre-processor, it will generate an OCR text layer, but you will also need to add an Azure OpenAI pre-processor to find and extract the data from that OCR layer. You will command the OpenAI chat bot to read data from the page OCR layers and output it with field names that match the RIA page field ids.

A detailed overview of how to setup each of these methods follows below.


8.1. Use Azure Document Intelligence pre-processor ‘Built in’ model, with a Map Metadata pre-processor

Document Intelligence uses models to read data from documents.

There are 2 types of models supported in Document Intelligence on Microsoft Azure:

  1. Default inbuilt models (e.g. ‘General Document’ and ‘Invoice’). These were pre-trained by Microsoft using large datasets. They work well in many cases, but you use them 'as is' because Microsoft does not allow the base models to be retrained or modified.

  2. Custom models. Built by the user using Microsoft's model building and training tools. These can be used to build custom solutions not supported by the default models, but may require constant retraining/updates if too small a dataset is used when training the model.

Basically both model types are trained to find key value pairs of data.

Hint: This discussion will focus on using built-in models only.

A Key value pair consists of 2 strings. The first string is the label or name of the data field. The second string is the actual data value. For example:

Label Value

Invoice Number: 100001

Date: 22/10/2024

Price: $100.00

A simplified way to look at how Key Value pairs work is that the software has been trained to look for some label text and then find the value that is written to the right of it or below it.

Thus for the data laid out on the document as

Invoice Number: 100001

Depending on the model being used the Key value pair would be output as

Kvp.InvoiceNumber, 100001 or as afr.InvoiceNumber, 100001

(Note: KVP names never include spaces in them)
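Conceptually, the Document Intelligence pre-processor returns a flat set of KVP names and values, and the Map Metadata stage copies selected pairs into RIA page fields. The sketch below illustrates that idea in Python; the afr. names follow the examples in this section, while the field ids and values are hypothetical.

# Illustration only: how Key Value Pairs map onto RIA page field ids.
# KVP names use the 'afr.' prefix shown in this section; field ids and values are hypothetical.
kvp_results = {
    "afr.InvoiceId": "100001",
    "afr.InvoiceDate": "22/10/2024",
    "afr.InvoiceTotal": "$100.00",
}

# Equivalent of the Map Metadata configuration: KVP name -> RIA page field id.
# KVP names are case sensitive, so they must match exactly.
field_mappings = {
    "afr.InvoiceId": "invoice_number",
    "afr.InvoiceDate": "invoice_date",
}

ria_fields = {field_id: kvp_results.get(kvp_name, "")
              for kvp_name, field_id in field_mappings.items()}

print(ria_fields)   # {'invoice_number': '100001', 'invoice_date': '22/10/2024'}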

Configuring a RIA page to extract data from Document Intelligence Key Value Pairs.

Create a RIA Page

  1. Create a new RIA Page. Go to the App Page tab on the Admin Menu, Pages Tab, + Add Page and select RIAPage - Remote indexing Assistant from the drop-down menu.

Screenshot 2024-10-22 at 1.26.32 pm.png
  2. Give your RIA page a Name and Description.

image-20241022-051426.png
  3. Create a Queue, and assign it.

image-20241022-051546.png
image-20241022-051610.png
  4. Go to the Pre-Processing Tab, Select + Create New and select File Converter from the drop down.

(Note: This file converter will be used to convert input documents that aren’t in PDF format to PDF.)

image-20241022-053630.png

Name it as ‘File Converter’

image-20241022-052211.png

Apply and Save the changes.

  5. Go to the Pre-Processing Tab, Select + Create New and select Azure Document Intelligence from the drop down.

image-20241022-053752.png

Name it as Azure Document Intelligence.

Select Azure Document Intelligence (Built-in) from the Azure Document Intelligence Connection drop down list.

Select ‘Invoice’ from the model type drop down list.

image-20241022-053339.png

Scroll down further and tick ‘Key Value Pairs’

image-20241022-054008.png

Apply and Save the changes.

  6. Go to the Pre-Processing Tab, Select + Create New and select Map MetaData from the drop down.

image-20241022-053752.png

Name it as Map KVP Metadata

image-20241022-054504.png

Apply and Save the changes you have made (even though no mappings were added at this time).

Note: It’s not possible to map the actual Key Value Pairs yet, because you don’t know what names the Azure Document Intelligence (Built-in) Invoice model is going to generate.

When the first invoice is submitted to the RIA page and the Azure Document Intelligence pre-processor has run, open the item history form. Its Metadata tab will display all of the KVP name/value pairs returned by the model.

image-20241022-083407.png
image-20241022-083507.png

Click on the Metadata tab. The key value pairs and grid items are displayed. This can be a long list, so remember to scroll down to see more.

 

image-20241022-083645.png

 

image-20241022-083837.png

Find the KVP values that you want to display in the RIA Page fields.

Let's use these 2 results to show how this is done.

afr.InvoiceId 200

afr.InvoiceDate 15/04/2008

 

  7. Edit the RIA App page and create 2 fields.

Invoice Date

Invoice Number

Hint: Write these field names down, so you can recall them for use in the next step.

Apply and Save the field edits that you made.

 

  8. Edit the Map KVP Metadata pre-processor that you created earlier.

image-20241022-054504.png

On the Field Mapping tab, press the Mapping Values + button to add new mapped values.

image-20241022-090030.png

Select the field named ‘Invoice Number’ from the list.

Insert the text afr.InvoiceId into the #1 mapping box (hint: it is case sensitive, so make sure you type it exactly as it was displayed on the item history Metadata tab).

Enable any transformations that you want to apply to the KVP data as it is read into the field.

Press Apply to save the new field mapping.

 

Repeat this step again for the Invoice Date field.

image-20241022-090812.png

 

Select the field named ‘Invoice Date’ from the list.

Insert the text afr.InvoiceDate into the #1 mapping box (hint: it is case sensitive, so make sure you type it exactly as it was displayed on the item history Metadata tab).

Enable the transformations that you want to apply to the KVP data as it is read into the field.

Press Apply to save the new field mapping.

Repeat these steps for all the invoice field data that you want to map across into other app page fields.

 

  9. Once you’ve added your app page fields, the File Converter pre-processor, the Azure Document Intelligence Invoice model pre-processor and the Map KVP Metadata pre-processor, and mapped the KVP pairs, drag and drop one of your sample invoices onto the app page.

 

  10. All going well, the Azure Document Intelligence pre-processor will OCR the pages and create KVP data from the OCR text, the Map KVP Metadata pre-processor will map that KVP data into the relevant fields on the RIA page, and the app page fields will populate automatically.

 

  11. Troubleshooting. If no data is displayed, check each of the pre-processor logs for errors, check the item history Metadata tab to make sure the KVPs are being generated, and then double check that you didn’t make a typo in the Map Metadata screens for each mapping you set up. Repeat the process of resolving each error and retesting until it works.


8.2. Use Azure Document Intelligence pre-processor ‘Read' model, with an OpenAI pre-processor

Configuring a RIA page to extract OCR data using an OpenAI chat bot.

  1. Create a new RIA Page. Go to the App Page tab on the Admin Menu, Pages Tab, + Add Page and select RIAPage - Remote indexing Assistant from the drop-down menu.

Screenshot 2024-10-22 at 1.26.32 pm.png

  2. Give your RIA page a Name and Description.

Screenshot 2024-10-22 at 1.29.14 pm.png
  3. Assign your page a Queue.

    Screenshot 2024-10-22 at 1.30.18 pm.png
  4. Go to the Pre-Processing Tab, Select + Create New and select File Converter from the drop down.

(Note: This file converter will be used to convert input documents that aren’t in PDF format to PDF.)

image-20241022-053630.png

Name it as ‘File Converter’

image-20241022-052211.png

Apply and Save the changes.

  5. Go to the Pre-Processing Tab, Select + Create New and select Azure Document Intelligence (Built-in) from the drop down.

image-20241022-053752.png

Name it as Azure Document Intelligence.

Select Azure Document Intelligence (Built-in) from the Azure Document Intelligence Connection drop down list.

Select ‘Read’ from the model type drop down list.

image-20241022-071115.png

Apply and Save the changes.

This will provide an OCR text layer to any downstream AI pre-processor.

  6. Go to the Pre-Processing Tab, Select + Create New and select Azure OpenAI from the drop down.

Screenshot 2024-10-22 at 1.30.50 pm.png

Give your pre-processor a Name and Description.

Screenshot 2024-10-22 at 1.37.41 pm.png

From the Azure OpenAI Connection drop down, select Azure OpenAI (Built-in).

Screenshot 2024-10-22 at 1.49.47 pm.png

Select a Model from the drop-down menu.

Screenshot 2024-10-22 at 1.52.09 pm.png

Model: gpt-35-turbo
Description: Region Restricted model that ensures data doesn't leave the containing region.
Input Tokens: 2.1 App Credits per 10,000 input tokens processed.
Output Tokens: 4.1 App Credits per 10,000 output tokens processed.
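As a worked example of those rates, the sketch below estimates the App Credits consumed for one document. The per-10,000-token rates come from the table above; the token counts themselves are hypothetical.

# Rough App Credit estimate for one document processed with gpt-35-turbo.
# Rates are taken from the table above; the token counts are hypothetical.
INPUT_RATE = 2.1 / 10_000    # App Credits per input token
OUTPUT_RATE = 4.1 / 10_000   # App Credits per output token

input_tokens = 1_500   # OCR layer + chat bot command (hypothetical)
output_tokens = 120    # generated response (hypothetical)

credits = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"Estimated App Credits: {credits:.4f}")   # 0.3150 + 0.0492 = 0.3642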

7a. There are 2 payloads that can be returned by the OpenAI pre-processor.

The first type of payload is a single command payload as demonstrated below:

Configure the chat bot Instructions field to return what you are after.

The default instruction will be “You will be supplied the text layer of a document. You are to respond with a short summary of the contents of the document”.

Screenshot 2024-10-22 at 1.59.20 pm.png

In the following example we are asking the AI engine to return what the complaint is about.

You will be supplied with the text layer of a complaint letter. You are to respond with the type of complaint.

The Target Metadata Id is the id of the field that the OpenAI result will be written into on the RIA page. You will need to create a RIA field with this id.

Screenshot 2024-10-22 at 2.00.44 pm.png
Screenshot 2024-10-22 at 2.02.30 pm.png

When indexing a document you will see the response from the AI engine in the corresponding field.

Screenshot 2024-10-22 at 2.07.33 pm.png

Simply re-configure the chat bot instructions to modify the result.

In the following example, we asked the engine to return it in 5 words.

You will be supplied with the text layer of a complaint letter. You are to respond with the type of complaint in 5 words or less.

Screenshot 2024-10-22 at 2.12.40 pm.png
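Behind the scenes, this single-command payload amounts to one chat completion call per document: the instruction plus the page OCR text go in, and a short text answer comes back and is written into the Target Metadata Id field. WebApps makes this call for you; the sketch below only illustrates the shape of such a call using the openai Python SDK, with a hypothetical endpoint, key, deployment name and OCR text.

# Illustration of the kind of Azure OpenAI call the pre-processor performs internally.
# Endpoint, key, deployment name and OCR text are placeholders, not real values.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example-resource.openai.azure.com",  # hypothetical
    api_key="YOUR-AZURE-OPENAI-KEY",                             # hypothetical
    api_version="2024-02-01",
)

instructions = ("You will be supplied with the text layer of a complaint letter. "
                "You are to respond with the type of complaint in 5 words or less.")
ocr_text = "Dear Sir, I wish to complain about my rubbish bin not being collected..."  # hypothetical

response = client.chat.completions.create(
    model="gpt-35-turbo",   # the Azure deployment name
    messages=[
        {"role": "system", "content": instructions},
        {"role": "user", "content": ocr_text},
    ],
)
print(response.choices[0].message.content)   # this text is written into the target field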

7b. The second type of payload is a multi value JSON payload as demonstrated below:

Returning Multiple Results

We can also use the OpenAI engine to return multiple results that we can link to different metadata fields by parsing the results as JSON. Find the Parse response as JSON toggle and turn it on.

Screenshot 2024-10-22 at 2.16.09 pm.png

You will need to modify the Instructions so that you are asking the engine to respond with the fields you want and how you want them mapped to your metadata fields.

In the following example, we are now asking the engine to also find who raised the complaint and return this as raised_by. This means we will need to create a new field with the Id raised_by.

Screenshot 2024-10-22 at 2.28.32 pm.png

Not sure how JSON works? Here is the base structure for asking the AI engine to respond in JSON. It is recommended you keep the fixed wording unchanged and only modify the quoted field descriptions and the ids that follow the word 'as'.

You will be supplied with the text layer of a complaint letter. You are to respond with the following fields:
"Type of complaint in 5 words" as complaint_type
"Person who submitted the complaint" as raised_by
If no value for a field is found then leave it blank
You are to respond with a JSON object. You will reply only with the JSON itself, and no other descriptive or explanatory text.
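With Parse response as JSON enabled, the engine is expected to reply with a bare JSON object whose keys match your RIA field ids, and WebApps maps each key to the matching field. A minimal illustration of such a response and its parsing follows; the values are hypothetical.

# Illustrative JSON payload matching the instructions above.
# The values are hypothetical; WebApps performs the parsing and field mapping itself.
import json

raw_response = '{"complaint_type": "Missed weekly rubbish collection", "raised_by": "John Citizen"}'

fields = json.loads(raw_response)
print(fields["complaint_type"])   # -> populates the RIA field with id 'complaint_type'
print(fields["raised_by"])        # -> populates the RIA field with id 'raised_by'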

The engine has now populated both metadata fields.

Screenshot 2024-10-22 at 2.32.04 pm.png

8.3. Using an OCR pre-processor, with an OpenAI pre-processor

Configuring a RIA page to extract OCR data using an OpenAI chat bot.

  1. Create a new RIA Page. Go to the App Page tab on the Admin Menu, Pages Tab, + Add Page and select RIAPage - Remote indexing Assistant from the drop-down menu.

Screenshot 2024-10-22 at 1.26.32 pm.png
  2. Give your RIA page a Name and Description.

Screenshot 2024-10-22 at 1.29.14 pm.png
  3. Assign your page a Queue.

Screenshot 2024-10-22 at 1.30.18 pm.png
  4. Go to the Pre-Processing Tab, Select + Create New and select File Converter from the drop down.

(Note: This file converter will be used to convert input documents that aren’t in PDF format to PDF.)

image-20241022-053630.png

Name it as ‘File Converter’

image-20241022-052211.png

Apply and Save the changes.

  5. Go to the Pre-Processing Tab, Select + Create New and select OCR from the drop down.

image-20241022-082439.png

Name it as OCR.

image-20241022-082613.png

This will provide an OCR text layer to any downstream AI pre-processor.

  6. Go to the Pre-Processing Tab, Select + Create New and select Azure OpenAI from the drop down.

Screenshot 2024-10-22 at 1.30.50 pm.png

Give your pre-processor a Name and Description.

Screenshot 2024-10-22 at 1.37.41 pm.png

From the Azure OpenAI Connection drop down, select Azure OpenAI (Built-in).

Screenshot 2024-10-22 at 1.49.47 pm.png

Select a Model from the drop-down menu.

Screenshot 2024-10-22 at 1.52.09 pm.png

Model: gpt-35-turbo
Description: Region Restricted model that ensures data doesn't leave the containing region.
Input Tokens: 2.1 App Credits per 10,000 input tokens processed.
Output Tokens: 4.1 App Credits per 10,000 output tokens processed.

7a. There are 2 payloads that can be returned by the OpenAI pre-processor.

The first type of payload is a single command payload as demonstrated below:

Configure the chat bot Instructions field to return what you are after.

The default instruction will be “You will be supplied the text layer of a document. You are to respond with a short summary of the contents of the document”.

Screenshot 2024-10-22 at 1.59.20 pm.png

In the following example we are asking the AI engine to return what the complaint is about.

You will be supplied with the text layer of a complaint letter. You are to respond with the type of complaint.

The Target Metadata Id is the id of the field that the OpenAI result will be written into on the RIA page. You will need to create a RIA field with this id.

Screenshot 2024-10-22 at 2.00.44 pm.png
Screenshot 2024-10-22 at 2.02.30 pm.png

When indexing a document you will see the response from the AI engine in the corresponding field.

Screenshot 2024-10-22 at 2.07.33 pm.png

Simply re-configure the chat bot instructions to modify the result.

In the following example, we asked the engine to return it in 5 words.

You will be supplied with the text layer of a complaint letter. You are to respond with the type of complaint in 5 words or less.

Screenshot 2024-10-22 at 2.12.40 pm.png

7b. The second type of payload is a multi value JSON payload as demonstrated below:

Returning Multiple Results

We can also use the OpenAI engine to return multiple results that we can link to different metadata fields by parsing the results as JSON. Find the Parse response as JSON toggle and turn it on.

Screenshot 2024-10-22 at 2.16.09 pm.png

You will need to modify the Instructions so that you are asking the engine to respond with the fields you want and how you want them mapped to your metadata fields.

In the following example, we are now asking the engine to also find who raised the complaint and return this as raised_by. This means we will need to create a new field with the Id raised_by.

Screenshot 2024-10-22 at 2.28.32 pm.png

Not sure how JSON works? Here is the base structure for asking the AI engine to respond in JSON. It is recommended you keep the fixed wording unchanged and only modify the quoted field descriptions and the ids that follow the word 'as'.

You will be supplied with the text layer of a complaint letter. You are to respond with the following fields:
"Type of complaint in 5 words" as complaint_type
"Person who submitted the complaint" as raised_by
If no value for a field is found then leave it blank
You are to respond with a JSON object. You will reply only with the JSON itself, and no other descriptive or explanatory text.

The engine has now populated both metadata fields.

