This pre-processor supports a number of methods of finding Personally Identifying Information (PII) in a document submitted to a RIA queue.
Its main function is to find different types of objects in a document (e.g. text, pictures), along with the bounding rectangle co-ordinates of each object found.
Once the objects are found, the document is usually either:
- presented to a user for QA and redaction of all found objects, or
- passed to a queue where automated redactions are applied to all found objects (no QA).
The pre-processor is added to a RIA page by pressing the Create New button on the Pre-Processing tab and selecting the Identify Information menu option.
It displays this form:
Enter a meaningful name for the pre-processor.
Select an Identify Type from the list of available types.
Here is a brief description of each type.
| Identify Type | Description | Returns |
|---|---|---|
| Advanced Text | Uses user-defined regexes to intelligently find PII text in OCR data. Requires an existing text layer or a separate OCR pre-processor. | Results are passed back as a grid object. |
| Azure AI Language | Provides a pre-built list of more than 40 Entity Types that it knows how to find in the OCR data (e.g. Person, Address, Company, Phone Number). Performs its own OCR before finding data. | Results are passed back as a grid object. |
| Azure Custom Vision | Users can build their own Custom Vision model in Azure for their different plastic card types, training the model on thousands of their own sample images. Recognises card types using vision recognition rather than OCR. | Results are passed back as a grid object. |
| Credit Card | Uses pre-built regexes to find valid credit card numbers in an OCR text layer. Requires an existing text layer or a separate OCR pre-processor. | Results are passed back as a grid object. |
| EzeScan Object Detection Engine | Still being developed at this time … | |
| Presidio | Provides a pre-built list of more than 30 Entity Types used for identification verification (e.g. Person Name, Australia TFN, Australia Medicare). Requires an existing text layer or a separate OCR pre-processor. | Results are passed back as a grid object. |
| Text | Finds simple PII text in OCR data. Requires an existing text layer or a separate OCR pre-processor. | Results are passed back as a grid object. |
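
To illustrate the general idea behind a regex-based type such as Credit Card, here is a minimal Python sketch. It is not EzeScan's implementation: the pattern, the function names and the use of the Luhn checksum to decide whether a number is "valid" are all assumptions made for illustration.

```python
import re

# Hypothetical candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens, not embedded in a longer run of digits.
CARD_CANDIDATE = re.compile(r"(?<!\d)\d(?:[ -]?\d){12,18}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each Luhn-valid candidate."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())  # strip separators
        if luhn_valid(digits):
            hits.append((m.start(), m.end(), m.group()))
    return hits
```

In this sketch the regex only proposes candidates and the checksum filters out digit runs that merely look like card numbers, such as reference numbers. The character offsets returned would still need to be mapped back to positions on the page by whatever produced the text layer.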
The RIA page will need to include a grid field to receive the objects returned by these models.
The critical thing to remember is that these models return the bounding rectangle's position, size and page number for every object found.
RIA includes a built-in feature that automatically draws a bounding rectangle on the PDF pages in the viewer for every object found. As rows in the grid are selected, the corresponding bounding rectangle is given focus in the PDF viewer.
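
As a rough mental model of what one grid row carries, the Python sketch below represents a found object as a record holding its position, size and page number. The class name, field names and units are hypothetical and are not RIA's actual grid columns. The helper shows one plausible way a multi-word match could be given a single bounding rectangle, by taking the union of its word boxes.

```python
from dataclasses import dataclass


@dataclass
class FoundObject:
    """Hypothetical shape of one grid row; field names are illustrative."""
    label: str      # e.g. "Credit Card"
    text: str       # the matched text, if any
    page: int       # page number the object was found on
    x: float        # left edge of the bounding rectangle
    y: float        # top edge of the bounding rectangle
    width: float
    height: float


def union_rect(boxes):
    """Combine (x, y, width, height) word boxes into one enclosing rectangle."""
    left = min(b[0] for b in boxes)
    top = min(b[1] for b in boxes)
    right = max(b[0] + b[2] for b in boxes)
    bottom = max(b[1] + b[3] for b in boxes)
    return (left, top, right - left, bottom - top)


# Two adjacent OCR word boxes on page 1 merged into a single found object.
word_boxes = [(10, 20, 30, 12), (45, 21, 40, 12)]
x, y, w, h = union_rect(word_boxes)
row = FoundObject("Person", "Jane Citizen", 1, x, y, w, h)
```

A viewer that holds a list of such records can look up the rectangle for a selected grid row by its index and page number, which is the kind of row-to-rectangle link described above.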