Automated Data Extraction for Commercial Documents

DocBuilder CORE Plugin for IBM Datacap Insight Edition

 

DocBuilder Extractor

DocBuilder Extractor provides a unique set of AI-engineered matrix models that encode extensive knowledge of how commercial documents are reviewed and handled by most businesses.

 

This robust process does not require manual template creation or additional rulesets to accurately detect key-value pairs and table elements in any document type.

 

DocBuilder models are packaged with predefined mechanisms for distinguishing and accurately sorting data regardless of document layout.

 

Our unique technology and approach significantly reduce implementation costs and increase confidence in any data capture process.

Functionality

Embedded Algorithms

Every DocBuilder model includes a set of embedded algorithms out of the box.

The algorithms combine machine learning methods with document layout heuristics to identify extraction elements through structural and functional analysis.

Predictive Analytics with Neural Network

No training and no template creation needed.

Thanks to the embedded algorithms, our prepackaged analytics can efficiently detect data points and assemble them into key-value pairs or tables.

Semantic Matching

The plugin includes a powerful semantic matching mechanism that can match key names even when they contain common OCR errors.
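To illustrate the general idea of matching key names despite OCR errors, here is a minimal sketch in Python. It is not the plugin's implementation: it swaps in simple string similarity from the standard library's difflib module, and the key list, the normalize helper, and the match_key function are all hypothetical names chosen for this example.

```python
import difflib

# Hypothetical sample of canonical index keys (an assumption for this sketch;
# the plugin's real key list is not shown in this document).
CANONICAL_KEYS = ["invoice number", "invoice date", "total amount", "purchase order"]


def normalize(label: str) -> str:
    """Lowercase the label, drop ':' and '#', and collapse whitespace."""
    cleaned = label.lower().replace(":", " ").replace("#", " ")
    return " ".join(cleaned.split())


def match_key(ocr_label: str, cutoff: float = 0.75):
    """Return the closest canonical key, or None if nothing is similar enough."""
    candidates = difflib.get_close_matches(
        normalize(ocr_label), CANONICAL_KEYS, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Typical OCR confusions: '0' for 'o', 'rn' for 'm'.
    for label in ["Inv0ice Nurnber:", "T0tal Amount", "Signature"]:
        print(label, "->", match_key(label))
```

The cutoff value trades recall against false matches: a lower cutoff tolerates heavier OCR damage but is more likely to map an unrelated label to a known key.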

Automated Key-Value Extraction

The pre-configured algorithms include over a thousand index keys commonly used in financial documents. They are also aware of data types such as amounts, dates, and addresses.

Dynamic Table Detection

The DocBuilder Table Extraction Plugin includes a Datacap Action Library with a set of actions ready to be included in your application rulesets.

DocBuilder's actions process all recognized (OCRed) pages at the same time.
This comprehensive process will create a data frame in memory and detect, classify, or merge tables on every page.
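As a rough illustration of what merging tables across pages can look like, the sketch below joins tables that continue from one page to the next. It rests on assumptions that do not come from the plugin: tables are held as plain Python objects rather than in a data frame, and a table is treated as a continuation only when it appears on the very next page with an identical header. PageTable and merge_continued_tables are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PageTable:
    """A table detected on a single page (hypothetical structure for this sketch)."""
    page: int
    header: tuple
    rows: list = field(default_factory=list)


def merge_continued_tables(tables):
    """Merge tables on consecutive pages that share the same header."""
    merged = []
    for table in sorted(tables, key=lambda t: t.page):
        previous = merged[-1] if merged else None
        continues = (
            previous is not None
            and previous["header"] == table.header
            and table.page == previous["pages"][-1] + 1
        )
        if continues:
            # Same header on the next page: append rows to the running table.
            previous["rows"].extend(table.rows)
            previous["pages"].append(table.page)
        else:
            # Different header or a page gap: start a new logical table.
            merged.append(
                {"header": table.header, "rows": list(table.rows), "pages": [table.page]}
            )
    return merged


if __name__ == "__main__":
    detected = [
        PageTable(2, ("Item", "Qty"), [["Bolt", "4"]]),
        PageTable(1, ("Item", "Qty"), [["Nut", "10"]]),
        PageTable(3, ("Tax", "Amount"), [["VAT", "1.20"]]),
    ]
    for logical_table in merge_continued_tables(detected):
        print(logical_table)
```

Processing all pages together, as described above, is what makes this kind of merge possible: a page-by-page pass would never see that the table on page 2 continues the one on page 1.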

Geographical Entity Recognition

The plugin discovers geographical entities (country, state/province, city).
Using this information, the plugin is also aware of each country's tax formats and abbreviations.

Synergy with IBM Datacap Insight Edition
 

The plugin includes a Datacap library that natively processes data in Datacap rulesets.
The extraction process runs in memory, which significantly decreases processing time.


As part of automated extraction, DocBuilder creates multiple files. The DocBuilder plugin for Datacap Insight returns populated DCO objects, XML with table details ready for CSV export, and an HTML Data Viewer that includes a table extraction summary and details for every processed page.
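The step from table XML to CSV can be sketched as follows. The XML layout here (tables, table, row, and cell elements) is purely an assumption made for illustration, since the plugin's actual schema is not described in this document; table_xml_to_csv is likewise a hypothetical helper. Only Python standard library modules are used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML layout (an assumption for this sketch; the actual schema
# produced by the plugin is not documented here).
SAMPLE_XML = """<tables>
  <table page="1">
    <row><cell>Item</cell><cell>Qty</cell></row>
    <row><cell>Bolt, M8</cell><cell>4</cell></row>
  </table>
</tables>"""


def table_xml_to_csv(xml_text: str) -> str:
    """Flatten every row of every table element into CSV text."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for table in root.iter("table"):
        for row in table.iter("row"):
            # Empty cells have text None, so fall back to an empty string.
            writer.writerow([(cell.text or "").strip() for cell in row.iter("cell")])
    return buffer.getvalue()


if __name__ == "__main__":
    print(table_xml_to_csv(SAMPLE_XML))
```

Using the csv module rather than joining cells with commas matters for values such as "Bolt, M8", which must be quoted to survive a round trip through a spreadsheet.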


About Us

We are capture specialists, data scientists, and really just nerds who are passionate about innovation.

With over 45 years of combined experience in data capture and extraction methodology, we produce exciting solutions that are as intelligent as they are effective.

 
