Automated Data Extraction for Commercial Documents
DocBuilder CORE Plugin for IBM Datacap Insight Edition
DocBuilder Extractor
DocBuilder Extractor provides a unique set of AI-engineered matrix models that encode extensive knowledge of how most businesses review and handle commercial documents.
This robust process does not require manual template creation or additional rulesets to accurately detect key-value pairs and table elements in any document type.
DocBuilder models come packaged with predefined mechanisms for distinguishing and accurately sorting data regardless of document layout.
Our unique technology and approach significantly reduce implementation costs and increase confidence in any data capture process.
Functionality
Embedded Algorithms
Every DocBuilder model includes a set of embedded algorithms out of the box.
The algorithms combine machine learning methods with document layout heuristics to identify extraction elements through structural and functional analysis.
Predictive Analytics with Neural Network
NO training and NO template creation needed.
Thanks to the embedded algorithms, our prepackaged analytics can efficiently detect data points and assemble them into key-value pairs or tables.
Semantic Matching
The plugin includes a powerful semantic mechanism capable of matching key names even if they contain common OCR errors.
The pre-configured algorithms include over a thousand index keys commonly used in financial documents. They are also aware of data types such as amount, date, and address.
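To illustrate the idea of matching key names despite OCR errors, here is a minimal sketch using Python's standard difflib module. It is not the plugin's implementation: the key list, the normalize and match_key helpers, and the 0.8 similarity cutoff are all assumptions chosen for the example.

```python
import difflib

# Hypothetical index keys for illustration; not the plugin's actual key list.
KNOWN_KEYS = ["invoice number", "invoice date", "total amount", "purchase order"]


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def match_key(ocr_text, known_keys=KNOWN_KEYS, cutoff=0.8):
    """Return the most similar known key, or None if nothing reaches the cutoff."""
    candidates = difflib.get_close_matches(
        normalize(ocr_text), known_keys, n=1, cutoff=cutoff
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Typical OCR confusions: "1" for "i", "c" for "e", "0" for "o".
    print(match_key("Invo1ce Numbcr:"))
    print(match_key("T0tal Amount"))
```

A character-similarity cutoff is only one possible design. A lower cutoff tolerates noisier OCR at the cost of more false matches between similar keys such as "invoice number" and "invoice date".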
Automated Key-Value Extraction
Dynamic Table Detection
DocBuilder Table Extraction Plugin includes a Datacap Action Library with a set of actions ready to be included in your application rulesets.
DocBuilder's actions process all recognized (OCRed) pages at the same time.
This comprehensive process will create a data frame in memory and detect, classify, or merge tables on every page.
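As a rough sketch of what merging tables across pages can look like, the example below joins a table with the one before it when it starts on the next page and has the same header. The Table structure and the merge rule are assumptions made for illustration, not the plugin's actual logic, and plain Python lists stand in for the in-memory data frame.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical in-memory table: page range, header cells, and data rows."""
    first_page: int
    last_page: int
    header: tuple
    rows: list


def merge_continued_tables(tables):
    """Merge a table into the previous one when it continues on the next page.

    Assumed rule: same header, and it starts on the page right after
    the previous table ends.
    """
    merged = []
    for table in sorted(tables, key=lambda t: t.first_page):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.header == table.header
            and table.first_page == prev.last_page + 1
        ):
            prev.rows.extend(table.rows)
            prev.last_page = table.last_page
        else:
            # Copy so the caller's input tables are not modified.
            merged.append(
                Table(table.first_page, table.last_page, table.header, list(table.rows))
            )
    return merged


if __name__ == "__main__":
    pages = [
        Table(1, 1, ("Item", "Qty"), [["Widget", "2"]]),
        Table(2, 2, ("Item", "Qty"), [["Gadget", "5"]]),
        Table(3, 3, ("Tax", "Amount"), [["GST", "1.50"]]),
    ]
    for t in merge_continued_tables(pages):
        print(t.first_page, t.last_page, t.header, len(t.rows))
```

Having every page available at once is what makes this kind of rule possible: a page-by-page pass could not tell that the table on page 2 continues the one on page 1.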
Geographical Entity Recognition
The plugin discovers geographical entities (country, state/province, city).
Using this information, the plugin is also aware of the country's tax formats and abbreviations.
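The link between a detected location and the tax labels to expect can be sketched with a simple lookup. The tiny gazetteer, the tax label table, and both function names below are hypothetical and far from complete. They only show the general idea and do not reflect the plugin's data or method.

```python
import re

# Illustrative gazetteer: region name -> ISO country code (assumed sample data).
REGIONS = {
    "ontario": "CA",
    "quebec": "CA",
    "texas": "US",
    "california": "US",
    "bavaria": "DE",
}

# Illustrative, non-exhaustive tax labels per country (assumed sample data).
TAX_LABELS = {
    "CA": ["GST", "HST", "PST"],
    "US": ["Sales Tax"],
    "DE": ["MwSt", "USt"],
}


def detect_country(text):
    """Return the country code of the first known region found in the text."""
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in REGIONS:
            return REGIONS[word]
    return None


def expected_tax_labels(text):
    """Return tax labels to look for, based on the detected country."""
    return TAX_LABELS.get(detect_country(text), [])


if __name__ == "__main__":
    print(detect_country("Ship to: Toronto, Ontario"))
    print(expected_tax_labels("Austin, Texas 78701"))
```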
Synergy with IBM Datacap Insight Edition
The plugin includes a Datacap library that natively processes data in Datacap rulesets.
The extraction process runs in memory, which significantly decreases processing time.
As part of automated extraction, DocBuilder creates multiple files. The DocBuilder plugin for Datacap Insight returns populated DCO objects, XML with table details ready for CSV export, and an HTML Data Viewer that includes a table extraction summary and details for every processed page.
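For readers who want to picture the CSV export step, here is a minimal sketch that flattens table XML into CSV text. The XML schema is not described here, so the tables, table, row, and cell element names in the sample are invented for illustration. Adjust them to the actual output before using anything like this.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample layout; the real table XML may be structured differently.
SAMPLE_XML = """<tables>
  <table page="1">
    <row><cell>Item</cell><cell>Qty</cell></row>
    <row><cell>Widget</cell><cell>2</cell></row>
  </table>
</tables>"""


def tables_xml_to_csv(xml_text):
    """Write every row of every table as one CSV line and return the CSV text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for table in root.iter("table"):
        for row in table.iter("row"):
            # Empty cells have text None, so fall back to an empty string.
            writer.writerow([(cell.text or "").strip() for cell in row.iter("cell")])
    return out.getvalue()


if __name__ == "__main__":
    print(tables_xml_to_csv(SAMPLE_XML), end="")
```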
About Us
We are capture specialists, data scientists, and really just nerds who are passionate about innovation.
With over 45 years of combined experience in data capture and extraction methodology, we produce exciting solutions that are as intelligent as they are effective.