To help us deliver a superior product to our valued customers, we've spent the past year working extensively on improving Parsel’s extraction accuracy and unlocking new abilities, such as form field extraction and custom algorithm configurations for Enterprise clients.
In this article, we share six major improvements we are rolling out with this release, courtesy of our amazing team of data scientists, engineers, and researchers.
1. Increased extraction accuracy
In the automated data extraction sphere, one metric is king: accuracy. Without accuracy, any solution, no matter how shiny or feature-rich, is useless.
High accuracy saves time in two ways: it reduces the manual validation required, and it prevents errors from compounding in downstream business logic and communication.
For this reason, accuracy has always been the core metric by which we measure our performance. We’ve spent more than 18 months on R&D and implementation, building our V2 extraction algorithm from the ground up: a revamped table content structure analysis algorithm, a proprietary computer vision ML module and an OCR corrector.
With this new release, we’ve reduced the error rate by half when tested on our broadest and most challenging set of test data. This solidifies the superiority of our table and form field extraction accuracy compared to our competitors.
2. Automated form field extraction is out of beta!
A year after launching our beta field extraction algorithm, and following extensive testing and feedback from our test group, we’re now happy to announce that our form field extraction feature has entered general availability.
Every user with an Enterprise Parsel plan can now leverage our fully automated field extraction algorithm to extract any field-like data alongside table data, whether it’s from invoices, purchase orders, customer records or other documents.
3. Faster processing time
Additionally, with this new release, we’ve gone live with an entirely new application architecture for V2, which has greatly improved lower and upper bound latency, along with other key metrics, like scalability, stability, and our ability to monitor performance.
We are now able to process documents at scale with an average processing duration of less than 1 second per page. In practice, this means that the majority of the documents you’re processing will be fully analysed and ready to export within a minute or two.
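As a rough back-of-the-envelope illustration of these figures, the sketch below turns a page count into an estimated duration. The function name is hypothetical, and the default of one second per page is an assumption taken from the stated average; it is not a guarantee for any individual document.

```python
def estimated_seconds(pages: int, seconds_per_page: float = 1.0) -> float:
    """Estimate processing time as pages multiplied by an assumed per-page average."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * seconds_per_page


# A 90-page document at an assumed 1 second per page, expressed in minutes.
print(estimated_seconds(90) / 60)  # → 1.5
```

Under that assumption, any document of up to roughly 120 pages would fall inside the "minute or two" window.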
4. Custom configurations
In addition to our aim of boosting accuracy across all document domains, we also understand that there are certain scenarios where a generic catch-all algorithm does not provide the optimal solution.
This is especially evident in dealing with document formats that are exceptionally complicated and difficult to parse, or when the client is interested in non-standard data items and various other metrics that we track.
With support for customisation at its core, Parsel V2 can now analyse table formats that are ambiguous even to the human eye, in which the meaning of the encapsulated data is determined only by the client’s interpretation and utilisation of the data.
With V2’s full native support for per-customer algorithm models, we can run different configurations of our algorithm for individual Enterprise customers, and even for the different templates and formats that a customer uploads.
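To make the idea of per-customer and per-template configurations concrete, here is a minimal sketch of a lookup that falls back from the most specific configuration to a generic default. Everything in it (the names `resolve_config`, `CUSTOM_CONFIGS` and `DEFAULT_CONFIG`, the customer and template identifiers, and the shape of a configuration) is hypothetical and does not describe how Parsel stores or selects configurations internally.

```python
from typing import Optional

# Hypothetical configuration store, keyed by (customer_id, template_id).
# A template_id of None holds that customer's default configuration.
DEFAULT_CONFIG = {"model": "generic"}
CUSTOM_CONFIGS = {
    ("acme", None): {"model": "acme-default"},
    ("acme", "purchase-order"): {"model": "acme-po"},
}


def resolve_config(customer_id: str, template_id: Optional[str] = None) -> dict:
    """Return the most specific configuration available for this request."""
    # Try the template-level entry first, then the customer-level entry.
    for key in ((customer_id, template_id), (customer_id, None)):
        if key in CUSTOM_CONFIGS:
            return CUSTOM_CONFIGS[key]
    # No customisation registered: use the generic catch-all configuration.
    return DEFAULT_CONFIG


print(resolve_config("acme", "purchase-order"))
print(resolve_config("acme", "invoice"))
print(resolve_config("another-customer"))
```

The fallback order means a customer only needs a template-level entry for the formats where their general configuration is not enough.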
Have a specific use case to discuss? Schedule a free discovery session with our team today.
5. Nested and hierarchical data capture
Often, with more complicated tables (such as the one below), a single column of data cells is described by multiple levels of column captions above it, with hierarchical relationships that are challenging to accurately parse.
With the Parsel V2 algorithms, we can now extract all hierarchical row and column captions from tables in all domains. This feature helps users process tables with complex caption structures and automatically detects the full context that describes each cell’s value.
Figure 1: V1 algorithm without column hierarchy support
Figure 2: V2 algorithm with column hierarchy support
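To illustrate what "full context" means for a cell, the sketch below resolves multi-level column captions into one caption path per column. It is a simplified illustration, not the V2 algorithm: the function name, the sample headers, and the convention that an empty string marks a cell covered by a caption spanning from the left are all assumptions, and every header row is assumed to have the same number of columns.

```python
from typing import List


def full_column_context(caption_rows: List[List[str]]) -> List[List[str]]:
    """Collect, for each column, its captions from the top level down.

    caption_rows holds one list per header row, top row first. An empty
    string inherits the nearest non-empty caption to its left on that row.
    """
    if not caption_rows:
        return []
    n_cols = len(caption_rows[0])
    paths: List[List[str]] = [[] for _ in range(n_cols)]
    for row in caption_rows:
        current = ""  # caption currently spanning across columns on this row
        for col in range(n_cols):
            if row[col]:
                current = row[col]
            if current:
                paths[col].append(current)
    return paths


# Hypothetical two-level header: years span the Revenue and Cost columns.
headers = [
    ["Region", "2023", "", "2024", ""],
    ["", "Revenue", "Cost", "Revenue", "Cost"],
]
for path in full_column_context(headers):
    print(" > ".join(path))
```

With this representation, a value in the third column is described by both "2023" and "Cost" rather than by "Cost" alone.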
6. Format upgrades
All output formats (JSON, CSV and Excel) have now been upgraded to include form field outputs alongside tables when enabled through the Enterprise API.
We’ve also introduced more visual improvements in the Excel output format, to enhance readability and reduce cognitive load whilst conducting analysis, as can be seen in the figures below.
Furthermore, we’ve revamped our CSV format to be fully machine-readable with a consistent structure, while remaining readable for humans who’d like to inspect the data manually.
Figure 3: V1 Excel format
Figure 4: V2 Excel format
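As an illustration of how form fields and tables might be consumed together from a JSON export, here is a minimal sketch. The payload is invented for this example, and its key names ("fields", "tables", "name", "value", "rows") are assumptions rather than the documented Parsel schema, so check the actual output before relying on them.

```python
import json

# Hypothetical payload: key names and values are assumptions for illustration,
# not the documented output schema.
payload = """
{
  "fields": [
    {"name": "invoice_number", "value": "INV-001"},
    {"name": "total", "value": "125.00"}
  ],
  "tables": [
    {"rows": [["Item", "Qty"], ["Widget", "2"]]}
  ]
}
"""

doc = json.loads(payload)

# Form fields become a simple name-to-value mapping.
fields = {f["name"]: f["value"] for f in doc.get("fields", [])}

# The first row of the table is treated as the header for the remaining rows.
header, *body = doc["tables"][0]["rows"]
records = [dict(zip(header, row)) for row in body]

print(fields["invoice_number"])
print(records)
```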
Get started today
We have a full roadmap of product improvements coming up, so keep your eyes peeled for Parsel updates. In the meantime, as always, we'd love to get your thoughts!
To provide feedback on these features and more, please don’t hesitate to reach out to our support team.