Automated Redaction Becomes VAR’s Calling Card
Vertical market focus is going to be the wave of the future for document imaging resellers. There is just too much going on in the way of standardization and application integration for horizontal document imaging to continue to be as important as it has been in the past. DIR recently caught up with a reseller that has already successfully leveraged the vertical model: Computing System Innovations (CSI) of Orlando, FL.
CSI recently completed the installation of an automated redaction system with the Marion County Clerk of the Circuit Court in Ocala, FL, to help the county get into compliance with a new regulation that goes into effect at the end of the year. CSI’s own-branded IntelliDact solution has helped Marion County prepare seven million back-file documents, as well as two million day-forward documents annually, to meet the requirements of Florida State Statute (FSS) 119.07, which makes county clerks responsible for protecting certain information in public records.
This information includes Social Security, credit and debit card, and bank account numbers, all data that could be used for identity theft. Currently, individuals are responsible for their own redaction of this information. “Florida is the first state to implement such a law,” said Henry Sal, president of CSI. “But, I’ve already had conversations with courts in other states, such as California, Alaska, Ohio, and Texas. Some of them are in the process of approving laws similar to Florida’s.”
To borrow a term from our cover story, CSI’s introduction of IntelliDact could probably be considered its “tipping point.” And, as is typical, the company’s success in this niche did not come overnight. CSI had previously been marketing a data extraction application to county courts.
“We had been working with Florida counties for several years, supplying technology to automatically locate and extract certain information from their unstructured court documents,” said Sal. “This might include names, case numbers, etc., to help the courts prepare cases. You can be looking for a multitude of information located anywhere on the documents within millions of court files. When FSS 119.07 was passed, we began working with Marion County on retrofitting our extraction technology to handle its redaction requirements.”
The nice part about FSS 119.07 is that it presently requires CSI to locate and redact only three categories of data. “Of course, this data can appear anywhere on a variety of document types, and there may not be any keywords preceding them,” said Sal. “Also, the redaction requirements are subject to change, so we have architected our technology to rapidly accommodate any changes to the laws.”
Two other major challenges are that this data is often hand-printed and that the sheer volume of documents to be processed is enormous. “The biggest fallacy in the market is that you can’t automatically recognize hand-printed data,” Sal told DIR. “Océ’s ICR engine enables us to find handprint. It has to be constrained and well-formed, but still, it’s a huge differentiating factor. It’s the only way to guarantee accuracy. If you can’t recognize handprint, your application might not even realize data that needs to be redacted is there. Then you end up having to review every document, which kind of negates the advantages of auto-redaction.”
CSI utilizes Océ’s DOKuStar recognition engine in IntelliDact. “CSI has a very good understanding of how our engine works and has written business rules to deal with its strengths and weaknesses,” said Michael Breithaupt, technical director of Océ ODT. “We’ve worked with them a little, but CSI has pretty much taken our product out of the box and tweaked and tuned it to fit its application. One of their rules, for example, involves not necessarily recognizing all the digits in a Social Security number, but realizing there are, in fact, nine digits strung together, and, just based on that knowledge, performing the automated redaction.”
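The rule Breithaupt describes can be illustrated with a short sketch. This is not CSI’s patented business logic or anything from the DOKuStar API; it is a minimal Python example that assumes the recognizer emits plain text in which a hypothetical “?” placeholder stands for a digit-shaped character it could not read. The names DIGITISH, SSN_SHAPE, find_ssn_like_spans, and redact are invented for illustration.

```python
import re

# Assumption: '?' marks a character the recognizer judged to be digit-shaped
# but could not read with confidence. This convention is hypothetical.
DIGITISH = r"[0-9?]"

# Nine digit-like characters in a 3-2-4 grouping, with optional dash or space
# separators, and not embedded in a longer run of digit-like characters.
SSN_SHAPE = re.compile(
    rf"(?<![0-9?]){DIGITISH}{{3}}[- ]?{DIGITISH}{{2}}[- ]?{DIGITISH}{{4}}(?![0-9?])"
)

def find_ssn_like_spans(text):
    """Return (start, end) offsets of every run shaped like an SSN."""
    return [m.span() for m in SSN_SHAPE.finditer(text)]

def redact(text, mask="X"):
    """Mask every SSN-shaped run, whether or not each digit was read."""
    out = list(text)
    for start, end in find_ssn_like_spans(text):
        for i in range(start, end):
            out[i] = mask
    return "".join(out)

print(redact("SSN 12?-4?-67?9 on file"))
```

A shape-based rule like this will sometimes mask nine-digit strings that are not Social Security numbers, which fits the over-redaction bias Sal describes later in the article.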
According to Sal, IntelliDact is now 98.5% accurate, right out of the box. “In production, more than 99% of the documents will be redacted completely correct,” he said. “To achieve these rates, we recommend our customers take a look at 8-12% of the documents we process, based on confidence levels. As far as errors go, we try to err on the side of over-redacting information.”
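As a rough illustration of confidence-based review, the sketch below picks the least certain slice of a batch for a human to check. It assumes each document already carries a single confidence score between 0 and 1; how IntelliDact actually scores documents is not described here, and the function name select_for_review and the 10% default are hypothetical, chosen only because the figure falls inside the 8-12% range Sal cites.

```python
import math

def select_for_review(doc_scores, review_fraction=0.10):
    """Return the IDs of the lowest-confidence documents.

    doc_scores maps a document ID to an assumed confidence score in [0, 1].
    review_fraction is the share of the batch sent to a human reviewer.
    """
    # Sort ascending so the least certain documents come first.
    ranked = sorted(doc_scores.items(), key=lambda item: item[1])
    # Round up so a small batch still gets at least one document reviewed.
    count = math.ceil(len(ranked) * review_fraction)
    return [doc_id for doc_id, _ in ranked[:count]]

batch = {"deed-001": 0.99, "lien-002": 0.61, "order-003": 0.97, "plea-004": 0.88}
print(select_for_review(batch, review_fraction=0.25))
```

Ranking by score rather than applying a fixed cutoff keeps the review workload predictable from batch to batch, at the cost of sometimes reviewing documents that were in fact processed correctly.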
CSI has tackled the volume challenge by linking together multiple servers running Océ’s recognition engines. “We even surprised Océ by showing them we could run 120 engines at the same time and perform parallel processing for millions of documents,” said Sal.
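The general idea of fanning pages out to many engines at once can be sketched with a worker pool. The example below is an assumption-laden stand-in, not CSI’s multi-server architecture: recognize_page is a hypothetical placeholder for a call into a recognition engine, and a local thread pool takes the place of engines spread across servers.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_page(page_text):
    # Hypothetical placeholder for a recognition engine call.
    # Here it simply counts the digit characters on the page.
    return sum(ch.isdigit() for ch in page_text)

def process_batch(pages, workers=4):
    """Run recognize_page over all pages using a pool of workers.

    Results come back in the same order as the input pages.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(recognize_page, pages))

pages = ["Case 42", "No numbers here", "Acct 1234"]
print(process_batch(pages, workers=2))
```

Because each page is processed independently, adding workers (or, in a production setting, engines and servers) raises throughput without changing the per-page logic.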
IntelliDact can handle electronic, paper, or microfilm documents. “For backfiles, we can plug into various repositories and don’t even have to export the images,” said Sal. “We typically create a duplicate TIFF file that is redacted for public viewing, while the court’s private file remains in place. We can also provide our customers with electronic text of the files we redact. This enables them to perform full-text searches for images previously searchable only by a few index fields. To capture and process paper and microfilm images, we use a combination of Kofax Ascent and VRS and Océ technology.”
CSI bases its billing on a per-page model, which is the same way Océ and Kofax typically charge for their technology. “We include our extraction, redaction, and full-text indexing capabilities all in the same package,” said Sal. “So, while IntelliDact’s extraction technology helps reduce manual labor, existing staff can be redeployed performing quality control for redaction. The bottom line is a wash. No one is losing their job because of technology, but the courts don’t have to hire extra personnel either, to meet new requirements.”
Yes, it seems CSI has found its vertical market. The company also sells a document management, case tracking, and workflow system to county courts. CSI has also filed for several patents based on the business logic it applies in IntelliDact. “Our primary focus now is the redaction of public records,” concluded Sal. “Any government agency that has a large amount of records and can’t afford to invest resources into manually redacting them is a candidate.”
For more information: http://www.csisoft.com/; http://www.odt-oce.com/usa/default.asp