2010 Census

Here's a link to the latest and greatest info on the 2010 Census Project:

http://www.census.gov/procur/www/2010dris

2010 Census Forms to be Scanned in Color

Color scanning will be part of the U.S. Census Bureau’s 2010 Decennial Response Integration System (DRIS). That’s the word from ADI, LLC, the Rochester-based research team helping to put together the RFP for the 2010 DRIS. ADI is currently testing technology it will recommend for the RFP, which is due to be issued next February.

“Our overall goal is reducing the requirements needed to process the census forms,” explained Steve Spiwak, VP, engineering for ADI. “During the last census, in 2000, because of the form’s design and color, some severe quality assurance (QA) had to be done to ensure the background would drop out so recognition technology could be applied effectively. We’d like to design the 2010 DRIS so that no matter what color the form is, we will be able to drop out the background using color imaging technology.”

Earlier RFP to Facilitate Dress Rehearsal
2000 marked the first time the Census Bureau used digital imaging to process census forms. In 1980 and 1990, the Census Bureau used a microfilm-based mark recognition and key-from-image system. As with any initial venture, some mistakes were made in 2000, and ADI has set out to eliminate them this time around. ADI also served as an advisor for the 2000 system.

“For the 2000 census, we focused a lot of effort convincing the Bureau that electronic data capture could be done effectively,” said Spiwak. “It took more than two years before we convinced them. We started working with the Census Bureau in 1993, and the RFP for the 2000 system didn’t go out until 1996. As a result, it wasn’t awarded to Lockheed Martin until 1997. They were scrambling from that point on. Because of changing requirements, we never really made a dress rehearsal in 2000.”

Electronic Dropout Should Increase Tolerances
One of the biggest problems with the 2000 system was the color of the forms—an orange-yellow blend. Because the 2000 processing system used a colored lamp to drop out the background, any form completed with red ink had to be processed manually. “It affected less than 1% of the forms, but when you’re talking about 140-150 million forms—well, I wouldn’t want them on my desk,” said Spiwak. “We think electronic color dropout can eliminate this problem.”
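The difference between lamp-based and electronic dropout can be illustrated with a minimal sketch. It assumes the form is scanned in RGB color and that any pixel close to the known background color is dropped, so red ink survives even on an orange-yellow form. The color values, the tolerance, and the function names are illustrative assumptions, not details of the 2000 system or of anything ADI is testing.

```python
# Illustrative sketch only: colors, tolerance, and names are assumed values,
# not figures from the Census Bureau or ADI.

def color_distance_sq(a, b):
    """Squared Euclidean distance between two (R, G, B) tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def drop_out(pixels, background, tolerance=60):
    """Convert a color image to bi-tonal.

    Pixels within `tolerance` of the form's background color become 0
    (dropped out); every other pixel becomes 1 (kept as ink).
    """
    limit = tolerance ** 2
    return [
        [0 if color_distance_sq(p, background) <= limit else 1 for p in row]
        for row in pixels
    ]


FORM_BACKGROUND = (255, 200, 90)  # hypothetical orange-yellow form color
RED_INK = (200, 30, 30)           # hypothetical red pen
BLACK_INK = (20, 20, 20)          # hypothetical black pen

sample = [
    [FORM_BACKGROUND, RED_INK, FORM_BACKGROUND],
    [BLACK_INK, (250, 205, 95), BLACK_INK],  # middle pixel: slightly off-color background
]

result = drop_out(sample, FORM_BACKGROUND)
print(result)  # [[0, 1, 0], [1, 0, 1]]
```

Because the decision is made per pixel in software, the background color and tolerance become parameters rather than properties of a physical lamp, which is the kind of flexibility the quote above describes.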

ADI also views electronic dropout, along with improved image processing, as a means for widening the tolerances involved with printing census forms. This could possibly lead to the implementation of a targeted replacement mailing (TRM)—a process designed to reduce the overall cost of the census, which ran between five and six billion dollars in 2000.

“We made an attempt to introduce a TRM in 2000, but there wasn’t enough up-front planning,” said Spiwak. “A TRM involves mailing a second form to everyone that doesn’t respond to the first mailing by a certain deadline. The goal of a TRM is to increase the number of responses received through the mail. This reduces the money the Census Bureau has to spend hiring temporary workers to knock on doors and collect info. That process can get expensive.”

Several Options for Second Mailing
ADI is studying its options for a second mailing. “To fit within the Census Bureau’s timeframe, a second mailing would have to be turned around in a week to 10 days,” said Manuel Trevisan, VP of Imaging for ADI. “That’s quite a challenge, considering the initial print run is projected to take four to nine months. Granted, the second printing will not be as large as the first, but even if it’s 40 million forms, you’re talking a minimum of four months using traditional methods.”

Opening up the tolerances of the forms processing system could potentially simplify the QA process and increase the number of printing sites available for a TRM. “Increasing printing locations is just one option we’re considering,” said Trevisan. “We could also print TRM forms during our initial run and store them in a warehouse until it is time to mail them.”

Voting Could be Added to Recognition
According to Spiwak, TRM and color imaging represent the major focuses of ADI’s current research. “For the 2000 census, we were more focused on data capture, and we were very happy with the results we received,” he told DIR. “We achieved greater than 98% accuracy on hand-printed fields, and more than 99% on check-box fields. While we will consider recognition improvements such as voting techniques, we are leaving a lot of the details up to the contractor.

“The RFP will probably require the same level of accuracy as the 2000 system or better. For the first time, we are also requiring bidders submit a cost-per-document pricing model. This will encourage them to increase the accuracy of their automated recognition to keep their costs down.”
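For readers unfamiliar with the voting techniques Spiwak mentions, the following is a minimal sketch of one common form: several recognition engines read the same field, and a value is accepted only when enough of them agree. The engine outputs, the agreement threshold, and the function name are assumptions for illustration; the article does not say which voting scheme, if any, a contractor would use.

```python
from collections import Counter


def vote(readings, min_agreement=2):
    """Return the value most engines agree on.

    Returns None (meaning "send to manual keying") when no value reaches
    `min_agreement` votes or when the top two values are tied.
    """
    counts = Counter(r for r in readings if r is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    value, votes = ranked[0]
    if votes < min_agreement:
        return None
    if len(ranked) > 1 and ranked[1][1] == votes:
        return None  # tie between the two leading candidates
    return value


# Hypothetical outputs from three engines reading the same handprint field.
print(vote(["SMITH", "SMITH", "SM1TH"]))  # SMITH
print(vote(["SMITH", "SMYTH", "SM1TH"]))  # None
```

Under a cost-per-document pricing model, every field that falls through to manual keying adds labor cost, so a bidder would have an incentive to tune a scheme like this toward fewer rejects without sacrificing accuracy.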

Several Vendors Contributed in 2000
As we mentioned, Lockheed Martin won the contract to implement the forms processing system for the 2000 census. The system was used to process 140 million forms in less than 170 days. This included peak periods during which six million forms were received daily. Each form had to be processed within two days of its receipt.

The majority of these forms were 25 1/2-inch long “short” forms. They included 93 handprint and 120 check box fields. Lockheed deployed some 160 Kodak 9500 bi-tonal high-speed scanners at four sites. Technology from Kofax, TMS Sequoia, Captiva, Staffware, Optimum Solutions, and CGK (now Océ ODT) was used for image processing, workflow, and data capture.
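The figures above imply some rough throughput numbers, worked out in the short calculation below. It assumes the load was spread evenly across days and scanners, which the article does not state, so the results are back-of-the-envelope estimates rather than reported statistics.

```python
# Figures quoted in the article.
TOTAL_FORMS = 140_000_000
PROCESSING_DAYS = 170          # "less than 170 days", so averages are lower bounds
PEAK_FORMS_PER_DAY = 6_000_000
SCANNERS = 160
HANDPRINT_FIELDS = 93
CHECKBOX_FIELDS = 120

# Assumption: forms are spread evenly over the whole processing window.
average_per_day = TOTAL_FORMS / PROCESSING_DAYS

# Assumption: peak-day receipts are shared equally by all scanners.
peak_per_scanner = PEAK_FORMS_PER_DAY / SCANNERS

fields_per_short_form = HANDPRINT_FIELDS + CHECKBOX_FIELDS

print(f"Average forms per day (at least): {average_per_day:,.0f}")
print(f"Peak forms per scanner per day:   {peak_per_scanner:,.0f}")
print(f"Fields per short form:            {fields_per_short_form}")
```

On these assumptions, the system averaged at least roughly 820,000 forms a day, each scanner would have faced about 37,500 forms on a peak day, and each short form carried 213 fields to recognize.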

Spiwak provided some insight into how CGK won the character recognition part of the contract. “In 1995, we created a test deck of census forms, which we sent to 30 OCR vendors along with a letter detailing the results we were looking for,” said Spiwak. “Only three responded. CGK was one. Based on these responses, we could confidently include our recognition specifications in the RFP—we knew they could be met. We let Lockheed know that CGK was one of the vendors that could meet them.”

Looking to Simplify Data Conversion
According to Spiwak, Lockheed built into the system a number of automatic and manual data correction and verification techniques. “One improvement we’d like to make for the 2010 DRIS is combining data capture with the coding process used by the Census Bureau,” said Spiwak. “To organize its data for analysis, the Census Bureau assigns a numeric code to each piece of data. Each state, for example, has a different number.

“In the 2000 system, coding was a separate process. In 2010, once a field is recognized as ‘New York,’ we’d like it to be immediately coded. In addition to eliminating an extra step, this should increase the field-level recognition accuracy because misspellings won’t really matter. As long as there is a high level of certainty that a respondent meant ‘New York,’ the data can be converted to a code and the spelling won’t need to be corrected.”
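The idea of coding a field directly from an imperfect recognition result can be sketched with approximate string matching. This is a minimal illustration, not the Census Bureau's method: the code table is a small placeholder (the article does not describe the Bureau's actual code list), and the similarity cutoff and function name are assumptions.

```python
import difflib

# Placeholder code table for illustration only.
STATE_CODES = {
    "NEW YORK": 36,
    "NEW JERSEY": 34,
    "NEW MEXICO": 35,
    "NEVADA": 32,
}


def code_field(recognized_text, cutoff=0.8):
    """Map recognized text straight to a numeric code.

    Returns the code of the closest known value when the similarity is at
    least `cutoff`, otherwise None so the field can be reviewed manually.
    """
    candidate = recognized_text.strip().upper()
    matches = difflib.get_close_matches(
        candidate, list(STATE_CODES), n=1, cutoff=cutoff
    )
    return STATE_CODES[matches[0]] if matches else None


print(code_field("New York"))  # 36
print(code_field("New Yrk"))   # 36 (misspelling still codes correctly)
print(code_field("Texas"))     # None (not in the placeholder table)
```

The misspelled input never needs to be corrected to "New York"; once the match is confident enough, only the code is stored, which is the saving Spiwak describes.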

Spiwak added that the potential elimination of the long form, a 40-page booklet that was randomly distributed in 2000, could also improve recognition rates. “The Census Bureau has initiated a project called the American Community Survey (http://www.census.gov/acs/www/), designed to collect long-form-type information every year from a sampling of the population,” said Spiwak. “If that project proves successful, the long form could be eliminated for the 2010 census.”

Only the Biggest and Best Need Apply
Lockheed was one of only four contractors who bid on the 2000 census processing contract. “We weeded out a lot of competitors by making them provide us with hard data on their success in capturing information from a test deck of census forms we distributed,” said Spiwak. “There will be a similar test for the 2010 RFP.”
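A test deck is useful because the correct answer for every field is known in advance, so a bidder's output can be scored mechanically. The sketch below shows one simple way such scoring could work, reporting exact-match accuracy per field type. The field names, sample values, and scoring rule are hypothetical; the article does not describe how ADI scored the responses.

```python
def field_accuracy(truth, captured):
    """Exact-match accuracy per field type.

    `truth` maps a field id to (field_type, expected_value);
    `captured` maps a field id to the value a vendor's system produced.
    """
    totals, correct = {}, {}
    for field_id, (field_type, expected) in truth.items():
        totals[field_type] = totals.get(field_type, 0) + 1
        if captured.get(field_id) == expected:
            correct[field_type] = correct.get(field_type, 0) + 1
    return {t: correct.get(t, 0) / totals[t] for t in totals}


# Hypothetical ground truth for one test form.
truth = {
    "last_name": ("handprint", "SMITH"),
    "first_name": ("handprint", "JANE"),
    "state": ("handprint", "NEW YORK"),
    "age": ("handprint", "42"),
    "owns_home": ("checkbox", True),
    "rents_home": ("checkbox", False),
}

# Hypothetical vendor output with one handprint error.
captured = {
    "last_name": "SMITH",
    "first_name": "JANE",
    "state": "NEW YORK",
    "age": "47",
    "owns_home": True,
    "rents_home": False,
}

scores = field_accuracy(truth, captured)
print(scores)  # {'handprint': 0.75, 'checkbox': 1.0}
```

Scores like these could be compared directly against thresholds such as the 98% handprint and 99% check-box accuracy reported for the 2000 system.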

After the 2000 census forms were processed, Kodak bought back the majority of the 9500 scanners, so the 2010 DRIS will be a brand new system. “One difference is that, for 2010, both the staffing and the system will be handled by the same contractor,” said Spiwak. “In 2000, TRW handled the staffing.”

Spiwak added that once again the contract for printing the census forms will be awarded separately in a deal brokered by the Government Printing Office with technical input from ADI. “Because the processing contractor will have to handle everything output by the printing system, I think there should be more interaction between the two sides,” said Spiwak.

“The Census Bureau, for example, is currently testing a number of different forms designs and will send out a sampling of these over the next two years. The current plan is to process these test forms by keying from the paper. The Bureau will then pass on the forms to ADI for our own testing. To get the best feedback, I think the Census Bureau should be processing these test forms initially using imaging and recognition technology.”

Archiving Part of 2010 Contract
Also new for the DRIS 2010 system will be a requirement that the contractor manage the archiving of the census forms. After the 2000 Census, a controversy erupted over who was responsible for the costs associated with managing the long-term storage of the electronic images. Eventually, the images were converted to microfilm, at a cost of more than $16 million to the Census Bureau.

“For the 2000 census, archiving was an afterthought,” said Trevisan. “It was not part of the data capture contract. When the Census Bureau offered the digital images to the National Archives and Records Administration (NARA), NARA insisted the Census Bureau provide money for maintenance. The Census Bureau refused.

“Census forms aren’t even made available to the public for 72 years. Over that period of time, technology is bound to change, so if you don’t use a human readable solution like microfilm, you have to keep migrating your images and information.”

Eventually, it was decided that the Census Bureau would foot the bill for the filming of some 560 million TIFF images. Because the images were duplicated, the project involved over a billion images. The Cerebral Palsy Research Foundation (CPRF) was awarded the contract, the amount of which was undisclosed. However, services giant ACS reported it received a $16 million subcontract from CPRF [see DIR 12/21/01].

According to a presentation made by the Census Bureau at a recent vendor meeting regarding the 2010 DRIS, the 2010 contractor will be responsible for all phases of the DRIS lifecycle, including disposition and archiving. Trevisan indicated that microfilm would again be a likely choice for archiving. “The Census Bureau and NARA have some pretty extensive experience with microfilm and view it as a medium they can trust,” he said.

ADI Currently Running Tests on Color
The contract for DRIS 2010 is scheduled to be awarded in October 2005. Whoever wins will be required to subcontract 18% of the deal to small businesses. A complete dress rehearsal is scheduled for completion before the end of 2008. “We’ve already looked at several color scanners and some color IP and dropout technology,” said Spiwak. “We do not recommend products or brands, but will recommend that certain types of technology be included in the RFP. We may also post some results of our testing on the 2010 DRIS Web site. We won’t list any vendor names, but, privately, we will let the vendors know where they placed.”

Any vendor interested in having its products tested by ADI should contact Dr. Brad Paxton by phone at (585) 239-6057 or by e-mail at brad.paxton@adillc.net. More information on the 2010 DRIS can be found at http://www.census.gov/procur/www/2010dris

Some Background on ADI

So just who is ADI? According to Steve Spiwak, VP, engineering, the company, which has six full-time employees, acts as the technical arm of the U.S. Census Bureau. The ADI team started working with the Census Bureau in 1993. At that time, it was part of a research corporation owned by the Rochester Institute of Technology (RIT). “In 2002, when RIT discontinued its research corporation, we formed our own company,” said Spiwak. “ADI stands for Advanced Document Imaging.”

Although the Census Bureau is ADI’s primary customer, the company is also looking at other revenue opportunities. “We recently trademarked the term ‘Digital Test Deck’ and have filed for a patent for technology related to a product we have by that name,” said Spiwak. “It involves a method for digitally creating simulated handprint-filled forms for testing automatic recognition systems. We plan on using this technology in testing for the 2010 DRIS. We will market it commercially as well.”

Spiwak can be reached by phone at (585) 239-6055 or by e-mail at steve.spiwak@adillc.net

NARA Initiative Does Not Account for Imaging

You may have seen that the National Archives and Records Administration (NARA) recently awarded $20.1 million in contracts to a pair of integrators to develop competing Electronic Records Archiving (ERA) prototypes. Harris Corporation and Lockheed Martin will each spend a year developing a blueprint for the ERA system. According to NARA, “ERA will be a comprehensive, systematic, and dynamic means for preserving virtually any kind of electronic record, free from dependence on any specific hardware or software.”

So does ERA, along with NARA’s support for the developing PDF-A standard, mean NARA may be migrating from microfilm to digital imaging for some of its long-term archiving requirements, such as those presented by the census? Well, here’s the answer we received from Harris’ Karen Knockel, ERA program manager, regarding microfilm and paper’s place within an ERA infrastructure: “At this time, the ERA program only includes the effort for maintaining the catalog for NARA’s records which were not ‘born electronic.’”

What is unclear is the ‘birth’ status of TIFF images created through the scanning efforts of an agency such as the Census Bureau. As there is a good chance the ERA system will be in place by 2010, it will be interesting to see what call NARA makes on the census images. It could have wide-reaching ramifications in the archiving market.

For more information, check out the ERA Web site at http://www.archives.gov/electronic_records_archives

COPYRIGHT 2003 - RMG ENTERPRISES | ALL RIGHTS RESERVED | PRIVACY STATEMENT | DISCLAIMER