What’s Next For PDF/A?
Now that the ISO standard has been finalized, efforts toward adoption begin.
The PDF/A document archiving format has been ratified as an ISO standard, but now the real work begins, according to Melonie Warfel, Adobe’s director of worldwide standards. “So many standards get developed and published by ISO, but are never utilized because there isn’t any proper guidance,” Warfel told DIR. “To ensure PDF/A is adopted, we need to develop items like application notes, FAQs, and a Web site to help people implement PDF/A-based solutions.”
The standard was unanimously approved by an ISO committee on May 23 and should be published in its final form by this fall. “Because the vote was unanimous, it will go straight to publication,” said Warfel. “It will take a couple months to work out some details, but no technical changes will be made. Countries like China, France, Germany, Italy, Japan, and the Russian Federation all approved without comment. Sweden, the Ukraine, the U.K., and the U.S. each added some commentary.”
It’s easy to understand why Adobe is excited about PDF/A. It is the first real attempt we are aware of to come up with a standardized format for permanent electronic storage of documents. Most electronic document file formats are either application-specific or transaction-oriented. Falling into this second category are formats like TIFF, ASCII, and XML.
Adobe, of course, controls the PDF format, which is the basis for PDF/A. So, logically, Adobe is in a position to benefit from widespread acceptance of PDF/A. With the emergence of federal regulations governing document archiving, as well as increased attention being given to the topic by non-regulated businesses, the potential for widespread acceptance is definitely at hand.
“We have many organizations looking at how to utilize the standard,” said Warfel. “This includes the U.S. Courts, NARA (National Archives and Records Administration), and the U.S. Patent and Trademark Office. We expect these organizations to start coming up with guidelines on how PDF/A files should be submitted to directly meet their needs.”
Some of these guidelines might include specific metadata requirements. It’s important to note that the PDF/A standard includes support for XMP, an XML-based standard that will enable sharing of PDF/A metadata among applications. There are also two levels of PDF/A that end users can choose from: A and B. “Level A creates a tagged PDF, which carries over the structure of a document when displaying it in different formats,” said Warfel. “This is especially important in government applications that require compliance with Section 508 [of the U.S. Rehabilitation Act].”
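To make the XMP point concrete, here is a minimal Python sketch of an XML metadata packet that records a PDF/A part and conformance level and is then read back by a second piece of code. The function names are hypothetical, and the `pdfaid` namespace and property names reflect our understanding of the PDF/A identification schema; check them against the published standard before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. The pdfaid URI is an assumption based on the PDF/A
# identification schema; verify it against the published standard.
X_NS = "adobe:ns:meta/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_xmp(part, conformance, title):
    """Build a small XMP packet body as a string (hypothetical helper)."""
    for prefix, uri in (("x", X_NS), ("rdf", RDF_NS),
                        ("dc", DC_NS), ("pdfaid", PDFAID_NS)):
        ET.register_namespace(prefix, uri)
    root = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(root, f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description")
    desc.set(f"{{{RDF_NS}}}about", "")
    # PDF/A identification: part number and conformance level (A or B).
    ET.SubElement(desc, f"{{{PDFAID_NS}}}part").text = str(part)
    ET.SubElement(desc, f"{{{PDFAID_NS}}}conformance").text = conformance
    # Dublin Core title, stored as a language alternative.
    title_el = ET.SubElement(desc, f"{{{DC_NS}}}title")
    alt = ET.SubElement(title_el, f"{{{RDF_NS}}}Alt")
    li = ET.SubElement(alt, f"{{{RDF_NS}}}li")
    li.set(f"{{{XML_NS}}}lang", "x-default")
    li.text = title
    return ET.tostring(root, encoding="unicode")


def read_conformance(xmp_text):
    """Return (part, conformance) as strings from an XMP packet body."""
    root = ET.fromstring(xmp_text)
    part = root.find(f".//{{{PDFAID_NS}}}part")
    conf = root.find(f".//{{{PDFAID_NS}}}conformance")
    return part.text, conf.text
```

Because the packet is ordinary XML, any application with an XML parser can read the same fields, which is the kind of metadata sharing described above. Embedding the packet in an actual PDF file is outside the scope of this sketch.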
A User Perspective
DIR caught up with end user Geof Huth, manager, records archive services for the New York State Archives. He discussed with us his plans for implementation of PDF/A. [Huth recently presented on PDF/A at an ARMA Long Island chapter meeting.] “We’ve decided, at least tentatively, that we are going to start using the general PDF format for archiving,” said Huth. “We are going to utilize a very simple version and not put in anything too complicated. Once there is a usable version of PDF/A available, we plan to adopt it.”
Huth cited two main reasons for the choice of PDF for archiving. “With PDF, it’s fairly easy and simple to capture the look and feel of a document,” he said. “This is becoming increasingly important as a lot of today’s electronic documents are pretty heavily formatted. If you save those documents in a text-only based format such as ASCII, you lose that formatting, which can be important to their meaning. Also, a PDF is not a dead image like a TIFF. PDFs maintain the ability to be searched like native electronic document formats.”
Huth noted that the state will not use PDF to archive complex documents containing information elements such as JavaScript and video. It will also not use PDF for paper documents or images. “We typically retain paper documents in their original format, or on microfilm,” he said. “We may scan and convert them to PDF if they are fragile or high-access documents. For example, we recently scanned a bunch of documents pertaining to the Civil War. Typically though, paper is an efficient archiving medium. We have plenty of space, and paper requires very little intervention as long as it is kept in temperature- and humidity-controlled areas.
“As far as images go, we see no advantage in converting TIFFs or JPEGs to PDFs. Both formats have been around a while and have proven they have some legs.” [It’s worth noting that since PDF/A has the capability to encapsulate TIFF and JPEG images, technically these images could always be converted to PDF/A files and viewed with a PDF reader at any point in the future.]
PDF 1.6 Next Stop On Roadmap
The first version of PDF/A is based on PDF 1.4, which means its files can definitely be read with Adobe Reader version 5.0 or higher. [Warfel said Adobe hadn’t tested anything below 5.0.] As far as document imaging goes, 1.4 supports JBIG2 compression of text, but nowhere in the PDF/A draft is JBIG2 specifically addressed. The draft does say, however, “a conforming file may include any valid PDF 1.4 feature that is not explicitly forbidden by this part of ISO 19005 [the PDF/A standard number].”
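A PDF file declares its version in a header comment of the form %PDF-1.4 near the start of the file, so a quick first check on which PDF version a file claims can be sketched in a few lines of Python. The function names here are hypothetical, and the check is only a rough filter: it does not establish PDF/A conformance.

```python
import re


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # Only the first kilobyte is searched, since the header sits near the start.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def claims_pdf_1_4_or_lower(data: bytes) -> bool:
    """True if the header declares PDF 1.4 or an earlier version."""
    version = pdf_header_version(data)
    return version is not None and version <= (1, 4)
```

For example, `claims_pdf_1_4_or_lower(open("record.pdf", "rb").read(1024))` would test a file named record.pdf (a placeholder). A document's catalog can also carry a version entry that overrides the header, so a real validator would need to parse the file rather than trust the first line.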
“PDF/A is a restricted version of the PDF format,” said Warfel. “We restricted it to ensure we can render reliable representations of documents in the future. The PDF format allows for a lot of things, but users may not want some of that flexibility in their archiving format. For example, PDF/A does not allow encryption or password protection, which are part of PDF. PDF/A also requires that fonts be embedded in the file.”
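The two restrictions Warfel mentions, no encryption and mandatory font embedding, lend themselves to a simple pre-flight check. The Python sketch below assumes a hypothetical summary of a document that some PDF parsing tool has already produced; the class and field names are illustrative and do not come from the standard or from any Adobe product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FontInfo:
    """Hypothetical record describing one font used in a document."""
    name: str
    embedded: bool


@dataclass
class DocSummary:
    """Hypothetical summary produced by some PDF parsing tool."""
    encrypted: bool
    fonts: List[FontInfo] = field(default_factory=list)


def archival_problems(doc: DocSummary) -> List[str]:
    """List violations of the two restrictions quoted in the article."""
    problems = []
    if doc.encrypted:
        problems.append("encryption or password protection is present")
    for font in doc.fonts:
        if not font.embedded:
            problems.append(f"font not embedded: {font.name}")
    return problems
```

An empty list means the summary passes these two checks only. The standard restricts more than this, so the sketch is a starting point rather than a conformance test.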
According to Warfel, version 2 of PDF/A will be based on PDF 1.6 and will include additional functionality in the areas of imaging, complex documents, and digital signatures. “We’ve received guidance, especially from the Library of Congress, to include JPEG 2000, which is incorporated in PDF 1.6,” said Warfel. “There is also a lot of talk about writing application notes on how to incorporate digital signatures in PDF/A. The initial version neither includes nor disallows digital signatures. The feedback we’ve received is that the user community wants more guidance on digital signatures for their archived files. PDF 1.6 provides a lot more functionality around digital signatures.
“PDF 1.6 also has the ability to include layers, which is important when dealing with engineering drawings. Working with 1.6 covers the PDF/E (engineering) standard we currently have in development.”
Warfel estimated the second version of PDF/A would be approved in two years. The first version was in development for three years after the initiative was launched in 2002 [see DIR 9/20/02]. “We have now established a solid base,” said Warfel.
Applications, Guidelines Needed To Drive Adoption
Adobe has plans to incorporate the current version of PDF/A in its market-leading Acrobat line of PDF creation products. Acrobat 7 Professional already supports the draft version of the standard. Warfel indicated Adobe is working on improving its PDF/A functionality. Adobe will also begin working with end users to help them come up with policies and procedures for creating acceptable PDF/A documents for permanent storage.
The two example software development guidelines for creating PDF/A files that are listed in the draft of the PDF/A standard both pertain to document imaging applications:
-- “Writers of conforming files should not use lossy compression, subsampling, downsampling, or any other process that either alters the content or degrades the quality of source data in the conforming file.”
-- “Software should not substitute searchable text, based on optical character recognition, for the original scanned text within the bit-mapped image of documents that are scanned to conforming files from paper or converted to conforming files from image formats.”
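The first guideline can be approximated in software by looking at the compression filter recorded for each image in a file. The Python sketch below assumes the filter names have already been extracted by some other tool; the function name and input format are hypothetical. DCTDecode is the PDF filter for JPEG data, which is lossy in typical use, while JBIG2 encoders can run in either lossy or lossless modes, so that filter is flagged for review rather than rejected.

```python
# PDF stream filter names. The grouping is a judgment call for this sketch,
# not something taken from the PDF/A draft.
LOSSY_FILTERS = {"DCTDecode"}                # JPEG-compressed image data
ENCODER_DEPENDENT_FILTERS = {"JBIG2Decode"}  # lossy or lossless, per encoder


def review_image_filters(images):
    """Map each image label to 'lossy', 'check encoder settings', or 'ok'.

    `images` is a hypothetical mapping of image label -> list of filter names.
    """
    report = {}
    for label, filters in images.items():
        if any(name in LOSSY_FILTERS for name in filters):
            report[label] = "lossy"
        elif any(name in ENCODER_DEPENDENT_FILTERS for name in filters):
            report[label] = "check encoder settings"
        else:
            report[label] = "ok"
    return report
```

A filter name only shows how the image is stored, not whether the source was downsampled before it was written, so a report like this can flag likely problems but cannot prove that a file follows the guideline.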
Warfel said other guidance could be provided in areas such as how to incorporate pull-down lists utilized with electronic forms being archived. Agencies like the New York State Archives are also wrestling with hardware requirements, which are left out of the PDF/A standard. Huth said the New York State Archives is currently accepting CD and DVD WORM discs and soon expects to expand that to include specific types of WORM tape.
Market Opportunity Is At Hand
We should probably conclude by saying that Adobe will definitely not be the only software vendor competing in the PDF/A market. There are several hundred commercially available PDF creation applications, and there is no reason to believe that most of them won’t incorporate PDF/A. PDF/A is, after all, an openly published standard. And unlike regular PDF, which is completely controlled by Adobe, PDF/A needs approval by a committee, which Adobe’s PDF competitors are free to get involved with.
There are also alternatives to PDF/A, especially in the area of scanned images. After all, PDF was not specifically designed with document scanning in mind, while formats like DjVu and JPEG 2000, Part 6 were. In fact, the Irish Local Government Computer Services organization recently standardized on LizardTech’s DjVu application, and LizardTech is attempting to market its software to other state and local government organizations with archiving responsibilities. To help make the proprietary DjVu format more palatable for long-term storage, LizardTech recently announced support for an open source Java-based DjVu viewer.
Germany-based LuraTech is currently marketing a JPEG 2000, Part 6 application, which creates files in a .JPM format. .JPM and JPEG 2000 are both based on open standards. The Library of Congress has experimented with products from both LizardTech and LuraTech.
The war for document archiving market share is far from over with the ratification of PDF/A. In fact, it is just beginning. Granted, Adobe has quite a bit of marketing clout and mindshare over its competitors, with its billions in revenue and large installed base of free PDF readers. We’re not going to begrudge the fine job Adobe did promoting the adoption of PDF and then driving the creation of a much-needed electronic document archiving standard.
We will say there appears to be room for more than one solution in the electronic archiving space. Many organizations like the New York State Archives, for example, look at paper and electronically generated documents in separate lights, and maybe PDF isn’t always the best choice for scanned documents. However, it is also safe to say that Adobe is definitely not asleep at the wheel when it comes to scanning, and their vision of a unified world of images and electronic documents being stored in a single format does sound pretty attractive.
For more information: http://www.aiim.org/standards.asp?id=25013; http://www.digitalpreservation.gov/formats/fdd/fdd000125.shtml