Think of the DTD (document type definition) as a road map of a document. It declares the elements and attributes a document may contain and so defines the document's structure: the order of the elements, their relationships and the rules governing them. What matters most for publishers is that DTDs enable the creation of XML documents bound by predetermined rules and structures.
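To make the road-map idea concrete, here is a minimal sketch of what such declarations can look like. The recipe DTD, its element names and the `serves` attribute are invented for illustration and do not come from any vendor or standard mentioned in this column. The Python wrapper uses the standard library's expat parser only to read the declarations back; expat does not validate a document against its DTD.

```python
import xml.parsers.expat

# Hypothetical document with an internal DTD subset (illustration only).
# The DTD says: a recipe holds one title, then one or more ingredients,
# then one or more steps, and must carry a "serves" attribute.
DOC = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredient+, step+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT step (#PCDATA)>
  <!ATTLIST recipe serves CDATA #REQUIRED>
]>
<recipe serves="4">
  <title>Dal</title>
  <ingredient>Lentils</ingredient>
  <step>Simmer until soft.</step>
</recipe>
"""

declared_elements = []    # element names, in declaration order
declared_attributes = []  # (element, attribute, is_required) tuples


def on_element_decl(name, model):
    # Called once per <!ELEMENT ...> declaration.
    declared_elements.append(name)


def on_attlist_decl(elname, attname, type_, default, required):
    # Called once per attribute declared in an <!ATTLIST ...>.
    declared_attributes.append((elname, attname, bool(required)))


parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
# Non-validating parse: reads the declarations but does not enforce them.
parser.Parse(DOC, True)

print(declared_elements)
print(declared_attributes)
```

Checking a document against these rules would require a validating parser, which the Python standard library does not include.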

“Any content, be it a cookbook or an aircraft manual, can be represented in DTD, which at a granular level allows us to capture and identify anything, down to a single word space or character,” explained Rajiv K. Seth, executive director at Bangalore-based Macmillan Publishing Solutions (MPS). “There is really no limit to DTD, and not every DTD has to be built from scratch. There are many public-domain DTDs that can be customized to meet a publisher's requirements. It is also possible to plug into a newly developed DTD these public modules, such as CALS for tables and MathML for math.” And for those looking for further cost savings, industry-standard DTDs such as DocBook or OeB DTD for books and NLM DTD for journals are available online.

V.N. Kumar, chief technology officer at MPS, offered advice on DTD development: “Creating a DTD in modules would make customization possible and ease maintenance down the line. Portions can then be added or deleted without needing a complete reworking. For instance, if a book contains more straight text than formulas, then the relevant modules can be used, and vice versa.”
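The modular approach Kumar describes can be sketched roughly as follows. This is an assumption-laden illustration, not MPS's method: production DTDs usually achieve modularity with parameter entities and separate module files, whereas this sketch simply assembles declaration fragments as Python strings. The module names (`text`, `math`, `tables`) and every element name are hypothetical.

```python
import xml.parsers.expat

# Hypothetical modules: each is a list of DTD declarations.
MODULES = {
    "text": ["<!ELEMENT para (#PCDATA)>"],
    "math": ["<!ELEMENT formula (#PCDATA)>"],
    "tables": ["<!ELEMENT table (row+)>", "<!ELEMENT row (#PCDATA)>"],
}

# The block-level element each module contributes to a chapter.
BLOCK_ELEMENT = {"text": "para", "math": "formula", "tables": "table"}


def build_dtd(selected):
    """Assemble DTD declarations from the chosen modules only."""
    blocks = " | ".join(BLOCK_ELEMENT[name] for name in selected)
    lines = [
        "<!ELEMENT chapter (title, (%s)+)>" % blocks,
        "<!ELEMENT title (#PCDATA)>",
    ]
    for name in selected:
        lines.extend(MODULES[name])
    return "\n".join(lines)


def declared_elements(dtd):
    """Return the element names declared in the DTD, in order."""
    names = []

    def on_element_decl(name, model):
        names.append(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    # expat is non-validating, so an empty <chapter/> is enough
    # to carry the internal DTD subset through the parser.
    doc = "<!DOCTYPE chapter [\n%s\n]>\n<chapter/>" % dtd
    parser.Parse(doc, True)
    return names


text_only = build_dtd(["text"])           # a book of straight text
with_math = build_dtd(["text", "math"])   # the same book plus formulas

print(declared_elements(text_only))
print(declared_elements(with_math))
```

Adding or dropping a module changes only the list passed to `build_dtd`; the remaining declarations are left untouched, which is the maintenance benefit the quote points to.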

Most content services projects, however, extend beyond DTD development. Kumar recalled one such project: “It was a directory requiring constant additions, deletions and modifications, and last-minute changes that made it into print were sometimes not captured in the client's database. Our team developed a generic DTD that took care of the client's specific requirements and moved the project onto an XML-first workflow. We also added auto-pagination and index-generation features. Now the publisher has the same print and online content and is able to make use of the XML content for more purposes.” Kumar and his team also handled one 200,000-page annual project involving XML transformation in 22 European languages. “Our client supplied the DTD in English. Tags need not be written in the same language as the content.”

However, said Sagayaraj Irudhayaraj, head of technology at Integra, an understanding of the language used in the content “is essential to document analysis prior to DTD development. Otherwise, it would be difficult to identify the elements of a document and create the appropriate tags.” Added CEO Sriram Subramanya, “Analyzing and understanding a publisher's typical projects or documents and creating a DTD that accommodates various structures residing in them is often the biggest challenge.”

Take one 5,000-page project that recently landed on Integra's doorstep. The raw material for this nontechnical book came in a database as MS Word files and images nested in tables. “Our first tasks were to understand the relationships between the various tables and to develop a suitable program to extract and rearrange the content in a logical manner before building the DTD,” said Irudhayaraj. The content was then converted using the new DTD and the pages composed on an XML-first workflow. The project was completed within three weeks. So far, this 14-year-old Pondicherry-based company has yet to come across a project for which it could not build a DTD.

As to what a publisher can do to make DTD-building easier for the vendor, Irudhayaraj said, “Letting us know the various products that they plan to create out of the XML content is crucial. This would allow us to determine the granularity right from the start and to build the DTD accordingly. Needless to say, the more complex the DTD, the more maintenance and support required. Publishers—and vendors, for that matter—must strive to keep a DTD as simple as possible.”

And for those intimidated by the mere mention of the acronym, Subramanya said, “DTDs are in a simple, human-readable format, though you need to know the syntax to understand and comprehend them.” Fortunately, publishers can bank on most content services vendors, in India or elsewhere, to build, rebuild or deconstruct DTDs for almost any project imaginable.

Note: This is the fifth column in a regular series highlighting content/publishing services provided by India-based companies.