Content services jobs heading for India used to be 100% in English. But with globalization and localization on every astute publisher's mind, releasing co-editions of core products and ancillaries is becoming commonplace. At the same time, English-language publishers aren't the only ones seeking out Indian vendors: Asian, Middle Eastern, Eastern European and Scandinavian publishers are following in their footsteps. As a result, languages with double-byte character sets (Chinese, Japanese), elaborate hyphenation and diacritics (Thai, Croatian), and right-to-left or bidirectional text (Arabic, Hebrew) have ceased to be as perplexing as they were a decade or so ago. For vendors, anything that is recognizable by computers and can be tagged in XML is fair game. Language-specific rendering scripts have made it possible.

With non-English projects on the rise, it's hardly surprising that Delhi-based Thomson Digital went offshore in its hunt for foreign-language specialists. It became the first India-based vendor to have an offshore production facility when its Mauritius operation started in 2004. “The island is just seven hours by air from Delhi and 70% of its population is of Indian origin, giving us both geographical and cultural proximity,” explained COO Vinay Singh. “Best yet, its multiethnic people speak French. We couldn't have a better location that provides a cost-effective solution to meet the growing number of French-language projects.”

Presently, the facility offers A-to-Z services, including multiformat conversion, translation, content development, design and project management. “Our Mauritius team usually receives inputs in digital format. But for one recent project, where we had to work from scratch, we were given printed documents, handwritten chapters and files of varying formats. Our personnel had to key in mathematical equations, insert corrections, decide on word breaks and hyphenation, add accented characters and decipher the handwritten pages—the whole works, essentially!” The project was such a success—delivered on time and to the client's satisfaction—that Thomson Digital was duly elevated to preferred vendor status. Two and a half years since its entry into the French-language business, the company has established a solid reputation in this segment and now boasts four key publishing customers.

Encouraged by this success, Thomson Digital has extended its reach to the Spanish, German and Dutch markets. Said Singh, “We offer the same skill set and services for all languages, and we have in-house tools and technology in place ready to be deployed seamlessly in all of our five production facilities.” Japan may be the next destination: Singh is exploring a joint venture with a Japanese industry leader to offer content services in Japanese, Chinese and Korean.

Over at Planman Technologies, also based in Delhi, non-English newspaper conversion/digitization has long been a niche segment. V-p for international sales Amit Vohra recalled, “One of our earliest projects involved producing 50,000 pages of Arabic text, and we hired 30 linguists to ensure that both language and content were accurate. More recently, one 200,000-page Danish news agency project arrived at our door with a six-month deadline. We extracted the text using OCR technology and integrated Danish-language dictionary and hyphenation support into the QC workflow. The content was then converted into XML.”

Planman has also been digitizing Norwegian newspapers (the daily Aftenposten as well as Adresseavisen and Trondades) for the National Library of Norway. “We usually receive microfiches, which are scanned at 300 dpi grayscale. We process the scanned pages accordingly and deliver the final output in several formats: issue-level PDF, page-level PDF, JPEG2000 compressed image and XML,” explained director Sourav Chatterjee.

Every month, Planman processes about five non-English-language projects (two newspaper digitizations and three general content conversions), which make up roughly 40% of its output. Besides the languages mentioned above, it has also tackled French, German, Welsh, Swedish and Hebrew projects. “Text extraction and quality assurance are the biggest challenges in non-English projects, especially when special characters and diacritics are involved,” added Vohra, whose team has just completed two Spanish-language el-hi projects comprising a total of 2,500 pages that required the full treatment: composition, formatting and proofreading.

Multilingual processing is a fast-growing segment within the industry. Other vendors such as DCS BPO, Scope eKnowledge and Lapiz Digital are also busy exploring opportunities outside the English-speaking territories and reaping first-mover advantages.

This is the third column in a regular series highlighting companies and services in the $4 billion business process outsourcing industry.