Developing international software and localization problems

Whenever software is developed, it is influenced by the culture and the language of its developers. The process of extracting the culturally and linguistically dependent parts of a software application is called internationalization. The issues an internationalization process must address include script-specific aspects (character encoding, character sizes, line size and spacing), language-specific aspects (collating sequence, hyphenation rules, morphosyntactic rules), numbers and dates, societal conventions (semantopragmatics), icons and symbols, the use of colour, and the use of controlled language by technical writers.
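To make the "numbers and dates" item concrete, the sketch below formats one number and one date under two sets of conventions. It is a minimal illustration built on two hand-written convention tables (the class `NumberDateConventions` and the `CONVENTIONS` entries are hypothetical). It deliberately avoids Python's `locale` module, because which locales are installed varies between systems; a production application would take this data from a maintained source such as the Unicode CLDR.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class NumberDateConventions:
    decimal_sep: str
    group_sep: str
    date_pattern: str  # str.format pattern using the fields d, m, y


# Hypothetical, hand-written tables; a real system would load locale data.
CONVENTIONS = {
    "en_US": NumberDateConventions(".", ",", "{m:02d}/{d:02d}/{y:04d}"),
    "de_DE": NumberDateConventions(",", ".", "{d:02d}.{m:02d}.{y:04d}"),
}


def format_number(value: float, conv: NumberDateConventions, decimals: int = 2) -> str:
    # Format with Python's neutral conventions first ("," grouping, "." decimal) ...
    neutral = f"{value:,.{decimals}f}"
    # ... then swap both separators in one pass so they cannot collide.
    return neutral.translate(str.maketrans({",": conv.group_sep, ".": conv.decimal_sep}))


def format_date(d: date, conv: NumberDateConventions) -> str:
    # Field order and separators come entirely from the convention table.
    return conv.date_pattern.format(d=d.day, m=d.month, y=d.year)


if __name__ == "__main__":
    value = 1234567.891
    day = date(2024, 3, 7)
    for name, conv in CONVENTIONS.items():
        print(name, format_number(value, conv), format_date(day, conv))
```

Run as a script, it prints `en_US 1,234,567.89 03/07/2024` and `de_DE 1.234.567,89 07.03.2024`: the same value and the same day, written according to each convention. Mapping both separators in a single `str.translate` call avoids the clash that two successive `replace` calls would cause.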

New products should be designed so that they are culture-independent. Such products are sometimes described as "enabled" for localization, meaning that they can easily be adapted for customers in a particular local market. Software internationalization is therefore the framework for software localization: it is the process of designing and developing products with the features, functions and options that facilitate their adaptation to various international markets.

Localization is the opposite process: it takes a previously internationalized software application and adds the features and elements that match the target culture and market. Transparent input and output in the local language, and the translation of menus, messages, help scripts, online tutorials and manuals, are issues that a localization process must handle. Public, commonly agreed guidelines and methods need to be defined to support the internationalization and localization process, and tools need to be specified and developed to assist software developers and localizers.

Localization is a linguistic task because translation is not simply the substitution of one body of text for another. During execution, several pieces of translated text may need to be brought together and composed, and the result has to read naturally to a native speaker. In developing international software we therefore need to be able to indicate the required text in a language-neutral way (frequently with a "message number") and retrieve the corresponding text at run time. The neutral identifier represents the intended meaning; producing the actual message at run time is then a problem of language generation, given the elements of meaning and the rules of composition.
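A message catalog keyed by message number is one common way to realize this. The sketch below uses a hypothetical catalog with a single message number, 1042, translated into English and French. Named placeholders let each translation put the arguments in its own word order, and a per-language plural rule chooses between template variants, a small instance of composing text by rule rather than by plain substitution. The plural rules are simplified to whole numbers; the full rules for each language are defined in the Unicode CLDR.

```python
from typing import Callable, Dict

# Simplified, integer-only plural rules (an assumption; CLDR has the full set).
PLURAL_RULES: Dict[str, Callable[[int], str]] = {
    "en": lambda n: "one" if n == 1 else "other",
    "fr": lambda n: "one" if n in (0, 1) else "other",  # French uses singular for 0
}

# Hypothetical catalog: message number -> language -> plural category -> template.
# Named placeholders let each language order the arguments as it needs.
CATALOG = {
    1042: {
        "en": {"one": "Copied {count} file to {folder}.",
               "other": "Copied {count} files to {folder}."},
        "fr": {"one": "{count} fichier copié dans {folder}.",
               "other": "{count} fichiers copiés dans {folder}."},
    },
}


def render(msg_id: int, lang: str, count: int, **args: str) -> str:
    # Look up the language-neutral message number, pick the plural form
    # required by the target language, then fill in the placeholders.
    forms = CATALOG[msg_id][lang]
    category = PLURAL_RULES[lang](count)
    template = forms.get(category, forms["other"])
    return template.format(count=count, **args)


if __name__ == "__main__":
    print(render(1042, "en", 1, folder="Documents"))
    print(render(1042, "fr", 0, folder="Documents"))
```

For example, `render(1042, "en", 1, folder="Documents")` gives `Copied 1 file to Documents.`, while `render(1042, "fr", 0, folder="Documents")` gives `0 fichier copié dans Documents.`, because French, unlike English, uses the singular for zero.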

It is also a linguistic problem because many software packages, such as word processors and database management systems, capture and manipulate text supplied by their users. Using these packages frequently requires matching text, and what constitutes an acceptable match depends on the language. We also frequently ask for text to be sorted, and sort orders are language- and culture-specific. Such assumptions can be embedded very deeply in software: for example, a hashing algorithm may be designed around the statistical properties of a particular corpus of words or names.
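The sketch below shows how the "right" sort order depends on the language. Its rules are deliberately simplified assumptions, not complete implementations of any standard: Swedish treats å, ä and ö as separate letters after z, while German dictionary order (in the style of DIN 5007-1) sorts ä, ö and ü with their base letters and ß as "ss". A real system would use the Unicode Collation Algorithm with CLDR tailorings, for example through ICU. The `matches` helper relies on Python's language-independent `str.casefold`, which handles German ß but not language-specific cases such as the Turkish dotted and dotless i.

```python
from typing import Tuple

# Simplified Swedish alphabet: å, ä, ö are distinct letters placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_SV_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# Simplified German dictionary order: umlauts fold to the base letter, ß to "ss".
_DE_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word: str) -> Tuple[int, ...]:
    # Rank each letter by its position in the Swedish alphabet;
    # characters outside it sort after the alphabet, by code point.
    return tuple(_SV_RANK.get(ch, len(SWEDISH_ALPHABET) + ord(ch)) for ch in word.lower())


def german_key(word: str) -> str:
    # Anything not in the fold table (e.g. å) keeps its code-point order.
    return word.lower().translate(_DE_FOLD)


def matches(a: str, b: str) -> bool:
    # Caseless match using Unicode default case folding (not language-tailored).
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    words = ["öl", "ärlig", "ål", "zebra", "apa"]
    print("code points:", sorted(words))
    print("Swedish:    ", sorted(words, key=swedish_key))
    print("German:     ", sorted(words, key=german_key))
    print(matches("Straße", "STRASSE"))
```

The three lists come out as `['apa', 'zebra', 'ärlig', 'ål', 'öl']` (raw code points), `['apa', 'zebra', 'ål', 'ärlig', 'öl']` (Swedish) and `['apa', 'ärlig', 'öl', 'zebra', 'ål']` (German): three different orders for the same five words. The simplified German key leaves å unmapped, so it falls back to code-point order.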

Localization is also a software technology problem, because we must be able to organize software so that its linguistic components are isolated and can easily be replaced. This leads us to consider how standard software packages such as window management systems, word processors and database management systems are constructed, and where assumptions about a particular natural language and culture are embedded in them. We will have to propose extensions to current best practice so that these linguistic and cultural assumptions are factored out. This in turn leads to more general approaches to software construction, involving software reuse, componentry, and the programming languages used to describe components and their interconnection. We will consider reusable linguistic components that can be deployed as required in the construction of software packages.
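One way to factor the linguistic assumptions out is to hide them behind an interface that the application depends on, with one replaceable implementation per language. The sketch below is hypothetical throughout: the `LinguisticComponent` protocol, its two methods, the `REGISTRY` and `build_index` are illustrative names, not part of any existing framework, and the sorting rules are simplified (the English component compares case-folded code points; the German one folds umlauts to their base letters). The point is that `build_index`, standing in for application code, never mentions a particular language.

```python
from typing import Dict, List, Protocol


class LinguisticComponent(Protocol):
    """Hypothetical interface isolating language-dependent behaviour."""

    def sort_key(self, word: str) -> str: ...
    def quote(self, text: str) -> str: ...


class EnglishComponent:
    def sort_key(self, word: str) -> str:
        # Simplification: case-insensitive code-point order.
        return word.casefold()

    def quote(self, text: str) -> str:
        return f"\u201c{text}\u201d"  # English quotation marks


class GermanComponent:
    _fold = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})

    def sort_key(self, word: str) -> str:
        # Simplified DIN 5007-1 style: umlauts sort with their base letter
        # (casefold already turns ß into "ss").
        return word.casefold().translate(self._fold)

    def quote(self, text: str) -> str:
        return f"\u201e{text}\u201c"  # German quotation marks


# Replaceable components, selected at run time by language code.
REGISTRY: Dict[str, LinguisticComponent] = {
    "en": EnglishComponent(),
    "de": GermanComponent(),
}


def build_index(terms: List[str], lang: str) -> List[str]:
    # Application code: depends only on the interface, never on a language.
    component = REGISTRY[lang]
    return [component.quote(t) for t in sorted(terms, key=component.sort_key)]


if __name__ == "__main__":
    terms = ["Zebra", "Äpfel", "apple"]
    print(build_index(terms, "en"))
    print(build_index(terms, "de"))
```

With the German component, `build_index(["Zebra", "Äpfel", "apple"], "de")` returns the terms in the order Äpfel, apple, Zebra, each in German quotation marks („…“), whereas the English component yields apple, Zebra, Äpfel in English quotation marks. Supporting another language means registering another component, not editing `build_index`.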


adapted from: Project Glossasoft