
But Unicode had a specific

In reply to Language and writing system:

But Unicode had a specific technical motivation for distinguishing characters and glyphs: it allowed the Consortium to define an encoding principle independent of text shaping and to clearly define areas of responsibility between different levels and kinds of software. There simply isn't any parallel motivation for encouraging diacritics to be treated as letters in orthographies. Indeed, insofar as Unicode has anything to say about this at all, it is explicit that the kinds of operations that rely on the definition of a letter within an orthography -- of which collation is the principal example -- are the responsibility of higher-level protocols, which generally have no difficulty capturing the vagaries of different orthographic sorting conventions.

Unicode Technical Standard #10 (the Unicode Collation Algorithm, originally published as a Technical Report) defines the Default Unicode Collation Element Table (DUCET), which provides a generic method for sorting and comparing strings, but the whole point of DUCET is that it is customisable for individual languages. The default collation is only used when information about the specific language is not available. As soon as one is working with a specific language, the orthographic conventions of that language, including the distinction between letter and diacritic, become available data. Unicode further provides the Locale Data Markup Language (LDML) to capture that data and make it available to software in a consistent, standard interchange format.
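To make the default-versus-tailored distinction concrete, here is a toy sketch in Python. It is not DUCET or the Unicode Collation Algorithm: `default_key` only approximates a multi-level comparison by ignoring diacritics at the first level and using them as a tie-breaker, and `swedish_key` with its `SWEDISH_AFTER_Z` table is a hypothetical, simplified tailoring modeled on the Swedish convention that å, ä and ö are separate letters sorted after z. Both function names are illustrative only.

```python
import unicodedata

def default_key(word):
    """Toy stand-in for a default collation (NOT the real DUCET):
    diacritics are ignored at the first level and only break ties."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Drop combining marks (e.g. U+0308 diaeresis) to get the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, decomposed)

# Hypothetical tailoring: in Swedish, å, ä, ö are letters of their own,
# ordered after z, rather than a/o with diacritics.
SWEDISH_AFTER_Z = {"å": 1, "ä": 2, "ö": 3}

def swedish_key(word):
    """Toy language-specific key: tailored letters get their own weights,
    everything else falls back to the diacritic-stripped base letter."""
    key = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch in SWEDISH_AFTER_Z:
            key.append(ord("z") + SWEDISH_AFTER_Z[ch])
        else:
            base = "".join(c for c in unicodedata.normalize("NFD", ch)
                           if not unicodedata.combining(c))
            key.extend(ord(c) for c in base)
    return tuple(key)

words = ["zebra", "ärlig", "apa", "öga", "ål"]
print(sorted(words, key=default_key))  # diacritics treated as secondary
print(sorted(words, key=swedish_key))  # å, ä, ö sorted as letters after z
```

The same word list sorts as `ål, apa, ärlig, öga, zebra` under the default key but `apa, zebra, ål, ärlig, öga` under the tailored one: the letter/diacritic distinction lives entirely in per-language data, not in the encoding. Production code would use a full UCA implementation with CLDR tailorings (ICU, for example), where a reordering like this is expressed in tailoring-rule syntax along the lines of `&z < å < ä < ö`.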

So I'd say that all of Unicode's efforts in this area are towards enabling language-specific conventions, not to influencing a trend towards treating diacritics as letters.


