Channel: Typophile - Comments

In reply to Phonetic symbols in Calibri and Cambria:

[Michel Boyer:] On the other hand, using the slashed circle for a linguistics null symbol looks inappropriate. In any case, shouldn't linguists have their own symbol?

Semantically the linguistic null symbol is the same as the mathematical empty set. Both are used to mean ‘nothing’ or something similar. But typographically they are very much distinct. From a Unicode perspective this means that the two are to be identified as the same character, and hence the problem is left to font design. It’s similar to the language-specific issues regarding letters like italic U+0442 ‘Cyrillic Small Letter Te’ in Serbian versus other Cyrillic, or whether a dollar sign should have one vertical bar or two. In these sorts of cases the meaning of the symbol is the same, it’s just the shape that differs. But where OpenType provides contextual indicators of languages, there is no such indication for scientific disciplines. I’m not suggesting that such things are necessary, just that the parallel between the two situations breaks down there.

Linguists do generally prefer TeX’s default $\emptyset$, and mathematicians generally seem to like $\varnothing$. (La)TeX makes this distinction explicitly available, but Unicode does not. I actually side with Unicode on this, though obviously I bemoan the lack of differentiation outside of the TeXosphere.
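For anyone who wants to compare the two shapes side by side, here is a minimal LaTeX sketch. It assumes the amssymb package is installed, since \varnothing comes from there rather than from plain LaTeX; the document class and the sample sentence are only placeholders.

```latex
\documentclass{article}
\usepackage{amssymb} % provides \varnothing; \emptyset is built in

\begin{document}
Default shape: $\emptyset$ \quad AMS shape: $\varnothing$
\end{document}
```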

[charles_e:] As a typesetter, one problem I always encounter is whether to use the proper Unicode characters, U+02BB & U+02BD, or U+2018 & U+2019 (single open & close quotes).

The difference between the modifier letters (U+02BB – U+02BD and U+02EE) versus the open and close quotes (U+2018 & U+2019) is not really in their appearance. Instead, the modifier letters in Unicode belong to the Lm (Modifier_Letter) category and hence they are meant to be processed like alphabetic characters (A–Z, etc.). That means that they are part of the language’s alphabet rather than being accessory symbols like punctuation. In contrast, U+2018 and U+2019 are in the Pi and Pf categories respectively (Initial_Punctuation and Final_Punctuation). They are punctuation characters just like the ampersand, hyphen, question mark, octothorpe, and so forth.
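The category difference is easy to check for yourself. Below is a rough sketch using Python's standard unicodedata module; the helper names `describe` and `words` are made up for this illustration, and the `\w+` tokenizer is only a stand-in for whatever "process it like a letter" means in a real typesetting pipeline.

```python
import re
import unicodedata

# The modifier letters and the quotation marks discussed above.
CHARS = ["\u02BB", "\u02BC", "\u02BD", "\u02EE", "\u2018", "\u2019"]


def describe(ch):
    """Return (code point, General_Category, character name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))


def words(text):
    """Split text into runs of word characters (hypothetical tokenizer)."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    for ch in CHARS:
        print(*describe(ch))
    # The modifier letter stays inside the word; the quotation mark does not.
    print(words("ts\u02BC"), words("ts\u2019"))
```

The four modifier letters come back as Lm and the two quotation marks as Pi and Pf, so any software that keys off the category (word breaking, hyphenation, search) will treat the two groups differently even when a font draws them identically.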

This distinction between modifier letter and punctuation is crucial but largely invisible. The modifier letters should be treated just like any other letter in the language’s alphabet. So if a modifier letter is part of a digraph then it shouldn’t be divided for hyphenation, for the same reason that you wouldn’t divide ch as c-h in English. It’d be equally wrong to divide Tlingit’s tsʼ as ts-ʼ. Although the idea of hyphenating a quotation mark seems strange, there are other contexts where the difference is important. In American-style punctuation practice it’s possible to reorder the punctuation symbols at the end of a quotation, so that commas and periods should be shifted to the left of a quotation mark: ‘... foo’, he said should become ‘... foo,’ he said. For modifier letters this rearrangement is absolutely prohibited: ... tʼoochʼ, yéi yaawaḵaa should never ever become ... tʼooch,ʼ yéi yaawaḵaa because that latter form is nonsense in Tlingit. Modifier letters are simply not punctuation, and should never be confused with punctuation. They are instead inherent parts of a letter or word, just like all the other alphabetic characters. (Your mileage may vary regarding apostrophes in English and French used to mark elision of letters, or English’s possessive suffix -’s.)
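To make the reordering point concrete, here is a small Python sketch of an American-style punctuation pass. The function name `american_style` is invented for this example, and the rule is deliberately naive: it only looks for U+2019 followed by a comma or period, so it cannot tell a closing quote from an elision apostrophe encoded as U+2019. What it does show is that text encoded with U+02BC is never touched.

```python
import re

# U+2019 (closing single quote) directly followed by a comma or period.
_QUOTE_THEN_POINT = re.compile("\u2019([,.])")


def american_style(text):
    """Move a comma or period to the left of a closing single quote (U+2019)."""
    return _QUOTE_THEN_POINT.sub(lambda m: m.group(1) + "\u2019", text)


if __name__ == "__main__":
    english = "\u2018... foo\u2019, he said"
    tlingit = "... t\u02BCooch\u02BC, y\u00E9i yaawa\u1E35aa"
    print(american_style(english))  # the comma moves inside the quote
    print(american_style(tlingit))  # unchanged: U+02BC is a letter, not a quote
```

If the Tlingit text had been keyed with U+2019 instead, the same pass would have silently produced the nonsense form, which is the practical reason to get the encoding right before any automated cleanup runs.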

Now, most linguists don’t make this distinction in Unicode because they don’t know about it. Or rather, they don’t know that they know: they have an implicit understanding of the distinction but don’t realize that Unicode actually puts this distinction into practice. Nearly every linguist I’ve ever met is ignorant of Unicode’s fine structure; they are only concerned with “is there a character in the ‘symbols’ dialog box that looks like what I want?”. So it’s the typesetter’s job to figure out from the submitted typescript that there should be a distinction between one kind of apostrophe symbol used for punctuation and another kind of apostrophe symbol used alphabetically. Often this is obvious, but sometimes you have to ask. The simplest question to ask the author is whether a particular apostrophe is part of a letter or word, or whether it’s just punctuation like in English. You may get back an essay on ejectives or glottal stops or something, or you may get a nastygram saying “it’s there for a reason, don’t change it, dammit”, but in either case you’ll get feedback, which is better than mangling the results and being chewed out later.

