Aditya Mukerjee: I Can Text You A Pile of Poo, But I Can’t Write My Name ～ take

My family’s native language, which I grew up speaking, is far from a niche language. Bengali is the seventh most common native language in the world, sitting ahead of the eighth (Russian) by a wide margin, with as many native speakers as French, German, and Italian combined.

[..] Until 2005, Unicode did not have one of the characters in the Bengali word for “suddenly”. Instead, people who wanted to write this everyday word had to combine three separate, unrelated characters. For English-speaking teenagers, combining characters in unexpected ways, like writing ‘w’ as ‘\/\/’, used to be a way of asserting technical literacy through “l33tspeak” – a shibboleth for nerds that derives its name from the word “elite”. But Bengalis were forced to make similar orthographic contortions just to write a simple email: ত + ্ + ‍ = ‍ৎ (the third character is the invisible “zero width joiner”).
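To make the three-character workaround concrete, here is a minimal Python sketch that lists the code points involved. It assumes the dedicated character the quote alludes to is U+09CE (BENGALI LETTER KHANDA TA); the variable names and the `describe` helper are illustrative and not from the article.

```python
import unicodedata

# Workaround described in the quote: TA + VIRAMA (hasanta) + ZERO WIDTH JOINER.
legacy_sequence = "\u09a4\u09cd\u200d"
# Assumed dedicated character: BENGALI LETTER KHANDA TA.
khanda_ta = "\u09ce"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, text in (("workaround", legacy_sequence), ("dedicated", khanda_ta)):
    print(label, len(text), describe(text))

# The two spellings are different strings, so a plain comparison
# (or a naive text search) does not treat them as the same letter.
print(legacy_sequence == khanda_ta)
```

Printing the names shows why the quote calls the three pieces "unrelated": a consonant, a vowel-suppressing sign, and an invisible formatting character stand in for what a reader sees as one letter.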

Even today, I am forced to do this when writing my own name. My name is not only a common Indian name, but one of the top 1,000 names in the United States as well. But the final letter has still not been given its own Unicode character, so I have to use a substitute.

Worth reading in its entirety.

My prior view of the "Han unification" process was that it was undertaken to make it easier to capture all characters in use, and thus to increase the number of characters that end up in Unicode. But the comparison to a Latin/Greek/Slavic equivalent gives me some pause as to whether it is an effective process, and whether people with deep knowledge would want to participate in it.

(Note that the original article is from 2015; things undoubtedly change over time, but I don't remember hearing about significant changes in the Unicode process recently.)