Published: 20/12/2024
I have long struggled with missing diacritics in web fonts when writing transliterated Sanskrit according to the International Alphabet of Sanskrit Transliteration (IAST) scheme. Even certain system fonts on Windows seem to omit necessary diacritics, which means accepting that Sanskrit words will be rendered at best jankily and at worst totally inaccurately on some computers.
Alternatively, you can use an ASCII transliteration scheme like the Harvard-Kyoto system or ITRANS, but I find these to be rather inelegant and often unintuitive. I also dislike that they force you to forgo capitalisation of proper nouns, because capital letters are used as substitutes for diacritic marks. Your remaining option, unless you are willing to subject your readers to the bandwidth tyranny of an absurdly large font file, is to subset a Unicode font.
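To make the capitalisation problem concrete, here is a rough Python sketch that converts Harvard-Kyoto to IAST. The `hk_to_iast` function is hypothetical, written only for illustration: it covers the Harvard-Kyoto letters that stand for IAST diacritic letters, ignores rarer symbols, and uses a simple greedy longest-match scan rather than any particular library's algorithm. Note that `RAma` comes out as ṛāma: a capital R already means the vowel ṛ, so there is no way to capitalise Rāma.

```python
# Hypothetical Harvard-Kyoto -> IAST table: only letters whose IAST form
# carries a diacritic. Everything else passes through unchanged.
HK_TO_IAST = {
    "lRR": "\u1e39",  # ḹ
    "lR": "\u1e37",   # ḷ
    "RR": "\u1e5d",   # ṝ
    "R": "\u1e5b",    # ṛ
    "A": "\u0101",    # ā
    "I": "\u012b",    # ī
    "U": "\u016b",    # ū
    "M": "\u1e43",    # ṃ
    "H": "\u1e25",    # ḥ
    "G": "\u1e45",    # ṅ
    "J": "\u00f1",    # ñ
    "T": "\u1e6d",    # ṭ
    "D": "\u1e0d",    # ḍ
    "N": "\u1e47",    # ṇ
    "z": "\u015b",    # ś
    "S": "\u1e63",    # ṣ
}

# Try longer keys first so "lRR" wins over "lR", and "RR" over "R".
_KEYS = sorted(HK_TO_IAST, key=len, reverse=True)


def hk_to_iast(text: str) -> str:
    """Greedy longest-match conversion from Harvard-Kyoto to IAST."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(HK_TO_IAST[key])
                i += len(key)
                break
        else:
            # No multi-letter or special key matched: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["kRSNa", "rAma", "RAma", "ziva"]:
        print(word, "->", hk_to_iast(word))
```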
'Subsetting' means removing the (typically) hundreds of characters in the full file that you do not need, such as niche mathematical symbols and other alphabets like Cyrillic and Greek. This often makes the font file dramatically smaller: my subset of the font I use, Source Sans 3, is 20x smaller than the full font file.
Typically, I have enjoyed using google-webfonts-helper for this, but its subsetting options are too limited for me to know for sure whether the full range of diacritics needed for IAST is included. Inevitably, this led me to the command-line tool fonttools, which is not particularly intuitive to use, hence this blog post. Here's how you do it:
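First, install fonttools with pip:
pip install fonttools
or, if you use Homebrew on Mac:
brew install fonttools
Writing WOFF2 files (the --flavor=woff2 option) also requires the Brotli compression library; if you installed with pip, add it with pip install brotli.
Then run the command below. It assumes .ttf static font files, but you can alter it as needed.
pyftsubset /path/to/your/font.ttf \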
--output-file=/path/to/your/font-subset.woff2 \
--unicodes="U+0020-007E,\
U+00D1,U+00F1,\
U+0100,U+0101,U+012A,U+012B,U+015A,U+015B,U+016A,U+016B,\
U+1E0C,U+1E0D,U+1E24,U+1E25,U+1E36,U+1E37,U+1E38,U+1E39,\
U+1E42,U+1E43,U+1E44,U+1E45,U+1E46,U+1E47,\
U+1E5A,U+1E5B,U+1E5C,U+1E5D,\
U+1E62,U+1E63,\
U+1E6C,U+1E6D" \
--layout-features='*' \
--no-hinting \
--desubroutinize \
--flavor=woff2
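Since the whole point is to be sure every IAST diacritic survives, it can help to expand the --unicodes value and read back what it actually contains. The following is a minimal standard-library Python sketch. `expand_unicodes` is a hypothetical helper written for this post: it only handles the `U+XXXX` and `U+XXXX-YYYY` forms used in the command, and it is not the parser pyftsubset itself uses.

```python
import unicodedata

# The --unicodes value from the pyftsubset command, with the line breaks removed.
UNICODES = (
    "U+0020-007E,"
    "U+00D1,U+00F1,"
    "U+0100,U+0101,U+012A,U+012B,U+015A,U+015B,U+016A,U+016B,"
    "U+1E0C,U+1E0D,U+1E24,U+1E25,U+1E36,U+1E37,U+1E38,U+1E39,"
    "U+1E42,U+1E43,U+1E44,U+1E45,U+1E46,U+1E47,"
    "U+1E5A,U+1E5B,U+1E5C,U+1E5D,"
    "U+1E62,U+1E63,"
    "U+1E6C,U+1E6D"
)


def expand_unicodes(spec: str) -> list[int]:
    """Hypothetical helper: expand 'U+XXXX' and 'U+XXXX-YYYY' items into code points."""
    codepoints = []
    for item in spec.split(","):
        item = item.strip().upper().removeprefix("U+")
        if not item:
            continue
        # A single code point has no "-", so the range end defaults to the start.
        start, _, end = item.partition("-")
        end = end.removeprefix("U+") if end else start
        codepoints.extend(range(int(start, 16), int(end, 16) + 1))
    return codepoints


if __name__ == "__main__":
    # Print only the entries beyond printable ASCII, i.e. the IAST letters.
    for cp in expand_unicodes(UNICODES):
        if cp > 0x7E:
            print(f"U+{cp:04X}  {chr(cp)}  {unicodedata.name(chr(cp))}")
```

Running it lists 32 entries, each with its official Unicode name (for example LATIN SMALL LETTER R WITH DOT BELOW AND MACRON for ṝ), which makes it easy to spot a missing or mistyped code point.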
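The Unicode values passed to --unicodes in the pyftsubset command above include every diacritic letter needed for IAST, in both upper and lower case, as well as printable ASCII (U+0020-007E: the basic Latin letters, digits, space, and standard keyboard punctuation), which covers most plain English text. Note that printable ASCII does not include typographic quotation marks or dashes (such as U+2018, U+2019, U+201C, U+201D, U+2013, and U+2014), so add them if your writing uses them. Likewise, if you require any other specialist symbols, you can simply add their Unicode code points or ranges to the command.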
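Before publishing, you may want to know whether a draft uses anything the subset will not cover; curly quotes are a common culprit. Here is a rough standard-library Python sketch. `SUBSET` simply restates the code points from the pyftsubset command, and `uncovered` is a hypothetical helper that reports each missing character in the same `U+XXXX` form that --unicodes accepts, so the result can be pasted straight into the command. It skips control characters such as newlines, which need no glyph.

```python
import unicodedata

# Code points kept by the pyftsubset command: printable ASCII plus the IAST letters.
IAST = [
    0x00D1, 0x00F1,
    0x0100, 0x0101, 0x012A, 0x012B, 0x015A, 0x015B, 0x016A, 0x016B,
    0x1E0C, 0x1E0D, 0x1E24, 0x1E25, 0x1E36, 0x1E37, 0x1E38, 0x1E39,
    0x1E42, 0x1E43, 0x1E44, 0x1E45, 0x1E46, 0x1E47,
    0x1E5A, 0x1E5B, 0x1E5C, 0x1E5D,
    0x1E62, 0x1E63,
    0x1E6C, 0x1E6D,
]
SUBSET = set(range(0x20, 0x7F)) | set(IAST)


def uncovered(text: str) -> str:
    """Hypothetical helper: list characters in `text` missing from SUBSET, as U+XXXX."""
    missing = sorted(
        {
            ord(ch)
            for ch in text
            # Control characters (category "Cc", e.g. "\n") are never drawn.
            if ord(ch) not in SUBSET and unicodedata.category(ch) != "Cc"
        }
    )
    return ",".join(f"U+{cp:04X}" for cp in missing)


if __name__ == "__main__":
    draft = "Kṛṣṇa’s “flute”\n"
    print(uncovered(draft) or "Everything is covered.")
```

For the sample draft this reports the three curly quotation marks and nothing else, since the IAST letters are all in the subset.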