
Underware Latin Plus is a character set developed by Underware that supports over 200 languages written in the Latin script. It contains 446 characters in total and was created to give fonts decent language support. Every character is mapped to the languages that use it, because characters and languages go hand in hand. A detailed overview of the Latin Plus character set and its languages is presented here.

There are plenty of character sets already, but we had to make our own. A character set is a defined list of characters that a computer can recognise because every character is represented by a number. ASCII, more than 50 years old, is one of the most famous character sets. It was developed as a 7-bit code, which allows for exactly 128 characters. Fortunately, nowadays there is also Unicode. Its first 128 characters are identical to ASCII, and it adds well over 100,000 more. Unicode defines a codespace of 1,114,112 code points (U+0000 to U+10FFFF), so there is room for more than a million characters; its most common encoding, UTF-8, stores each code point in 1 to 4 bytes.
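To make "every character is represented by a number" concrete, here is a small illustrative sketch using only Python's built-ins (`ord`, `chr`, `str.encode`). The helper name `describe` is hypothetical; it simply shows a character's Unicode code point and how many bytes UTF-8 needs for it.

```python
def describe(ch: str) -> tuple[str, int, int]:
    """Return the U+ notation, the code point and the UTF-8 byte length of `ch`."""
    cp = ord(ch)  # the number the computer actually stores for this character
    return f"U+{cp:04X}", cp, len(ch.encode("utf-8"))

# ASCII letter, Latin-1 letter, schwa, euro sign, and a character outside the BMP
for ch in ["A", "é", "ə", "€", "\U0001D504"]:
    print(ch, *describe(ch))

# The first 128 code points encode to the same single bytes in ASCII and UTF-8 ...
print(all(chr(cp).encode("ascii") == chr(cp).encode("utf-8") for cp in range(128)))  # True
# ... and the codespace runs from U+0000 to U+10FFFF.
print(0x10FFFF + 1)  # 1114112
```

Plain A-Z letters take one byte, while characters such as é or ə take two; this is one practical reason why a font's language support is a separate question from "does it have A-Z".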

So why another character set? Unicode aims to give a unique number to every character used in any language, so you can imagine that many characters with a Unicode value are hardly ever used. Unfortunately, Unicode does not provide reliable information about which language requires which character. Therefore we created a hand-picked list: the smallest number of characters possible that supports (relatively) the largest number of languages possible. That is Underware Latin Plus. These manually selected characters are mapped to the languages which require them, which opens up new possibilities (like the validator, for example).
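The mapping between characters and languages can be pictured as a simple lookup table. The sketch below is an assumption about how such data could be structured, not Underware's actual format: the language names and their diacritic sets are hypothetical toy data, and `languages_per_character` is an illustrative helper that inverts "language → characters" into "character → languages".

```python
from collections import defaultdict

# Hypothetical toy data: language -> required characters beyond A-Z.
# The real Latin Plus list is far larger; these entries are illustrative only.
REQUIRED = {
    "lang_a": {"ä", "ö", "ü", "ß"},
    "lang_b": {"ä", "ö", "å"},
    "lang_c": {"ə", "ç", "ö", "ü"},
}

def languages_per_character(required: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert language -> characters into character -> sorted list of languages."""
    index: dict[str, list[str]] = defaultdict(list)
    for lang, chars in required.items():
        for ch in chars:
            index[ch].append(lang)
    return {ch: sorted(langs) for ch, langs in index.items()}

index = languages_per_character(REQUIRED)
print(index["ö"])  # ['lang_a', 'lang_b', 'lang_c']
print(index["ß"])  # ['lang_a']
```

Keeping both directions of the mapping makes it cheap to answer "which languages need this character?" as well as "which characters does this language need?".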

This overview of languages and their required diacritics was originally created by Underware to offer thorough language support in their fonts. However, this info can also be useful while designing typefaces (think of specific OpenType features), or for other purposes.

Orthographic information for each language has mostly been collected from at least four different (online and offline) sources. When orthographic sources contradicted each other and we couldn't convincingly identify the superseded version, we included all diacritics mentioned in the various sources. So some superfluous diacritics may be listed. Better safe than sorry. In some rare cases only one orthographic source was available, so we had to rely on that.

Some sources cover only one or a couple of languages. Those are not listed here, but may be listed with the specific language. Other orthographic sources cover a wider range of languages, for example: Unicode Consortium, Decode Unicode, Evertype, Omniglot, Eesti Keele Instituut, Wikipedia, Geonames, Language Museum, Context of Diacritics.

Note that this is a work in progress (since 2008) and subject to future changes. This data is presented as is, without any warranty. For questions, additions and improvements please contact Underware.

The languages supported by (all) Underware fonts are shown in a heatmap representing the number of speakers. A mouseover reveals related languages, based not on the language family tree but on diacritics usage. Currently 220 languages are listed. Note: this is not an all-embracing universal language database. Dozens of researched languages have been excluded for various reasons: for example, some languages require characters which don't have a Unicode code point, or exotic characters for which no design standards exist. Read more about these decisions in the case study Notes on Underware Latin Plus.

Each language shows its number of native speakers. Where possible, only first-language speakers have been counted, not second-language speakers. When sources mention different numbers of speakers, the smallest number has been used. Note that these numbers might not be entirely correct, as nobody knows the exact number of native speakers of any given language.

You'll also see the required diacritics for that language, the most important piece of information in this section. The word diacritic is a bit misleading here: for the sake of convenience, any character beyond the basic A-Z is called a diacritic, not just characters marked for accent or tone (so schwa counts too, for example). Note that the goal of this database is to make it possible to create fonts with specific language support. Because the database aims to minimize the risk of a document requiring diacritics which are missing from the font, a language may list diacritics that are superfluous according to some official orthographic standards.

If a corpus is available for a language, its diacritics are shown in order of frequency; otherwise they are listed alphabetically. {Corpus source: Invoke IT} A small world map shows the countries in which the language is an official language. {Source for all maps: Ethnologue} Additionally there is a short text about the language, with some links to more extensive information. {Texts are mostly extracts from Wikipedia and/or Omniglot} If information about an endangered language is available, its "health" is listed. {Data by Unesco}
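The ordering rule (by frequency when a corpus exists, alphabetically otherwise) and the broad definition of "diacritic" can be sketched as follows. This is an illustrative sketch, not Underware's code: the helper names `is_diacritic` and `order_diacritics` are hypothetical, counting is assumed to be case-insensitive, and "alphabetical" is only approximated by sorting on the Unicode NFD decomposition, so that a base letter comes first. Real alphabetical order is language-specific and would need locale-aware collation.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Basic Latin letters; anything alphabetic outside this set counts as a
# "diacritic" in the broad sense used on this page (so schwa counts too).
BASIC = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")

def is_diacritic(ch: str) -> bool:
    return ch.isalpha() and ch not in BASIC

def order_diacritics(required: set[str], corpus: Optional[str] = None) -> list[str]:
    """Order a language's diacritics by corpus frequency, else roughly alphabetically."""
    if corpus:
        # Count case-insensitively; only characters in `required` matter.
        counts = Counter(ch for ch in corpus.lower() if ch in required)
        # Most frequent first; ties broken by code point so the output is stable.
        return sorted(required, key=lambda ch: (-counts[ch], ch))
    # No corpus: rough alphabetical order. NFD puts the base letter first,
    # so "ä" (a + U+0308) sorts next to other a-forms.
    return sorted(required, key=lambda ch: (unicodedata.normalize("NFD", ch), ch))

print(order_diacritics({"ä", "ö", "ü"}, "Üü ö ä ü"))   # ['ü', 'ä', 'ö']
print(order_diacritics({"ö", "ä", "ß", "ə"}))           # ['ä', 'ö', 'ß', 'ə']
print(sorted({ch for ch in "Straße über" if is_diacritic(ch)}))  # ['ß', 'ü']
```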

If diacritics are required for a language, related languages are listed as “brothers”. These relationships are not based on their language family tree, but on the use of their diacritics. Just another way to look at relationships between languages.
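The page does not say how "brothers" are computed. One plausible way to relate languages by their diacritics is the Jaccard similarity of their diacritic sets (shared diacritics divided by all diacritics of both languages). The sketch below uses that measure as an explicit assumption; the function names and toy data are hypothetical.

```python
# Hypothetical toy data: language -> required diacritics.
REQUIRED = {
    "lang_a": {"ä", "ö", "ü", "ß"},
    "lang_b": {"ä", "ö", "å"},
    "lang_c": {"ə", "ç", "ö", "ü"},
    "lang_d": {"ñ", "á", "é"},
}

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared diacritics divided by all diacritics of the two languages."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def brothers(lang: str, required: dict[str, set[str]], top: int = 3) -> list[str]:
    """Languages whose diacritic sets overlap most with `lang` (assumed metric)."""
    base = required[lang]
    if not base:  # brothers are only listed when diacritics are required
        return []
    scored = [
        (jaccard(base, chars), other)
        for other, chars in required.items()
        if other != lang and base & chars  # skip languages with no overlap
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scored[:top]]

print(brothers("lang_a", REQUIRED))  # ['lang_b', 'lang_c']
```

Jaccard penalises languages with many extra diacritics, so a small shared core counts for less in a large set; a plain count of shared diacritics would be a simpler alternative.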

If available, a translated sample text will be shown in the respective language. These sample texts are all translated by people. The text is always identical, allowing for comparisons of sound, look and structure of languages:

Don’t be a cuckoo if you’re a nightingale.
Don’t be a nightingale or a flycatcher, if you’re a dog.
But anyone can make sound.
We are Underware.

The characters included in all Underware fonts are shown in an overview, with the diacritics in a heatmap representing the number of (native) speakers. A mouseover highlights related diacritics: diacritics which are essential companions of the selected diacritic.

Each character can be viewed at a large size in Sauna Mono-Bold (this whole section is set in Sauna Mono), followed by its name, the number of native speakers using this diacritic, the number of languages using it, its official Unicode name and its Unicode code point.
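The per-character figures (number of native speakers, number of languages, Unicode name and code point) can be derived from a language table. The sketch below assumes that speakers of a diacritic are obtained by summing first-language speaker counts of the languages that use it. The data is hypothetical toy data with invented round numbers, not real figures; `unicodedata.name` is the standard-library lookup for official Unicode character names.

```python
import unicodedata
from collections import defaultdict

# Hypothetical toy data, NOT real speaker figures:
# language -> (native speakers, required diacritics)
LANGUAGES = {
    "lang_a": (1_000_000, {"ä", "ö", "ü"}),
    "lang_b": (250_000, {"ä", "å"}),
    "lang_c": (40_000, {"ə", "ö"}),
}

def diacritic_stats(languages: dict[str, tuple[int, set[str]]]) -> dict[str, tuple[int, int]]:
    """Map each diacritic to (native speakers, number of languages) using it."""
    speakers: dict[str, int] = defaultdict(int)
    n_langs: dict[str, int] = defaultdict(int)
    for count, diacritics in languages.values():
        for ch in diacritics:
            # Summing assumes first-language speaker counts don't overlap.
            speakers[ch] += count
            n_langs[ch] += 1
    return {ch: (speakers[ch], n_langs[ch]) for ch in speakers}

stats = diacritic_stats(LANGUAGES)
# Largest speaker count first, as a heatmap legend might order them.
for ch, (spk, n) in sorted(stats.items(), key=lambda kv: (-kv[1][0], kv[0])):
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch), spk, n)
```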

Next to that you’ll see a world map for a specific diacritic. Oh yes, we also think that’s cool. {Source for all maps: Ethnologue}

This is followed by a list of the languages which use this diacritic. If a corpus is available for at least one of these languages, "popularity" shows how often the diacritic is used in each of them, while "friends" lists the diacritics most often used on the left and/or right side of the selected diacritic.
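"Friends" boils down to counting neighbours in running text. A minimal sketch, assuming a plain-text corpus and case-insensitive counting: the hypothetical `friends` helper counts letters directly left or right of the target, and the optional `candidates` set can restrict the result to diacritics only.

```python
from collections import Counter
from typing import Optional

def friends(target: str, corpus: str,
            candidates: Optional[set[str]] = None, top: int = 5) -> list[tuple[str, int]]:
    """Count letters found directly left or right of `target` in `corpus`."""
    text = corpus.lower()
    counts: Counter = Counter()
    for i, ch in enumerate(text):
        if ch != target:
            continue
        for j in (i - 1, i + 1):  # left and right neighbour
            if 0 <= j < len(text):
                nb = text[j]
                # Optionally keep only neighbours from a given set, e.g. diacritics.
                if nb.isalpha() and (candidates is None or nb in candidates):
                    counts[nb] += 1
    return counts.most_common(top)

print(friends("ä", "Ääb cä"))                         # [('ä', 2), ('b', 1), ('c', 1)]
print(friends("ä", "Ääb cä", candidates={"ä", "ö"}))  # [('ä', 2)]
```

A doubled character such as "ää" is counted from both sides, so each occurrence contributes one neighbour to the tally.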

Upload your font and see which of the Underware Latin Plus languages it supports. Be aware that we have high standards (such as requiring IJacute for full Dutch support), which are not common everywhere.
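At its core, a validator like this compares the characters a font maps with the characters each language needs. The sketch below is an assumption about that basic idea, not Underware's validator: it takes a set of code points (with fontTools, `set(TTFont("MyFont.otf").getBestCmap())` would give the code points a font maps; the file name is hypothetical) and reports missing characters per language, requiring uppercase forms as well. The data and helper names are hypothetical. Note that a requirement such as IJacute involves more than a single code point, so a plain cmap check like this would not capture it.

```python
# Hypothetical toy data: language -> required lowercase characters beyond A-Z.
REQUIRED = {
    "lang_a": {"ä", "ö", "ü", "ß"},
    "lang_b": {"ä", "ö", "å"},
}

def needed_codepoints(chars: set[str]) -> set[int]:
    """Code points for each character plus its single-character uppercase form."""
    cps = set()
    for ch in chars:
        cps.add(ord(ch))
        up = ch.upper()
        if len(up) == 1:  # "ß".upper() is "SS", which A-Z already covers
            cps.add(ord(up))
    return cps

def validate(font_codepoints: set[int], required: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return language -> sorted list of missing characters (empty = supported)."""
    report = {}
    for lang, chars in required.items():
        missing = needed_codepoints(chars) - font_codepoints
        report[lang] = sorted(chr(cp) for cp in missing)
    return report

# A pretend font covering printable ASCII plus a few Latin-1 letters.
font = set(range(0x20, 0x7F)) | {ord(c) for c in "äöüÄÖÜß"}
print(validate(font, REQUIRED))  # {'lang_a': [], 'lang_b': ['Å', 'å']}
```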