DO NOT MERGE [why] A lot of the fonts have incorrect naming after patching. A completely different approach can help to come up with a consistent naming scheme. [how] See bin/scripts/name-parser/README.md Signed-off-by: Fini Jastrow <ulf.fini.jastrow@desy.de>
Creating Consistently Grouped Patched Fonts
This is a small sub-project to font-patcher that uses a little bit more knowledge to come up with font names and name parts. In applications multiple fonts are grouped under a 'Family'. Each member of the Family has a different 'SubFamily' or 'Style'.
Consider a font named 'Times' that has two variants: normal and bold. For this font the Family would be 'Times' and the 'Style' would be 'Regular' (i.e. normal) in one file and 'Bold' in the other file.
With this information applications are able to group all 'Times' together and additionally choose the 'Bold' font if the user pushes the 'B' button on the font style dialog in that application.
Motivation
Quite a number of patched fonts have inconsistent or simply wrong font grouping. The naming in general is sometimes surprising and does not follow naming conventions. This is in part due to the font-patcher, but in part the source fonts are already strange. The results are fonts that are installed but invisible in some applications, inconsistent naming (Familyname differs from Fullname), and bold/italic selectors that do not work correctly in some applications.
We would also like to have the information within the names sorted in a consistent way. Usually a font name consists of these parts (in this order):
- Name base (e.g. Noto)
- Variant (e.g. Sans)
- Subvariant (e.g. Display)
- Weight (e.g. Black)
- Style (e.g. Italic)
This is important because we want to add subvariant information, namely the Nerd Font part.
Example:
- (old) Iosevka Term Light Italic Nerd Font
- (new) Iosevka Term Nerd Font Light Italic
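The ordering above can be sketched in a few lines of Python. This is only an illustration of the intended word order, not the parser's actual code: the function name `assemble_name` and its parameters are hypothetical, and it assumes the name parts have already been categorized.

```python
# Hypothetical helper, for illustration only; the real parser is more involved.
def assemble_name(base, variant="", subvariant="", weight="", style="",
                  marker="Nerd Font"):
    """Join name parts in the order: base, variant, subvariant,
    the added marker (treated as one more subvariant), weight, style."""
    parts = [base, variant, subvariant, marker, weight, style]
    # Empty parts are simply skipped.
    return " ".join(p for p in parts if p)


print(assemble_name("Iosevka", variant="Term", weight="Light", style="Italic"))
```

With the parts from the example, this prints the (new) form, Iosevka Term Nerd Font Light Italic.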
The Plan
To solve these issues the font name parts have to be analyzed more thoroughly and then categorized. These categories are then used to assemble the names in the correct order. The simple (not typographically aware) applications shall always get groups of at most four styles, and these are Regular, Bold, Italic, and Bold-Italic. Other styles turn up as Families, because this is the only way they would work in these simpler applications.
Typographically aware applications, on the other hand, get all styles grouped under one Family name.
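As a rough sketch of this split, the following Python function derives both sets of names from an already categorized weight and style. The function `group_names` is hypothetical and handles only the plain cases (it knows nothing about Oblique, widths, or abbreviated style names). It writes the fourth style as "Bold Italic", which is an assumption about the stored string.

```python
# Hypothetical sketch of the grouping idea; not the parser's actual code.
def group_names(base, weight="", style=""):
    """Return (family, subfamily, typo_family, typo_subfamily).

    Simple applications only see family/subfamily, where the subfamily is
    limited to Regular, Bold, Italic, Bold Italic. Any other weight is
    moved into the family name. The typographic names are left empty when
    they would equal the ordinary names.
    """
    is_bold = weight == "Bold"
    is_italic = style == "Italic"
    # Weights other than Regular/Bold cannot live in the 4-style subfamily.
    other_weight = weight if weight not in ("", "Regular", "Bold") else ""
    family = " ".join(p for p in (base, other_weight) if p)
    subfamily = " ".join(
        name for name, on in (("Bold", is_bold), ("Italic", is_italic)) if on
    ) or "Regular"
    typo_weight = weight if weight != "Regular" else ""
    typo_subfamily = " ".join(p for p in (typo_weight, style) if p) or "Regular"
    if family == base and subfamily == typo_subfamily:
        return family, subfamily, "", ""
    return family, subfamily, base, typo_subfamily


print(group_names("Iosevka Term Nerd Font", "Light", "Italic"))
print(group_names("Times Nerd Font", "Bold"))
```

In the first call the Light weight ends up in the Family for simple applications, while the typographic names keep everything under one Family. In the second call the typographic names stay empty because they would add nothing.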
First experiments showed that the full information can usually be restored already from the file names that our source fonts have.
This new naming is completely optional (but recommended). Give the option --parser to font-patcher and it will try to come up with reasonable grouping and naming. Leave the option out and it will work as it always did.
The Tests
In this directory there are two tests.
- The first test checks the basics of the algorithm. It takes the filenames of all fonts in src/unpatched-fonts/, then it calculates the naming and compares it to the original naming in the font files. Ideally they would be equal.
- The second test does a 'production run'. It patches each font in src/unpatched-fonts/ two times: once without --parser and once with. Then it compares the naming, and it also shows the original font naming (for comparison).
All tests are based on these assumptions
- Fullname must be roughly equal
- Fontname must be roughly equal
- Familyname must be roughly equal, order of all words does not matter (order of words is ignored with test 2 only)
- SubFamilyname must be equal, order of words does not matter (first word must be equal, order of other words is ignored with test 2 only)
- Typographic names can be empty if the correct typographic name would be equal to the ordinary name
- Tests are done case insensitive
- Some special exemptions are made (see lenient_cmp() in test scripts)
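To make the 'order of words does not matter' rules concrete, here is a minimal sketch of such a comparison. It is not the lenient_cmp() from the test scripts and does not reproduce its special exemptions. The name `same_words` and the `fixed_first` flag are made up for this example.

```python
# Illustrative only; the real comparison lives in lenient_cmp() in the test scripts.
def same_words(a, b, fixed_first=False):
    """Compare two names case-insensitively, ignoring word order.

    With fixed_first=True the first word must match in place and only the
    order of the remaining words is ignored (the SubFamilyname rule).
    """
    words_a, words_b = a.lower().split(), b.lower().split()
    if fixed_first:
        if words_a[:1] != words_b[:1]:
            return False
        words_a, words_b = words_a[1:], words_b[1:]
    return sorted(words_a) == sorted(words_b)


print(same_words("Iosevka Term Light Nerd Font", "iosevka term nerd font light"))
print(same_words("Bold Italic", "Italic Bold", fixed_first=True))
```

The first call reports the names as equal, the second does not, because the first word differs.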
Test 1
fontforge name_parser_test1 ../../../src/unpatched-fonts/**/*.[ot]tf 2>/dev/null
This test takes the filename of a font, parses it and generates names from it. Then the actual font is opened and the generated names are compared with the stored names. This test is used to test the algorithm itself. Of course no SIL table is active as we want to preserve the original names.
The output shows all the names, always two lines: first the generated names, then the readout names. If there are differences, the generated names are tagged with + and the readout ones with -, and the name part that actually differs is marked with an X.
The differences have reasons, and there is a file with textual explanations for them. So far all differences are 'ok'. A new run of the script will compare all differences with the stored ones and alert the user if a new difference is detected (or a difference vanished). In this way changes of the algorithm can be tested with a wide base of inputs.
Test 2
fontforge name_parser_test2 ../../../src/unpatched-fonts/**/*.[ot]tf 2>/dev/null
This test compares actually patched fonts. Every font in src/unpatched-fonts/ is patched two times: first with the 'old/classic' font-patcher naming, and second with the new naming algorithm in action (by specifying --parser). Again the name parts are compared with some lenience and an output is generated like test 1 does.
Again a file with known differences (with explanations) is read, and any new or vanished differences are reported. In the report an additional line is given, tagged with >, that contains the names of the original font, for human interpretation (often the reason for a difference is obvious, because the classic font-patcher dropped information).
Note: Fonts NotoColorEmoji and Lilex-VF are not patchable, and thus ignored.
Note: Fonts iosevka-heavyoblique, iosevka-term-heavyoblique, iosevka-mediumoblique crash my machine and are ignored.
Differences
The naming of the patched fonts, if --parser is applied, will be different. Of course, that is the goal.
What are the differences in particular:
- Nerd Font is not added at the end, but after the extended base name before the style
- The SubFamily contains only 4 Styles max: Regular, Bold, Italic, Bold-Italic
- The Noto fonts retain their abbreviated style names in the Family information
- Nerd Font Mono fonts get an M in Windows mode (I believe that has been left out accidentally before)
Apart from these general things, all changes are documented in detail in the name_parser_test2 issues file.
Here is an overview of all the things that get renamed and why:
Occurrences | Description |
---|---|
511 | Add weight/style to family |
43 | The font's name is M+ not Mplus |
36 | Drop unneeded Typogr.Family/Typogr.Style |
26 | 'Term' is missing from Family |
22 | Change regular-equivalent name to Regular |
19 | Put Oblique into own SubFamily (and mark it as italic) |
5 | Drop Regular from Style |
4 | We handle (TTF) as sub-name |
4 | Fullname has been missing 'Nerd Font' |
4 | Bold / Bold-Italic are just styles of Regular |
2 | Original font broken (Light in Family) |
2 | Classify Medium as own weight and not Bold |
2 | Bold and Italic are styles of a basefont |
1 | Weight Condensed does not belong to base name |
1 | Use only Regular/Bold/Italic in SubFamily |
1 | Handle Retina as Weight and not Style |
1 | Do not call Semibold Light-Bold |
From the count we see that almost all fonts are affected by incorrect Family naming.
Further steps
One can examine all the (current) naming differences in the name_parser_test2.known_issues file. The explanation is followed by three lines of names: source-file, patched-with-parser, and patched-classic.
The explanation sorts most differences into common groups. This helps to weed out explanations that might not need much attention.
Helper scripts
There are some helper scripts that help to examine the font files. Of course there are other, more professional tools to dump font information, but here we get all we need in a concise way:
- query_names font_name [font_name ...]
- query_panose font_name
- query_sfnt [<sfnt-name>] font_name
- query_version font_name
They can be invoked like this: $ fontforge query_sfnt foo.ttf
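For readers who want to look at the same information from their own script, the sketch below filters name records of the shape that FontForge's Python interface exposes as `font.sfnt_names`, a tuple of (language, name id, value) triples. The function `select_names` is hypothetical and is not what the helper scripts do internally, and the sample records are invented for the example.

```python
# Hypothetical helper; inside FontForge the records could be obtained with
#   import fontforge; records = fontforge.open("foo.ttf").sfnt_names
def select_names(records, wanted=None, language="English (US)"):
    """Return {name_id: value} for one language, optionally for one name id."""
    return {
        name_id: value
        for lang, name_id, value in records
        if lang == language and (wanted is None or name_id == wanted)
    }


# Invented sample records, only to show the shape of the data.
sample = (
    ("English (US)", "Family", "Foo Nerd Font"),
    ("English (US)", "SubFamily", "Bold"),
    ("English (US)", "Fullname", "Foo Nerd Font Bold"),
)
print(select_names(sample))
print(select_names(sample, wanted="Family"))
```

Keeping the filter separate from the font loading means it can be tried without FontForge installed.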
Appendix: The name_parser_test*.known_issues files
All differences between 'old' and 'new' naming (if not of the very general kind, like resorting of the words) are documented in the known_issues files. For each difference a reason is given.
The files consist of entries that span 3 (for test 1) or 4 (for test 2) lines.
Line starts with | Contents |
---|---|
# | Reason for the difference (or AUTOGENERATED) |
> | Naming of the original/source font (only test 2) |
+ | Naming with --parser (new naming) |
- | Naming classically generated by font-patcher |
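A file with this layout could be read with a few lines of Python. This is a sketch under assumptions: the tag is taken to be the first character of each line, every entry is taken to start with its # line, and the function name `parse_entries` and the dictionary keys are invented here. The test scripts may read the files differently.

```python
# Illustrative reader for the entry layout described in the table above.
TAGS = {">": "source", "+": "parser", "-": "classic"}


def parse_entries(text):
    """Split known_issues text into a list of dicts, one per entry."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tag, rest = line[0], line[1:].strip()
        if tag == "#":
            entries.append({"reason": rest})  # a new entry starts here
        elif tag in TAGS and entries:
            entries[-1][TAGS[tag]] = rest
    return entries


# Placeholder content, not taken from a real known_issues file.
sample = """\
# Drop Regular from Style
> <names of the source font>
+ <names with --parser>
- <names from classic font-patcher>
"""
print(parse_entries(sample))
```

Test 1 entries have no > line, so their dictionaries simply lack the "source" key.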
After any test run a known_issues.new file is generated. It contains all the issues from the known_issues file that were detected. Original issues that no longer exist are at the bottom of the new file, clearly marked as such. If new (previously unexplained) issues were detected they show up with the AUTOGENERATED reason.
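The bookkeeping behind the .new file can be sketched as follows. The function `build_new_file` and its data shapes are assumptions made for this example: known issues are a mapping from some issue key to its reason, and a test run yields the keys it detected.

```python
# Illustrative bookkeeping; not the code used by the test scripts.
def build_new_file(known, detected):
    """Return (current, obsolete) lists of (reason, key) pairs.

    known:    dict mapping an issue key to its documented reason
    detected: iterable of issue keys found in this test run
    """
    detected = list(detected)
    detected_set = set(detected)
    # Detected issues keep their reason; unexplained ones are flagged.
    current = [(known.get(key, "AUTOGENERATED"), key) for key in detected]
    # Known issues that did not show up again go to the bottom.
    obsolete = [(reason, key) for key, reason in known.items()
                if key not in detected_set]
    return current, obsolete


known = {"issue-a": "Drop Regular from Style", "issue-b": "Add weight/style to family"}
current, obsolete = build_new_file(known, ["issue-a", "issue-c"])
print(current)
print(obsolete)
```

Here issue-c is new and gets the AUTOGENERATED reason, while issue-b has vanished and is reported as obsolete.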
After adding new fonts or replacing font files the test can be rerun. If there are issues in the .new file they should be documented there, and the .new file should replace the original known_issues file (after removing possible 'obsolete' issues that are listed at the bottom of the new file).
In this way one can tweak the parser code and compare very easily what a change means for all the fonts, which will break or be repaired.