Modern Text Features in R

  ragg, graphic-device, systemfonts

  Thomas Lin Pedersen

I’m extremely pleased to present the culmination of several years of work spanning the systemfonts, textshaping, and ragg packages. These releases complete our efforts to create a high-quality, performant raster graphics device that works the same way on every operating system.

This blog post presents our improvements to ragg’s font rendering so that it now “just works” regardless of what you throw at it. This includes:

  1. Support for non-Latin scripts including Right-to-Left (RtL) scripts
  2. Support for OpenType features such as ligatures, glyph substitutions, etc.
  3. Support for color fonts
  4. Support for font fallback

All of the above comes in addition to the fact that ragg is able to use all of your installed fonts.

To access these features all you need to do install the latest version of ragg:

But I’d invite you to read on to learn how it works, how to control it, and what it all means for you as a user.

Using ragg

  • ragg can be used directly in the same way as the built-in devices, such as png(), jpeg(), and tiff(), by opening the device, running some code that renders graphics, and closing it again when done using dev.off(). The devices in ragg are prefixed with agg_ and named by the file format they produce (e.g.  agg_png()).

  • You can use ragg with ggsave() by passing the device function to the device argument (e.g.  ggsave(device = agg_tiff)).

  • You can tell RStudio to use ragg in the Plots pane be setting the backend to AGG under Global Options > General > Graphics.

  • ragg can be used when knitting Rmarkdown files by setting dev="ragg_png" in the code chunk options.

Read more about using ragg in the previous release blog posts: 0.2.0 and 0.1.0

Graphical tl;dr;

With the new version of ragg, you’ll be able to render plots such as this and expect it to simply work:

library(ggplot2)
city_names <- c(
  "Tokyo (東京)",
  "Yokohama (横浜)",
  "Osaka (大阪市)",
  "Nagoya (名古屋市)",
  "Sapporo (札幌市)",
  "Kobe (神戸市)",
  "Kyoto (京都市)",
  "Fukuoka (福岡市)",
  "Kawasaki (川崎市)",
  "Saitama (さいたま市)"
)
main_cities <- data.frame(
  name = city_names,
  lat = c(35.690, 35.444, 34.694, 35.183, 43.067, 
          34.69, 35.012, 33.583, 35.517, 35.861),
  lon = c(139.692, 139.638, 135.502, 136.9, 141.35, 
          135.196, 135.768, 130.4, 139.7, 139.646)
)
japan <- rnaturalearth::ne_countries(
  scale = 10, 
  country = "Japan", 
  returnclass = "sf"
)
ggplot() + 
  geom_sf(
    data = japan, 
    fill = "forestgreen", 
    colour = "grey10", 
    size = 0.2
  ) + 
  ggrepel::geom_label_repel(
    aes(lon, lat, label = name), 
    data = main_cities,
    fill = "#FFFFFF88",
    box.padding = unit(5, "mm")
  ) + 
  geom_point(aes(lon, lat), main_cities) +
  ggtitle(
    "Location of largest cities in Japan (日本) 🇯🇵"
  ) +
  theme_void() + 
  theme(panel.background = element_rect("steelblue"),
        plot.title = element_text(margin = margin(5, 0, 5, 0)))

Note the effortless mix of text in English and Japanese, along with emoji in the title. If this has piqued your interest, read on!

Advanced script support

English, the lingua franca of programming, has tended to dominate everything related to text within programming, ranging from encoding to rendering. This has made the Latin script, used in most of the Western world, the best (or often only) supported script in many text-rendering pipelines. This has been true in the R world where the built-in graphic devices have struggled to display other scripts (with the exception of Cairo devices on Linux). It is about time (overdue, really!) that the graphics system in R becomes more inclusive of which languages can be used. It is thus with great joy that I announce that ragg finally supports all scripts.

Right-to-Left scripts

To start off we will look at a sample of different scripts (Arabic, Hebrew, and Sindhi) that pose a challenge because they are written from right to left:

arabic_text <- "هذا مكتوب باللغة العربية"
hebrew_text <- "זה כתוב בעברית"
sindhi_text <- "هي سنڌيءَ ۾ لکيو ويو آهي"

p <- ggplot() + 
  geom_text(
    aes(x = 0, y = 3:1, label = c(arabic_text, hebrew_text, sindhi_text)), 
    family = "Arial"
  ) + 
  expand_limits(y = c(0, 4))

preview_devices(p, "rtl_example")

If you’re not familiar with the languages above it can be hard to see what is right and what is wrong. You may, however, look at how the text in the code is rendered in the browser and compare that to the device rendering. If you do that, you can see that the Hebrew script is rendered in the wrong direction for all the non-ragg devices (except Cairo on Linux). For the Arabic and Sindhi it’s even harder to see what’s wrong because the text looks fundamentally different. That’s because both Arabic and Sindhi rely extensively on text substitution rules and ligatures; the way a letter is written depends critically on what letters it is next to. Still, by comparing to the browser rendering you can see that the same devices failing on the Hebrew script fail here as well.

The Cairo device on Linux handles this task well, as we have noted above. How come this works, but only on one OS? Cairo is built in to most Linux distributions and is designed to work with Pango, the library that linux uses to layout text. R’s Cairo graphics device bundles Cairo on all platforms, but doesn’t include Pango, due to the challenges of building it on other operating systems.

Bidirectional text

What happens if you combine right-to-left and left-to-right text in the same sentence? The string needs to be split into pieces that each consist of text running in one direction, laid out individually, and then combined back together

bidi_text <- "The Hebrew (עִברִית) script\nis right-to-left"

p <- ggplot() + 
  geom_text(
    aes(x = 0, y = 0, label = bidi_text), 
    family = "Arial"
  )

preview_devices(p, "bidi_example")

Given that most devices struggle with RtL scripts, it’s not surprising that they also fail when mixed. Again the exception is ragg, and Cairo on Linux.

Advanced font feature support

A part of supporting some of the non-Latin scripts described above is to have support for ligatures (substituting multiple glyphs with a single new glyph). While ligatures are a requirement for the correct rendering of some scripts it is also an optional feature of fonts in general in order to support different text variations. More generally, the OpenType font format describes a long range of features, many optional, that defines specific glyph substitutions (both one-to-one and many-to-one) or position adjustments that can be turned on or off and will affect the look of the final rendered text. Some of these features are turned on automatically for specific scripts (e.g. required ligatures for Arabic), while others are left for the user to turn on at their discretion (e.g. tabular numerics). As part of the work to add support for non-Latin scripts the infrastructure to support all OpenType features was built. This, of course, requires that the font in use supports the requested feature.

Some fonts, like the popular Fira Code programming font, use ligatures as a main part of their appeal. These now work as expected with ragg:

code <- "x <- y != z"
logo <- "twitter"
p <- ggplot() + 
  geom_text(
    aes(x = 0, y = 2, label = code), 
    family = "Fira Code"
  ) + 
  geom_text(
    aes(x = 0, y = 1, label = logo), 
    family = "Font Awesome 5 brands"
  ) + 
  expand_limits(y = c(0, 3))

preview_devices(p, "def_features")

But what about non-default features? The capabilities of the graphics engine in R presents a problem here. There is very little information that the user is able to send along with the text to be plotted, apart from location and font (bold and italic on/off is the extent of it). So, having a device with support for advanced OpenType features in and of itself is nearly useless as there is no way to specify in your plot code that you want to turn a feature on or off.

To work around this limitation, systemfonts now allows you to register font variants, providing a custom name that you can use to refer to a font with certain features enabled:

library(systemfonts)
register_variant(
  name = "Montserrat Extreme", 
  family = "Montserrat", 
  weight = "semibold",
  features = font_feature(ligatures = "discretionary", letters = "stylistic")
)

The code above creates a new font based on Montserrat using a semibold weight and turning on standard ligatures and stylistic letter substitution. Now, in your text plotting code all you have to do is specify "Montserrat Extreme" as the font family and the features and weights will be used. This only works with ragg, because none of the other devices are build on top of systemfonts, so don’t know how to access the registered font:

ggplot() + 
  geom_text(
    aes(x = 0, y = 1, label = "This text should definitely differ"),
    family = "Montserrat",
    size = 6
  ) + 
  geom_text(
    aes(x = 0, y = 0, label = "This text should definitely differ"),
    family = "Montserrat Extreme",
    size = 6
  ) + 
  expand_limits(y = c(-1, 2))

We can see that by using this font registration we not only gain access to weights other than normal and bold, but also to glyph substitutions such as the “Th” ligature, and the stylistic variations seen with the “t”, “f”, “l”, and “e” glyphs.

While a lot of the optional OpenType features are mainly of interest to achieve a specific stylistic look of the rendered text, some have more importance for data visualizations, such as those related to how numbers are displayed. It is both possible to force even-width numbers, as well as correct display of fractional numbers (using font_feature(numbers = c("tabular", "fractions")) using OpenType. as long as the font supports it. So this is definitely something to look into when you want to add that final polish to your visualization.

Color fonts

A recent (in font technology terms) development is the availability of color fonts, i.e. fonts where the glyphs have designated colors. This development is largely driven by the ubiquity of emojis in modern text, and while it may seem that emojis have been around forever, it is recent enough that the world has yet to converge to a single standard for color fonts. The system emoji font on macOS, Windows, and Linux all uses different font technologies for storing the color glyphs, ranging from storing a single bitmap, to storing each glyph as an SVG. This, unsurprisingly, complicates things. To add insult to injury, emojis often get rendered slightly larger than the surrounding text and with a slightly lowered baseline in a very OS-specific way (this does not apply to all color fonts; only emojis).

Why am I telling you this? Well, honestly it is mostly to make you appreciate the labor that went into the fact that color fonts (and by extension, emojis) now just work:

emojis <- "👩🏾‍💻🔥📊"

p <- ggplot() + 
  geom_label(
    aes(x = 0, y = 0, label = emojis), 
    family = "Apple Color Emoji"
  )

preview_devices(p, "emoji")

As one can see, the failures range from not being able to render anything, to rendering in monochrome. Further, it appears as if the devices have trouble figuring out the dimensions of the glyphs. One additional wrinkle is that while Cairo on macOS is capable of rendering in monochrome, it fails to get the correct emoji. This is because emojis rely heavily on ligatures, and the “dark-skinned woman at a computer” emoji is actually a ligature of the “woman”, “dark skin”, and “computer” emojis.

Font fallback

In all of the above examples we have been very mindful in setting the font-face to a font that contains all the glyphs we need. This is not always practical, especially when you want to mix emojis and regular text. It is also an absolute requirement when mixing Latin and CJK (Chinese, Japanese, and Korean) text, as it is infeasible to include all CJK glyphs in a single font. However, we are used to things just working at the system level. No matter which font we choose it seems that a glyph is always displayed in e.g. browsers and text editors. This is because the OS is employing font fallback, which is the act of figuring out an alternative font to use when a glyph is not present in the chosen font. Wouldn’t it be great if we could have that in a graphic device? Well, now we do!

fallback_text <- "This is English, この文は日本語です 🚀"

p <- ggplot() + 
  geom_text(aes(x = 0, y = 0, label = fallback_text), size = 2.5)

preview_devices(p, "fallback")

The bottom line is that with ragg, you now don’t need to think about missing glyphs in any font you choose (unless you request a character that is not covered by any font on your system).

Where’s the catch

Most of what we have shown today simply works automagically and may (depending on your prior frustrations with script support in R) seem too good to be true. Is there any catch? Not really. systemfonts, textshaping, and ragg try to be as smart as possible about text shaping and only take additional action if required. Further everything is heavily cached, so the impact on performance is negligible.

There is something missing though, which we haven’t touched upon. Not all scripts are LtR or RtL. A few, especially Asian scripts, are top-to-bottom. Top-to-bottom scripts are sadly not yet supported. This is not due to any limitation in the underlying shaping technology, but due to limitations in the R graphics engine, which assumes horizontal text in key places of the API. This means that, until the graphics engine is updated, it is outside the grasp of graphic devices to support vertical text. Hopefully, this is an area that will improve in the future.

Wrapping up

I hope you’ll appreciate the new features described here. I’d like to thank everyone who have helped validate the text rendering on Twitter. A special thanks goes out to Behdad Esfahbod (http://behdad.org) for his work on HarfBuzz, Fribidi, and almost everything else underlying modern font rendering. He has been especially gracious in his help and support.