Flip the Script: Enhancing Web Accessibility and Global Reach with Multilingual HTML Practices
Yesterday was an important holiday for many who speak Arabic, so I was inspired to share some basics on incorporating different languages, and different scripts, in HTML. It is imperative to communicate the language to the browser or other user agent for accessibility and internationalization. Yet the most recent WebAIM Million report found that missing document language is still one of the most common accessibility failures. It is also important to consider the direction of the script (left-to-right or right-to-left) and note this in the markup of the page. Two of the fastest-growing languages in the world use right-to-left scripts: Arabic and Urdu. Any lesson in multilingual HTML should discuss the impact of text direction.
The Basics
Label the language of the webpage and the language of any parts that differ from the page language. To communicate the language of the page, add a language ("lang") attribute to the html element with a value representing the language, called the language tag. Language tags follow the BCP 47 standard, and their primary part is a two- or three-letter language code from ISO 639 (lang="en" for English). Optionally, the tag can add a region subtag for a regional variant (lang="en-GB" for British English), but include the region only when necessary. If more than one language is used on the page, indicate the one that is most prevalent.
Example:
<html lang="en">...</html>
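For context, here is a minimal page skeleton with the language tag in place. It is a sketch: the region subtag (GB) is only one possible choice, and the title and body text are placeholders. A three-letter code works the same way for languages that have no two-letter code, for example lang="haw" for Hawaiian.

```html
<!DOCTYPE html>
<!-- Primary language of the page: English, with an optional region subtag (GB) -->
<html lang="en-GB">
  <head>
    <meta charset="utf-8">
    <title>Placeholder title</title>
  </head>
  <body>
    <!-- Every element inherits lang="en-GB" unless it sets its own lang -->
    <p>Placeholder paragraph in British English.</p>
  </body>
</html>
```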
When text in another language occurs on the page, add a lang attribute to that element to override the language set for the page. In the following example, the French phrase is marked with a span element with lang="fr".
<p>
The waiter at the French restaurant said,
<span lang="fr">"Voici la carte."</span>
</p>
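Language attributes nest, so a passage can switch languages more than once. In this sketch (the sentence is a placeholder, and the page is assumed to be English), the whole paragraph is marked as French and a single English term inside it switches back with its own lang attribute.

```html
<!-- Assumes the page itself is English: <html lang="en"> -->
<p lang="fr">
  Le serveur nous a conseillé le
  <!-- Switch back to English for one term inside the French paragraph -->
  <span lang="en">fish and chips</span>.
</p>
```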
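Why is it important to mark the language(s) in HTML?
Screen readers use the lang value to choose the right pronunciation rules (and, where available, a matching voice), so an unlabeled French phrase on an English page may be read aloud with English pronunciation. Browsers also use it for language-sensitive rendering such as font selection and automatic hyphenation, and translation tools can use it to identify the source language.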
Handling RTL languages
Right-to-left (RTL) languages use scripts that read from the right side of the page to the left side, in contrast to a left-to-right language, like English, that reads from the left of the page to the right.
Right-to-left languages include Arabic, Hebrew, Persian (Farsi) and Urdu, and they require special consideration in web content. Even if you primarily develop in English, you may need to insert snippets of text, such as names and phrases, in an RTL language.
First, as mentioned earlier, set the primary language of the site on the html element. If the site or content primarily uses an RTL script, also include the direction attribute ("dir") with a value of "rtl". The dir attribute accepts the values "rtl", "ltr", and "auto". Because the default direction is left-to-right, you don't need to add dir for LTR scripts. I'll discuss the "auto" value later in this article.
Here's an html tag labeled with the primary language as Arabic ("ar") and the direction value of "rtl."
<html lang="ar" dir="rtl">
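Going the other way, a page that is primarily Arabic may contain left-to-right snippets, such as an English product name. Here is a minimal sketch with placeholder content ("Example Store" is a hypothetical name): the snippet gets its own lang and dir="ltr" so that the page-level dir="rtl" doesn't govern it.

```html
<!DOCTYPE html>
<!-- Primary language Arabic, read right to left -->
<html lang="ar" dir="rtl">
  <head>
    <meta charset="utf-8">
    <title>مرحبا</title>
  </head>
  <body>
    <!-- "Welcome to Example Store": the hypothetical English store name is marked as LTR -->
    <p>مرحبا بكم في <span lang="en" dir="ltr">Example Store</span>.</p>
  </body>
</html>
```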
What happens when we need to add an RTL snippet to web content that is primarily LTR? As with any language change, label the language of the RTL element, since it differs from the primary language of the page. As on the html element, also include the direction attribute. In the following example, the Arabic text is placed in a span element with its direction marked as right-to-left.
<p>A typical greeting at the end of Ramadan is <span lang="ar" dir="rtl">عِيد مُبَارَك</span>.</p>
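The same attributes work on block-level elements. When an entire paragraph or quotation is in Arabic, marking the block rather than an inline span also lets the browser lay out the whole block right to left, including aligning it to the right edge by default. A sketch with placeholder surrounding text:

```html
<p>The card on the gift read:</p>
<!-- Whole block in Arabic: lang sets the language, dir sets direction and default alignment -->
<blockquote lang="ar" dir="rtl">
  <p>عِيد مُبَارَك</p>
</blockquote>
```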
Sometimes text on the page is generated dynamically and you don't know in advance which script it uses, so you can't label its language or direction. Unicode, the character set HTML uses (typically encoded as UTF-8), defines rules for text direction called the Unicode Bidirectional Algorithm, or bidi algorithm. The bidi algorithm can usually work out the direction in which to display letters (LTR or RTL) without special instructions, because most letters have a strong inherent direction. Punctuation marks, spaces, and digits, however, have weak or neutral direction and take it from the surrounding text, so they can end up in the wrong place, especially when RTL text is mixed with LTR text.
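Here is a small sketch of the punctuation problem, using a placeholder sentence. In the first paragraph the exclamation mark sits between Arabic text and the end of an English paragraph, so the bidi algorithm gives it the paragraph's left-to-right direction and it displays to the right of the Arabic, which an Arabic reader sees as the start of the phrase. In the second, the span's dir attribute keeps the mark inside the right-to-left run.

```html
<!-- Unlabeled: the trailing "!" is resolved as LTR and displays on the wrong side of the Arabic -->
<p>Her message said عِيد مُبَارَك!</p>

<!-- Labeled: the "!" belongs to the RTL run and displays at its left end -->
<p>Her message said <span lang="ar" dir="rtl">عِيد مُبَارَك!</span></p>
```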
Consider this example page that dynamically updates the daily high scores of a game.
<h2>Today’s High Scores</h2>
<ul>
<li><span class="name">Julie Price</span> - 1st place</li>
<li><span class="name">Pedro Alvarez</span> - 2nd place</li>
</ul>
Displaying as:
[Image: the list as rendered in a browser, under the heading "Today’s High Scores"]
But when the name Ahmed in Arabic (أَحْمَد) is on the list, the bidi algorithm pulls " - 1" into the RTL text, even though those characters are outside the span. Without instructions on the text direction, the formatting breaks.
<h2>Today’s High Scores</h2>
<ul>
<li><span class="name">أَحْمَد</span> - 1st place</li>
<li><span class="name">Sam Chowdhury</span> - 2nd place</li>
</ul>
Displaying as:
[Image: the rendered list, with " - 1" displayed as part of Ahmed's right-to-left name]
In this instance, the player's name is generated dynamically, so we can't know in advance to add dir="rtl" to the span with Ahmed's name. We can fix this by placing the name in a bidirectional isolate element (<bdi>). The browser determines the direction of a bdi's contents from the contents themselves, and isolates them so that the bidi algorithm doesn't let them change the direction of the characters outside the element. In the following code example, the names are wrapped in <bdi> tags. The image following the code shows how this renders on an HTML page: Ahmed's entry is now formatted the same way as Sam's.
<h2>Today’s High Scores</h2>
<ul>
<li><bdi class="name">أَحْمَد</bdi> - 1st place</li>
<li><bdi class="name">Sam Chowdhury</bdi> - 2nd place</li>
</ul>
A functional alternative to the bidirectional isolate element is a span with a dir="auto" attribute. Using dir="auto" effectively isolates the element and lets the browser determine the text direction only within that element, based on the first strongly directional character it contains. However, the <bdi> element is preferred because it is semantically more meaningful.
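For comparison, the dir="auto" alternative applied to the same high-score list might look like this. It is a sketch: the class name is carried over from the earlier example, and each span's direction is taken from its own content rather than from the list item around it.

```html
<h2>Today’s High Scores</h2>
<ul>
  <!-- dir="auto": direction comes from the span's first strongly directional character, and the span is isolated -->
  <li><span class="name" dir="auto">أَحْمَد</span> - 1st place</li>
  <li><span class="name" dir="auto">Sam Chowdhury</span> - 2nd place</li>
</ul>
```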
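To summarize, an open, accessible, inclusive web accommodates many languages, some of which read right to left. Some simple steps to this end: set the page language with a lang attribute on the html element; mark any passage in a different language with its own lang attribute; add dir="rtl" wherever right-to-left text appears; and wrap dynamically generated text whose direction you can't predict in a <bdi> element (or an element with dir="auto").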