An introduction to machine translation for localisation
Alconost's Kris Trusava looks into the advantages and challenges of machine translation to localise games
Machine learning has made its way into nearly every industry, and game localization is no exception. Software providers claim that their machine translation products mark a new era in localization, but gamers are often left wishing that game publishers would pay more attention to detail.
As a professional localization company that is currently working with machine translation post-editing, Alconost could not pass up the topic. In this article we aim to find out what's hot (and what's not) about machine translation (MT) and how to get the most out of it without sacrificing quality.
NMT in a nutshell: what on Earth is it?
When machine learning was introduced to localization, it was seen as a great asset, and for quite a while localization companies worked using the PEMT approach. PEMT stands for post-edited machine translation: it means that after a machine translates your text, translators go through it and edit it. The main problem with PEMT is that the machine translates without comparing the text to previous or current translations or to a glossary -- it just translates as it "sees" it. So naturally this method results in numerous mistakes, creating a need for manual editing.
As time passed and technology advanced, NMT (neural machine translation) came into play. This proved a much more reliable and robust solution. NMT uses neural networks and deep learning to not just translate the text but actually learn the terminology and its specifics. This makes NMT much more accurate than PEMT and, with sufficient learning, delivers high-quality results much faster than any manual translation.
NMT solutions: a brief overview
It's no surprise that there are dozens of ready-made NMT solutions on the market. These can be divided into two main categories: stock and custom NMT engines. We will talk about custom (or niche-specific) NMT tools a bit later; for now, let's focus on stock NMT.
Stock NMT engines are based on general translation data. While these datasets are vast and rich (for example, Google's database), they are not domain-oriented. This means that when using a stock NMT tool you get a general understanding of the text's meaning, but you don't get an accurate translation of specific phrases and words.
Examples of stock NMT engines include Google Cloud Translation, Amazon Translate, DeepL Translator, CrossLang, Microsoft Translator, Intento, and KantanMT.
The chief advantage of these solutions is that most of them are public and free to use (like Google Translate). Commercial stock NMTs offer paid subscriptions with their APIs and integration options. But their biggest drawback is that they don't consider the complexity of game localization. More on that below.
The many intricacies of game localization
While machine translation works fine in many industries, game localization turned out to be a tough nut to crack. The main reason for this is that gaming (regardless of the type of game) always aims for an immersive experience, and one core part of that experience is natural-sounding dialogue and in-game text. So what's so challenging about translating them properly?
- Creativity
It may sound like a given, but creativity plays a massive role in bringing games to life, especially when it comes to their translation. A translator might have a sudden flash of inspiration and come up with an unexpected phrasing or wording that resonates with players much better than the original text.
Can a machine be creative? Not yet. And that means that machine translations will potentially always lack the creative element that sometimes makes the whole game shine.
- Specific phrasing, dialects, and slang
One of the biggest challenges in localization is making the translation sound as natural as possible. And since every country and region has its own specific languages and dialects, it takes a thorough understanding of one's culture to successfully adapt a translation to it.
While a machine learning solution can be trained on an existing database, what if it comes across a highly specific phrase that only locals know how to use? This is where professional translation by native-speaking linguists and community feedback are most helpful. Native speakers of the target language who know its intricacies can advise on the best wording. And for that, you need to have a feel for the language that you're working with, not just theoretical knowledge.
- Tone and overall vibe
Certain words convey a certain tone, and this is something that we do without thinking, just by feel. So when translating a game, a human translator can sense the overall vibe of the game (or of a specific dialogue) and use not just the original wording but synonyms that better convey the tone and mood. Conversely, a machine is not able to "sense the mood," so in some cases the translation may not sound as natural as it could.
Advantages of using MT for game localization
Despite all the challenges around game localization, machine translation still does a pretty decent job. This technology has several significant benefits that make MT a great choice when it comes to certain tasks.
- Speed
Speed is probably the biggest benefit of machine translation and its unique selling point. A machine can translate massive chunks of text in mere minutes, compared to the days or even weeks it would take a translator. In many cases it proves faster and more efficient to create a machine translation first and then edit it. Besides, the speed of MT is very handy if you need to quickly release an update and can manage with "good enough" translation quality.
- Translation of out-of-game content
When talking about game localization, the first thing that comes to mind is usually in-game dialogue. But game localization is much more than that: it includes user manuals, how-tos, articles, guides, and marketing texts. This kind of copy doesn't employ much creativity and imagery, since these materials don't really impact how immersive the gaming experience will be. If a user spots a mistake while reading your blog, it's less likely to ruin the game experience for them.
- Cost of services
One more huge advantage of machine translation is its relatively low cost. Compared to the rates of professional translators, machine translation tends to be more affordable. Hence, it can save you money while letting you allocate experts to more critical tasks.
- Consistency of translation
One more way MT can benefit your project is translation consistency. When several independent translators work on a text, they may translate certain words differently, so that the same term ends up with several different translations. But with machine translation, repetitive phrases are always translated the same way, improving the consistency of your text.
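The consistency problem described above can be checked mechanically. The following is a minimal sketch, assuming the translated segments are available as (source, target) pairs; the function name and the sample strings are hypothetical and are not taken from any particular localization tool.

```python
from collections import defaultdict


def find_inconsistent_translations(segments):
    """Return source strings that were translated in more than one way.

    `segments` is an iterable of (source, target) pairs. Source strings
    are compared case-insensitively, ignoring surrounding whitespace.
    """
    variants = defaultdict(set)
    for source, target in segments:
        variants[source.strip().lower()].add(target.strip())
    return {src: sorted(tgts) for src, tgts in variants.items() if len(tgts) > 1}


# Hypothetical English-to-German UI strings delivered by two translators.
segments = [
    ("Save game", "Spiel speichern"),
    ("Save game", "Spielstand sichern"),
    ("Quit", "Beenden"),
    ("Quit", "Beenden"),
]

inconsistent = find_inconsistent_translations(segments)
print(inconsistent)
```

Here only "Save game" is reported, because it has two competing translations, while "Quit" was rendered the same way both times. A check like this flags candidates for review; it cannot tell which variant is the right one.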
Is machine translation good enough for games?
MT is not 100% accurate, according to gamers. For example, a recent Reddit discussion features hundreds of comments left by frustrated gamers, the majority of whom say the same thing: companies are going for fast profits instead of investing in high-quality translation. And what's the tool to deliver quick results that are "good enough"? You guessed it -- machine translation.
Unfortunately, when gaming companies try to release games faster it leads not only to a poor user experience but also to a significant drop in brand loyalty. Many gamers cite poor translations as one of the biggest drawbacks of gaming companies.
So what options are there when Google NMT isn't enough? Here's an idea for what might work best.
- Localization-specific NMT
While neural machine translation has certain flaws, it has many benefits as well. It's quick, it's moderately accurate, and it can actually be quite helpful if you need to quickly translate massive amounts of documents (such as user manuals). So what we see as the perfect solution is niche-oriented, localization-specific NMT (or custom NMT).
For instance, Alconost is currently working on a product that uses neural machine learning and a vast database of translations in different languages. This lets us achieve higher accuracy and adapt the machine not just for general translation, but for game translation -- and there is a big difference between the two. In addition, we use cloud platforms (such as Crowdin and GitLocalize) with open-source data. That means that glossaries and translation memories from one project can be used for another. And obviously our translators post-edit the text to ensure that the translation was done right.
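To illustrate how a translation memory from one project can be reused in another, here is a minimal sketch of a fuzzy lookup. It is an assumption-laden illustration, not a description of how Alconost, Crowdin, or GitLocalize implement it: the memory is a plain dictionary, similarity is measured with Python's standard `difflib.SequenceMatcher`, and the 0.8 threshold and the sample strings are arbitrary.

```python
from difflib import SequenceMatcher


def tm_lookup(source, translation_memory, threshold=0.8):
    """Return (stored_source, stored_target, score) for the closest match.

    Returns None when no stored segment reaches the similarity threshold.
    """
    best = None
    for tm_source, tm_target in translation_memory.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (tm_source, tm_target, score)
    return best


# Hypothetical English-to-Spanish memory built during an earlier project.
memory = {
    "Press any key to continue": "Pulsa cualquier tecla para continuar",
    "Game over": "Fin de la partida",
}

exact = tm_lookup("Game over", memory)
fuzzy = tm_lookup("Press any button to continue", memory)
miss = tm_lookup("Select your character", memory)
```

An exact match can usually be reused as it is, a fuzzy match gives the post-editor a starting point to adjust, and a miss means the segment goes to the MT engine or to a translator.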
Custom domain-adapted NMT solutions may become a milestone in localization, as they are designed with a specific domain in mind. Their biggest advantages are high translation accuracy, speed, affordability (as they're cheaper than hiring professional translators), and the option to explore new niches and domains.
Some content, such as user reviews, sometimes goes untranslated because it is too specific and there is not much of it. It wouldn't make much sense to use a stock NMT solution to translate it, as the output would require heavy post-editing.
Custom NMT tools, however, can be designed to work with user reviews and "understand" the tone of voice, so that even this specialized content can be translated by a machine. This solution has been implemented by Airbnb, where reviews and other user-generated content are translated in a flash just by pressing the "Translate" button.
In addition, machine translators can be trained to recognize emotions and mood and, when paired with machine-learning classifiers, to label and prioritize feedback. This can also be used to collect data on users' online behavior, which is a highly valuable asset to any company.
How machine translation differs from traditional localization
Finally, let's talk about the intricacies of localizing a text translated by a machine, and how the process differs from standard localization. We'll compare the two approaches based on our own experience acquired while working on different projects.
When we localize a project from scratch, it's safe to say we are in full control of the quality, since the team has glossaries and context available from the start. Here the text is translated with a specific domain in mind, and only rarely do we have to post-edit the translated copy.
With machine translation, however, things are a bit different. The source text can be translated by different engines, all of which differ in terms of quality and accuracy. So when we start working with these texts, we request all available materials (style guides, glossaries, etc.) from the client to ensure that the translation fits the domain and the brand's style. This means that post-editing machine translations requires the additional step of assessing the quality and accuracy for the given project.
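One part of that assessment step can be automated: checking raw MT output against the client's glossary. The sketch below is only an illustration under stated assumptions. The glossary is a simple source-term to target-term dictionary, matching is a case-insensitive substring test, and the function name and sample data are hypothetical.

```python
def glossary_violations(segments, glossary):
    """List (index, source_term, expected_target) for missing glossary terms.

    A segment is flagged when its source contains a glossary term but its
    translation does not contain the required target term. Matching is a
    plain substring test, so inflected forms are not recognised.
    """
    issues = []
    for index, (source, target) in enumerate(segments):
        for src_term, tgt_term in glossary.items():
            in_source = src_term.lower() in source.lower()
            in_target = tgt_term.lower() in target.lower()
            if in_source and not in_target:
                issues.append((index, src_term, tgt_term))
    return issues


# Hypothetical client glossary and raw MT output (English to French).
glossary = {"quest": "quête", "loot": "butin"}
mt_output = [
    ("Start the quest", "Commencer la quête"),
    ("Collect your loot", "Récupérez votre pillage"),
]

issues = glossary_violations(mt_output, glossary)
print(issues)
```

The second segment is flagged because "loot" was not rendered with the glossary term "butin". The share of flagged segments gives editors a rough first signal of how much post-editing a given engine's output will need, although it says nothing about tone or fluency.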
When you choose a traditional localization approach, there is a 99% chance that your project will be assigned to a person who has the most experience with your particular language and domain.
But with machine translation you can't really be sure how well the machine has been trained and how much data it has for different languages. One engine may have learned 10,000 pages of Spanish-English translations, while another engine has studied 1,000,000 pages. Obviously, the latter is going to be more accurate.
The bottom line is that when you work with a machine translation engine "trained" by a professional localization company on niche topics, there's an excellent chance that the company will ensure the "proficiency" of the customized MT engine and, consequently, the quality of the translation. With an ample translation database and professional editors by your side, you can put your mind at ease, knowing that your project is in good hands.
Kris Trusava is localization growth manager at Alconost, a provider of localization services for games and other software into over 80 languages.