
Localizing Video Content - What You Need To Know

By: Al Black

03 December 2019

Video is everywhere, but the process of localizing video content is not always easily understood by project and creative teams. In this article, Voquent’s production director Al Black explores in detail what’s involved and offers practical advice for everyone who takes part in the process.

In recent years, the use of video to communicate has grown exponentially with no signs of slowing. 81% of organisations already include video content in their marketing strategy (HubSpot), and with video production more accessible and cost-effective than ever before, that figure is only set to increase. Everyone in the translation industry will at some point be involved in translating video content. This article attempts to demystify the process and provide helpful advice to everyone involved.

Before I delve in, I should point out that John Burke, Sales Manager at Voquent.com, recently presented a GALA webinar on this very subject, and much of what I discuss in this blog is covered in more detail in the webinar video. It’s worth watching if you can set aside 40 mins!

What Does Video Localization Mean?

Localizing video is about translating the sound and visuals to make the content accessible to audiences who speak other languages. On occasion, this means recreating the video to better fit the culture and humour of the regional audience. But more often, the original video is kept the same, and the audio and text content are translated in one (or all) of three ways:

  • Voice-Over/Dubbing
  • Subtitling/Closed Captions
  • Video Editing

The main components of a video for localization purposes are:

  • Storyteller: an off-screen narrator’s voice
  • Actors: presenters/characters
  • Visual Cues: on-screen text/animations

To translate the storyteller and actors, it’s normally a choice between subtitling and voice-over services. Subtitling/Closed Captioning is less expensive, but translating the voice over is generally preferred because it’s easier for the viewer to take in all the content, especially if there are a lot of visual cues (for example, in a training video). To translate the visual cues, video editing services are employed.

There are many different considerations for each of these services, and asking the customer the right questions before any work begins is critical. Remember, it’s not only about making the content understood: the audience should be engaged and the content memorable.

Let’s look at some of the main considerations for each service and the questions to ask the customer before getting any studios involved.

Voice-Over/Dubbing


Whether you’re working with a voice-over agency like Voquent or booking freelance voice-over artists directly, it’s important to understand how costs are calculated so you can provide contractors with all the appropriate information.

At a minimum, most agencies will want to know:

  • the number of voice-over artists needed
  • the word count of the script for each voice
  • the number of roles each voice is playing
  • if there are time constraints to match (time-synced or wild)
  • if time-synced, what is the number of videos and the total duration
  • how the project will be delivered, i.e. the file formats and types
  • the usage of the content e.g. internal use, website or paid advertising

This is already a lot of information to consider, but the number of voices, word count, roles etc. are all critical for calculating the recording time required. The usage is also important because talent often charge additional fees for the rights to use their voice. This applies most commonly to promotional material, but anything published publicly should have a usage agreement attached. It’s like paying for the rights to use a music track or image.

The type of material (animation, presentation, film etc.) and the total duration of the material are also key information. This allows the agency to make voice artist recommendations specific to the project type and to further refine the production time and budget required.

Most agencies calculate their costs hourly and expect minimum fees for very short scripts.

Matching the translated audio to an existing audio track (time-sync or synchronised recording) is commonly required for video projects, for example when replacing a narrator’s voice in a corporate video or dubbing the actors in a training film. The voice artist must record in sync with the original timings. If they don’t, the voice over will be too long to fit back into the video.

Always check the translated script can be read at a natural pace within the time constraints before booking the voice over.

This is a very common problem we come up against here at Voquent and it is relatively easy to avoid if the translators have been asked to reference the video. Not only is the video a useful contextual reference, it’s vital the translator hears the existing speech to ensure the translation will fit. If the translator has only seen the script, it’s highly likely their translation can’t be read within the existing timings and additional costs can be incurred due to the cancellation of studio time and voice artist bookings. If your translator isn’t confident in their ability to translate observing the time constraints, most dubbing agencies will offer to adapt the translated scripts further before recording. This is especially important for lip-sync projects.
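As a rough illustration of that pre-booking check, here is a minimal Python sketch that estimates how long a translated line takes to read and compares it with the time available in the video. The function names and the default pace of 150 words per minute are my own assumptions for the example, not an industry standard. Natural reading pace varies by language, voice and content, and counting words by splitting on spaces will not work for languages such as Chinese or Japanese.

```python
def estimated_read_seconds(script: str, words_per_minute: float = 150.0) -> float:
    """Estimate how long a script takes to read aloud at a natural pace.

    words_per_minute is an assumed pace; adjust it per language and voice.
    """
    word_count = len(script.split())  # naive count: splits on whitespace only
    return word_count * 60.0 / words_per_minute


def fits_time_slot(script: str, slot_seconds: float,
                   words_per_minute: float = 150.0) -> bool:
    """Return True if the script can plausibly be read within the slot."""
    return estimated_read_seconds(script, words_per_minute) <= slot_seconds


if __name__ == "__main__":
    translated_line = "Bienvenue dans notre formation sur la sécurité au travail"
    slot = 3.0  # seconds available in the original video (hypothetical)
    print(f"Estimated read time: {estimated_read_seconds(translated_line):.1f}s")
    print("Fits the slot" if fits_time_slot(translated_line, slot) else "Needs adapting")
```

A check like this only flags lines that are obviously too long. It does not replace a translator or voice director reading the script against the video.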

If the customer says no time-sync is needed, always double-check they understand that the new audio will not match the timings of the existing audio, so the video itself will need to be edited. If the customer has access to their own video team, this option makes sense, but if they are expecting you to provide a finished video, it’s unlikely they want a ‘wild’ recording. Editing a video to match the new audio is usually more expensive than recording time-synced.

Recording time-sync takes around twice as long as recording wild, but there are different levels of synchronisation. Replacing a narrator’s voice requires ‘loose time constraints’, whereas recording lip-sync means recording with ‘strict time constraints’. Recordings with strict time-sync take a lot longer to produce because the voice artist will have to repeat their lines many more times to get them to fit.

The following table gives you an idea of what to expect in a one-hour recording session with one voice artist, assuming a relatively straightforward script:

Type                                                              Finished audio per one-hour session
Wild (reading at a natural pace)                                  10-15 minutes
Time-sync (matching an off-screen narrator)                       8-10 minutes
Phrase-sync (interpreting the speaker, matching timings loosely)  6-9 minutes
Lip-sync (matching lip movements exactly)                         3-5 minutes


Note: for lip-sync recordings, the script will often have to be adapted after translation to better match the lip movements. This, together with the difficulty of recording lip-sync, can make it prohibitively expensive, which is why lip-sync is rarely used for business content.
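The figures in the table can be turned into a rough studio-time estimate. The Python sketch below is only an illustration: the dictionary and function names are my own, and the ranges are simply the one-hour, one-voice figures quoted above, so real sessions will vary with the script and the voice artist.

```python
# Finished minutes of audio per one-hour session, taken from the table above
# as (slowest, fastest) for a single voice artist.
FINISHED_MINUTES_PER_HOUR = {
    "wild": (10, 15),
    "time-sync": (8, 10),
    "phrase-sync": (6, 9),
    "lip-sync": (3, 5),
}


def estimate_session_hours(finished_minutes: float, sync_type: str) -> tuple[float, float]:
    """Return (best_case_hours, worst_case_hours) of recording time."""
    slowest, fastest = FINISHED_MINUTES_PER_HOUR[sync_type]
    best_case = finished_minutes / fastest
    worst_case = finished_minutes / slowest
    return best_case, worst_case


if __name__ == "__main__":
    for sync_type in FINISHED_MINUTES_PER_HOUR:
        best, worst = estimate_session_hours(30, sync_type)
        print(f"30 finished minutes, {sync_type}: {best:.1f} to {worst:.1f} hours")
```

Remember that most agencies apply minimum fees, so a very short script will not necessarily be billed pro rata.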


Questions to ask the customer:

  • Are there time constraints?
  • What is the number of videos and the total duration?
  • How many voice artists are needed?
  • What is the word count of the script for each voice?
  • How many roles is each voice playing?
  • What file format should be delivered?
  • Where is the content being published?

Subtitling/Closed Captioning

Subtitling (or Closed Captioning) is the most common method of translating the speech in a video for a non-native audience. It’s also used to make content accessible for the deaf and hard of hearing, and many brands use captions on their social videos so viewers can still understand the content with the sound muted. There are two types of captions: open captions and closed captions.

Open Captions: subtitles that are burnt in, i.e. encoded into the video file itself. They can’t be turned off. Open captions are preferred where more control over the style of the subtitles is required, because you can control the font, colour, background and placement.

Closed Captions: subtitles supplied as a separate text file that can be turned on or off in the video player. YouTube, Facebook and Vimeo all support closed captions, as do Amazon Video, Netflix and more. Almost all video players support the SRT file format, which is why it’s the most frequently used file format for subtitles. The style of the subtitles is controlled by the player.

Many subtitling companies avoid open captions because their subtitling software doesn’t support the burn-in of the subtitles to the video file. This means they may have to book a separate video editor or agency to do the video editing and encoding. If you’re booking a subtitling company, it’s crucial to check what they can provide. Be aware that open captions are usually more expensive because of the extra video editing work required.


A few of the subtitling styles that can be produced with open captions.

‘Can I get a discount if I supply the transcript?’

This is a common question we get asked at Voquent. Unfortunately, most transcripts are not in the correct format for captions. They aren’t broken down into 2-6 second sections (subtitles) and they won’t have the correct timestamps, which are written as Hours:Minutes:Seconds:Frames or, in SRT files, as Hours:Minutes:Seconds,Milliseconds. The timestamps tell the player when to display each subtitle.

Of course, the transcripts may be useful for referencing spellings, but they won’t speed up the subtitler’s work by much. The subtitler will still have to go through the entire video to create a new transcript that works to industry guidelines. This is called ‘spotting’. Qualified subtitlers will ensure the subtitles can be read easily and provide cues to indicate new speakers.
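To show why a plain transcript is not yet a caption file, here is a minimal Python sketch that writes a single subtitle in the SRT layout: a sequence number, a start and end timestamp, then the text. The function names and the sample cue are hypothetical. Note that SRT files write timestamps as Hours:Minutes:Seconds,Milliseconds, whereas frame-based timecodes are used in some other subtitle formats.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT subtitle block from its number, timings and text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


if __name__ == "__main__":
    # A hypothetical subtitle shown from 1.0s to 3.5s
    print(srt_cue(1, 1.0, 3.5, "Welcome to the training course."))
```

The start and end times for every subtitle are exactly what the subtitler produces during spotting, which is why a transcript without them saves so little time.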

Keep in mind that captions are not translated in the same manner as documents. They are a condensed version of what is being said.

Subtitling templates

If your customer only works with pre-approved translators and you can’t use the subtitling company’s translators, ask if the subtitling company can create a template for the approved translators to use. A template is essentially the transcript, broken down into subtitles and provided in MS Excel. The template often includes a character counter, which tells the translator how many characters they have left for each subtitle. Most translators will find it easier to work with the template even if they don’t know anything about subtitles, because they can use their existing translation tools.
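The character counter in a template is simple arithmetic, as this small Python sketch shows. The limit of 84 characters (two lines of 42) is an assumption chosen for the example; the real limit depends on the style guidelines the subtitling company is working to, and the names used here are hypothetical.

```python
# Assumed limit for illustration: two lines of 42 characters per subtitle.
MAX_CHARS_PER_SUBTITLE = 84


def template_rows(source_subtitles, translations, limit=MAX_CHARS_PER_SUBTITLE):
    """Pair each source subtitle with its translation and the characters left.

    A negative 'characters left' value means the translation is too long.
    """
    return [
        (number, source, target, limit - len(target))
        for number, (source, target) in enumerate(
            zip(source_subtitles, translations), start=1
        )
    ]


if __name__ == "__main__":
    source = ["Welcome to the course.", "Let's get started."]
    target = ["Bienvenue dans ce cours.", "Commençons."]
    for number, src, tgt, left in template_rows(source, target):
        print(f"{number}\t{src}\t{tgt}\t{left} characters left")
```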


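Questions to ask the customer:

  • Are open or closed captions required?
  • What is the number of videos and the total duration?
  • What are the source language and target language(s)?
  • Should on-screen text (e.g. titles) also be captioned?
  • What file formats and/or style guidelines apply?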

Video Editing

If there is any text embedded inside the video itself, such as introductory text and titles, animated text or lower-thirds with names/job titles, it will need to be translated to provide a localized version of the video.

The most important thing to remember here is that the studio will need the project source files to do this work professionally. These are the files used to build the video in software such as Adobe Premiere Pro, Adobe After Effects, Final Cut Pro etc.

Without the project source files, the job of translating the video is made much more complicated (and more expensive) because the on-screen text layers cannot be edited, and the studio will have to get creative about how they tackle translating each piece of text.

A few examples of on-screen text including titles, lower-thirds and animated text.

Assuming the project source files are available, the studio will most likely want to review them before estimating the time required to complete the editing work. This is because not all projects are made the same. Some of the text may in fact be non-editable images. Sometimes the video editor will leave all the unfinished compositions in the source instead of compiling only what is used. This can mean wading through a lot of files to work out how everything fits together, which makes it difficult to see quickly where all the editable text layers are.

Some studios will charge for this review time. It can take many hours to condense the project for localization, which often makes localizing the first language more expensive than subsequent languages.

If the project source files are well organised with neatly labelled text layers, it will make all the difference! The video editor will insert the translations and then reposition and adjust the animations as required. The video can then be encoded out for approval.


Questions to ask the customer:

  • Are editable project source files available?
  • Were any third-party custom fonts or plugins used?
  • Are there any specific fonts required for each language version?
  • Will there be source files available for any non-editable text (images) in the project?

Subtitles or Voice-Over?

So, we’ve talked about different ways you can localize video content, but which method is best? If there are titles or animated text, translating the on-screen text is always recommended for full accessibility, although this depends on the budget. With regards to spoken audio, in most cases we are just deciding between subtitles and voice over.

There are pros and cons to each. Voice over is certainly more engaging, but it’s also more expensive. Subtitling is more cost-effective. However, subtitles are a condensed version of the spoken audio, so they only give the audience an idea of what’s being said.

Subtitles might be a better option if there’s a large volume of video content that needs to be made accessible in several languages, for example, a TV series.

Subtitles can also be good for social media videos, where a lot of viewers are likely to watch the video muted.

If the purpose of the video is to sell a product or talk about a brand, a voice over is definitely the best option. You’re much more likely to deliver a powerful message with a voice over, rather than subtitles.

Training videos, or anything designed to educate, are also better suited to a voice over. This is not just for the engagement level, but also because the video will often contain information or diagrams on-screen which would be partially blocked by subtitles.


Ultimately, it’s the client’s choice how they wish to localize a video. As I’ve demonstrated above, asking the right questions will help you to recommend the best approach and give the client all the information they need to make an informed decision. You’ll also be prepared to speak to studios and agencies about the project.