Move Over Big Data - Big Content Is Here
For most of the last decade, big data was the technological phenomenon that companies tried hardest to harness for their benefit. The idea that the thousands of actions people take every day in their online lives could be exploited to target advertising, identify trends and even predict behavior was a tantalizing one. Marketing departments and entire companies built themselves around the premise of data as the ‘oil of the 21st century’.
These companies mined our search history, our browsing behavior and our internet viewing habits, and some of them made huge profits from selling that data or using it to target their marketing.
Now a new phenomenon is emerging. Recent data misuse scandals have taken the shine off the big data movement, shifting the industry’s focus to a more sustainable model. Enter Big Content.
Think about how big the internet is. No, really. There are an estimated 55 billion web pages out there. If each of those pages has, say, 100 words, that’s 5.5 trillion words, and that’s probably a conservative estimate. Obviously, not all of that content is useful to the companies that own it; many of those pages contain filler text or legalese. The pages that really interest marketers, from a big content point of view, are the ones whose content can be harvested and put to use by data analysts.

So, what is ‘Big Content’? eCommerce product descriptions and reviews are the best example: there are millions of products for sale on the internet, and each of them needs a well-written description to sell effectively. Similarly, the vast majority of those products have one or more customer reviews available on the site. If those reviews are informative and clear, they can help buyers make their decision, and also help retailers take feedback on board. All of this is content relating to a specific product.
So there’s a lot of useful content, but what do you do with it now that it’s readily available? As an LSP, we know that most of this content will eventually be needed in multiple languages: to sell products in multiple countries, localization is required. However, the huge word counts involved rule out the traditional 2,000-words-per-day approach to human translation as far too time-consuming.
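To see why the traditional throughput doesn’t scale, here is a back-of-the-envelope calculation. The catalogue size and the post-editing speed-up factor below are illustrative assumptions, not figures from real projects; only the 2,000 words/day baseline comes from the text above.

```python
# Back-of-the-envelope sketch: why 2,000 words/day per translator
# doesn't scale to big-content volumes. The catalogue size and the
# 3x post-editing speed-up are hypothetical, illustrative numbers.

WORDS_PER_TRANSLATOR_PER_DAY = 2_000   # traditional human throughput
CATALOGUE_WORDS = 50_000_000           # hypothetical eCommerce catalogue

def days_to_translate(total_words, translators,
                      words_per_day=WORDS_PER_TRANSLATOR_PER_DAY):
    """Days needed if the work is split evenly across translators."""
    return total_words / (translators * words_per_day)

# One translator working alone: 25,000 days, roughly 68 years.
solo_days = days_to_translate(CATALOGUE_WORDS, translators=1)

# A vetted crowd of 500 post-editors, each working through MT output
# at an assumed 3x the from-scratch speed: under three weeks.
crowd_days = days_to_translate(CATALOGUE_WORDS, translators=500,
                               words_per_day=3 * WORDS_PER_TRANSLATOR_PER_DAY)
```

Whatever the exact numbers, the gap between decades and weeks is the point: volume at this scale demands a different workflow.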
This is where MTPE and crowdsourcing come in. A proven way to process such a mass of content efficiently is to machine-translate it wholesale and then use a large crowd of vetted linguists to post-edit it and optimize its impact on readers. Using NMT is fast becoming commonplace in the industry; however, machine translation still has pitfalls, and accuracy remains in question. This is where we introduce a human element to ensure quality. Given the sheer volume of words in ‘Big Content’ projects, a crowdsourcing strategy is an efficient way to speed up the post-editing process. By attracting multilinguals and giving them the training and knowledge they need to become successful post-editors, we can deliver faster results of guaranteed quality.
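The two-step workflow described above can be sketched in code. This is a minimal illustration of the MTPE shape only: both `machine_translate` and `post_edit` are hypothetical placeholders, not a real NMT engine or crowd platform API.

```python
# Minimal sketch of an MTPE (machine translation + post-editing) flow.
# Function names and behavior are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Segment:
    source: str
    mt_output: str = ""
    post_edited: str = ""

def machine_translate(text: str) -> str:
    # Placeholder: a real pipeline would call an NMT engine here.
    return f"<MT:{text}>"

def post_edit(text: str) -> str:
    # Placeholder: a vetted human post-editor would fix MT errors here.
    return text.replace("<MT:", "").rstrip(">")

def run_pipeline(sources):
    """Step 1: bulk machine translation. Step 2: human post-editing."""
    segments = [Segment(source=s) for s in sources]
    for seg in segments:
        seg.mt_output = machine_translate(seg.source)
        seg.post_edited = post_edit(seg.mt_output)
    return segments
```

The design point is the separation of concerns: the machine handles volume, while the crowd of post-editors handles quality, and each segment carries both outputs so the two stages can be audited independently.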
This is why crowdsourcing continues to be an important focus for Jonckers and for the industry in general: in the age of big content, this methodology is one of the few sustainable solutions.