Gengo, a world leader in expert, high-scale crowdsourced translation services, is taking aim at the growing need for high-quality multilingual data to train tomorrow’s advanced AI (artificial intelligence) based systems. Its new platform, which launches today, is an on-demand service that gives developers of machine-learning systems access to a wide array of multilingual services delivered by Gengo’s fast and efficient crowdsourced network of 25,000+ vetted contributors.

“Without diverse, well-labeled data, machine-learning algorithms simply can’t advance beyond basic capabilities. The algorithms that businesses increasingly depend on to identify relationships, develop understanding, make decisions, and predict outcomes can only be as good as the training data they’re given,” said Matthew Romaine, founder and CEO of Gengo. “AI and machine-learning developers are starving for data — and cannot build AI applications without it. Our speed and quality surpass any other service that offers multilingual data for machine-learning training.”

The trusted source for high-scale crowdsourced multilingual data.

With 10 years of know-how in providing large datasets at scale, Gengo is a trusted partner for some of the biggest names in the technology sector. Since 2008, global companies such as Airbnb, BuzzFeed, Ctrip, Expedia, Etsy, eBay, Facebook, Nike, Salesforce, Sony, and TripAdvisor have turned to Gengo for assistance with mission-critical language services. Gengo has developed a reputation for fast turnaround of challenging translation tasks:

  • Translates up to one million words per week per language pair
  • Completes customer orders within three hours (on average)
  • To date, has translated more than 950 million words for 65,000+ customers

Accelerates commercialization of tomorrow’s AI systems

The platform builds on the firm’s proven multilingual translation platform to offer data-curation services for both text and speech, including sentiment analysis, transcription, and content summarization. Equipped with this data, software developers at global technology companies can now accelerate the training of their AI systems and deliver more sophisticated products to market, faster.

Rasmus Rothe, founder of AI incubator Merantix, says that matching the training data to the desired sophistication of the AI is crucial to ensure quality results and a high ROI: “Advances in deep learning allow the use of significantly larger amounts of high-quality training data. Many people underestimate the technical infrastructure and operational excellence needed to get such data. Services like [this] are great for engineering teams who need data fast and at scale.”

Charly Walther, VP of product and growth for the new platform and a former product manager in Uber’s Advanced Technologies Group, contends that what distinguishes the platform is its ability to apply multilingual expert crowds to solve the difficult 1% of edge cases that other services simply overlook. “If you’re building an AI-based system that centers on language, accuracy is paramount. We can provide highly accurate data for challenging cases that involve sentiment, dialects, slang, and edge cases where context matters, and we can do this both at scale and at speed,” he said.

Addresses the urgent need for rich, multilingual data to train AI systems

The platform can quickly source, clean up, and label data for machine-learning algorithms. By harnessing a highly skilled, multilingual crowd of 25,000+ fully vetted contributors, it stands in stark contrast to other services available today: it can deliver millions of data points, at high quality, across 37 languages in just a few days.

Advanced services available include sentiment analysis, content moderation, and other content evaluation services such as entity extraction, search engine training, and chatbot training. Examples of tasks that can be submitted to the platform include:

1. Content generation — translation, transcription, copywriting, content summarization, chatbot training data.

2. Content categorization — classification of content into appropriate categories including keyword tagging, and categorization for images, product descriptions or websites. Extraction of particular words or phrases to determine whether content is positive, negative or neutral. These services are ideal for content moderation, sentiment analysis, product categorization, image and video tagging, and data annotation.

3. Content assessment and analysis — review of sponsored listings against a set of specific guidelines determined by the client as well as scoring the quality of machine-translated segments or fixing errors to produce natural, error-free translations. Applications include ad reviews, machine translation quality evaluation, audio speech analysis, and sales call analysis.

“Working with Gengo gave us access to a large network of skilled crowd workers across 37 languages. This enabled us to collect a wide range of high-quality training data for AI development,” said CrowdWorks CEO Koichiro Yoshida. “I believe Gengo plays a key role in the development of natural language processing AI systems, especially for monolingual countries like Japan.”

How to use the platform for AI training

Engineering project leaders and AI developers have three simple ways to order AI training data:
1. Send a preexisting file to Gengo’s personal account managers for review
2. Use our API for a seamless connection to high-volume data
3. Specify your requirements — we’ll create the data from scratch to suit your specific business needs

Find out how to get started with the platform.