
Computer vision and machine-learning startup Diffbot is releasing a new developer API designed to scan the Web for forums, comments and reviews.

“All of the long-tail discussion pages on the Web such as forums, comments and product review threads have previously been pretty difficult for people to access and scale,” said Mike Tung, CEO of Diffbot. “Businesses live and die on customer feedback, and as all our lives move online, businesses really need to understand if there is a negative review about them on the Web, and they need to be able to address it.”

The Discussions API gives developers the tools they need to unearth these comments, according to Tung. Previously, developers had to write scrapers for each of the sites they wanted to monitor, and those scrapers were expensive and a burden to maintain, he explained.

“The Discussions API allows them to start to monitor many more sources and immediately start getting data from it without having to write a scraper,” said Tung. “It is not possible to go beyond several thousand sites just writing scrapers.”

With the API, developers and companies can build apps that monitor brands, products and keywords; identify and analyze trends; and extract an entire site’s worth of content.
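As a rough illustration of the brand-monitoring use case, the Python sketch below builds a request URL for a discussion page and filters the returned posts by keyword. The endpoint path, the `token` and `url` query parameters, and the `objects`/`posts`/`author`/`text` response fields are assumptions made for this example and are not taken from the article; the helper names are hypothetical. Check Diffbot's current documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint, not stated in the article; verify against Diffbot's docs.
DISCUSSION_ENDPOINT = "https://api.diffbot.com/v3/discussion"


def build_request_url(token, page_url):
    """Build the GET URL for one discussion page (parameter names assumed)."""
    query = urlencode({"token": token, "url": page_url})
    return f"{DISCUSSION_ENDPOINT}?{query}"


def extract_posts(response):
    """Flatten an assumed response shape into a list of author/text dicts."""
    posts = []
    for obj in response.get("objects", []):
        for post in obj.get("posts", []):
            posts.append({
                "author": post.get("author"),
                "text": post.get("text", ""),
            })
    return posts


def mentions(posts, keyword):
    """Return the posts whose text contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [p for p in posts if needle in p["text"].lower()]
```

In practice the URL would be fetched with an HTTP client and the decoded JSON passed to `extract_posts`; the sketch stops short of the network call so that it makes no claim about rate limits, authentication errors or pagination.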

“Our new APIs allow developers to treat forums, comment threads and review collections as virtual databases, accessing their data on the fly and making this massive component of the Web newly usable,” said Tung. “It will even help developers find those (admittedly rare) useful and constructive YouTube comments.”

The API supports websites such as Blogger, Disqus, Facebook, Hacker News, Reddit and WordPress.

The company already provides APIs for analyzing home pages, article text, product data, images and videos, and plans to tackle profile pages, event pages and location pages next.

“Our main goal is to be able to understand all of the different kinds of pages on the Web,” said Tung. “Once we are able to understand substantially all the kinds of pages on the Web, then we can really convert the entire Web into a structured representation, like a database version of the Web.”


About Christina Cardoza

Christina Cardoza, formerly known as Christina Mulligan, is the Online & Social Media Editor of SD Times. She covers agile, DevOps, AI, and drones. Follow her on Twitter at @chriscatdoza!