Topic: datasets

Data Profiler: Capital One’s open-source machine learning technology for data monitoring

With the move to the cloud, the amount of data that companies are able to manage has grown exponentially. This is why Capital One created Data Profiler, an open-source Python library that uses machine learning to help users monitor big data and detect information that should be properly protected. Data Profiler brings users …

When data gets big: Best practices for data preparation at scale

Today we work with data that has grown in diversity, scale and complexity. This applies not only to data scientists and academic researchers, but also to the rest of us. Business analysts across a spectrum of industries are asked to include larger volumes of data in their work, now pervasive due to diminishing costs …

Jenkins new Declarative Pipeline Syntax, Google’s Project Wycheproof, OpenSSH 7.4, and Microsoft’s MS MARCO dataset—SD Times news digest: December 20, 2016

Jenkins has announced the beta version of its new Declarative Pipeline Syntax. Currently Jenkins provides a Scripted Pipeline Syntax. The Declarative Pipeline is not meant to replace the Scripted Pipeline, but to extend it so users don’t have to worry about scripting every aspect of the pipeline, according to the Jenkins team. “Declarative Pipeline enables …

SD Times GitHub project of the week: FastText

For humans, writing posts on social media just comes naturally. Humans understand each word that’s said or typed, but for machines, it’s not that easy. Understanding the meaning of words is one of the biggest challenges that artificial intelligence researchers face today, and this week’s GitHub project, fastText, aims to solve that challenge. Automatic …
