Data is the information that drives business. It can be structured in rows and columns, like a customer’s name, address, and phone number, or it can be unstructured, such as an email or a social media post. Structured data is what populates relational database management systems such as those created by Oracle, IBM and Microsoft, and open-source options such as PostgreSQL and MySQL, among others. That data can be accessed using the standard Structured Query Language (SQL). Unstructured data typically resides in what are called NoSQL databases, such as Cassandra, Couchbase, MongoDB and many others. Many organizations today run both kinds of databases.
Once the data is stored, it must be easy to find amid the mountains of data organizations collect, easy to retrieve, and available at scale. Numerous tools exist for those jobs, including Hadoop, Apache Spark and many more. It is through the collection and analysis of data that businesses can make decisions that affect their bottom line.
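To make the distinction between structured and unstructured data concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite table as a stand-in for a relational database and a list of JSON strings as a stand-in for a document store; the customer records, the documents, and the function names `find_phone_sql` and `find_documents` are hypothetical and exist only for illustration. It is not how any particular NoSQL product exposes its data.

```python
import json
import sqlite3

# Hypothetical structured records: every row has the same three columns.
CUSTOMERS = [
    ("Ada Lovelace", "12 Analytical Way", "555-0100"),
    ("Grace Hopper", "7 Compiler Court", "555-0199"),
]

# Hypothetical unstructured records: each document carries its own fields.
DOCUMENTS = [
    '{"type": "email", "from": "ada@example.com", "body": "Please update my phone number."}',
    '{"type": "post", "author": "grace", "body": "Loving the new release!", "likes": 12}',
]


def find_phone_sql(rows, name):
    """Load rows into an in-memory SQLite table and look up a phone number with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (name TEXT, address TEXT, phone TEXT)")
        conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
        result = conn.execute(
            "SELECT phone FROM customers WHERE name = ?", (name,)
        ).fetchone()
    finally:
        conn.close()
    return result[0] if result else None


def find_documents(raw_documents, keyword):
    """Parse each JSON document and keep those whose body mentions the keyword."""
    matches = []
    for raw in raw_documents:
        doc = json.loads(raw)
        # Documents share no fixed schema, so a missing body is treated as empty.
        if keyword.lower() in doc.get("body", "").lower():
            matches.append(doc)
    return matches


if __name__ == "__main__":
    print(find_phone_sql(CUSTOMERS, "Ada Lovelace"))
    print([doc["type"] for doc in find_documents(DOCUMENTS, "phone")])
```

The structured lookup relies on every row sharing the same columns, which is what lets SQL address a field by name. The document search has to tolerate records that differ in shape, which is the trade-off that typically leads organizations to run both kinds of databases.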
Fifteen years ago, the Hadoop data management platform was created. This kicked off a land rush of companies looking to plant their flags in the market, and open-source projects began to spring up to extend what the platform was designed to do. As often happens with technology, it ages, and newer things emerge that either …
Data is becoming more important than ever, and developers are beginning to realize they need better ways to harness and work with it. The problem, however, is that data isn’t handled the same way development is, so working with it can become a time-consuming and complex process. “The rise of git, docker, and DevOps has created …
Facebook wants to avoid another data privacy blunder. The company is quickly trying to address a problem it found with its Groups API access. Last year, in an effort to better protect users’ privacy, Facebook removed and restricted a number of developer APIs, and the Groups API was one of the restricted ones. Before restrictions were …
Earlier this week at Microsoft Ignite, Microsoft announced the first new service in Microsoft 365 since it released Microsoft Teams in 2017. Project Cortex is a solution that will allow business users to gain valuable insights from their data in ways they previously couldn’t. Over the years Microsoft has created a number of different ways …
As data science becomes more and more important, so does data visualization. Data is practically useless if you don’t have a human-readable way of communicating the insights from that data to the business. This week, we’re highlighting Perspective, an open-source tool for creating interactive visualizations of large, real-time datasets. Perspective is a project …
The Linux Foundation has announced its intent to form a new project called Alvarium. Project Alvarium will be focused on building a Data Confidence Fabric (DCF). This will help facilitate trust and confidence in data across heterogeneous systems. According to the Linux Foundation, a DCF is a framework that inserts trust into a data’s path, …
Netflix has announced that it is open sourcing Polynote, which, as the name implies, is a polyglot notebook. Polynote provides Scala support, Apache Spark integration, as-you-type autocomplete, and multi-language interoperability with Scala, Python, and SQL. According to Netflix, Polynote will allow data scientists to integrate its JVM-based machine learning platform with Python’s ecosystem of machine …
The international standards organization Object Management Group (OMG) announced that it has begun working on defining artificial intelligence standards. These standards will be designed to help “accelerate and improve the creation of useful AI applications,” OMG explained. “When a technology area reaches a certain degree of maturity, standards enable innovation, rather than impede it,” said …
Databricks has announced it is donating its open-source data lakes project to the Linux Foundation. Delta Lake is designed to improve the reliability, quality and performance of data lakes. Databricks first announced the project in April. “Today, nearly every company has a data lake they are trying to gain insights from, but data lakes have …
Melissa has announced new updates to its customer data verification solution Unison. Unison is a browser-based data cleaning and reporting solution designed to help data stewards create and maintain data quality without any programming knowledge. New features include a wizard-based matching interface, fuzzy match scoring, and improved reporting. “Our existing MatchUp deduplication software is known …
Companies are obsessing over data, whether it’s gathering data, analyzing data, or gaining insights from that data. But perhaps they’re not making the most of that data. The biggest challenge when it comes to data is not in the collection, storage, or analysis of that data; it’s how to effectively use that data to …