Data is the information that drives business. It can be structured in rows and columns, such as a customer's name, address, and phone number, or it can be unstructured, such as an email or a social media post. Structured data is stored in relational database management systems (RDBMS) such as those from Oracle, IBM, and Microsoft, as well as the open-source PostgreSQL and MySQL, among others. That data can be accessed using the standard Structured Query Language (SQL). Unstructured data typically resides in what are called NoSQL databases, such as Cassandra, Couchbase, MongoDB, and many others. Many organizations today run both kinds of databases.
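To make the distinction concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite table as a stand-in for a relational database and a plain list of dictionaries as a stand-in for a document store; the table name, field names, and sample values are all hypothetical and chosen only for illustration.

```python
import sqlite3


def structured_lookup(phone):
    """Store one customer row in an in-memory SQLite table, then query it with SQL."""
    conn = sqlite3.connect(":memory:")
    # Structured data: every row has the same named columns.
    conn.execute("CREATE TABLE customers (name TEXT, address TEXT, phone TEXT)")
    conn.execute(
        "INSERT INTO customers VALUES (?, ?, ?)",
        ("Ada Example", "1 Sample Street", "555-0100"),  # hypothetical sample row
    )
    row = conn.execute(
        "SELECT name, address FROM customers WHERE phone = ?", (phone,)
    ).fetchone()
    conn.close()
    return row  # a (name, address) tuple, or None when nothing matches


def unstructured_search(documents, term):
    """Find free-text documents (emails, posts) whose body mentions a term."""
    # Unstructured data: no fixed columns, so we scan the text itself.
    return [doc for doc in documents if term.lower() in doc["body"].lower()]


if __name__ == "__main__":
    print(structured_lookup("555-0100"))
    posts = [
        {"kind": "email", "body": "Please update my shipping address."},
        {"kind": "social", "body": "Loving the new release!"},
    ]
    print(unstructured_search(posts, "address"))
```

The structured lookup depends on a fixed schema and a SQL query, while the unstructured search has to scan the text itself. Real NoSQL systems use indexes and query languages of their own rather than a linear scan.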
Once data is stored, it must be easy to find amid the mountains of data organizations collect, easy to retrieve, and available at scale. Numerous tools exist for those jobs, including Hadoop, Apache Spark, and many more. It is through the collection and analysis of data that businesses can make decisions that affect their bottom line.
Hybrid work structures have become a popular choice for many modern companies. Not only do they add more flexibility when scaling a business, but they also help to create a better work-life balance for all employees. However, while maintaining a decentralized workforce does come with a number of advantages, there are also certain risks associated …
Teradata, a provider of data analytics solutions, today announced a new database offering for managing vector data. Teradata Enterprise Vector Store manages unstructured data in multi-modal formats like text, video, images, and PDFs. It can process billions of vectors and integrate them into pre-existing systems, and offers response times in the tens of milliseconds. According …
With AI making its way into code and infrastructure, it’s also becoming important in the area of data search and retrieval. I recently had the chance to discuss this with Steve Kearns, the general manager of Search at Elastic, and how AI and Retrieval Augmented Generation (RAG) can be used to build smarter, more reliable …
High-quality data is the key to a successful AI project, but it appears that many IT leaders aren’t taking the necessary steps to ensure data quality. This is according to a new report from Hitachi Vantara, the State of Data Infrastructure Survey, which includes responses from 1,200 IT decision makers from 15 countries. The report …
The data information and analytics company LexisNexis today announced the launch of Nexis Data+, a new API that provides access to corporate, legal, financial, and compliance data, as well as generative AI-approved licensed news content. Developers and data analysts will be able to integrate that data into their existing tools, platforms, AI models, and workflows …
With all the potential benefits promised by the use of AI, it’s no wonder companies are wanting to get in on the action. But a new survey from Capital One reveals a stark disconnect between how confident business leaders are in their company’s ability to implement AI and how the technology professionals actually implementing the …
The data platform Snowflake is hosting its annual user conference, BUILD 2024, bringing together data scientists and developers and sharing new functionality across its platform that will enable customers to get more value from their data and build AI functionality on top of it. New updates across Snowflake platform enable greater collaboration, flexibility, and security …
Elastic is implementing a new approach for storing vectorized data that will require 95% less memory. Better Binary Quantization, or BBQ, is based on a technique called RaBitQ, which was developed earlier this year by researchers at Nanyang Technological University Singapore. According to Elastic, the biggest differences between BBQ and native binary quantization are that: …
The U.S. Postal Service (USPS) delivers mail to almost 167 million addresses in the United States, and anyone who has tried to order something online has likely had the experience of not getting a package delivered on time (or at all) because the address was entered incorrectly or in a weird format, causing shipping delays. …
Microsoft has announced that GitHub Copilot is now integrated with Data Wrangler, an extension for VS Code for viewing, cleaning, and preparing data. By integrating GitHub Copilot capabilities into the tool, users will now be able to clean and transform data in VS Code with natural language prompts. It will also be able to provide …
Google has announced that it is open sourcing a new Java-based differential privacy library called PipelineDP4J. Differential privacy, according to Google, is a privacy-enhancing technology (PET) that “allows for analysis of datasets in a privacy-preserving way to help ensure individual information is never revealed.” This enables researchers or analysts to study a dataset without accessing …
Having the correct customer information in your databases is necessary for a number of reasons, but especially when it comes to active contact information like email addresses or phone numbers. “Data errors cost users time, effort, and money to resolve, so validating phone numbers allows users to spend those valuable resources elsewhere,” explained John DeMatteo, …