One of the biggest challenges faced by companies that work with large amounts of data is that their databases may end up containing multiple instances of duplicate records, leading to an inaccurate overall picture of their customers.

According to Tim Sidor, data quality analyst at Melissa, there are a number of reasons why duplicate records may end up in a database. They can be added unintentionally during data entry when the same information is entered differently across multiple transactions. Changes in how names are formatted, abbreviations of company names, or unstandardized addresses are common ways these issues make their way into a database, he explained during an SD Times microwebinar in October.

This becomes a problem if the database is merged with another source because most database systems only provide basic string-matching options and will not catch those subtle differences.
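
To see why basic string matching falls short, consider a minimal sketch in Python (not Melissa's implementation; the sample values and the 0.85 threshold are arbitrary assumptions for illustration). An exact comparison treats two renderings of the same contact as different records, while a normalized fuzzy comparison flags them as a likely duplicate:

```python
import difflib

def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so trivial
    formatting differences don't hide a match."""
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Fuzzy comparison of the normalized strings."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

# Exact matching sees two different contacts...
print("Jon Smith, 123 Main Street" == "John Smith, 123 Main St.")        # False
# ...while normalized fuzzy matching flags a likely duplicate.
print(similar("Jon Smith, 123 Main Street", "John Smith, 123 Main St."))  # True
```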

Duplicates can also enter a database when the database software itself adds every transaction as a new, distinct record. There's also the chance that a sales representative intentionally alters contact information when entering it so that it appears they've entered a brand-new contact.

No matter how duplicate records end up in a database, it “results in an inaccurate view of the customer” because there will be multiple representations of a single contact, explained Sidor. Therefore, it’s important that companies have processes and systems in place to deal with those errors. 

One recommended way to deal with this is to create what is called a "Golden Record," which is the "most accurate, complete representation of that entity," said Sidor. This is achieved by linking related records and choosing one to act as the Golden Record. Once it is established, the duplicates that were used to update the Golden Record can be deleted from the database.
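
As a rough illustration of that linking step (a simplified sketch, not MatchUp's implementation; the match_key rule and sample records are hypothetical), records that satisfy the same match rule can be grouped so that each group is later collapsed into a single Golden Record:

```python
from collections import defaultdict

def match_key(record: dict) -> str:
    """Hypothetical match rule: normalized company name plus postal code."""
    name = " ".join(record["name"].lower().split())
    return f"{name}|{record['postal_code']}"

def link_records(records: list[dict]) -> list[list[dict]]:
    """Group records that share a match key; each group represents one entity."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())

records = [
    {"id": 1, "name": "Acme  Industries", "postal_code": "92688"},
    {"id": 2, "name": "acme industries",  "postal_code": "92688"},
    {"id": 3, "name": "Globex Corp",      "postal_code": "10001"},
]
print(link_records(records))  # records 1 and 2 land in the same group
```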

This is set up by first determining what constitutes a matching record, which Sidor explained in greater detail in the Oct. 26 microwebinar, an episode that focused more on matching strategies. Once the rules are established, a company can identify matches and determine which record should be chosen as the Golden Record. That decision can be based on metrics such as a Best Data Quality score derived from the verification levels of the data points, on which record was most recently updated, on which has the fewest missing data elements, or on other custom methods.
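
In code, that selection step might look something like the sketch below (an assumption-laden simplification: the field names and the two-part ranking stand in for a real Best Data Quality score, which would also weigh the verification levels of each data point):

```python
from datetime import date

def quality_score(record: dict) -> tuple:
    """Rank candidates by completeness first, then by most recent update."""
    filled = sum(1 for field in ("name", "email", "phone", "address") if record.get(field))
    return (filled, record["updated"])

def choose_golden_record(group: list[dict]) -> dict:
    """Pick the highest-scoring record in a group of matched duplicates."""
    return max(group, key=quality_score)

group = [
    {"id": 1, "name": "Jane Doe", "email": "jane@example.com", "phone": None,
     "address": "123 Main St", "updated": date(2021, 3, 1)},
    {"id": 2, "name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100",
     "address": "123 Main Street", "updated": date(2021, 9, 15)},
]
print(choose_golden_record(group)["id"])  # 2: more complete and more recently updated
```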

“The end goal here is to get the best values in every domain or data type and have the most accurate record, maybe retain the data or discard outdated or unwanted data, to create a single, accurate master database record,” Sidor said in the microwebinar. 

And once the current state of the database is addressed, there is also a need to prevent new duplicates from entering the system in the future. Sidor recommends having a point-of-entry procedure that applies the same matching criteria.
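
A point-of-entry check along those lines could be as simple as the sketch below, where an incoming record is compared against existing contacts with the same match rule before it is inserted (the is_match rule and field names are hypothetical placeholders for whatever criteria a company has established):

```python
def is_match(incoming: dict, existing: dict) -> bool:
    """Placeholder for the same matching rules used to dedupe the database,
    e.g. normalized name plus postal code."""
    return (incoming["name"].lower().split() == existing["name"].lower().split()
            and incoming["postal_code"] == existing["postal_code"])

def add_contact(database: list[dict], incoming: dict) -> None:
    """Reject (or flag for review) records that match an existing contact."""
    for existing in database:
        if is_match(incoming, existing):
            print(f"Possible duplicate of record {existing['id']}; not inserted.")
            return
    database.append(incoming)
```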

Melissa can help companies deal with this issue through its MatchUp solution, which automates the process of linking records and deduplicating the database.