Return Policy Guide is a website that aggregates the return policies of many major retailers, so that a customer who is unhappy with a purchase can learn how to return the product most effectively and get their money back.
The site handles a ton of data, from the individual policies themselves to ads on the site, user reviews and more. Ashutosh Panda, senior developer at the company, explained that his developers have no say in how the data is created or entered, but they are responsible for the data they choose to use in their applications.
“We just give the developer a set of problems and then the solution we want from him,” Panda said. “That’s it. The process that he uses is absolutely upon him. So he is the one who makes the call as to which data to include, which data not to include, to get the best results. And so I would say that, yes, the developer is essential in this process, but equally responsible are the customers we are getting the reviews from or the data from. So yes, developer is important, but we cannot say the developer is the person responsible for the data that we get.”
While the accuracy of data is extremely important, Panda said the biggest issue his organization faces is data authenticity. “I will give you one example. If you have bought one item, and I asked you for the review, you have experienced it, you have used it firsthand. And so you give me a review, which is really authentic, which can be trusted by other customers.”
Yet sometimes a person reads the existing reviews and writes one of his own based on them (good, bad or otherwise) in an attempt to persuade others either to make a purchase or to shop somewhere else. Such a review might be accurate, since it draws on the reviews that influenced it, but it is not authentic, because the reviewer has never used the product.
Panda said Return Policy Guide has a methodology for determining who is an authentic respondent, which includes a series of screening questions about a particular profession or age range. “So before coming to our original set of questions, we take them through a series of three to five questions, and their answers define the authenticity of the data that we get on our next set of questions, the original set of questions that we were going to pose to them.”
The amount of time developers need to spend ensuring the data they use is of good quality depends on the quality of the data set they are given, as well as on the question being asked, Panda said. That time can be measured as working time, meaning the number of hours devoted to a single data set. If the developers are asking a broad question, they will use all the data that comes in. But if there is something very specific to be found out, he said, “Before we take the data, we have to have like a week of data cleaning, data managing and data validation. Then we need to sit for at least two, three hours each day for a couple of days to choose the correct type of questions and to decide what kind of people — range of age, or profession — the question should be posed to. So even before receiving the data set, developers start working on data so when a person comes to us, we’re making sure that the data set we receive back is a good data set to start our work with. If it’s very dirty, then the developer has to sit for like eight hours a day, for a week or two weeks, to get it right and then put it into the model for the best results.”