Wednesday, August 7, 2013

Can you trust the data you use? We tell you the truth.

Thus far, we have gone over how Big Data is extremely useful in analytical decision making and the cases in which it can have unwanted consequences.

Big Data is not a panacea for all our problems, though. It comes with its own set of problems, one of the biggest being the reliability of the data you use to make your decisions. Clearly, if you use the wrong set of data, the outcome is going to be undesirable. We will take you through the factors you need to consider before using Big Data for decision making.

Big Data's characteristics are usually described using the 3Vs: Volume, Velocity and Variety.

This popular 3V framework is usually used to show the capacity and potential that Big Data holds for our decision-making processes in terms of its volume, speed and variety.

However, what are the shortcomings of this framework?

The decisions we make must be based on truthful data; otherwise we will end up wasting our time and money, and may face dire consequences. Hence, the next V you need to consider is VERACITY.

For example, you may find data from social networking sites useful for deciding which promotions to use to market your product. But social media data can be quite uncertain and unreliable, so projecting your sales from social media data alone is doubtful.

We also need to consider the VARIABILITY of data. Do not confuse it with Variety; they are completely different. For example, if you order a chocolate cake from the cake shop near you on each of the next 3 days, it will be the same chocolate cake each time. However, the cake may taste different on each of the 3 days. This is variability. In technology terms, it means the data is possibly unstructured or keeps changing rapidly. This can be very worrying for someone who has to make a decision based on this sort of data.

The other V you want to consider is VALUE. Self-explanatory, you might think, but people forget this very often. You want the data you use to create value for you, and you ensure this by using data that correlates with, and is applicable to, the business process and business decision you want to make.
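One rough way to check whether a data source "correlates" with the decision you care about is to measure the correlation between the two. The sketch below is only an illustration: the function name `pearson` and the weekly numbers are made up for this example, and a high correlation alone does not prove the data is valuable or trustworthy.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical figures, invented purely for illustration
weekly_mentions = [120, 150, 90, 200, 170]     # social media mentions per week
weekly_sales = [1000, 1300, 800, 1900, 1500]   # units sold in the same weeks

r = pearson(weekly_mentions, weekly_sales)
print(f"correlation between mentions and sales: {r:.2f}")
```

A value close to 1 or -1 suggests the data moves together with the outcome you want to predict; a value close to 0 suggests it may add little value for that particular decision.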

Hence, not all the data you use will be 100% accurate. Some of it is bound to contain 'noise', because you want the widest sample possible before you sort it. However, the truth is that you can only trust the data if you pick the right data and understand completely for what purpose and how you will be analyzing and using it. We need to be able to understand the trade-offs of using certain vs. uncertain data.
