In a previous post, we talked about types of Big Data. However, another way to look at big data and define it is by looking at the characteristics of Big Data. In other words, what helps to identify makes Big Data as data that is big.
This post will explain the 6 main characteristics of Big Data. These characteristics are often known as the V’s of Big Data. They are as follows
Volume has to do with the size of the data. It is hard to comprehend how volume is measured in computer science when it comes to memory for many people. Most of the computers that the average person uses works in the range of gigabytes. For example, a dvd will hold about 5 gigabytes of data.
It is now becoming more and more common to find people with terabytes of storage. A terabyte is 1,000 gigabytes! This is enough memory to hold 500 dvds worth of data. The next step up is petabytes which is 1000 terabytes or 5,000,000 dvds.
Big data involves data that is large as in the examples above. Such massive amounts of data called on new ways of analysis.
Variety is another term for complexity. Big data can be highly or lowly complex. There was a previous post about structured and unstructured data that we won’t repeat here. The point is that these various levels of complexity make analysis highly difficult because of the tremendous amount of data mugging or cleaning of the data that is often necessary.
Velocity is the speed at which big data is created, stored, and or analyzed. Two approaches to processing data are batch and real-time. Batch processing involves collecting and cleaning the data in “batches” for processing. It is necessary to wait for all the “batches” to come in before making a decision. As such this is a slow process.
An alternative is real-team processing. This approach involves streaming the information into machines which process the data immediately.
The speed at which data needs to be processed is linked directly with the cost. As such, faster may not always be better or necessary.
The quality of the data is what veracity is. If the data is no good the results are no good. The most reliable data tends to be collected companies and other forms of enterprise. The next lower level is social media data. Finally, the lowest level of data is often data that is captured by sensors. The differences between the levels is often the lack of discrimination.
Valence is a term that is used in chemistry and has to do with how an element has electrons available for bonding with other elements. This can lead to complex molecules due to elements being interconnected through sharing electrons.
In Big Data, valence is how interconnected the data is. As there are more and more connections among the data the complexity of the analysis increases.
Value is the ability to convert Big Data information into a monetary reward. For example, if you find a relationship between two products at a point of sale, you can recommend them to customers at a website or put the products next to each in a store.
A lot of Big Data research is done with a motive of making money. However, there is a lot of Big Data research happening that is driven exclusively by a profit motive such as the research being used to analyze the human genome. As such, the “value” characteristic is not always included when talking about the characteristics of Big Data.
Understanding the traits of Big Data allows an individual to identify Big Data when they see it. The traits here are the common ones of Big Data. However, this list is far from exhaustive and there is much more that could be said.