Five V's of Big Data
Introduction
Big data is characterized by several Vs, the most prominent being volume, velocity and variety, and these Vs are used to define what counts as big data. Big data is data of such enormous magnitude that it requires specialized tools beyond our simple data management tools. It comes from varied, heterogeneous sources, and it can be structured, semi-structured or unstructured. To work with big data, data scientists must understand the Vs that define it. Below, I compare and contrast the three major Vs: volume, velocity and variety. These three can be considered features of big data, and they complement each other to define it.
i. Volume
I think of volume as the "BIG" in big data. Big data is, of course, big, often of almost incomprehensible proportions. Only data of enormous size can be considered big data, so the size of the data determines whether it is big data or not. For example, social media platforms generate data of enormous volume: the over 210 million snaps created daily on Snapchat and the 4 petabytes of data generated daily by Facebook are both examples of big data. Because volume makes the difference in what is considered big data, I consider it the clearest determinant of big data.
ii. Velocity
Velocity, the speed at which big data is generated and put to use, is another key feature of big data; big data is also recognized by the rate at which it is generated. Returning to the Facebook example above, it is the speed at which those 4 petabytes are generated each day that makes them big data. Velocity is also crucial in real-time applications of big data, such as data from airline jet engines. How fast data is generated and processed is part of what makes it big data.
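To put the Facebook figure in velocity terms, the short Python sketch below converts a daily data volume into an average ingest rate. It assumes decimal units (1 PB = 10^15 bytes) and that data arrives evenly over the day, which real traffic does not; `average_rate_gb_per_s` is a hypothetical helper written only for illustration. Under those assumptions, 4 petabytes a day works out to roughly 46 gigabytes every second.

```python
# Convert a daily data volume into an average ingest rate.
# Assumptions: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes)
# and a perfectly even arrival rate across the day.
PETABYTE = 10**15
GIGABYTE = 10**9
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_gb_per_s(petabytes_per_day: float) -> float:
    """Hypothetical helper: average rate in gigabytes per second."""
    bytes_per_second = petabytes_per_day * PETABYTE / SECONDS_PER_DAY
    return bytes_per_second / GIGABYTE


rate = average_rate_gb_per_s(4)
print(f"{rate:.1f} GB/s")  # prints 46.3 GB/s
```

Averaging hides peaks, so a system built for this workload would need headroom well above the mean rate.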
iii. Variety
Variety describes the different forms that big data takes. Structured data has a defined format, such as rows in a database table. Unstructured data, such as much of what we get from the web, has no predefined format. Finally, semi-structured data falls between the two, combining some organizational structure with free-form content.
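As a rough illustration of the three forms, the Python sketch below parses a small, made-up example of each using only the standard library. The records, field names and values are hypothetical; the point is that structured data fits a fixed set of columns, semi-structured records (here JSON) carry their own keys that can differ from record to record, and unstructured text has no schema at all.

```python
import csv
import io
import json

# Structured: every row follows the same fixed columns (a schema).
structured = "user_id,country,age\n101,KE,29\n102,US,34\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: JSON records describe themselves with keys,
# but different records can carry different fields.
semi_structured = (
    '[{"user_id": 101, "tags": ["travel"]},'
    ' {"user_id": 102, "location": {"city": "Austin"}}]'
)
records = json.loads(semi_structured)
all_keys = sorted({key for record in records for key in record})

# Unstructured: free text with no schema; any structure must be inferred later.
unstructured = "Loved the flight to Nairobi, but the Wi-Fi kept dropping."
words = unstructured.split()

print(rows[0]["country"])  # KE
print(all_keys)            # ['location', 'tags', 'user_id']
print(len(words))          # 10
```

Only the structured rows can be queried by column right away; the other two forms need extra work before analysis, which is part of why variety makes big data harder to handle.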
Of these three, volume is big data's first and best determinant. After all, when we think of big data, we think of the size that differentiates it from our common everyday data. Velocity and variety are crucial when working with big data, but I personally think they mainly supplement its critical feature: volume.
What role do you feel data quality plays in the overall importance of big data collection and analysis? How does it impact these three base elements?
Big data collection is an integral part of management and services. Organizations and companies use big data analytics systems to improve customer service, operational efficiency, decision-making and more. Data quality is critical because the purpose of processing big data is to extract usable information, and that is only possible when the underlying data can be trusted. Value is a crucial feature of big data for companies, researchers and hospitals, which use it to make data-driven decisions that can improve business outcomes as well as patient outcomes.
Tyagi (2019) states that big data must have value to be of any importance: large volumes of high-velocity, varied data are of no practical use if they yield no value. When big data does yield value, the benefits include more effective marketing, new revenue opportunities, customer personalization and improved operational efficiency, and with an effective strategy these benefits can provide a competitive advantage over rivals. After data is collected from various sources in its varied forms, data analysts and data scientists are put to work to process, clean and analyze it to find patterns. These patterns are the value of the collected data. As Cai and Zhu (2015) state, big data analysis and use must be based on accurate, high-quality data, which is crucial to obtaining the value that scientists seek from big data. Admittedly, getting accurate, high-quality, valuable data from big data can be challenging, given the massive volume of data streaming in from varied sources at high velocity.
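To make "process, clean and analyze" more concrete, the following Python sketch counts a few common data-quality problems (missing values, exact duplicates and out-of-range values) in a small batch of records and then drops them. The `Reading` record type, its fields and the -50 to 150 °C validity range are hypothetical assumptions chosen for illustration; real pipelines apply many more checks, usually with distributed tools built for big data volumes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical validity range for a temperature reading (an assumption).
MIN_C, MAX_C = -50.0, 150.0


@dataclass(frozen=True)  # frozen makes records hashable, so a set can spot duplicates
class Reading:
    """Hypothetical sensor record; field names are illustrative only."""
    sensor_id: str
    temperature_c: Optional[float]


def is_valid(r: Reading) -> bool:
    """A reading is usable if its value is present and inside the assumed range."""
    return r.temperature_c is not None and MIN_C <= r.temperature_c <= MAX_C


def quality_report(readings):
    """Count missing values, exact duplicates and out-of-range values."""
    missing = sum(1 for r in readings if r.temperature_c is None)
    duplicates = len(readings) - len(set(readings))
    out_of_range = sum(
        1 for r in readings if r.temperature_c is not None and not is_valid(r)
    )
    return {"missing": missing, "duplicates": duplicates, "out_of_range": out_of_range}


def clean(readings):
    """Keep the first copy of each valid record, preserving input order."""
    seen, kept = set(), []
    for r in readings:
        if is_valid(r) and r not in seen:
            seen.add(r)
            kept.append(r)
    return kept


batch = [
    Reading("a", 21.5),
    Reading("a", 21.5),   # exact duplicate
    Reading("b", None),   # missing value
    Reading("c", 900.0),  # outside the assumed range
    Reading("d", 19.0),
]
print(quality_report(batch))  # {'missing': 1, 'duplicates': 1, 'out_of_range': 1}
print(len(clean(batch)))      # 2
```

Even in this toy batch, three of five records are unusable, which hints at how volume, velocity and variety multiply the cleaning effort before any value can be extracted.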
References:
Cai, L., & Zhu, Y. (2015). The challenges of data quality and data quality assessment in the big data era. Data Science Journal, 14, 2. https://doi.org/10.5334/dsj-2015-002
Taylor, D. (2022, March 26). What is big data? Introduction, types, characteristics, examples. Guru99. Retrieved May 16, 2022, from https://www.guru99.com/what-is-big-data.html
Tyagi, V. (2019, January 10). 5 V’s of big data. GeeksforGeeks. https://www.geeksforgeeks.org/5-vs-of-big-data/