The sustained success of Internet-based companies such as Google, Facebook and Amazon provides evidence that there is a fourth factor of production in today’s hyper-connected world. Besides labour, resources and capital, information has undoubtedly become an essential element of competitive differentiation.
Not just Internet companies, but companies in every sector are now looking to trade gut feeling for accurate, data-driven insight in order to make effective business decisions. No matter what problem needs attention, whether anticipated sales volumes, customer product preferences or optimised work schedules, it is data that now has the power to help businesses succeed. Big data includes information garnered from social media, data from internet-enabled devices (including smartphones and tablets), machine data, video and voice recordings, and the continued preservation and logging of structured and unstructured data. It is typically characterised by four Vs: Volume, Variety, Velocity and Veracity.
Volume: Volume seems to head every list when we talk about big data and analytics. It is generally understood that the amount of data being created is vast compared with traditional data sources. There is also broad agreement that data measured in gigabytes is probably not big data, whereas data at the terabyte or petabyte level and beyond may very well be. For example, Wal-Mart records more than 1 million customer transactions per hour, feeding databases estimated to hold more than 2.5 petabytes of data.
Variety: Variety refers to the different formats of data, collated from different sources, that do not lend themselves to storage in structured relational database systems. This covers data created by machines as well as by people. Unstructured or semi-structured data, including data generated by sensors, devices, RFID tags, machine logs, cell phone GPS signals, DNA analysis devices and more, is said to account for around 90% of the data in organisations.
Velocity: Velocity describes data in motion, for example the stream of readings taken from a sensor or the web log of page visits and clicks by each visitor to a website. This large amount of data, arriving at high speed, needs to be captured, stored and analysed. Data is generated extremely fast, in a process that never stops, even while we sleep. Another dimension of velocity is the time frame for which data remains valuable, since data changes rapidly and may lose its meaning and importance. Real-time analytics is a further aspect: the IT architecture for capture, analysis and deployment must support real-time turnaround (in the website example, a fraction of a second) and must do so consistently across thousands of new visitors each minute.
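To make the idea of data losing value over time more concrete, here is a minimal sketch in Python of a sliding-window counter that only keeps page visits from the last 60 seconds. The class name, the 60-second window and the sample timestamps are assumptions chosen for illustration, and the sketch assumes events arrive in time order; a production system would more likely rely on a dedicated stream-processing platform.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the most recent `window_seconds` seconds.

    Illustrative only: assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Register one event (e.g. a page visit) at `timestamp` seconds."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        """Return how many recorded events are still inside the window."""
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now):
        # Drop events that are older than the window; they are no longer
        # relevant to a "right now" view of the traffic.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


# Hypothetical visit times, in seconds since the start of observation.
counter = SlidingWindowCounter(window_seconds=60.0)
for visit_time in (0, 10, 30, 65):
    counter.record(visit_time)

print(counter.count(now=65))
print(counter.count(now=100))
```

Because old events are discarded as new ones arrive, memory use depends on the traffic inside the window rather than on the full history, which is the kind of trade-off real-time analytics tends to require.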
Veracity: The quality of the data captured for big data processing varies significantly, and poor-quality data naturally affects the accuracy of any analysis built on it. Since data is gathered from a large number of sources, the importance of testing its veracity cannot be stressed enough. Previously, it may have seemed virtually impossible to quality-check and verify data at this scale and to reject erroneous or anomalous records. Now, however, tools such as ‘grep’ and ‘awk’ make it possible to filter out much of what is ‘dodgy’ before it gets into the analytical mix.
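The same kind of filtering that ‘grep’ and ‘awk’ perform on the command line can be sketched in a few lines of Python. The example below is only an illustration: the record fields (sensor_id, value), the plausible range of -50 to 60 and the sample records are all assumptions, not part of any particular dataset or tool.

```python
def is_valid_reading(record, low=-50.0, high=60.0):
    """Return True if the record has a sensor id and a plausible numeric value.

    The field names and the low/high bounds are hypothetical.
    """
    if not record.get("sensor_id"):
        return False
    try:
        value = float(record.get("value"))
    except (TypeError, ValueError):
        # Missing or non-numeric values are treated as bad data.
        return False
    return low <= value <= high


def split_by_quality(records):
    """Separate records into (clean, rejected) lists."""
    clean, rejected = [], []
    for record in records:
        if is_valid_reading(record):
            clean.append(record)
        else:
            rejected.append(record)
    return clean, rejected


# Made-up sample records for illustration.
sample = [
    {"sensor_id": "s1", "value": "21.5"},  # looks fine
    {"sensor_id": "s2", "value": "n/a"},   # not a number
    {"sensor_id": "", "value": "19.0"},    # missing sensor id
    {"sensor_id": "s3", "value": "999"},   # outside the plausible range
]

clean, rejected = split_by_quality(sample)
print(len(clean), "clean,", len(rejected), "rejected")
```

Keeping the rejected records in their own list, rather than silently dropping them, makes it possible to inspect why data failed the check and whether the rules themselves need adjusting.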
This has been a brief rundown of the four Vs that characterise big data analytics.
Author: Richa Kapoor
Header Image: EY Insights on big data
Body Image: cloudfront.net