Real-Time Solution Requirements
The use cases described previously, gaining insight from stored data and from real-time data, share a similar goal: extracting value from data. On closer inspection, however, there are significant reasons to treat the solutions required for each category as distinct. These reasons become apparent when we analyze solution requirements according to the three V's of Big Data:
| Big Data Solution Requirements | Volume | Velocity | Variety |
|---|---|---|---|
| Stored Big Data | Periodic analysis of up to multi-petabyte data sets | Data is stored – pace of ingest is not relevant | Ingest structured and unstructured data |
| Real-time Big Data | Instant analysis of up to multi-terabyte data sets | Data is off-the-wire and can require extremely rapid pace of data ingest | Ingest primarily unstructured data |
To truly understand these solution requirements, it is first essential to understand the difference between real-time querying and real-time data. Real-time querying refers to the speed at which a solution delivers an answer to the question asked. Real-time data refers to the freshness of the data being analyzed.
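The distinction can be made concrete by measuring the two quantities separately. The following is a minimal Python sketch, not taken from any particular product: the `Event` class and `run_query` function are hypothetical names introduced here for illustration. It reports how long a query took (real-time querying) alongside how old the newest visible record is (real-time data).

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    created_at: float  # seconds since the epoch when the event occurred
    payload: str


def run_query(events, predicate, now=None):
    """Return (matches, query_latency, data_age) for a simple filter query.

    query_latency: how long the query took to answer (real-time querying).
    data_age: how old the newest visible event is (real-time data),
              or None if there are no events at all.
    """
    start = time.perf_counter()
    matches = [e for e in events if predicate(e)]
    query_latency = time.perf_counter() - start

    if now is None:
        now = time.time()
    data_age = now - max(e.created_at for e in events) if events else None
    return matches, query_latency, data_age


if __name__ == "__main__":
    # A store loaded an hour ago answers quickly but over stale data.
    loaded = time.time() - 3600
    store = [Event(loaded, "login"), Event(loaded, "error")]
    hits, latency, age = run_query(store, lambda e: e.payload == "error")
    print(f"matches={len(hits)} latency={latency:.6f}s data_age={age:.0f}s")
```

A system can score well on one measure and poorly on the other: a fast query over data loaded an hour ago has low latency but high data age.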
Solutions geared towards analyzing stored Big Data, most of which are built around Hadoop, can strive to offer real-time queries, but they are fundamentally not built to analyze real-time data. They are generally capable of periodic querying (and sometimes attempt real-time querying) of very large volumes of stored data, but they are not designed for high-velocity data ingest.
Solutions geared towards analysis of real-time Big Data require fundamentally different capabilities. Real-time Big Data consists primarily of machine data, defined as all of the data generated by applications, servers, network and security devices, virtualization infrastructure, sensors, meters, and every other component of an organization's IT environment. Machine data contains an absolute and authoritative record of all the events that occurred (including user activity, transaction details, application and system behavior, network and security anomalies, and much more), and according to IDC it will comprise over 40% of the overall volume of Big Data by the year 2020.
Analysis of real-time machine data requires not only real-time querying, but also the ability to ingest an extremely high velocity of data and the ability to make sense of unstructured data. For today’s enterprise, making sense of real-time machine data is absolutely essential, as acting on current information is the only way to ensure operational efficiency and business health.
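As a rough illustration of these requirements, the Python sketch below parses raw log lines into structured events as they arrive and keeps running counts over a sliding time window, so a query never has to rescan stored data. Everything in it is an assumption made for the example: the line layout (`<timestamp> <host> <LEVEL> <message>`), the names `parse_line` and `SlidingWindowCounter`, and the premise that timestamps arrive in non-decreasing order. Real machine data is far more varied.

```python
import re
from collections import Counter, deque

# Assumed (hypothetical) line layout: "<unix_ts> <host> <LEVEL> <free text>"
LINE_RE = re.compile(
    r"^(?P<ts>\d+(?:\.\d+)?)\s+(?P<host>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Turn one raw log line into a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ts": float(m["ts"]),
        "host": m["host"],
        "level": m["level"],
        "msg": m["msg"],
    }


class SlidingWindowCounter:
    """Counts events per level over the most recent `window_seconds`.

    Assumes lines arrive in non-decreasing timestamp order.
    """

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()
        self.counts = Counter()

    def ingest(self, line):
        """Parse and record one line; return False if it was unparseable."""
        event = parse_line(line)
        if event is None:
            return False
        self.events.append(event)
        self.counts[event["level"]] += 1
        self._expire(event["ts"])
        return True

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0]["ts"] <= now - self.window:
            old = self.events.popleft()
            self.counts[old["level"]] -= 1
            if self.counts[old["level"]] == 0:
                del self.counts[old["level"]]


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=60)
    for raw in [
        "100 web01 ERROR disk full",
        "105 web01 INFO request served",
        "170 db01 ERROR connection timeout",
    ]:
        counter.ingest(raw)
    print(dict(counter.counts))
```

Updating the counts at ingest time is the design choice that matters here: the answer reflects each event as soon as it arrives, rather than waiting for a periodic batch job.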