What is Big Data and how is it different from other data?
Big Data is a term that has been buzzing through the media, social media, the business world, and the technology industry for years, but not everyone knows exactly what distinguishes it from plain data. The term describes not only the sheer size of massive data sets but also their other telltale characteristics, which have become known as the 3Vs of Big Data:
Velocity: very fast, unpredictable streaming speeds with changing rates and surges of activity.
Volume: enormous quantities of data (terabytes, petabytes, and more!).
Variety: data in many different formats, whether structured, loosely structured, or unstructured.
In other words, Big Data builds up extremely quickly in this digitalized world, surrounding us daily in various forms: e-mails, video files, SMS messages, tweets, texts, MP3 files, PDFs, web pages, elevator logs, Word documents, analog data, metadata, images, log files, and so on. Conversely, run-of-the-mill data in the traditional sense can be thought of as bits of information (e.g., numbers, words, or text) or sets of quantitative or quantifiable values that can be represented in the following structures: tables with rows and columns, tree diagrams that branch off from the origin to other related points, or graphs connecting points together. This regular structured data (information that exists in permanent or set fields within records or files) can be easily entered, stored, managed, accessed, queried, and analyzed in neatly organized tables of rows and columns within relational databases.
On the other hand, with its signature 3V traits, Big Data accumulates so rapidly from a plethora of sources (web logs, social networks, Internet traffic, print streams, document archives, and data stores such as Hadoop, Redshift, MongoDB, and Cassandra, just to name a few) that it becomes nearly impossible to manage in traditional relational databases. Not only does this make Big Data management and storage vastly different from the normal (or structured) data that most people are accustomed to handling, but it also means that organizations now require powerful, integrated solutions for making this information usable for business analytics practices (think Big Data analytics).
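To make the contrast between structured and less structured data a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the orders table, the sample events, and the elevator log line are made-up examples, not data from any real system. It simply shows why fixed fields fit a relational table so neatly, while records with varying fields or free text do not.

```python
import json
import sqlite3

# Structured data: every record has the same fixed fields, so it fits
# a relational table and can be queried with SQL.
# (The "orders" table and its rows are hypothetical sample data.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 4.5)],
)
alice_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("alice",)
).fetchone()[0]

# Loosely structured data: JSON events whose fields differ from one
# record to the next, so no single table layout fits them all.
events = [
    json.loads('{"type": "tweet", "user": "alice", "text": "hello"}'),
    json.loads('{"type": "sms", "from": "bob", "body": "call me", "length": 7}'),
]
field_sets = [sorted(event.keys()) for event in events]

# Unstructured data: free text with no fields at all; the only option
# here is to search or parse the raw content.
log_line = "elevator 4 stopped at floor 12 after door sensor fault"
mentions_fault = "fault" in log_line.lower()

print(alice_total)
print(field_sets)
print(mentions_fault)
```

The structured rows answer a question with a single query, whereas the JSON events and the log line each need their own handling. Multiply that by the volume and velocity described above and the case for dedicated Big Data tooling becomes easier to see.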