Introduction to Big Data

About Big Data

Big data is a common topic in the business intelligence world, and you have probably heard the term often. But how does it differ from small data?

Definition

Data experts typically define big data by the “Three V’s”: Volume, Variety, and Velocity. These are the properties that distinguish big data from small data. Veracity, Data Location, and Infrastructure are further aspects on which the two differ.

Volume

Data volume is the sheer amount of data you have to process. The term big data usually describes massive, often unstructured collections of information, whereas small data consists of smaller, more precisely scoped files.

Variety

Data variety refers to the number of different data types involved. Big data often combines multiple types of data, while small data tends to focus on a single type.

Velocity

Data velocity describes the speed at which information is acquired and processed. Big data often involves such large amounts of information that it can only be analyzed periodically, whereas small data is typically compact enough to be processed quickly, often in real time.
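The contrast between periodic and real-time processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RunningMean`, `batch_mean`, and the sample `readings` are hypothetical names and values, not part of any particular tool. The batch function needs the whole dataset before it can answer, while the running version is updated once per incoming value.

```python
class RunningMean:
    """Mean that is updated incrementally, one value at a time (hypothetical helper)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        # Incremental update: no need to keep or re-read earlier values.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def batch_mean(values):
    """Mean computed in one pass over a complete batch of values."""
    return sum(values) / len(values)


# Made-up sample readings, for illustration only.
readings = [10.0, 20.0, 30.0, 40.0]

stream = RunningMean()
for reading in readings:
    stream.update(reading)  # result is available after every single value

print(stream.mean, batch_mean(readings))
```

Both approaches arrive at the same value here. The difference is when the answer becomes available: after each value, or only once the full batch has been collected.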

Veracity

Data veracity describes the quality of data. Big data is not guaranteed to arrive in a single format or at a consistent level of quality, so rigorous data validation is required before processing. Small data usually contains less noise because it is collected in a controlled manner.
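As a rough illustration of validating data before processing, the sketch below checks each record against an expected set of fields and types and separates the rejects. The `SCHEMA`, the field names, and the sample records are assumptions made up for this example; real pipelines typically apply much richer checks.

```python
# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA = {"user_id": int, "amount": float, "country": str}


def is_valid(record):
    """Return True if the record has every schema field with the expected type."""
    return all(
        field in record and isinstance(record[field], expected)
        for field, expected in SCHEMA.items()
    )


def split_valid(records):
    """Partition records into (valid, rejected) lists."""
    valid, rejected = [], []
    for record in records:
        (valid if is_valid(record) else rejected).append(record)
    return valid, rejected


# Made-up sample records, for illustration only.
records = [
    {"user_id": 1, "amount": 9.99, "country": "DE"},
    {"user_id": "2", "amount": 5.0, "country": "FR"},  # user_id has the wrong type
    {"user_id": 3, "country": "US"},                   # amount is missing
]

valid, rejected = split_valid(records)
print(len(valid), len(rejected))
```

Keeping the rejected records, rather than silently dropping them, makes it possible to inspect why they failed and to measure how noisy a source is.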

Data Location

Because of its sheer volume, big data is mostly kept on distributed storage in the cloud or on external file systems. Small data, by contrast, is usually stored within the enterprise, for example on local servers.

Infrastructure

Small data can be processed on an individual computer, such as a laptop or workstation. This isn’t the case with big data, which is usually split up and processed in parallel across multiple high-end servers.
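The idea of processing data on several machines at once can be sketched as a split, process, and merge pattern. In the minimal example below, threads on a single machine stand in for separate servers, which is a simplifying assumption; the function names (`partition`, `count_words`, `distributed_word_count`) and the sample lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts slices, assigning items round-robin."""
    return [items[i::n_parts] for i in range(n_parts)]


def count_words(lines):
    """Work done by one worker: count the words in its own partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(lines, n_workers=3):
    """Count words per partition in parallel, then merge the partial results."""
    parts = partition(lines, n_workers)
    # Threads stand in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Made-up sample input, for illustration only.
lines = ["big data small data", "data volume", "big volume"]
totals = distributed_word_count(lines)
print(totals["data"], totals["big"], totals["volume"])
```

Each worker only ever sees its own slice of the input, which is what allows the same pattern to scale to data that does not fit on one machine.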