Big Data - Challenges and Opportunities
From a technical standpoint, Big Data can be defined as data that is so large, complex and fast-changing that traditional data management tools cannot store and process it efficiently.
The term "big data" or "big data technologies" therefore refers to the technologies that can store, manage and analyse large data sets to solve complex problems. Typical magnitudes are petabytes and exabytes of data.
Similarly, the term "big data analytics" refers to the process of examining large data sets to uncover unknown correlations and hidden patterns. Using "big data technologies" to perform "big data analytics" has come into the limelight because it has become economically relevant for businesses, governments and consumers.
Categories of Big Data
"Big Data" can be categorized into:
- Structured Data
- Unstructured Data
- Semi-structured Data
Structured Data is the easiest to work with when developing techniques for data processing. Examples of structured data include normalized data stored in a DBMS and text files with a fixed, well-defined format. Unstructured Data is the most difficult to work with. It is not trivial to extract syntactic and semantic meaning from it, especially when the data is generated by disparate systems. Examples of unstructured data include social media feeds and transcripts of call-center interactions. Semi-structured Data (for example XML or JSON) provides some structure but otherwise has the characteristics of unstructured data.
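To make the three categories concrete, here is a minimal Python sketch that reads the same piece of information from each kind of data. The sample strings, the field names and the regular expression are made up for illustration only.

```python
import csv
import io
import json
import re

# Structured: fixed columns, so a standard parser recovers every field.
structured = "customer_id,product,rating\n42,router,2\n"
row = next(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields may be nested or optional.
semi_structured = '{"customer_id": 42, "review": {"product": "router", "tags": ["slow"]}}'
doc = json.loads(semi_structured)

# Unstructured: free text; a field has to be inferred, here with a naive pattern.
unstructured = "Customer 42 called to say the router keeps dropping the connection."
match = re.search(r"Customer (\d+)", unstructured)

print(row["customer_id"])                 # field read by column name
print(doc["review"]["product"])           # field read by key path
print(match.group(1) if match else None)  # field guessed from the text
```

The first two lookups rely on structure that the data itself carries. The third relies on an assumption about how people phrase things, which is why unstructured data is the hardest category to process reliably.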
Characteristics of Big Data
Big data sets share some common characteristics:
- Volume: The name "big data" implies that data sizes will be huge
- Variety: Heterogeneous sources of data
- Velocity: Speed of generation of data
- Variability: Inconsistent and missing data
- Veracity: Trustworthiness and accuracy of data
- Value: Low value of data, relative to volume
Big Data Challenges and Opportunities
Implementing Big Data technologies comes with its own set of challenges. Each challenge that is overcome unlocks potential in the data, which makes it an opportunity for any technologist.
Challenge #1
More than 90% of the data that is created is unstructured. The challenge is therefore to convert unstructured data into structured data before an attempt can be made to understand it.
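As a rough illustration of this conversion, the sketch below turns a free-text call-center transcript into a record with fixed fields. The pattern `ORDER_RE`, the word list `NEGATIVE_WORDS` and the function `to_record` are hypothetical; real transcripts vary far more and usually need proper natural-language processing rather than a keyword match.

```python
import re

# Hypothetical pattern and keyword list, for illustration only.
ORDER_RE = re.compile(r"order\s+#?(\d+)", re.IGNORECASE)
NEGATIVE_WORDS = {"broken", "late", "refund", "cancel"}

def to_record(transcript: str) -> dict:
    """Turn one free-text transcript into a flat record with fixed fields."""
    order = ORDER_RE.search(transcript)
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    return {
        "order_id": int(order.group(1)) if order else None,
        "complaint": bool(words & NEGATIVE_WORDS),
        "length_chars": len(transcript),
    }

record = to_record("Hi, my Order #1234 arrived broken and I would like a refund.")
print(record)
```

Once every transcript has the same fields, the records can be loaded into a table and queried like any other structured data.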
Challenge #2
How to efficiently store big data so that it can be retrieved in a timely manner.
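One common approach to this problem is to partition the data by key, so that a lookup only touches a small part of the whole data set. The sketch below is a toy in-memory version of that idea. The partition count and the `put` and `get` helpers are assumptions made for illustration, not the API of any particular storage system.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed value; real systems use many more

def partition_for(key: str) -> int:
    """Map a key to a partition with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

partitions = defaultdict(dict)  # stand-in for separate storage nodes

def put(key: str, value: dict) -> None:
    partitions[partition_for(key)][key] = value

def get(key: str):
    # Only one partition is consulted, regardless of total data size.
    return partitions[partition_for(key)].get(key)

put("customer-42", {"name": "Ada"})
put("customer-43", {"name": "Grace"})
print(get("customer-42"))
```

Because the hash is stable, the same key always maps to the same partition, so retrieval time depends on the size of one partition rather than on the total volume.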
Challenge #3
How to integrate disparate sources (Email, Social Media, CRM etc.) to create a single coherent data source.
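A minimal sketch of such an integration is shown below, assuming that every source carries an e-mail address that can serve as a shared key. The three sample extracts and their field names are invented for illustration. In practice, finding a reliable join key is often the hardest part.

```python
from collections import defaultdict

# Hypothetical extracts from three systems; field names differ per source.
crm = [{"Email": "Ada@Example.com", "full_name": "Ada Lovelace"}]
email_log = [{"sender": "ada@example.com", "subject": "Invoice question"}]
social = [{"contact_email": "ada@example.com ", "handle": "@ada"}]

def normalize(email: str) -> str:
    """Use a cleaned e-mail address as the join key across sources."""
    return email.strip().lower()

unified = defaultdict(dict)

for row in crm:
    unified[normalize(row["Email"])]["name"] = row["full_name"]
for row in email_log:
    unified[normalize(row["sender"])].setdefault("subjects", []).append(row["subject"])
for row in social:
    unified[normalize(row["contact_email"])]["handle"] = row["handle"]

print(dict(unified))
```

All three sources end up under one key, giving a single view of the customer even though each system spells the address and names its fields differently.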
Challenge #4
How to validate similar data from different data sources.
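One simple form of validation is to compare the same field across sources and flag the fields on which they disagree. The sketch below assumes the records have already been matched to one entity. The function `find_conflicts` and the sample values are hypothetical.

```python
def find_conflicts(records: dict) -> dict:
    """Return fields whose non-missing values disagree across sources.

    `records` maps a source name to that source's view of one entity.
    """
    seen = {}
    for source, record in records.items():
        for field, value in record.items():
            if value is None:
                continue  # missing data is not a disagreement
            normalized = str(value).strip().lower()
            seen.setdefault(field, {}).setdefault(normalized, []).append(source)
    # Keep only fields that have more than one distinct value.
    return {field: values for field, values in seen.items() if len(values) > 1}

conflicts = find_conflicts({
    "crm":     {"city": "London", "phone": "555-0100"},
    "billing": {"city": "london ", "phone": "555-0199"},
    "support": {"city": None, "phone": "555-0100"},
})
print(conflicts)
```

Here the two spellings of the city are treated as equal after normalization, while the phone numbers are reported as a conflict together with the sources behind each value, so that a rule or a person can decide which one to trust.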
Challenge #5
How to generate insights in a timely manner (in a matter of hours or days rather than weeks).
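One way to shorten the time to insight is to update aggregates as each event arrives instead of rescanning the full data set in a periodic batch job. The sketch below keeps a running count and mean per key. The class `RunningStats` and the sample events are assumptions made for illustration.

```python
from collections import defaultdict

class RunningStats:
    """Keep a per-key count and mean that are updated one event at a time."""

    def __init__(self) -> None:
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, key: str, value: float) -> None:
        self.count[key] += 1
        # Incremental mean: no need to store or rescan earlier events.
        self.mean[key] += (value - self.mean[key]) / self.count[key]

stats = RunningStats()
for region, order_value in [("eu", 10.0), ("us", 30.0), ("eu", 20.0)]:
    # An up-to-date answer is available after every event.
    stats.update(region, order_value)

print(stats.mean["eu"], stats.count["eu"])
```

The cost of each update is constant, so the answer stays current no matter how much history has accumulated.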