Nowadays, we are drowning in data: a huge amount of data is generated daily from various sources (government, health, institutes, industry, media, social networks, mobile networks and so on). Among these sources, data is generated at an unprecedented rate by online and offline social networks such as Facebook, LinkedIn, Twitter and Orkut, as well as by telephone call networks, disease infection networks, sensor networks and the like. Data is everywhere, and we are entering the age of Big Data.
Big Data is not a framework, a language or a technology; it is a problem statement. Until now, data has been measured and handled in bytes (B), kilobytes (KB), megabytes (MB), gigabytes (GB) and terabytes (TB). The age of Big Data deals with data ranging from terabytes (TB) through petabytes (PB), exabytes (EB) and zettabytes (ZB) to yottabytes (YB), and traditional data analytics may not be able to handle such large quantities. The term Big Data therefore refers to all the data that is being generated across the globe at an unprecedented rate.
This data may be structured, semi-structured or unstructured. Owing to the rapid spread of information technology, most data today is born digital and exchanged over the Internet. For example, Google processes hundreds of petabytes (PB) of data, Facebook generates over 10 PB of log data per month, Baidu, a Chinese company, processes tens of PB of data, and Taobao, a subsidiary of Alibaba, generates tens of terabytes (TB) of online trading data per day [1]. For such large collections of data, also called Big Data, the most important issue we face is privacy preservation.
Among the various sources of Big Data, social media and social networks are among the most representative and relevant. For example, on average, 72 hours of video are uploaded to YouTube every minute [7]. We therefore face the main challenge of collecting and integrating massive data from widely distributed data sources and then ensuring the privacy of that data.