We are on the cusp of a “Big Data” Revolution. Increasingly large datasets are being mined for important predictions and often surprising insights. We are witnessing merely the latest stage of the Information Revolution that has transformed our society and our lives over the past half century. But the big data phase of the revolution promises (or threatens, depending on one’s perspective) a greater scale of social change at an even greater speed. The scale of the Big Data Revolution is such that all kinds of human activities and decisions are beginning to be influenced by big data predictions, including dating, shopping, medicine, education, voting, law enforcement, terrorism prevention, and cybersecurity. This transformation is comparable to the Industrial Revolution in the ways our pre–big data society will be left radically changed.
The potential for social change means that we are now at a critical moment; big data uses today will be sticky and will settle both default norms and public notions of what is “no big deal” regarding big data predictions for years to come. Individuals have little idea concerning what data is being collected, let alone shared with third parties. Existing privacy protections focused on managing personally identifying information are not enough when secondary uses of big data sets can reverse engineer past, present, and even future breaches of privacy, confidentiality, and identity. Many of the most revealing personal data sets such as call history, location history, social network connections, search history, purchase history, and facial recognition are already in the hands of governments and corporations. Further, the collection of these and other data sets is only accelerating.
As the amount and variety of data continue to grow, defining the catchall term “big data” can be elusive. Technical definitions of big data are often narrowly constrained to describe “data that exceeds the processing capacity of conventional database systems.” Technologists often use the technical “3-V” definition of big data as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” Peter Mell, a computer scientist with the National Institute of Standards and Technology, similarly constrains big data to “[w]here the data volume, acquisition velocity, or data representation limits the ability to perform effective analysis using traditional relational approaches or requires the use of significant horizontal scaling for efficient processing.”
We prefer to define big data and big data analytics socially, rather than technically, in terms of the broader societal impact they will have. Mayer-Schönberger and Cukier define big data as referring “to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more.” We have some reservations about using the term “big data” at all, as it can exclude important parts of the problem, such as decisions made on small data sets, or focus us on the size of the data set rather than the importance of decisions made based upon inferences from data. Perhaps “data analytics” or “data science” are better terms, but in this paper we will use the term “big data” (to denote the collection and storage of large data sets) and “big data analytics” (to denote inferences and predictions made from large data sets) consistent with what we understand the emerging usage to be.
In a prior article, we argued that nontransparent collection of small data inputs enables big data analytics to identify, at the expense of individual identity, and empower institutions that possess big data capabilities. In this paper, we argue that big data, broadly defined, is producing increased powers of institutional awareness and power that require the development of Big Data Ethics. We are building a new digital society, and the values we build or fail to build into our new digital structures will define us. Critically, if we fail to balance the human values that we care about, like privacy, confidentiality, transparency, identity, and free choice, with the compelling uses of big data, our big data society risks abandoning these values for the sake of innovation and expediency.





