Considering Big Data Analysis as a Social Science
Driverless vehicles use a combination of weak artificial intelligence (AI) technologies. Weak AI, also known as narrow AI, is non-sentient machine intelligence focused on a specific task. Strong AI, for the purposes here, is a still-theoretical machine able to think as flexibly as humans. The self-propelled home vacuum is a good example of weak AI. As it vacuums, bumping into walls and furniture, it learns the layout of the home and eventually stops getting stuck in corners or falling down stairs. Siri is another good example of weak AI. Siri is a machine navigation system that learns the operator’s individual language usage (and search preferences) to return results from the cloud that are individualized to its user. Like driverless vehicles, Siri is a hybrid form of AI in that it combines narrow AI technologies with the massive amounts of data available in the cloud.
The massive amounts of information that the Delphi car collected can be described as “big data”. Big data is a broad term for data sets so large and so complex that traditional processing applications are inadequate. And we have a lot of it. Every day, 2.5 exabytes of data (2.5 million terabytes) are created, and that data takes multiple forms, such as video, images, and text. Data scientists are currently working out new ways to use big data to answer longstanding, troublesome questions. If you are still with me, this is the important part! According to Forbes, big data analysis was the fastest-growing field last year, with demand growing at a rate of 123%. A 2011 McKinsey report estimates that there will be between 140,000 and 190,000 unfilled data analytics positions by 2018. Every industry, including those that hire social scientists, needs data analysts.
In a December 2013 article for Communications of the ACM, Vasant Dhar says that “a data scientist requires an integrated skill set spanning mathematics, machine learning, artificial intelligence, statistics, databases, and optimization, along with a deep understanding of the craft of problem formulation to engineer effective solutions.” He goes on to outline three primary skill sets that data scientists need: 1) statistics, specifically a working knowledge of hypothesis testing and multivariate analysis; 2) computer science, specifically how data is internally represented; and 3) knowledge about correlation and causation. Social scientists certainly teach all of these skills, but are we doing so with specific jobs in mind, or with potential graduate school applications in mind? While this emerging field is still working out the full implications of how big data can inform research questions, most graduate programs in data science target questions in business and marketing. Why not social sciences, public health, education, politics, and media, among others?
Social scientists are notably data driven, in both quantitative and qualitative work. Big data has room for everyone. The trick is to stay current and understand how what we teach relates to these broader fields. Unfortunately, it is absolutely necessary to justify our continued existence (ahem, funding) in the academy. We must think of ourselves as contributors to emerging fields that require cross-disciplinary knowledge and deep interpretation. In my next post, I will delve more deeply into the uses of big data for social science research and the questions that big data analytics can answer.
“In God we trust; all others must bring data” – credited to W. Edwards Deming
Further Reading:
Dhar, V. (2013). Data science and prediction. Communications of the ACM, 56(12), 64-73.
Dilsizian, S. E., & Siegel, E. L. (2014). Artificial intelligence in medicine and cardiac imaging: Harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Current Cardiology Reports, 16(1), 1-8.
Ferrucci, D. A., Levas, A., Bagchi, S., Gondek, D., & Mueller, E. T. (2013). Watson: Beyond Jeopardy! Artificial Intelligence, 199, 93-105.
Issenberg, S. (2012, December). A more perfect union: How President Obama’s campaign used big data to rally individual voters. MIT Technology Review.
Markoff, J. (2011). Computer wins on ‘Jeopardy!’: Trivial, it’s not. New York Times, 16.