This is Part II of a series of blog posts on Evidence-based Policymaking in the Public Sector. You can find Part I here.

Decision-makers across the public and private sectors need accurate and timely information on economic activity for effective action and interventions. At present, however, the information available is highly fragmented, privately held, and low in accuracy, making coordination among economic actors difficult. The cost of acquiring information and improving its accuracy is borne privately, leading to asymmetry in the marketplace and an increase in transaction costs as well as the discount rate for future transactions. As an alternative to these information sources, Big Data — high-volume, high-frequency, high-resolution data gathered from a range of sources and actors — can thus be a useful solution for policymakers across the world.

What is Big Data?

Big Data refers, essentially, to data that is so large and detailed that it requires dedicated tools and techniques to store, process, and analyse. Weather information beamed down from satellites, hospital records, customer sales data, or data from your health-tracking smartwatch are all examples of Big Data. Such data differs from traditional data in at least three important ways, most widely summarised as the “3Vs”: the volume of information collected from across a variety of sources; the velocity, or the frequency at which data is collected, stored, and made available for utilisation; and the variety, or range, of information that can possibly be generated.

Big Data is everywhere around us today, largely because of steep improvements in computational power and corresponding leaps in technological solutions, including storage and processes such as machine learning and natural language processing, which can now utilise such data to generate complex and advanced insights.

Illustrative Data and Applications

Big Data is increasingly being used to drive policy design and decision-making processes, be it the use of nightlights to estimate GDP or of Google searches for jobs to predict the level of unemployment. Recently, researchers have also begun combining big data from multiple sources to estimate or predict economic activity. Today, there even exist products on the market that use satellite data to predict indices of specific economic activities, such as manufacturing in China or ship movements in marine channels, but these are highly specialised and too costly for most users and decision-makers.
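To make the nightlights example concrete, the sketch below fits a simple log-log regression of GDP on night-time light intensity. Everything here (the data, the district setting, the elasticity) is synthetic and purely illustrative; an actual analysis would use satellite radiance composites and official GDP statistics.

```python
# Illustrative sketch: proxying regional GDP with night-time light intensity.
# All data below are synthetic placeholders, not real observations.
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical district-level night-light radiance (arbitrary units).
luminosity = rng.uniform(1.0, 50.0, size=200)

# Synthetic "true" relationship: GDP grows log-linearly with luminosity.
log_gdp = 2.0 + 0.8 * np.log(luminosity) + rng.normal(0.0, 0.2, size=200)

# Fit the elasticity of GDP with respect to luminosity via least squares.
elasticity, intercept = np.polyfit(np.log(luminosity), log_gdp, 1)
print(f"Estimated elasticity: {elasticity:.2f}")  # ~0.8 by construction

# Use the fitted model to estimate GDP for a district observed only from space.
predicted_gdp = np.exp(intercept + elasticity * np.log(30.0))
print(f"Predicted GDP index for luminosity 30: {predicted_gdp:.1f}")
```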

Governments use big data to study citizen and employee behaviour as well as internal operations, and to identify possible process inefficiencies and even redundancies, which can then be swiftly addressed. Singapore, for instance, uses advanced analytics techniques to monitor over 12 million transactions every single day and operate its transport system effectively. Similarly, in the United States, research with the US Department of Veterans Affairs showed that the automation and redesign of its services through “Big Data applications reduce[d] offline and administrative processing, and optimise[d] current functions”. In general, policymakers can use insights drawn from such data to improve existing services, design new ones, and even customise policy solutions to suit recipients’ needs.

Big Data has also helped governments effectively detect and reduce errors, irregularities, and fraud. For example, as long ago as 2011, the German Federal Labour Agency reduced fraudulent benefit claims by 20% by incorporating analytics tools. Similarly, the Irish Tax and Customs Authority has been using Big Data with success to “develop predictive models to assist in the better targeting of taxpayers for possible non-compliance/tax evasion, and liquidation”.

This approach to policymaking can bring down costs, as policymakers can use data-driven approaches to identify and implement the most promising interventions. Further, ongoing analyses of these interventions can help them innovate upon their offerings — an example of “design thinking” in action — thus leading to compounding improvements in policy efficiency over time. Finally, incorporating citizen preferences, feedback, or responses into an iterative policymaking process may also serve to bolster the government’s perceived legitimacy in the eyes of the public, especially among historically marginalised communities that have traditionally lacked representation in policymaking processes or bodies. This lends Big Data the added advantage of inclusivity and diversity.

Some countries have made significant strides in gathering, studying, and applying data for governance — and consequently stepping up the effectiveness of their policy processes — through dedicated initiatives. Examples include the European Statistics Office’s Big Data Group, the Big Data Project established under the United Kingdom’s Office for National Statistics, and the New York Mayor’s Office of Data Analytics. Thanks to these efforts, there is even a new term to describe the use of data analytics for public policy: “policy analytics”.

 


Figure 1: Benefits of Big Data in the policy cycle as described by Pencheva et al.

In India, cognisant of the rewards of digitalisation, the government has undertaken multiple important steps towards embracing a more tech-oriented identity. Through what is the world’s largest telecentre scheme, it launched the Common Services Centres (CSCs), which provide e-governance, banking, education, healthcare, legal, and a variety of other services to the remotest of the country’s villages. Over the past decades, it has made significant investments in the ICT network and its health across the country. Demand for technology has burgeoned too: India has the second-highest number of both mobile phones and internet users in the world. These facts are evidence of the rich potential for the use of Big Data, and indeed, the central as well as various state governments have been working towards this end through initiatives such as the Kerala government’s move to track water supply and distribution in Thiruvananthapuram, the Andhra Pradesh government’s real-time monitoring system for departmental performance, and the development of a Big Data-sharing platform to allow inter-departmental data sharing in Rajasthan.

Methods

For the promise of Big Data to be realised, it is critical that stakeholders be able to apply analytical tools to the data. Indeed, researchers note that “without the underlying analytical technology, the data revolution can be viewed simply as a shift in the scale of the available data rather than a transformational change.”

Simply put, it is one thing to collect large volumes of data, no matter how interesting or potentially useful, but entirely another to coax relevant, useful, and potentially actionable insights from them. Here, data acquisition and warehousing within a spatially-explicit cyber architecture can allow for meaningful combination of data across formats and at multiple spatial/temporal resolutions.
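As a toy illustration of what such an architecture enables, the sketch below (with entirely hypothetical data and column names) aggregates a daily district-level mobile-activity series to the monthly level so that it can be joined with a monthly survey estimate for the same district.

```python
# Sketch: aligning two hypothetical sources at different temporal resolutions.
import pandas as pd

# Synthetic daily district-level mobile activity.
daily = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=90, freq="D"),
    "district": "D1",
    "mobile_activity": range(90),
})

# Synthetic monthly survey-based employment estimates for the same district.
monthly = pd.DataFrame({
    "month": pd.period_range("2021-01", periods=3, freq="M"),
    "district": "D1",
    "employment_rate": [0.52, 0.54, 0.55],
})

# Aggregate the daily series to monthly means, then join on (district, month).
daily["month"] = daily["date"].dt.to_period("M")
agg = daily.groupby(["district", "month"], as_index=False)["mobile_activity"].mean()
combined = agg.merge(monthly, on=["district", "month"])
print(combined)
```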

Recent methodological advances such as Small Area Estimation (SAE) techniques, which are widely used in agriculture, public health, and nutrition research to draw inferences for localities, leverage the statistical power of large national surveys whose sample sizes are inadequate at the local level. Another suite of methods comprises Spatial Disaggregation (SD) techniques, primarily developed for the statistical downscaling of coarse-resolution climate or biophysical data using high-resolution local parameters. SAE and SD techniques can allow policymakers and researchers to leverage a variety of primary and secondary data at different temporal and spatial scales, thereby overcoming the flaws in individual sources of data.
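To give a flavour of the SAE idea, here is a minimal, simplified sketch of a Fay-Herriot-style composite estimator on synthetic data: each area's noisy direct survey estimate is shrunk towards a regression-based synthetic estimate, with weights determined by the relative variances. This is a stylised illustration under made-up assumptions, not a production implementation.

```python
# Simplified Fay-Herriot-style small area estimation (illustrative only).
# Direct survey estimates for small districts are noisy; we shrink them
# towards a regression ("synthetic") estimate built from auxiliary data.
import numpy as np

rng = np.random.default_rng(7)
n_districts = 50

# Hypothetical auxiliary covariate known for every district (e.g., nightlights).
x = rng.uniform(0.0, 10.0, size=n_districts)

# True district means and noisy direct estimates from small local samples.
true_mean = 5.0 + 1.5 * x + rng.normal(0.0, 1.0, size=n_districts)
sampling_var = rng.uniform(1.0, 9.0, size=n_districts)  # varies with sample size
direct_est = true_mean + rng.normal(0.0, np.sqrt(sampling_var))

# Synthetic estimate: regress the direct estimates on the covariate.
beta, alpha = np.polyfit(x, direct_est, 1)
synthetic_est = alpha + beta * x

# Crude estimate of the between-area (model) variance.
resid = direct_est - synthetic_est
model_var = max(np.mean(resid**2 - sampling_var), 0.01)

# Composite estimator: weight direct vs synthetic by relative precision.
gamma = model_var / (model_var + sampling_var)
composite = gamma * direct_est + (1.0 - gamma) * synthetic_est

print("MSE, direct estimates  :", np.mean((direct_est - true_mean) ** 2))
print("MSE, composite (shrunk):", np.mean((composite - true_mean) ** 2))
```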

Explanation vs Prediction

Another advantage of the innovative use of Big Data in the public sector can be the enhanced ability to make predictions by harnessing the power of machine learning tools. Such predictions can potentially find use in assessing the take-up or effectiveness of policies by “developing different scenarios and accurately predicting their possible outcomes”, uncovering potential externalities or unintended consequences of policy implementation, and even bolstering public safety through better hazard preparedness and response efforts, e.g., by allowing authorities to identify emerging areas of tension and pre-empt them.

In such a role, models for predictive analysis differ from those used for explanatory purposes. Explanatory models use statistical inference to test causal hypotheses (i.e., does phenomenon x lead to or cause phenomenon y?) and to evaluate the explanatory power of underlying causal models. The innate structure of these models, while expressly suited to testing hypotheses, makes them a poor fit for making predictions. These differences are elaborated in Table 1 below.

Table 1: Differences between explanatory statistical modelling and predictive analytics; drawn from Shmueli & Koppius (2011).

| Step | Explanatory | Predictive |
| --- | --- | --- |
| Analysis Goal | Explanatory statistical models are used for testing causal hypotheses. | Predictive models are used for predicting new observations and assessing predictability levels. |
| Variables of Interest | Operationalised variables are used only as instruments to study the underlying conceptual constructs and the relationships between them. | The observed, measurable variables are the focus. |
| Model Building: Optimised Function | The focus is on minimising model bias. The main risks are Type I and Type II errors. | The focus is on minimising the combined bias and variance. The main risk is over-fitting. |
| Model Building: Constraints | The empirical model must be interpretable, must support statistical testing of the hypotheses of interest, and must adhere to the theoretical model (e.g., in terms of form, variables, specification). | The model must use variables that are available at the time of deployment. |
| Model Evaluation | Explanatory power is measured by strength-of-fit measures and tests (e.g., R² and statistical significance of coefficients). | Predictive power is measured by the accuracy of out-of-sample predictions. |
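The contrast in Table 1 can be made concrete in a few lines of code. The sketch below (synthetic data, hypothetical variable names) fits the same linear relationship twice: once in an explanatory mode, reporting coefficients and their statistical significance, and once in a predictive mode, reporting the accuracy of out-of-sample predictions on a held-out test set.

```python
# Contrasting explanatory and predictive workflows on the same synthetic data.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Hypothetical policy data: programme spending and an outcome it influences.
spending = rng.uniform(0.0, 100.0, size=n)
outcome = 10.0 + 0.5 * spending + rng.normal(0.0, 8.0, size=n)

# --- Explanatory: is spending significantly associated with the outcome? ---
X = sm.add_constant(spending)   # intercept plus covariate
ols = sm.OLS(outcome, X).fit()
print(ols.params)               # coefficient estimates
print(ols.pvalues)              # statistical significance of coefficients

# --- Predictive: how well does the model predict unseen observations? ---
X_train, X_test, y_train, y_test = train_test_split(
    spending.reshape(-1, 1), outcome, test_size=0.3, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"Out-of-sample RMSE: {rmse:.2f}")
```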

Predictive analytics is increasingly sought after in the public sector — an approach labelled “anticipatory governance” — with numerous use cases in areas like crime prevention, counter-terrorism, and disaster response, among others.

Challenges

It should be noted that while appealing in theory, the use of Big Data, especially for governance, has been criticised on multiple fronts. For instance, while the use of predictive technology may be incredibly helpful for the reasons elaborated upon previously, it can all too often suffer from “machine bias” — a phenomenon where the data used for creating predictive models is unbalanced due to human bias or error and thus leads to biased models. In the United States, for instance, one model of “predictive policing” has been repeatedly found to perpetuate systemic racism because it is built upon arrest-rate data. Given the historically and disproportionately high levels of incarceration of African Americans (for instance, African Americans are twice as likely to be arrested as their White counterparts and five times more likely to be stopped without cause), such tools have led to innocent African Americans being flagged as “high risk” individuals or even being detained.
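A first-pass audit for this kind of machine bias is easy to sketch: compare error rates across groups. The example below uses entirely synthetic data and hypothetical group labels to compute group-wise false positive rates for a risk classifier; a large gap between groups is a red flag warranting deeper scrutiny.

```python
# Minimal fairness audit: compare false positive rates across groups.
# Data and group labels are synthetic; a real audit would use the
# deployed model's predictions and ground-truth outcomes.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

group = rng.choice(["A", "B"], size=n)   # hypothetical demographic groups
actual = rng.binomial(1, 0.1, size=n)    # 1 = reoffended, 0 = did not

# A deliberately biased synthetic classifier: flags group B more often.
flag_rate = np.where(group == "B", 0.3, 0.1)
predicted = rng.binomial(1, flag_rate)

for g in ["A", "B"]:
    mask = (group == g) & (actual == 0)  # people who did NOT reoffend
    fpr = predicted[mask].mean()         # share wrongly flagged "high risk"
    print(f"Group {g}: false positive rate = {fpr:.2%}")
```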

There are also concerns over such predictive models breaching citizens’ privacy. In the absence of robust and fair data governance principles in almost every part of the world, Big Data on the public often contains “individually identifiable information – i.e. Big Meta Data – that, even when anonymised, could be attributed back to users”, exposing citizens to risks they would never ordinarily sign up for.

The promise of inclusivity in Big Data can also be context-specific: data gathered through remote sensing technologies such as night lights can proxy for levels of economic activity in a way that does not differentiate between economic players, whereas sources like mobile phone signals or the take-up of mobile banking services can be skewed towards the relatively better-off.

Further, the heterogeneous nature of economic activity across regions such as the Indian subcontinent makes it highly likely that the same set of observations and data sources will not be the best predictors for all regions. This is particularly true of the difference between rural and urban regions, as well as within rural regions across different parts of India (about a third of rural India does not have access to electricity). It is thus up to policymakers to be cognisant of the noise and biases inherent in data sources, of their strengths and limitations, and of the controls that may have to be imposed during analysis to account for imbalances in the data.

Yet, it is not solely these concerns that prevent governments from capitalising on Big Data; as mentioned previously in this piece, this can, to a large extent, be attributed to their capability and resource limitations. Big Data can be “noisy”, especially at the beginning of the analysis process. Cleaning such data can be “costly and time consuming”, which can lead some governments to use it sparingly or force them to justify the costs of its use. Indeed, a survey identified inadequate budget resources as the main obstacle to the adoption of Big Data in government institutions, closely followed by a lack of appropriate staff.

Organisational structures too might inhibit adoption, as “the relatively siloed approach within which many public sector organisations operate causes a range of issues – from the technical interoperability of IT systems to the lack of comparable data parameters”. Leaders may also lack the requisite will to make the jump to more data-driven governance approaches, either due to risk aversion or general tech-phobia.

Conclusion

Big Data is not infallible. With increased usage, its limitations have become amply clear. An appreciation of the potential of such data in the public sector can distract from concerns over its misuse or misinterpretation. The world still has a long way to go in cementing principles around best practices, putting safeguards in place, and, importantly, addressing existing worries around information abuse.

However, as with any promising technology, these caveats should be taken as prompts for further conversations, examination, and experimentation, not as evidence against the use of Big Data, for success stories abound as well. In addition to the examples provided above, Big Data is finding use in innovations that range from helping firefighters work more safely, to making construction processes leaner, to drilling for oil more efficiently, to reducing drug overdose deaths among people struggling with substance abuse.

The Indian School of Business (ISB) has long been involved in studying the use of Big Data to drive firm performance, draw market insights, and generate cutting-edge academic research. Now, through a dedicated Market Insights Information System (MIIS), developed in collaboration with the National Skill Development Corporation (NSDC), Ministry of Skill Development and Entrepreneurship, it has begun work to capture data on emerging job sectors and skills in a consolidated, one-of-its-kind information system that will extract and provide real-time views of the evolving Indian labour market at the national and regional levels. This is a prime example of the use of Big Data for public good: the insights, gathered through proprietary ML and NLP techniques, will be utilised by the young Indians whom the NSDC engages with for skilling.

Watch this space for more updates on the MIIS, as well as other Big Data projects that the ISB has in the works.


References

  • Einav L and Levin J (2014) The data revolution and economic analysis. In: Lerner J and Stern S (eds) Innovation Policy and the Economy, vol. 14. Chicago, IL: University of Chicago Press, pp. 1–24.
  • Eaton C, DeRoos D, Deutsch T, et al. (2012) Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data. New York: The McGraw-Hill Companies.
  • Dong, L., Chen, S., Cheng, Y. et al. Measuring economic activity in China with mobile big data. EPJ Data Sci. 6, 29 (2017). https://doi.org/10.1140/epjds/s13688-017-0125-5
  • https://spaceknow.com
  • Daniell K, Morton A and Rios ID (2015) Policy analysis and policy analytics. Annals of Operations Research 236(1): 1–13.
  • Shindelar S (2014) Big data and the government agency. Public Manager 43: 52–56.
  • Joseph R and Johnson N (2013) Big data and transformational government. IT Pro, pp. 43–48.
  • Manzoni J (2018) Civil service transformation speech. London. 24 January 2018. Available at: www.gov.uk/government/speeches/civil-service-transformation-speech.
  • Cleary D (2011) Predictive analytics in the public sector: Using data mining to assist better target selection for audit. Electronic Journal of e-Government 9(2): 132–140.
  • Mergel I (2016) Big data in public affairs education. Journal of Public Affairs Education 22(6): S231–S248.
  • Pencheva, I., Esteve, M., & Mikhaylov, S. J. (2020). Big Data and AI – A transformational shift for government: So, what next for research? Public Policy and Administration, 35(1), 24–44.
  • Telecom Regulatory Authority of India. The Indian Telecom Services Performance Indicators, April–June 2021: https://www.trai.gov.in/sites/default/files/PIR_21102021_0.pdf
  • Telecom Regulatory Authority of India. Highlights of Telecom Subscription Data as on 31st January 2019: https://web.archive.org/web/20190328164926/https://main.trai.gov.in/.../PR_No.22of2019_0.pdf
  • Shmueli, G., & Koppius, O. R. (2011). Predictive analytics in information systems research. MIS Quarterly, 35(3), 553–572.
  • Anticipatory government: Preempting problems through predictive analytics. Deloitte Insights.
  • Data from the US Department of Justice
  • Stough R and McBride D (2014) Big data and U.S. public policy. Review of Policy Research 31(4): 339–342.
  • Sims H and Sossei S (2012) Federal government primed for data analytics. The Journal of Government Financial Management 61: 34–37.
  • Wirtz B, Piehler R, Thomas M, et al. (2015) Resistance of public personnel to open government: A cognitive theory view of implementation barriers towards open government data. Public Management Review 18(9): 1335–1364.
  • Mayer-Schoenberger V and Cukier K (2013) Big Data – A Revolution that will Transform How We Live, Work, and Think. Boston: Eamon Dolan Book.
  • Firefighting, Running and Beyond: Big Data Success Stories
  • 8 Big Data success stories you (probably) didn’t know about
  • How Big Data Helps To Tackle The No 1 Cause Of Accidental Death In the U.S.