Data & Development: A Journey Without Maps

Written by Ryan McGuine

The world is awash in data like never before, which is a good thing for global development: there are increasing returns both to more information and to better linkages across information types and sources. Indeed, the plight of the world’s poorest has improved considerably in many ways in recent decades, partly because plentiful data is available to inform evidence-based policymaking. Despite that progress, it seems likely that much of the value of plentiful data lies in the ability to ask better questions, rather than the ability to make better prescriptions. Much of that value has yet to be realized.

Data is used by a range of actors for a range of purposes in global development. Policymakers use statistics compiled by think tanks and government bodies to inform laws and regulations, and citizens use identification information to receive social services including healthcare and education, as well as to exercise their rights to vote and to live in their home country. Additionally, multilateral institutions and civil society organizations use survey data to hold governments accountable for policy choices, and private companies use user data to provide services such as digital credit or virtual health consultations. 

One data-related problem that development practitioners face is that there is often not very much of it. According to the World Bank, around one billion people worldwide lack official proof of identity. Data quantity problems are particularly salient in poor countries. In sub-Saharan Africa, for instance, one in two people cannot prove their identity, some 30-40% of economic growth during the last decade is due to GDP rebasing alone, and there are too few statistics to track progress toward 60% of the indicators for the Sustainable Development Goals (SDGs). Rich countries are not immune, though: Britain recently wrongfully deported dozens, and otherwise mistreated many more, because it bungled citizens’ documentation. Missing data create research blind spots, where activity in the informal economy (transactions not recorded in external trade accounts, including everything from rural construction to subsistence farming and small businesses, which can make up 50-70% of total production in poor countries) is left out of analyses. In turn, the lack of data also determines the narratives of entire continents. Countries with too little data are routinely left out of cross-country growth regressions, implicitly suggesting that a country’s inability to conduct household surveys has no relation to other correlates of slow growth, and that the experience of the West is the only one relevant to the process of growth.

There is also an issue of data quality. There are well-documented examples of both GDP and census data being grossly wrong, and those figures are then used to create measures like debt ratio, vaccination ratio, and doctors or teachers per capita, magnifying the error. At some point, such large errors make conducting analyses a confounding exercise. For example, if an NGO or think tank writes a report identifying how much investment is needed in Bolivia’s agricultural sector to move from a middle- to a high-income country, it might be the case that merely correcting an accounting error would make a larger difference than meeting the investment target. And those are fairly “objective” statistics. Things get even more complicated when dealing with more subjective metrics, like perceived quality of governance. Low-quality data can create false negatives, which occur when no effect of one variable on another is measured even though one exists, or false positives, which occur when an effect is measured even though none exists. In turn, these often get reported as there being no effect, or as one variable causing some effect, as in the case of foreign aid and economic growth, or corruption and economic growth.
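To make the false-negative point concrete, here is a minimal sketch of how noise in a measured variable can shrink an estimated effect toward zero (often called attenuation bias). Everything in it is an illustrative assumption rather than real development data: the true slope of 2.0, the noise levels, the sample size, and the helper names `ols_slope` and `simulate` are all made up for the example.

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def simulate(noise_sd, n=5000, true_slope=2.0, seed=0):
    """Estimate the slope when x is observed with measurement noise.

    All parameters are hypothetical: the outcome depends on the true x,
    but the analyst only sees x plus noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    true_x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * x + rng.gauss(0, 1) for x in true_x]
    observed_x = [x + rng.gauss(0, noise_sd) for x in true_x]
    return ols_slope(observed_x, y)


clean_slope = simulate(noise_sd=0.0)  # x measured perfectly
noisy_slope = simulate(noise_sd=2.0)  # x measured very badly

print(f"slope with clean data: {clean_slope:.2f}")
print(f"slope with noisy data: {noisy_slope:.2f}")
```

Under these assumptions the estimate from the noisy data comes out far smaller than the one from the clean data, even though the underlying relationship is identical in both runs. A reader who saw only the noisy result could easily conclude that the variable barely matters.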

Much of the data quantity problem has to do with low capacity within statistical offices of poor countries, something which is correlated with both levels of income (more money makes better statistics easier) and political priority (government support for reliable statistics improves things, even in low-income countries). Whereas the Norwegian education minister could determine exactly how many kids were at school on a given day by simply checking electronic attendance records, it might take the Guatemalan counterpart three years and $20m in donor aid to know how many kids were at school on one day in 2020, assuming all participants answered truthfully. Meanwhile, the data quality problem is partly due to the fact that poor countries are often autocracies, which are notorious for manipulating official data. It is also partly due to the fact that many things researchers care about cannot be cleanly made into metrics. For instance, two people from different countries might have both graduated from secondary school, but that does not mean that their educations are identical. Other data, like those related to health, tend to be more straightforward — someone either died before their fifth birthday, or they did not. But statistical capacity plays a role here as well, because even binary-type information in poor countries can be misleading: health data might be collected by an illegible hand, or it might never reach the capital. 

Data collected by governments, such as national accounts, household and firm surveys, and administrative information like birth records, pensions, and censuses, are typically costly and infrequent. In response, there has been an explosion in privately collected data from sources like mobile phones and satellites being used in development. Compared to government data, these tend to be cheaper to collect, more frequent, and at finer levels of granularity, enabling meaningful inferences to be made about small sub-groups of interest. Yet it would be wrong to conclude that they are a silver bullet for data collection. While new forms of data help solve the academic problem of imprecision, data has both academic and political dimensions. A 1963 census conducted in independent Nigeria found a population roughly twice that of a 1953 census conducted in colonial Nigeria, because citizens knew that the 1953 count would be used to create a tax record, while the 1963 count would be used to allocate assembly seats and infrastructure funding. Countries need to be able to solve resource planning issues like how many people are expected to vote in different precincts, and need to be able to solve them with legitimacy. Comprehensive data compiled by governments are usually more trusted by the general public, so they remain important as a foundation for policymaking, but new data should be used to add value by filling in the gaps.

Even if there were plentiful, high-quality data accessible to those who needed it, there would remain concerns about its use. One such concern has to do with the ethical questions that arise. For example, many countries are experimenting with digital initiatives that involve governments collecting more data about citizens, such as India’s biometric database or Estonia’s digital identities. As programs like these expand, many citizens worry about negative consequences at the hands of their government if they register. Tutsis were easily targeted during the Rwandan genocide because ID papers identified ethnicity, and today Kenya uses identification to discriminate against Nubians, Myanmar does so against Rohingya, and America came close to doing so during its latest census. This dissuades people from being counted, and risks creating a class of officially nonexistent citizens. There are also significant concerns surrounding privacy, including identity theft by individuals, anti-competitive practices by private businesses, and mass surveillance by governments, like that which China is currently using against Uyghurs.

Further, it is clearly not the case that widespread data is necessary for economic growth to occur. After remaining stagnant for centuries, human well-being shot up during the Industrial Revolution with just a tiny fraction of today’s statistics, and former Hong Kong Financial Secretary John Cowperthwaite even refused to compile official economic data to avoid the temptation of intervention. Cowperthwaite was onto something important: data can be outright misleading. Global development data can be sparse, so researchers often use proxies to measure desired phenomena (e.g. number of elections held as a stand-in for quality of institutions, or number of land reforms undertaken as a stand-in for pro-poor land reforms). Proxy metrics are useful, but when interpreting results, there is a danger of forgetting that they are not what actually matters. Over time, this can cause rewards and punishments to be designed in a way that incentivizes actors to game the system into looking good, rather than changing their underlying behavior. This is referred to as Goodhart’s Law, which states that “once a measure becomes a target, it ceases to be a good measure.” In development, there are examples of countries under-reporting GNI and over-reporting progress toward targets to receive more foreign aid.

Finally, the widespread collection and use of data tends to flatten the world and distort research. Africa, Latin America, and Southeast Asia are often painted as regions where growth has failed to keep up with the rest of the world. While that is true on average, the narrative ignores numerous periods of impressive growth in certain regions, and leads to explanations of lack of development as the lack of some other element, such as geography, history, or institutions. However, these elements could well be consequences of low growth, rather than the causes of low growth, and quantifying something like institutions so that it can be used statistically is a wildly imprecise task. A rich literature has developed around quantifying how large a role each of these elements plays in the income gap between different places, but determining what accounts for the income gap between Namibia and Germany probably says less about the nature of growth than explaining why GDP per capita in Namibia quintupled between 1960 and 1980, remained stagnant for two decades, and has tripled since 2000. Whereas thinking in terms of sparking initial growth focuses researchers on finding what is missing, thinking in terms of transitioning from recurring growth to consistent growth focuses researchers on how best to use what is available.

While encouraging researchers to focus on better questions is not a concrete prescription, there are plenty of things that can be implemented to bring about such a change. One is a broad donor coalition aimed at improving capacity within foreign statistical offices, rather than dumping money on them to conduct projects that answer questions donors want answered. When donors do need the help of statistical agencies abroad, they could coordinate studies so that governments are not repeatedly collecting the same data. Additionally, countries should advocate for the independence of statistical offices. Statistical offices are often not legally and financially independent, which means that they are constantly chasing funding from other government agencies and donors for one-off projects, and are not able to set agendas that they deem important. Finally, Western universities and think tanks should work on forming partnerships with foreign ones in which all partners play an equal role in setting research agendas. Institutions in poor countries often face a smaller pool of academics and practitioners to draw from, and independence is rare because project-based funding is widely available but tied to the objectives of foreigners.

While the latest trend in development economics is away from broad-stroke methods like cross-country regressions, and toward precise, small-scale randomized controlled trials, it remains true that some of the most important questions in poverty reduction are related to how countries can sustain economic growth and build resilience to shocks. At the end of the day, economics as a whole probably overemphasizes the ability of governments to affect the world around them, and few policies are as important as the condition of world markets. The last decade has been one of rapid growth for low- and middle-income countries, so the current moment offers development researchers the opportunity to draw meaningful lessons on how to better use the fruits of growth to dampen the impact of unexpected shocks. In doing so, studying economies rather than economics (that is, the history of state capacity, democratization, and politics in specific countries, rather than what is missing in poor countries compared to rich ones) would provide the most meaningful results.

2 thoughts on “Data & Development: A Journey Without Maps”

  1. Extremely well-written article, have shared with some of my Economist friends. I enjoyed your reference to Goodhart’s Law as I see this come to fruition in my daily life in the business realm. Some of the biggest takeaways I got from your post were the need to de-align incentives for specific groups from our data collection processes and to think from the perspective of developing countries, rather than from a solidified Western mindset of how growth should work. Definitely a great “systems-thinking” read and happy I had the opportunity to learn!
