How does data availability compare across countries? The World Bank is a major source of country-level economic, financial and socioeconomic data. In this post, we summarise the number of data series available from the World Bank for different countries as a proxy for macroeconomic and social data availability.
Using the World Bank API, we calculate that the World Bank publishes 21 476 indicators in total across countries. To give a sense of scale, the full dataset has around 280 million rows in long format (i.e. one row per observation, with one column that labels the indicator and another that labels the country or city).
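As a rough illustration of how such a count can be read from the API, the sketch below queries the public indicator listing and reads the "total" field from the response metadata. It is not necessarily how the figure above was obtained, and the result may differ from 21 476 depending on what is counted and when the query is run. The function names are ours, and the layout of the response (a two-element list with the metadata first) is an assumption about the version 2 API.

```python
import json
from urllib.request import urlopen

# Public World Bank API (version 2) listing of indicators; per_page=1 keeps
# the response small because only the metadata is needed here.
INDICATOR_URL = "https://api.worldbank.org/v2/indicator?format=json&per_page=1"


def parse_total(payload):
    """Return the 'total' field from the metadata element of a response.

    Assumes the response is a two-element list: [metadata, records].
    """
    metadata = payload[0]
    return int(metadata["total"])


def fetch_indicator_count(url=INDICATOR_URL):
    """Download the listing and return the number of indicators it reports."""
    with urlopen(url, timeout=30) as response:
        return parse_total(json.load(response))
```

Calling fetch_indicator_count() needs network access; parse_total() can be tried offline on a saved response.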
Based on the data we downloaded, South Africa has around 13 319 of the 21 476 available indicators, or 62% coverage overall. This is well above the median across all countries of around 8 410 indicators (around 39% coverage). The median among G20 economies is around 8 708 indicators, or 48% coverage when the denominator is restricted to the 18 139 indicators available for the G20.
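The coverage percentages follow from simple division. A quick check in Python, using only the figures quoted in this post (the variable and function names are ours):

```python
# Indicator counts quoted in the post.
TOTAL_INDICATORS = 21_476  # all indicators across countries
G20_INDICATORS = 18_139    # indicators available when restricting to the G20


def coverage(count, available):
    """Share of the available indicators that are present, in percent."""
    return 100 * count / available


south_africa = coverage(13_319, TOTAL_INDICATORS)
median_all = coverage(8_410, TOTAL_INDICATORS)
median_g20 = coverage(8_708, G20_INDICATORS)

print(f"South Africa:         {south_africa:.0f}%")
print(f"Median, all countries: {median_all:.0f}%")
print(f"Median, G20:           {median_g20:.0f}%")
```

Note that the G20 figure uses a smaller denominator than the other two, so the 48% is not directly comparable with the 62% and 39%.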
The chart below suggests that South Africa compares favourably to wealthy countries and other emerging markets in terms of the volume of indicators available.

However, South Africa’s indicators are available on a less timely basis than for many major economies. So, while South Africa does not lag on data volumes, this data source suggests that policymakers and firms in South Africa have to make decisions with less up-to-date information than in peer economies. One would expect an inverse relationship between the volume of data and the average last update of data series, as series on the fringes of availability are less feasible to maintain. That does not explain South Africa’s position, though: its volume is similar to that of other G20 countries, yet its data are less timely.

Because some indicators may be discontinued, we also plot the median latest year available in the following chart. Using the median rather than the mean raises the typical latest year by about one year, and most countries currently have a median latest year of between 2017 and 2019 across all their data. The majority of indicators are released on an annual basis, which is why the chart looks discrete.

Footnote
To give a sense of the scale of the dataset, the full dataset of 280 million rows takes up around 24 GiB when loaded in memory (although only 343 MiB in a compressed format, on the hard drive).
To aggregate statistics on data comprehensiveness, we first included cities in their country groups, then summed up the number of indicators for each country. We also filtered the data to drop missing values and to keep only the latest date per observation. We calculated the mean and median latest date per country, imputing today’s date where forecasts were present in the data. This filtered dataset has only 1.7 million rows, and it would take up 305 MiB in memory and 15.4 MiB in a compressed format on the hard drive.
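The aggregation steps described above can be sketched on a toy example. This is a minimal illustration using only the Python standard library, not the code used for this post: the rows, the city-to-country mapping and the fixed "current year" are all made up, and counting distinct indicators per country group is our assumption about how the sum was taken.

```python
from collections import defaultdict
from statistics import mean, median

CURRENT_YEAR = 2024  # hypothetical stand-in for "today's date"

# Hypothetical mapping from city-level entries to their country group.
CITY_TO_COUNTRY = {"Johannesburg": "South Africa"}

# Made-up long-format rows: (economy, indicator, year, value).
ROWS = [
    ("South Africa", "GDP", 2018, 1.0),
    ("South Africa", "GDP", 2019, 1.1),
    ("South Africa", "CPI", 2030, 2.0),   # forecast beyond the current year
    ("South Africa", "POP", 2017, None),  # missing value, dropped
    ("Johannesburg", "AIR", 2016, 3.0),   # city folded into its country
    ("Brazil", "GDP", 2019, 4.0),
]


def summarise(rows, current_year=CURRENT_YEAR):
    """Count indicators and summarise the latest year per country."""
    latest = {}
    for economy, indicator, year, value in rows:
        if value is None:
            continue  # drop missing values
        country = CITY_TO_COUNTRY.get(economy, economy)
        year = min(year, current_year)  # forecasts count as "today"
        key = (country, indicator)
        latest[key] = max(year, latest.get(key, year))  # keep the latest year

    years_by_country = defaultdict(list)
    for (country, _indicator), year in latest.items():
        years_by_country[country].append(year)

    return {
        country: {
            "indicators": len(years),
            "mean_latest": mean(years),
            "median_latest": median(years),
        }
        for country, years in years_by_country.items()
    }


summary = summarise(ROWS)
for country, stats in summary.items():
    print(country, stats)
```

In this toy data the single forecast pulls South Africa's mean latest year above its median, which is one reason to report both.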
Compiled by Aidan Horn