The dataset Panel_Built-up.dta is a panel of cities, as defined in Bird and Brandily (2018).
The variables are given below, followed by a brief summary of how the dataset is compiled. The base data come from the Global Human Settlement (GHS) built-up layers produced by the
European Commission Joint Research Centre. These layers depict built-up land at a fine resolution (a
38m x 38m grid) across the globe at four points in time: 1975, 1990, 2000 and 2014.
1. id - unique id for each urban area, defined from the 2014 extent.
2. year - 1975, 1990, 2000 or 2014
3. countrycode - unique code for each country
4. countryname - unique name for each country
5. area_albers - area, in km2, of the extent of the urban area in 2014
6. size - area, in km2, of the built-up areas within the extent, in the given year. Missing if <1km2
7. ciso - standardised country id
8. point_x - longitude of centroid of city
9. point_y - latitude of centroid of city
10. dis_bord - distance in km to nearest country border
11. dis_b_neigh - ciso of nearest neighbouring country
12. dis_b_co - countrycode of nearest neighbouring country
13. dis_coast - distance to coast in km
14. population within urban extent as defined by GHS population data. Note, this is not used in Bird & Brandily. Details of how these population numbers are constructed can be found on the GHS website, referenced below.
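As an illustration of how the variables above fit together, the following minimal sketch computes, for each city, the built-up share of the 2014 extent (size / area_albers) and the annualised growth rate of built-up area between 1975 and 2014. It assumes the panel has already been loaded as a list of dictionaries keyed by the variable names, with a missing size stored as None; the function name summarize_cities is hypothetical and not part of the dataset.

```python
from collections import defaultdict


def summarize_cities(rows):
    """Summarise each city from panel rows.

    rows: iterable of dicts with keys "id", "year", "size", "area_albers".
    "size" may be None (the codebook marks it missing below 1 km2).
    """
    # Group the rows by city id, then by year.
    by_city = defaultdict(dict)
    for row in rows:
        by_city[row["id"]][row["year"]] = row

    summary = {}
    for city_id, years in by_city.items():
        first, last = years.get(1975), years.get(2014)

        # Share of the 2014 extent that is built-up in 2014.
        share_2014 = None
        if last and last["size"] is not None:
            share_2014 = last["size"] / last["area_albers"]

        # Annualised growth of built-up area, only if both endpoints exist.
        growth = None
        if first and last and first["size"] and last["size"]:
            growth = (last["size"] / first["size"]) ** (1 / (2014 - 1975)) - 1

        summary[city_id] = {"share_2014": share_2014, "annual_growth": growth}
    return summary
```

Cities whose 1975 built-up area is missing get no growth rate here; whether to drop or impute such cases is left to the analyst.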
We use the GHS data grid at 250m x 250m and 38m x 38m resolutions. The built-up land is derived in the GHS datasets
from Landsat image collections (GLS1975, GLS1990, GLS2000, and ad-hoc Landsat 8 collection 2013/2014,
for the four years respectively). To the best of our knowledge, the GHS built-up layers are the only globally
consistent and complete dataset available that gives this level of detail on the location of built-cover going
back to as early as 1975. This allows us to provide a unique description of urbanization (here defined as the
growth of built-cover) of Africa over almost 40 years.
We define country borders using the Global Administrative Areas dataset GADM (v2) and coasts and lakes
using Natural Earth, a public domain map dataset that we use to clip coasts and lakes when necessary (scale:
We used GIS software to transform the raw data from the GHS built-up layers into a dataset of cities. We
proceed in four main steps. First, we use the built-cover data at a 250x250m resolution, restricting our
attention to cells which are at least 10% built in any given year. That is, we look at the 38m x 38m pixels
within each larger grid cell and keep the grid cell when the built-up pixels cover at least 10% of its
area. Second, we aggregate all contiguous (i.e. touching along any edge or corner) grid cells
into unions; we define the outer edge of these unions as the boundaries of cities. Note that these unions also
include some unbuilt spaces (like parks, rivers, etc.) within the city extent as a result of our decision to look
for cells that are as little as 10% built-up. Third, we keep only unions that: a. are greater than 3 square
kilometers in total area, b. are actually covered by imagery for more than 95% of their extent and c. have
a total built-cover of more than 1 square kilometer within that 3 square kilometers - i.e., in total, a third
of the city must be built-up, but some individual cells within that area may be less built to capture urban
parks and similar spaces.
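The first three steps above can be sketched as follows. This is a simplified illustration, not the GIS workflow actually used: it assumes the 250m x 250m grid is given as a nested list of built-up fractions (built_frac) together with a parallel grid of imagery coverage flags (covered), and it does not fill interior holes when forming unions. The function name find_cities and its parameters are hypothetical.

```python
from collections import deque

CELL_KM2 = 0.25 * 0.25  # area of one 250 m x 250 m grid cell, in km2


def find_cities(built_frac, covered, min_frac=0.10,
                min_area=3.0, min_cover=0.95, min_built=1.0):
    """Return the unions of contiguous cells that pass the three filters."""
    n_rows, n_cols = len(built_frac), len(built_frac[0])

    # Step 1: keep cells that are at least 10% built-up.
    keep = [[built_frac[r][c] >= min_frac for c in range(n_cols)]
            for r in range(n_rows)]

    seen = [[False] * n_cols for _ in range(n_rows)]
    cities = []
    for r in range(n_rows):
        for c in range(n_cols):
            if not keep[r][c] or seen[r][c]:
                continue

            # Step 2: breadth-first search over the 8 neighbours
            # (cells touching along any edge or corner).
            cells, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                i, j = queue.popleft()
                cells.append((i, j))
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < n_rows and 0 <= nj < n_cols
                                and keep[ni][nj] and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))

            # Step 3: apply the area, coverage and built-cover filters.
            area = len(cells) * CELL_KM2
            built = sum(built_frac[i][j] for i, j in cells) * CELL_KM2
            coverage = sum(1 for i, j in cells if covered[i][j]) / len(cells)
            if area > min_area and coverage > min_cover and built > min_built:
                cities.append({"cells": cells, "area_km2": area,
                               "built_km2": built})
    return cities
```

At this cell size a union needs at least 49 cells to exceed 3 square kilometers, so small isolated clusters are dropped by the area filter alone.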
To construct the panel we define cities using the 2014 dataset and resultant boundaries.
We then focus on the past evolution of these areas over the 1975-2014 period, constructing
the urban areas that lie within the 2014 boundaries for each preceding year. If two areas were distinct in
earlier decades but have since merged into one agglomeration, we regard them as one city.
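A minimal sketch of this panel construction is given below. It assumes each city is represented by the list of 250m x 250m grid cells inside its 2014 boundary, and that the built-up area for a year is the sum of the built-up fractions of those cells; the actual procedure may differ (for example, by reapplying the 10% threshold in each year). Following the variable description for size, values below 1 km2 are set to missing. The names build_panel, city_cells_2014 and built_frac_by_year are hypothetical.

```python
CELL_KM2 = 0.25 * 0.25  # area of one 250 m x 250 m grid cell, in km2


def build_panel(city_cells_2014, built_frac_by_year, min_size=1.0):
    """Build one row per city and year, holding the 2014 extent fixed.

    city_cells_2014: dict mapping city id -> list of (row, col) cells
                     inside the 2014 boundary.
    built_frac_by_year: dict mapping year -> grid of built-up fractions.
    """
    rows = []
    for city_id, cells in city_cells_2014.items():
        extent_km2 = len(cells) * CELL_KM2  # fixed across years
        for year in sorted(built_frac_by_year):
            grid = built_frac_by_year[year]
            # Built-up area inside the fixed 2014 extent in this year.
            size = sum(grid[i][j] for i, j in cells) * CELL_KM2
            rows.append({
                "id": city_id,
                "year": year,
                "area_albers": extent_km2,
                # Missing (None) when built-up area is below 1 km2.
                "size": size if size >= min_size else None,
            })
    return rows
```

Because the extent is fixed at its 2014 shape, two settlements that were separate in 1975 but merged by 2014 contribute to a single row per year.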
Finally, we remove from our analysis areas that emit no nightlights over the 2008-2012 period. We added this step
after realizing that the GHS raw data includes some false-positive errors in arid or mountainous
areas. Although this step risks wrongly removing some urban areas from our dataset, we ensure the
risk remains small by setting a relatively weak condition to keep an area in the dataset: any part of a
city needs to be lit just twice over the 5-year period for the city to remain in the sample.
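The nightlights condition can be sketched as below, under one possible reading: there is one nightlights grid per year from 2008 to 2012, a cell counts as lit when its value is positive, and a city is kept if at least one of its cells is lit in at least two of those years. The function name keep_city, the input layout and the positive-value definition of "lit" are assumptions, not details taken from the source.

```python
def keep_city(city_cells, lights_by_year, min_lit_years=2):
    """Return True if the city passes the nightlights filter.

    city_cells: list of (row, col) cells inside the city boundary.
    lights_by_year: dict mapping year -> grid of nightlight values.
    """
    # Count the years in which any part of the city emits light.
    lit_years = sum(
        1
        for grid in lights_by_year.values()
        if any(grid[i][j] > 0 for i, j in city_cells)
    )
    return lit_years >= min_lit_years
```

The low default threshold mirrors the deliberately weak condition described above, so that only areas with essentially no recorded light are removed.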
Bird, J and Brandily, P; Urban Footprints: The size and growth of African cities, 1975-2014, Working Paper
Pesaresi, Martino; Ehrlich, Daniele; Florczyk, Aneta J.; Freire, Sergio; Julea, Andreea; Kemper, Thomas; Soille, Pierre; Syrris, Vasileios (2015): GHS built-up grid, derived from Landsat, multitemporal (1975, 1990, 2000, 2014). European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/jrc-ghsl-ghs_built_ldsmt_globe_r2015b
Pesaresi, Martino; Ehrlich, Daniele; Florczyk, Aneta J.; Freire, Sergio; Julea, Andreea; Kemper, Thomas; Soille, Pierre; Syrris, Vasileios (2015): GHS built-up confidence grid, derived from Landsat, multitemporal (1975, 1990, 2000, 2014). European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/jrc-ghsl-ghs_built_ldsmtcnfd_globe_r2015b
Pesaresi, Martino; Ehrlich, Daniele; Florczyk, Aneta J.; Freire, Sergio; Julea, Andreea; Kemper, Thomas; Soille, Pierre; Syrris, Vasileios (2015): GHS built-up datamask grid, derived from Landsat, multitemporal (1975, 1990, 2000, 2014). European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/jrc-ghsl-ghs_built_ldsmtdm_globe_r2015b
European Commission, Joint Research Centre (JRC); Columbia University, Center for International Earth Science Information Network - CIESIN (2015): GHS population grid, derived from GPW4, multitemporal (1975, 1990, 2000, 2015). European Commission, Joint Research Centre (JRC) [Dataset] PID: http://data.europa.eu/89h/jrc-ghsl-ghs_pop_gpw4_globe_r2015a