Spatial joins in Python
Spatial joins are the backbone of data management in spatial analysis. A spatial join is a GIS operation that combines two datasets based on their geographic relationship rather than on attribute values. Instead of matching rows by a common ID, as in a typical database join, a spatial join links features based on location: points within polygons, nearest neighbors, or overlapping areas. This lets you enrich one dataset with attributes from another, which is essential for tasks like aggregating census data by neighborhood or linking crime incidents to police precincts.
In this tutorial, I will spatially join two polygon layers, one of census tracts and one of places, to illustrate how a spatial join combines data from two different geographic layers. Spatial joins can handle both points and polygons, but joining polygons to polygons presents a particular challenge: the polygons in one layer often do not align perfectly with those in the other, so we need a rule for deciding which features to include. Here, we’ll join all San Francisco census tracts[1] whose centroids (the geometric center of the polygon) fall within the San Francisco place[2] polygon. This rule assigns each tract to San Francisco based on where its center lies, so tracts inside the Census place of San Francisco are captured even when their boundaries do not line up exactly with the place boundary.
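To make the centroid rule concrete, here is a minimal, dependency-free sketch of the logic. It is not the notebook's code: a GIS library would normally handle this, and the function names, the square "place" polygon, and the three toy "tracts" are all made up for illustration. Coordinates are assumed to be planar.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order, first vertex not repeated


def polygon_centroid(poly: Polygon) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test; points exactly on the boundary are not handled specially."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical "place" polygon: a 10 x 10 square.
place: Polygon = [(0, 0), (10, 0), (10, 10), (0, 10)]

# Hypothetical "tracts": one fully inside, two straddling the boundary.
tracts: Dict[str, Polygon] = {
    "tract_a": [(1, 1), (4, 1), (4, 4), (1, 4)],       # centroid (2.5, 2.5)
    "tract_b": [(7, 3), (12, 3), (12, 7), (7, 7)],     # centroid (9.5, 5.0)
    "tract_c": [(9, 7), (15, 7), (15, 11), (9, 11)],   # centroid (12.0, 9.0)
}

# Keep the tracts whose centroid falls inside the place polygon.
joined = [
    name
    for name, poly in tracts.items()
    if point_in_polygon(polygon_centroid(poly), place)
]
print(joined)  # ['tract_a', 'tract_b']
```

Note that tract_b and tract_c both overlap the place boundary, but only tract_b is kept because its centroid lies inside. A join based on any overlap would have kept all three.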
Download the Notebook
If you want to run the code locally, you can download the original Jupyter notebook: 📥 Download spatial joins script
Footnotes
1. Census tracts are small geographies defined by the US Census Bureau for reporting demographic statistics. They generally contain between 1,200 and 8,000 people, with an optimum size of 4,000. You can read more about them on the US Census Bureau’s website.
2. Places are another geography defined by the US Census Bureau. More information can be found on the US Census Bureau’s website.