Spatial joins in Python | Kasey Zapatka, PhD

Spatial joins are a core data-management operation in spatial analysis. A spatial join is a GIS operation that combines two datasets based on their geographic relationship rather than on attribute values. Instead of matching rows by a common ID, as in a typical database join, a spatial join links features by location: points that fall within polygons, nearest neighbors, or overlapping areas. This lets you attach attributes from one dataset to another based on how their features relate in space, which is essential for tasks like aggregating census data by neighborhood or linking crime incidents to police precincts.
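To make the "points within polygons" case concrete, here is a minimal pure-Python sketch of a point-in-polygon join. It illustrates the mechanics only and is not the notebook's code. It assumes each polygon is a single exterior ring without holes, uses a ray-casting test, and the precinct and incident IDs and coordinates are hypothetical. In practice, GeoPandas provides this operation as `geopandas.sjoin`, which also uses a spatial index so it does not have to test every point against every polygon.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # assumption: one exterior ring per polygon, no holes


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Ray casting: cast a ray to the right of pt and count edge crossings."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


def join_points_to_polygons(
    points: Dict[str, Point], polygons: Dict[str, Ring]
) -> Dict[str, Optional[str]]:
    """Attach to each point the ID of the first polygon containing it (None if none)."""
    return {
        pid: next((poly_id for poly_id, ring in polygons.items() if point_in_polygon(pt, ring)), None)
        for pid, pt in points.items()
    }


# Hypothetical data: two square "precincts" and three "incidents"
precincts = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
incidents = {"i1": (2, 3), "i2": (15, 5), "i3": (25, 5)}
print(join_points_to_polygons(incidents, precincts))  # {'i1': 'A', 'i2': 'B', 'i3': None}
```

Incident `i3` lies outside both precincts, so it gets `None`, much like a left join that finds no match. This brute-force version tests every pair of features, which is fine for a toy example but slow for real layers.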

In this tutorial, I will spatially join two polygon layers, one of census tracts and one of places, to illustrate how a spatial join combines data from two different geographic layers. Spatial joins can handle points as well as polygons, but joining polygons to polygons presents a particular challenge: the polygons in one layer rarely line up exactly with those in the other, so a tract may fall entirely inside a place, entirely outside it, or straddle its boundary. We therefore need a rule for deciding which features to include. Here, we'll keep every San Francisco census tract¹ whose centroid (the geometric center of the polygon) falls within the San Francisco place² polygon. This captures the tracts that lie inside the Census place of San Francisco while excluding tracts that merely touch or slightly overlap its boundary.
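The centroid rule can also be sketched in plain Python. This is a minimal illustration, not the notebook's code: it assumes the tracts and the place are single rings without holes in the same projected coordinate system, computes each tract's area-weighted centroid with the shoelace formula, and keeps the tracts whose centroid passes a ray-casting point-in-polygon test. The square place and the tracts `t1` to `t4` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # assumption: one exterior ring per polygon, no holes, projected coordinates


def polygon_centroid(ring: Ring) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Centroid = sum / (6 * area), and 6 * area == 3 * area2; sign cancels for either orientation
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(pt: Point, ring: Ring) -> bool:
    """Ray-casting test: an odd number of edge crossings means the point is inside."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def tracts_within_place(tracts: Dict[str, Ring], place: Ring) -> List[str]:
    """Keep the IDs of tracts whose centroid falls inside the place polygon."""
    return [tid for tid, ring in tracts.items() if point_in_polygon(polygon_centroid(ring), place)]


# Hypothetical data: a square "place" and four made-up tracts
place = [(0, 0), (10, 0), (10, 10), (0, 10)]
tracts = {
    "t1": [(1, 1), (4, 1), (4, 4), (1, 4)],          # fully inside
    "t2": [(8, 0), (11, 0), (11, 3), (8, 3)],        # straddles the edge, centroid (9.5, 1.5) inside
    "t3": [(9, 5), (14, 5), (14, 8), (9, 8)],        # overlaps the edge, centroid (11.5, 6.5) outside
    "t4": [(12, 12), (15, 12), (15, 15), (12, 15)],  # fully outside
}
print(tracts_within_place(tracts, place))  # ['t1', 't2']
```

Tract `t3` overlaps the place but is dropped because its centroid lies outside, which is exactly what the rule is meant to do. In GeoPandas, the equivalent step is typically to replace each tract's geometry with `tracts.geometry.centroid` and call `geopandas.sjoin(..., predicate="within")` against the place layer, computing centroids in a projected CRS (GeoPandas warns when `centroid` is used on geographic coordinates). For concave tracts the centroid can fall outside the tract itself; `GeoSeries.representative_point()`, which always returns a point inside the polygon, is a common alternative.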

Download the Notebook

If you want to run the code locally, you can download the original Jupyter notebook: 📥 Download spatial joins script

Footnotes

1. Census tracts are small geographic areas defined by the US Census Bureau for tabulating demographic statistics. They generally contain between 1,200 and 8,000 people, with an optimum size of 4,000. You can read more about them on the US Census Bureau's website.
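2. Places are another type of geography defined by the US Census Bureau. They include incorporated places, such as cities and towns, and census-designated places, which are unincorporated communities. More information can be found on the US Census Bureau's website.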