Libraries and environments
The Python libraries below are needed to replicate this tutorial. If any of them are not already installed, uncomment the install lines and run them.
#
# Install Python libraries if necessary
# ----------------------------------------
#%pip install geopandas pandas matplotlib seaborn pygris
#%pip install "folium>=0.12" matplotlib mapclassify
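Before uncommenting the install lines, it can help to see which libraries are actually missing. The snippet below is a minimal sketch that uses only the standard library. The `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of the original tutorial, and the sketch assumes each package's import name matches its pip name (true for the libraries listed above).

```python
from importlib.util import find_spec

# Import names for the libraries installed in the cell above.
# Assumption: import name == pip name, which holds for this list.
REQUIRED = ["geopandas", "pandas", "matplotlib", "seaborn",
            "pygris", "folium", "mapclassify"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Anything reported as not installed can then be added with the corresponding `%pip install` line.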
1. Import Python libraries
Now that the appropriate libraries are installed, we need to import them for the tutorial.
#
# Import libraries
# ----------------------------------------
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
2. Get census tract and census place data
I prefer to import shapefiles directly in code so that my work is reproducible and I do not have to download anything manually. The easiest way to get Census shapefiles is to pull them straight from the Census Bureau. Many libraries do this; my favorite is pygris. I prefer it because it is relatively simple, stable, and does not require a Census API key.
I pull the shapefiles for all tracts and places in California so that we do not have to worry about which of the two layers has the larger spatial extent. If the place extent were larger than the tract extent, the spatial join could miss some tracts that are actually located within San Francisco proper.
#
# Pull shapefiles
# ----------------------------------------
# tracts
from pygris import tracts
# cb=True requests the simplified cartographic boundary files, which load and process faster
ca_tracts = tracts(state="CA", cb=True, year=2022, cache=True)
# places
from pygris import places
# cache=True stores the download locally so repeat calls load faster
ca_places = places(state="CA", cb=True, year=2022, cache=True)
Using FIPS code '06' for input 'CA' Using FIPS code '06' for input 'CA'
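The concern about spatial extents can be made concrete with a bounding-box comparison. The following is a rough sketch, not part of the original workflow: `extent_contains` is a hypothetical helper, and the three extents are made-up illustrative values in degrees rather than numbers read from the Census files. The boxes use the `(minx, miny, maxx, maxy)` order that geopandas returns from `total_bounds`.

```python
def extent_contains(outer, inner):
    """Return True if bounding box `outer` fully covers bounding box `inner`.

    Both boxes are (minx, miny, maxx, maxy) tuples.
    """
    o_minx, o_miny, o_maxx, o_maxy = outer
    i_minx, i_miny, i_maxx, i_maxy = inner
    return (o_minx <= i_minx and o_miny <= i_miny
            and o_maxx >= i_maxx and o_maxy >= i_maxy)


# Hypothetical extents (illustrative values only)
county_tracts_extent = (-122.52, 37.70, -122.35, 37.84)
place_extent = (-123.18, 37.63, -122.28, 37.93)
statewide_tracts_extent = (-124.41, 32.53, -114.13, 42.01)

print(extent_contains(county_tracts_extent, place_extent))     # False
print(extent_contains(statewide_tracts_extent, place_extent))  # True
```

With the real data, the same check could be run on `tuple(ca_tracts.total_bounds)` and the bounds of the San Francisco row in `ca_places`. A bounding box is only a coarse test, but it is enough to show why the statewide tract layer is the safer input.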
3. View tracts shapefile
Let's look at just the tracts shapefile to see what we have. Remember, we pulled in all tracts for the entire state of California so that when we spatially join tracts to places, we won't miss any tracts if, for some reason, the place boundary for San Francisco extends beyond the San Francisco tracts. This is a conservative approach, and you would likely get the same result if you subset the tracts to San Francisco only. Nevertheless, I've kept all of them here.
I use .explore here to view the spatial object interactively. Since we have all the tracts for California, I pass a location parameter to center the initial view on San Francisco and set the initial zoom with the zoom_start parameter. Larger numbers zoom in.
#
# View tracts
# ----------------------------------------
# Coordinates for San Francisco (approximate center)
sf_coords = [37.7749, -122.4194]
# Plot interactive map, zoomed into San Francisco
ca_tracts.explore(
    color="blue",        # set color to blue
    alpha=0.4,           # make face color somewhat transparent
    location=sf_coords,  # center the initial view on San Francisco
    zoom_start=11        # adjust zoom level as needed
)
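To get a feel for what "larger numbers zoom in" means, the sketch below estimates how much ground one screen pixel covers at a few zoom levels. It assumes the standard Web Mercator tiling with 256-pixel tiles, which is the default for Leaflet-based maps such as the one `.explore` produces; `meters_per_pixel` is an illustrative helper, not part of geopandas or folium.

```python
import math

# Ground resolution at the equator for zoom level 0 with 256-pixel
# Web Mercator tiles, in meters per pixel
EQUATOR_RESOLUTION_Z0 = 156543.03392


def meters_per_pixel(lat_deg, zoom):
    """Approximate ground distance covered by one pixel at a latitude and zoom."""
    return EQUATOR_RESOLUTION_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


sf_lat = 37.7749  # latitude of the San Francisco center used above
for zoom in (9, 11, 13):
    print(zoom, round(meters_per_pixel(sf_lat, zoom), 1))
```

Each step up in zoom halves the distance covered by a pixel, so moving `zoom_start` from 11 to 12 shows half as much ground in each direction.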