US Census Data API
M. Fawcett - 04/15/2021
Why census data?
The range of data available from the US Census Bureau is simply astounding, and it is all free. You just need to know how to find it.
It's not just about population counts. It's also data about the kinds of business, industry and agriculture that we have, the products they make and their value.
Social characteristics like poverty, income, education, vehicle ownership and a thousand other things are tabulated for about a dozen kinds of geographic area, from the nation as a whole down through states and counties, all the way to the neighborhood level.
For the research I am involved in, we are often interested in the relationship between neighborhood characteristics and the health care outcomes for the people who live there, and for this, the Census Bureau has been an invaluable source of information.
How to gather census data
There are two main ways to obtain census data. One way is to go to data.census.gov and work through their Web site to generate an Excel or CSV file that you then download. The other way is to use the Census API and a programming language to generate and download results directly.
For the occasional data need, the first method is probably preferable. You will likely need to do some manual tweaking of the Excel file, but it will often be a lot easier and faster than writing a computer program.
But if you need many variables, or results for multiple geographic areas across multiple years, the API approach is the one to use.
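To give a sense of why the API scales better, here is a minimal sketch that builds one request URL per year and geography in a loop, using only Python's standard library and making no network call. The helper name `build_census_url` is hypothetical, and the sketch assumes the `dec/sf1` dataset and the `P001001` (total population) variable are available for each year listed, which you should confirm in the Census API documentation before relying on it. The `for`/`in` pair (`county:*` within `state:06`) asks for every county inside California.

```python
from urllib.parse import urlencode

HOST = "https://api.census.gov/data"

def build_census_url(year, dataset, variables, for_clause, in_clause=None):
    """Hypothetical helper: return a full Census API URL without sending it."""
    base_url = "/".join([HOST, str(year), dataset])
    predicates = {"get": ",".join(variables), "for": for_clause}
    if in_clause is not None:
        predicates["in"] = in_clause  # restrict the geography, e.g. counties within one state
    return base_url + "?" + urlencode(predicates)

# Assumed inputs: two census years and two geographic levels.
years = [2000, 2010]
geographies = [("state:*", None), ("county:*", "state:06")]

# One URL per (year, geography) combination.
urls = [
    build_census_url(year, "dec/sf1", ["NAME", "P001001"], for_clause, in_clause)
    for year in years
    for for_clause, in_clause in geographies
]

for url in urls:
    print(url)
```

Each of these URLs could then be fetched with `requests.get` and the results stacked into a single table, which is tedious to do by hand through data.census.gov.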
Basic code concepts
Use of the API is too big a topic to cover in its entirety in one posting, so I'll just introduce the basic approach. There is much more to discuss, and I'll try to do that in other posts.
The following requires a basic understanding of Python, and I will assume you have that.
A request is made up of a "base URL" and a list of parameters, called "predicates". The base URL tells the system which dataset you are interested in (for example, Summary File 1 of the 2010 Decennial Census). The predicates tell the system which variables to retrieve from that dataset and which geographic areas to return data for.
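To make the split between base URL and predicates concrete, here is a short sketch that assembles the same request by hand with the standard library's `urllib.parse.urlencode`. Passing `params=predicates` to `requests.get` does essentially this for you; the sketch just shows that the predicates become the query string after the `?`, with characters such as `,`, `:` and `*` percent-encoded.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

base_url = "https://api.census.gov/data/2010/dec/sf1"   # which dataset
predicates = {"get": "NAME,P001001", "for": "state:*"}  # which variables and geographies

# The predicates become the query string after the "?".
full_url = base_url + "?" + urlencode(predicates)
print(full_url)
# https://api.census.gov/data/2010/dec/sf1?get=NAME%2CP001001&for=state%3A%2A

# Decoding the query string recovers the original predicates.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(full_url).query).items()}
print(decoded)
```

The printed URL is the same one `requests` sends, so it can be pasted into a browser to check a request before writing any more code.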
The base URL and the predicates are then passed to the "get" function of the "requests" library, which sends them to the API and returns a "response" object containing the data. See below for an example.
### The following returns the populations of every state, the District of Columbia and Puerto Rico from the 2010 Census.
import requests # to make http requests for data using census web API
import ast # for converting a string that represents a list into an actual list
import pandas as pd # for dataframe manipulation
# Components of the base URL go into variables.
HOST = "https://api.census.gov/data"
year = "2010"
dataset = "dec/sf1"
# Code to join the base url components
base_url = "/".join([HOST, year, dataset])
# Form the predicates dictionary
predicates = {} # initialize empty dictionary
get_vars = ["NAME", "P001001"] # Want the name of the State and the total population.
predicates["get"] = ",".join(get_vars)
predicates["for"] = "state:*" # States are the geography to return data for. "*" means all states.
# Make the request. Results will go into the variable "myresponse".
myresponse = requests.get(base_url, params=predicates)
myresponse.raise_for_status() # raise an error if the API returned an HTTP error status (4xx or 5xx)
# Show the base URL that specifies what table to use and the predicates for selecting specific data from it.
print("Base URL:", base_url, "\n", "Predicates:", predicates)
Base URL: https://api.census.gov/data/2010/dec/sf1
 Predicates: {'get': 'NAME,P001001', 'for': 'state:*'}
# Display the fully formed request URL that was sent to the Census API
print(myresponse.url)
# If you copy and paste the following URL into a browser's address bar and press Enter, you will get
# the populations of all the states.
https://api.census.gov/data/2010/dec/sf1?get=NAME%2CP001001&for=state%3A%2A
# You can also see the results by printing the "text" attribute of the
# response object. It looks like a list of lists, but it is actually a single string (in JSON format).
print(myresponse.text)
[["NAME","P001001","state"], ["Alabama","4779736","01"], ["Alaska","710231","02"], ["Arizona","6392017","04"], ["Arkansas","2915918","05"], ["California","37253956","06"], ["Louisiana","4533372","22"], ["Kentucky","4339367","21"], ["Colorado","5029196","08"], ["Connecticut","3574097","09"], ["Delaware","897934","10"], ["District of Columbia","601723","11"], ["Florida","18801310","12"], ["Georgia","9687653","13"], ["Hawaii","1360301","15"], ["Idaho","1567582","16"], ["Illinois","12830632","17"], ["Indiana","6483802","18"], ["Iowa","3046355","19"], ["Kansas","2853118","20"], ["Maine","1328361","23"], ["Maryland","5773552","24"], ["Massachusetts","6547629","25"], ["Michigan","9883640","26"], ["Minnesota","5303925","27"], ["Mississippi","2967297","28"], ["Missouri","5988927","29"], ["Montana","989415","30"], ["Nebraska","1826341","31"], ["Nevada","2700551","32"], ["New Hampshire","1316470","33"], ["New Jersey","8791894","34"], ["New Mexico","2059179","35"], ["New York","19378102","36"], ["North Carolina","9535483","37"], ["North Dakota","672591","38"], ["Ohio","11536504","39"], ["Oklahoma","3751351","40"], ["Oregon","3831074","41"], ["Pennsylvania","12702379","42"], ["Rhode Island","1052567","44"], ["South Carolina","4625364","45"], ["South Dakota","814180","46"], ["Tennessee","6346105","47"], ["Texas","25145561","48"], ["Utah","2763885","49"], ["Vermont","625741","50"], ["Virginia","8001024","51"], ["Washington","6724540","53"], ["West Virginia","1852994","54"], ["Wisconsin","5686986","55"], ["Wyoming","563626","56"], ["Puerto Rico","3725789","72"]]
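Because the response body is JSON, it can also be parsed with the standard `json` module instead of `ast`; the `requests` library offers the same thing as `myresponse.json()`. The sketch below uses a short excerpt of the output above as a stand-in for the real response text, so it runs without a network call. Note that every value comes back as a string, including the population counts.

```python
import json

# A three-row excerpt of the response text shown above (stand-in for myresponse.text).
response_text = '[["NAME","P001001","state"], ["Alabama","4779736","01"], ["Alaska","710231","02"]]'

rows = json.loads(response_text)      # list of lists; the first row is the header
header, records = rows[0], rows[1:]   # slicing keeps the header instead of discarding it

print(header)                         # ['NAME', 'P001001', 'state']
print(records[0])                     # ['Alabama', '4779736', '01']
print(type(records[0][1]).__name__)   # str: counts arrive as text
```

Slicing with `rows[1:]` leaves the original list untouched, which can be handy if you want to reuse the API's own column names later.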
# Use 'ast' to convert the string into an actual list of lists, so that the first sub-list, which holds the column names, can be removed.
data_list = ast.literal_eval(myresponse.text)
# Use pop(0) to remove the first sub-list (the API's column names); friendlier names are supplied below.
data_list.pop(0)
column_names = ['State','Population','StateCode']
# Create a dataframe with the named columns
df = pd.DataFrame(data_list, columns=column_names)
# Source for the pop(0) technique: https://www.geeksforgeeks.org/python-removing-first-element-of-list/
df
| | State | Population | StateCode |
---|---|---|---|
0 | Alabama | 4779736 | 01 |
1 | Alaska | 710231 | 02 |
2 | Arizona | 6392017 | 04 |
3 | Arkansas | 2915918 | 05 |
4 | California | 37253956 | 06 |
5 | Louisiana | 4533372 | 22 |
6 | Kentucky | 4339367 | 21 |
7 | Colorado | 5029196 | 08 |
8 | Connecticut | 3574097 | 09 |
9 | Delaware | 897934 | 10 |
10 | District of Columbia | 601723 | 11 |
11 | Florida | 18801310 | 12 |
12 | Georgia | 9687653 | 13 |
13 | Hawaii | 1360301 | 15 |
14 | Idaho | 1567582 | 16 |
15 | Illinois | 12830632 | 17 |
16 | Indiana | 6483802 | 18 |
17 | Iowa | 3046355 | 19 |
18 | Kansas | 2853118 | 20 |
19 | Maine | 1328361 | 23 |
20 | Maryland | 5773552 | 24 |
21 | Massachusetts | 6547629 | 25 |
22 | Michigan | 9883640 | 26 |
23 | Minnesota | 5303925 | 27 |
24 | Mississippi | 2967297 | 28 |
25 | Missouri | 5988927 | 29 |
26 | Montana | 989415 | 30 |
27 | Nebraska | 1826341 | 31 |
28 | Nevada | 2700551 | 32 |
29 | New Hampshire | 1316470 | 33 |
30 | New Jersey | 8791894 | 34 |
31 | New Mexico | 2059179 | 35 |
32 | New York | 19378102 | 36 |
33 | North Carolina | 9535483 | 37 |
34 | North Dakota | 672591 | 38 |
35 | Ohio | 11536504 | 39 |
36 | Oklahoma | 3751351 | 40 |
37 | Oregon | 3831074 | 41 |
38 | Pennsylvania | 12702379 | 42 |
39 | Rhode Island | 1052567 | 44 |
40 | South Carolina | 4625364 | 45 |
41 | South Dakota | 814180 | 46 |
42 | Tennessee | 6346105 | 47 |
43 | Texas | 25145561 | 48 |
44 | Utah | 2763885 | 49 |
45 | Vermont | 625741 | 50 |
46 | Virginia | 8001024 | 51 |
47 | Washington | 6724540 | 53 |
48 | West Virginia | 1852994 | 54 |
49 | Wisconsin | 5686986 | 55 |
50 | Wyoming | 563626 | 56 |
51 | Puerto Rico | 3725789 | 72 |
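The `Population` column in the DataFrame above still holds strings, because the API returns every value as text, and the rows are not necessarily in state-code order (note Louisiana and Kentucky following California). Here is a minimal sketch of one way to tidy this up, assuming pandas is installed: build the DataFrame directly from the API's header row, rename the columns, convert the population to integers, and sort by state code. `StateCode` is deliberately left as a string so the leading zeros in FIPS codes such as `01` survive. The four-row `data` list is an excerpt standing in for the full parsed response.

```python
import pandas as pd

# Excerpt of the parsed response: header row first, then one row per state.
data = [
    ["NAME", "P001001", "state"],
    ["Alabama", "4779736", "01"],
    ["Arizona", "6392017", "04"],
    ["Alaska", "710231", "02"],
]

# Use the API's own header row for the columns, then give them friendlier names.
df = pd.DataFrame(data[1:], columns=data[0])
df = df.rename(columns={"NAME": "State", "P001001": "Population", "state": "StateCode"})

# Population arrives as text; convert it so it can be summed or sorted numerically.
df["Population"] = df["Population"].astype("int64")

# Put the rows in state-code order and renumber the index from 0.
df = df.sort_values("StateCode", ignore_index=True)

print(df)
print(df["Population"].sum())
```

Keeping the codes as strings also makes it easier to join these results with other Census tables that use the same codes.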