FCC BDC Map

Build an e2e data pipeline for ~10GB of broadband data

This covers my process and the idea behind getting broadband availability data as reported to the Federal Communications Commission (FCC).

Background

The most recent data was created as part of the new mapping initiative.

Now, the data and the process are not problematic in themselves. The issue is that the data is not published in a format the public can easily consume: it comes as a series of CSV files.

There has been extensive discussion about the accuracy and completeness of the data. The FCC has been working to improve the data and make it more accurate and complete. The FCC has also been working to make the data more accessible to the public.

But many industry pundits and experts have been critical of the FCC’s efforts. They argue that the data is not accurate and that the FCC is not doing enough to make it accessible to the public. ref

I’m more shocked by how many categories of use cases the map and data are not quite set up for.

  • “As a member of the community I want to know how my city is doing in terms of broadband access so that I can advocate for better access for my community.”

  • “As a business owner I want to know what broadband options are available in my area so that I can make informed decisions about my business.”

  • “As a voter I want to know how effective my elected officials are in improving broadband access in my area so that I can make informed decisions at the voting booth.”

  • “As a city worker I want to know which areas of my city are underserved in terms of broadband access so that I can prioritize infrastructure improvements.”

and the list goes on.

Now let’s insert $42.45 billion…

Yes, that’s 42 followed by one, two, three … nine zeros… billion dollars. This is the amount of money that the FCC is spending to improve broadband access in the United States. The FCC is spending this money to improve broadband access in rural areas. The FCC is spending this money to improve broadband access in urban areas. The FCC is spending this money to improve broadband access in suburban areas. The FCC is spending this money to improve broadband access in tribal areas. The FCC is spending this money to improve broadband access in territories. The FCC is spending this money to improve broadband access in schools and libraries. The FCC is spending this money to improve broadband access in hospitals and clinics. The FCC is spending this money to improve broadband access in public safety agencies. The FCC is spending this money to improve broadband access in low-income households. The FCC is spending this money to improve broadband access in households with children. The FCC is spending this money to improve broadband access in households with seniors. The FCC is spending this money to improve broadband access in households with disabilities. The FCC is …

If only there were a good map of where this money should be spent!

Wait a minute, hold my beer…

Ok enough of the history lesson, let’s get to the demo.

Demo

Map of underserved areas (Red)

Instructions: The map shows two colors based on the definition of underserved. By using the menu on the bottom left you can change the definition of underserved. The map will update when you click “Update”.

Any census block with max speed below the “Underserved (Download/Upload):” input will be shown in red. The default is 100/20 Mbps.
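
The rule above fits in a few lines of Python. This is a minimal sketch, not the code behind the map: the function name and parameters are hypothetical, and it assumes a block counts as underserved when either its best download speed or its best upload speed falls below the threshold.

```python
def is_underserved(max_down_mbps, max_up_mbps, down_threshold=100, up_threshold=20):
    """Return True when a block's best speeds fall below the threshold.

    Assumption: missing either the download or the upload threshold is
    enough to mark the block as underserved.
    """
    return max_down_mbps < down_threshold or max_up_mbps < up_threshold


# A block topping out at 50/10 Mbps is underserved under the 100/20 default.
print(is_underserved(50, 10))
# The same block passes if the definition is loosened to 25/3 Mbps.
print(is_underserved(50, 10, down_threshold=25, up_threshold=3))
```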

It is also important to check what technologies are available. You can limit the map to only show areas that have a specific technology available. The default is technology codes 10, 11, 12, 20, 30, 40, 41, 42, 43, and 50 from the FCC definition.

Lastly, you can limit the ISPs shown on the map. The default is all ISPs, but you can pick any provider and add its name to the “Providers:” input. The map will update when you click “Update”.
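
To make the technology and provider filters concrete, here is a rough sketch of how they could combine before the best speed per block is computed. The record keys (`block_geoid`, `provider`, `technology`, `down_mbps`, `up_mbps`) and the function name are assumptions for illustration, not the actual schema or the code used by the map. It also assumes the best download and upload speeds are taken independently, so they may come from different providers.

```python
from collections import defaultdict

# Default technology codes listed in the text.
DEFAULT_TECH_CODES = {10, 11, 12, 20, 30, 40, 41, 42, 43, 50}


def max_speed_by_block(records, tech_codes=DEFAULT_TECH_CODES, providers=None):
    """Keep matching records, then take the best speeds per census block.

    records: iterable of dicts with the hypothetical keys described above.
    providers: optional set of provider names; None means all ISPs.
    Blocks with no matching record are absent from the result.
    """
    best = defaultdict(lambda: {"down_mbps": 0, "up_mbps": 0})
    for rec in records:
        if rec["technology"] not in tech_codes:
            continue
        if providers is not None and rec["provider"] not in providers:
            continue
        block = best[rec["block_geoid"]]
        block["down_mbps"] = max(block["down_mbps"], rec["down_mbps"])
        block["up_mbps"] = max(block["up_mbps"], rec["up_mbps"])
    return dict(best)


# Made-up sample records, for illustration only.
records = [
    {"block_geoid": "A", "provider": "ISP One", "technology": 50, "down_mbps": 1000, "up_mbps": 1000},
    {"block_geoid": "A", "provider": "ISP Two", "technology": 10, "down_mbps": 25, "up_mbps": 3},
    {"block_geoid": "B", "provider": "ISP Two", "technology": 60, "down_mbps": 100, "up_mbps": 10},
]
print(max_speed_by_block(records, providers={"ISP Two"}))
```

With the provider filter set to one ISP, block A drops from gigabit to 25/3 Mbps, which is why the choice of providers and technologies can change which blocks show up in red.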

At any time you can click on a block to see its data.

How to

The data is available as a series of CSV files, which is not a format the public can easily consume.

But it’s behind a nice API, so we can get it in a format that is easily consumable by the public. Who knows how to make API requests?
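
For readers who have not made an API request before, a download boils down to something like the sketch below. It uses only the Python standard library and streams the response to disk so large files do not have to fit in memory. The helper name and the URL in the usage comment are placeholders, not the real FCC endpoint, and any authentication headers the API may require are left to the caller.

```python
import shutil
import urllib.request


def download_file(url, dest_path, headers=None):
    """Stream the resource at `url` to `dest_path` and return the path."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        # Copy in chunks rather than reading the whole body at once.
        shutil.copyfileobj(response, out)
    return dest_path


# Hypothetical usage; the URL is a placeholder.
# download_file("https://example.com/availability.zip", "availability.zip")
```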

Goal: Make this easier

We have the technology! We can make this easier.

Now this combination of technology choices is perhaps not the most efficient, but it is the most reproducible. And that is the key to this project: we want to be able to reproduce the data and the process in a way that is easy to understand and easy to use.

Oh, and best of all, it’s FREE!

We are going to use Google Colab, DuckDB, Fused, and Mapbox.

The data will be ETL’d from CSV and ESRI Shapefile into GeoParquet.

Is this the best production-grade data pipeline… no, but it’ll do.

To be honest, the workflow wasn’t this clean until I had to make it reproducible and reran it a few times. But looking back, the process can be separated into four distinct steps.

  1. Get raw data - this step will vary for every project, but it’s good to be familiar with web scraping tools, Linux command-line tools like wget and curl, and Python libraries like requests, Beautiful Soup, and Selenium.

  2. Process data - this may be more or less involved for each dataset, but here we do a join to make the CSV data spatial and then use the DuckDB spatial extension to create a GeoParquet file.

  3. Fuse data - this is where we use Fused to create partitioned GeoParquet that can be added to a UDF.

  4. Map & Visualize - create a UDF that lets you explore the data and serve it as a URL. Add that to a Mapbox JS map and you have a nice one-page tool (as shown in the demo above).
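
The join in step 2 is the piece that turns tabular rows into something mappable. The sketch below shows the idea in plain Python rather than DuckDB, to stay dependency-free: each availability row is matched to a census block geometry by block ID. The column name `block_geoid`, the WKT geometry strings, and the sample values are assumptions for illustration. The actual pipeline does this join in DuckDB and writes GeoParquet.

```python
import csv
import io


def join_to_blocks(availability_csv, block_geometries):
    """Attach a geometry to each availability row by census block ID.

    availability_csv: file-like object with a hypothetical block_geoid column.
    block_geometries: dict mapping block_geoid to a geometry (WKT text here).
    Rows whose block has no geometry are skipped.
    """
    joined = []
    for row in csv.DictReader(availability_csv):
        geometry = block_geometries.get(row["block_geoid"])
        if geometry is None:
            continue
        joined.append({**row, "geometry": geometry})
    return joined


# Made-up sample data: block "Z" has no geometry, so its row is dropped.
sample_csv = io.StringIO(
    "block_geoid,provider,down_mbps,up_mbps\n"
    "A,ISP One,1000,1000\n"
    "Z,ISP Two,25,3\n"
)
blocks = {"A": "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"}
rows = join_to_blocks(sample_csv, blocks)
print(rows)
```

Note that the `csv` module returns every value as a string, so speeds would need converting to numbers before any threshold comparison.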

Conclusion

In conclusion, it is now possible to make serverless full-stack one-page apps that can be shared with the world. This is a powerful tool for data scientists and data engineers: it allows them to quickly and easily create data products that can be shared with the world. Organizations looking to make data-driven decisions can rapidly go from data to app.

If you are a company or start-up looking to build a data or analytics product, we can help you. We have the expertise and experience to help you build applications that are easy to use and easy to share with the world.

Contact us at info@pozibl.com to learn more about how we can help you build this solution for your organization.