How to Scrape Websites with Pandas

01.06.2021

Intro

Working with tabular data is one of the main goals of pandas, if not the main one. Tabular data comes from many sources, and one common source is a Wikipedia page. Pandas lets us pull tables directly from websites, such as a Wikipedia page, and store them in DataFrames. In this article, we will see how to scrape a table from a web page using pandas.

Scraping a Table from a Web Page

To scrape web pages with pandas, we will need a few HTML parsing libraries that pandas relies on behind the scenes. We can install them using pip.

pip install lxml html5lib beautifulsoup4

Now that we have those modules installed, we can scrape table data by calling the read_html function and supplying it a URL. The function returns a list of DataFrames, one for each table found on the page.

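import pandas as pd

# read_html downloads the page and parses every <table> element it finds
url = "https://en.m.wikipedia.org/wiki/List_of_Bob's_Burgers_episodes"
dataframes = pd.read_html(url)

# Show the first five rows of the first table on the page
print(dataframes[0].head())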
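
Since a page often contains more than one table, it can help to see how the returned list behaves without depending on a live website. The sketch below is only an illustration: the HTML snippet and its values are made up, and it assumes the parser libraries installed above are available. It parses the snippet from memory and then uses the match parameter of read_html to keep only the tables whose text matches a given string.

```python
from io import StringIO

import pandas as pd

# Made-up HTML with two tables, used only for illustration
# (not taken from any real page).
html = """
<html><body>
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>3</td></tr>
  <tr><td>Pear</td><td>4</td></tr>
</table>
<table>
  <tr><th>City</th><th>Country</th></tr>
  <tr><td>Oslo</td><td>Norway</td></tr>
</table>
</body></html>
"""

# read_html accepts a file-like object as well as a URL.
# It returns one DataFrame per table found.
tables = pd.read_html(StringIO(html))
print(len(tables))
print(tables[0])

# match keeps only tables containing text that matches the given
# string or regular expression.
matched = pd.read_html(StringIO(html), match="City")
print(matched[0])
```

If no table on the page matches, read_html raises a ValueError, so on real pages it may be worth wrapping the call in a try/except block.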