How to Scrape Websites with Pandas

2021-01-06

Intro

Working with tabular data is one of the main goals of pandas, if not the main goal. Tabular data comes from many sources, and one common source is a Wikipedia page. Pandas lets us pull tables directly from websites, such as a Wikipedia page, and store them in DataFrames. In this article, we will see how to scrape a table from a webpage using pandas.

Scraping a Table from a Web Page

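To scrape webpages with pandas, we need a few extra libraries in addition to pandas itself, because pandas relies on an HTML parser (lxml, or beautifulsoup4 together with html5lib) to read tables. We can install them using pip.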

pip install lxml html5lib beautifulsoup4

Now that we have those modules installed, we can scrape table data by calling the read_html function and supplying it a URL. The function returns a list of DataFrames, one for each table on the page.

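import pandas as pd

url = "https://en.m.wikipedia.org/wiki/List_of_Bob's_Burgers_episodes"
# read_html returns a list with one DataFrame per table found on the page
dataframes = pd.read_html(url)

# Preview the first five rows of the first table
print(dataframes[0].head())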
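
A page often contains several tables, so picking one by its position in the list can be fragile. As a minimal sketch, read_html also accepts a match argument that keeps only the tables whose text matches a string or regular expression. The HTML string below is made-up sample data, passed in through StringIO so the example runs without a network connection; it assumes pandas and one of the parsers installed above are available.

```python
from io import StringIO

import pandas as pd

# Made-up sample page with two tables (an assumption for illustration only).
html = """
<table>
  <tr><th>Fruit</th><th>Count</th></tr>
  <tr><td>Apple</td><td>3</td></tr>
  <tr><td>Pear</td><td>5</td></tr>
</table>
<table>
  <tr><th>City</th><th>Visits</th></tr>
  <tr><td>Oslo</td><td>2</td></tr>
</table>
"""

# Without a filter, every table on the page comes back as its own DataFrame.
all_tables = pd.read_html(StringIO(html))
print(len(all_tables))

# With match, only tables containing the given text are returned.
city_tables = pd.read_html(StringIO(html), match="City")
print(city_tables[0])
```

The same match argument can be used with a URL in place of the StringIO object, which may be handy when the table of interest is not the first one on the page.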