Working with tabular data is one of the main goals of pandas, if not the main one. There are many sources of tabular data, and one common source is a Wikipedia page. Pandas allows us to pull tables directly from websites, such as a Wikipedia page, and store them in DataFrames. In this article, we will see how to scrape a table from a webpage using pandas.
To scrape webpages with pandas, we need an HTML parser. Pandas tries lxml first and can fall back to beautifulsoup4 with html5lib. We can install all three using pip.
pip install lxml html5lib beautifulsoup4
Now that we have those modules installed, we can scrape table data by calling the read_html function and passing it a URL. The function returns a list of DataFrames, one for each table found on the page.
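import pandas as pd
# Page to scrape; every <table> element on it becomes one DataFrame
url = "https://en.m.wikipedia.org/wiki/List_of_Bob's_Burgers_episodes"
# read_html returns a list of DataFrames, in page order
dataframes = pd.read_html(url)
# Preview the first five rows of the first table
print(dataframes[0].head())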
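Because a page often contains several tables, it can help to narrow the results down. The sketch below is one way to do that using the `match` parameter of `read_html`, which keeps only tables containing the given text. To keep it runnable without a network connection, it parses a small HTML string instead of a live page. The HTML and its table contents are made up for illustration and are not taken from the Wikipedia page above.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML standing in for a downloaded page (contents are invented).
html = """
<table>
  <tr><th>Episode</th><th>Title</th></tr>
  <tr><td>1</td><td>Pilot A</td></tr>
  <tr><td>2</td><td>Pilot B</td></tr>
</table>
<table>
  <tr><th>Season</th><th>Episodes</th></tr>
  <tr><td>1</td><td>13</td></tr>
</table>
"""

# Without a filter, every table in the document is returned.
tables = pd.read_html(StringIO(html))
n_tables = len(tables)

# With match, only tables containing the text "Season" are returned.
season_tables = pd.read_html(StringIO(html), match="Season")
season_df = season_tables[0]

print(n_tables)
print(season_df)
```

The same `match` argument can be passed alongside a URL, which avoids guessing the position of the wanted table in the returned list.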