Scraping IMDb Top 250 Movies

BY : Shan-Chun Niki YANG

Hello! In this short article, I am going to show you how I use Python to scrape IMDb for its list of the top 250 movies. In the end, the list will be stored in a CSV file with the following information:

movie title, rating, release year, user votes, and movie URL

Let’s get started ! 🍿

Step 1 :

Import the necessary libraries.

requests is for making HTTP requests in Python.

BeautifulSoup is for pulling data out of HTML and XML files, which is useful for web scraping.

pandas is for data manipulation and analysis. We need it to hold the scraped rows in a table (a DataFrame) that can then be written to a CSV file.

import requests
from bs4 import BeautifulSoup
import pandas as pd
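Before touching the live site, it may help to see how BeautifulSoup and pandas fit together. The snippet below is only a minimal sketch: the HTML string, its class names, and the two sample rows are made up for illustration and are not IMDb's actual markup.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML, invented for this example (NOT IMDb's real page structure).
sample_html = """
<ul>
  <li class="movie"><a href="/title/example-1/">Example Movie One</a><span class="rating">9.0</span></li>
  <li class="movie"><a href="/title/example-2/">Example Movie Two</a><span class="rating">8.5</span></li>
</ul>
"""

# Parse the HTML string with Python's built-in parser.
soup = BeautifulSoup(sample_html, "html.parser")

rows = []
for item in soup.find_all("li", class_="movie"):
    link = item.find("a")
    rows.append({
        "title": link.get_text(strip=True),
        "rating": float(item.find("span", class_="rating").get_text(strip=True)),
        "url": link["href"],
    })

# Turn the list of dictionaries into a table, one row per movie.
df = pd.DataFrame(rows)
print(df)
```

The same pattern (find the repeating element, pull out its fields, collect dictionaries, build a DataFrame) is what the real scraper follows; only the tag and class names differ.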

Step 2 :

Set url to the address of IMDb's Top 250 rated movies page.

Use the "Accept-Language" header to ask the server for the US English version of the page.

Create a variable named request_result to store the response from the GET request.

Parse the HTML content of the response using BeautifulSoup.

url = 'https://www.imdb.com/chart/top/?ref_=nv_mv_250'
headers = {"Accept-Language": "en-US,en;q=0.5"}
request_result = requests.get(url, headers=headers)
soup = BeautifulSoup(request_result.content, "html.parser")
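The request above assumes the server answers successfully. As a hedged sketch, the fetch and parse steps can be wrapped in two small helpers that fail loudly on an HTTP error instead of silently parsing an error page. The names fetch_html and parse_html and the 10-second timeout are my own choices for illustration, not part of the original script.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, headers=None, timeout=10):
    """Download a page and return its raw bytes.

    Raises requests.HTTPError if the server responds with a 4xx/5xx status.
    """
    response = requests.get(url, headers=headers, timeout=timeout)
    # Some sites reject requests that lack a browser-like "User-Agent" header.
    # If this raises for a 403, adding one to `headers` is a common workaround
    # (an assumption to verify, not something the original script does).
    response.raise_for_status()
    return response.content


def parse_html(content):
    """Parse raw HTML (bytes or str) into a BeautifulSoup object."""
    return BeautifulSoup(content, "html.parser")
```

With the url and headers defined in this step, soup = parse_html(fetch_html(url, headers)) would produce the same soup object as before, provided the request succeeds.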