By: Shan-Chun Niki YANG
Hello, in this short article, I am going to show you how I use Python to scrape IMDb for its list of the top 250 movies. In the end, the list will be stored in a CSV file with the following information:
Import the necessary libraries.
requests is for making HTTP requests in Python.
BeautifulSoup is for pulling data out of HTML and XML files, which is useful for web scraping.
pandas is for data manipulation and analysis. We need it to create a data structure.
import requests
from bs4 import BeautifulSoup
import pandas as pd
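To see how BeautifulSoup and pandas divide the work before touching the network, here is a minimal sketch that runs on a made-up HTML snippet. The markup, the class names (movie, title, year), and the column names are assumptions for illustration only; they are not IMDb's real page structure.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML for illustration; NOT IMDb's actual markup.
sample_html = """
<ul>
  <li class="movie"><span class="title">Movie A</span><span class="year">1999</span></li>
  <li class="movie"><span class="title">Movie B</span><span class="year">2005</span></li>
</ul>
"""

# BeautifulSoup turns the raw HTML string into a searchable tree.
soup = BeautifulSoup(sample_html, "html.parser")

# Collect one dictionary per movie entry found in the tree.
rows = []
for item in soup.find_all("li", class_="movie"):
    rows.append({
        "title": item.find("span", class_="title").get_text(strip=True),
        "year": int(item.find("span", class_="year").get_text(strip=True)),
    })

# pandas turns the list of dictionaries into a table.
df = pd.DataFrame(rows)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
```

Passing a file name to to_csv, for example df.to_csv("movies.csv", index=False), would write the same table to disk; the file name here is only a placeholder.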
Set the URL to IMDb's chart of the top 250 rated movies.
Use the "Accept-Language" header to specify that the response should be in US English.
Create a variable named request_result to store the response from the GET request.
Parse the HTML content of the response using BeautifulSoup.
url = 'https://www.imdb.com/chart/top/?ref_=nv_mv_250'
headers = {"Accept-Language": "en-US,en;q=0.5"}
request_result = requests.get(url, headers=headers)
soup = BeautifulSoup(request_result.content, "html.parser")
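The request above assumes the server answers with the page. As a hedged variation, the sketch below wraps the same steps in two helper functions, adds a timeout, and raises an error on a non-2xx status so that a blocked or failed request is not silently parsed. The names build_headers and fetch_soup are hypothetical helpers, not part of the original code, and the optional User-Agent is an assumption: some sites reject the default one sent by requests, but the article does not say whether IMDb does.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://www.imdb.com/chart/top/?ref_=nv_mv_250"


def build_headers(user_agent=None):
    """Return the request headers; the User-Agent is optional (an assumption)."""
    headers = {"Accept-Language": "en-US,en;q=0.5"}
    if user_agent:
        headers["User-Agent"] = user_agent
    return headers


def fetch_soup(url, headers, timeout=10):
    """Fetch a page and parse it, failing loudly on HTTP errors."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


# Example usage (performs a real network request, so it is left commented out):
# soup = fetch_soup(URL, build_headers(user_agent="Mozilla/5.0"))
# print(soup.title)
```

Keeping the header construction separate from the network call makes it easy to check the headers without sending a request.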