With Python we can easily scrape data from a website into Excel using popular libraries such as BeautifulSoup and Selenium. In this tutorial we will be using the Selenium module in Python to scrape data from a website into Excel, and we will go step by step so you can follow along.
Before starting our main topic of how to scrape data from a website into Excel, we will first go through a brief introduction to the basics of website scraping.
Why do we actually need to scrape data from a website into Excel?
There are several reasons why scraping data from a website into Excel can be beneficial. Listed below are some of the benefits.
Data Aggregation:
Websites often contain valuable information spread across multiple pages or sources. By scraping the data into Excel, you can aggregate and consolidate relevant data in one place, making it easier to analyze and manipulate.
Automation:
Manually copying and pasting data from websites can be time-consuming and prone to errors. Web scraping allows you to automate the process, saving you time and effort. You can write a script or use a scraping tool to extract data from multiple web pages and have it automatically populate an Excel spreadsheet.
Data Analysis and Visualization:
Excel provides a wide range of tools for data analysis and visualization. By scraping data into Excel, you can take advantage of these tools to perform calculations, create charts, and gain insights from the extracted data.
Reporting and Decision Making:
Scraping data into Excel allows you to create customized reports and presentations based on the extracted information. You can use Excel’s formatting options, pivot tables, and formulas to structure the data in a meaningful way, enabling better decision-making.
Research and Monitoring:
Scraping data from websites can be useful for research purposes, such as collecting data for market analysis, competitor tracking, or sentiment analysis. By regularly scraping and updating the data in Excel, you can monitor changes, trends, or updates on specific websites or web pages.
Data Integration:
Excel serves as a common platform for data integration. By scraping data from websites and importing it into Excel, you can combine it with other data sources, such as internal databases or external APIs. This integration allows for comprehensive data analysis and cross-referencing.
Customization and Manipulation:
Once the data is in Excel, you have full control over how you manipulate and customize it. You can apply filters, sort and rearrange the data, perform calculations, and create custom formulas or macros to suit your specific requirements.
While web scraping should be done ethically and in compliance with website terms of service and legal considerations, extracting data into Excel provides a convenient way to gather, analyze, and utilize information from the web for various purposes.
With all this background on why we would want to scrape data from a website into Excel, let's start the step-by-step process of doing so with Python code using the Selenium module.
1 Installing Python and the Selenium Module
1.1 Installing Python:
- Visit the official Python website: https://www.python.org/downloads/
- Download the Python installer suitable for your operating system.
- Run the installer and follow the instructions to install Python. Make sure to check the option “Add Python to PATH” during the installation process.
1.2 Installing Selenium:
- Open a command prompt or terminal.
- Use the following command to install the Selenium module via pip, the Python package installer:
pip install selenium
Note: If you're using Python 3, the command may be pip3 instead of pip.
With Python and Selenium installed, you can now start using Selenium for web scraping and automation.
2 Scraping the Data from the Website and Saving It into an Excel Sheet
In this example of scraping data from a website into Excel, we will use the website Quotes to Scrape and extract its quotes, author names and tags, saving the scraped data into an Excel sheet. For saving the data to Excel and interacting with it, we will also be using the Pandas library.
2.1 Installing Pandas Library
To install the Pandas library in Python, you can use the following steps:
Step 1: Open a command prompt or terminal.
Step 2: Run the following command to install Pandas using pip, the Python package installer:
pip install pandas
Note: If you're using Python 3, the command may be pip3 instead of pip.
Step 3: Wait for the installation to complete. Pip will download and install the Pandas library along with its dependencies.
Once the installation is finished, you can import and use the Pandas library in your Python scripts or interactive sessions.
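As a quick sanity check that Pandas is installed and importable (the exact version number printed will vary on your system), you can run a small snippet like the following:

```python
import pandas as pd

# Print the installed Pandas version to confirm the import works.
print(pd.__version__)

# Build a tiny DataFrame as a smoke test.
df = pd.DataFrame({'a': [1, 2, 3]})
print(df.shape)  # (3, 1)
```

If both lines print without errors, Pandas is ready to use.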
2.2 Importing the Selenium and Pandas Libraries
First of all, we have to import webdriver (along with the Service and By helpers) from Selenium, as well as the Pandas library, as per the following code.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
2.3 Setting the path to the Webdriver Executable and opening the website.
With the following code we will set the path to the webdriver executable. For our example we will be using ChromeDriver for Google Chrome. After setting the path, we create a new instance of the webdriver and open the website we want to scrape, i.e. Quotes to Scrape. The code goes here:
from selenium.webdriver.chrome.service import Service

# Set the path to the webdriver executable (e.g., ChromeDriver for Google Chrome).
webdriver_path = '/path/to/chromedriver'

# Create a new instance of the webdriver (Selenium 4 passes the path via a Service object).
driver = webdriver.Chrome(service=Service(webdriver_path))

# Open the website.
driver.get('https://quotes.toscrape.com/')
Make sure to replace /path/to/chromedriver with the actual path to the ChromeDriver executable on your system. Also, ensure that the ChromeDriver version matches your installed version of Google Chrome; to download the matching ChromeDriver, visit ChromeDriver – WebDriver for Chrome – Downloads (chromium.org). Note that if you are on Selenium 4.6 or later, Selenium Manager can often locate or download a matching driver for you automatically, in which case webdriver.Chrome() with no path is usually enough.
2.4 Scraping the Data and Storing It in a Pandas DataFrame
The following code scrapes the data and stores the scraped data in a Pandas DataFrame.
from selenium.webdriver.common.by import By

# Scrape the data.
quotes = []
authors = []
tags = []

quote_elements = driver.find_elements(By.CLASS_NAME, 'quote')

for quote_element in quote_elements:
    quote = quote_element.find_element(By.CLASS_NAME, 'text').text
    author = quote_element.find_element(By.CLASS_NAME, 'author').text
    tags_list = quote_element.find_elements(By.CLASS_NAME, 'tag')
    tag_names = [tag.text for tag in tags_list]
    quotes.append(quote)
    authors.append(author)
    tags.append(tag_names)

# Create a DataFrame to store the scraped data.
data = {
    'Quote': quotes,
    'Author': authors,
    'Tags': tags
}
df = pd.DataFrame(data)
In the above code, we first created three empty lists, quotes, authors and tags, to store the scraped data.
After opening the webpage with the webdriver, our main task is to locate the parts of the page holding the data we are interested in; a webpage contains plenty of content that is not useful for our needs. In our current example we only want to extract the quotes, author names and tags from the example website. To find elements on the page we have just opened, Selenium provides several locator methods. Here we will use the find_element(By.CLASS_NAME, "class name") method to pick out the exact elements or text we want to scrape. To read about the other element-locating strategies in Selenium, follow the link 4. Locating Elements — Selenium Python Bindings 2 documentation (selenium-python.readthedocs.io).
Now, how do we know the class name of the element we want to scrape? Open the webpage you are trying to scrape in any browser and use the browser's Inspect feature. This opens the browser's built-in developer tools, where you can inspect the DOM and CSS of the webpage.
Using the Inspect feature, we can see that the front page of Quotes to Scrape contains 10 quotes, each held inside a div element with the class name "quote". We collect all of those div elements by class name with the following line of code.
quote_elements = driver.find_elements(By.CLASS_NAME, 'quote')
Here the result is stored in a variable named quote_elements, which is a list containing 10 items, one for each of the 10 quotes we are trying to scrape from the first page.
Next, we iterate over the list with a for loop to extract all ten quotes, author names and tags, storing them in their respective lists. The tags for each quote are kept in a list of their own, since a quote may contain multiple tags.
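Because each quote's tags end up as a Python list, the corresponding Excel cell will show the list's repr (brackets and quotes included). One optional tweak, not part of the original code, is to join the tags into a single comma-separated string before storing them, which reads more naturally in a spreadsheet. A minimal sketch, using a made-up tag list:

```python
# Hypothetical tag list, standing in for one quote's scraped tags.
tag_names = ['change', 'deep-thoughts', 'thinking', 'world']

# Joining into one string keeps the Excel cell readable.
tags_cell = ', '.join(tag_names)
print(tags_cell)  # change, deep-thoughts, thinking, world
```

If you adopt this, append tags_cell instead of tag_names inside the loop.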
Finally, we created a Pandas DataFrame and stored all the scraped data in it. With the above steps, most of the work of scraping the webpage and extracting the data we need is done.
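The parallel-lists-to-DataFrame pattern used above can be sketched on its own with placeholder data (the quotes, authors and tags below are invented, not real scrape output):

```python
import pandas as pd

# Placeholder data standing in for the lists the scraping loop fills.
quotes = ['Quote one.', 'Quote two.']
authors = ['Author A', 'Author B']
tags = [['life'], ['humor', 'books']]

# Each list becomes a column; positions across the lists line up into rows.
df = pd.DataFrame({'Quote': quotes, 'Author': authors, 'Tags': tags})
print(df.shape)  # (2, 3)
```

This is why the loop appends to all three lists on every iteration: the lists must stay the same length for the rows to line up.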
2.5 Exporting the Scraped Data into an Excel Sheet
At last, we export the stored scraped data into an Excel file named "quotes_data.xlsx". Pandas has excellent, widely used support for importing data from and exporting it to Excel (writing .xlsx files requires the openpyxl package, which you can install with pip install openpyxl). If you are in the data science or web scraping field, or just stepping into it, Pandas will be quite helpful.
# Save the data to an Excel file.
excel_file = 'quotes_data.xlsx'
df.to_excel(excel_file, index=False)
Now to check, open the quotes_data.xlsx file and look for the scraped data like Quotes, Author Name and Tags in it.
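If you would rather verify programmatically than open Excel, pandas can read the file back with read_excel. A small sketch of that round trip, using a sample frame and file name of our own choosing (this assumes the openpyxl engine pandas uses for .xlsx files is installed):

```python
import pandas as pd

# Write a small sample frame, then read it back, mimicking the
# export-and-check step from the tutorial.
sample = pd.DataFrame({'Quote': ['Hello.'], 'Author': ['Someone']})
sample.to_excel('sample_check.xlsx', index=False)

check = pd.read_excel('sample_check.xlsx')
print(check.columns.tolist())  # ['Quote', 'Author']
```

The same pd.read_excel call on quotes_data.xlsx would show the scraped quotes, author names and tags.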
The complete code to scrape data from a website into Excel goes here:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd

# Set the path to the webdriver executable (e.g., ChromeDriver for Google Chrome).
webdriver_path = '/path/to/chromedriver'

# Create a new instance of the webdriver.
driver = webdriver.Chrome(service=Service(webdriver_path))

# Open the website.
driver.get('https://quotes.toscrape.com/')

# Scrape the data.
quotes = []
authors = []
tags = []

quote_elements = driver.find_elements(By.CLASS_NAME, 'quote')

for quote_element in quote_elements:
    quote = quote_element.find_element(By.CLASS_NAME, 'text').text
    author = quote_element.find_element(By.CLASS_NAME, 'author').text
    tags_list = quote_element.find_elements(By.CLASS_NAME, 'tag')
    tag_names = [tag.text for tag in tags_list]
    quotes.append(quote)
    authors.append(author)
    tags.append(tag_names)

# Create a DataFrame to store the scraped data.
data = {
    'Quote': quotes,
    'Author': authors,
    'Tags': tags
}
df = pd.DataFrame(data)

# Save the data to an Excel file.
excel_file = 'quotes_data.xlsx'
df.to_excel(excel_file, index=False)

# Close the browser.
driver.quit()

print(f'Data scraped successfully and saved to {excel_file}.')
This brings us to the end of our topic of scraping data from a website into Excel, and we hope you liked it. We used only one example website, but this should have given you an understanding of how to scrape data from a website into Excel.
You can experiment, following the examples above, with any webpage you want to scrape and with whatever data you like; overall, the concept and method of scraping data from a website into Excel remain the same. If you liked this post, please share your experiences in the comment box below, we would love to hear from you.
For any improvements or suggestions, you are kindly requested to comment in the comment box below, or you can also contact us at any time.
To continue your learning path into Web Scraping in Python, you can follow my earlier blog posts as below: