Web scraping and API access are used for a wide range of commercial and personal purposes. Web scraping lets businesses quickly pass updates on to their customers, such as price increases, discounts, and new product launches. Moreover, many data science and data analysis tasks start with gathering data from websites or, more generally, the Internet, so it is essential to treat the Internet as a data source. Web scraping is the process of gathering and parsing HTML code from web pages to extract the desired data. It serves many purposes in areas such as e-commerce, marketing, social media, and data analysis.
The TechClass Access Web Data with Python online course acquaints you with scraping, parsing, and reading HTML data as well as accessing data through web APIs; together, these are called accessing web data. As a first step in the web scraping process, you will also get familiar with basic concepts such as HTML, HTTP, and XML. By the end of this course, you will have gained hands-on experience creating a web scraper using robust Python libraries such as BeautifulSoup, Selenium, Scrapy, and urllib. This course will prepare you for web scraping job opportunities in the industry.
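To give a feel for what "parsing HTML code to extract the desired data" means, here is a minimal sketch that uses only Python's standard library rather than the libraries taught in the course. The `LinkCollector` class, the `extract_links` helper, and the sample HTML are made up for illustration; the course itself may approach the task differently.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in an HTML string (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Illustrative HTML; in a real scraper this text would come from a web page.
SAMPLE_HTML = """
<html><body>
  <h1>Offers</h1>
  <a href="/products/1">Product 1</a>
  <a href="/products/2">Product 2</a>
  <a name="anchor-only">No link here</a>
</body></html>
"""

print(extract_links(SAMPLE_HTML))
```

The sample is parsed from a string so the sketch runs without network access; fetching the page is a separate step.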
Learning outcomes
- Learn basic concepts such as Internet protocols and HTML code
- Get familiar with the JSON and XML formats
- Learn about APIs and their main challenges
- Get acquainted with requesting data from APIs with the urllib library
- Learn how to find the necessary information in URLs
- Get hands-on experience working with the BeautifulSoup library for web scraping
- Get hands-on experience scraping data with Selenium
- Get hands-on experience scraping data with the urllib library
- Learn how to work with the Scrapy library and its shell
- Learn how to extract a substring from a string using regular expressions (Regex)
- Learn how to extract a substring from a string using string methods
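The last two outcomes describe two ways of pulling a substring out of a string. The sketch below illustrates both with Python's built-in `re` module and plain string methods; the sample text and variable names are assumptions chosen for illustration and are not taken from the course material.

```python
import re

# Illustrative input; a scraper would typically get such text from a page.
line = "Price: 19.99 EUR (was 24.99 EUR)"

# Approach 1: regular expressions.
# \d+\.\d+ matches one or more digits, a literal dot, then more digits.
match = re.search(r"\d+\.\d+", line)
first_price = match.group() if match else None

# findall returns every non-overlapping match as a list of strings.
all_prices = re.findall(r"\d+\.\d+", line)

# Approach 2: string methods.
# Slice between the first colon and the first occurrence of "EUR".
start = line.find(":") + 1
end = line.find("EUR")
price_text = line[start:end].strip()

print(first_price, all_prices, price_text)
```

Regular expressions tend to cope better with text whose layout varies, while string methods are often enough when the surrounding markers are fixed.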
Table of contents
Chapter 1: Intro to Course
- 1.1. Welcome!
- 1.2. About TechClass Data Science Department
- 1.3. Learning Outcome
- 1.4. Your Expectations, Goals, and Knowledge
- 1.5. Abbreviations
- 1.6. Copyright Notice
Chapter 2: Introduction
- 2.1. What is Access Web Data about?
- 2.2. Importance of Web Scraping and APIs
- 2.3. Steps of Web Scraping
- 2.4. Challenges of Web Scraping and APIs
- 2.5. Python Libraries for Web Scraping
- 2.6. Quiz
Chapter 3: Basic Concepts
- 3.1. Introduction
- 3.2. Decipher the Information in URLs
- 3.3. HTTP and HTTPS
- 3.4. HTML
- 3.5. Static and Dynamic Websites
- 3.6. XML
- 3.7. JSON
- 3.8. Quiz
Chapter 4: Regular Expressions Module (Regex)
- 4.1. Introduction
- 4.2. Regex
- 4.3. Basic Symbols (I)
- 4.4. Basic Symbols (II)
- 4.5. Basic Symbols (III)
- 4.6. Special Characters
- 4.7. Sets
- 4.8. Useful Methods
Chapter 5: Access Web Data with requests and BeautifulSoup
- 5.1. Introduction
- 5.2. Scrape HTML Content with requests
- 5.3. Scrape HTML Content with BeautifulSoup
- 5.4. div Element
- 5.5. Beautiful Soup in Action: Take the First Step
- 5.6. Beautiful Soup in Action: Extract All Links
- 5.7. Beautiful Soup in Action: Extract Specific Links
- 5.8. Beautiful Soup in Action: Navigate Between Pages
Chapter 6: Web Scraping with Scrapy
- 6.1. Introduction
- 6.2. Scrapy Shell
- 6.3. Writing Custom Scrapy Spiders
- 6.4. Extract Data
Chapter 7: Access Web Data with urllib
- 7.1. Introduction
- 7.2. Retrieving Webpages
- 7.3. Extract Text from HTML
- 7.4. Create a Dataset
- 7.5. Access API
Chapter 8: Web Scraping with Selenium
- 8.1. Introduction
- 8.2. Install Selenium and Login
- 8.3. Extract Data
Chapter 9: Final Tasks
- 9.1. Final Project
- 9.2. Self-Study Essay
Chapter 10: Finishing the Course
- 10.1. What We Have Learned
- 10.2. Where to Go Next?
- 10.3. Your Opinion Matters
- 10.4. Congrats! You did it!