Tutorial (Part 1) — Web Scraping for Beginners
Solving real problems in software development and data science requires data, so data means everything to data scientists and software developers alike. Scraping the web with Python is one way to get it. In this first part of a series on web scraping with Python, we will learn how.
On my third day as a Software Developer & Data Scientist (My path as Software Developer), I met Scrapy. The Scrapy (Scraping Python) module lets you extract, process, and store data from websites. The web holds a lot of data, but it arrives as unstructured noise. What if we could collect that data and give it structure? That is what Scrapy does: it turns unstructured data into structured data, for example rows in an SQL database.
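To make "structured data" concrete, here is a minimal sketch that stores a few scraped records in an SQL table. It uses Python's built-in sqlite3 module rather than Scrapy itself, and the sample comments, the table name, and the column names are hypothetical placeholders, not real scraped data.

```python
import sqlite3

# Hypothetical records standing in for comments a crawler might collect.
scraped_comments = [
    {"author": "user_a", "body": "Holding my shares", "score": 12},
    {"author": "user_b", "body": "Bought more today", "score": 7},
]

# An in-memory database keeps the sketch self-contained (nothing is written to disk).
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE comments (author TEXT, body TEXT, score INTEGER)"
)

# Named placeholders map each dictionary key to a column.
connection.executemany(
    "INSERT INTO comments (author, body, score) VALUES (:author, :body, :score)",
    scraped_comments,
)
connection.commit()

# Once the data is structured, it can be queried like any other table.
rows = connection.execute(
    "SELECT author, score FROM comments ORDER BY score DESC"
).fetchall()
print(rows)
```

Once the records sit in a table, sorting or filtering them is a single query instead of a pass over raw text.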
Still asking yourself why you would need Scrapy? Let's talk about something real: the Reddit groups investing in GameStop, AMC, and BlackBerry, to mention a few. We can access a Reddit group and extract its conversations and positive comments, which can help inform our decision about which investment to support.
👀 We will scrape wallstreetbets ⬇️
In this series of Medium posts, you will learn how to set up Scrapy and build your own Python scraping project with it. Basic knowledge of HTML and CSS is assumed; if you don't have it, keep reading and you will still be able to follow along.
"When you need a refresher, give Fundamentals of HTML & CSS a try."
Installing Scrapy
Which Python does Scrapy need? Current Scrapy releases require Python 3; Python 2 was supported only up to the 1.x releases. We can install Scrapy with the pip command. (Refresh your notes about pip)
# Code block 1.0
pip install scrapy
In code block 1.0, we called the pip command, which installs the Scrapy module.
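If you want to confirm that the installation worked, a small check like the one below can help. It is only a sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is an assumption made for this example and is not part of Scrapy or pip.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Look up Scrapy in the current Python environment.
version = installed_version("scrapy")
if version is None:
    print("scrapy is not installed; run: pip install scrapy")
else:
    print(f"scrapy {version} is installed")
```

Run it with the same Python interpreter that you used for pip, otherwise the check may look in a different environment.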
Like Python, Scrapy comes with its own shell. The Scrapy shell is handy for scraping websites interactively, without writing any code beforehand. Now that we have installed the Scrapy module, we can start.
# Code block 2.0
scrapy shell
In code block 2.0, we started the Scrapy shell. This console lets us try Scrapy without writing a Python script.
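When you crawl something with Scrapy, the result is an object called "response". The response is simply the information that you have downloaded. It is available once the shell has fetched a page, for example when the shell is started with the page's URL as an argument or after calling fetch(url) inside the shell. To see what the crawler has downloaded, type the following in the same console.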
# Code block 3.0
view(response)
In code block 3.0, we asked to see the content of the downloaded response. The command opens the downloaded webpage in your browser.
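To get a feel for what happens when a downloaded page is viewed, here is a rough standard-library sketch of the idea: save the page bytes to a temporary .html file and hand that file to the browser. It is an illustration under assumptions, not Scrapy's actual implementation; the function names and the sample page are hypothetical.

```python
import tempfile
import webbrowser
from pathlib import Path


def save_page_to_temp_file(body):
    """Write downloaded page bytes to a temporary .html file and return its path."""
    # delete=False keeps the file on disk after the handle is closed,
    # so a browser can still read it afterwards.
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as handle:
        handle.write(body)
    return Path(handle.name)


def view_page(body):
    """Save the page, then open the local copy in the default web browser."""
    path = save_page_to_temp_file(body)
    return webbrowser.open(path.as_uri())


# Hypothetical page content standing in for the bytes of a real response.
sample_body = b"<html><head><title>Demo</title></head><body>Hello</body></html>"
saved_path = save_page_to_temp_file(sample_body)
print(saved_path.suffix)
```

Calling view_page(sample_body) would open the saved copy in your browser. The point to remember is that you are looking at a local snapshot of the page as it was downloaded, not at the live site.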
Wow! That's amazing. We have downloaded an entire webpage.
In the next posts of this series, we will go further with our analysis.