Skip to main content

Command Palette

Search for a command to run...

#26 - Python Web Scraping

By Ifeanyi Omeata

Updated
2 min read
#26 - Python Web Scraping
I

Hi, I am a Software Developer of 3-4 years specialising in React, Javascript, Node, NextJS, Express, Python, Django, Fast API, SQL and a few other technology stack, with a good background in Networking and Cloud Infrastructure. I am working to become a DevOps Solutions Engineer and happily married to my long time girlfriend.


Topics:


1. Web Scraping


1. WEB SCRAPING


>>Return to Menu




import requests
from bs4 import BeautifulSoup

page = 3
res = requests.get(f'https://news.ycombinator.com/?p={page}')
soup = BeautifulSoup(res.text, 'html.parser')
links = soup.select('.titlelink')
subtext = soup.select('.subtext')
r = []

for idx, i in enumerate(links):
    title = links[idx].get_text()
    link = links[idx].get('href', None)
    vote = subtext[idx].select('.score')
    if len(vote):
        points = int(vote[0].get_text().replace(' points', ''))
        r.append({'title': title, 'link': {link}, 'votes': points})    

print(sorted(r,key= lambda k:k['votes'],reverse=True))

.storylink = .titlelink 

apt-get install python3-bs4
easy_install beautifulsoup4
Pip3 install beautifulsoup4
Pip3 install requests

import requests
from bs4 import BeautifulSoup

res = requests.get('https://news.ycombinator.com/')
soup = BeautifulSoup(res.text, 'html.parser')
a = soup.select('a')
for i in a:
    if i.get('href')[:5] == 'https':
        print(i.get('href'))   

res = requests.get('https://news.ycombinator.com/')
soup = BeautifulSoup(res.text, 'html.parser')
a = soup.find('a')
el = a.get('href')
print(el)

res = requests.get('https://news.ycombinator.com/')
soup = BeautifulSoup(res.text, 'html.parser')
print(soup.prettify())

el = soup.select(‘p’)
el = soup.select('.score')
el = soup.select(‘#unv_30557088’)
el = soup.find_all("a")
el = soup.find("a")
el = soup.find('title')
el = soup.title
el = soup.title.name
el = soup.title.string
el = soup.title.parent.name
el = soup.find_all('div')
el = soup.find(id='unv_30557088')
el = soup.get_text()
a = soup.find('a')
el = a.get('href')

print(el)

soup = BeautifulSoup(html_doc, 'html.parser')
soup = BeautifulSoup("<html>a web page</html>", 'html.parser')
print(soup.prettify())

with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')

for link in soup.find_all('a'):
    print(link.get('href'))

#End


Hope you enjoyed this! :) Follow me for more contents...


Get in Touch:
www.ifeanyiomeata.com
contact@ifeanyiomeata.com

Youtube: https://www.youtube.com/c/IfeanyiOmeata
Linkedin: https://www.linkedin.com/in/omeatai/
Twitter: https://twitter.com/iomeata
Github: https://github.com/omeatai/
Stackoverflow: https://stackoverflow.com/users/2689166/omeatai
Hashnode: https://hashnode.com/@omeatai
Medium: https://medium.com/@omeatai
© 2022

LEARN PYTHON/DJANGO

Part 11 of 37

Learn Python and Django the easy way!..... PYTHON | DJANGO | FASTAPI

Up next

#25 - Python Unittest

By Ifeanyi Omeata