Stream Our Mistakes  

Author: eddyizm and octon

A weekly live-coding podcast with our user group.

Language: en-us

Genres: Education, How To, Technology

004 - HTML Scraping with Beautiful Soup
Friday, 22 December, 2017

Stream Our Mistakes EP 004

In this episode, Matt walks us through HTML/web scraping using the popular Python library, Beautiful Soup. Here's the code snippet from the session and links:

    # Created for Stream Our Mistakes
    # https://streamourmistakes.blogspot.com/

    # Reference:
    # https://docs.python.org/3/library/urllib.request.html
    # https://www.crummy.com/software/BeautifulSoup/bs4/doc/

    from bs4 import BeautifulSoup
    import urllib.request

    '''
    # local html to play with from documentation
    Uncomment to enable

    html_doc = """
    <html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    """
    '''

    # Get the html from the web.
    f = urllib.request.urlopen('https://en.wikiquote.org/wiki/Aristotle')

    # Load the html into the parser.
    soup = BeautifulSoup(f.read(), 'html.parser')

    # Show the whole raw html.
    # print(soup.prettify())

    # Access a single element.
    # print(soup.title)

    # Find all a tags in the html doc and print some information.
    links = soup.find_all('a')
    for link in links:
        print(link.get('href'))

    print(len(links))

Links:
https://docs.python.org/3/library/urllib.request.html
https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Subscribe to the podcast on Apple Podcasts, Google Play, Stitcher

matt
site: http://octon.io/
github: https://github.com/mmdempsey

eddyizm
site: http://eddyizm.com
twitter: http://twitter.com/eddyizm
github: https://github.com/eddyizm

perry
github: https://github.com/apk29

---

**youtube live broadcast:** https://youtube.com/user/eddyizm/live

Subscribe to our channel and follow my Twitter feed to be notified of our next live broadcast. Feel free to leave us comments and suggestions on what you want to see.
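A possible follow-up to the session snippet: `link.get('href')` prints `None` for any `a` tag without an `href`, and relative links come back unresolved. The sketch below is not from the episode. It runs offline against a small sample page adapted from the Beautiful Soup documentation example, and the `extract_links` helper, `BASE_URL`, and the modified `href` values are assumptions made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical base URL and sample page (assumptions for illustration only).
BASE_URL = "http://example.com/"
HTML_DOC = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="story">Once upon a time there were three little sisters:
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="/lacie" class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>.</p>
</body></html>
"""


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only anchors that actually carry an href attribute.
    anchors = soup.find_all("a", href=True)
    # urljoin resolves relative links against base_url and leaves
    # absolute links unchanged.
    return [urljoin(base_url, a["href"]) for a in anchors]


if __name__ == "__main__":
    for url in extract_links(HTML_DOC, BASE_URL):
        print(url)
```

To try it on a live page, the same helper could be fed the bytes returned by `urllib.request.urlopen(...).read()`, with the page's own address as the base URL.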

 
