Hi, when running the "collecting page links" cell:
import urllib.request
from bs4 import BeautifulSoup

quote_page = 'https://en.wikipedia.org/w/index.php?title=Category:Living_people'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
links = []
i = 0
while i < 10:
    content_big = soup.find('div', attrs={'class': 'mw-content-ltr'})  # or infobox biography vcard
    link_list = content_big.find('div', attrs={'class': 'mw-category'})  # or infobox biography vcard
    links.extend(link_list.find_all('a'))
    print(len(links))
    next_page = None
    for j, aa in enumerate(soup.find_all('a')):
        if aa.text == 'next page':
            next_page = aa.get('href')
            if j % 100 == 0:
                print(aa.get('href')[:10])
            page = urllib.request.urlopen('https://en.wikipedia.org' + next_page)
            soup = BeautifulSoup(page, 'html.parser')
            break
    if next_page is None:
        break
    i = i + 1
the cell fails with AttributeError: 'NoneType' object has no attribute 'find_all', because link_list is None.
Did something change at wikipedia that broke the script?
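One way to make the cell robust to markup changes is to guard each soup.find() result before calling methods on it, since find() returns None when the element is missing. A minimal sketch of that guard, assuming the category links live under a div with class mw-category (the sample HTML and the extract_category_links helper below are hypothetical, for illustration only):

```python
from bs4 import BeautifulSoup

# Hypothetical inline sample mimicking a category page's structure.
sample = """
<div class="mw-content-ltr">
  <div class="mw-category">
    <a href="/wiki/A">A</a>
    <a href="/wiki/B">B</a>
  </div>
</div>
"""

def extract_category_links(soup):
    """Return the category member links, or [] when the expected
    containers are missing, instead of raising AttributeError."""
    content = soup.find('div', attrs={'class': 'mw-content-ltr'})
    if content is None:
        return []
    link_list = content.find('div', attrs={'class': 'mw-category'})
    if link_list is None:
        # If the inner container was renamed, fall back to scanning
        # the whole content block rather than crashing.
        return content.find_all('a')
    return link_list.find_all('a')

soup = BeautifulSoup(sample, 'html.parser')
print(len(extract_category_links(soup)))  # 2
```

With this guard, a markup change degrades to an empty (or over-broad) link list that can be logged and inspected, rather than killing the loop mid-crawl.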