Subject : Re: Emigration from Usenet [was: Re: PTD was the most-respected of the AUE regulars ...]
From : not (at) *nospam* telling.you.invalid (Computer Nerd Kev)
Newsgroups : comp.misc
Date : 26. Jul 2024, 13:18:48
Organization : Ausics - https://newsgroups.ausics.net
Message-ID : <66a39428@news.ausics.net>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : tin/2.0.1-20111224 ("Achenvoir") (UNIX) (Linux/2.4.31 (i686))
D <nospam@example.net> wrote:
> Read-only sounds very simple. I usually scrape in Python with the
> requests library and the Beautiful Soup library. A simple scraping
> loop could look like this (modify per web board of course):
> import requests
> from bs4 import BeautifulSoup
>
> email_body = ""
> for page in range(100, 150):
>     html = requests.get("https://www.svt.se/text-tv/" + str(page))
>     soup = BeautifulSoup(html.text, 'html.parser')
>     # The div holding the page text (class name is specific to svt.se)
>     div_bs4 = soup.find('div', {"class": "Content_screenreaderOnly__3Cnkp"})
>     try:
>         email_body += div_bs4.string + "\n"
>     except AttributeError:
>         # soup.find() returned None (page missing or layout changed); skip it
>         pass
> So basically a range of pages, then loop over those pages.
You need to sync it to the messages in the forum index though,
otherwise when they get a spam flood of messages that the admin
deletes, or the thread counter jumps around for some other reason,
the scraper is stuck looking for the next 25 threads after the last
one it saw when it actually needs to jump forwards by 150. I guess
you could interpret the deleted-thread pages and crawl through them,
but then you need the crawler to remember the gap that was left so
it doesn't forget to check for new posts in the threads from before
the spam flood.
So even if it's possible to iterate over threads that way on all
forum platforms (which I'm not sure about), I think it would be
more reliable in the long run to parse the index pages to determine
which threads to retrieve. Also less risk of getting blocked by web
servers for too many requests.
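
Using your requests/BeautifulSoup approach, the index-driven version
I'm imagining might look roughly like this sketch - the forum URL,
the class on the thread links, and the state file name are all just
placeholders, so any real forum would need its own values and parsing:

import json
import os

import requests
from bs4 import BeautifulSoup

STATE_FILE = "seen_threads.json"  # remembers threads already fetched

# Thread URLs seen on previous runs
seen = {}
if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        seen = json.load(f)

# Fetch the forum index and pull the thread links out of it
# (URL and class name are made up; each forum needs its own)
index = requests.get("https://forum.example.com/index.php")
soup = BeautifulSoup(index.text, "html.parser")

for link in soup.find_all("a", {"class": "thread-title"}):
    url = link.get("href")
    if url and url not in seen:
        thread = requests.get(url)
        # ... parse thread.text and store/format the posts here ...
        seen[url] = True

# Save state so deleted threads or counter jumps can't strand the scraper
with open(STATE_FILE, "w") as f:
    json.dump(seen, f)

Only threads that actually appear on the index get fetched, so gaps
left by deleted spam just drop out of the list instead of stalling
the whole crawl.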
But thanks for the example. I'm not really sure whether an HTML
parser library would be helpful or just a pointless extra layer
of complexity. So far I've just used regular expressions for
scraping webpages. I was thinking along the lines of a template
system defining strings that mark the start/end of each field (and
any key features in between), ideally allowing new forum parsers to
be added without needing to touch the code. There must be things
like that around already...
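
For the template idea, I'm picturing something roughly like this -
the field names and the start/end markers below are invented just
to show the shape of it, and each forum would get its own table,
ideally loaded from a file rather than written into the code:

import re

# A per-forum "template": for each field, the literal strings that
# appear just before and just after it in the page source.
# (Field names and markers are made up; a real forum needs its own.)
TEMPLATE = {
    "author": ('<span class="username">', '</span>'),
    "date":   ('<time datetime="',        '"'),
    "body":   ('<div class="post-body">', '</div>'),
}

def extract_posts(html, template):
    """Pull out every post's fields using only the start/end markers."""
    fields = {}
    for name, (start, end) in template.items():
        # Non-greedy match between the two literal markers
        pattern = re.escape(start) + r"(.*?)" + re.escape(end)
        fields[name] = re.findall(pattern, html, re.DOTALL)
    # Zip the per-field lists back into one dict per post
    return [dict(zip(fields, values)) for values in zip(*fields.values())]

# e.g. extract_posts(page_html, TEMPLATE)
#   -> [{'author': ..., 'date': ..., 'body': ...}, ...]

It falls over as soon as a post is missing one of the fields, but
adding a new forum then only means writing a new table of markers
rather than new parsing code.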
Perhaps I'm determined to make it hard for myself, but if it broke
all the time and was complicated to fix, then that would be worse.
Anyhow, now I've got onto thinking about that, I've wasted all the
time I was actually going to spend finishing a PHP static site
generator to format data that I scraped off a website last week.
That seemed simple at first too...
--
 __ __
#_ < |\| |< _#