Subject: Re: grrrrrr
From: blockedofcourse (at) *nospam* foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Date: 15 Mar 2024, 00:43:59
Organization: A noiseless patient Spider
Message-ID: <usvujn$1slrq$3@dont-email.me>
References: 1 2 3 4
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2

On 3/14/2024 11:39 AM, bitrex wrote:
> They also seem to like to host them on the slowest servers imaginable and still use FTP like it's the 90s.

FTP would require each asset to have a unique file name.
AND, would let a client peruse the list of ALL available
assets -- scrape the server in one shot!
Many sites deliver a "document.pdf" after you have clicked
through a set of pages (including an acceptance of license).
For "true PDFs", a simpler strategy is to scrape the site
and store the entire set of documents, indexed for full-text search (FTS).
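
As a rough illustration of the "indexed for FTS" part, here is a minimal
sketch of a toy inverted index in Python. It assumes the text has already
been extracted from the scraped PDFs; the file names, sample text, and the
helper names (tokenize, build_index, search) are all made up for the
example. A real setup would more likely hand this job to an existing
full-text engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document names that contain it.

    docs is a dict of {document name: extracted text}.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def search(index, query):
    """Return the names of documents containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


# Hypothetical scraped documents (names and text invented for illustration).
docs = {
    "lm317.pdf": "Adjustable voltage regulator, 1.5 A output current",
    "ne555.pdf": "Precision timer for astable and monostable operation",
    "lm7805.pdf": "Fixed 5 V voltage regulator",
}
index = build_index(docs)
```

With the index built once from the local copies, a query such as
search(index, "voltage regulator") never has to touch the vendor's site
again.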