should i scrape the soybooru?

yunglimabean

Guest
my motivation is that i am trans btw the sheer autism must be preserved and i have the storage or something, also the current shimmie scraper doesn't scrape comments from what i've heard so a custom scraper isn't too hard to make because all i gotta do is parse the dom, grab the image url and iterate through all the comments
as of writing there are 74092 submissions on the 'ru so i think that's a pretty large number but not large enough where it can't all be scraped within a day
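
For anyone following along, here is a rough sketch of the "parse the dom, grab the image url and iterate through all the comments" step using only the Python standard library. The `main_image` id and the `comment` class are guesses for illustration, not the site's confirmed markup, so check the real page source first. `PostPageParser` and `parse_post` are made-up names.

```python
from html.parser import HTMLParser


class PostPageParser(HTMLParser):
    """Collects the main image URL and comment texts from one post page.

    The id "main_image" and the class "comment" are assumptions for
    illustration; check the real page markup before relying on them.
    """

    def __init__(self):
        super().__init__()
        self.image_url = None
        self.comments = []
        self._depth = 0      # nesting depth inside the current comment div
        self._buffer = []    # text pieces of the current comment

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("id") == "main_image":
            self.image_url = attrs.get("src")
        elif tag == "div":
            if self._depth:
                # A div nested inside a comment: track it so the matching
                # close tag does not end the comment early.
                self._depth += 1
            elif "comment" in (attrs.get("class") or "").split():
                self._depth = 1
                self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                self.comments.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def parse_post(html):
    """Return the image URL and the list of comment texts for one page."""
    parser = PostPageParser()
    parser.feed(html)
    parser.close()
    return {"image_url": parser.image_url, "comments": parser.comments}
```

Feeding it the HTML of one post page returns a dict with `image_url` and `comments`. It does no network access at all, so it can be tried on saved pages before pointing anything at the live site.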
 
Wouldn’t it take a shit ton of storage doe?
 
once this scrape is completed i'll upload it on archive.org for generations to come to witness the pure unfiltered autism
 
How do you plan on getting past the McChallenge?
 
over 70K images? I remember scraping the whole original booru when it had around 9K and thought I was mad. I don't see why not.
it's a shame I lost that folder though, it guaranteed a victory in every 'duel plus it had some jaks that I think are lost on the current booru
 
there could be deleted images too within those 70k images so i will need to handle that in the event i find an id that does not work
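
A minimal sketch of the "id that does not work" handling: walk the ids in order, log the gaps, keep going. `fetch` is a hypothetical callable you would supply yourself; the thread doesn't say what a deleted post looks like on the site (404, redirect, placeholder page), so here it is simply assumed to return `None` for a missing id. For pacing, 74092 posts spread over 24 hours works out to roughly 1.2 seconds per post, assuming one request per post.

```python
import time


def scrape_ids(first_id, last_id, fetch, delay_seconds=0.0):
    """Walk post ids in order and separate found pages from missing ones.

    `fetch` is a hypothetical callable: it takes a post id and returns the
    page HTML as a string, or None when the id does not resolve (for
    example a deleted post).
    """
    pages = {}
    missing = []
    for post_id in range(first_id, last_id + 1):
        html = fetch(post_id)
        if html is None:
            missing.append(post_id)    # record the gap and keep going
        else:
            pages[post_id] = html
        if delay_seconds:
            time.sleep(delay_seconds)  # pace the requests
    return pages, missing
```

Keeping the `missing` list around means the gaps can be retried later or published alongside the archive, so nobody has to guess whether an id was deleted or just skipped.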
 
It's hard to scrape the 'ru independently these days because it has captcha. If you can then I might ask you to make a scraper for only my posts on there as well.

Will the scraped data preserve the tags and uploader of the images?
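
The thread doesn't say whether tags and uploader are kept. If they were, one way to store them is a record per post written out as one JSON line each. This is only a sketch: `PostRecord`, its field names, and `to_json_line` are all made up for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class PostRecord:
    """One scraped post; every field name here is hypothetical."""
    post_id: int
    image_url: str
    uploader: str = ""
    tags: list = field(default_factory=list)
    comments: list = field(default_factory=list)


def to_json_line(record):
    """Serialize one record as a single line of JSON."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
```

One JSON object per line keeps the metadata next to the image files in a format that is easy to grep and easy to load back later.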
 
also the scraping process is almost complete i only need to scrape 2000 more images
 