NiggerRigger
Well-known schlogga
@Soyteen Liker Am I a Jartycuck?
"https://soyjak.blog/conversations/ is a url that you can use to access conversations, if you have permissions for a convo it'll show, so if I create a new conversation, it'll give me a url number and my most recent conversation was like 1029 o algo. So I'm betting the total amount of conversations to be under 2000 ish"
its around 1200
URLs for any fellow scrapers:
8,620 Threads, 3.5 Gigabytes. Soygoy presents...
Soggy's Scrapbook: soyjak.blog
A full archive of every post and thread on soyjak.blog leading up to thread number 8620.
This is a sister project to Soygoy's Expeditions: soyjak.blog
Download (Google Drive)
Overview
Whilst writing and preparing my book / memoir I had an issue in regards to archival. archive.is, archive.ph and archive.today are all extremely slow, so I decided to make my own scraper. 12 cups of tea and 24 hours later, here it is. Surprise! My 10k posts project is 500 posts early! Hinting at it all day, practically begging to get this out there for you all to enjoy. The entirety of soyjak.blog condensed down into 3.5 gigabytes! You can now own a piece of Schlog history, forever!
How did you do it?
I used Python, curl and pageres. I wanted to write an overengineered Go program but gave up trying to make it view WebP images, because I'm an absolute noob and I have no fucking clue what I'm doing with Go. Python sucks dick though, and the script was designed for the server I was running it on overnight, so I felt like not sharing it. Not to worry, I plan on publishing a toolset for scraping and archiving on my GitHub later. The images are in WebP format to save on file space. I know JPG is similar, but in testing WebP gave a smaller file size. I had to keep the size down as much as possible, and thankfully I got it to about 3.5 gigabytes, very small in comparison to what could have been like 50 gigabytes.
It's also important to note that my tools can work on other XenForo-based websites, but maybe not Kiwifarms. That's 150k threads o algo: if 8,620 threads is 24 hours of labour and 3.5 gigabytes, then at the same rate 150k threads is roughly 420 hours and 60 gigabytes o algo, though that's such a small file size for the entirety of Kiwifarms... ...
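The back-of-the-envelope estimate above can be sketched like this. It assumes time and size scale linearly with thread count, which is only a rough guess, and the function name `scale_estimate` is made up for illustration.

```python
def scale_estimate(known_threads, known_hours, known_gb, target_threads):
    """Linearly scale scrape time and archive size to a bigger forum.

    Assumption: every thread costs about the same time and disk space.
    """
    factor = target_threads / known_threads
    return known_hours * factor, known_gb * factor


# Figures from the post: 8,620 threads took about 24 hours and 3.5 GB.
hours, gigabytes = scale_estimate(8620, 24, 3.5, 150_000)
print(f"about {hours:.0f} hours and {gigabytes:.0f} GB")
```

Threads with many pages would push both numbers up, so treat the result as a floor more than a forecast.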
Why not just use text o algo?
Curling / wgetting web pages just doesn't work: it doesn't preserve CSS or images, so I opted to take full-page screenshots instead. I could make text-only transcripts, but that'd require writing a program to cut a lot of text out of an HTML file, and I just think that taking an image of a web page is a lot faster imo. Sucks if you need the text though. You might have to use OCR for that, or I might make a text-only version in the future so you can search for occurrences of specific words throughout the entire website.
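For anyone curious what the text-only route might look like, here is a minimal sketch using only Python's standard library. It is not the script used for this archive; the names `TextExtractor` and `html_to_text` are hypothetical, and a real XenForo page would still need extra filtering to drop navigation and sidebar text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the text chunks of an HTML string, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Feeding it a page saved with curl would give a plain transcript you could grep, at the cost of losing the layout that the screenshots preserve.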
How can I use your archive?
Download the zip file, extract it and open thread-names.txt. Use ctrl + f to find the words you're looking for, get the thread number and then open the first page of that thread. For example, the thread "add a sports board" is thread number 2026, so you'd go to thread-2026-page-1.webp and then just increment the number in page-1 to read the following pages. It's a bit iffy, which is why I wanted to make custom software that'd let you just input a url and get the thread, with controls to magically sort through everything, but it's a bit of a pain right now due to my utter lack of knowledge. I could make a python script for it, but then you guys would have to go through all the hassle of downloading dependencies and that's just urrgghh.
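The lookup described above could be scripted with nothing but the standard library, so there would be no dependencies to download. This is only a sketch: the exact layout of thread-names.txt isn't stated here, so `find_threads` just returns every line containing the keyword, and both function names are made up.

```python
def find_threads(names_file, keyword):
    """Return every line of the names file that contains the keyword.

    Assumption: the file is plain text with one thread per line.
    The match is case-insensitive, like a ctrl + f search.
    """
    keyword = keyword.lower()
    with open(names_file, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if keyword in line.lower()]


def page_filename(thread_number, page=1):
    """Build a screenshot file name following the archive's naming scheme."""
    return f"thread-{thread_number}-page-{page}.webp"


if __name__ == "__main__":
    print(page_filename(2026))     # first page of "add a sports board"
    print(page_filename(2026, 2))  # the page after it
```

Calling `find_threads("thread-names.txt", "sports")` from the extracted folder would list the candidate lines, and you read the thread number off those.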
Preferably, you should use a web browser so you can edit the url to access pages easier, or just search for the thread in your file explorer. In Linux you can create a slideshow by doing something like
gwenview $(ls -v thread-10-page-*.webp)
and that'll make a slideshow of that specific thread and all of its pages. I would make some shell scripts but that would just leave Windows users in the dark. I'll do so anyways though for myself.
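A cross-platform alternative to a shell one-liner, so Windows users aren't left out, might look like the sketch below. It assumes the extracted files follow the thread-N-page-M.webp naming described earlier; `thread_pages` is a hypothetical helper, not part of any released toolset.

```python
import re
from pathlib import Path


def thread_pages(folder, thread_number):
    """Return the page files of one thread, sorted by page number."""
    pattern = re.compile(rf"thread-{thread_number}-page-(\d+)\.webp")
    pages = []
    for path in Path(folder).iterdir():
        # fullmatch keeps thread 10 from also matching thread 100 or 1026.
        match = pattern.fullmatch(path.name)
        if match:
            pages.append((int(match.group(1)), path))
    # Sort numerically so page 10 comes after page 9, not after page 1.
    return [path for _, path in sorted(pages)]


if __name__ == "__main__":
    for page in thread_pages(".", 10):
        print(page)
```

The printed paths can be handed to whatever image viewer you use.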
If you use ark like me you can also search in the archive and directly preview images without ever having to extract it!
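If you don't have ark, the same peek-without-extracting idea can be sketched with Python's built-in zipfile module. This assumes the download is an ordinary zip whose members use the thread-N-page-M.webp names; the helper names are made up for illustration.

```python
import zipfile


def list_thread_in_zip(zip_path, thread_number):
    """List the archive members that belong to one thread, without extracting."""
    prefix = f"thread-{thread_number}-page-"
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            # Compare against the file name only, in case files sit in a subfolder.
            if name.rsplit("/", 1)[-1].startswith(prefix)
        ]


def read_page(zip_path, member_name):
    """Return the raw bytes of a single page image from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(member_name)
```

The bytes from `read_page` can be written to a temporary file or passed to any image library that understands WebP.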
Why not host a webserver?
I could. In fact, I might have enough space to do so even on my main website; my primary concern is just cost and maintenance. If it is possible, I will make a schlog archival service on my own website o algo. But releasing a zip is much better because you can just look through everything yourself. It's only 3 to 4 gigabytes, and everyone can just keep 99% of the schlog stored on a usb drive o algo lol. Pretty efficient if you ask me. I'd urge you to download it, as there is no reason not to: the more people who have this archive, the less likely history will be lost.
And remember, you will always be remembered.
"But what's the point?"
to make chuddies think twice before posting stupid shit
"to make chuddies think twice before posting stupid shit"
This won't stop anyone oversharing.
"This won't stop anyone oversharing."
Which is a good thing, people will overshare and I will archive it.
"But what's the point?"
I love this website and I want to keep it forever. I've made some memories here, so I'm going to archive it for as long as I can and have it as a little digital scrapbook. You can read your favourite threads offline o algo, put it onto CDs or DVDs or USB drives and then hide them in the ground o algo for historians to find one day geg.
"I should spam more so it gets put on the archive"
Yeah you do this Hagon, I'm sure it won't get you banned more often
Marge how does Soggy act like a little boyHe acts like a boy five years his junior, plus all trannies are pedophiles
"Yeah you do this Hagon, I'm sure it won't get you banned more often"
I'm going to be the first person to accrue 100 different bans
"Trust me if you think this archive is useless now, come 5 years time o algo, you won't be thinking it's so useless then. When the sharty went down yesterday that struck the fear of god into me, it could all go at any point of time and you'd have nothing, so I created something, and now we have this archive. And the archive is only 3.5 Gigs, very small for such a large site."
I don't think I will even remember this site in 5 years. That's a really long time.
"I don't think I will even remember this site in 5 years. That's a really long time."
I think you might, perhaps. It's all about the community, it's all about the memories we made, the story. We deserve to be remembered and I have a feeling that the schlog and the sharty are going to become legendary some day and maybe even overtake 4chan, but who knows. Bald man glasses forum may be silly, but it's not worth getting rid of entirely and forgetting about, to some degree, it's worth remembering.
"were pfps archived?"
All pfps as of 22 October 2024 are archived. You should check for yourself, it's only 4 gigs or so. I can give a sample perhaps.
"Were sigs archived?"
Signatures require you to log in.