Project The Schlog Archives | Most recent version: 22 October 2024


8,620 Threads, 3.5 Gigabytes. Soygoy presents...


Soggy's Scrapbook: soyjak.blog

A full archive of every post and thread on soyjak.blog leading up to thread number 8620.
This is a sister project to Soygoy's Expeditions: soyjak.blog

Download (Google Drive)

Overview

Whilst writing and preparing my book / memoir I ran into an archival problem: archive.is, archive.ph and archive.today are all extremely slow, so I decided to make my own scraper. 12 cups of tea and 24 hours later, here it is. Surprise! My 10k posts project is 500 posts early! I've been hinting at it all day, practically begging to get this out there for you all to enjoy. The entirety of soyjak.blog condensed down into 3.5 gigabytes! You can now own a piece of Schlog history, forever! [wholesome]

How did you do it?

I used Python, curl and pageres. I wanted to make an overengineered Go script but gave up trying to make it handle WebPs, because I'm an absolute noob and have no fucking clue what I'm doing with Go. Python sucks dick, and the script was designed around the server I was running it on overnight, so I felt like not sharing it. Not to worry though, I plan on putting a toolset for scraping and archiving on my GitHub later. The images are in WebP format to save on file space. I know JPG is similar, but in testing, WebP had a smaller file size. I had to keep the size down as much as possible, and thankfully I got it to about 3.5 gigabytes, very small compared to what could have been something like 50 gigabytes.
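
For the curious, here's a rough sketch of the kind of loop I mean, not the actual script (the thread URL pattern and output names here are assumptions, and you'd need pageres-cli and cwebp installed):

import subprocess

BASE = "https://soyjak.blog/threads"  # assumed XenForo-style thread URL

def capture(thread_id: int, page: int) -> None:
    url = f"{BASE}/{thread_id}/page-{page}"
    name = f"thread-{thread_id}-page-{page}"
    # pageres grabs the full scrolled page as a PNG by default
    subprocess.run(["pageres", url, "1366x768", f"--filename={name}"], check=True)
    # re-encode to WebP to keep the archive small
    subprocess.run(["cwebp", "-q", "75", f"{name}.png", "-o", f"{name}.webp"], check=True)

for tid in range(1, 8621):
    capture(tid, 1)  # the real run also has to walk page 2, 3, ... of each thread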

It's also important to note that my tools can work on other XenForo-based websites, but maybe not Kiwifarms: that's 150k threads or something, and if 8,620 threads took 24 hours of labour, 150k threads works out to roughly 420 hours and 60 gigabytes. Though such a small file size for the entirety of Kiwifarms... hmm...
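
Back-of-the-envelope, if you want to check the scaling (Python, since that's what I used):

# scale my 8,620-thread / 24-hour / 3.5 GB run up to ~150k threads
scale = 150_000 / 8_620            # ~17.4x as many threads
print(f"{24 * scale:.0f} hours")   # ~418 hours
print(f"{3.5 * scale:.0f} GB")     # ~61 GB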

Why not just use text or something?

Curling / wgetting web pages just doesn't work: it doesn't preserve CSS or images, so I opted to take full-page screenshots instead. I could make text-only transcripts, but that'd require writing a program to cut a lot of text out of an HTML file, and taking an image of a web page is just a lot faster imo. Sucks if you need the text though; you might have to use OCR for that, or I might make a text-only version in the future so you can search for occurrences of specific words across the entire website.
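
If someone needs text before I get around to it, here's a rough sketch of what a transcript scraper could look like (it needs the third-party requests and beautifulsoup4 packages, and the CSS selector is my guess at XenForo 2's markup, so treat it as an assumption):

import requests
from bs4 import BeautifulSoup

def thread_text(url: str) -> str:
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    # XenForo 2 usually wraps post bodies in .message-body .bbWrapper (assumed here)
    posts = soup.select(".message-body .bbWrapper")
    return "\n\n---\n\n".join(p.get_text("\n", strip=True) for p in posts)

print(thread_text("https://soyjak.blog/threads/2026/"))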

How can I use your archive?

Download the zip file, extract it and open thread-names.txt. Use ctrl + F to find the words you're looking for, get the thread number, then open thread-number-page-1. For example, the thread "add a sports board" is thread number 2026, so you'd go to thread-2026-page-1.webp and then just increment the number in page-1. It's a bit iffy, which is why I wanted to make custom software that'd let you input a URL and get the thread, with controls to magically sort through everything, but that's a pain right now due to my utter lack of knowledge. I could make a Python script for it, but then you guys would have to go through all the hassle of downloading dependencies, and that's just urrgghh.
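
That said, the lookup part is easy to script with nothing but the standard library, so no dependency hassle. A sketch, assuming thread-names.txt keeps each thread's number and title on one line:

# find_thread.py: print every line of thread-names.txt matching a keyword
import sys

needle = sys.argv[1].lower()
with open("thread-names.txt", encoding="utf-8") as f:
    for line in f:
        if needle in line.lower():
            print(line.rstrip())

Run it like python find_thread.py sports and it should spit out the line for thread 2026.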

Preferably, you should use a web browser so you can edit the URL to access pages more easily, or just search for the thread in your file explorer. On Linux you can create a slideshow by doing something like

gwenview thread-10-page-*.webp

and that'll open a slideshow of that specific thread and all of its pages. (The page-* part matters; a bare "thread-10" match would also pull in thread-100 and friends.) I would make some shell scripts, but that would just leave Windows users in the dark. I'll do so anyway though, for myself.
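
A dependency-free route that doesn't leave Windows users in the dark: generate a throwaway HTML page and let the browser be the slideshow, since browsers render WebP natively. A sketch, assuming the thread-number-page-number.webp names from above:

# stitch every page of one thread into a local HTML file and open it
import glob, re, webbrowser
from pathlib import Path

tid = 2026  # thread number to view
pages = sorted(glob.glob(f"thread-{tid}-page-*.webp"),
               key=lambda p: int(re.search(r"page-(\d+)", p).group(1)))
out = Path(f"thread-{tid}.html")
out.write_text("\n".join(f'<img src="{p}" style="width:100%">' for p in pages),
               encoding="utf-8")
webbrowser.open(out.resolve().as_uri())
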
If you use Ark like me, you can also search inside the archive and preview images directly without ever having to extract it!

Why not host a webserver?

I could; in fact I probably have enough space to do it even on my main website. My primary concern is just cost and maintenance. If it turns out to be feasible, I will put a Schlog archival service on my own website or something. But releasing a zip is much better, because you can look through everything yourself; it's only 3 to 4 gigabytes, so everyone can keep 99% of the Schlog on a USB drive or something lol. Pretty efficient if you ask me. I'd urge you to download it, as there is no reason not to: the more people who have this archive, the less likely history will be lost.
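
In the meantime, anyone who wants the website feel can serve the extracted folder locally with Python's built-in server and browse it at http://localhost:8000/:

python -m http.server 8000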

And remember, you will always be remembered.

[wholesome]
 
is there anything on the schlog really worth archiving?
Sparkles' diary, I guess? I mean, idk, but it's only 3.5 gigabytes, so why the fuck not, right?
Save it for something to tell your grandkids about or something. Then again, archiving is not about why; it's because we can.
 
were profile posts archived?
 
https://soyjak.blog/conversations/ is a URL you can use to access conversations; if you have permissions for a convo, it'll show. When I create a new conversation it gets a URL number, and my most recent one was like 1029 or something, so I'm betting the total number of conversations is under 2000-ish.
OH thanks, geg reminds me of this
I wonder what conversation 1000 was
 