Project The Schlog Archives | Most recent version: 22 October 2024

NiggerRigger · Oct 23, 2024

Steve · Oct 23, 2024

Soygoy said:
https://soyjak.blog/conversations/ is a URL you can use to access conversations. If you have permission for a convo, it'll show. If I create a new conversation it gets a number in the URL, and my most recent one was like 1029 o algo, so I'm betting the total number of conversations is under 2000 ish.

it's around 1200

Soygoy · Oct 23, 2024

Soygoy said:
8,620 Threads, 3.5 Gigabytes. Soygoy presents...
Soggy's Scrapbook: soyjak.blog
A full archive of every post and thread on soyjak.blog leading up to thread number 8620.
This is a sister project to Soygoy's Expeditions: soyjak.blog

Download (Google Drive)

Overview
Whilst writing and preparing my book / memoir I ran into a problem with archiving: archive.is, archive.ph and archive.today are all extremely slow, so I decided to make my own scraper. 12 cups of tea and 24 hours later, here it is. Surprise! My 10k posts project is 500 posts early! I've been hinting at it all day, practically begging to get this out there for you all to enjoy. The entirety of soyjak.blog condensed down into 3.5 gigabytes! You can now own a piece of Schlog history, forever!

How did you do it?
I used Python, curl and pageres. I wanted to make an overengineered Go script but gave up trying to get it to handle WebPs, because I'm an absolute noob and have no fucking clue what I'm doing with Go. Python sucks dick, though, and the script was designed for the server I was running it on overnight, so I decided not to share it. Not to worry though, I plan on putting a toolset for scraping and archiving on my GitHub later. The images are in WebP format to save on file space. I know JPG is similar, but in testing WebP gave smaller files. I had to keep the size down as much as possible, and thankfully I got it to about 3.5 gigabytes, very small compared to what could have been something like 50 gigabytes.

It's also important to note that my tools can work on other XenForo-based websites, but maybe not Kiwifarms. That's 150k threads o algo; if 8k threads is 24 hours of labour, then 150k threads is something like 420 to 450 hours and 60 gigabytes o algo, though that's a surprisingly small file size for the entirety of Kiwifarms...
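As a rough check on that estimate, here's a minimal linear scale-up in Python from the 8,620-thread run (about 24 hours, about 3.5 GB). It assumes time and size grow in proportion to thread count, which is only a ballpark since real threads vary a lot in length, and the 150,000-thread figure is the rough guess from the post, not a measured count.

```python
# Linear extrapolation from the figures in the post:
# 8,620 threads took about 24 hours and came to about 3.5 GB of WebP screenshots.
ARCHIVED_THREADS = 8_620
HOURS_TAKEN = 24.0
SIZE_GB = 3.5


def extrapolate(target_threads: int) -> tuple[float, float]:
    """Scale time and size by thread count (assumes an 'average' thread stays the same)."""
    ratio = target_threads / ARCHIVED_THREADS
    return HOURS_TAKEN * ratio, SIZE_GB * ratio


if __name__ == "__main__":
    # 150,000 is the rough Kiwifarms thread count guessed in the post.
    hours, gigabytes = extrapolate(150_000)
    print(f"~{hours:.0f} hours, ~{gigabytes:.0f} GB")
```

With these inputs it prints `~418 hours, ~61 GB`.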

Why not just use text o algo?
Curling / wgetting web pages just doesn't work well: it doesn't preserve the CSS or the images, so I opted to take full-page screenshots instead. I could make text-only transcripts, but that'd require writing a program to cut the text out of the HTML, and I just think taking an image of a web page is a lot faster imo. Sucks if you need the text though; you might have to use OCR for that, or I might make a text-only version in the future so you can search for occurrences of specific words throughout the entire website.
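For anyone curious what the text-only route would look like, here's a minimal stdlib-only sketch, assuming you've already saved a thread page's HTML locally (e.g. with curl). This is not the tool used for the archive. It keeps all visible text and only skips `<script>` and `<style>`, so navigation and sidebar text come along too; a proper transcript tool would need to target the post bodies specifically.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-empty text that isn't inside a skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the page's visible text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Feed it the contents of a saved page and grep the result, instead of OCRing screenshots.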

How can I use your archive?
Download the zip file, extract it and open thread-names.txt. Use Ctrl + F to find the words you're looking for, get the thread number, and then open thread-<number>-page-1.webp. For example, the thread "add a sports board" is thread number 2026, so you'd go to thread-2026-page-1.webp and then just increment the page number. It's a bit iffy, which is why I wanted to make custom software that would let you just input a URL and get the thread, with controls to magically sort through everything, but that's a bit of a pain right now due to my utter lack of knowledge. I could make a Python script for it, but then you guys would have to go through all the hassle of downloading dependencies and that's just urrgghh.
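If you'd rather not Ctrl + F by hand, here's a minimal stdlib-only sketch (no dependencies to download) that searches thread-names.txt and prints the first page file for each hit. The exact layout of thread-names.txt isn't described here, so it ASSUMES one thread per line with the thread number as the first run of digits on that line; check that against the real file before trusting the output.

```python
import re
import sys
from pathlib import Path

NUMBER = re.compile(r"\d+")


def find_threads(lines, keyword):
    """Return (thread_number, line) pairs for lines mentioning keyword (case-insensitive).

    ASSUMPTION: one thread per line, and its number is the first run of digits.
    """
    needle = keyword.lower()
    hits = []
    for line in lines:
        if needle in line.lower():
            match = NUMBER.search(line)
            if match:
                hits.append((int(match.group()), line.strip()))
    return hits


def search_file(path, keyword):
    with Path(path).open(encoding="utf-8", errors="replace") as f:
        return find_threads(f, keyword)


if __name__ == "__main__":
    # Usage: python find_thread.py "sports board"   (script name is hypothetical)
    for number, line in search_file("thread-names.txt", sys.argv[1]):
        print(f"thread-{number}-page-1.webp  <-  {line}")
```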

Preferably, you should use a web browser so you can edit the URL to get to pages more easily, or just search for the thread in your file explorer. On Linux you can create a slideshow by doing something like

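gwenview $(ls -v thread-10-page-*.webp)

and that'll make a slideshow of that specific thread and all of its pages. Matching on thread-10-page- instead of just thread-10 stops it from also grabbing thread-100, thread-1000 and so on, and ls -v sorts page-10 after page-9 instead of right after page-1. I would make some shell scripts, but that would just leave Windows users in the dark. I'll do so anyway for myself, though.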
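For Windows users (or anyone without gwenview), here's a minimal cross-platform Python sketch that lists one thread's pages in the right order. It assumes every file sits in one folder and is named thread-<number>-page-<page>.webp, as described above; the folder is wherever you extracted the zip.

```python
import re
import sys
from pathlib import Path

# Full-name match, so thread 10 never picks up thread-100 or thread-1000.
PAGE = re.compile(r"^thread-(\d+)-page-(\d+)\.webp$")


def thread_pages(folder, thread_number):
    """Return one thread's screenshot paths sorted by page number (as an integer)."""
    pages = []
    for path in Path(folder).iterdir():
        m = PAGE.match(path.name)
        if m and int(m.group(1)) == thread_number:
            pages.append((int(m.group(2)), path))
    # Sort numerically so page-10 comes after page-9, not after page-1.
    return [path for _, path in sorted(pages, key=lambda item: item[0])]


if __name__ == "__main__":
    # Usage: python thread_pages.py <extracted-folder> <thread-number>
    # (script name is hypothetical)
    folder, number = sys.argv[1], int(sys.argv[2])
    for path in thread_pages(folder, number):
        print(path)
```

You can then open the printed files in order with whatever image viewer you like.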
If you use ark like me you can also search in the archive and directly preview images without ever having to extract it!

Why not host a webserver?
I could; in fact, I probably have enough space to do so even on my main website. My primary concern is just cost and maintenance. If it's possible, I will make a Schlog archival service on my own website o algo. But releasing a zip is much better, because you can just look through everything yourself. It's only 3 to 4 gigabytes, and everyone can keep 99% of the Schlog stored on a USB drive o algo lol. Pretty efficient if you ask me. I'd urge you to download it, as there's no reason not to: the more people who have this archive, the less likely it is that history will be lost.

And remember, you will always be remembered.

URLs for any fellow scrapers:
https://soyjak.blog/profile-posts/
https://soyjak.blog/posts/
https://soyjak.blog/threads/
https://soyjak.blog/conversations/
https://soyjak.blog/search/
https://soyjak.blog/members/
etc.
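For fellow scrapers, here's a minimal sketch that pairs each thread page URL with a file name in the same thread-<id>-page-<n>.webp pattern the archive uses, so screenshots line up with this archive. The URL shape is an ASSUMPTION based on usual XenForo paging (page 1 at /threads/<id>/, later pages with page-<n> appended); check it against the live site. It only builds strings and does no fetching; plug it into whatever screenshot tool you use.

```python
BASE = "https://soyjak.blog"


def thread_page_url(thread_id: int, page: int) -> str:
    """ASSUMPTION: XenForo-style paging, /threads/<id>/ then /threads/<id>/page-<n>."""
    if page < 1:
        raise ValueError("pages start at 1")
    url = f"{BASE}/threads/{thread_id}/"
    return url if page == 1 else f"{url}page-{page}"


def archive_filename(thread_id: int, page: int) -> str:
    """File name in the archive's pattern, e.g. thread-2026-page-1.webp."""
    return f"thread-{thread_id}-page-{page}.webp"


if __name__ == "__main__":
    for page in range(1, 4):
        print(thread_page_url(2026, page), "->", archive_filename(2026, page))
```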

Fagon · Oct 23, 2024

>None of my conversations are gets
It really is over...

Fortuna · Oct 23, 2024

But what's the point?

istanbul · Oct 23, 2024

Fortuna said:
But what's the point?

to make chuddies think twice before posting stupid shit

Fortuna · Oct 23, 2024

ISTANBUL said:
to make chuddies think twice before posting stupid shit

This won't stop anyone oversharing.

Stephen · Oct 23, 2024

Soyteen Liker said:
Also this is fake news cuz you didn't 'chive the dms

Rip Urgentcord, gone forever it hurts

NiggerRigger said:
@Soyteen Fucker Why you wanna fuck Soygoy?

He acts like a boy five years his junior, plus all trannies are pedophiles

Soygoy · Oct 23, 2024

Fortuna said:
This won't stop anyone oversharing.

Which is a good thing, people will overshare and I will archive it.

Fortuna said:
But what's the point?

I love this website and I want to keep it forever. I've made some memories here, so I'm going to archive it for as long as I can and have it as a little digital scrapbook. You can read your favourite threads offline o algo, put it onto CDs or DVDs or USB drives and then hide them in the ground o algo for historians to find one day geg.

Fagon · Oct 23, 2024

I should spam more so it gets put on the archive

Soygoy · Oct 23, 2024

Hagon said:
I should spam more so it gets put on the archive

Yeah you do this Hagon, I'm sure it won't get you banned more often
:geg:

Soyteen Liker · Oct 23, 2024

Stephen said:
He acts like a boy five years his junior, plus all trannies are pedophiles

Marge how does Soggy act like a little boy

Fagon · Oct 23, 2024

Soygoy said:
Yeah you do this Hagon, I'm sure it won't get you banned more often

I'm going to be the first person to accrue 100 different bans

Soygoy · Oct 23, 2024

Trust me, if you think this archive is useless now, come 5 years' time o algo you won't be thinking it's so useless. When the sharty went down yesterday that struck the fear of god into me: it could all go at any point in time and you'd have nothing, so I created something, and now we have this archive. And the archive is only 3.5 gigs, very small for such a large site.

Fortuna · Oct 23, 2024

Soygoy said:
Trust me, if you think this archive is useless now, come 5 years' time o algo you won't be thinking it's so useless. When the sharty went down yesterday that struck the fear of god into me: it could all go at any point in time and you'd have nothing, so I created something, and now we have this archive. And the archive is only 3.5 gigs, very small for such a large site.

I don't think I will even remember this site in 5 years. That's a really long time.

istanbul · Oct 23, 2024

were pfps archived?

Stephen · Oct 23, 2024

Were sigs archived?

Soygoy · Oct 23, 2024

Fortuna said:
I don't think I will even remember this site in 5 years. That's a really long time.

I think you might, perhaps. It's all about the community, the memories we made, the story. We deserve to be remembered, and I have a feeling that the schlog and the sharty are going to become legendary some day and maybe even overtake 4chan, but who knows. Bald man glasses forum may be silly, but it's not worth getting rid of entirely and forgetting about; to some degree, it's worth remembering.

I also just really like working on technical projects like this, it feels like I shouldn't be able to archive an entire website and yet here I am.

ISTANBUL said:
were pfps archived?

All pfps as of 22 October 2024 are archived. You should check for yourself, it's only 4 gigs or so. I can give a sample perhaps.

Soygoy · Oct 23, 2024

Stephen said:
Were sigs archived?

Signatures require you to log in.
It might be possible to archive login-only stuff by tricking pageres in some way, giving it a cookie o algo.

Stephen · Oct 23, 2024

@DOLL (44%) @Mustard LET SOYGOY HAVE AN ARCHIVER ALT NOOOOOOOOOOOOOOOWWWWWWWWWWWWWWWWWWW
