~Bunbunmaru News~ > Front Page Headlines
The End of Shrinemaiden As We Know It
Edible:
Yeah, if we had full access to the linode account it wouldn't be an issue. Unfortunately that's not the case.
Infy♫:
I spent the past few days scraping the site.
Here is every post on shrinemaiden.org that's accessible through an account.
It's all in a .csv file of about 800 MB. I hope someone else can figure out a way to make it easy to access.
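One possible way to make a dump like this easier to query is to stream it into SQLite. The following is only a minimal sketch: the function name `csv_to_sqlite`, the table name `posts`, and the assumption that the file is UTF-8 with a header row are all illustrative, since the actual layout of the .csv isn't described here. Column names are taken from whatever header the file has.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table="posts", batch_size=5000):
    """Stream a large CSV into a SQLite table, one batch at a time.

    Column names come from the CSV header row (assumed to exist).
    Returns the number of rows inserted.
    """
    # Long forum posts can exceed the csv module's default field size limit.
    csv.field_size_limit(2**31 - 1)

    conn = sqlite3.connect(db_path)
    total = 0
    try:
        with open(csv_path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)

            # Quote identifiers so unusual header names are still valid SQL.
            cols = ", ".join('"{}"'.format(h.replace('"', '""')) for h in header)
            marks = ", ".join("?" for _ in header)
            conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, cols))
            insert = 'INSERT INTO "{}" VALUES ({})'.format(table, marks)

            batch = []
            for row in reader:
                if len(row) != len(header):
                    continue  # skip rows that don't match the header width
                batch.append(row)
                if len(batch) >= batch_size:
                    conn.executemany(insert, batch)
                    total += len(batch)
                    batch = []
            if batch:
                conn.executemany(insert, batch)
                total += len(batch)
        conn.commit()
    finally:
        conn.close()
    return total
```

Once loaded, the posts can be searched with ordinary SQL instead of opening the whole file in a text editor or spreadsheet. Only one batch of rows is held in memory at a time, so the 800 MB size shouldn't be a problem.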
HakureiSM:
127 MB compressed.
Thanks my man, I got a copy
nav':
--- Quote from: Infy♫ on February 17, 2020, 10:01:56 PM ---I spent the past few days scraping the site.
Here is every post on shrinemaiden.org that's accessible through an account.
It's all in a .csv file of about 800 MB. I hope someone else can figure out a way to make it easy to access.
--- End quote ---
Grabbing a copy too. I have downloaded every thread I could access in print mode, which in total comes to a similar size when compressed with 7-Zip's Ultra setting. It's lightweight, but not so great, because links to other posts and attachments don't work... Gotta put together something better.
Edit: I sort of wish your csv file also preserved the id number of every post.
Barrakketh:
--- Quote from: nav' on February 17, 2020, 10:42:26 PM ---Grabbing a copy too. I have downloaded every thread I could access in print mode, which in total comes to a similar size when compressed with 7-Zip's Ultra setting. It's lightweight, but not so great, because links to other posts and attachments don't work... Gotta put together something better.
Edit: I sort of wish your csv file also preserved the id number of every post.
--- End quote ---
I'll probably have finished writing my scraper by tomorrow; the next function just grabs the main page to get all the boards to crawl. Each thread is combined into one page, quote anchors are corrected to link to the right post, and attachments are saved (the high score entry board was my first test). Images that are hosted on shrinemaiden.org and appear in a post are downloaded (this covers the staff badges and such, avatars, emoticons, etc.), and replay links pointing to gensokyo are rewritten so they go to the lunarcast archive.
It's not a mirror of the board. More of a copy of the important content.
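To illustrate the two link-rewriting steps described above, here is a rough sketch. It is not the actual scraper: the helper names `localize_quote_links` and `rewrite_host` are made up, the `?topic=<id>.msg<id>#msg<id>` link shape is an assumption about how the forum formats post links, and the hosts in the usage note are placeholders. It uses plain regular expressions for brevity; a real HTML parser would be more robust.

```python
import re

# Assumed post-link shape: ...index.php?topic=<topic>.msg<id>#msg<id>
POST_LINK = re.compile(r'href="[^"]*\?topic=(\d+)\.msg(\d+)(?:#msg\d+)?"')


def localize_quote_links(html, topic_id, post_ids):
    """Point links to posts saved on this page at their local #msg anchor.

    Links to other topics, or to posts that were not saved, are left alone.
    """
    def repl(match):
        topic, msg = int(match.group(1)), int(match.group(2))
        if topic == topic_id and msg in post_ids:
            return 'href="#msg{}"'.format(msg)
        return match.group(0)

    return POST_LINK.sub(repl, html)


def rewrite_host(html, old_host, new_host):
    """Swap one host for another, but only inside href attributes."""
    pattern = re.compile(r'(href="https?://)' + re.escape(old_host) + r'(?=[/"])')
    # A function replacement avoids backslash-escape surprises in new_host.
    return pattern.sub(lambda m: m.group(1) + new_host, html)
```

For example, `localize_quote_links(page, 12, {345})` would turn a link ending in `?topic=12.msg345#msg345` into `#msg345`, and `rewrite_host(page, "old.example", "archive.example")` would repoint replay links from one placeholder host to another while keeping the rest of the URL intact.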