I like to have a backup of my blog on my notebook, so that I can run searches in it when I am not connected.
Blogspot has nice URLs for each post - e.g.
http://ajayshahblog.blogspot.com/2007/03/how-to-make-email-to-blogger-work.html is the URL for a post that I wrote on making email -> blogger (mostly) work. This suggests a file system where there is a directory 2007, a directory 2007/03, and then a file 2007/03/how-to-make-email-to-blogger-work.html, which would be a case of nice software engineering.

How would I make a personal file system which mirrors this structure of my blog? I'm unable to do this. I tried to use wget with recursive get options and it gets lost. A key feature that I want is to be able to say wget -c so that modified posts are picked up (but all posts are not brought down again).
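The mapping from post URL to local path is mechanical, which is part of why the structure is so appealing. A small shell sketch of just that mapping (pure string manipulation, no network access; the URL is the one from the post above):

```shell
#!/bin/sh
# Turn a Blogger post URL into the year/month/slug.html path
# described above, by stripping the scheme and hostname.
url_to_path() {
    echo "$1" | sed 's|^http://[^/]*/||'
}

url="http://ajayshahblog.blogspot.com/2007/03/how-to-make-email-to-blogger-work.html"
path=$(url_to_path "$url")
echo "$path"    # prints 2007/03/how-to-make-email-to-blogger-work.html
```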
Right now, I have a simple and dumb solution: I take one file per month, and I fetch the whole thing every time (which is wasteful of resources for google). I use this script:
#!/bin/sh
rm -f *.html *.text
for year in 2005 2006 2007 ; do
    for month in 01 02 03 04 05 06 07 08 09 10 11 12 ; do
        wget "http://ajayshahblog.blogspot.com/"$year"_"$month"_01_archive.html"
        links -dump "http://ajayshahblog.blogspot.com/"$year"_"$month"_01_archive.html" > $year$month.text
    done
done
This works, but it's not a nice solution: (a) I'm wasting bandwidth and google's resources, and the waste will grow as the years go by; and (b) it doesn't get me the clean, well-organised file system with nice file names that ought to be possible.
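For what it's worth, here is a sketch of a less wasteful direction: scrape the post URLs out of each monthly archive page, then fetch each post into its year/month directory with wget -N, which re-downloads a file only when the server copy is newer (this relies on the server sending Last-Modified headers, which I have not verified for Blogger). The sample HTML and the second post URL below are made up for illustration, and the grep pattern is a guess at what Blogger's markup would yield, so treat this as a sketch rather than a working backup script:

```shell
#!/bin/sh
# Sketch only. The archive HTML here is a made-up stand-in; a real
# run would fetch it with something like:
#   wget -q -O - "http://ajayshahblog.blogspot.com/2007_03_01_archive.html"
# The "another-post" URL is hypothetical, for illustration only.
BLOG="http://ajayshahblog.blogspot.com"

sample_archive='
<a href="http://ajayshahblog.blogspot.com/2007/03/how-to-make-email-to-blogger-work.html">a</a>
<a href="http://ajayshahblog.blogspot.com/2007/03/another-post.html">b</a>
'

# Pull out post URLs, build the year/month directory for each, and
# (as a dry run) print the wget -N command that would fetch it.
cmds=$(echo "$sample_archive" |
  grep -o "$BLOG/[0-9]*/[0-9]*/[a-z0-9-]*\.html" | sort -u |
  while read url; do
      rel=${url#"$BLOG"/}      # e.g. 2007/03/slug.html
      dir=$(dirname "$rel")    # e.g. 2007/03
      mkdir -p "$dir"
      # wget -N re-downloads only if the server copy is newer
      echo "cd $dir && wget -N $url"
  done)
echo "$cmds"
```

Dropped into a crontab, the real version of this would touch only posts that changed, and the directory tree it builds matches the blog's URL structure.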
This is a good guide ...
http://www.fileslinger.com/blog/2007/01/blogger-backup-fileslinger-backup.html
And BTW, Blogger does not store articles in a directory structure as you think it does. That's only a virtual representation of articles stored in a flat database.
Nonetheless, some of the tools in the article above should do what you want.
Cheers.
sir,
This might be useful:
http://www.epicware.com/webgrabber.html
Sir,
Firefox has a wonderful add-on - DownThemAll. It allows one to download all the files from a particular blog directory [one-at-a-time], e.g. 2007 / 2006 / 2005 / ..etc.
The good thing is we can choose the format of the files we wish to download from the site / blog. This could be [.pdf], [.html], [.doc] ... one can even enter a different file format.
In the case of a blog, the downloaded html pages will look exactly as they appear online, i.e. with the RHS index, blog-owner's pic, etc.
Ravi, thanks for the pointer. But you know me: I don't like doing anything which requires interaction. That takes too much time. I want a 100% automatable solution that can be stuck into a crontab and then I can forget about it.
If I interacted with software, I'd get a lot less done! :-)