The Leap Blog: Some suggestions for the guys building RSS feeds and feedreaders

I have been a happy user of RSS from two points of view: as a consumer of (97) RSS feeds through bloglines.com, and as the writer of a blog. I think this is the biggest advance in the idea of the World Wide Web after NCSA Mosaic. In this post, I have a few ideas for the guys building this stuff.

The only feedreader that I have used is bloglines.com, so please pardon a lack of knowledge of what other authors of feedreaders have been up to.

Scanning an RSS feed is a great advance when compared with scanning a list of websites. It saves the user time because the feedreader tracks what has been seen and what has not. This is efficient when compared with landing up (say) at a newspaper web page and mentally diffing it against one's last memory of what was on that page. So RSS was a step forward in protecting the human mind from information overload and reducing the amount of information processing that the user has to do. But this mentality needs to be carried further.

The `no news' mentality

In the good old days, before the web, there was net news. The framework of newsreading was one where one would subscribe to a newsgroup like sci.math.stat (think of it as an RSS feed) and scan whatever entries appeared there. In the 1990-1995 period, I used a great newsreader called `nn' (for "no news" (is good news)) which was focused on reducing the material that got shown to you, in order to save your time.

My main point in this post is that I think it is time to apply that same mentality in RSS feedreading. Users are overloaded with information from too many feeds. The name of the game now should be to reduce the amount of information that's given to the human brain for processing. I have a few tangible suggestions of this nature, in order of importance.

1. Kill files

With nn, it was possible to write down regular expressions describing the entries one did not want to read.

I think that would fit nicely in an RSS feedreader. It would be great if there were a convenient way to make a big table of regular expressions describing the entries that I do not want to be shown. There would need to be two cases: apply this regex to this RSS feed (where I don't want to hear from one particular RSS feed on one particular subject) or apply this regex to all feeds (where I don't want to hear anything about this subject from any source).
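A minimal sketch of such a table in Python, assuming each entry is reduced to a (feed URL, title) pair. The names KillRule and filter_entries and the example.com URLs are made up for illustration; they are not part of nn or of any feedreader.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

Entry = Tuple[str, str]  # (feed_url, title); a hypothetical minimal entry shape


@dataclass
class KillRule:
    pattern: str                    # regular expression describing unwanted entries
    feed_url: Optional[str] = None  # None means "apply this regex to all feeds"

    def matches(self, feed_url: str, text: str) -> bool:
        # A per-feed rule only fires on its own feed.
        if self.feed_url is not None and self.feed_url != feed_url:
            return False
        return re.search(self.pattern, text, re.IGNORECASE) is not None


def filter_entries(entries: List[Entry], rules: List[KillRule]) -> List[Entry]:
    """Keep only the entries that no kill rule matches."""
    return [
        (feed, title)
        for feed, title in entries
        if not any(rule.matches(feed, title) for rule in rules)
    ]


rules = [
    KillRule(r"\bcricket\b"),                                      # all feeds
    KillRule(r"horoscope", feed_url="http://example.com/a.xml"),   # one feed only
]
entries = [
    ("http://example.com/a.xml", "Your horoscope today"),
    ("http://example.com/b.xml", "Horoscope software reviewed"),
    ("http://example.com/b.xml", "Cricket scores"),
    ("http://example.com/a.xml", "Budget analysis"),
]
kept = filter_entries(entries, rules)
```

Here the per-feed rule drops the horoscope entry only from feed a.xml, while the global rule drops cricket from everywhere.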

2. Deletion of dupes

Many newspaper websites publish multiple RSS feeds. Often a given story appears in more than one of them. The feedreader should prune these.

In the event that the two entries are identical, an implementation based on hashing is easy. But ideally one needs to go beyond an identical match to some kind of approximate matching: consider the many New York Times stories that also show up on the International Herald Tribune RSS feed. I don't have a grip on exactly how to go about it. I believe some hashing algorithms are robust to small differences in the input - e.g. the techniques used for music recognition.
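One rough way to sketch both cases in Python: a hash of the whitespace-normalised text catches identical entries, and word shingles compared by Jaccard similarity stand in for approximate matching. This is a simple substitute, not the robust hashing used in music recognition, and the threshold of 0.6 and the shingle size of 3 are arbitrary assumptions. The function names are made up for illustration.

```python
import hashlib
import re


def normalise(text):
    # Lower-case and collapse runs of whitespace so trivial differences vanish.
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_key(text):
    # Identical entries (after normalisation) get identical keys.
    return hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()


def shingles(text, k=3):
    # Set of overlapping k-word sequences; short texts yield a single shingle.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    # Size of the intersection over size of the union, between 0.0 and 1.0.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prune_dupes(entries, threshold=0.6):
    """Keep the first copy of each story; drop exact and near duplicates."""
    kept, seen_keys, kept_shingles = [], set(), []
    for text in entries:
        key = exact_key(text)
        if key in seen_keys:
            continue  # exact duplicate
        sh = shingles(text)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of something already kept
        seen_keys.add(key)
        kept.append(text)
        kept_shingles.append(sh)
    return kept


stories = [
    "Central bank raises interest rates by a quarter point amid inflation fears",
    "Central  bank raises interest rates by a quarter point amid inflation fears",
    "Central bank raises interest rates by a quarter point amid inflation fears, analysts say",
    "Monsoon rains arrive early in Kerala this year",
]
unique = prune_dupes(stories)
```

The pairwise comparison is quadratic in the number of entries, which is probably tolerable for one user's daily feeds; a reader handling far more entries would want something like MinHash instead.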

Update: I just noticed that bloglines has a new feature. Entries are normally blue, but if you've clicked on a particular entry, other occurrences of this entry are shown in black. I'd say this is nice, but why do you want to burden my mind with having to even parse these dupes and remember to ignore them if the colour is black?

3. Search to RSS service

Google has embarked on something interesting by letting you take any search on news.google.com and view it as an RSS feed. That's nice, but they have not carried this through, because every time that URL is accessed, I get the full list of matches afresh. This throws up a lot of repetitive entries (which I've seen already) across multiple interactions with the feedreader within a day. What is needed is a way to keep track of me, know when I last ran the Google search, remove the material which had already matched on that search, and pack up only the new material that's come up for the search into RSS format. Maybe this is done if you use Google's feedreader?
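The bookkeeping involved is small. A sketch in Python, assuming each search result carries some stable identifier (a URL, say) and that the service can tell users apart; the class SearchFeedState and its method are hypothetical and say nothing about how Google actually does or could do this.

```python
class SearchFeedState:
    """Remembers, per user and per query, which results were already delivered."""

    def __init__(self):
        self._seen = {}  # (user, query) -> set of item ids already delivered

    def new_items(self, user, query, results):
        """results is a list of (item_id, title); return only the unseen ones."""
        seen = self._seen.setdefault((user, query), set())
        fresh = [(item_id, title) for item_id, title in results if item_id not in seen]
        seen.update(item_id for item_id, _ in fresh)
        return fresh


state = SearchFeedState()
morning = [("http://example.com/1", "Story one"), ("http://example.com/2", "Story two")]
evening = morning + [("http://example.com/3", "Story three")]

first = state.new_items("me", "rupee", morning)   # everything is new
second = state.new_items("me", "rupee", evening)  # only the third story is new
```

Only the fresh list would then be packed into the RSS feed for that user and query.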

4. Flow control (a sanity check)

Sometimes, people put a megabyte file into an RSS file. This is a huge pain. The RSS feedreader needs a sanity check that blocks entries in an RSS file bigger than (say) 65,536 bytes.
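As a sketch, the check is a few lines of Python. It assumes the entry body is available as a string and measures its size as UTF-8 bytes; the function names are made up for illustration.

```python
MAX_ENTRY_BYTES = 65_536  # the (say) 65,536-byte limit suggested above


def within_size_limit(entry_text, limit=MAX_ENTRY_BYTES):
    # Measure the encoded size, since the limit is in bytes, not characters.
    return len(entry_text.encode("utf-8")) <= limit


def split_by_size(entries, limit=MAX_ENTRY_BYTES):
    """Return (shown, blocked) lists so blocked entries can still be reported."""
    shown = [e for e in entries if within_size_limit(e, limit)]
    blocked = [e for e in entries if not within_size_limit(e, limit)]
    return shown, blocked


small = "A short post."
huge = "x" * (1024 * 1024)  # a megabyte of text
shown, blocked = split_by_size([small, huge])
```

Returning the blocked entries separately lets the reader show a one-line notice in their place rather than dropping them silently.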

5. Detecting and deleting defunct feeds

Quite a few entries in the long list of feeds that I think I am reading are actually defunct. Someone thought an RSS feed would be published at this URL but never quite followed up in producing the feed. The feedreader should provide a service where I am alerted to feeds where no content has shown up in the last (say) 90 days. That would help me to delete feeds and end up with a smaller .opml file. It would also reduce the psychological discomfort that I suffer when I think that I'm reading 97 feeds.
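A sketch of that alert in Python, assuming the feedreader already records the date of the newest entry it has seen for each feed (None if the feed has never produced anything). The function name and the example.com URLs are hypothetical.

```python
from datetime import datetime, timedelta


def defunct_feeds(last_entry_dates, now, max_age_days=90):
    """Return the feed URLs with no content in the last max_age_days days.

    last_entry_dates maps feed URL -> datetime of the newest entry, or None
    when the feed has never delivered an entry.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        url
        for url, last in last_entry_dates.items()
        if last is None or last < cutoff
    )


feeds = {
    "http://example.com/active.xml": datetime(2006, 9, 20),
    "http://example.com/stale.xml": datetime(2006, 5, 1),
    "http://example.com/never.xml": None,
}
to_review = defunct_feeds(feeds, now=datetime(2006, 10, 1))
```

The list is only a suggestion for review; deleting the feeds from the subscription list is left to the user.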

Sunday, October 01, 2006
