Full-Text RSS | EchoDitto Labs

Partial-text RSS feeds are a pet peeve of mine. I’m not alone: I’ve read about Dave Winer’s and Steve Rubel’s dislike of the practice. I’m sure there are a lot of other RSS users who are similarly irked by it.

So, after having a post-workout algorithmic epiphany (it’s the best time for them), I started work on a little project to fix this annoyance — and ended up quite pleased with the result. You might find it useful, too: it’s a little script that creates full-text RSS feeds from partial feeds. Just enter the URL of a partial feed in the box below and hit submit. You’ll be directed to a URL that will (hopefully) provide a full-text version of the feed you specified.

RSS Feed URL

I’ve been through a few different versions of the algorithm, but this one seems to be fairly universal and stable. It won’t work for every partial-text feed, but it seems to work for a lot of them. I’m sure it could be better, which tempts me to open source the algorithm and invite people to improve upon it. But I won’t — not yet, anyway.

I’m sensitive to the pressures that make bloggers use partial-text feeds: some of my friends depend on selling advertising to support their sites. Unfortunately, RSS simply isn’t respected by marketers and their clients. Offering a full-text feed means fewer page views, which means less revenue. I’ve been told this bluntly by a friend who wanted to offer full text, did so, then noticed his revenues were shrinking. It’s hard to fault him for returning to partial-text feeds.

But this situation isn’t a problem with RSS; it’s a problem with the ad industry. It’s long past time for people to realize that if they give content away on the web they’ll be unable to control how others choose to consume it. Inconveniencing users is not an acceptable solution to advertisers’ inability to adopt new metrics.

Still, I wouldn’t want to offer a feature that middlemen can resell at the expense of bloggers. So while I do want to open this up, I don’t want to make things easy for the unscrupulous. This feature does need to pass out of my hands — its proper place is in the RSS reader, both for performance reasons and in order to eliminate one class of countermeasures that bloggers could take. Maybe I’ll try my hand at adapting the code for Vienna.

A few technical notes: depending on the site, some entries may come back with comments or other cruft attached. Fellow geeks can trim those off by specifying URL-encoded regexes, passed in the querystring as parameters regex0 – regex9 (note that an outstanding issue with PHP magic quotes means that the + character doesn’t work; use {1,} instead). I’d encourage users who create regexes for feeds to share them by tagging the URL with “fulltextrss” on del.icio.us. There are already a few examples available here.
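For anyone who would rather script this than hand-encode a querystring, here is a rough Python sketch of how such a request URL might be assembled. Only the parameter names regex0 through regex9 and the {1,} workaround come from the notes above. The endpoint address, the "url" parameter name and both helper functions are made up for illustration, so substitute whatever the service actually redirects you to.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint and feed parameter name; neither is given in the post.
SERVICE_URL = "https://example.org/fulltextrss"
FEED_PARAM = "url"


def avoid_plus(pattern: str) -> str:
    """Rewrite '+' quantifiers as '{1,}' so no '+' ends up in the querystring.

    Escaped characters and simple [...] classes are left alone. A literal plus
    (written as \\+ or inside a class) cannot be expressed this way, so the
    function raises instead of guessing.
    """
    out, i, in_class = [], 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])  # keep escape sequences intact
            i += 2
            continue
        if ch == "[":
            in_class = True
        elif ch == "]":
            in_class = False
        out.append("{1,}" if ch == "+" and not in_class else ch)
        i += 1
    result = "".join(out)
    if "+" in result:
        raise ValueError("pattern still contains a literal '+': " + pattern)
    return result


def build_url(feed_url: str, regexes: list[str]) -> str:
    """Build a request URL carrying up to ten regexes as regex0..regex9."""
    if len(regexes) > 10:
        raise ValueError("only regex0 through regex9 are available")
    params = {FEED_PARAM: feed_url}
    for n, pattern in enumerate(regexes):
        params[f"regex{n}"] = avoid_plus(pattern)
    # quote (not the default quote_plus) so spaces become %20 rather than '+'.
    return SERVICE_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_url("http://example.com/feed", [r'<div id="comments">.+']))
```

The choice of quote over quote_plus matters here: the default encoder would turn spaces into + characters, which is exactly the character the notes above say to avoid.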

Finally, please note that the service employs PEAR’s function caching on a 15-minute timeout. If the results you’re getting aren’t up-to-date, just be patient (or alter one of the regex parameters).
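To see why changing a regex parameter gets you a fresh result, it helps to picture how time-limited function caching generally behaves. The Python sketch below is an illustration of the general idea only, not PEAR's code and not the service's. The class name, the fetch function and the assumption that the cache is keyed on the function's arguments are all mine.

```python
import time


class TimedFunctionCache:
    """Cache function results per argument tuple for a fixed number of seconds."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the timeout can be tested
        self._store = {}        # (function name, args) -> (stored_at, value)

    def call(self, func, *args):
        key = (func.__name__, args)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]       # still fresh: reuse the stored result
        value = func(*args)     # missing or expired: recompute and store
        self._store[key] = (now, value)
        return value


def fetch_full_text(feed_url, regex0=""):
    """Hypothetical stand-in for the expensive work of building a full-text feed."""
    return f"full text of {feed_url} trimmed with {regex0!r} at {time.time()}"


if __name__ == "__main__":
    cache = TimedFunctionCache()
    first = cache.call(fetch_full_text, "http://example.com/feed", "a{1,}")
    again = cache.call(fetch_full_text, "http://example.com/feed", "a{1,}")
    other = cache.call(fetch_full_text, "http://example.com/feed", "b{1,}")
    print(first is again)   # same arguments inside the timeout: cached object
    print(first == other)   # different regex: a separate cache entry
```

Under that assumption, a request with identical parameters is answered from the stored copy until the timeout passes, while a request with even one altered regex counts as a different call and is computed from scratch.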
