Sync RSS to Safari Reading List

This page was originally posted on February 24, 2013. There is an update.

A lot of people seem concerned about Google Reader going the way of the dodo. Now I’m not worried about Google Reader per se—I don’t use most of its features—but I like the idea of not putting all my data into Google’s black box.1 That’s why I started to keep an eye out for Google Reader alternatives a few months ago.

None of the options, however, seems to do exactly what I want. This may be because my interaction with Google Reader is decidedly single-purpose: I log in, read the short items, add the longer ones to Safari’s Reading List (which syncs to my iPhone and iPad), and leave. First, Safari’s built-in Reading List is where I store all my temporary bookmarks, so I don’t want another reading list manager. Second, I don’t want to join another social network. Finally, I don’t want to simply move from one web service that may be gone tomorrow to another. All in all, the best option seems to be writing a script myself.

What I want is a script that will:

  1. Parse a list of RSS feeds and return links to the new items.
  2. Import those links into Safari’s reading list.

The second part is easy. Seven lines of AppleScript do the job:

set rssScript to "/usr/local/bin/python ~/Dropbox/bin/HitRSSFeeds.py"
set theItems to paragraphs of (do shell script rssScript)
tell application "Safari"
    repeat with theItem in theItems
        add reading list item theItem
    end repeat
end tell

For the first part I use this Python script, saved as ~/Dropbox/bin/HitRSSFeeds.py:

#! /usr/bin/python
#-*- coding: utf-8 -*-

conf_file = "/Users/ArjanBoerma/Dropbox/bin/HitRSSFeedsConf"

import feedparser as fp
import csv
import time

now = time.localtime()

def str_time(timestruct):
    return str(int(time.mktime(timestruct)))

def links_newer_than_date(items,date):
    sorted_items = sorted(items, key=lambda entry: entry["updated_parsed"], reverse=True)
    links = []
    for item in sorted_items:
        if item.updated_parsed > date:
            links.append(item.link)
        else:
            break
    return links

def links_up_to_link(items,link):
    links = []
    for item in items:
        if item.link != link:
            links.append(item.link)
        else:
            break
    return links

feeds = []
with open(conf_file,'r') as conf:
    for line in csv.reader(conf,delimiter="\t"):
        if line:
            feeds.append(line)

newconf = ""
for feed in feeds:
    content = fp.parse(feed[0])
    try:
        last_check = time.gmtime(int(feed[1]))
    except IndexError:
        last_check = time.gmtime(0)

    try:
        last_link = feed[2]
    except IndexError:
        last_link = ""

    if content["bozo"]:
        newconf += feed[0]+"\t"+str_time(last_check)+"\n"
    else:
        try:
            feed_updated = content["updated_parsed"]
        except KeyError:
            feed_updated = now
        if feed_updated > last_check:
            items = content["items"]
            try:
                links = links_newer_than_date(items,last_check)
            except KeyError:
                links = links_up_to_link(items,last_link)

            for link in links:
                print(link)  # one URL per line on stdout, picked up by the AppleScript
        else:
            links = []

        if links != []:
            last_link = links[0]

        newconf += feed[0]+"\t"+str_time(now)+"\t"+last_link+"\n"

with open(conf_file,'w') as conf:
    conf.write(newconf)

Thanks to the feedparser package2, I don’t have to handle the actual feeds myself. Lines 15 through 32 define two functions that filter out the new content. The preferred choice is links_newer_than_date(items,date), which returns the main URL of each item in items that was published or updated after date. Not every feed includes a date in its entries, so as a fallback links_up_to_link(items,link)3 returns the main URL of each item in items until it encounters the URL link. I supply it with a URL that I’ve saved in a previous run of the script.4
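To make the difference between the two filters concrete, here is a minimal sketch that runs them on hand-made entries instead of a real feed. The dictionaries and example.com URLs are made up for illustration, and the functions use plain dictionary access rather than feedparser's attribute access, so treat it as an approximation of the script rather than a copy of it.

```python
import time

def links_newer_than_date(items, date):
    """Return links of items updated after `date`, newest first."""
    # Raises KeyError if an item has no "updated_parsed" key.
    newest_first = sorted(items, key=lambda e: e["updated_parsed"], reverse=True)
    links = []
    for item in newest_first:
        if item["updated_parsed"] > date:
            links.append(item["link"])
        else:
            break  # everything after this point is older
    return links

def links_up_to_link(items, link):
    """Return links in feed order until the previously saved `link` shows up."""
    links = []
    for item in items:
        if item["link"] == link:
            break
        links.append(item["link"])
    return links

# Hypothetical entries standing in for what feedparser would return.
dated = [
    {"link": "http://example.com/a", "updated_parsed": time.gmtime(300)},
    {"link": "http://example.com/b", "updated_parsed": time.gmtime(100)},
    {"link": "http://example.com/c", "updated_parsed": time.gmtime(200)},
]
undated = [
    {"link": "http://example.com/z"},
    {"link": "http://example.com/y"},
    {"link": "http://example.com/x"},
]

new_by_date = links_newer_than_date(dated, time.gmtime(150))
new_by_link = links_up_to_link(undated, "http://example.com/y")
print(new_by_date)
print(new_by_link)
```

Calling links_newer_than_date() on the undated entries raises a KeyError, which is the signal the main loop uses to fall back to links_up_to_link().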

The configuration file (~/Dropbox/bin/HitRSSFeedsConf, the conf_file path in the script) starts out as a simple text file with one feed URL per line, for example:

http://news.yahoo.com/rss/
http://example.com/phoneyurl

When you run the script it appends5 two tab-separated fields to each line: the current time (in seconds since the Unix epoch, the dawn of time) and the URL of the most recent content item in the feed:

http://news.yahoo.com/rss/  1361747937 http://news.yahoo.com/south-carolina-disgraced-former-governor-seeks-resurrection-205305512.html
http://example.com/phoneyurl    -3600
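The format is easier to see when it is parsed. The sketch below reads such lines the way the script does, with csv.reader and a tab delimiter, and fills in defaults for fields that are not there yet. The function name read_feeds, the shortened item URL and the third sample line are made up for this example.

```python
import csv
import io

def read_feeds(conf):
    """Parse tab-separated config lines into (url, last_check, last_link) tuples."""
    feeds = []
    for row in csv.reader(conf, delimiter="\t"):
        if not row:
            continue  # skip empty lines
        url = row[0]
        last_check = int(row[1]) if len(row) > 1 else 0   # 0 = never checked
        last_link = row[2] if len(row) > 2 else ""        # "" = no saved link
        feeds.append((url, last_check, last_link))
    return feeds

# In-memory stand-in for the config file; fields are separated by real tabs.
sample = io.StringIO(
    "http://news.yahoo.com/rss/\t1361747937\thttp://news.yahoo.com/some-item.html\n"
    "http://example.com/phoneyurl\t-3600\n"
    "http://example.com/brand-new-feed\n"
)
feeds = read_feeds(sample)
for feed in feeds:
    print(feed)
```

A freshly added feed has only a URL, a 'bozo' feed has a URL and a time, and a healthy feed has all three fields.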

The -3600 on the second line is a bug. Because I ignored time zones, the last_check time for ‘bozo’ feeds (feeds that do not parse properly) is not saved properly: the script reads the stored value as UTC with time.gmtime() but writes it back with time.mktime(), which expects local time. I’m currently ahead of UTC, so each time I run the script the stored time shifts back by my UTC offset (one hour, 3600 seconds, in the example above). This is probably easy to fix, but for me the last_check time is never later than the current time, so this bug will not cause me to lose content. Take into account that dealing with time zones is very tedious and you can see why my current incentive to fix it is low.
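For what it is worth, one possible fix is to stay in UTC for the whole round trip. The sketch below is not what the script does: it swaps time.mktime() for calendar.timegm(), the standard-library inverse of time.gmtime(), and the name str_time_utc is made up for this example.

```python
import calendar
import time

def str_time_utc(timestruct):
    """Convert a UTC struct_time to a string of epoch seconds."""
    # calendar.timegm() treats its argument as UTC, unlike time.mktime(),
    # which treats it as local time.
    return str(calendar.timegm(timestruct))

stored = 1361747937
round_trip = str_time_utc(time.gmtime(stored))   # same value comes back
epoch = str_time_utc(time.gmtime(0))             # "0" in any time zone

# For consistency the current time would also have to be taken in UTC.
now_utc = time.gmtime()

print(round_trip)
print(epoch)
```

With this variant a timestamp that is read and written back unchanged stays the same no matter which time zone the machine is in.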

Another issue is that the script hits every feed successively, so it is dreadfully slow if you query a large number of feeds. When I get around to fixing that, I’ll tackle the time zone bug as well.
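A possible way around the slowness, sketched here under a few assumptions, is to fetch the feeds in parallel threads. The example assumes Python 3 (for concurrent.futures) and uses a made-up fake_fetch() that only sleeps; in the real script the function passed in would be feedparser's parse. The names fetch_all and fake_fetch are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL in parallel threads.

    Results come back in the same order as `urls`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

def fake_fetch(url):
    # Stand-in for a network request: pretend each feed takes 0.2 s to load.
    time.sleep(0.2)
    return {"href": url, "bozo": 0}

urls = ["http://example.com/feed%d" % n for n in range(5)]
start = time.time()
results = fetch_all(urls, fake_fetch)
elapsed = time.time() - start
print("fetched %d feeds in %.2f s" % (len(results), elapsed))
```

Because the results keep the order of the input, the rest of the loop (filtering links, rebuilding the config file) could stay sequential and just pair each feed line with its parsed result.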


  1. Of course I still have a bazillion e-mails in Gmail, so ditching all Google products is not really an option.

  2. If you don’t have feedparser already, an easy_install feedparser (or pip install feedparser) will remedy that.

  3. I assume in links_up_to_link() that the newest RSS items are encountered first. This seems to be true for the one feed I want to check that doesn’t have entries with an updated property, but may not be true in general. I must admit that I have no idea how to solve this, except caching every item the script has ever parsed to check whether an item is new, and that seems overkill.

  4. To be precise, the config file gets overwritten.
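
The caching idea from footnote 3 can be sketched in a few lines, if only to show how much bookkeeping it takes. This is an illustration, not part of the script: the function name unseen_links, the JSON cache file and the example.com URLs are all assumptions.

```python
import json
import os
import tempfile

def unseen_links(links, cache_path):
    """Return the links that are not in the cache yet, and add them to it."""
    try:
        with open(cache_path) as f:
            seen = set(json.load(f))
    except (IOError, ValueError):
        seen = set()  # no cache yet, or an unreadable one
    new = [link for link in links if link not in seen]
    seen.update(new)
    with open(cache_path, "w") as f:
        json.dump(sorted(seen), f)
    return new

# Temporary cache file so the example leaves no trace in a real directory.
cache = os.path.join(tempfile.mkdtemp(), "seen.json")
first = unseen_links(["http://example.com/a", "http://example.com/b"], cache)
second = unseen_links(["http://example.com/c", "http://example.com/a"], cache)
print(first)
print(second)
```

This does not depend on the order of items in the feed, but the cache grows with every item ever seen, which is the overkill the footnote complains about.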