1. Introduction

This web page looks at various formats that are often used for newsfeeds. The formats that are considered are: RSS 1.0, RSS 2.0, Atom 0.3 and Atom 1.0.

This document also looks at OPML. This is a format that can be used to describe a collection of newsfeeds.

2. A Validator

There is a validator that can be used to check the validity of a newsfeed that allegedly conforms to one of these specifications. This validator is at http://feedvalidator.org/

I have been using this validator to check the validity of the RSS 1.0, RSS 2.0, Atom 0.3 and Atom 1.0 newsfeeds produced by OXITEMS (a newsfeed generation system that is used by departments and colleges at the University of Oxford).

If you attempt to validate an Atom 0.3 newsfeed using this validator, it will report that the newsfeed uses an obsolete version of the format. It says that early adopters of the Atom format should upgrade their feeds to the latest version of the specification.

3. RSS 1.0

The RSS 1.0 specification is at http://web.resource.org/rss/1.0/spec

The content model of the description element (of an item) is given as (#PCDATA), that is, character data only. Consequently, there is no explicit provision for including HTML markup in a newsfeed.

Extensions to RSS 1.0 are made through the use of modules. There is a page containing a list of proposed modules. One such module is the module for events.

4. RSS 2.0

Although the numbering may imply that RSS 2.0 is a more up-to-date version of RSS 1.0, this is not the case: there has been an unfortunate fork in the development of RSS, and with it unwelcome confusion over version numbering. So RSS 1.0 and RSS 2.0 are alternative and rival newsfeed formats.

Although there is a version of the RSS 2.0 specification at http://blogs.law.harvard.edu/tech/rss, the RSS Advisory Board has been looking after revisions since July 2003. The latest revision of the RSS 2.0 specification is at http://www.rssboard.org/rss-specification

The specification does not seem to specify what the description element can contain. Some people say that it can contain HTML; whereas others say that it definitely cannot.

For example, a web page at Mozilla (http://developer.mozilla.org/en/docs/RSS:Article:Why_RSS_Content_Module_is_Popular_-_Including_HTML_Contents) says: "Do not put anything but plain text into the RSS <description> element. Although it has become common practice to abuse the RSS <description> element and put non-plain text data in it. It is not actually allowed."

However, some people do put HTML into the description element.

When using RSS 2.0, you are allowed to use elements from other namespaces. So some people provide HTML in a content:encoded element instead of a description element. Some even provide both elements, with HTML in each.

Yahoo uses RSS 2.0. In the description element, they put HTML inside a CDATA section. An example is http://rss.news.yahoo.com/rss/sports. This method is also adopted by others.

The Mozilla web page cited above accepts that CDATA sections reduce the bloat, but it goes on to say that the <description> element is not supposed to be used for any of this: it is only supposed to be used to include plain text. They say that HTML should be put in a content:encoded element.

The BBC uses a mix of RSS 2.0, RSS 1.0 and RSS 0.91. For their RSS 2.0 feeds, they just put a single sentence (using no HTML) in the description element. An example is http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml

Date elements such as pubDate and lastBuildDate require use of the RFC 822 format for dates. This format is not easy to parse, and in my opinion, it is preferable to supply dates in the ISO 8601 format. Some people use a dc:date element instead of a pubDate element. This requires the use of ISO 8601.
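As a rough illustration of the difference between the two date formats, the following sketch uses Python's standard email.utils module to parse an RFC 822 style date and re-express it in ISO 8601. The sample date and the variable names are illustrative assumptions, not values taken from any particular newsfeed.

```python
from email.utils import format_datetime, parsedate_to_datetime

# An RFC 822 style date of the kind carried by pubDate (illustrative value).
pub_date = "Tue, 03 Jun 2003 09:39:21 GMT"

# Parse it into a timezone-aware datetime object.
parsed = parsedate_to_datetime(pub_date)

# The same instant in ISO 8601, the form a dc:date element would carry.
dc_date = parsed.isoformat()

# Going the other way: format the datetime as an RFC 822 style string again.
round_trip = format_datetime(parsed, usegmt=True)

print(dc_date)
print(round_trip)
```

Here dc_date holds 2003-06-03T09:39:21+00:00, and round_trip reproduces the original pubDate string.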

RSS 2.0's author element requires an e-mail address. Because I do not wish to distribute e-mail addresses in newsfeeds, I'm currently using:

undisclosed_email_address@ox.ac.uk (Fred Bloggs)

This format is valid according to RFC 2822, although RFC 2822 regards it as a legacy format. However, it is the format illustrated in an example in the RSS 2.0 specification. Maybe I should try:

Fred Bloggs <undisclosed_email_address@ox.ac.uk>

which is the form preferred by RFC 2822 but which is not mentioned in the RSS 2.0 specification.
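The two forms carry the same information. As a minimal sketch, using Python's standard email.utils module as a stand-in for whatever a newsfeed reader actually does (the variable names are hypothetical), both forms parse to the same name and address:

```python
from email.utils import formataddr, parseaddr

# The legacy form, as illustrated in the RSS 2.0 specification.
legacy = "undisclosed_email_address@ox.ac.uk (Fred Bloggs)"

# The form preferred by RFC 2822.
preferred = "Fred Bloggs <undisclosed_email_address@ox.ac.uk>"

# parseaddr returns a (display name, address) pair for either form.
legacy_name, legacy_addr = parseaddr(legacy)
preferred_name, preferred_addr = parseaddr(preferred)

# formataddr builds the preferred form from such a pair.
rebuilt = formataddr((legacy_name, legacy_addr))

print(legacy_name, legacy_addr)
print(rebuilt)
```

So a feed generator that stores the name and address separately could emit either form; which one a given newsfeed reader copes with best is a separate question that this sketch does not answer.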

Other people omit the author element and use a dc:creator element instead. I tried using both, and the validator moaned that this was not allowed.

If you want to use RSS 2.0, I would suggest not using pubDate, description and author, and using dc:date, content:encoded and dc:creator instead. This is valid RSS 2.0 but this use of elements from other namespaces for the crucial elements suggests the RSS 2.0 format is inadequate. Although the use of these other namespaces in this way is quite common, I'm unclear as to how many newsfeed readers support the use of these other namespaces.
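To make this suggestion concrete, here is a minimal sketch of an RSS 2.0 item built with Python's standard xml.etree.ElementTree module. The namespace URIs are those of the Dublin Core element set and the RSS 1.0 content module; the function name build_item and all the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces for the Dublin Core and content modules.
DC = "http://purl.org/dc/elements/1.1/"
CONTENT = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("dc", DC)
ET.register_namespace("content", CONTENT)


def build_item(title, link, html, creator, iso_date):
    """Build an item that uses dc:creator, dc:date and content:encoded
    in place of author, pubDate and description."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, f"{{{DC}}}creator").text = creator
    ET.SubElement(item, f"{{{DC}}}date").text = iso_date
    ET.SubElement(item, f"{{{CONTENT}}}encoded").text = html
    return item


# Sample values, invented for illustration.
html = "<p>Hello <b>world</b></p>"
item = build_item(
    "An example item",
    "http://www.example.org/item1",
    html,
    "Fred Bloggs",
    "2005-10-20T12:00:00+00:00",
)

# ElementTree escapes the markup when serialising, so <p> is written as &lt;p&gt;.
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

In a complete feed the item would sit inside the channel element, and the namespace declarations would normally appear on the rss root element rather than on each item.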

As mentioned above, RSS 2.0 permits elements from other namespaces. So the RSS 1.0 modules mentioned earlier can also be used in RSS 2.0, and the list of proposed RSS 1.0 modules is relevant here too. One such module is the module for events.

The additional elements used by Apple for iTunes are documented at http://www.apple.com/itunes/podcasts/specs.html

iTunes U uses an additional element, an itunesu:category element.

5. Atom 0.3

The Atom 0.3 specification is at http://www.mnot.net/drafts/draft-nottingham-atom-format-02.html

Note that with the release of Atom 1.0 during summer 2005, Atom 0.3 is deprecated. The above web page says the Atom 0.3 specification is "made available for historical purposes only". It continues by saying "DO NOT implement it or ship products conforming to it."

The Atom 0.3 specification says that content elements MAY have a "type" attribute, whose value indicates the media type of the content and if this is not present, its value MUST be considered to be "text/plain". The specification also says that content elements MAY have a "mode" attribute, whose value indicates the method used to encode the content. This can be one of the values "xml", "escaped" and "base64". If [it is] not present, its value MUST be considered to be "xml".

Typically, people have content elements like:

<content type="text/html" mode="escaped"> ... </content>

And inside the content element, people:

(a) either put HTML inside a CDATA section, as shown at http://bluebillinc.com/feed/atom/

(b) or they provide encoded HTML, writing <p> as &lt;p&gt;, as shown at http://ramble.oucs.ox.ac.uk/blog/stuart/atom.xml.
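To an XML parser, approaches (a) and (b) are equivalent: both deliver the same string of HTML as the text of the content element. The following sketch, using Python's standard xml.etree.ElementTree module and two made-up content elements, illustrates this.

```python
import xml.etree.ElementTree as ET

# Approach (a): the HTML is wrapped in a CDATA section.
cdata_form = (
    '<content type="text/html" mode="escaped">'
    "<![CDATA[<p>Hello</p>]]>"
    "</content>"
)

# Approach (b): the HTML is escaped using entity references.
entity_form = (
    '<content type="text/html" mode="escaped">'
    "&lt;p&gt;Hello&lt;/p&gt;"
    "</content>"
)

# After parsing, both yield the same text, which is itself HTML markup.
cdata_text = ET.fromstring(cdata_form).text
entity_text = ET.fromstring(entity_form).text

print(cdata_text)
print(entity_text)
```

Both variables hold the string <p>Hello</p>, so the choice between the two approaches affects only how the feed looks on the wire, not what a conforming parser hands to the newsfeed reader.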

6. Atom 1.0

The Atom 1.0 specification is at http://atompub.org/rfc4287.html

Atom 1.0 was released during Summer 2005. There is a useful summary of the changes between Atom 0.3 and Atom 1.0 at http://rakaz.nl/2005/07/moving-from-atom-03-to-10.html

Atom 1.0 has title, summary, content and link elements. The content element has an associated type attribute, whose value can be "text", "html", "xhtml" or any MIME type.
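As an illustration of these elements, here is a minimal sketch of an Atom 1.0 entry whose content is marked as type "html", built with Python's standard xml.etree.ElementTree module. The namespace URI is the one defined for Atom 1.0; the function name build_entry and all the sample values are hypothetical. The id and updated elements are included because Atom 1.0 requires them in every entry.

```python
import xml.etree.ElementTree as ET

# The Atom 1.0 namespace, registered as the default namespace for output.
ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)


def build_entry(entry_id, title, updated, link, summary, html):
    """Build an Atom 1.0 entry with a plain-text summary and HTML content."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM}}}link", href=link)
    ET.SubElement(entry, f"{{{ATOM}}}summary").text = summary
    # type="html" tells the reader that the text is escaped HTML markup.
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="html")
    content.text = html
    return entry


# Sample values, invented for illustration.
entry = build_entry(
    "http://www.example.org/entries/1",
    "An example entry",
    "2005-10-20T12:00:00Z",
    "http://www.example.org/entries/1.html",
    "A one-sentence summary.",
    "<p>Hello</p>",
)

xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Unlike the RSS 2.0 description element, the type attribute leaves no doubt about whether the content is plain text or HTML.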

A good overview of Atom 1.0 is: http://www-128.ibm.com/developerworks/xml/library/x-atom10.html

7. Support by newsfeed readers

Firefox 1.5 has support for live bookmarks for all of the above four formats. The previous release (Firefox 1.0.7) had no support for Atom 1.0. Plans for Firefox 2.0 and 3.0 are outlined in http://wiki.mozilla.org/Feed_Handling

With Thunderbird 1.5, it is possible to read RSS newsfeeds and Atom 1.0 newsfeeds. The previous release, Thunderbird 1.0.7, can only read RSS newsfeeds.

Although Internet Explorer 6 has no support for RSS and Atom newsfeeds, IE 7 has support for (at least) RSS 1.0, RSS 2.0, Atom 0.3 and Atom 1.0.

Although Opera 8 has support for RSS and Atom 0.3, it has no support for Atom 1.0. Support for Atom 1.0 is available in the latest development release of Opera (Opera 9.0 Preview 1, released on October 20th 2005). I have done some experiments using Opera 9.0 Preview 1.

At the University of Oxford, we use WebLearn (an implementation of Bodington) as our VLE, and this implementation has support for (at least) RSS 1.0, RSS 2.0, Atom 0.3 and Atom 1.0.

8. OPML

Wikipedia describes OPML as follows: "OPML (Outline Processor Markup Language) is an XML format for outlines (often blogrolls). Originally developed by Radio UserLand as a native file format for an outliner application, it has since been adopted for other uses, the most common being to exchange lists of web feeds between web feed aggregators."

The specification of OPML 2 is given at http://dev.opml.org/spec2.html. There is a specific section of this specification entitled Subscription lists that describes how to provide an OPML file where each item describes a newsfeed.
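As a minimal sketch of such a subscription list, the following Python builds an OPML 2.0 document in which each outline element describes one newsfeed through its type, text and xmlUrl attributes. The two feed URLs are the Yahoo and BBC examples cited earlier; the function name build_subscription_list, the list title and the text labels are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_subscription_list(title, feeds):
    """Build an OPML 2.0 subscription list from (text, xmlUrl) pairs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for text, xml_url in feeds:
        # Each outline element describes one newsfeed.
        ET.SubElement(body, "outline", type="rss", text=text, xmlUrl=xml_url)
    return opml


# Labels are invented for illustration; the URLs are those cited in this document.
feeds = [
    ("Yahoo sports", "http://rss.news.yahoo.com/rss/sports"),
    ("BBC front page", "http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml"),
]
opml = build_subscription_list("Example subscriptions", feeds)

opml_text = ET.tostring(opml, encoding="unicode")
print(opml_text)
```

The specification also defines optional attributes for an outline element; this sketch sets only type, text and xmlUrl.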

A validator for OPML is available at http://validator.opml.org/.