Today, documents are usually prepared electronically using a word processor such as Word or OpenOffice. Such programs allow their users to make good-looking documents easily and quickly. However, there are problems associated with the multitude of different formats and programs used to produce documents. For instance:
- users can become locked into specific file formats;
- conversion to other formats becomes more difficult with time;
- the storage media used can quickly go out of fashion, making document retrieval an expensive specialist service;
- formats change over time and vendors might not provide conversion mechanisms from old to new formats.
OUCS has adopted an open, vendor-independent format to keep our documentation accessible and interchangeable. Our system stores documents in XML (eXtensible Markup Language). XML allows users to define their own rules for marking up their documents. However, many established sets of XML rules are already available, so we do not need to develop anything new for OUCS. Our system uses a modified version of the Text Encoding Initiative (TEI) XML for writing documentation.
The Text Encoding Initiative (TEI) Guidelines are an international and interdisciplinary standard that enables libraries, museums, publishers, and individual scholars to represent a variety of literary and linguistic texts for online research, teaching, and preservation.
The TEI standard is maintained by a consortium of leading institutions and projects worldwide; Oxford is one of these institutions. Two of the major players in the TEI are members of OUCS: Lou Burnard and Sebastian Rahtz. Lou joined the Text Encoding Initiative project as its European Editor back in 1989 (a post he still holds), while Sebastian is one of the consortium's directors and actively develops the TEI itself.
Since 2002 it has been a legal requirement to provide documents (including web pages) in formats accessible to users of alternative technologies such as screen readers. The relevant legislation is the Special Educational Needs and Disability Act (SENDA) 2001, which forms Part 4 of the Disability Discrimination Act (DDA). This Act brought educational establishments into line with commercial providers in the way that they provide information and services to the disabled community.
The W3C organisation have created various standards for web accessibility. These are:
The following document includes details on how to make your XML documents accessible to as wide an audience as possible. Please make sure that you follow these accessibility guidelines - it's the LAW!
- An XML schema, derived from the Text Encoding Initiative, located at http://www.oucs.ox.ac.uk/schemas/tei-oucs.rng
- A set of XSLT stylesheets, which can transform document instances to HTML pages; see http://www.tei-c.org/Tools/Stylesheets/
- A set of XSLT stylesheets, which can transform document instances into PDF for printing; see http://www.tei-c.org/Tools/Stylesheets/
- CSS stylesheets for displaying the XML files directly (http://www.oucs.ox.ac.uk/schemas/tei-oucs.css, which can also be used with some editors), and for enhancing the HTML versions (http://www.oucs.ox.ac.uk/stylesheet/oucs/oucs.css)
- The XML text document
- The change management system where all the material is stored
The rules of the TEI XML format are stored in a schema file (we use the RELAX NG schema language). This file defines how the XML must be structured and is the key to transforming the text from one format to another. To write a valid TEI XML document, you must follow the schema. Luckily, many XML editors look after the schema for you and show any errors when the document is tested against it.
An XSLT (Extensible Stylesheet Language Transformations) stylesheet is a set of rules for processing an XML document: it turns the XML source of a document into its final, published form. OUCS uses two sets of XSLT stylesheets: one turns an XML file into a web page (HTML), the other turns XML into PDF for printing.
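To give a flavour of what such rules look like, here is a minimal sketch of an XSLT 1.0 stylesheet with two templates. It is not taken from the TEI stylesheets listed earlier, which are far more complete; it simply assumes un-namespaced TEI input (as in <TEI.2> documents) and shows how each template matches an element and writes HTML in its place.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Produce HTML output rather than XML -->
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- The heading of a top-level division becomes an HTML h2 -->
  <xsl:template match="body/div/head">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

  <!-- Each TEI paragraph becomes an HTML paragraph;
       apply-templates goes on to process anything nested inside it -->
  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Elements without a matching template fall through to XSLT's built-in rules, which simply output their text; real stylesheets add templates for every element they need to handle.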
CSS (Cascading Style Sheets) files describe how a document is to be presented, e.g. bold red headings or grey backgrounds. OUCS uses two of them: one displays the XML file directly and is fairly simple; the other styles the final web page and is fairly complex.
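For the first case, one standard way to tell a browser or editor which CSS file to apply is an xml-stylesheet processing instruction at the top of the XML file. A sketch, assuming you want the OUCS schema stylesheet listed earlier (whether a particular editor honours this instruction varies):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Ask the viewing application to style this file with the OUCS CSS -->
<?xml-stylesheet type="text/css" href="http://www.oucs.ox.ac.uk/schemas/tei-oucs.css"?>
<TEI.2>
  <!-- document content as usual -->
</TEI.2>
```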
- Obtain a suitable XML editor
- We recommend the cross-platform oXygen editor, for which we have a site licence; see the document on How to use oXygen at OUCS
- Obtain a Subversion client
- We recommend Syncro SVN, the client which comes with oXygen. Details are given in the document Using the Syncro Subversion client
- Obtain a Subversion account
- Accounts can be set up by visiting https://svn.oucs.ox.ac.uk/admin/useradmin/
- Write your document!
- This part is up to you! If you are unsure how to start, open an existing XML file and save it under a different name. After removing the original content, you can use this file as a template and add your own content as necessary.
Before submitting your file to Subversion, you should check your document's syntax. Most XML editors can check the validity of your document against the schema. Make any necessary corrections before submitting the file to the main Subversion repository. Also bear in mind that your document should be fully accessible and SENDA-compliant.
- Elements and Tags
- XML documents contain many elements; one example is the title element. This begins with a start tag <title> and is closed by the end tag </title>. Any text between the start and end tags is therefore defined as the title of the document. Most XML elements work in this way: a start tag, some text, then an end tag. Some elements are self-closing (i.e. they consist of a single tag ending in /> and have no separate end tag); where appropriate these will be highlighted later in this document.
- Content and Data
- Any text between tags is the content of the element. This can be of two forms: the actual information or data; and other elements. Where the two occur together this is termed mixed content.
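The fragments below illustrate these terms. They are a sketch using elements mentioned elsewhere in this document (<title>, <p>, <hi> and <anchor>); they are shown side by side for comparison and would not form a complete document on their own.

```xml
<!-- Start tag, text content, end tag -->
<title>Configuring your email client</title>

<!-- Mixed content: text and a child element (<hi>) together inside <p> -->
<p>Always <hi>save</hi> your file before validating it.</p>

<!-- A self-closing element: a single tag ending in "/>" -->
<anchor id="start"/>
```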
All elements can have additional properties besides the element name and content. These properties are the attributes of an element, and they consist of name-value pairs. For example, a <div> element can have the attribute id="xxx", where xxx is a unique name (XML identifiers must begin with a letter or underscore, not a digit). In the example below, the id is 'email': <div id="email"> <head>Configuring your email client</head> <p> text....</p></div>
- XML structure and nesting tags
XML is very strict about element structure, especially compared to HTML. In XML, every start tag must normally be matched by an end tag. Elements must be nested properly and used in the correct place within the document hierarchy. This means, for example, that you cannot open a new <p> element without first closing the previous one. (N.B. there are exceptions to the start/end rule, e.g. self-closing tags.)
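The difference is easiest to see side by side. In this sketch only the first fragment is well-formed; an XML editor will report errors on the other two.

```xml
<!-- Correct: <hi> opens and closes inside the <p> that contains it -->
<p>This phrase is <hi>properly</hi> nested.</p>

<!-- Wrong: <p> is closed while <hi> is still open -->
<p>This phrase is <hi>badly</p> nested.</hi>

<!-- Wrong in XML (though browsers tolerate it in HTML):
     a new <p> is opened before the previous one is closed -->
<p>First paragraph
<p>Second paragraph</p>
```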
First comes the root element, <TEI.2>, which declares that the file is a TEI document. This is effectively the start tag for the document: all other elements must be correctly arranged, or nested, inside the <TEI.2> tags for the document to be valid TEI XML.
The first element inside <TEI.2> is the <teiHeader> element. Everything within this element is part of the document's metadata (metadata is data about the document, e.g. its title, author, creation date etc.). OUCS documents have a number of fields in the <teiHeader>; some have to be completed manually, such as the title of the document, while others are added automatically when the document is submitted, such as the Last changed by information. Usually, when writing your own documents, you should complete the following metadata elements:
- contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found before the start of a text proper.
- contains the whole body of a single unitary text, excluding any front or back matter.
- contains any appendices, etc., following the main part of a text.
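Putting these pieces together, a bare-bones document might look like the sketch below. It assumes the three descriptions above correspond to the standard TEI <front>, <body> and <back> elements inside <text> (<front> and <back> can be left out if unused). The header is abbreviated and its title and author are placeholders, so check which header elements the OUCS schema actually requires before relying on it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI.2>
  <teiHeader>
    <!-- Metadata about the document -->
    <fileDesc>
      <titleStmt>
        <title>Configuring your email client</title>
        <author>A. N. Author</author>
      </titleStmt>
      <!-- further header elements required by the schema go here -->
    </fileDesc>
    <revisionDesc>
      <!-- revision information -->
    </revisionDesc>
  </teiHeader>
  <text>
    <front>
      <!-- optional prefatory matter -->
    </front>
    <body>
      <p>The main text of the document goes here.</p>
    </body>
    <back>
      <!-- optional appendices -->
    </back>
  </text>
</TEI.2>
```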
Like HTML, XML relies on elements to code up the document. If you are familiar with coding HTML files the transition to XML should be fairly painless. OUCS XML has many elements available for use, although in any one document only a subset of these will ever be applied. In this section we discuss the elements making up the body of a text.
Your text may be just a series of paragraphs, or these paragraphs may be grouped together into chapters, sections, subsections, etc. In the former case, each paragraph is embedded inside a <p> element. In the latter case, the <body> may be divided into a series of <div> elements, which may be further subdivided. An example of div structure is shown below:
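A minimal sketch of such a structure, with two top-level divisions and one subdivision (the id values and headings are made up for illustration):

```xml
<body>
  <!-- First top-level division -->
  <div id="intro">
    <head>Introduction</head>
    <p>Text of the first section.</p>
  </div>
  <!-- Second top-level division, containing a nested subdivision -->
  <div id="setup">
    <head>Setting up</head>
    <p>Text of the second section.</p>
    <div id="setup-windows">
      <head>Windows</head>
      <p>Text of a subsection.</p>
    </div>
  </div>
</body>
```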
Sectioning your document has important effects on the OUCS web site. Each div is processed when the document is converted into HTML. Major divisions are treated as separate web pages and form the basis of the internal page navigation system. Each division is also numbered sequentially: 1, 2, 3 ... Where a div is nested within another div, it is treated as a subsection and numbered accordingly, e.g. 2.1, 2.2, 2.3 ...
Sectioning documents also influences the HTML sent to browsers. The title of a document is always given the <h1> tag; major divisions are given the <h2> tag, and minor divisions are given <h3>, <h4>, <h5> etc., depending on how deeply they are nested within the document.
Correct structural markup for documentation is important for accessibility. When documents are marked up in a structured way, they allow users of alternative technologies to discover the main sections and subsections more quickly and more easily. The structure allows users to jump from one section to another, without the need to read all of the information on the page. Documents that do not use structured markup pose a problem (to screen reader users in particular), as it is very difficult to find out what is on a page without reading all of the text. Where structural markup has not been used, the author has often employed styles (bold, italic, etc.) to indicate different sections and headings. While obvious to sighted readers, the structure is lost to screen reader users who must read the page to find out if it is of interest to them.
- This indicates the conventional name for this category of text division. Its value might be something like ‘Preface’.
- This specifies a unique identifier for the division, which may be used for cross references or other links to it, such as a commentary. It is often useful to provide an id attribute for every major structural unit in a text, and to derive the id values in some systematic way, for example by appending a section number to a short code for the title of the work in question.
The n attribute specifies a mnemonic short name or number for the division, which can be used to identify it in preference to the id. If a conventional form of reference or abbreviation for the parts of a work already exists (such as the book/chapter/verse pattern of Biblical citations), the n attribute is the place to record it.
The id and n attributes, indeed, are so widely useful that they are allowed on any element in any TEI schema: they are global attributes.
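For example, a division could carry all three attributes as in the sketch below. It assumes, as in standard TEI, that the conventional-name attribute described first is called type. The id follows the systematic pattern suggested above (a short code for the work, email, plus the section number), and n records the conventional section number.

```xml
<!-- type: kind of division; id: unique identifier; n: conventional number -->
<div type="section" id="email-2" n="2">
  <head>Configuring your email client</head>
  <p>Text of the section.</p>
</div>
```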
Highlighted words or phrases are those made visibly different from the rest of the text, typically by a change of type font, handwriting style, or ink color, intended to draw the reader's attention to them.
- marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect
- contains a single-word, multi-word or symbolic designation which is regarded as a technical term
- An SGML, XML or HTML element name
- A button which a user can see
- Some sort of computer language code
- The name of a command
- A labelled input field
- A file or directory specification of any kind
- an icon in a GUI
- Text for a user to type
- A key to press
- A keyword in some technical code the user is being asked to write
- The label for a button, radio box, etc.
- The text of a link which is being described
- A menu item
- What comes back when you give a command
- A simple program listing
- A prompt from the computer
- A prettified display of text screenshot
- The name of a program
- A possible value for some option
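As a sketch, assuming the first two descriptions above correspond to TEI's standard <emph> and <term> elements (the OUCS names of the remaining, computing-specific elements are not given here, so they are not shown):

```xml
<!-- <emph> for rhetorical stress, <term> for a technical term -->
<p>You <emph>must</emph> validate your document before submitting it.
  A <term>schema</term> defines which elements may appear, and where.</p>
```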
Explicit cross references or links from one point to another in a text in the same XML document may be encoded using the elements described in section 4.4.1. Simple Cross References. References or links to elements of some other XML document, or to parts of non-XML documents, may be encoded using the TEI extended pointers described in section 4.4.2. Extended Pointers.
Accessibility of your links is important. The text you use can either enhance a user's understanding of where the link will lead, or leave them clueless. The worst phrase you can use for a link is ‘Click Here’ or simply ‘Here’: in both instances the user is left with no clear idea of where the link could lead. This problem is compounded for a screen reader user: they can get lists of all links from any given page, but if the author of the page has just said ‘Click Here’ or ‘Here’, they will get a list consisting of just that. The user will be left stranded on the page with no clear way to move forwards in their search for information.
An accessible link is one that conveys both where the link will go and the information
the user is likely to find. By default our system will add a
attribute to any link you make on your page when it is transformed into HTML. However,
while this is good practice and a nice failsafe measure, it will only add the same text
as the link text. This might be adequate in some circumstances, but to make your links
more accessible you should add your own additional text using the
attribute. People browsing with modern visual browsers will see your additional link
information when they mouse over your link, and screen reader users will have more
information about where the link will take them as the
attribute is read out to them.
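To illustrate, compare the two links in the sketch below. Both use the <ref> element to point to the same hypothetical section (id email), but only the second tells a screen reader user, hearing it in a list of links, where it leads.

```xml
<!-- Poor: the link text means nothing out of context -->
<p>To set up your email client, <ref target="email">click here</ref>.</p>

<!-- Better: the link text describes its destination -->
<p>See <ref target="email">Configuring your email client</ref>.</p>
```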
The difference between these two elements is that <ptr> is an empty element, simply marking a point from which a link is to be made, whereas <ref> may also contain some text, typically the text of the cross-reference itself. The <ptr> element would be used for a cross reference which is indicated by a symbol or icon, or in an electronic text by a button.
The element identified by the value of the target attribute must be present in the current XML document. This implies that the passage or phrase being pointed at must bear an identifier, and must therefore be tagged as an element of some kind. In the following example, the cross reference is to a <div> element:
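A sketch of such a cross reference, reusing the email division from the attributes example earlier (the link text is illustrative):

```xml
<!-- The target: a division carrying an identifier -->
<div id="email">
  <head>Configuring your email client</head>
  <p>Text of the section.</p>
</div>

<!-- Elsewhere in the same document -->
<p>For details, see <ref target="email">Configuring your email client</ref>.</p>

<!-- The empty <ptr> marks the same link without any text of its own -->
<p>See also <ptr target="email"/>.</p>
```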
The id attribute is global (i.e. can be used on any element), which means all elements in a document can be pointed to in this way. In the following example, a paragraph has been given an identifier so that it may be pointed at:
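A sketch of a paragraph-level target (the identifier p-backup is made up for illustration):

```xml
<!-- The paragraph carries an id so that it can be pointed at -->
<p id="p-backup">Back up your files regularly.</p>

<!-- Elsewhere in the same document -->
<p>Follow <ref target="p-backup">the advice on backing up your files</ref>.</p>
```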
Sometimes the target of a cross reference does not correspond with any particular feature of a text, and so may not be tagged as an element of some kind. If the desired target is simply a point in the current document, the easiest way to mark it is by introducing an <anchor> element at the appropriate spot.
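A sketch using <anchor> to mark a point in the middle of a paragraph (the identifier config-start is made up for illustration):

```xml
<!-- The empty <anchor> marks a point that has no element of its own -->
<p>The configuration steps are as follows. <anchor id="config-start"/>First,
  open the settings dialog.</p>

<!-- Elsewhere in the same document -->
<p>Return to <ref target="config-start">the start of the configuration steps</ref>.</p>
```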
In addition to the attributes already discussed in section 4.4.1. Simple Cross References above, these elements share the following additional attribute, which is used to specify the target of the cross reference or link:
- contains any sequence of items organized as a list. Attributes include:
- describes the form of the list. This attribute can have the following values:
- describes how the labels should appear. The rend attribute can have the following values:
no-bullets (for producing unordered lists with no bullet points)
lower-alpha (for producing ordered lists with labels a, b, c, ...)
upper-alpha (for producing ordered lists with labels A, B, C, ...)
lower-roman (for producing ordered lists with labels i, ii, iii, ...)
upper-roman (for producing ordered lists with labels I, II, III, ...)
- contains one component of a list.
- contains the label associated with an item in a list; in glossaries, marks the term being defined.
Individual list items are tagged with <item>. The first <item> may optionally be preceded by a <head>, which gives a heading for the list. The numbering of a list may be omitted (if reconstructible), indicated using the n attribute on each item, or (rarely) tagged as content using the <label> element. To achieve the same result in different browsers, the value of n should be greater than 0.
An unordered list
An ordered list
An ordered list with controlled numbering
An ordered list with letters for labels
An ordered list with controlled lettering
A glossary list
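A sketch of several of these list types follows. It assumes the standard TEI type values ordered and gloss, and that a list with no type is rendered as unordered; check the values the OUCS schema accepts. The lower-alpha rend value and the use of n, <head> and <label> come from the descriptions above.

```xml
<!-- An unordered list with a heading (no type given) -->
<list>
  <head>Things you will need</head>
  <item>An XML editor</item>
  <item>A Subversion client</item>
  <item>A Subversion account</item>
</list>

<!-- An ordered list with controlled numbering: n set on each item -->
<list type="ordered">
  <item n="3">Write your document</item>
  <item n="4">Check it against the schema</item>
</list>

<!-- An ordered list labelled a, b, c ... -->
<list type="ordered" rend="lower-alpha">
  <item>HTML for web pages</item>
  <item>PDF for printing</item>
</list>

<!-- A glossary list: each <label> is a term, the next <item> its definition -->
<list type="gloss">
  <label>XML</label>
  <item>eXtensible Markup Language</item>
  <label>CSS</label>
  <item>Cascading Style Sheets</item>
</list>
```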
- contains text displayed in tabular form, in rows and columns.
- contains one row of a table. Attributes include:
- contains one cell of a table. Attributes include:
- indicates the kind of information held in the cell. This attribute should have the value label for labels or descriptive information, and data for actual data values. If omitted, it defaults to data.
- indicates the number of columns occupied by this cell. If omitted, it defaults to 1.
- indicates the number of rows occupied by this cell. If omitted, it defaults to 1.
Caution is advised when using tables as it is very easy to make them inaccessible to users of alternative technologies e.g. screen readers. It is your responsibility to make sure that any table used is comprehensible when it is linearised and that it contains suitable accessibility attributes.
Screen readers linearise tables when they are reading the content out to the user. This means that if you have not taken this into account when designing your table, the screen reader user will not understand its content. To check how your table will be read out, run your page containing the table through the online checker at http://wave.webaim.org/. It will show you how the table will be read to a screen reader user.
All tables should be given the summary attribute regardless of whether they are for data or page layout. For data tables a short summary of the table content must be added for accessibility. Where a table is used for layout, the summary attribute is included, but left empty.
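The sketch below puts these pieces together: a data table with a summary attribute, and the role attribute with the value label on the header row. Its content comes from the description of the OUCS system earlier in this document; a layout table would carry summary="" instead.

```xml
<!-- summary describes the table's content for screen reader users -->
<table summary="Output formats produced from OUCS XML documents, and the stylesheets that produce them">
  <!-- Header row: its cells are labels, not data -->
  <row role="label">
    <cell>Output format</cell>
    <cell>Used for</cell>
    <cell>Produced by</cell>
  </row>
  <row>
    <cell>HTML</cell>
    <cell>Web pages</cell>
    <cell>XSLT stylesheet</cell>
  </row>
  <row>
    <cell>PDF</cell>
    <cell>Printing</cell>
    <cell>XSLT stylesheet</cell>
  </row>
</table>
```

Read row by row, as a screen reader would, this gives "Output format, Used for, Produced by; HTML, Web pages, XSLT stylesheet; PDF, Printing, XSLT stylesheet", which still makes sense when linearised.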
If a <table> element has a
rend attribute with the value
the table will be rendered with the cells of the first column sorted
and with buttons on each column that enable the person viewing the page
to sort the table on another column.
There are two ways in which the use of tablesorter can be customised. You will also find the documentation for tablesorter useful.
has the following definition for the template
In the XSL for the micro site, you define a template that overrides this.
Not all the components of a document are necessarily textual. The most straightforward text will often contain diagrams or illustrations, to say nothing of documents in which image and text are inextricably intertwined, or electronic resources in which the two are complementary. This poses accessibility issues for users who cannot see the images. What are they? Are they important to the text, or just page decoration? Is the image a graph or a simple picture? Has the author provided extra information about the graphic for those who cannot see it? If you do not provide alternative text for graphics or other accessibility features in the page coding, the page will be inaccessible to some visitors.
- marks the spot at which a graphic is to be inserted in a document. Attributes include:
- The location and file name of a graphic.
- The width to which the graphic should be scaled. If omitted, it defaults to the width of the graphic.
- The height to which the graphic should be scaled. If omitted, it defaults to the height of the graphic.
- The extent to which the graphic should be scaled (e.g. 0.5). If omitted, it defaults to 1.
- contains a textual description of the appearance or content of a graphic, essential for accessible graphics.
Usually, a graphic will have at least an identifying title, which should be encoded using the <head> element. The text of an image's <head> is automatically converted to a figure caption, and captioned figures are numbered sequentially throughout the document. It is also essential to include a brief description of the image using <figDesc>. If the image is difficult to describe in just a few words, you should provide an alternative page where a full account of the image can be given to the user: this extra information should be provided via a [d] link. These are ordinary links to ordinary web pages. By convention, the [d] link should be provided next to the image in question; users needing greater detail about a given image will click on the [d] link for more information.
If the image is for decoration only (very rare on OUCS pages), it is still necessary to include the <figDesc> element in your document, but in this case it should be left blank. By convention the image is then considered just page decoration and unimportant to the reader.
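A sketch of both cases follows. The attribute holding the image location is not named above; the example assumes it is called url, as it is for newsfeeds, so check the schema. The file names, caption and description are hypothetical.

```xml
<!-- An informative image: <head> becomes the numbered caption,
     <figDesc> gives the short text alternative -->
<figure url="images/oucs-entrance.jpg">
  <head>The OUCS building</head>
  <figDesc>Photograph of the front entrance of the OUCS building.</figDesc>
</figure>

<!-- A purely decorative image: <figDesc> is present but left empty -->
<figure url="images/divider.png">
  <figDesc/>
</figure>
```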
If you want to control the way text flows around an image, use a rend value, as described in the Rends section.
A newsfeed can be displayed by putting a
inside a <p> element.
The url attribute has the URL of the newsfeed.
Our XSL can cope with newsfeeds written in RSS 2.0, RSS 1.0 and Atom 1.0.
Gotcha: the web page will not change when new items get added to the feed unless you arrange for your page not to be cached by AxKit. Please contact email@example.com to get this done.
Here the rend attribute has a component that starts with
This is followed by some notation (e.g.,
that indicates how you want the date formatted.
It uses the same notation as PHP's date function, with the addition of one character: _ means generate a space.
If you want some HTML elements to appear in the <head> element of the HTML that gets generated, you should put these elements between the <fileDesc> and the <revisionDesc> elements (that appear in the <teiHeader>).
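For example, to add a keywords meta element to the generated page, you might write something like the sketch below. The <meta> element and its values are illustrative assumptions; whether a given HTML element is accepted at this point depends on the OUCS schema.

```xml
<teiHeader>
  <fileDesc>
    <!-- title, author, etc. as usual -->
  </fileDesc>
  <!-- HTML elements placed here end up in the <head> of the generated HTML -->
  <meta name="keywords" content="email, configuration, OUCS"/>
  <revisionDesc>
    <!-- revision information -->
  </revisionDesc>
</teiHeader>
```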
It is possible to provide a form (in a TEI file) that collects some data from a user and sends that data to someone in an e-mail message. There are details about this in a document on FormMail.
Accessibility is paramount: our documentation must be accessible to all readers, and OUCS must stay on the correct side of the law. All OUCS authors need to familiarise themselves with the ways and means of making their documents as accessible as possible.
- Do not make links with the text ‘click here’; make links that mean something out of the context of the sentence they are in. Similarly, do not give the same title to lots of different links on a page when they actually point to different places.
- When using graphics, always provide the <figDesc> element. If necessary, go the extra step and make a [d] link for longer explanations of figures.
- When using tables, make sure they are comprehensible when they are linearised. Always include the summary attribute regardless of whether the table is for layout or data; for data tables it must give some details of the table's content.
- When you have finished making a web page, you can check its accessibility using online services such as those found in the Complete List of Web Accessibility Evaluation Tools (compiled by the Web Accessibility Initiative (WAI)).
- indicates the location at which a textual division generated automatically by a text-processing application is to appear. Attributes include:
When an index or table of contents is to be encoded (rather than one being generated) for some reason, the <list> element discussed in section 4.6. Lists should be used.
Rend values can be used to define how an element is rendered on the webpage, for example aligning items to the left or right of a page, allowing text to flow around images or stating that a bit of text should be in italics or red. Some of the more common rends available for use with the OUCS webpages are listed below.
If there is a particular style you need on your pages that is not currently available, please contact firstname.lastname@example.org for help.
- Rends for use with images: <figure rend="xxx">
- places a border around the image.
- places the image in the middle of the line. Text breaks and runs above and below.
- image appears with the text on either side. Spacing for the line is decided by image height (which means there is space above the text if the image is higher than the text row)
- floats image to the left in your running text. Text does not break for image but continues to the right of (and under) it.
- floats image to the right in your running text. Text does not break for image but continues to the left of (and under) it.
- adds space around an image. Text runs above and below the image and its surrounding space.
- image aligns to the left. Text breaks and continues to the right of image
- Rends for use with tables: <table rend="xxx">, <row rend="xxx">, <cell rend="xxx">
- used for cell or row to show they contain labels rather than data.
Note: <row rend="label"> makes the background light blue, whereas <row role="label"> makes the background grey and the text white and centred.
- to make text in a row centred in each cell
- used for cell to add background colour
- used for cell to add background colour
- used for cell to add background colour
- used for cell to add background colour
- Rends for use with lists: <list rend="xxx">
- for ordered list. List items numbered a, b, c etc
- for ordered list. List items numbered i, ii, iii etc
- for ordered list. List items numbered A, B, C etc
- for ordered list. List items numbered I, II, III, IV etc
- List items appear without bullets before them
- Rends for use with a block of text: <p rend="xxx"> or <div rend="xxx">
- Rends to use with highlighted text: <hi rend="xxx">
- Other rends
|"||"||double quotation mark|
Any other characters which are not on your keyboard can be entered either as numeric entities (see, e.g., http://www.tedmontgomery.com/tutorial/HTMLchrc.html) or directly using UTF-8. How you enter UTF-8 characters depends on your application or operating system. oXygen, for example, has a facility [Edit/Enter from Character Map] which lets you enter characters which are not on the keyboard.
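A short sketch of both approaches: numeric character references (decimal &#233; or hexadecimal &#xE9; for é), and the character typed directly in a UTF-8 encoded file. XML's predefined &amp; and &lt; are also shown, since a bare & or < is not allowed in text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- All three spellings of "café" below display identically -->
<p>caf&#233; (decimal reference), caf&#xE9; (hexadecimal reference)
  and café (typed directly as UTF-8).
  Write &amp; for an ampersand and &lt; for a less-than sign.</p>
```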