Oxford University

OUCS Annual Report

3. Network-based Services


Apart from the basic functions of the network described in Section 2 above, OUCS also provides a wide variety of key infrastructural network services. These are used by all sectors of the University, some consciously and others unnoticed in the background. These services include:

  1. Domain Name System (DNS) server
  2. Email relay server
  3. Network time server
  4. Dynamic Host Configuration Protocol (DHCP) service
  5. Web cache (discussed above)
  6. Web server
  7. Email POP and IMAP servers
  8. Network News server
  9. Mailing List server
  10. File backup and archive server (Hierarchical File Store, HFS)
  11. Windows Internet Name Service (WINS) server
  12. Novell Directory Services (NDS) tree server

Most are based on PC equipment running Linux, with a few using Sun equipment running Solaris. The choice is determined by the requirements of the application software, though the PC/Linux solution is preferred (for cost and supplier-independence reasons) where feasible. All of these services require, and generally exhibit, very high reliability.

In addition to the above computer-based services, other key network services provided more directly by staff include:

  1. Oxford Emergency Computer Response Team (OxCERT)
  2. Ethernet Maintenance service

Details of all these services can be found on the OUCS Web pages at http://www.oucs.ox.ac.uk/. Several of these services are discussed in more detail below.

3.1 Email Services

Email Relay

The Email Relay service handles all the University's incoming and outgoing email (with the exception of a couple of departments which still operate independently). It also handles all inter-system email within the University. It directs email to the appropriate email server, performs address checks and rewrites addresses to the standard form (where the relevant department has requested this), handles distribution of multiple-recipient email, "spools" email for non-responding recipient systems, etc. The number of messages handled during the year approached 100,000/day on average [figure 21], with the volume of traffic amounting to 1,400Mbyte/day on average [figure 22]. Both the number of messages and the size of each message continue to rise inexorably. A significant proportion of email cannot be delivered immediately, and must be stored until the receiving systems become willing to accept it [figure 23].

Figure 21 Figure 22 Figure 23
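
As a purely illustrative sketch (not the relay's actual implementation, and using hypothetical domain names), the following Python fragment shows the kind of canonicalisation involved when a host-specific address is rewritten to a department's standard form:

  # Illustrative sketch only: rewriting host-specific addresses to a
  # departmental standard form, as the Email Relay does for departments
  # that request it. The mapping and addresses below are hypothetical.
  CANONICAL_DOMAINS = {
      "mailhost.physics.ox.ac.uk": "physics.ox.ac.uk",
      "ermine.ox.ac.uk": "ox.ac.uk",
  }

  def rewrite_address(address):
      """Rewrite an address to its standard form if a mapping is known."""
      local_part, _, domain = address.partition("@")
      return f"{local_part}@{CANONICAL_DOMAINS.get(domain, domain)}"

  if __name__ == "__main__":
      print(rewrite_address("jane.smith@mailhost.physics.ox.ac.uk"))
      # prints: jane.smith@physics.ox.ac.uk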

Herald Email Server

Ermine and Sable continue to operate as POP and IMAP email servers in addition to providing general-purpose Unix services, which adds substantially to the number of processes they run (see below). In addition, OUCS started to offer a dedicated email server, Herald, in August 1998. This service is accessible via the IMAP, POP and Web protocols, and was launched with the pre-registration of 4,000 incoming undergraduates. The new software providing Web access had a few teething problems, which were ironed out within the first few weeks. The basic service (delivering and holding email and providing access to it via IMAP) has run without problems, with no scheduled or unscheduled outages for the entire year.

The WING Web service has likewise been reliable, apart from a few hour-long outages in its first few weeks of service and one or two similar incidents during the rest of the year. The WING outages were traced to a software problem in WING which has since been fixed.

By July 1999, there were 4,260 users of Herald. During term time, Herald handled approximately 100,000 incoming emails per week, 40,000 outgoing emails via the WING Web interface, 150,000 IMAP sessions (45,000 of which were WING sessions) and 60,000 POP sessions.
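
For illustration only, the sketch below (in Python, with a hypothetical hostname and account) shows how a mail client might check an inbox over IMAP, one of the access protocols Herald supports; it is not part of the service itself:

  # Illustrative sketch only: checking mail over IMAP. The hostname and
  # credentials are hypothetical placeholders, not the real service details.
  import imaplib

  def count_unseen(host, username, password):
      """Log in over IMAP and return the number of unseen messages."""
      with imaplib.IMAP4_SSL(host) as conn:
          conn.login(username, password)
          conn.select("INBOX", readonly=True)
          status, data = conn.search(None, "UNSEEN")
          return len(data[0].split()) if status == "OK" else 0

  # Example (would require a reachable server and a valid account):
  # count_unseen("imap.example.ox.ac.uk", "abcd0001", "secret")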

Mailing List Service

The Mailing List service was initially provided by Ermine but has been transferred to a dedicated PC running Linux. It uses the public domain Majordomo software and manages over 700 lists. Support and encouragement for the national Mailbase service (which in total handles 2,300 mailing lists) is also provided.
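
By way of illustration only (the list server and addresses shown are hypothetical placeholders), a user joins a Majordomo-managed list by emailing a command to the Majordomo address rather than to the list itself, along these lines:

  # Illustrative sketch only: subscribing to a Majordomo-managed list by
  # sending the "subscribe" command in the message body. Addresses and the
  # SMTP host are hypothetical placeholders.
  import smtplib
  from email.message import EmailMessage

  def subscribe(list_name, user_address,
                majordomo_address="majordomo@lists.example.ox.ac.uk",
                smtp_host="localhost"):
      msg = EmailMessage()
      msg["From"] = user_address
      msg["To"] = majordomo_address
      msg["Subject"] = "subscribe"
      msg.set_content(f"subscribe {list_name}")  # Majordomo reads commands from the body
      with smtplib.SMTP(smtp_host) as smtp:
          smtp.send_message(msg)

  # Example (would require a reachable SMTP server):
  # subscribe("example-list", "jane.smith@physics.ox.ac.uk")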

3.2 News Service

The News service is provided by the server news.ox.ac.uk. In August 1998 the server hardware was upgraded to a Sun Ultra 5 with 54GB of disk space. A new version of the INN server software was installed on this system. This proved to be rather buggy, and a stable version (INN 2.2) could not be installed until November.

The News feed is received from the JANET Usenet News service (http://www.ja.net/usenet/). The primary server is at Rutherford and the secondary one is at ULCC. This service has been extremely reliable. The amount of News received has stayed relatively constant, with an average of 166,000 messages being received per day, representing approximately 500Mbyte in volume [figure 24].

Figure 24

The average number of client connections to the server was 5,123 per day, but peaked at over 8,800 per day during term [figure 25], with the number of messages being read gradually declining over the years [figure 26]. At the end of July 1999 the server was carrying about 5,000 Newsgroups in 50 different hierarchies.

Figure 25 Figure 26
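
As an illustration of what these client connections involve (the newsgroup name below is a hypothetical example, and nntplib has been removed from recent Python releases, so this assumes an older interpreter or the PyPI backport), a newsreader talks NNTP to news.ox.ac.uk along these lines:

  # Illustrative sketch only: an NNTP client querying the News server for a
  # group's article count. The group name is a hypothetical example.
  import nntplib

  def article_count(server, group):
      with nntplib.NNTP(server) as conn:
          resp, count, first, last, name = conn.group(group)
          return count

  # Example (would require network access to the server):
  # article_count("news.ox.ac.uk", "ox.announce")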

This system continues to provide a News feed to other sites, with NAG and OUP being the main recipients.

3.3 Web Server

The Web server holds the top-level University Web pages, all of OUCS's 14,000 pages, and pages from many departments and units as well as those of many individuals. The number of Web page requests received per week varied from 1 million outside Term to 2 million during Term. It was acting as a virtual Web server on over 90 IP addresses, mostly on behalf of departments and colleges for addresses of the form "www.department.ox.ac.uk". Nearly 3,000 users are making use of their personal web space. The main documentation tree of this server holds over 400 MBytes of material, with users' material adding a further 4,850 MBytes and more-or-less official pages adding another 1,500 MBytes. Of the 400 MBytes of main documentation, the OUCS part amounts to 340 MBytes, made up of some 10,000 web pages [figure 27]. There has been no downtime at all for the server over the last twelve months, either scheduled or unscheduled.
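
The mapping involved in virtual hosting can be pictured with the following sketch (Python, purely illustrative; the host names and document roots are hypothetical, and the real server distinguished virtual hosts by IP address rather than in application code):

  # Illustrative sketch only: mapping virtual-host names of the form
  # "www.department.ox.ac.uk" onto separate document roots. Host names and
  # paths are hypothetical placeholders.
  from pathlib import Path

  VIRTUAL_HOSTS = {
      "www.physics.ox.ac.uk": Path("/web/physics"),
      "www.chemistry.ox.ac.uk": Path("/web/chemistry"),
      "www.oucs.ox.ac.uk": Path("/web/oucs"),
  }
  DEFAULT_ROOT = Path("/web/main")  # top-level University pages

  def resolve(host, url_path):
      """Map a requested host and URL path onto a file in that host's tree."""
      root = VIRTUAL_HOSTS.get(host.lower(), DEFAULT_ROOT)
      relative = url_path.lstrip("/") or "index.html"
      return root / relative

  if __name__ == "__main__":
      print(resolve("www.physics.ox.ac.uk", "/research/index.html"))
      # prints: /web/physics/research/index.html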


Oxford Web Activity (Figure 27)

The following approximate measures give some indication of the level of Web page activity at Oxford:

Number of requests for external Web pages/day: 3 million
Number of Web page requests/week to central server - min: 1 million
Number of Web page requests/week to central server - max: 2 million
Number of Oxford Web pages: 40,000
Number of OUCS Web pages: 14,000
Number of users with personal web pages: 3,000
Volume of disk space used for Web pages on central server: 6,750 MBytes
Number of domains served: 90


3.4 Backup and Archive File Server

The Hierarchical File Server (HFS) was acquired in 1995 to provide large-scale filestore services to the University community. The HFS runs software called ADSM which is from Tivoli, a wholly-owned subsidiary of IBM. The two main services provided by the HFS are (a) a site-wide Backup service and (b) a long-term data repository service for University assets.

The HFS computer systems currently comprise two IBM RS/6000 computers and an IBM 3494 Automated Tape Library (ATL) containing ten 3590E tape drives. The 3494 ATL holds 2,000 tape cartridges, giving an overall data storage capacity of about 40 Terabytes. The total disk capacity of the RS/6000s is about 600 Gbytes. This is used to hold the ADSM server databases, to hold client backup and archive data temporarily when it is newly deposited on the ADSM servers, and to provide permanent on-line storage for long-term projects accessed via FTP; the latter use a further transparent function of ADSM called HSM, which allows the data to migrate off-line onto tape behind the scenes.
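
The idea behind HSM can be sketched as follows (Python, purely illustrative: the thresholds, file sizes and in-memory "tape" flag are simplified stand-ins for what ADSM manages transparently):

  # Illustrative sketch only: a simplified hierarchical storage management
  # (HSM) policy. When disk usage passes a high-water mark, the least
  # recently used files are migrated to tape until usage falls below a
  # low-water mark. All values are invented for the example.
  from dataclasses import dataclass

  @dataclass
  class ManagedFile:
      name: str
      size_mb: int
      days_since_access: int
      on_tape: bool = False  # True once the data has migrated off-line

  def migrate(files, disk_capacity_mb, high_water=0.9, low_water=0.7):
      """Return the names of files migrated to bring disk usage back down."""
      migrated = []
      used = sum(f.size_mb for f in files if not f.on_tape)
      if used <= high_water * disk_capacity_mb:
          return migrated
      for f in sorted(files, key=lambda f: f.days_since_access, reverse=True):
          if used <= low_water * disk_capacity_mb:
              break
          if not f.on_tape:
              f.on_tape = True  # data now lives on tape; only a stub stays on disk
              used -= f.size_mb
              migrated.append(f.name)
      return migrated

  if __name__ == "__main__":
      demo = [ManagedFile("scan001.tif", 400, days_since_access=120),
              ManagedFile("scan002.tif", 350, days_since_access=3),
              ManagedFile("notes.txt", 1, days_since_access=40)]
      print(migrate(demo, disk_capacity_mb=800))
      # prints: ['scan001.tif']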

During 1999 several major hardware and software configuration changes were made in order to update the hardware to more modern technology as well as to improve system performance and scalability. The changes were particularly targeted at improving the site-wide backup service, which was overloaded.

The major hardware changes were:

  1. January 1999: replace most SSA disks with larger SSA disks on one of the RS6000 servers (DSM) providing the site-wide backup service. Disk capacity on DSM was then about 450GB.
  2. January 1999: add a sixth frame to the 3494 ATL and two 3590B tape drives, bringing the slot capacity up to 2,000 and the number of drives to ten.
  3. July 1999: replace the RS6000/R40 (DSM) with an RS6000/H70, which is approximately three times faster.
  4. July/August 1999: replace the ten 3590B tape drives with 3590Es, each with an I/O rate of 15MB/sec and able to hold twice as much data per cartridge as the B drives (20GB uncompressed).

The major software reconfigurations were:

  1. January 1999: split the ADSM services across two servers, providing independent ADSM servers for the two areas:
    1. site-wide backup service (on DSM);
    2. long-term repository services (ADSM Archive and FTP/HSM) on HFS, the existing R40.
  2. July 1999: reorganise/tune the ADSM Databases.

All of these changes were complex, but they were carried out at relatively quiet times with minimal loss of service to users - only about five working days in total. In addition, smaller upgrades have taken place occasionally in order to keep up to date with AIX and ADSM server software fixes and enhancements; these have caused very little downtime. Very little time has been lost to either hardware or software failures during the year.

As of August 1999, the site-wide backup service had 2,195 registered backup clients with data backed up in ADSM [figure 28]. In all, more than 94 million files, amounting to around 8TB of data, had been amassed for those clients [figure 29 & figure 30].

Figure 28 Figure 29 Figure 30

In a typical week around 3 million new or changed files are backed up, representing about 500GB of data [figure 31]. The R40 has run at 100% CPU capacity during the overnight backup window for most of the year, so it is anticipated that with the faster H70 in service more data will be backed up.

Figure 31

The H70 is able to exploit the faster tape drives fully, so the regular ADSM server housekeeping tasks (making multiple copies of data from disk to tape, tape reclamation and ADSM server database backups) all complete much more quickly.

In addition, major software performance improvements were seen on the ADSM server following the installation of Version 3 in January 1998.

The site-wide backup service offers a highly reliable and available facility. The performance aspect to be addressed next is that of data restoration times, particularly for large filesystems. It is likely that software enhancements during 1999/2000 will offer relief in this area, and OUCS will be encouraging existing users to update to the latest client code levels in order to take advantage of these and other facilities.

The ADSM database stands at almost 50GB (following the tuning exercise); it holds an entry for every client file stored on the server. It remains the single limiting factor to ADSM server scalability and it is likely that at some time during 1999/2000 it will be necessary to add a second ADSM server for backup, should the workload increase at expected rates.

One of the most difficult aspects of the service to manage is clients that do not back up regularly, and detecting when clients no longer require the service, for whatever reason. It is likely that a change in policy will be recommended during 1999/2000 to facilitate the management and removal of such dead clients, thereby releasing valuable resources for new and growing systems that do need active service.

Splitting the long-term repository onto its own ADSM server has relieved some performance bottlenecks, but it is clear that further enhancements to both hardware and software are needed in order to offer a high quality of service for archiving. This will be a focus area for 1999/2000, resources permitting. The amount of data being saved by the archive projects is steady at between 5 and 10GB each day. So far about 4TB of data has been uploaded to the HFS [figure 32]. In 1995 the average image size was between 1MB and 10MB; 100MB is not uncommon now.


HFS Archive Projects (Figure 32)

Department - Project (size in GBytes):

Bodleian and College Libraries - Celtic manuscript images: 2900.0
Bodleian Library - Internet Library of Early Journals: 122.0
Centre for Human Genetics - automatic capture of genetic image data: 92.0
Ashmolean Museum - images of artifacts: 80.0
Centre for Study of Ancient Documents - paper impressions of Greek & Roman inscriptions: 74.0
Diabetes - Human genetics research project data: 40.0
Humanities Computing Unit - Oxford Text Archive: 40.0
Engineering - colloidal plasma video data sequences: 15.5
Anaesthetics - images of fragile 35mm slides for use in teaching: 15.0
Music - Mediaeval Music Manuscripts digital archive (with Royal Holloway): 6.0
Bodleian Library - catalogue data: 5.5
Materials - sound data from surface analysis: 5.0
Humanities Computing Unit - images of Wilfred Owen letters & manuscripts: 4.3
Archaeology Research Institute: 1.1
Materials - moving image data from scanning tunnelling microscope: 1.0
Centre for Study of Ancient Documents - Greek inscriptions and Romano-British writing tablets: 0.3
Paediatrics - DNA sequences: 0.2


Operationally, we have managed to prevent most of the performance difficulties and system upgrades described above from unduly affecting system performance and availability [figure 33].


HFS Availability (Figure 33)

4 Weeks      -------- Availability (%) --------   Time Lost   # breaks   MTBI      MTTR
Ending       IBM      Overall    Prime shift      (hours)                (hours)   (hours)

09-Aug-98    100.0    100.0      100.0             0.0        0          -         -
06-Sep-98    100.0    100.0      100.0             0.0        0          -         -
04-Oct-98     98.5     98.2       93.9            12.0        3          220       4.0
01-Nov-98     99.9     97.6       99.9             0.2        1          336       0.1
29-Nov-98    100.0     99.6       98.9             2.7        1          669       2.7
27-Dec-98    100.0    100.0      100.0             0.0        0          -         -
24-Jan-99    100.0     94.9       90.0            34.5        1          638       34.5
21-Feb-99     99.5     99.5       98.0             3.1        5          134       0.6
21-Mar-99    100.0    100.0      100.0             0.0        0          -         -
18-Apr-99     99.9     99.9       99.7             0.4        1          672       0.4
16-May-99    100.0    100.0      100.0             0.0        0          -         -
13-Jun-99    100.0    100.0      100.0             0.0        0          -         -
11-Jul-99    100.0     99.9       99.8             0.7        1          671       0.7

Total         99.8     99.2       98.5            53.6       13          668       4.1


Oxford remains one of the leading ADSM sites in the world and has an excellent relationship with the software developers. Oxford will host an ADSM Symposium in September 1999 with an expected attendance of 170 worldwide users.

3.5 OxCERT

The Oxford University Computer Emergency Response Team (OxCERT) has responsibility for protecting the security of University systems and networks from both internal and external attack; for liaising with other organisations where necessary and appropriate; for raising awareness of security and privacy related issues within the University; for testing the security of systems on request and advising their administrators; and for providing security related tools and documentation.

The team has four front-line staff handling incidents as they occur, and a notional backup team of another eight (the latter almost never contribute to the emergency role). These are drawn from a number of University departments and Colleges. Everyone working for OxCERT does so on a part-time basis. OxCERT itself provides cover on a best-efforts basis and, in particular, is not able to provide a 24-hour, 7-day response.

Over the 1998-1999 year, OxCERT dealt with 109 incidents, a continuing increase on the previous year's 85. Many of these were "port scans", sniffed passwords and FTP or Web abuse; very few were serious incidents.

The classification of incidents had to be changed so that port scans are no longer counted as security incidents, since they average 4-5 per day. With scans arriving at such a high rate it is not practical to devote time to complaining to the remote sites; instead they are logged and examined for trends and changes in pattern, with a view to pre-empting new exploit attacks.

Under this new classification there were 45 real incidents, the majority of which related to system compromises and/or password sniffers, causing damage, often with loss of data or the need for system re-installation. Network denial-of-service (DoS) attacks were infrequent and of nuisance value only. Some incidents involved multiple hosts (sometimes as many as 40).

There is a very strong correlation between system abuse and the use of IRC (Internet Relay Chat) and some network games, either to abuse remote servers, or as a form of attack against participants.

During the early part of the 1998-9 period the major target for host attacks was Linux, with Sun systems abused less than in previous years. The latter part (spring and summer) saw a major shift to the abuse of Sun systems, with very large numbers of systems compromised by attacks against RPC services. These compromises were again initially used to run password sniffers and IRC abuse tools, with relatively minor network flooding tools. Analysis of previous incidents and logs indicates that Oxford systems were being used as test beds for these exploits and flooding tools as far back as January-February 1999 (with one particular system compromised up to four times).

Fixes for these system bugs were often already available, indicating poor administrator awareness of security issues despite repeated advisories sent to mailing lists and newsgroups. Some of the bugs could not be patched until some time after their release into the wild, which left administrators with difficult choices affecting usability for their user base.

July and August saw very destructive exploits and major network flooding, which had a very serious impact on the operation of the University's IT facilities. These floods were co-ordinated with others taking place at remote sites all over the Internet, completely shutting down the target sites. Oxford was not itself a direct target of these attacks - the use of 20 or so machines to flood remote victims was more than sufficient to saturate our backbone and JANET feeds.

Some of the exploited systems were seriously damaged while others were very easily recovered, indicating more than one group of abusers at work.

Much of the effort put in by the four-strong OxCERT front-line team has been directed at raising awareness and education, so as to minimise the impact such attacks may have on systems. With a wide-open network there are always systems that can be abused, and many of the services running on systems will continue to be targets of attack as new exploits emerge (and sometimes old ones come round again as vendors reintroduce the flaws).

The goal of OxCERT has shifted somewhat: it now aims to minimise the potential for Oxford systems to be used to abuse remote sites, and thereby to minimise the number of compromised systems on the networks. Other UK universities have experienced much greater disruption to their systems, and some are often not even aware that they are being abused.

OxCERT was accepted into full membership of FIRST (the Forum of Incident Response and Security Teams), and this direct communication channel to major vendor and national response teams has been of great benefit, especially when new patterns of attack emerge and need secure analysis and discussion.

OxCERT members have contributed to the University and wider communities by presenting seminars, writing news articles and the like. Presentations have been made to each of the University's IT Support Staff conferences. Team members attended the 1998 FIRST conference in Mexico and the 1999 conference in Brisbane. One of the team also serves on the Steering Committee for FIRST until June 2000.

Last updated: 29-May-00.
© Oxford University Computing Services.