Title of Service: HFS Backup & Archive Services
Status of Document: This document describes the HFS Backup and Archive services. Access to HFS services is restricted to computers which are either physically connected to the Oxford Network or connect to it via the OUCS VPN service.
The Hierarchical File Server (HFS) is a centrally funded service providing backup and long-term archive services to Senior Members, Postgraduates and Staff. Users register online, pick up, install and perform simple configuration of the software on the client computer and then proceed to either back up or archive data to be safeguarded on the HFS.
The HFS uses IBM Tivoli Storage Manager (TSM) client-server software to manage the backup and/or archive data. Data is sent from the client computer, across the University (backbone) network to the HFS server and ultimately is stored on magnetic tape in an automated (robotic) tape library. The HFS servers and tape library are situated in a climate-controlled, secure location. Three copies of the data are made, each to separate tapes: one copy is held in the automated tape library; the second, in a fire-proof safe located at OUCS; and the third in a fire-proof safe at an off-site storage facility outside Oxford. Access to the data is private to the owner and is normally available 24 hours a day, seven days a week.
In 2010 the HFS received funding from PRAC/BSC to upgrade some of its system infrastructure. In a major project, the 1TB of TSM databases that manage 3PB of user data and c. 1.6 billion files were migrated from a proprietary database to an industry-standard database platform (DB2). At the same time, the host systems were moved to a fully virtualised environment to allow better resource utilisation and improved maintainability, serviceability and performance.
The HFS Backup Service typically sees an overall doubling in demand every 18-24 months.
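As a rough illustration of what a doubling every 18-24 months implies, the sketch below projects demand forward under simple exponential growth. The function name, the three-year horizon and the use of the 3PB figure as a starting point are illustrative assumptions, not HFS planning figures.

```python
def projected_demand(current: float, months_ahead: float, doubling_months: float) -> float:
    """Project demand, assuming it doubles every `doubling_months` months."""
    return current * 2 ** (months_ahead / doubling_months)


# Hypothetical starting point: the 3PB of user data mentioned above.
for doubling in (18, 24):
    estimate = projected_demand(3.0, 36, doubling)
    print(f"Doubling every {doubling} months: about {estimate:.1f}PB after 3 years")
```

Under these assumptions the two ends of the quoted range give noticeably different three-year estimates, which is why the doubling period matters for capacity planning.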
2. Summary of OUCS’s responsibilities:
2.0 The service operates at all times.
2.1 The HFS runs with minimal operational cover from 08:30 to 20:30 on weekdays.
2.2 HFS Systems staff cover is normally available from 09:00 to 17:00 on weekdays. Outside these hours there are no formal arrangements either for periodic monitoring of the HFS or for staff to be called, and no funding is provided to make this contractual.
2.3 If a fault is notified between 09:00 and 17:00 on a working day, OUCS will commence investigation and correction within one hour (provided that no similar fault is also being handled by the same team).
2.4 If a major fault is notified outside these hours, OUCS will use reasonable endeavours to attend to the fault, but no funding is allocated to this purpose.
2.5 It is intended, as far as is possible, to maintain service of all system components at all times. There are no formal serviceability targets.
2.6 OUCS endeavours to carry out any work required in advertised slots, and with minimum disruption to service.
2.7 The HFS is resilient to failure of several components. Most hardware failures affect a single component: a single disk, tape drive, adaptor, cable or controller might fail and none would result in a loss of service. Additionally, each data storage element is dual-pathed with automatic failover, so an entire path might fail and all hardware resources would remain available. Were several hardware components to fail at the same time, this would probably result in a degradation of service, the severity of which would depend on the precise mix of elements needing repair. A fault that rendered an entire SAN switch or SAN disk server inactive would also result in a partially degraded service. A fault on the automated tape library could have significant impact: there would be no access to client data (i.e. restore or retrieve) until it was fixed. Extended outage of the tape library (e.g. for more than 6 hours) would result in a total loss of service.
2.8 No alternative exists to any of these facilities within Oxford. 'Cloud' backup and storage solutions exist (e.g. Amazon's S3) but the location of the storage facilities outside the UK poses problems for University-owned data.
Hardware and Software Maintenance
2.9 OUCS has a contract with IBM for all HFS hardware maintenance. Pivotal items in the SAN, such as the tape library, host and disk servers, where failure would or could cause major degradation of service, are covered by 7x24 service call reporting. Other items, such as switches and tape drives, where failure would not lead to a major degradation of service, are covered by 5x11 service call reporting.
In practice the above means that for a major hardware failure of the tape library, the SAN disk servers or the servers hosting TSM, an engineer would normally attend within 1-4 hours. Parts are not guaranteed to be available in any particular period (some parts are available on the same day, some are sourced from Europe and arrive next day). Problems with SAN switches, tape drives and single disk drives would normally be fixed within 1-2 days.
2.10 Software updates to the IBM equipment are supplied by IBM, and are installed by the OUCS HFS team. Except in the case of a major system-breaking fault, or the risk of a major breach of security, we will aim to give one week's notice of any down-time on any part of the HFS.
2.11 There is no scheduled development time.
2.12 Information for Departments and Colleges wishing to register new clients is given at http://www.oucs.ox.ac.uk/hfs/registration/. Registration for backup service is available online. Requests for archive projects are initiated by completing a web application form which is then processed offline.
2.13 Notification of major faults, outages, TSM server upgrades or downtime is circulated on the mailing list itss-announce@maillist.ox.ac.uk and on the OUCS Status web page http://status.oucs.ox.ac.uk/.
2.14 Problems should be reported to the OUCS Help Centre. End-user support is provided via the Help Centre with specialist follow-up diagnostics and advice provided by the HFS team.
2.15 The HFS web pages at http://www.oucs.ox.ac.uk/hfs provide comprehensive advice and guidance on installing, configuring and using the TSM client, as well as an FAQ.
3. Summary of client’s responsibilities
3.2 Long-term Project Data Archive
Additionally, we may insist on local measures being implemented so that the backup or archival of that data does not consume a large amount of HFS system resources. These local measures may include, but are not limited to: excluding files, re-partitioning into smaller filesystem/volume/drive partitions, or pre-processing files before backup or archive.
Currently the following services are provided free:
4.2 Departmental/College Server Backup.
4.3 Departmental/College Large Server Backup.
4.4 Archive data up to 4TB for 5 years. A chargeable service is available for larger amounts of data and for requests to extend the archive period beyond the initial 5 years.
OUCS operates a chargeable service for use of the HFS Archive by a project that exceeds 4TB of storage. The HFS has two cost models, depending on whether the available funds are internal or external (e.g. grant-funded). For both models the unit cost is per Terabyte (TB) over five years and both models take into account the additional staff, media and maintenance costs incurred in managing large amounts of data. The model that applies to external funding also includes additional FEC elements (indirect and estates). Staff costs have been calculated as 'directly-allocated' for both models (that is, staff time for the management and support of 4TB data is averaged rather than audited).
The total price per TB for internal use of the HFS Archive service (above the initial 4TB) for 2009/10 is £2,749 per TB per five year period (equating to £550 per TB per year).
The total FEC per TB per five year period for 2009/10 is £4,209 (equating to £842 per TB per year). However, we would expect to receive only 80% of this from, for example, a research council. Therefore, the price charged to Research Council (RC) funded projects would be £3,370 per TB per five year period (equating to £674 per TB per year).
For non-RC-funded projects the starting price would be 100% FEC unless negotiated otherwise.
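The 2009/10 prices above can be turned into a simple estimate of a five-year archive charge. The sketch below is illustrative only: the function and dictionary names are hypothetical, and it assumes that only data above the free 4TB allowance is charged under every funding model and that the charge scales linearly with no pro-rata or negotiated adjustments. The Research Council figure uses the quoted £3,370, which is 80% of £4,209 (£3,367.20) rounded to the nearest £10.

```python
FREE_ALLOWANCE_TB = 4  # archive data up to 4TB is free for 5 years

# Quoted 2009/10 prices in GBP per TB per five-year period.
PRICE_PER_TB_5YR = {
    "internal": 2749,
    "research_council": 3370,  # 80% of full FEC, as rounded in the text
    "full_fec": 4209,          # starting price for non-RC external projects
}


def archive_charge(total_tb: float, funding: str) -> float:
    """Estimate the five-year charge in GBP for an archive of `total_tb` TB.

    Assumption: only the amount above the free allowance is chargeable.
    """
    chargeable_tb = max(0.0, total_tb - FREE_ALLOWANCE_TB)
    return chargeable_tb * PRICE_PER_TB_5YR[funding]


# Example: a hypothetical 10TB project has 6TB chargeable.
for model in PRICE_PER_TB_5YR:
    print(f"{model}: GBP {archive_charge(10, model):,.0f} over five years")
```

Dividing any result by five gives the approximate annual equivalent, matching the per-year figures quoted above.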