1. Introduction

1.1. Definition

The Core User Directory (CUD) is a central directory that stores details of people associated with the University of Oxford. The directory holds information about students, researchers, tutors, staff and alumni. CUD consolidates records from a number of data sources. Every CUD record has multiple attributes, such as first name and surname. The more attributes a record stores, the easier it is to distinguish that record as unique.

CUD could also support the storage of details about prospective students and about users who have registered to receive information about services when it becomes available.

For every data record entered, a CUD unique identifier (CUD ID) is assigned after the data is matched, consolidated and reconciled. As a result of these processes, a single CUD record (with many attributes) is created for each person associated with the University. Because each person is referenced by a single CUD ID, duplication of information across the contributing systems is reduced.

2. Context

The CUD service extends the suite of Identity and Access Management services (IAM) offered by OUCS.

CUD focuses on establishing a reliable source of user identity information and complements the existing IAM service suite. Other identity and access management processes and functions such as account provisioning, authentication, privilege management and authorization, are beyond the scope of CUD.

CUD provides data controllers with an easy-to-use source of information about users.

There is a growing need in the University to implement Identity and Access Management processes. Most identity management solutions require access to one or more sources of comprehensive, authoritative user data. Each user must be assigned a unique digital identity with an associated unique identifier. For records stored in multiple sources, the unique identifier must be global in scope. Such an identifier did not previously exist. CUD provides the matched and consolidated user data along with a globally unique identifier (CUD ID). This supports identity management and is a significant precursor to achieving a full IAM solution for the University.

CUD enhances efficiency and accuracy by establishing reliable cross-references for data related to the same person across multiple systems, as well as supporting identity management. This will facilitate strategic sharing of attributes such as name and address, reducing duplicate data and associated processes and improving consistency. As not every system stores an ID, CUD provides a Foreign Key to act as a reference between the Primary Data System (PDS) and CUD records. For more information, see Foreign Key Referral Service.

3. Users

CUD Service Users can contribute to or consume data from key or Primary Data Systems (PDS) such as the Student Records System, the University HR System and the Alumni Relations System. Users are classified as follows with respect to their association with a PDS.

3.1. Data Contributors

Data Controllers or Managers contribute data to PDS. By using CUD, they can benefit in the following ways:

  • Attribute Release Policies enable Data Controllers to determine which users may access or view attributes
  • CUD ID provides a means to match and reconcile data with records stored in other Primary Data Systems
  • Data Controllers may choose to function as CUD data consumers by performing queries against CUD, which offers them the same set of benefits.

3.2. Data Consumers

Data consumers query CUD and obtain a result, or set of results, about the user records stored in it. By using CUD, they can benefit in the following ways:

  • Get access to authoritative data using a single source
  • Data Controllers may configure the type, frequency and result format of queries made to CUD
  • Data is verified to ensure that the data format is as expected, preventing unforeseen results for Data Controllers and their systems
  • Data Controllers can determine data provenance using the meta data returned with queries
  • Attributes that are not stored in CUD can be requested using the Foreign Key

All data consumers and data controllers are required to sign a Service Level Agreement. This is to ensure that CUD reflects the same or more restricted release policies of all parties. For example, to be able to use the University Card photo, a data owner is required to agree to a particular clause. Hence, CUD uses the same statement as the Card Office, including it in the Terms of Usage for Data Consumers.

4. Terminology / Glossary

Glossary: http://www.oucs.ox.ac.uk/services/iam/cud/cud-glossary-of-terms.xml

The CUD glossary defines terms specific to this project. CUD complies with the terms used in the glossary of the JISC Identity and Access Management Toolkit (http://www.jisc.ac.uk/media/documents/programmes/aim/IdMToolkit.pdf). While not in scope for CUD, the toolkit offers a good overview and discussion of the processes used in the implementation of IAM within an academic environment.

5. Services and Interfaces

5.1. Services

CUD gathers records from several primary data systems and matches them internally to identify records from different systems that correspond to the same person. It makes this consolidated data available to the systems and users that can query and extract information from CUD.

A typical user or system sends a query to CUD, specifying selection criteria and listing any desired attributes (which may have originated from more than one primary data system). CUD returns a single result for each record according to the selection criteria specified in the query. The result contains attributes and metadata describing the provenance and status of attributes.
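The query flow described above can be pictured with a small sketch. It assumes a result is a list of attribute values, each carrying its own provenance metadata. The class names, field names, status values and source labels are all hypothetical and do not describe the actual CUD data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class AttributeValue:
    """One attribute value plus metadata describing where it came from."""
    name: str
    value: str
    source: str     # originating primary data system (hypothetical label)
    status: str     # hypothetical status flag, e.g. "current"
    entered: date   # date the value was entered


@dataclass
class QueryResult:
    """A single result for one person record."""
    cud_id: str
    attributes: List[AttributeValue] = field(default_factory=list)


def run_query(records, criteria, wanted):
    """Return one result per record that satisfies every selection criterion."""
    results = []
    for rec in records:
        satisfied = all(
            any(a.name == key and a.value == val for a in rec.attributes)
            for key, val in criteria.items()
        )
        if satisfied:
            # Keep only the attributes the caller asked for.
            kept = [a for a in rec.attributes if a.name in wanted]
            results.append(QueryResult(rec.cud_id, kept))
    return results
```

Each returned attribute still carries its source and date, so a consumer can see which primary data system supplied the value.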

As a result of matching and consolidation of data, CUD provides the following key services:

5.1.1. Data Consolidation and Reconciliation

CUD gathers information about people from CUD-registered systems (primary data systems), identifies matching records from different primary data sources and highlights inconsistencies in attribute values. This service provides the following capabilities:

  • Data Matching as a Service matches and consolidates records between sources, resulting in cleaner source data without duplicate records. Where duplicate records exist, CUD provides information about multiple matches that enables records to be merged or de-merged. CUD provides consolidated data about a person, even from multiple queries to multiple systems, making it easier to get the desired results.
  • Data Consolidation as a Service sorts and consolidates data from multiple sources. Given the number of data sources within the University that contain data for a person, there is a need to provide such information in a single place. This will allow you to make a single query, rather than queries to multiple data sources.

5.1.2. Data Matching as a Service

A globally unique identifier is required to achieve identity and access management, reporting and auditing. It must be possible to reference information about a person from more than one data source and be assured that it is the same person.

Data matching is the process of matching records from multiple sources. A match may result when some or all attributes of a record in a primary data system (PDS) are found to be identical to those of a record in another PDS.

Once person records are uniquely matched, a globally unique identifier can be assigned with confidence.

The process of matching occurs in the following scenarios:

  • Data from a new PDS is available for CUD provisioning
  • Data is entered into existing PDS
  • Data is changed significantly within an existing PDS

Dynamic data matching also happens when new or significantly changed data is available to CUD.

Full data sets can be compared against CUD data, where the matching process compares every PDS record against every record stored in CUD. This can result in multiple matches indicating duplicate records within the PDS.
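As a rough illustration of a full data set comparison, the sketch below compares every PDS record with every CUD record and reports any CUD record matched by more than one PDS record. The comparison key (surname, first name and date of birth) and the dictionary field names are assumptions made for this example only.

```python
from collections import defaultdict


def normalise(record):
    """Build a comparison key; the choice of fields is an assumption."""
    return (
        record["surname"].strip().lower(),
        record["first_name"].strip().lower(),
        record["dob"],
    )


def compare_full_set(pds_records, cud_records):
    """Compare every PDS record against every CUD record.

    Returns (matches, duplicates). matches maps each PDS record id to the
    CUD IDs it matched. duplicates maps a CUD ID to the PDS record ids that
    all matched it, which suggests duplicate records within the PDS.
    """
    matches = defaultdict(list)
    hits_per_cud = defaultdict(list)
    for pds in pds_records:
        for cud in cud_records:
            if normalise(pds) == normalise(cud):
                matches[pds["id"]].append(cud["cud_id"])
                hits_per_cud[cud["cud_id"]].append(pds["id"])
    duplicates = {cid: ids for cid, ids in hits_per_cud.items() if len(ids) > 1}
    return dict(matches), duplicates
```

The nested loop mirrors the every-record-against-every-record comparison in the text. A production system would normally index the records rather than scan all pairs.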

Matching Strategies

CUD implements various matching strategies applying different test conditions for the records to be matched. The strategies result in matches generated with varying levels of confidence. Matches with a high measure of confidence (exact matches) are accepted without further processing.

Other high confidence matches are made where one or more unique attributes match between systems, for example where the email addresses for entities in more than one system are the same.

One of the low confidence matching strategies considers unclear or fuzzy match results. In such a case, CUD tests for character similarities between two attribute values and deliberately allows for typographical errors. For example, Ann Smith may be treated the same as Anne Smith.

Low confidence matches require Data Controllers to confirm them manually.
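The strategies above might be sketched as follows. This is not CUD's actual algorithm: it uses the character similarity ratio from Python's standard difflib module as a stand-in for the fuzzy test, and the 0.9 threshold, the record layout and the choice of email as the unique attribute are illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_confidence(rec_a, rec_b, fuzzy_threshold=0.9):
    """Return "high", "low" or None for a pair of records.

    The threshold and the attributes compared are assumptions.
    """
    email_a = rec_a.get("email", "").lower()
    email_b = rec_b.get("email", "").lower()
    if email_a and email_a == email_b:
        return "high"   # a unique attribute matches: accept automatically
    if similarity(rec_a["name"], rec_b["name"]) >= fuzzy_threshold:
        return "low"    # fuzzy match: needs manual confirmation
    return None
```

With these settings, "Ann Smith" and "Anne Smith" score above the threshold and are reported as a low confidence match for a Data Controller to confirm.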

Guidance for matching

A key function of CUD is to match data received from different systems. The following types of matches are defined:

  • Definite one-to-one match - This is triggered automatically
  • Possible one-to-one match - This requires human confirmation
  • Possible one-to-many match - This requires human confirmation
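One way to picture the three match types is a small classifier over scored candidates. The score scale and both thresholds are assumptions chosen for illustration, and the "no match" outcome is added only so that the function always returns a value.

```python
def classify_match(candidates, auto_threshold=0.95, possible_threshold=0.75):
    """Classify the candidate CUD records found for one incoming record.

    candidates is a list of (cud_id, score) pairs with score in [0, 1].
    Both thresholds are illustrative assumptions.
    """
    plausible = [(cid, s) for cid, s in candidates if s >= possible_threshold]
    if not plausible:
        return "no match"
    if len(plausible) > 1:
        return "possible one-to-many"    # requires human confirmation
    if plausible[0][1] >= auto_threshold:
        return "definite one-to-one"     # triggered automatically
    return "possible one-to-one"         # requires human confirmation
```

Only the definite one-to-one outcome would be applied without review. The other two outcomes would be queued for an authorised user.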

The UI provides a means for authorised users to:

  • View all matches that are made automatically
  • View and confirm or reject possible matches
  • Reject existing matches
  • Manually make a match where no possible match has been found by the system

The CUD Attribute Set

Although it would be technically possible to represent every data item from every data source in CUD, this is not the function of CUD; it is the responsibility of a data warehouse. CUD provides a set of commonly used, defined and agreed identity attributes. There are use cases for similar systems that would enable equivalent functions and services to be applied to other data, but these are out of scope for CUD. The full attribute set is available at


5.1.3. Data Reconciliation as a Service

Each person with a relationship to the University will exist in one or more data sources, though possibly not in any primary data source.

However, for all University card holders, records exist within the Card database and the OUCS registration database as a minimum requirement. Common attributes in these systems may differ as a result of error or because the user requested a change. For example, a surname may have been changed in one system but not in the others.

The Data Reconciliation Service reconciles the common values or attributes of a CUD record that exists in more than one PDS. Reconciling the values of common attributes is possible when the Data Controllers have a shared agreement about those attributes. The agreement is decided at the University level through a common understanding among the data owners and the governance board.

CUD stores all the values of an attribute along with metadata to identify the originating source and a date stamp stating when each value was entered into CUD. Thus, historical data and its source are preserved. The length of time for which historical values of each attribute are kept is decided by a governance board. If required, Data Controllers can request a notification from CUD that alerts them to divergent values for the attributes specified in the notification request. When a CUD data consumer makes a query, CUD provides all the values that are stored for each attribute and the system in which they are stored. Data Controllers may then manually choose how to use one or more of the reported values based on their own precedence rules or use cases.
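A minimal sketch of this storage model follows, assuming each stored value is a tuple of value, source and date stamp. The class and method names are hypothetical, and treating attributes as divergent when the latest values from different sources disagree is one possible reading of the text.

```python
from collections import defaultdict
from datetime import date


class AttributeHistory:
    """Keeps every value of each attribute with its source and date stamp."""

    def __init__(self):
        # attribute name -> list of (value, source, entered) tuples
        self._values = defaultdict(list)

    def record(self, attribute, value, source, entered):
        self._values[attribute].append((value, source, entered))

    def values(self, attribute):
        """All stored values, as a consumer query would receive them."""
        return list(self._values.get(attribute, []))

    def divergent(self, attribute):
        """True when the latest values from different sources disagree."""
        latest = {}
        history = sorted(self._values.get(attribute, []), key=lambda v: v[2])
        for value, source, _entered in history:
            latest[source] = value   # later entries overwrite earlier ones
        return len(set(latest.values())) > 1
```

A notification request could then be modelled as a periodic call to divergent() for each attribute a Data Controller has asked to be alerted about.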

Data Presentation

After the data is rationalized, CUD makes it accessible through a suitable interface, such that data controllers can configure personalized queries. This service includes the following details.

CUD Unique Identifier

The CUD ID is a unique identifier for each person that remains the same across all primary data systems. Thus, it acts as a suitable "global unique identifier" within the context of the University’s IT systems.

This allows all data providers to have a common reference for every person record they hold. Also, it functions as the shared, persistent, unique identifier for all CUD data consumers.

It also allows data managers to check whether a new record in their systems already exists in other systems. If it does not, a new CUD ID is assigned to that user so that it becomes available for matching. This enables data managers to avoid duplicating records. For example, if a person returns to the University after their record has been deleted from one data source, and there is a match against a historical record in another source, the old identity of the person can be retained in that source, rather than creating another record for the same person.
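The lookup-or-assign behaviour can be sketched as below. The registry class, the matching key and the use of a random UUID as the identifier are all assumptions, since this document does not specify the CUD ID format.

```python
import uuid


class CudRegistry:
    """Maps a matching key to a persistent identifier.

    Entries are never deleted, so a returning person regains the old ID.
    """

    def __init__(self):
        self._ids = {}   # matching key -> identifier

    def resolve(self, key):
        """Return (cud_id, created) for the given matching key."""
        if key in self._ids:
            return self._ids[key], False
        cud_id = uuid.uuid4().hex   # placeholder format, not the real CUD ID
        self._ids[key] = cud_id
        return cud_id, True
```

Calling resolve() twice with the same key returns the same identifier and reports that no new record was created the second time.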

Foreign Key Referral Service

Storing the CUD ID in the PDS is optional and provides a means to refer to the corresponding matching record in CUD. As not every PDS stores an ID, CUD provides a reference between the PDS and CUD records by making a Foreign Key available. This is generated by each PDS as an attribute of every corresponding CUD record.

The PDS also stores the Foreign Key and therefore connects with the corresponding CUD record.

The Foreign Key, provided by the PDS to CUD, must uniquely identify both the PDS and associated records. It can be a composite of multiple attributes where no single unique identifier already exists in the PDS.

Storing the Foreign Key is very important to other service users. It provides the reference for a record by which a CUD data consumer may request, from the source PDS, attributes that are not stored in CUD.
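A composite Foreign Key of the kind described above could be built as in this sketch. The "pds:value|value" layout, the function name and the field names are assumptions for illustration and do not reflect the format CUD actually uses.

```python
def make_foreign_key(pds_name, record, key_fields):
    """Build a key that identifies both the PDS and a record within it.

    When no single identifier exists, several attributes are combined.
    The "pds:value|value" layout is an assumption for illustration.
    """
    parts = []
    for name in key_fields:
        value = record.get(name)
        if value is None or value == "":
            raise ValueError(f"missing key attribute: {name}")
        parts.append(str(value))
    return f"{pds_name}:" + "|".join(parts)
```

A real scheme would also need to escape the separator characters if they can occur in attribute values, otherwise two different records could produce the same key.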

Attribute Release Policy

Attribute release policies define which users are permitted to view or access which attributes. Data Controllers may configure policies via a suitable interface to control the release of their data from CUD.

CUD provides a single place for data owners to configure policies against the specific data made available to requestors, who may themselves be other data owners. For instance, Career services may require sensitive data from HR. This data can be controlled and made available via CUD rather than through a manually configured query.

For more information about release policies, refer to https://www.oucs.ox.ac.uk/services/iam/cud/cas-usage.pdf
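As an illustration only, an attribute release policy can be thought of as a mapping from each attribute to the requestors allowed to see it. The policy structure, the requestor labels and the deny-by-default behaviour in this sketch are assumptions, not a description of how CUD stores its policies.

```python
def release(record, policy, requester):
    """Return only the attributes the policy allows this requester to see.

    policy maps an attribute name to the set of requesters allowed to view
    it. Attributes with no policy entry are withheld (an assumption).
    """
    return {
        name: value
        for name, value in record.items()
        if requester in policy.get(name, set())
    }


record = {"surname": "Smith", "email": "a.smith@example.org", "salary_grade": "7"}
policy = {
    "surname": {"careers", "library"},
    "email": {"careers"},
    "salary_grade": set(),   # released to nobody
}
visible_to_library = release(record, policy, "library")
```

Here the hypothetical library requester would receive only the surname, while the careers requester would receive the surname and email address.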

6. CUD Interfaces

The following are brief descriptions of the available CUD interfaces, intended to assist with selecting a query method for a given requirement.

More details, including the process to follow to request access, are in http://www.oucs.ox.ac.uk/services/iam/cud/cud-interfaces-detail.xml

6.1. CUD UI

Typical use cases: ad-hoc lookups of data in CUD; preparation and testing of query to be used by server/service

The CUD UI is a web application which enables registered users to perform the following:

  • Searching using a query builder
  • Matching
  • Managing affiliations

All users are encouraged to use the CUD UI to familiarise themselves with CUD.

Documentation on the use of UI is available at http://www.oucs.ox.ac.uk/services/iam/cud/cud-ui-user-guide.xml

6.2. REST

Typical use case: retrieve data for a college or department, saving to file for local processing

Representational State Transfer (REST) is the preferred method of querying CUD from a server or service. It allows data to be requested using a simple GET query communicated over HTTPS.
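The shape of such a GET request can be sketched by assembling the URL with Python's standard library. The host name, path and parameter names below are placeholders invented for this example. The real endpoint and query syntax are defined in the CUD interface documentation, not here.

```python
from urllib.parse import urlencode, urlunsplit


def build_query_url(host, path, criteria, attributes):
    """Assemble an HTTPS GET URL from selection criteria and attributes.

    The host, path and parameter names are placeholders, not the real API.
    """
    params = dict(criteria)
    params["attributes"] = ",".join(attributes)
    return urlunsplit(("https", host, path, urlencode(params), ""))


url = build_query_url(
    "cud.example.org",        # placeholder host
    "/query",                 # placeholder path
    {"surname": "Smith"},
    ["email", "cud_id"],
)
```

The resulting URL could be fetched with any HTTPS client and the response saved to a file for local processing.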

6.3. SOAP

Typical use case: send data to or accept data from a packaged application that supports SOAP for this purpose.

SOAP is currently supported as a means of pushing data to remote webservices. Requirements are specific to each service.

6.4. LDAP

Typical use case: Web application which requires a lookup of single records in CUD

CUD data can appear as an LDAP v3 directory, queryable by any LDAP client. Whilst this is possible, it is not planned, as this function is better served by existing and planned dedicated LDAP directories such as OAK and Groupstore.

Typical use case: provision accounts to local Active Directory, with account lifecycle managed by CUD

CUD can push data to an external LDAP directory, such as Microsoft Active Directory.

6.5. SQL

Typical use case: maintain data on a set of people in a table in a database for use locally

CUD can push data into a SQL database. Normally this involves storing data in one or more tables in the remote database that are dedicated to this task. This data is then processed by local procedures to update other data tables, or referenced as appropriate in queries.
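The staging pattern described above might look like the following sketch, which uses an in-memory SQLite database in place of the remote database. The table names, column names and the replace-everything refresh strategy are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cud_staging (cud_id TEXT PRIMARY KEY, surname TEXT, email TEXT);
    CREATE TABLE local_people (cud_id TEXT PRIMARY KEY, surname TEXT, email TEXT);
""")


def receive_push(conn, rows):
    """Replace the dedicated staging table with the latest pushed data."""
    with conn:   # commits on success, rolls back on error
        conn.execute("DELETE FROM cud_staging")
        conn.executemany("INSERT INTO cud_staging VALUES (?, ?, ?)", rows)


def apply_staging(conn):
    """Local procedure: copy staged rows into the working table."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO local_people "
            "SELECT cud_id, surname, email FROM cud_staging"
        )
```

Keeping the pushed data in its own table means the local procedure decides when and how the working tables change.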