The Core User Directory (CUD) is a central reference point or a directory that stores details of people associated with the University of Oxford. The directory holds information about students, researchers, tutors, staff and alumni. CUD consolidates records from a number of data sources. Every CUD record has multiple attributes such as first name, surname etc. It becomes easier to distinguish a CUD record as unique when more attributes are stored within a record.
For every data record entered, a CUD unique identifier (CUD ID) is assigned after the data is matched, consolidated and reconciled. As a result of these processes, a single CUD record (with many attributes) is created for each person associated with the University. Because each person has a single CUD ID, duplication of information across these data sources is reduced.
CUD focuses on establishing a reliable source of user identity information and complements the existing IAM service suite. Other identity and access management processes and functions, such as account provisioning, authentication, privilege management and authorization, are beyond the scope of CUD.
There is a growing need in the University to implement Identity and Access Management (IAM) processes. Most identity management solutions require access to one or more sources of comprehensive, authoritative user data. Each user must be assigned a unique digital identity with an associated unique identifier. For records stored in multiple sources, the unique identifier must be global in scope. Such an identifier did not previously exist. CUD provides the matched and consolidated user data along with a globally unique identifier (CUD ID). This supports identity management and is a significant precursor to achieving a full IAM solution for the University.
CUD enhances efficiency and accuracy by establishing reliable cross-references for data related to the same person across multiple systems, as well as by supporting identity management. This facilitates strategic sharing of attributes such as name and address, which reduces duplicate data and associated processes and improves consistency. As not every system stores an ID, CUD provides a Foreign Key to act as a reference between the Primary Data System (PDS) and CUD records. For more information, see Foreign Key Referral Service.
CUD Service Users can contribute data to, or consume data from, key or Primary Data Systems (PDS) such as the Student Records System, the University HR System and the Alumni Relations System. Users are classified as follows with respect to their association with a PDS.
- Attribute Release Policies enable the Data Controllers to determine which users may access or view attributes
- CUD ID provides a means to match and reconcile data with records stored in other Primary Data Systems
- Data Controllers may also act as CUD data consumers by querying CUD, which gives them the same set of benefits.
CUD data consumers query CUD and retrieve a result, or set of results, about user records. By using CUD, they benefit in the following ways:
- Get access to authoritative data using a single source
- Data Controllers may configure the type, frequency and result format of queries made to CUD
- Data is verified to ensure that the data format is as expected, preventing unforeseen results for Data Controllers and their systems
- Data Controllers can determine data provenance using the metadata returned with queries
- Attributes that are not stored in CUD can be requested using the Foreign Key
All data consumers and data controllers are required to sign a Service Level Agreement. This is to ensure that CUD reflects the same or more restricted release policies of all parties. For example, to be able to use the University Card photo, a data owner is required to agree to a particular clause. Hence, CUD uses the same statement as the Card Office, including it in the Terms of Usage for Data Consumers.
The CUD glossary defines terms specific to this project. CUD follows the terms used in the glossary of the JISC Identity and Access Management Toolkit (http://www.jisc.ac.uk/media/documents/programmes/aim/IdMToolkit.pdf). Although the toolkit as a whole is not in scope for CUD, it offers a good overview and discussion of the processes used in the implementation of IAM within an academic environment.
CUD gathers records from several primary data systems and matches them internally to identify records from different systems that correspond to the same person. It makes this consolidated data available to the systems and users that can query and extract information from CUD.
A typical user or system sends a query to CUD, specifying selection criteria and listing any desired attributes (which may have originated from more than one primary data system). CUD returns a single result for each record according to the selection criteria specified in the query. The result contains attributes and metadata describing the provenance and status of attributes.
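As a rough illustration of the query and result shape described above, the following Python sketch models a query that carries selection criteria and a list of desired attributes, and a result in which every attribute value carries provenance metadata. All names here (`CudQuery`, `AttributeValue`, `run_query`, the system names and the sample records) are hypothetical and chosen for illustration only; this is not the actual CUD interface.

```python
from dataclasses import dataclass


@dataclass
class AttributeValue:
    value: str
    source: str   # hypothetical name of the originating primary data system
    status: str   # hypothetical status flag, e.g. "current"


@dataclass
class CudQuery:
    criteria: dict    # selection criteria, e.g. {"surname": "Smith"}
    attributes: list  # names of the attributes to return


# Hypothetical in-memory stand-in for consolidated CUD records.
RECORDS = [
    {
        "cud_id": "example-1",
        "surname": AttributeValue("Smith", "student_records", "current"),
        "email": AttributeValue("a.smith@example.org", "hr_system", "current"),
    },
    {
        "cud_id": "example-2",
        "surname": AttributeValue("Jones", "hr_system", "current"),
        "email": AttributeValue("b.jones@example.org", "hr_system", "current"),
    },
]


def run_query(query, records=RECORDS):
    """Return one result per record that satisfies every selection criterion."""
    results = []
    for record in records:
        matches = all(
            name in record and record[name].value == wanted
            for name, wanted in query.criteria.items()
        )
        if matches:
            results.append({
                "cud_id": record["cud_id"],
                # Only the requested attributes are returned, each with its metadata.
                "attributes": {
                    name: record[name]
                    for name in query.attributes
                    if name in record
                },
            })
    return results


results = run_query(CudQuery(criteria={"surname": "Smith"}, attributes=["email"]))
```

In this sketch the selection is made on an attribute from one system (`student_records`) while the returned attribute originates from another (`hr_system`), which mirrors the point that requested attributes may come from more than one primary data system.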
CUD gathers information about people from CUD-registered systems (primary data systems), identifies matching records from different primary data sources and highlights inconsistencies in attribute values. This service provides the following capabilities:
- Data Matching as a Service matches and consolidates records between sources, resulting in cleaner source data without duplicate records. Where duplicate records exist, CUD provides information about multiple matches so that records can be merged or de-merged. CUD provides consolidated data about a person that would otherwise require multiple queries to multiple systems, making it easier to get the desired results.
- Data Consolidation as a Service sorts and consolidates data from multiple sources. Given the number of data sources within the University that contain data for a person, there is a need to provide such information in a single place. This will allow you to make a single query, rather than queries to multiple data sources.
A globally unique identifier is required to achieve identity and access management, reporting and auditing. It must be possible to reference information about a person from more than one data source and be assured that it is the same person.
Data matching is the process of matching records from multiple sources. A match may result when some or all attributes of a record in a primary data system (PDS) are found to be identical to those of a record in another PDS.
- Data from a new PDS is available for CUD provisioning
- Data is entered into existing PDS
- Data is changed significantly within an existing PDS
Full data sets can be compared against CUD data, where the matching process compares every PDS record against every record stored in CUD. This can result in multiple matches indicating duplicate records within the PDS.
CUD implements various matching strategies applying different test conditions for the records to be matched. The strategies result in matches generated with varying levels of confidence. Matches with a high measure of confidence (exact matches) are accepted without further processing.
One of the low-confidence matching strategies considers unclear or fuzzy match results. In such a case, CUD tests for character similarities between two attribute values and deliberately allows for typographical errors. For example, Ann Smith may be treated as the same person as Anne Smith.
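The character-similarity test can be sketched with Python's standard `difflib` module. This is only an illustration of the general idea: the source does not say which similarity measure or threshold CUD uses, so `SequenceMatcher` and the `FUZZY_THRESHOLD` value of 0.9 are assumptions.

```python
from difflib import SequenceMatcher

# Assumed threshold for illustration; the real CUD threshold is not documented here.
FUZZY_THRESHOLD = 0.9


def similarity(a, b):
    """Return a character-similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_fuzzy_match(a, b, threshold=FUZZY_THRESHOLD):
    """Treat two attribute values as a possible match if they are similar enough."""
    return similarity(a, b) >= threshold


print(is_fuzzy_match("Ann Smith", "Anne Smith"))  # True
print(is_fuzzy_match("Ann Smith", "Bob Jones"))   # False
```

A match found this way is a low-confidence one, so in practice it would be queued for human confirmation rather than accepted automatically.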
- Definite one-to-one match - This is triggered automatically
- Possible one-to-one match - This requires human confirmation
- Possible one-to-many match - This requires human confirmation
- View all matches that are made automatically
- View and confirm or reject possible matches
- Reject existing matches
- Manually make a match where no possible match has been found by the system
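The match outcomes listed above, and the rule that only definite matches are accepted automatically, can be sketched as follows. The names `MatchOutcome`, `classify` and `needs_human_confirmation` are hypothetical. Treating several exact candidates as a possible one-to-many match is an assumption, as is the extra `NO_MATCH` value, which stands for the case where the system finds no possible match and a person may make one manually.

```python
from enum import Enum


class MatchOutcome(Enum):
    DEFINITE_ONE_TO_ONE = "definite one-to-one"
    POSSIBLE_ONE_TO_ONE = "possible one-to-one"
    POSSIBLE_ONE_TO_MANY = "possible one-to-many"
    NO_MATCH = "no match"  # assumed extra state: left for a manual match


def classify(candidates):
    """Classify the candidate matches found for one PDS record.

    `candidates` is a list of (cud_id, is_exact) pairs.
    """
    if not candidates:
        return MatchOutcome.NO_MATCH
    if len(candidates) > 1:
        # Assumption: more than one candidate always needs a human decision.
        return MatchOutcome.POSSIBLE_ONE_TO_MANY
    _, is_exact = candidates[0]
    if is_exact:
        return MatchOutcome.DEFINITE_ONE_TO_ONE
    return MatchOutcome.POSSIBLE_ONE_TO_ONE


def needs_human_confirmation(outcome):
    """Only possible matches are queued for a person to confirm or reject."""
    return outcome in (
        MatchOutcome.POSSIBLE_ONE_TO_ONE,
        MatchOutcome.POSSIBLE_ONE_TO_MANY,
    )
```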
Although it would be technically possible to represent every data item from every data source in CUD, this is not the function of CUD; that is the responsibility of a data warehouse. CUD provides a set of commonly used, defined and agreed identity attributes. There are use cases for similar systems that would apply equivalent functions and services to other data, but these are out of scope for CUD. The full attribute set is available at
However, as a minimum, records for all University card holders exist within the Card database and the OUCS registration database. Common attributes in these systems may differ as a result of an error or because the user requested a change. For example, a surname may have been changed in one system but not in the other(s).
The Data Reconciliation Service reconciles the values of common attributes of a CUD record that exists in more than one PDS. Reconciling these values is possible when the Data Controllers have a shared agreement about the common attributes. The agreement is reached at the University level through a common understanding among the data owners and the governance board.
CUD stores all the values of an attribute along with metadata that identifies the originating source and a date stamp stating when each value was entered into CUD. Thus, historical data and its source are preserved. The retention period for the historical values of each attribute is decided by a governance board. If required, Data Controllers can configure a notification from CUD that alerts them to divergent values for the attributes specified in the notification request. When a CUD data consumer makes a query, CUD provides all the values that are stored for each attribute and the system in which they are stored. Data Controllers may then choose manually how to use one or more of the reported values, based on their own precedence rules or use cases.
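A minimal sketch of this storage model is shown below: each stored value keeps its originating source and the date it was entered, and a helper reports the sources whose most recent values disagree. The names `StoredValue`, `latest_per_source` and `divergent_values`, and the system names in the sample data, are hypothetical. Comparing only the latest value per source is an assumption about how divergence might be detected.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StoredValue:
    value: str
    source: str       # originating primary data system (names below are made up)
    entered_on: date  # date the value was entered into CUD


def latest_per_source(values):
    """Pick the most recently entered value for each source."""
    latest = {}
    for stored in values:
        current = latest.get(stored.source)
        if current is None or stored.entered_on > current.entered_on:
            latest[stored.source] = stored
    return latest


def divergent_values(values):
    """Return {source: latest value} if the sources disagree, else an empty dict."""
    latest = latest_per_source(values)
    distinct = {stored.value for stored in latest.values()}
    if len(distinct) <= 1:
        return {}
    return {source: stored.value for source, stored in latest.items()}


# Full history is kept; only the latest value per source is compared.
surname_history = [
    StoredValue("Smith", "card_database", date(2010, 1, 5)),
    StoredValue("Smith", "registration_database", date(2010, 1, 6)),
    StoredValue("Smith-Jones", "registration_database", date(2011, 3, 2)),
]
report = divergent_values(surname_history)
```

Here `report` lists both systems with their differing surnames, which is the kind of information a divergence notification could carry. The sketch does not pick a winning value, since that choice is left to each Data Controller's own precedence rules.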
The CUD ID is a unique, unchanging identifier for each person that is the same across all primary data systems. Thus, it acts as a suitable "global unique identifier" within the context of the University’s IT systems.
It also allows data managers to check whether a new record in their systems already exists in other systems. If it doesn't, a new CUD ID is assigned to that user so that it becomes available for matching. This enables the data managers to avoid duplicating records. For example, if a person returns to the University after they have been deleted from one data source, and there is a match against a historical record in another source, the old identity of the person can be retained in that source, rather than creating another record for the same person.
Storing the CUD ID in the PDS is optional and provides a means to refer to the corresponding matching record in CUD. As not every PDS stores an ID, CUD provides a reference between the PDS and CUD records by making a Foreign Key available. Each PDS generates this key, and it is held as an attribute of the corresponding CUD record.
The Foreign Key, provided by the PDS to CUD, must uniquely identify both the PDS and associated records. It can be a composite of multiple attributes where no single unique identifier already exists in the PDS.
Storing the Foreign Key is very important to other service users. It provides the reference by which a CUD data consumer may request, from the source PDS, attributes that are not stored in CUD.
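As an illustration of a composite Foreign Key, the sketch below builds a key from the PDS name plus several record attributes, for the case where the PDS has no single unique identifier. The function name `make_foreign_key`, the field names and the sample record are hypothetical. A tuple is used rather than a delimited string so that attribute values containing the delimiter cannot make two different records produce the same key.

```python
def make_foreign_key(pds_name, record, key_fields):
    """Build a key that identifies both the PDS and one record within it."""
    missing = [name for name in key_fields if record.get(name) in (None, "")]
    if missing:
        # A key with empty parts could not identify the record uniquely.
        raise ValueError(f"record lacks key fields: {missing}")
    return (pds_name,) + tuple(str(record[name]) for name in key_fields)


record = {"department_code": "D42", "local_number": 1001, "surname": "Smith"}
key = make_foreign_key("alumni_system", record, ["department_code", "local_number"])
print(key)  # ('alumni_system', 'D42', '1001')
```

Whether a given combination of fields is really unique within a PDS is something only the Data Controller for that system can confirm.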
Attribute Release Policies define the permissions that users have to view or access attributes. Data Controllers may configure policies via a suitable interface to control the release of their data from CUD.
CUD provides a single place for data owners to configure policies for the specific data made available to requestors, who may sometimes be other data owners. For instance, Career services may require sensitive data from HR. This data can be controlled and made available via CUD rather than through a manually configured query.
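The effect of a release policy can be sketched as a filter applied to a record before it is returned to a consumer. The policy table, the consumer and system names, and the function `released_attributes` are all hypothetical. Withholding any attribute that has no policy entry (default deny) is an assumption, not documented CUD behaviour.

```python
# Hypothetical policy table: (owning PDS, attribute) -> consumers allowed to see it.
RELEASE_POLICIES = {
    ("hr_system", "job_title"): {"careers_service"},
    ("hr_system", "salary_grade"): set(),
    ("card_database", "card_photo"): {"careers_service", "library_system"},
}


def released_attributes(consumer, record):
    """Return only the attributes that policy releases to `consumer`.

    `record` maps attribute name -> (value, owning PDS). Attributes without a
    policy entry are withheld (assumed default-deny behaviour).
    """
    visible = {}
    for name, (value, source) in record.items():
        allowed = RELEASE_POLICIES.get((source, name), set())
        if consumer in allowed:
            visible[name] = value
    return visible


person = {
    "job_title": ("Researcher", "hr_system"),
    "salary_grade": ("7", "hr_system"),
    "card_photo": ("photo-ref", "card_database"),
    "nickname": ("Al", "alumni_system"),
}
careers_view = released_attributes("careers_service", person)
library_view = released_attributes("library_system", person)
```

Keying the policy on both the owning system and the attribute keeps each Data Controller in charge of its own data, even when two systems hold an attribute with the same name.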
For more information about release policies, refer to https://www.oucs.ox.ac.uk/services/iam/cud/cas-usage.pdf
More details, including the process to follow to request access, are in http://www.oucs.ox.ac.uk/services/iam/cud/cud-interfaces-detail.xml
Documentation on the use of the UI is available at http://www.oucs.ox.ac.uk/services/iam/cud/cud-ui-user-guide.xml
CUD can push data into a SQL database. Normally this involves storing data in one or more tables in the remote database that are dedicated to this task. This data is then processed by local procedures to update other data tables, or referenced directly in queries as appropriate.
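The pattern of a dedicated table that receives pushed data, followed by a local procedure that updates other tables, can be sketched with Python's built-in `sqlite3` module. The table names `cud_staging` and `local_users`, the column names and the sample rows are hypothetical. An in-memory SQLite database stands in for the remote database, and the insert in step 1 merely simulates the push that CUD would perform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table dedicated to receiving data pushed by CUD (hypothetical schema).
    CREATE TABLE cud_staging (cud_id TEXT PRIMARY KEY, surname TEXT, email TEXT);
    -- Local table owned by the consuming system.
    CREATE TABLE local_users (cud_id TEXT PRIMARY KEY, surname TEXT, email TEXT);
""")

# Step 1: stand-in for the rows CUD would push into the dedicated table.
conn.executemany(
    "INSERT INTO cud_staging (cud_id, surname, email) VALUES (?, ?, ?)",
    [
        ("example-1", "Smith", "a.smith@example.org"),
        ("example-2", "Jones", "b.jones@example.org"),
    ],
)


# Step 2: local procedure that copies staged rows into the local table,
# replacing any existing row with the same cud_id.
def apply_staged_rows(connection):
    connection.execute("""
        INSERT OR REPLACE INTO local_users (cud_id, surname, email)
        SELECT cud_id, surname, email FROM cud_staging
    """)
    connection.commit()


apply_staged_rows(conn)
rows = conn.execute(
    "SELECT cud_id, surname FROM local_users ORDER BY cud_id"
).fetchall()
```

Keeping the pushed data in its own table means the local system decides when and how its working tables change, and the staged rows can also be joined directly in queries without being copied.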