5. Services and Interfaces

5.1. Services

CUD gathers records from several primary data systems and matches them internally to identify records from different systems that correspond to the same person. It makes this consolidated data available to the systems and users that can query and extract information from CUD.

A typical user or system sends a query to CUD, specifying selection criteria and listing any desired attributes (which may have originated from more than one primary data system). CUD returns a single result for each record that satisfies the selection criteria specified in the query. Each result contains the requested attributes together with metadata describing their provenance and status.
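The query pattern described above can be sketched in Python as follows. This is a minimal illustration, not CUD's actual interface: the `query` function, the `AttributeValue` type, the attribute names and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttributeValue:
    value: str
    source: str        # hypothetical: name of the originating primary data system
    entered_on: date   # hypothetical: date the value entered CUD


# Hypothetical in-memory stand-in for consolidated CUD records.
RECORDS = [
    {
        "surname": AttributeValue("Smith", "HR", date(2011, 3, 1)),
        "email": AttributeValue("anne.smith@example.org", "Registration", date(2011, 3, 2)),
        "affiliation": AttributeValue("staff", "HR", date(2011, 3, 1)),
    },
    {
        "surname": AttributeValue("Jones", "Card", date(2010, 9, 14)),
        "email": AttributeValue("bob.jones@example.org", "Registration", date(2010, 9, 15)),
        "affiliation": AttributeValue("student", "Card", date(2010, 9, 14)),
    },
]


def query(records, criteria, attributes):
    """Return one result per record that satisfies every selection criterion."""
    results = []
    for record in records:
        if all(name in record and record[name].value == wanted
               for name, wanted in criteria.items()):
            # Keep only the requested attributes, each with its provenance metadata.
            results.append({name: record[name] for name in attributes if name in record})
    return results


results = query(RECORDS, {"affiliation": "staff"}, ["surname", "email"])
for result in results:
    for name, attr in result.items():
        print(f"{name}: {attr.value} (source: {attr.source}, entered: {attr.entered_on})")
```

Note that the two requested attributes in this example come from different source systems, which is the situation the paragraph above describes.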

As a result of matching and consolidation of data, CUD provides the following key services:

5.1.1. Data Consolidation and Reconciliation

CUD gathers information about people from CUD-registered systems (primary data systems), identifies matching records from different primary data sources and highlights inconsistencies in attribute values. This service provides the following capabilities:

  • Data Matching as a Service matches and consolidates records between sources, resulting in cleaner source data without duplicate records. Where duplicate records exist, CUD reports the multiple matches so that the records can be merged or de-merged. CUD provides consolidated data about a person that would otherwise require multiple queries to multiple systems, making it easier to get the desired results.
  • Data Consolidation as a Service sorts and consolidates data from multiple sources. Given the number of data sources within the University that contain data for a person, there is a need to provide such information in a single place. This will allow you to make a single query, rather than queries to multiple data sources.

5.1.2. Data Matching as a Service

A globally unique identifier is required to achieve identity and access management, reporting and auditing. It must be possible to reference information about a person from more than one data source and be assured that it is the same person.

Data matching is the process of matching records from multiple sources. A match may result when some or all attributes of a record in a primary data system (PDS) are identical to those of a record in another PDS.

Once person records are uniquely matched, a globally unique identifier can be assigned with confidence.

The process of matching occurs in the following scenarios:

  • Data from a new PDS is available for CUD provisioning
  • Data is entered into an existing PDS
  • Data is changed significantly within an existing PDS

Dynamic data matching also happens when new or significantly changed data is available to CUD.

Full data sets can be compared against CUD data, where the matching process compares every PDS record against every record stored in CUD. This can result in multiple matches indicating duplicate records within the PDS.
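A full comparison of this kind can be sketched as a nested loop. The example below is illustrative only: the record layout, the `pds_id` and `cud_id` field names and the email-based `same_email` test are assumptions, standing in for whatever matching strategy is actually applied.

```python
from collections import defaultdict


def full_comparison(pds_records, cud_records, is_match):
    """Compare every PDS record against every CUD record.

    Returns a dict mapping each matched CUD record ID to the list of
    PDS record IDs that matched it.
    """
    matches = defaultdict(list)
    for pds in pds_records:
        for cud in cud_records:
            if is_match(pds, cud):
                matches[cud["cud_id"]].append(pds["pds_id"])
    return dict(matches)


def possible_duplicates(matches):
    """CUD records matched by more than one PDS record suggest duplicates in the PDS."""
    return {cud_id: ids for cud_id, ids in matches.items() if len(ids) > 1}


def same_email(pds, cud):
    # Hypothetical matching test used only for this example.
    return pds["email"] == cud["email"]


pds_records = [
    {"pds_id": "p1", "email": "a@example.org"},
    {"pds_id": "p2", "email": "a@example.org"},
    {"pds_id": "p3", "email": "b@example.org"},
]
cud_records = [
    {"cud_id": "c1", "email": "a@example.org"},
    {"cud_id": "c2", "email": "b@example.org"},
]

matches = full_comparison(pds_records, cud_records, same_email)
duplicates = possible_duplicates(matches)
print(duplicates)
```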

Matching Strategies

CUD implements various matching strategies applying different test conditions for the records to be matched. The strategies result in matches generated with varying levels of confidence. Matches with a high measure of confidence (exact matches) are accepted without further processing.

Other high-confidence matches are made where one or more unique attributes match between systems, for example where the email addresses for entities in more than one system are the same.
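Matching on a unique attribute can be sketched as a simple lookup. The function name, the record layout and the choice to compare email addresses case-insensitively are assumptions made for this example, not documented CUD behaviour.

```python
def match_on_unique_attribute(records_a, records_b, attribute="email"):
    """Pair records from two systems whose values for a unique attribute are equal."""

    def normalise(value):
        # Assumption: values are compared ignoring case and surrounding whitespace.
        return value.strip().lower()

    # Index the second system by the normalised attribute value.
    index = {}
    for rec in records_b:
        value = rec.get(attribute)
        if value:
            index[normalise(value)] = rec

    # Look up each record of the first system in that index.
    pairs = []
    for rec in records_a:
        value = rec.get(attribute)
        if value and normalise(value) in index:
            pairs.append((rec["id"], index[normalise(value)]["id"]))
    return pairs


hr = [
    {"id": "hr-1", "email": "Anne.Smith@example.org"},
    {"id": "hr-2", "email": "bob@example.org"},
]
card = [
    {"id": "card-9", "email": "anne.smith@example.org"},
    {"id": "card-10", "email": None},
]

print(match_on_unique_attribute(hr, card))  # [('hr-1', 'card-9')]
```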

One low-confidence matching strategy considers unclear or fuzzy match results. In this case, CUD tests for character similarity between two attribute values and deliberately allows for typographical errors. For example, Ann Smith may be treated the same as Anne Smith.
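A character-similarity test of this kind can be illustrated with Python's standard `difflib` module. This is a sketch of the general technique only: the source does not state which similarity measure CUD uses, and the 0.9 threshold below is an assumed value chosen for the example.

```python
from difflib import SequenceMatcher

# Assumed cut-off for illustration; not a documented CUD value.
SIMILARITY_THRESHOLD = 0.9


def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_fuzzy_match(a, b, threshold=SIMILARITY_THRESHOLD):
    """True when two attribute values are similar enough to be a possible match."""
    return similarity(a, b) >= threshold


print(is_fuzzy_match("Ann Smith", "Anne Smith"))  # True
print(is_fuzzy_match("Ann Smith", "Alan Smyth"))  # False
```

A result of `True` here would only flag a possible match; it would still need confirmation before being accepted.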

Importantly, low-confidence matches require Data Controllers to confirm them manually.

Guidance for matching

A key function of CUD is to match data received from different systems. The following types of matches are defined:

  • Definite one-to-one match - This is triggered automatically
  • Possible one-to-one match - This requires human confirmation
  • Possible one-to-many match - This requires human confirmation
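The three match types above can be sketched as a small classification step. The rule used here (more than one candidate means one-to-many, and a single candidate is definite only at full confidence) is an assumption for illustration; the source does not specify how CUD distinguishes the types.

```python
from enum import Enum


class MatchType(Enum):
    DEFINITE_ONE_TO_ONE = "definite one-to-one"
    POSSIBLE_ONE_TO_ONE = "possible one-to-one"
    POSSIBLE_ONE_TO_MANY = "possible one-to-many"


def classify(candidates, exact_threshold=1.0):
    """Classify a list of (record_id, confidence) candidates for one incoming record.

    Returns None when there are no candidates at all.
    """
    if not candidates:
        return None
    if len(candidates) > 1:
        return MatchType.POSSIBLE_ONE_TO_MANY
    _, confidence = candidates[0]
    if confidence >= exact_threshold:  # assumed rule: only exact matches are definite
        return MatchType.DEFINITE_ONE_TO_ONE
    return MatchType.POSSIBLE_ONE_TO_ONE


def needs_human_confirmation(match_type):
    """Only definite one-to-one matches are triggered automatically."""
    return match_type in (MatchType.POSSIBLE_ONE_TO_ONE, MatchType.POSSIBLE_ONE_TO_MANY)


print(classify([("c1", 1.0)]))
print(classify([("c1", 0.93)]))
print(classify([("c1", 0.93), ("c2", 0.91)]))
```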

The UI provides a means for authorised users to:

  • View all matches that are made automatically
  • View and confirm or reject possible matches
  • Reject existing matches
  • Manually make a match where no possible match has been found by the system

The CUD Attribute Set

Although it would be technically possible to represent every data item from every data source in CUD, this is not the function of CUD; it is the responsibility of a data warehouse. CUD provides a set of commonly used, defined and agreed identity attributes. There are use cases for similar systems that would enable equivalent functions and services to be applied to other data, but these are out of scope for CUD. The full attribute set is available at

http://www.oucs.ox.ac.uk/services/iam/cud/cud-interfaces-detail.xml?ID=body.1_div.4

5.1.3. Data Reconciliation as a Service

Each person with a relationship to the University will exist in one or more data sources, though possibly not in any primary data source.

However, for all University card holders, records exist, as a minimum, within the Card database and the OUCS registration database. Common attributes in these systems may differ as a result of error or because the user requested a change. For example, a surname may have been changed in one system but not in the other(s).

The Data Reconciliation Service reconciles the values of common attributes of a CUD record that exists in more than one PDS. Reconciling these values is possible when the Data Controllers have a shared agreement about the common attributes. The agreement is decided at the University level through a common understanding among the data owners and the governance board.

CUD stores all the values of an attribute along with metadata identifying the originating source and a date stamp stating when each value was entered into CUD. Historical values and their sources are therefore preserved. How long historical values are retained for each attribute is decided by a governance board. If required, Data Controllers can ask CUD to notify them of divergent values for the attributes specified in the notification request. When a CUD data consumer makes a query, CUD provides all the values that are stored for each attribute and the system in which they are stored. Data Controllers may then manually choose how to use one or more of the reported values based on their own precedence rules or use cases.
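The idea of keeping every value with its source and date stamp, and reporting divergence, can be sketched as follows. The `StoredValue` type, the record layout and the rule of comparing the most recent value from each source are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StoredValue:
    value: str
    source: str        # originating primary data system
    entered_on: date   # date stamp: when the value was entered into CUD


def divergent_attributes(record, watched):
    """Report watched attributes whose most recent values differ between sources.

    `record` maps an attribute name to the full list of StoredValue entries,
    so historical values are kept rather than overwritten.
    """
    report = {}
    for name in watched:
        latest = {}
        for stored in sorted(record.get(name, []), key=lambda v: v.entered_on):
            latest[stored.source] = stored.value  # later entries replace earlier ones
        if len(set(latest.values())) > 1:
            report[name] = latest
    return report


record = {
    "surname": [
        StoredValue("Smith", "Card", date(2010, 9, 14)),
        StoredValue("Smith", "Registration", date(2010, 9, 15)),
        StoredValue("Jones", "Registration", date(2012, 6, 1)),
    ],
    "forename": [
        StoredValue("Anne", "Card", date(2010, 9, 14)),
        StoredValue("Anne", "Registration", date(2010, 9, 15)),
    ],
}

report = divergent_attributes(record, ["surname", "forename"])
print(report)
```

In this example the surname was changed in one system but not in the other, so only `surname` is reported; the choice of which value to use is left to the Data Controller.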

Data Presentation

After the data is rationalized, CUD makes it accessible through a suitable interface so that Data Controllers can configure personalized queries. This service includes the following elements.

CUD Unique Identifier

The CUD ID is a unique identifier for each person that does not change across primary data systems. Thus, it acts as a suitable "global unique identifier" within the context of the University’s IT systems.

This allows all data providers to have a common reference for every person record they hold. Also, it functions as the shared, persistent, unique identifier for all CUD data consumers.

It also allows data managers to check whether a new record in their systems already exists in other systems. If it does not, a new CUD ID is assigned to that person so that the record becomes available for matching. This enables data managers to avoid duplicating records. For example, if a person returns to the University after their record has been deleted from one data source, and there is a match against a historical record in another source, the old identity of the person can be retained in that source, rather than creating another record for the same person.
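The assign-or-reuse behaviour described above can be sketched with a small registry. Everything here is hypothetical: the `CudIdRegistry` class, the use of a "matching key" as a stand-in for the outcome of the matching process, and the UUID-based identifier format, which the source does not specify.

```python
import uuid


class CudIdRegistry:
    """Hypothetical registry that assigns an identifier once and never changes it."""

    def __init__(self):
        # matching key -> identifier; entries are kept even after a source
        # record is deleted, so a returning person keeps the old identity.
        self._by_key = {}

    def resolve(self, matching_key):
        """Return (identifier, created): reuse an existing ID or assign a new one."""
        if matching_key in self._by_key:
            return self._by_key[matching_key], False
        new_id = uuid.uuid4().hex  # assumed format, for illustration only
        self._by_key[matching_key] = new_id
        return new_id, True


registry = CudIdRegistry()
key = ("smith", "anne", "1980-02-29")  # hypothetical matching key

first_id, first_created = registry.resolve(key)
# The person leaves and later returns; the same matching key yields the same ID.
second_id, second_created = registry.resolve(key)

print(first_created, second_created, first_id == second_id)  # True False True
```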

Foreign Key Referral Service

Storing the CUD ID in the PDS is optional and provides a means to refer to the corresponding matching record in CUD. As not every PDS stores the CUD ID, CUD provides a reference between the PDS and CUD records by making a Foreign Key available. The Foreign Key is generated by each PDS and held as an attribute of the corresponding CUD record.

The PDS also stores the Foreign Key, which links its record to the corresponding CUD record.

The Foreign Key, provided by the PDS to CUD, must uniquely identify both the PDS and associated records. It can be a composite of multiple attributes where no single unique identifier already exists in the PDS.
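A composite key of this kind can be sketched as a tuple that combines the PDS name with the chosen record attributes. The function name, the PDS name "Card" and the attribute names below are hypothetical; the source does not define the actual key format.

```python
def make_foreign_key(pds_name, record, key_attributes):
    """Build a composite key identifying both the PDS and one of its records.

    A tuple is used so that the parts stay unambiguous and the key can be
    used directly as a dictionary key.
    """
    parts = [pds_name]
    for name in key_attributes:
        value = record.get(name)
        if value is None or value == "":
            raise ValueError(f"missing key attribute: {name}")
        parts.append(str(value))
    return tuple(parts)


# Hypothetical PDS record with no single unique identifier of its own.
card_record = {"card_no": "123", "issue": 2, "surname": "Smith"}

foreign_key = make_foreign_key("Card", card_record, ["card_no", "issue"])
print(foreign_key)  # ('Card', '123', '2')
```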

Storing the Foreign Key is very important to other service users. It provides the reference for a record by which a CUD data consumer may request, from the source PDS, attributes that are not stored in CUD.

Attribute Release Policy

Attribute release policies define the permissions that users have to view or access attributes. Data Controllers may configure policies via a suitable interface to control the release of their data from CUD.

CUD provides a single place for data owners to configure policies governing the specific data made available to requestors, who may sometimes be other data owners. For instance, Career services may require sensitive data from HR. This data can be controlled and made available via CUD rather than through a manually configured query.
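The effect of a release policy can be sketched as a filter applied before data is returned. The policy table, the owner and requestor names and the attribute names are hypothetical, and the default of releasing nothing when no policy exists is an assumption, not documented CUD behaviour.

```python
# Hypothetical policy table: (data owner, requestor) -> attributes that may be released.
POLICIES = {
    ("HR", "Careers"): {"surname", "email"},
}


def release(record, owner, requestor, policies):
    """Return only the attributes the owner's policy allows this requestor to see.

    Assumption: when no policy is configured, nothing is released.
    """
    allowed = policies.get((owner, requestor), set())
    return {name: value for name, value in record.items() if name in allowed}


hr_record = {"surname": "Smith", "email": "anne.smith@example.org", "salary": "confidential"}

print(release(hr_record, "HR", "Careers", POLICIES))
print(release(hr_record, "HR", "Library", POLICIES))
```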

For more information about release policies, refer to https://www.oucs.ox.ac.uk/services/iam/cud/cas-usage.pdf
