Data Quality

Organisations today spend vast amounts of money, time and resources collecting large volumes of data in data stores such as data warehouses and XML instances. They are also increasingly combining data from several of these stores to create a single view of their operations. For organisations to benefit from this effort, the data they collect must be of the highest quality, and the combined data must be consistent and accurate. Where this is not the case, inaccurate information leads to incorrect decisions.

As an organisation's data pool grows, so too do the problems of ensuring that it is accurate, complete and consistent, with minimal duplication or redundancy.

Transformation Manager (TM) is designed to provide a high-level toolset for transforming data from one data store to another. It handles a range of data structures, including XML, Java objects and relational databases. An integral part of this functionality is the ability to run quality checks on the data and to update it.

To support this, TM provides functionality that allows the user to:

  • validate source data values by checking them against:

    1. a predefined list;
    2. a numeric or date range;
    3. a numeric or character length;
    4. a specified pattern;

  • ensure that source and target data values conform to the underlying model, by checking each attribute value against the associated XSD facets (for XML models) or relational datatypes (for relational database models);
  • enrich source data values;
  • modify values using string and arithmetic manipulation functions;
  • identify an attribute (or set of attributes) whose values must be unique, so that duplicate records are not transformed;
  • apply conditional logic, using If/Else or Case statements.
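The four source-value checks listed above (predefined list, range, length, pattern) can be pictured as simple predicate rules attached to fields. This is a minimal Python sketch, assuming a record is a dictionary of field names to values; the function names (`in_list`, `in_range`, `max_length`, `matches`, `validate`) and the sample fields are hypothetical and do not reflect TM's own API or configuration.

```python
import re
from datetime import date
from typing import Any, Callable, Dict, List

# Hypothetical rule type: takes a value, returns True if the value passes.
Rule = Callable[[Any], bool]

def in_list(allowed: set) -> Rule:
    """Check 1: the value must appear in a predefined list."""
    return lambda value: value in allowed

def in_range(low: Any, high: Any) -> Rule:
    """Check 2: a numeric or date value must lie in an inclusive range."""
    return lambda value: low <= value <= high

def max_length(limit: int) -> Rule:
    """Check 3: the value's string form must not exceed `limit` characters."""
    return lambda value: len(str(value)) <= limit

def matches(pattern: str) -> Rule:
    """Check 4: the whole value must match a regular expression."""
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(str(value)) is not None

def validate(record: Dict[str, Any], rules: Dict[str, List[Rule]]) -> List[str]:
    """Return the names of fields that fail at least one of their rules."""
    failures = []
    for field, field_rules in rules.items():
        value = record.get(field)
        # A missing value fails outright instead of being passed to the rules.
        if value is None or not all(rule(value) for rule in field_rules):
            failures.append(field)
    return failures

# Hypothetical field rules and a sample source record.
rules = {
    "country": [in_list({"GB", "IE", "US"})],
    "quantity": [in_range(1, 1000)],
    "order_date": [in_range(date(2020, 1, 1), date(2030, 12, 31))],
    "customer_name": [max_length(40)],
    "product_code": [matches(r"[A-Z]{3}-\d{4}")],
}
record = {
    "country": "FR",
    "quantity": 5,
    "order_date": date(2024, 3, 1),
    "customer_name": "Acme Ltd",
    "product_code": "ABC-1234",
}
print(validate(record, rules))  # ['country']
```

Measuring length on `str(value)` is a simplification; for a float it also counts the decimal point. In XML Schema, comparable constraints are expressed with the `enumeration`, `minInclusive`/`maxInclusive`, `maxLength` and `pattern` facets.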
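The uniqueness check, which stops duplicate records from being transformed, can be illustrated with a filter keyed on a chosen attribute or combination of attributes. This is a minimal Python sketch, assuming records are dictionaries and key values are hashable; `unique_by` and the sample fields are hypothetical, and keeping the first occurrence of each key is an assumed policy rather than documented TM behaviour.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

def unique_by(records: Iterable[Dict[str, Any]],
              key_fields: Tuple[str, ...]) -> Iterator[Dict[str, Any]]:
    """Yield only the first record seen for each combination of key values."""
    seen = set()
    for record in records:
        # Build the composite key from the chosen attribute(s); values must be hashable.
        key = tuple(record.get(field) for field in key_fields)
        if key in seen:
            continue  # duplicate key: skip rather than transform it again
        seen.add(key)
        yield record

source = [
    {"customer_id": 1, "email": "a@example.com", "name": "Ann"},
    {"customer_id": 2, "email": "b@example.com", "name": "Bob"},
    {"customer_id": 1, "email": "a@example.com", "name": "Ann B."},
]
target = list(unique_by(source, ("customer_id", "email")))
print([r["name"] for r in target])  # ['Ann', 'Bob']
```

Because it is a generator holding only the set of keys seen so far, the filter can process a large source in a single pass without loading every record into memory.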
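The string-manipulation and conditional-logic capabilities can be pictured with two small derivation rules. This is a minimal Python sketch, not TM's syntax: a dictionary lookup with a default stands in for a Case statement, and an if/elif/else chain stands in for If/Else. `map_status`, `credit_band`, the status codes and the thresholds are all hypothetical.

```python
# Hypothetical lookup table playing the role of a Case statement.
STATUS_CASES = {"A": "ACTIVE", "ACT": "ACTIVE", "C": "CLOSED", "CLS": "CLOSED"}

def map_status(raw: str) -> str:
    """Case-style mapping: normalise the string, then look it up with a default."""
    code = raw.strip().upper()  # string manipulation: trim whitespace, upper-case
    return STATUS_CASES.get(code, "UNKNOWN")  # default result, like CASE ... ELSE

def credit_band(balance: float) -> str:
    """If/Else-style rule deriving a target value from a numeric source value."""
    if balance < 0:
        return "OVERDRAWN"
    elif balance < 1000:
        return "LOW"
    else:
        return "STANDARD"

print(map_status(" act "), map_status("x"), credit_band(-5.0), credit_band(2500))
# ACTIVE UNKNOWN OVERDRAWN STANDARD
```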