Metadata management tool GTL Metadata Center (MDC) is web based application supporting development and maintenance of corporate data warehouses (DW). Main tasks include storing validated metadata of table/extract structures and generating outputs incl. SQL, ELT routines, ETL tools XML or validation logs. The structures can be organized into different layers.
Metadata management tool – main layers
- Extract/view definitions.
- Data mapping definitions.
- Table definitions.
- Output generation.
- Validation and generation scripts maintenance.
- Change management support.
Active directory integration
Since the beginning MDC has been designed as a web based (thin client) application, thus enabling multiple users or teams to work together. It integrates with Active Directory (AD) and its privileges are mapped to AD user groups. Access to many areas incl. editing, output generation, administration, can be restricted by MDC-to-AD privileges. Minimal set of MDC roles should be: application administrator, BI architect, data modeler, BI analyst, and ETL developer.
Every system delivering data into DW defines either interfaces or data extracts. MDC can import definitions from Excel spreadsheets, Sybase PowerDesigner, SQL files, via import forms or web services API. The same definitions can be exported as Excel spreadsheet or custom generation scripts can be used (see later).
Other operations cover definition creation (including field definition and DB technical info), version control, validation, browsing, compare, labeling for change management and output generation (i.e. XML code for Informatica PowerCenter, SQL data definition, SQL data manipulation, check plans, paramfiles). The same form serves for view and join index definition.
Data mapping definition
MDC supports definition of data mappings for tables or extracts defined in DW. Such mappings contain simple or more complex transformation rules. MDC allows such definitions to import from either SQL or Excel spreadsheets, or to create them from scratch. The same as for extracts, the mapping definitions are under version control, can be validated, displayed, compared, labeled for change management, and used for output generation. Data mapping form can be used for complex views definitions as well.
Table definitions can be imported from PowerDesigner either directly from MDC or from PowerDesigner calling web services from VBScript. Such definitions are imported into several layers as read-only but can be used as data mapping targets. Version control, version compare and output generation are available as well.
All the validation and generation scripts are written and uploaded via script administration forms in MDC and are written in easy-to-understand custom language, which resembles VB script. Syntax highlighter and syntax checker is available for PSPad text editor. That allows the administrator to create the script in PSPad and upload it in.
MDC provides scripting engine, which is used for extract/table/mapping validation. Scripts are fully customizable and among others validate logical relationships between properties or in more complex cases to validate defined properties against remote databases. To ensure best performance, the scripts are compiled. The scripts are maintained by application administrator and used automatically by the application.
MDC scripting engine is used for output generation as well. This is the most advantages feature of the application. The same as for validation scripts, generation scripts are fully customizable, pre-compiled, and their access permission restricted. The main outputs at the moment are:
- Data definition SQL (create table statements, create statistics, create index, create views, etc.).
- Historization SQL.
- SQL queries.
- Sources/targets definitions for Informatica PowerCenter.
- Paramfiles and check plans for Informatica PowerCenter.
- Mapping and workflows for Informatica PowerCenter.
Version control, locking
All major objects are part of MDC version control. The application allows to defined either permanent or working version. Every version is stored with auditing information: version author, version timestamp, version number, change management label, and archive timestamp. While being edited, the version is locked for other user, it can only be view in read-only mode or used for output generation.
MDC supports definition of derived files called ETL files. These files are based on extract definitions, table definitions or the combination of the two. They are usually used for enriching the extracts with target primary key.
MDC supports process management by dynamic fully customizable forms. These provide support for SLA, new system additions, and change management. Functionality includes version control, compare, list of responsible people, dynamic forms, custom views, output generation, etc.
Also all major objects can be labeled to support release management. Mappings can be also locked by development status, which helps transferring mapping analysis between analysts and developers.
MDC API at the moment covers two major areas: table definition imports and running generation scripts.