Data Lineage Builder
Data lineage comprises many elements and the relationships between them. Some of this metadata can be loaded from external metadata management tools. For example, it may be possible to load at least partial catalogues of processes and datastores from an architecture repository, or directly from Extract-Transform-Load (ETL) metadata. It may also be possible to load metadata about physical data from database schemas.
However, these sources of metadata are very unlikely to provide a complete picture of end-to-end data lineage, so a lineage specialist using Architector needs to be able to fill the gaps.
The Lineage Builder component of Architector provides an interactive tool for manually creating or patching data lineage. Data inputs and outputs are specified one process at a time, and new datastores, processes and physical data can be created where required.
In the example above, the Calculate CCF process takes data inputs from three data sources and writes data back to one of them (the APAC Consolidation System). To change any of the data inputs, the lineage specialist can list all the data elements of a data source and select the appropriate ones.
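To make the idea of per-process inputs and outputs concrete, here is a minimal sketch of how such lineage metadata might be modelled in memory. It is not Architector's data model or API. The class and method names are hypothetical. Only the Calculate CCF process and the APAC Consolidation System come from the example. The other two sources ("Source B", "Source C") and all data element names are invented placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Datastore:
    """A datastore and the physical data elements it holds (hypothetical model)."""
    name: str
    elements: list[str] = field(default_factory=list)


@dataclass
class Process:
    """A process; each input/output is a (datastore name, data element) pair."""
    name: str
    inputs: set[tuple[str, str]] = field(default_factory=set)
    outputs: set[tuple[str, str]] = field(default_factory=set)

    @staticmethod
    def _check(store: Datastore, selected: list[str]) -> None:
        # Reject any element that is not in the datastore's list of data elements.
        unknown = [e for e in selected if e not in store.elements]
        if unknown:
            raise ValueError(f"{store.name} has no data elements {unknown}")

    def set_inputs(self, store: Datastore, selected: list[str]) -> None:
        """Replace this process's inputs from one datastore with a new selection."""
        self._check(store, selected)
        self.inputs = {(s, e) for s, e in self.inputs if s != store.name}
        self.inputs.update((store.name, e) for e in selected)

    def add_outputs(self, store: Datastore, selected: list[str]) -> None:
        """Record data elements that this process writes to a datastore."""
        self._check(store, selected)
        self.outputs.update((store.name, e) for e in selected)

    def sources(self) -> set[str]:
        """Names of all datastores this process reads from."""
        return {s for s, _ in self.inputs}


# Hypothetical datastores: only the APAC Consolidation System is named in the text.
apac = Datastore("APAC Consolidation System", ["exposure", "undrawn_limit", "ccf"])
source_b = Datastore("Source B", ["facility_type"])
source_c = Datastore("Source C", ["product_code", "risk_grade"])

calc_ccf = Process("Calculate CCF")
calc_ccf.set_inputs(apac, ["exposure", "undrawn_limit"])
calc_ccf.set_inputs(source_b, ["facility_type"])
calc_ccf.set_inputs(source_c, ["product_code", "risk_grade"])
calc_ccf.add_outputs(apac, ["ccf"])  # writes back to one of its own sources

# Patch the lineage: narrow the selection of inputs taken from Source C.
calc_ccf.set_inputs(source_c, ["risk_grade"])
```

This sketch records inputs at the level of individual data elements rather than whole datastores. That choice lets a selection be changed element by element, and it allows a process to write back to one of its own sources without any special handling.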
Developing data lineage metadata is a specialist task. A lineage team that tries to piece together data lineage using static diagrams and spreadsheets will quickly realise that this approach has serious limitations: it does not scale and generally produces poor results.
Architector’s metadata repository approach takes a little more effort to set up initially, but it then offers many advantages over the static diagram/spreadsheet approach:
- Scales as far as necessary, e.g. to many thousands of processes, datastores and so on
- Drives consistency in lineage documentation, which makes lineage artefacts much more usable
- Manages metadata quality, driving the completeness and accuracy of lineage metadata, which in turn builds confidence in lineage diagrams
- Supports ad-hoc and focused analysis across lineage metadata, enabling root-cause and impact analysis
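Root-cause and impact analysis over lineage metadata amounts to walking the lineage graph upstream or downstream from a chosen node. The following is a minimal sketch under stated assumptions. Each process is mapped to the datastores it reads and writes, and process and datastore names share one namespace. All names are invented for illustration, and none of this reflects Architector's actual data model. The sketch also includes a simple completeness check of the kind that metadata quality management might run.

```python
from collections import defaultdict, deque

# Hypothetical lineage records: process -> (input datastores, output datastores).
LINEAGE = {
    "Load Trades": ({"Trading System"}, {"Data Warehouse"}),
    "Calculate CCF": ({"Data Warehouse", "Reference Data"}, {"Risk Mart"}),
    "Publish Report": ({"Risk Mart"}, {"Regulatory Report"}),
    "Archive Trades": (set(), {"Archive"}),  # no inputs recorded yet: a gap
}


def build_graph(lineage):
    """Return downstream and upstream adjacency maps over datastores and processes."""
    down, up = defaultdict(set), defaultdict(set)
    for proc, (ins, outs) in lineage.items():
        for store in ins:
            down[store].add(proc)
            up[proc].add(store)
        for store in outs:
            down[proc].add(store)
            up[store].add(proc)
    return down, up


def reachable(start, adjacency):
    """Breadth-first search; the visited set guards against cycles such as write-backs."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report only other nodes, even if start lies on a cycle
    return seen


def incomplete_processes(lineage):
    """Flag processes recorded without any inputs or without any outputs."""
    return sorted(p for p, (ins, outs) in lineage.items() if not ins or not outs)


down, up = build_graph(LINEAGE)
impact = reachable("Data Warehouse", down)        # everything downstream
root_causes = reachable("Regulatory Report", up)  # everything upstream
gaps = incomplete_processes(LINEAGE)
```

Traversing the same graph in opposite directions gives both kinds of analysis from one set of lineage records. This is one reason a single consistent repository is easier to analyse than a collection of static diagrams.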