Harnessing Technology for Data Lineage
Architector uses leading-edge technology to simplify and extend data lineage management. The technologies involved include cloud computing, natural language processing (NLP), graph databases, and (soon) Hadoop.
Architector can be installed on any cloud computing network, which brings maximum scalability, resilience, and accessibility. The system can also be installed on local servers, and both Windows and Unix environments are supported.
Natural Language Processing
Natural language processing (NLP) methods are used to parse data definitions and understand their semantics. Architector also provides interactive NLP utilities that a definition analyst can use while working on definitions. For example, we provide online access to WordNet (a lexical database maintained by Princeton University).
Architector can measure the quality of definitions against language-based rules. For example, it can test that a definition is a complete sentence and that it does not contain swear words. Architector has about twenty such NLP rules built in, which can be switched on and off as required.
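To make the idea of switchable, language-based rules concrete, here is a minimal sketch in Python. It is not Architector's implementation: the rule names, the `Rule` class, the `assess` function, and the tiny `BLOCKED_WORDS` list are all hypothetical, and the "complete sentence" test is a crude heuristic rather than real grammatical parsing.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical placeholder list; a real rule would use a much fuller vocabulary.
BLOCKED_WORDS = {"damn", "crap"}


def is_complete_sentence(definition: str) -> bool:
    """Crude heuristic: capital first letter, terminal punctuation, 3+ words."""
    text = definition.strip()
    return (
        bool(text)
        and text[0].isupper()
        and text[-1] in ".!?"
        and len(text.split()) >= 3
    )


def has_no_blocked_words(definition: str) -> bool:
    """True when none of the words in the definition are on the blocked list."""
    words = set(re.findall(r"[a-z']+", definition.lower()))
    return words.isdisjoint(BLOCKED_WORDS)


@dataclass
class Rule:
    name: str
    check: Callable[[str], bool]
    enabled: bool = True  # rules can be switched on and off


# Assumed default rule set, for illustration only.
DEFAULT_RULES = [
    Rule("complete_sentence", is_complete_sentence),
    Rule("no_blocked_words", has_no_blocked_words),
]


def assess(definition: str, rules: List[Rule] = DEFAULT_RULES) -> List[str]:
    """Return the names of the enabled rules that the definition fails."""
    return [r.name for r in rules if r.enabled and not r.check(definition)]
```

Under these assumptions, `assess("customer id")` reports a `complete_sentence` failure, while a definition such as "A customer is a party that buys goods." passes both rules. Setting `enabled=False` on a rule removes it from the assessment without deleting it.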
Here is the main definition assessment panel, which exposes several NLP capabilities:
And here is a summary of some lexical, grammatical, and consistency errors that Architector has automatically discovered for a definition, using NLP:
Data lineage is inherently graphical in nature, and it is mainly a directed graph. In other words, data flows between nodes in a certain direction. Architector stores lineage metadata in a graph database and lets users interact with it online. A user can run graph queries to analyse relationships between data, definitions, processes, systems, datastores, issues, owners, workflows, etc. Graph databases execute such queries very quickly, because relationships are stored directly and followed from node to node, with no joins involved. For more information about how Architector utilises graph technology, contact us.
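As an illustration of lineage as a directed graph, the sketch below stores flows as adjacency sets and answers a typical lineage question, "what is downstream of this node?", by following edges rather than joining tables. It is plain Python rather than a graph database, and the `LineageGraph` class, its methods, and the node names are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Set


class LineageGraph:
    """A toy directed graph: an edge source -> target means data flows that way."""

    def __init__(self) -> None:
        self._out: Dict[str, Set[str]] = defaultdict(set)

    def add_flow(self, source: str, target: str) -> None:
        """Record that data flows from source to target."""
        self._out[source].add(target)

    def downstream(self, start: str) -> Set[str]:
        """Return every node reachable from start (breadth-first traversal)."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self._out.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}


# Hypothetical lineage: a source table feeding staging, a warehouse, and reports.
graph = LineageGraph()
graph.add_flow("crm.customer", "staging.customer")
graph.add_flow("staging.customer", "warehouse.dim_customer")
graph.add_flow("staging.customer", "report.audit")
graph.add_flow("warehouse.dim_customer", "report.churn")

impacted = graph.downstream("staging.customer")
```

Here `impacted` contains the warehouse table and both reports, which is the kind of impact analysis a lineage query supports. Each step of the traversal only looks up a node's outgoing edges, which is the sense in which no joins are involved.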
As well as providing unfettered access to the lineage metadata via the graph database, Architector has several pre-built, interactive views that are easy to use and business-friendly. See the data lineage page for more details.
We are working to harness the power of Hadoop, for example, to speed up the assessment of definition quality and to allow even deeper quality assessments than we can make today. More news on this will be available via our blog early in 2016!