DataMingler: A Novel Approach to Data Virtualization (ACM SIGMOD 2021)

Presenter: Damianos Chatziantoniou
Date: 03 June 2021

Abstract

A Data Virtual Machine (DVM) is a novel graph-based conceptual model, similar to the entity-relationship model, representing existing data (persistent, transient, derived) of an organization. A DVM can be built quickly and in an agile manner, offering schematic flexibility to data engineers. Data scientists can visually define complex dataframe queries in an intuitive and simple manner, which are evaluated within an algebraic framework. A DVM can be easily materialized in any logical data model and can be “reoriented” around any node, offering a “single view of any entity”. In this paper, we demonstrate DataMingler, a tool implementing DVMs. We argue that DVMs can have a significant practical impact in analytics environments.
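
To make the idea of "reorienting" a graph-based model around any node more concrete, here is a minimal Python sketch. It is not DataMingler's API and not the paper's algebra. It assumes a toy representation in which nodes are attribute names and each edge stores pairs of values linking two attributes. The class name ToyDVM, its methods, and the sample attributes (custID, name, orderID) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyDVM:
    """Hypothetical toy graph: nodes are attribute names, edges hold value pairs."""

    def __init__(self):
        # edges[(src, dst)] is a list of (src_value, dst_value) pairs.
        self.edges = defaultdict(list)

    def add_pairs(self, src, dst, pairs):
        # Store each pair in both directions so that either attribute
        # can later serve as the key of a dataframe.
        for a, b in pairs:
            self.edges[(src, dst)].append((a, b))
            self.edges[(dst, src)].append((b, a))

    def dataframe(self, key, columns):
        # One row per value of `key`; each column collects the values
        # reachable over the direct edge key -> column.
        rows = defaultdict(lambda: {c: [] for c in columns})
        for col in columns:
            for k, v in self.edges[(key, col)]:
                rows[k][col].append(v)
        return dict(rows)


# Illustrative data only (assumed, not taken from the paper).
dvm = ToyDVM()
dvm.add_pairs("custID", "name", [(1, "Ann"), (2, "Bob")])
dvm.add_pairs("custID", "orderID", [(1, 100), (1, 101), (2, 102)])

# The same graph viewed around two different nodes.
by_customer = dvm.dataframe("custID", ["name", "orderID"])
by_order = dvm.dataframe("orderID", ["custID"])
```

In this sketch the same stored pairs yield a customer-centric view and an order-centric view without restructuring the data, which is the intuition behind a "single view of any entity". A real system would also need multi-hop paths and aggregation over the collected values, which this sketch omits.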