Case Study

Traceable Data Management System

Problem

Traversing data silos and understanding how datasets connect is the biggest hurdle our clients face, and it is the one we solve with TDMS. We were asked to deliver a solution that makes it easy to use and extract data for decision-making without prior knowledge of the data's structure or source.

Solution

In short, the Traceable Data Management System (TDMS) is a robust data management solution that helps our client:

  • Manage the workflows of research lab data
  • Understand the underlying scientific semantics of the data/metadata
  • Gain the operational capability to run data generation and processing
  • Version data
  • Access data, control workflows, etc. through an intuitive front-end browser (example below)

We developed a solution for standardized data provenance management and traceability that enables R&D organizations to keep track of data origin, transformations, ownership, and usage. The main components of the system include: 

  • Traceable Scientific Data Network Browser (TDNB) – exploration of scientific data provenance and context with domain knowledge and science-driven visualizations
  • Traceable Data Capture – automatic discovery agents (for historic data and ongoing locations such as lab instruments), human-curated data entry, add-on component, and API for existing external data processing pipelines and new software development efforts
  • Traceability Data Store – advanced flexible data model, infrastructure & datapoint-level security, built-in DevOps and DataOps
  • Analytics – built-in MLOps and visualization dashboards (both commercial and custom-made), science-aware solutions (cheminformatics, bioinformatics, etc.), algorithm and data versioning (GitLab-style)
  • Platform Apps and Components
    • Pipelining App: an IDE for data pipelines and interpreter/runner of pipelines designed externally, such as AWS Step, Airflow, KNIME
    • Visualization App: replacement for Spotfire/Tableau/Neo4J, core implementation of TDNB concept
    • Search Engine App: replacement for Elastic Search, science-driven context-aware algorithm
    • Parser App: converting files (CSV, PDF, Excel, images, etc.) into structured scientific data
    • Lists management (grouping of data into named lists that become meta-data)
    • Hardware management (such as off-the-shelf connectors for specific instrument lifecycles)
  • Scientific R&D Apps and Workflows
    • Requestor App
    • Data loader & QC agents
    • Experiment design
    • QSAR
    • Data packaging (standardized datasets)
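
To make the provenance idea above more concrete (tracking data origin, transformations, ownership, and usage), here is a minimal Python sketch. It is an illustration only and not the TDMS data model: the `ProvenanceRecord` class, its field names, the `lineage` helper, and the sample dataset names are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance entry for one dataset (not the TDMS schema)."""
    dataset_id: str
    origin: str                      # instrument or system that produced the data
    owner: str                       # team or person responsible for the data
    derived_from: List[str] = field(default_factory=list)  # parent dataset ids
    transformation: Optional[str] = None                   # step that produced it
    used_by: List[str] = field(default_factory=list)       # downstream consumers


def lineage(records: Dict[str, ProvenanceRecord], dataset_id: str) -> List[str]:
    """Return all ancestors of a dataset, nearest first (breadth-first walk)."""
    seen = set()
    ordered = []
    queue = deque(records[dataset_id].derived_from)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        ordered.append(current)
        parent = records.get(current)
        if parent is not None:
            queue.extend(parent.derived_from)
    return ordered


# Made-up example: raw instrument data -> cleaned data -> model input.
records = {
    r.dataset_id: r
    for r in [
        ProvenanceRecord("raw", origin="plate-reader-01", owner="lab-a",
                         used_by=["cleaned"]),
        ProvenanceRecord("cleaned", origin="qc-pipeline", owner="lab-a",
                         derived_from=["raw"], transformation="outlier removal",
                         used_by=["model_input"]),
        ProvenanceRecord("model_input", origin="feature-pipeline",
                         owner="data-science", derived_from=["cleaned"],
                         transformation="feature extraction"),
    ]
}

print(lineage(records, "model_input"))  # ['cleaned', 'raw']
```

Storing only the parent links per dataset keeps each record small, and a traversal can still answer "where did this dataset come from?" without any knowledge of the underlying file structure.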

Outcome

TDMS Platform Apps help you answer business questions like:

  • How much data am I holding? What’s my data footprint?
  • How is the data connected? Who/What is generating it?
  • How is it used?
  • What kind of data do I have?
  • What is the replication/redundancy of that data?
  • What is the rate of data growth?
  • What’s the opportunity to save cost on data storage?
  • What kind of data do I need to keep, or delete? What should be my organizational data policies? How are they applied?
  • How does data flow into and out of SaaS?
  • Who can access datasets and datapoints? Who has accessed them in the past, and how?
  • How many data personnel do I need?
  • What software version are we using to analyze that dataset (infrastructure metadata)?

The platform logic is AI-driven. It discovers and provides answers to these questions using metadata drawn from different data islands (data files, infrastructure metadata, pipeline parameters, SaaS data).
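
As a rough illustration of how file metadata alone can answer questions such as "What's my data footprint?" and "What is the replication/redundancy of that data?", the Python sketch below summarizes a small set of metadata records. The record fields (`path`, `size_bytes`, `sha256`), the sample values, and the `footprint_summary` function are assumptions made for this example. They do not describe how TDMS collects or stores metadata, and the sketch uses plain aggregation rather than any AI technique.

```python
from collections import defaultdict

# Hypothetical metadata records, as a discovery agent might report them.
files = [
    {"path": "lab_a/run1.csv", "size_bytes": 1000, "sha256": "aaa"},
    {"path": "archive/run1_copy.csv", "size_bytes": 1000, "sha256": "aaa"},
    {"path": "lab_b/run2.csv", "size_bytes": 2500, "sha256": "bbb"},
]


def footprint_summary(records):
    """Summarize total size, unique size, and duplicated files by content hash."""
    total = sum(r["size_bytes"] for r in records)

    # Group files that share a content hash: these are copies of the same data.
    by_hash = defaultdict(list)
    for r in records:
        by_hash[r["sha256"]].append(r)

    # Count each distinct content hash once to get the de-duplicated footprint.
    unique = sum(group[0]["size_bytes"] for group in by_hash.values())
    duplicates = {
        digest: [r["path"] for r in group]
        for digest, group in by_hash.items()
        if len(group) > 1
    }
    return {
        "total_bytes": total,
        "unique_bytes": unique,
        "redundant_bytes": total - unique,
        "duplicates": duplicates,
    }


summary = footprint_summary(files)
print(summary["total_bytes"], summary["unique_bytes"], summary["redundant_bytes"])
# 4500 3500 1000
```

In this made-up sample, the redundant bytes give a first estimate of potential storage savings, and the duplicate groups show which locations hold copies of the same data.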

The solution feeds into our clients’ existing data architecture.