INSIGHTS
Case Study

Building a New CLI Tool and Establishing CI/CD Infrastructure for Data Science Workflow Support

Problem

Our client had an internal machine learning (ML) training server where data science team members used a command line interface (CLI) to communicate with the network and train ML models for clinical enrollment optimization, preclinical research optimization, informatics, computational tasks, and computer vision tasks. Their TypeScript CLI had become bloated with unused features, and they were relying on manual checks and balances for many of their projects and repositories. The client asked us to build a lighter application and streamline their workflow so that the team could work efficiently on ML modeling, spend less time manually launching new ML training runs, and make use of CI/CD.

Solution

We ported the essential functionality of the legacy TypeScript CLI to a new CLI written in Python, producing a leaner application that met the needs of the data science team while keeping backward compatibility for other teams. We also set up core infrastructure to move the client’s team from running processes manually to relying on CI/CD in Bitbucket. We provided support, documentation, and training to help data scientists and non-computational users adjust to the new workflow.

Outcome

These changes allowed the data science team to streamline their workflow, focus on modeling, and align with modern best practices across the industry.