
Experiment Infrastructure at Project AGI

Experiment Setup Overview

It’s such a joy to be able to test an idea directly, without wrestling with the tools. We recently developed an experimental setup which, so far, looks like it will do just that. I’m excited about it and hope it can help you too, so here it is. We’ll go through why we created another framework, and how each module in the experiment setup works. But if you’re in a hurry, here’s a summary!


What: Experimental infrastructure for Machine Learning using Tensorflow
Why: Allows you to test new algorithms efficiently
Tell me more:

  • You can write your algorithm
  • Set up a parameter sweep easily in a json file
  • Hit Build on Jenkins to run your experiment, a sweep of parameters, on any appropriately set-up compute, including the cloud or your local machine
  • Go to a web page with a table of your experiments and high-level metrics giving feedback on performance
  • Drill down on any experiment in the appropriate TensorBoard to look at detailed information

Tools: Jenkins, MLflow, TensorFlow/TensorBoard, run-framework (Project AGI Python runner), some compute
Software: github/project-agi/run-framework

Why create yet another framework?

We were looking for an end-to-end solution to manage compute, launch experiments and allow us to easily track, analyse and organise them. Our first objective was to evaluate existing solutions to avoid spending effort developing a custom one. While there are many great machine learning platforms, such as RiseML, that seem quite promising, none of them fit exactly what we were looking for. For instance, some platforms only use cloud compute services, which would make it difficult or impossible to add local compute. Others had no experiment tracking capability. So we decided to design something that suits our requirements, while also utilising as many existing tools as possible and simply bringing them all together to form a solution.

Continuous Integration – Jenkins

Jenkins is a popular automation tool for building pipelines and continuous integration. The pipeline setup is quite straightforward since our experiment runner (run-framework) does most of the heavy lifting. Jenkins simply pulls the latest code from GitHub, parses the build parameters specified by the user, and passes them to run-framework to sync and start the experiment.

A Jenkins project’s build menu and parameters

The build menu is designed to make it easy for the user to pick a particular project and specify which machine to run the experiment on (local or otherwise). We keep a selection of common experiment configurations on GitHub, but the user is free to override that by providing a custom JSON definition.
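As an illustration, here is how a sweep definition could be expanded into individual runs. The field names are hypothetical, since the actual run-framework JSON schema isn’t shown in this post:

```python
import itertools
import json

# Hypothetical sweep definition; field names are illustrative, not the
# actual run-framework schema.
definition = json.loads("""
{
    "experiment": "sparse-autoencoder",
    "sweep": {
        "learning_rate": [0.001, 0.0001],
        "batch_size": [32, 64]
    }
}
""")

def expand_sweep(sweep):
    """Expand a dict of parameter lists into one dict per combination."""
    keys = sorted(sweep)
    for values in itertools.product(*(sweep[k] for k in keys)):
        yield dict(zip(keys, values))

runs = list(expand_sweep(definition["sweep"]))
# 2 learning rates x 2 batch sizes -> 4 parameter combinations
```

Expanding the cross product of parameter lists like this is what turns one JSON file into a whole sweep of runs.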



Experiment Runner – run-framework

run-framework was originally developed to work alongside AGIEF, our Java-based experimental framework, and was adapted to also work with our new TensorFlow and Python experimental framework. Its purpose is simply to parse the experiment definition from a JSON file, establish a connection with local or cloud compute (setting it up if necessary), and finally sync and execute the experiment remotely. It also does some housekeeping after the experiment completes, uploading experiment artefacts to S3 and ensuring the compute, particularly cloud compute, is shut down to save resources.
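That flow can be sketched roughly as follows. The hostnames, paths and commands here are hypothetical, and the real tool also handles cloud provisioning, artefact uploads and shutdown:

```python
def build_commands(host, exp_def_path, remote_dir="~/experiments"):
    """Return the shell commands a runner like run-framework might execute.

    Purely illustrative: the real run-framework also provisions cloud
    compute, uploads artefacts to S3 and shuts machines down afterwards.
    """
    # 1. Sync local code and the experiment definition to the compute.
    sync = ["rsync", "-az", ".", f"{host}:{remote_dir}"]
    # 2. Execute the experiment remotely against the JSON definition.
    run = ["ssh", host, f"cd {remote_dir} && python run.py --def {exp_def_path}"]
    return sync, run

sync_cmd, run_cmd = build_commands("gpu-box", "defs/sweep.json")
```

Keeping the sync step separate from the execute step is what lets the same runner target a local machine, a lab box or a freshly provisioned cloud instance.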

Our TensorFlow framework was designed to facilitate our research with quick experimentation and the ability to add new algorithms to test. The modular framework includes:

  • Component: a representation of the model/algorithm, which can also be a composite component with multiple sub-components that are trained in parallel
  • Workflow: sets up the dataset pipeline, training and evaluation, and any related experiment needs
  • Dataset: a simple representation of the datasets we use, powered by the tf.data Dataset API

The framework is designed to be very flexible and configurable, so anyone can create their own components, datasets or workflows that suit their experiment needs. This also allows us to easily run and configure any type of experiment via Jenkins.
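A minimal sketch of those three abstractions might look like this. The class names come from the post, but the interfaces are assumptions (the real framework builds on TensorFlow):

```python
class Dataset:
    """Wraps a data source; the real framework uses the tf.data Dataset API."""
    def __init__(self, samples):
        self.samples = samples

class Component:
    """A model/algorithm; a composite also trains its sub-components."""
    def __init__(self, name, sub_components=None):
        self.name = name
        self.sub_components = sub_components or []
        self.steps = 0

    def train_step(self, batch):
        for sub in self.sub_components:
            sub.train_step(batch)   # composite case: train children too
        self.steps += 1             # stand-in for a real parameter update

class Workflow:
    """Wires a dataset to a component and drives training."""
    def __init__(self, component, dataset):
        self.component = component
        self.dataset = dataset

    def run(self, num_steps):
        for i in range(num_steps):
            batch = self.dataset.samples[i % len(self.dataset.samples)]
            self.component.train_step(batch)

model = Component("autoencoder",
                  sub_components=[Component("encoder"), Component("decoder")])
Workflow(model, Dataset([[0, 1], [1, 0]])).run(4)
```

The point of the separation is that a new algorithm only needs a new Component, and a new kind of experiment only needs a new Workflow; everything else is reused.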


Experiment Tracking – MLflow

MLflow is a relatively new Python package by Databricks for tracking and organising experiments. We were looking for a tool that would neatly organise parameter sweeps without being too intrusive in our codebase, and MLflow fit the bill.

An overview of the runs in a single experiment

Each run logs hyperparameters, metrics and other artifacts (such as exported data) and is tied to a single experiment, which is incredibly useful for analysing the results of large parameter sweeps. MLflow provides a bird’s eye view of an experiment’s runs and allows you to filter them by metrics and parameters. You can drill down to a specific run for more information.
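The kind of filtering described above can be illustrated with plain Python. In the real setup it is MLflow’s log_param/log_metric calls and its UI that record and query this; the parameter and metric values below are made up:

```python
# Each dict stands in for one MLflow run within a single experiment.
runs = [
    {"params": {"learning_rate": 0.001},  "metrics": {"accuracy": 0.92}},
    {"params": {"learning_rate": 0.01},   "metrics": {"accuracy": 0.85}},
    {"params": {"learning_rate": 0.0001}, "metrics": {"accuracy": 0.90}},
]

def filter_runs(runs, metric, threshold):
    """Return runs whose metric meets the threshold, best first."""
    hits = [r for r in runs if r["metrics"].get(metric, 0) >= threshold]
    return sorted(hits, key=lambda r: r["metrics"][metric], reverse=True)

best = filter_runs(runs, "accuracy", 0.9)
```

This is the bird’s-eye-view workflow in miniature: scan a whole sweep at once, then drill into the runs that cleared the bar.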


Visualisation – TensorBoard

TensorBoard is a great visualisation tool for TensorFlow, and for other frameworks as well. We use it as part of our workflow to better understand, improve and debug our models. TensorBoard summary events mirror the organisation of experiments and runs in MLflow, which makes it easy to find the corresponding visualisations and metrics for a particular run.
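One simple way to achieve that mirroring is a shared directory convention. This particular layout is an assumption; only the idea of matching MLflow’s experiment and run names comes from the post:

```python
import os

def summary_dir(root, experiment, run_id):
    """Build the log directory for one run's TensorBoard event files,
    named to line up with the MLflow experiment and run."""
    return os.path.join(root, experiment, run_id)

log_dir = summary_dir("logs", "sparse-autoencoder", "run-003")
# A summary writer pointed at log_dir keeps each run's events separate;
# running `tensorboard --logdir logs` then shows every experiment and
# run side by side, with names matching those in MLflow.
```

With the same experiment and run identifiers on both sides, jumping from an MLflow run to its TensorBoard curves is just a matter of matching names.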

Future Work

We found that this setup works well with our research and experimentation workflow, but it’s just an initial version that we will continue to improve. Some concrete improvements we are working on right now focus on the overall user experience. This includes improving the links between Jenkins, MLflow and TensorBoard, to make it easier to go from a build to its metrics and visualisations. We also plan to make further use of cloud storage by uploading TensorBoard summaries to it, and to set up a central MLflow server to track experiments executed on multiple machines.

We hope to share more with the community through open-source projects and more blog posts on the topic. We would also love to hear from the community about their research and experimentation workflows.
