We all know how painful keeping track of your machine learning experiments can be.
You train a bunch of models of different flavours (Random Forests, XGBoost, Neural Networks, etc.).
For each model, you explore a range of hyper-parameters. Then you compute the performance metrics on some test data.
Sometimes you change the training data by adding or removing features.
At other times, you work in a team and have to combine your results with those of other data scientists...
How do you manage these experiments so that they are easily traceable and therefore reproducible? MLflow is perfectly suited to this task.
To learn more about MLflow, watch the video tutorial.
Here's what I'll discuss:
- Setting up MLflow locally to track some machine learning experiments I performed on a dataset
- For each model fit, using MLflow to track (see the sketch after this list):
  - metrics
  - hyper-parameters
  - source scripts executing the run
  - code version
  - notes & comments
- Comparing different runs through the MLflow UI
- Setting up a remote tracking server on AWS
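
To make the tracking items above concrete, here is a minimal sketch of a single tracked run. It uses an illustrative scikit-learn model and dataset, not the exact code from my experiments. Parameters, metrics and notes are logged explicitly; when a run is launched from a script under git, MLflow also records the script name and commit hash automatically as system tags (`mlflow.source.name`, `mlflow.source.git.commit`).

```python
import mlflow
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Illustrative data and hyper-parameters, not the ones used in the post
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
params = {"n_estimators": 100, "max_depth": 5}

with mlflow.start_run(run_name="random-forest-baseline"):
    # Hyper-parameters
    mlflow.log_params(params)

    model = RandomForestClassifier(**params).fit(X_train, y_train)

    # Metrics on the held-out test set
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Notes & comments (this tag feeds the Notes field in the MLflow UI)
    mlflow.set_tag("mlflow.note.content", "Baseline random forest, all features")
```

Running `mlflow ui` in the same directory and opening http://localhost:5000 then lets you browse and compare the runs side by side.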
To reproduce my experiments and set up MLflow on AWS, have a look at my GitHub repo.
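
Once a tracking server is running on AWS, the only change on the client side is pointing MLflow at the remote URI. This is just a sketch; the host name below is a placeholder, not an address from the repo.

```python
import mlflow

# Placeholder address: replace with your own tracking server URL on AWS
mlflow.set_tracking_uri("http://ec2-xx-xx-xx-xx.compute.amazonaws.com:5000")
mlflow.set_experiment("my-experiment")  # subsequent runs are logged to the remote server
```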
Happy coding 💻