Introduction to AutoML with MLBox

Posted on Tue 08 October 2019 in AutoML • Tagged with MLBox


Today's post is very special. It's written in collaboration with Axel de Romblay, the author of the MLBox AutoML package, which has gained a lot of popularity in recent years.
If you haven't heard about this library, go and check it out on GitHub: it offers interesting features, is gaining in maturity and is now under active development.
In this post, we'll show you how you can easily use it to train an automated machine learning pipeline for a classification problem. We'll start off by loading and cleaning the data and removing drift, then launch a strong pipeline of accelerated optimization and generate predictions.


Continue reading

Introduction to Neural Networks and Deep Learning from scratch

Posted on Sat 31 August 2019 in Deep Learning Introduction • Tagged with Deep Learning, Tutorial, workshop, slides, presentation


If you want to understand how neural networks work behind the scenes and debug the back-propagation algorithm step by step yourself, these slides should be a good starting point.
We will cover popular deep learning applications, the concept of the artificial neuron and how it relates to the biological one, the perceptron and the multi-layer perceptron. We'll also dive into activation functions and loss functions, and formalize the training of a neural net via the back-propagation algorithm.
In the last part, you'll learn how to code a fully functioning, trainable neural network from scratch, in pure Python only, with no frameworks involved.
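To give a flavour of what "from scratch" means here, below is a minimal sketch of a one-hidden-layer network trained with back-propagation on the XOR problem. It is not the code from the slides: the function names, the choice of XOR, the sigmoid activation, the squared-error loss and the hyperparameters are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    # Hidden layer: one sigmoid neuron per row of W1.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # Output layer: a single sigmoid neuron.
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def mean_loss(data, params):
    # Mean squared error over the whole dataset.
    W1, b1, W2, b2 = params
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2
               for x, t in data) / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=10000, seed=0):
    # Hypothetical hyperparameters, chosen only for this toy example.
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    initial_loss = mean_loss(data, (W1, b1, W2, b2))
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            # Backward pass: derivative of (y - t)^2 through the output sigmoid.
            d_out = 2.0 * (y - t) * y * (1.0 - y)
            # Propagate the error to each hidden neuron (chain rule),
            # using the output weights before they are updated.
            d_hid = [d_out * W2[j] * h[j] * (1.0 - h[j])
                     for j in range(n_hidden)]
            # Gradient descent step on every weight and bias.
            for j in range(n_hidden):
                W2[j] -= lr * d_out * h[j]
                b1[j] -= lr * d_hid[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hid[j] * x[i]
            b2 -= lr * d_out
    params = (W1, b1, W2, b2)
    return params, initial_loss, mean_loss(data, params)


XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
       ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
params, initial_loss, final_loss = train(XOR)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Printing the loss before and after training is a cheap sanity check when debugging back-propagation by hand: if the loss does not go down, one of the derivative terms is usually wrong.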


Continue reading

Automate the diagnosis of Knee Injuries with Deep Learning part 3: Interpret models' predictions

Posted on Wed 21 August 2019 in Medical Imaging • Tagged with MRI, Medical Imaging, MRNet, Convolutional Neural Networks, PyTorch, interpretability, Class Activation Map, CAM


In this post, we will focus on interpretability to assess what the ACL tear detector we trained in the previous article actually learnt.
To do this, we'll explore a popular interpretability technique called Class Activation Map (CAM), which applies to convolutional neural networks that have a special architecture. With this method, we'll highlight the discriminative areas the network focuses on before making a prediction when confronted with an image, thus explaining the decision process and building trust.
CAM is also a generic method that can be applied to a variety of computer vision projects. So if you're looking for a way to make your CNNs interpretable, you should read this tutorial and adapt the source code. Let's get started.
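As a rough illustration of the idea, here is a minimal sketch of how a class activation map is usually computed, assuming the common setup where the last convolutional feature maps are globally average-pooled and fed to a single linear layer. The function names and toy values are hypothetical and this is not the PyTorch code from the post: the map for a class is simply the sum of the feature maps, each weighted by that class's weight in the linear layer.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one class.

    feature_maps: list of K maps, each a list of H rows of W floats.
    class_weights: list of K linear-layer weights for the class of interest.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for r in range(height):
            for c in range(width):
                cam[r][c] += weight * fmap[r][c]
    return cam


def normalize(cam):
    """Rescale the map to [0, 1] so it can be overlaid on the image as a heatmap."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Toy example: two 2x2 feature maps and made-up class weights.
maps = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]]]
weights = [2.0, -1.0]
cam = class_activation_map(maps, weights)
heatmap = normalize(cam)
```

In practice the map has the low spatial resolution of the last convolutional layer, so it is typically upsampled to the input image size before being overlaid.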


Continue reading

Automate the diagnosis of Knee Injuries with Deep Learning part 2: Building an ACL tear classifier

Posted on Sun 14 July 2019 in Medical Imaging • Tagged with MRI, Medical Imaging, Computer Vision, MRNet, Convolutional Neural Networks, PyTorch, image classification


In this post, you'll build on the intuitions you gathered about the MRNet data in the previous post. You'll learn how to use PyTorch to train an ACL tear classifier that successfully detects these injuries from MRIs with very high performance. We'll dive into the code and go through various tips and tricks ranging from transfer learning to data augmentation, stacking and handling medical images. You'll also learn about optimization tricks as well as how to organize your code efficiently. If you're a crafty AI engineer who wants to play with code to learn how things work, just keep reading!


Continue reading

Automate the diagnosis of Knee Injuries with Deep Learning part 1: an overview of the MRNet Dataset

Posted on Tue 25 June 2019 in Medical Imaging • Tagged with MRI, Medical Imaging, Computer Vision, MRNet, Convolutional Neural Networks, PyTorch, image classification, Jupyter Widgets


If you are interested in learning about an impactful medical application of artificial intelligence, this series of articles is the one you should be looking at.
My goal is to show you how you can use deep learning and computer vision to assist radiologists in automatically diagnosing severe knee injuries from MRI scans.
To do this, we'll explore the MRNet dataset in this first post. We'll then build a deep learning classification model in PyTorch in the next post and develop an interpretation pipeline in the last one.
By the end, you'll have an overview of a medical imaging application with different components that you can use elsewhere in similar situations.
Let's start.


Continue reading

Overview and benchmark of traditional and deep learning models in text classification

Posted on Tue 12 June 2018 in Sentiment Analysis • Tagged with NLP, CNN, RNN, GRU, transfer learning, deep learning, keras, neural networks, Twitter, GloVe, Bag of words, word ngrams, character ngrams


This article is an extension of a previous one I wrote when I was experimenting with sentiment analysis on Twitter data. Back then, I explored a simple model: a two-layer feed-forward neural network trained with Keras. The input tweets were represented as document vectors resulting from a weighted average of the embeddings of the words composing the tweet. The embedding I used was a word2vec model I trained from scratch on the corpus using gensim. The task was a binary classification, and with this setting I was able to achieve 79% accuracy.
The goal of this post is to explore other NLP models trained on the same dataset and then benchmark their respective performance on a given test set. We'll go through different models, from simple ones relying on a bag-of-words representation to heavier machinery deploying convolutional and recurrent networks. We'll see whether we can score more than 79% accuracy!
Let's investigate!
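For readers who haven't seen the previous article, the document-vector representation mentioned above can be sketched as follows. This is a minimal illustration, not the original code: the function name, the toy embeddings and the per-word weights are hypothetical (the weights could be, for example, tf-idf scores), and words missing from the embedding vocabulary are simply skipped.

```python
def document_vector(tokens, embeddings, weights, dim):
    """Weighted average of the embeddings of the known words in a tweet.

    tokens: list of words in the tweet.
    embeddings: dict mapping a word to its vector (list of `dim` floats).
    weights: dict mapping a word to its weight; defaults to 1.0 if absent.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        if token not in embeddings:
            continue  # out-of-vocabulary words contribute nothing
        w = weights.get(token, 1.0)
        for i, value in enumerate(embeddings[token]):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known word: fall back to the zero vector
    return [value / weight_sum for value in total]


# Toy 2-dimensional embeddings and made-up weights.
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
toy_weights = {"good": 3.0, "movie": 1.0}
vec = document_vector(["good", "movie", "unknown"], toy_embeddings, toy_weights, dim=2)
```

The resulting fixed-size vector is what a feed-forward network can take as input, regardless of how many words the tweet contains.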


Continue reading

Understanding deep Convolutional Neural Networks with a practical use-case in Tensorflow and Keras

Posted on Mon 13 November 2017 in Computer Vision • Tagged with Deep learning, Convolutional Neural Networks, Image Classification, Keras, Tensorflow, AWS, GPU


Convolutional Neural Networks (CNNs) are nowadays the standard go-to technology when it comes to analyzing image data. These are special neural network architectures that perform extremely well on image classification. They are widely used in the computer vision industry and are shipped in different products: self-driving cars, photo tagging systems, face detection security cameras, etc.
The theory behind convnets is beautiful. It attempts to explain and reverse-engineer the vision process. In this article, I'll go through it and explain what CNNs are all about. I'll try to go beyond the hype you see in the mass media and provide a detailed explanation with code snippets and interpretations.
This is also a hands-on guide to setting up a dedicated deep learning machine on AWS and developing an end-to-end CNN model from scratch using Keras and Tensorflow.
By the end of this post, you should have the big picture of CNNs: how they work and how to put them into practice.


Continue reading

Sentiment analysis on Twitter using word2vec and keras

Posted on Thu 20 April 2017 in Sentiment Analysis • Tagged with NLP, word2vec, doc2vec, Deep Learning, Keras, Neural Networks, Twitter


The focus of this post is sentiment analysis. This is a Natural Language Processing (NLP) application I find challenging but enjoyable. It aims at identifying emotional states, reactions and subjective information. It tries to determine the attitude of a speaker with respect to some topic.
If done automatically with high precision and on a large scale, sentiment analysis could be a goldmine for marketers or politicians who want to measure public opinion through social networks.
In this post, I'll show you how I built a machine learning model that classifies tweets with respect to their polarity. Tweets are short and yet capture lots of subjective information. That's why we'll be playing with them.
A few words for those who are ready to dive into the code: I'll be using Python, gensim, the word2vec model and Keras.


Continue reading

How to mine newsfeed data and extract interactive insights in Python

Posted on Wed 15 March 2017 in Topic Mining • Tagged with tf-idf, LDA, Kmeans, Newsapi.org, NLP, Topic mining, Text Clustering, Bokeh


In this tutorial, we'll dive into topic mining. We'll analyze a dataset of newsfeed data extracted from more than 60 sources. We'll show how to process it, analyze it and extract visual clusters from it. We'll be using great Python tools for interactive visualization, topic mining and text analytics.
All the code is available to you to run and test. No bullshit.


Continue reading

How to score 0.8134 in Titanic Kaggle Challenge

Posted on Wed 10 August 2016 in Kaggle • Tagged with Kaggle, Titanic Challenge, Data Science, Tutorial, Data Science Competition, Classification


The Titanic challenge on Kaggle is a competition in which the task is to predict whether a given passenger survived or died, based on a set of variables describing them, such as their age, sex, or passenger class on the boat.
I have been playing with the Titanic dataset for a while, and I have recently achieved an accuracy score of 0.8134 on the public leaderboard.
As I'm writing this post, I am ranked among the top 4% of all Kagglers: more than 4540 teams are currently competing.
This post is the opportunity to share my solution with you.


Continue reading