Overview and benchmark of traditional and deep learning models in text classification

Posted on Tue 12 June 2018 in Deep Learning • Tagged with NLP, CNN, RNN, GRU, transfer learning, deep learning, keras, neural networks, Twitter, GloVe, Bag of words, word ngrams, character ngrams

This article extends a previous one I wrote when I was experimenting with sentiment analysis on Twitter data. Back then, I explored a simple model: a two-layer feed-forward neural network trained with Keras. Each input tweet was represented as a document vector, computed as a weighted average of the embeddings of the words in the tweet. The embeddings came from a word2vec model I trained from scratch on the corpus using gensim. The task was binary classification, and with this setup I reached 79% accuracy.
The goal of this post is to explore other NLP models trained on the same dataset and then benchmark their performance on a given test set. We'll go through different models, from simple ones relying on a bag-of-words representation to heavier machinery using convolutional and recurrent networks. We'll see whether we can score more than 79% accuracy!
Let's investigate!
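To make the document-vector idea from the earlier post concrete, here is a minimal sketch of a weighted average of word embeddings. It is not the code from that post: the function name, the toy embeddings, the toy weights, the choice to normalize by the sum of the weights, and the handling of unknown words are all assumptions made for illustration. In practice the embeddings would come from a trained word2vec model and the weights from something like tf-idf.

```python
from typing import Dict, List


def document_vector(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    weights: Dict[str, float],
    dim: int,
) -> List[float]:
    """Weighted average of the embeddings of the tokens in one document.

    Tokens without an embedding are skipped (an assumption of this sketch).
    Tokens without an explicit weight default to a weight of 1.0.
    """
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # out-of-vocabulary word: ignore it
        w = weights.get(tok, 1.0)
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known words: return the zero vector
    return [x / weight_sum for x in total]


# Toy, hand-made values (hypothetical, not from a trained model).
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
toy_weights = {"good": 3.0, "movie": 1.0}

vec = document_vector(["good", "movie", "unknown"], toy_embeddings, toy_weights, dim=2)
print(vec)
```

The resulting fixed-size vector is what a small feed-forward classifier can take as input, regardless of how many words the tweet contains.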

Continue reading

Sentiment analysis on Twitter using word2vec and keras

Posted on Thu 20 April 2017 in NLP • Tagged with NLP, word2vec, doc2vec, deep learning, keras, neural network, Twitter

The focus of this post is sentiment analysis, a Natural Language Processing (NLP) application I find challenging but enjoyable. It aims to identify emotional states, reactions and subjective information, and to determine the attitude of a speaker toward some topic.
If done automatically, with high precision and at scale, sentiment analysis could be a goldmine for marketers or politicians who want to measure public opinion through social networks.
In this post I'll show you how I built a machine learning model that classifies tweets by polarity. Tweets are short and yet capture a lot of subjective information. That's why we'll be playing with them.
A few words for those who are ready to dive into the code: I'll be using Python, gensim, the word2vec model and Keras.

Continue reading

How to mine newsfeed data and extract interactive insights in Python

Posted on Wed 15 March 2017 in NLP • Tagged with Data science, Python, tf-idf, LDA, Kmeans, Newsapi.org, NLP, Topic mining, Text Clustering, Bokeh

In this tutorial we'll dive into topic mining. We'll analyze a dataset of news feeds extracted from more than 60 sources. We'll show how to process it, analyze it and extract visual clusters from it. We'll be using great Python tools for interactive visualization, topic mining and text analytics.
All the code is available to you to run and test. No bullshit.

Continue reading

Welcome!

Posted on Sat 05 March 2016 in Random, Data Science • Tagged with Machine Learning, Data Science, NLP, Dataviz, Analytics

Welcome to my blog!

My name is Ahmed. I'm a junior data scientist living in France. This blog is a personal project I'm embarking on to present some work I'm developing independently. It'll be about data analytics with a focus on machine learning.

In future posts, I'll write about data science use cases I work on, tools I enjoy using and reading material I find worth sharing.

I don't claim expertise. I'm writing about things I have just learnt. So if you spot a mistake, please don't hesitate to point it out.

Also, if you have any recommendations that could improve this blog, I'm all ears and the comment section is yours.

Hope to see you around.

Continue reading