How it works
This web demo performs image captioning with a visual attention mechanism that highlights the areas of the image the model looks at when generating each token.
The model is implemented in PyTorch and is based on the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention".
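To make the attention step concrete, here is a minimal sketch of soft attention over image regions, in the spirit of the paper. It is not the demo's actual code: the class name `SoftAttention`, the layer names, and all dimensions (a 14x14 grid of 512-dimensional features, a 256-dimensional decoder state) are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class SoftAttention(nn.Module):
    """Additive soft attention over image regions (illustrative sketch only)."""

    def __init__(self, feature_dim, hidden_dim, attn_dim):
        super().__init__()
        # Hypothetical layer names; the demo's real model may differ.
        self.feature_proj = nn.Linear(feature_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, regions, feature_dim), one vector per image region
        # hidden:   (batch, hidden_dim), the decoder state before the next token
        energy = torch.tanh(
            self.feature_proj(features) + self.hidden_proj(hidden).unsqueeze(1)
        )
        scores = self.score(energy).squeeze(-1)   # (batch, regions)
        alpha = torch.softmax(scores, dim=1)      # weights over regions, sum to 1
        # Context vector: attention-weighted average of the region features.
        context = (alpha.unsqueeze(-1) * features).sum(dim=1)
        return context, alpha


torch.manual_seed(0)
attention = SoftAttention(feature_dim=512, hidden_dim=256, attn_dim=128)
features = torch.randn(1, 196, 512)   # assumed 14x14 grid of CNN features
hidden = torch.zeros(1, 256)          # assumed decoder hidden state
context, alpha = attention(features, hidden)

# Reshape the weights to the spatial grid; upsampled to the image size,
# this is the kind of map a demo could overlay as a highlight for one token.
heatmap = alpha.view(1, 14, 14)
```

In a captioning decoder, a step like this would run once per generated token, so each token gets its own weight map over the image regions.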
Code
Coming soon.