Click on an Example or Upload an Image

Try it out!

Drop files (max. 2) here, or click to upload.

About Capttitude

As humans, we often associate a sentiment with an image and express it in the caption, for example by adding an emoji or by a richer choice of words. Adjectives can add an emotional component to nouns, and the resulting Adjective-Noun Pairs (ANPs) can express both the visual content of the image and the associated sentiment. Incorporating adjectives into machine-generated captions is therefore one feasible way of adding an emotional component to the caption.
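The ANP idea can be sketched in a few lines: attach a sentiment-bearing adjective to a noun from a factual caption. The adjective lexicon below is purely illustrative, not taken from DeepSentiBank.

```python
# Minimal sketch of Adjective-Noun Pairs (ANPs): pair a noun from a
# factual caption with a sentiment-bearing adjective.
# NOTE: this toy lexicon is an assumption for illustration only.
ADJECTIVES = {
    "positive": ["beautiful", "happy", "amazing"],
    "negative": ["gloomy", "broken", "sad"],
}

def make_anp(noun: str, sentiment: str) -> str:
    """Return an Adjective-Noun Pair for the given sentiment."""
    adjective = ADJECTIVES[sentiment][0]  # pick the top-ranked adjective
    return f"{adjective} {noun}"

print(make_anp("sky", "positive"))    # -> "beautiful sky"
print(make_anp("house", "negative"))  # -> "gloomy house"
```

In the actual model, the adjective would be chosen by a visual sentiment classifier rather than a fixed lexicon.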

Capttitude is a model capable of generating affective image captions with an emotional component, thereby going beyond purely factual image descriptions. It provides access to two methods. The first combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network, following Show and Tell [4]; the output sentences are then refined by adding ANPs from DeepSentiBank [3]. The second is a graph-based Concept and Syntax Transition (CAST) network [2]. Its directed graph is built from the training captions of the YFCC100M dataset [1], connecting "Concepts" (nouns, adjectives, verbs) with one another. To expand the vocabulary of the captioning model, Word2Vec similarity is used to connect conceptually similar nodes in the graph. A walk through the graph from "Start" to "End" along the activated nodes yields a generated caption for the image. The user is thus provided with a pair of emotionally rich captions, one from each method.
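The CAST idea can be illustrated with a toy sketch: concepts become graph nodes, caption word order defines edges, and a caption is a walk from start to end that prefers nodes "activated" for the image. The tiny caption set, activation set, and greedy tie-breaking below are assumptions for illustration; the real model is trained on YFCC100M and scores transitions rather than picking greedily.

```python
# Toy sketch of a CAST-style concept graph. Nodes are words plus
# <start>/<end> markers; edges follow caption word order. A caption
# is generated by walking from <start> toward <end>, preferring
# nodes activated for the image.
from collections import defaultdict

def build_graph(captions):
    """Build a directed successor graph from training captions."""
    graph = defaultdict(set)
    for caption in captions:
        words = ["<start>"] + caption.split() + ["<end>"]
        for a, b in zip(words, words[1:]):
            graph[a].add(b)
    return graph

def generate(graph, activated, max_len=10):
    """Greedy walk from <start> to <end> through activated nodes."""
    path, node = [], "<start>"
    for _ in range(max_len):
        succ = graph[node]
        # prefer activated successors; fall back to any successor
        choices = [w for w in succ if w in activated] or list(succ)
        if not choices:
            break
        node = sorted(choices)[0]  # deterministic pick for the sketch
        if node == "<end>":
            break
        path.append(node)
    return " ".join(path)

captions = ["beautiful sunset over sea", "calm sea at sunset"]
g = build_graph(captions)
print(generate(g, activated={"beautiful", "sunset", "sea", "<end>"}))
# -> "beautiful sunset"
```

Word2Vec-based vocabulary expansion would add extra edges between conceptually similar nodes (e.g. "sea" and "ocean"), letting walks reach words never seen next to each other in training captions.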


[1] B. Thomee, D. A. Shamma, G. Friedland, B. Elizalde, K. Ni, D. Poland, D. Borth, and L. Li. YFCC100M: The New Data in Multimedia Research. Communications of the ACM, 2016

[2] T. Karayil, P. Blandfort, D. Borth, and A. Dengel. Generating Affective Captions using Concept And Syntax Transition Networks. ACM Multimedia 2016, Grand Challenge

[3] T. Chen, D. Borth, T. Darrell, and S.-F. Chang. DeepSentiBank: Visual sentiment concept classification with deep convolutional neural networks. arXiv preprint arXiv:1410.8586, 2014

[4] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and Tell: A Neural Image Caption Generator. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015

If you like our work, please cite "Generating Affective Captions using Concept And Syntax Transition Networks" [2] from the References.