Automated Image Captioning using Deep Learning

Bhoomi Thakkar


Today, over eighty percent of the data used for business analytics is non-textual and unstructured, such as JPEG images and MPG videos, which makes image analytics increasingly important. Many image-processing algorithms do not capture what an image actually depicts. In image captioning, by contrast, a computer identifies the objects in an image and the relations among them, and presents them as a grammatically correct description. The task combines Natural Language Processing and Computer Vision and remains difficult for machines. This paper proposes a model that automatically generates captions for images; it can aid the visually challenged, optimise content-based retrieval of images, and be extended to captioning real-time images and annotating video. At a high level, the model has two stages: a feature extraction phase and a language model that generates the sentence word by word. It is trained on the images in the Flickr8k dataset. The efficacy of the model, measured using the BLEU metric, is comparable to the current state-of-the-art technique.
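As a rough illustration of how a BLEU score of the kind mentioned above can be computed for a single generated caption, the following is a minimal sketch in plain Python. It is not the evaluation code used in the paper: the function names, the whitespace tokenisation, the choice of up to 4-grams with uniform weights, and the absence of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # Each n-gram is credited at most as often as it occurs in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (illustrative)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, a single zero precision makes the whole score zero.
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: compare against the reference length closest to the candidate.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_avg)


if __name__ == "__main__":
    # Hypothetical captions, invented for this example only.
    reference = "a dog runs on the grass".split()
    generated = "a dog runs on the green grass".split()
    print(round(bleu(generated, [reference]), 3))
```

Because this sketch applies no smoothing, short captions that share no 4-gram with any reference score zero; corpus-level BLEU, which pools n-gram counts over all captions before taking the geometric mean, is less affected by this.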


Convolutional Neural Network (CNN), Image Captioning, Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN)



All Rights Reserved © 2012 IJARCSEE

This work is licensed under a Creative Commons Attribution 3.0 Unported License.