We were tasked with developing a system that takes an image as input and produces a social media caption as output. The text had to not only match the content of the image but also be indistinguishable from text written by a human. The ‘human-like’ quality of the generated captions was of the utmost importance: users can easily identify AI-generated text, and this reduces engagement and customer trust.
The model we developed generates social media captions that are indistinguishable in tone and grammatical structure from those written by real users. It also embellishes the generated text with appropriate emojis and hashtags to further imitate what an average social media user would write under an image.
In fact, the model even makes the same kinds of mistakes as an average social media user, such as misspelling a word or using the wrong one. The captions used to train the model included such mistakes, and the model reproduces them from time to time. This is a testament to how closely it imitates real captions.