GOAT-Bench:
Safety Insights to Large Multimodal Models through Meme-Based Social Abuse

Hong Kong Baptist University
*Equal Contribution, Corresponding author

cshzlin@comp.hkbu.edu.hk, cszyluo@comp.hkbu.edu.hk, majing@comp.hkbu.edu.hk

Abstract

The exponential growth of social media has profoundly transformed how information is created, disseminated, and absorbed, exceeding any precedent in the digital age. Regrettably, this explosion has also spawned a significant increase in the online abuse of memes. Evaluating the negative impact of memes is notably challenging, owing to their often subtle and implicit meanings, which are not directly conveyed through the overt text and imagery. In light of this, large multimodal models (LMMs) have emerged as a focal point of interest due to their remarkable capabilities in handling diverse multimodal tasks. In response to this development, our paper aims to thoroughly examine the capacity of various LMMs to discern and respond to the nuanced aspects of social abuse manifested in memes. We introduce a comprehensive meme benchmark, GOAT-Bench, comprising over 6K varied memes encapsulating themes such as implicit hate speech, sexism, and cyberbullying. Utilizing GOAT-Bench, we delve into the ability of LMMs to accurately assess hatefulness, misogyny, offensiveness, sarcasm, and harmful content. Our extensive experiments across a range of LMMs reveal that current models still exhibit a deficiency in safety awareness, showing insensitivity to various forms of implicit abuse. We posit that this shortfall represents a critical impediment to the realization of safe artificial intelligence.

The GOAT Benchmark

[Figure: GOAT-Bench overview]

We introduce GOAT-Bench, a comprehensive and specialized dataset designed to evaluate large multimodal models (LMMs) on meme-based multimodal social abuse. GOAT-Bench comprises over 6K diverse memes, encompassing a range of themes including hate speech and offensive content. Our focus is to assess the ability of LMMs to accurately identify online abuse, specifically in terms of hatefulness, misogyny, offensiveness, sarcasm, and harmfulness. We meticulously control the granularity of each specific meme task to facilitate a detailed analysis. Furthermore, we extend our evaluation to assess how effectively chain-of-thought reasoning helps models discern the underlying implications of memes and deduce their potential threat to safety.
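As a rough illustration of how per-task results on a benchmark like this could be tallied, the sketch below scores predictions separately for each of the five tasks named above. It is not the authors' evaluation code: the binary 0/1 label encoding, the `(task, label, prediction)` record layout, the choice of accuracy and macro-F1 as metrics, and the function names `macro_f1` and `score_by_task` are all assumptions made for this example, and the toy records are invented.

```python
from collections import defaultdict

# Task names come from the text above; everything else is assumed.
TASKS = ["hatefulness", "misogyny", "offensiveness", "sarcasm", "harmfulness"]


def macro_f1(labels, preds):
    """Macro-averaged F1 over the two classes 0 and 1 (assumed binary)."""
    scores = []
    for cls in (0, 1):
        tp = sum(1 for y, p in zip(labels, preds) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, preds) if y != cls and p == cls)
        fn = sum(1 for y, p in zip(labels, preds) if y == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted members contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def score_by_task(records):
    """Group (task, label, prediction) triples and score each task."""
    grouped = defaultdict(lambda: ([], []))
    for task, label, pred in records:
        if task not in TASKS:
            raise ValueError(f"unknown task: {task}")
        grouped[task][0].append(label)
        grouped[task][1].append(pred)
    results = {}
    for task, (labels, preds) in grouped.items():
        correct = sum(1 for y, p in zip(labels, preds) if y == p)
        results[task] = {
            "accuracy": correct / len(labels),
            "macro_f1": macro_f1(labels, preds),
        }
    return results


# Invented toy records for illustration only (not benchmark data).
toy = [
    ("hatefulness", 1, 1),
    ("hatefulness", 0, 1),
    ("hatefulness", 0, 0),
    ("hatefulness", 1, 1),
    ("sarcasm", 1, 0),
    ("sarcasm", 0, 0),
]
for task, metrics in score_by_task(toy).items():
    print(task, metrics)
```

Scoring each task separately, rather than pooling all memes, keeps a model's weakness on one form of abuse (for example sarcasm) from being hidden by stronger results on another.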

Experiment Results

[Figure: radar chart of results]
[Table 1]
[Table 2]

BibTeX

@article{lin2023goatbench,
  title={GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse},
  author={Lin, Hongzhan and Luo, Ziyang and Wang, Bo and Yang, Ruichao and Ma, Jing},
  journal={arXiv preprint arXiv:2401.01523},
  year={2024}
}