Large Multimodal Models as General In-Context Classifiers
CVPR Findings 2026
An overview of CIRCLE (CIRCLE Iteratively Refines Contextual Learning Examples): starting from unannotated images, it first assigns a pseudo-label to each image independently. It then iteratively refines these labels by taking all the other images into account. As a result, CIRCLE produces a context that can be used to classify new inputs via In-Context Learning (ICL).
ABSTRACT
Which multimodal model should we use for classification? Previous studies suggest that the answer lies in CLIP-like contrastive Vision-Language Models (VLMs), due to their remarkable performance in zero-shot classification, while Large Multimodal Models (LMMs) are reserved for more complex tasks. In this work, we argue that this answer overlooks an important capability of LMMs: in-context learning. We benchmark state-of-the-art LMMs on diverse datasets for closed-world classification and find that, although their zero-shot performance is lower than CLIP's, LMMs with a few in-context examples can match or even surpass contrastive VLMs equipped with cache-based adapters, their "in-context" equivalent. We extend this analysis to the open-world setting, where the generative nature of LMMs makes them better suited to the task. In this challenging scenario, however, LMMs struggle whenever the context information they are given is imperfect. To address this issue, we propose CIRCLE, a simple training-free method that assigns pseudo-labels to in-context examples and iteratively refines them using the available context itself. Through extensive experiments, we show that CIRCLE establishes a robust baseline for open-world classification, surpassing its VLM counterparts and highlighting the potential of LMMs to serve as unified classifiers and a flexible alternative to specialized models.
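The abstract describes CIRCLE's two stages: independent pseudo-labeling of unannotated images, followed by iterative refinement of each label against all the other examples. A minimal sketch of this loop is given below; the `classify` callable, its signature, and the fixed number of refinement rounds are illustrative assumptions, not the authors' implementation.

```python
def circle(images, classify, rounds=3):
    """Build a pseudo-labeled in-context set from unannotated images.

    `classify(image, context)` is a hypothetical stand-in for an LMM call
    that returns a label for `image` given a list of (image, label)
    in-context examples.
    """
    # Stage 1: label each image independently (empty context).
    labels = [classify(img, context=[]) for img in images]

    # Stage 2: iteratively refine each pseudo-label, using all the
    # *other* (image, label) pairs as in-context evidence.
    for _ in range(rounds):
        for i, img in enumerate(images):
            context = [(images[j], labels[j])
                       for j in range(len(images)) if j != i]
            labels[i] = classify(img, context=context)

    # The refined pairs form the context used to classify new inputs
    # via in-context learning.
    return list(zip(images, labels))
```

In practice, `classify` would wrap a prompted LMM call; the sketch only conveys the training-free, self-refining structure of the method.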
• The first systematic analysis of ICL in LMMs for closed-world image classification.
• An in-depth comparison between LMMs and cache-based VLM adapters, showing that LMMs with ICL can match and even surpass VLMs.
• Introduction of CIRCLE, a new approach that enhances LMMs for open-world classification using only unlabeled images as ICL examples, iteratively refining their pseudo-labels.
• Extensive benchmarking of CIRCLE against naïve ICL, showing that the latter struggles in open-world settings.
• Performance improvements: CIRCLE largely improves the performance of the base model, consistently surpassing VLMs, making a valid case for adopting LMMs for discriminative tasks.
We provide additional details and complete per-dataset and per-model results in the supplementary material, available at the end of the arXiv paper.
• Marco Garosi (DISI, University of Trento)
• Matteo Farina (DISI, University of Trento)
• Alessandro Conti (DISI, University of Trento)
• Massimiliano Mancini (DISI, University of Trento)
• Elisa Ricci (DISI, University of Trento, and Fondazione Bruno Kessler)
You can cite this work as:
@inproceedings{garosi2026circle,
title = {Large Multimodal Models as General In-Context Classifiers},
author = {Garosi, Marco and Farina, Matteo and Conti, Alessandro and
Mancini, Massimiliano and Ricci, Elisa},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition Findings},
year = {2026}
}