S²COPE: Self-Supervised Concept Discovery via Preference Learning

Can unlabeled data teach foundation models interpretable concepts?

Rutgers University

Abstract

Current representation learning paradigms force a fundamental compromise: self-supervised methods scale to massive datasets but yield opaque features, whereas interpretable models remain bottlenecked by the need for dense human annotation.

We introduce Self-Supervised Concept discOvery via Preference lEarning (S²COPE), a label-free framework that resolves this dilemma. Instead of treating Vision-Large-Language Models (VLLMs) as static feature extractors, S²COPE leverages them as active participants in a self-supervised preference optimization loop. By autonomously hypothesizing, validating, and reinforcing candidate visual attributes directly from raw imagery, our framework discovers novel, structured concepts without a single label.

Extensive experiments across natural, medical, and physics domains demonstrate that S²COPE extracts domain-specific concepts that standard VLLMs often fail to generate. By amortizing concept discovery directly into the VLLM backbone through our self-supervised preference objective, rather than relying on static generation and disjoint filtering, we achieve up to a 24-point absolute improvement in downstream top-1 classification accuracy on unseen data.

Datasets: iNaturalist, CUB, HAM10000, MedMNIST, Galaxy 10, Gravity Spy

Max Top-1 Gain: +24pt

Avg Top-1 Gain: +16

Method

Figure 2. Overview of the S²COPE Discovery Loop. Our framework operates as an end-to-end, self-supervised discovery process. In iteration k, the VLLM policy π_k uses high-temperature sampling to hypothesize diverse candidate concepts C(x) for an unlabeled image x. To evaluate these proposals without human labels, we compute a self-supervised, cross-modal contrastive reward R(c, x) based on visual invariance. A candidate concept receives a high reward only if it is stable across augmented views (the positive set) while maintaining specificity against unrelated batch images. This automatically filters out generic, noisy descriptions (Answer A) in favor of discriminative, structured attributes (Answer B). An Easy-Negative pairing strategy (selecting pairs with the largest reward gap) converts these rewards into preference pairs (c_w, c_l) to form dataset 𝒟_k. Finally, Direct Preference Optimization (DPO) internalizes this invariance by updating the VLLM concept generator's weights, yielding a refined policy π_{k+1} that iteratively transforms the VLLM into a self-supervised concept miner.
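To make the loop in the caption concrete, here is a minimal sketch of its three steps: scoring a concept, forming an Easy-Negative pair, and computing a DPO loss for that pair. It is an illustration under stated assumptions, not the authors' implementation. The reward form (mean cosine similarity to augmented views minus mean cosine similarity to unrelated batch images), the function names, the toy 2-D embeddings, the example concept strings, and `beta=0.1` are all hypothetical. Only the DPO expression is the standard published form.

```python
import numpy as np


def _normalize(v):
    """L2-normalize vectors along the last axis."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def contrastive_reward(concept_emb, view_embs, negative_embs):
    """ASSUMED reward form: stability across augmented views of x
    minus similarity to unrelated batch images."""
    c = _normalize(concept_emb)
    pos = _normalize(view_embs) @ c       # high if the concept holds in every view
    neg = _normalize(negative_embs) @ c   # high if the concept is generic
    return float(pos.mean() - neg.mean())


def easy_negative_pair(concepts, rewards):
    """Return (c_w, c_l): the candidate pair with the largest reward gap."""
    rewards = np.asarray(rewards, dtype=float)
    return concepts[int(rewards.argmax())], concepts[int(rewards.argmin())]


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one pair: -log sigmoid(beta * log-ratio margin)."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return float(np.logaddexp(0.0, -margin))  # log(1 + exp(-margin))


# Toy embeddings standing in for image and text encoder outputs (made up).
views = [[1.0, 0.0], [0.9, 0.1]]        # augmented views of image x
negatives = [[0.0, 1.0], [0.1, 0.9]]    # unrelated images from the batch
candidates = {
    "a photo of an animal": [0.7, 0.7],  # generic: matches everything equally
    "red crown patch": [1.0, 0.05],      # specific: matches only views of x
}
names = list(candidates)
rewards = [contrastive_reward(candidates[n], views, negatives) for n in names]
c_w, c_l = easy_negative_pair(names, rewards)

# Log-probabilities here are placeholders for policy and reference model scores.
loss = dpo_loss(logp_w=-1.0, logp_l=-5.0, ref_logp_w=-3.0, ref_logp_l=-3.0)
```

In this toy setup the generic description scores about the same against the views and the negatives, so its reward is close to zero, while the specific attribute is selected as `c_w`. In a real pipeline the embeddings would come from a cross-modal encoder and the log-probabilities from the VLLM policy π_k and a frozen reference copy.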

Results

Visualizing Self-Supervised Concept Discovery. For each sample, we contrast the top concepts generated by the VLLM baseline (top list) with those from our S²COPE-optimized model (bottom list). Red text marks concepts that are incorrect for recognizing the image's category. The S²COPE-optimized model suppresses these nuisance concepts and extracts precise, physically grounded attributes.

**Ablation Studies on Reward Formulation.** **(a) Reward Components:** Impact of isolating the positive and negative signals of the contrastive reward. Eliminating the positive signal causes a performance collapse, while removing the negative signal yields a suboptimal accuracy plateau. **(b) Reward Modality:** Comparison of cross-modal image-text grounding versus unimodal text-text consensus. Cross-modal grounding against physical image features achieves better performance than relying on unimodal textual consensus.

@article{xiang2026scope,
  title={S$^2$COPE: Self-Supervised Concept Discovery via Preference Learning},
  author={Xiang, Shilong and Zhang, Zirui and Mao, Chengzhi},
  journal={arXiv preprint arXiv:2606.14586},
  year={2026}
}
