Neural Concept Binder

¹AI/ML Group at TU Darmstadt, ²Hessian Center for AI (hessian.AI), ³Centre for Cognitive Science at TU Darmstadt, ⁴German Research Center for AI (DFKI)
*Equal contribution
Concept Learning Motivation

The Neural Concept Binder (NCB) learns expressive yet inspectable and revisable concepts from unlabeled data.

Abstract

The challenge in object-based visual reasoning lies in generating concept representations that are both descriptive and distinct. Achieving this in an unsupervised manner requires human users to understand the model's learned concepts and, if necessary, revise incorrect ones. To address this challenge, we introduce the Neural Concept Binder (NCB), a novel framework for deriving both discrete and continuous concept representations, which we refer to as “concept-slot encodings”. NCB employs two types of binding: “soft binding”, which leverages the recent SysBinder mechanism to obtain object-factor encodings, and subsequent “hard binding”, achieved through hierarchical clustering and retrieval-based inference. This enables obtaining expressive, discrete representations from unlabeled images. Moreover, the structured nature of NCB's concept representations allows for intuitive inspection and the straightforward integration of external knowledge, such as human input or insights from other AI models like GPT-4. Additionally, we demonstrate that incorporating the hard binding mechanism preserves model performance while enabling seamless integration into both neural and symbolic modules for complex reasoning tasks. We validate the effectiveness of NCB through evaluations on our newly introduced CLEVR-Sudoku dataset.

Learn Expressive, yet Inspectable and Revisable Concepts

Our proposed Neural Concept Binder (NCB) framework tackles the challenge of learning inspectable and revisable object-factor-level concepts from unlabeled images by combining two key elements: (i) continuous representations via (block-)slot-attention-based image processing with (ii) discrete representations via retrieval-based inference.
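The second element, retrieval-based inference, can be illustrated with a minimal sketch of the hard-binding step. This is not the paper's implementation: names, shapes, and the toy prototypes below are illustrative, and we assume the soft-binding stage (SysBinder) has already produced a continuous encoding per block for one object, while a "retrieval corpus" of concept prototypes per block was built offline (e.g. by hierarchically clustering the block encodings of the training set).

```python
import numpy as np

def hard_bind(block_encodings, prototypes):
    """Retrieval-based inference: map each continuous block encoding to the
    id of its nearest prototype, yielding one discrete concept per block."""
    concept_ids = []
    for b, enc in enumerate(block_encodings):
        # Distance from this block's encoding to each of its concept prototypes.
        dists = np.linalg.norm(prototypes[b] - enc, axis=1)
        concept_ids.append(int(np.argmin(dists)))
    return concept_ids

# Toy example: two blocks with two concept prototypes each (2-d encodings).
prototypes = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),  # block 0 (e.g. a "color" factor)
    np.array([[0.0, 1.0], [1.0, 0.0]]),  # block 1 (e.g. a "shape" factor)
]
block_encodings = np.array([[0.9, 1.1], [0.1, 0.9]])
print(hard_bind(block_encodings, prototypes))  # → [1, 0]
```

The resulting list of concept ids is the discrete half of the concept-slot encoding; the continuous block encodings themselves are kept alongside it.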

NCB Method overview

NCB Step by Step

Inspection of Learned Concepts

One key advantage of NCB's concept representations is their inherent readability and inspectability. An object's concepts can be inspected and compared to other concepts within the same block. For a more detailed understanding, concepts can even be swapped, and new images can be generated from the modified concept representation.

Concepts of an object can be inspected and compared.
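At the representation level, such a concept swap can be sketched as follows. This is an illustrative sketch, not the paper's code: the per-block `prototypes`, the object's `concept_ids`, and the downstream decoder call are all assumptions.

```python
import numpy as np

# Hypothetical setup: per-block concept prototypes and the discrete part of
# one object's concept-slot encoding (one concept id per block).
prototypes = [
    np.array([[0.0, 0.0], [1.0, 1.0]]),  # block 0 concepts
    np.array([[0.0, 1.0], [1.0, 0.0]]),  # block 1 concepts
]
concept_ids = [0, 1]

def swap_concept(concept_ids, block, new_id):
    """Return a copy of the discrete encoding with one block's concept swapped."""
    swapped = list(concept_ids)
    swapped[block] = new_id
    return swapped

def to_continuous(concept_ids, prototypes):
    """Look up the prototype encoding for each block so a (hypothetical)
    decoder could render an image of the modified object."""
    return np.stack([prototypes[b][i] for b, i in enumerate(concept_ids)])

modified = swap_concept(concept_ids, block=1, new_id=0)
blocks = to_continuous(modified, prototypes)  # would be fed to the decoder
```

Because each block holds a distinct factor, swapping a single concept id changes exactly one aspect of the generated object, which is what makes the inspection intuitive.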

CLEVR-Sudoku

We introduce the CLEVR-Sudoku dataset, a new benchmark of challenging visual puzzles that require both visual object perception and reasoning capabilities. The dataset consists of 9x9 Sudoku puzzles with varying degrees of difficulty. Each puzzle is annotated with its correct solution, which serves as the ground truth for evaluating model performance.

CLEVR-Sudoku requires visual perception as well as deductive reasoning skills. Now you can try solving the puzzles yourself directly on this website! Simply place the images where you think they belong and see if you're right!

In our evaluations we show that solving CLEVR-Sudoku puzzles is harder than one might expect. Even small errors in the concept predictions (including in the supervised setting) lead to a wrong symbolic representation of the grid and thus a wrong solution. With NCB we propose a strong baseline for solving CLEVR-Sudoku puzzles without supervision on the ground-truth concepts.
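To see why a single perception error is so costly, consider a plain backtracking solver over the symbolic grid (a minimal sketch; the paper's symbolic solver may differ). One misclassified object that duplicates a symbol in its row, column, or box makes the symbolic puzzle inconsistent, so no solution exists at all:

```python
from itertools import product

def valid(grid, r, c, v):
    """Check whether value v may be placed at cell (r, c) under Sudoku rules."""
    if any(grid[r][j] == v for j in range(9)):
        return False
    if any(grid[i][c] == v for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[br + i][bc + j] != v for i in range(3) for j in range(3))

def consistent(grid):
    """Do the given (non-zero) cells already violate a Sudoku constraint?"""
    for r, c in product(range(9), repeat=2):
        v = grid[r][c]
        if v:
            grid[r][c] = 0
            ok = valid(grid, r, c, v)
            grid[r][c] = v
            if not ok:
                return False
    return True

def solve(grid):
    """Plain backtracking over empty (zero) cells; fills grid in place."""
    for r, c in product(range(9), repeat=2):
        if grid[r][c] == 0:
            for v in range(1, 10):
                if valid(grid, r, c, v):
                    grid[r][c] = v
                    if solve(grid):
                        return True
                    grid[r][c] = 0
            return False
    return True

def solvable(grid):
    return consistent(grid) and solve(grid)

# A correctly perceived grid (here: nearly empty) is solvable ...
grid = [[0] * 9 for _ in range(9)]
grid[0][0] = 5
print(solvable(grid))  # True

# ... but one misperceived object that duplicates a symbol in its row
# renders the symbolic puzzle unsolvable.
bad = [[0] * 9 for _ in range(9)]
bad[0][0], bad[0][8] = 5, 5
print(solvable(bad))  # False
```

Since the solver operates purely on the symbolic grid, there is no way to recover from a faulty perception step downstream, which is exactly what makes accurate, unsupervised concept learning the bottleneck of the task.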

BibTeX

@inproceedings{stammer2024neural,
  title={Neural Concept Binder},
  author={Stammer, Wolfgang and W{\"u}st, Antonia and Steinmann, David and Kersting, Kristian},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}