Vision-Language Programs

Task Birthday Cake — few-shot examples
+ Positive
- Negative
Type G
+ imgs
⚡ VLM Query
VLM
📷 imgs Type G
symbols
Objects: cake, candles, fire, person
Properties: colorful, lit, decorated
Actions: blow, burn
Domain-Specific Language V ∪ F ∪ O  ·  typed PCFG  ·  search
🔭
VLM Functions V
perception interface
get_objects(IMG, obj) → list[float]
get_actions(IMG, action) → list[float]
🧮
Symbolic Functions F
reasoning primitives
exists_object(s, obj) → bool
count_object(s, obj) → int
exists_obj_with_property(s,o,p) → bool
⚙️
Operators O
logical composition
and or xor not eq? gt?
Grounded Symbols → cake candles fire person colorful lit decorated blow burn
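
The symbolic functions F and operators O listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the scene representation (a list of `(name, properties)` detections), the trailing underscores on operator names, and the hand-written `scene` are all assumptions made so the sketch runs without a VLM. In the figure, `s` comes from a VLM query such as get_objects(IMG).

```python
from typing import FrozenSet, List, Tuple

# ASSUMPTION: a scene is a list of detections, each a (name, properties) pair.
# The figure does not specify this representation.
Detection = Tuple[str, FrozenSet[str]]
Scene = List[Detection]


# Symbolic functions F (names taken from the figure).
def exists_object(s: Scene, obj: str) -> bool:
    return any(name == obj for name, _ in s)


def count_object(s: Scene, obj: str) -> int:
    return sum(name == obj for name, _ in s)


def exists_obj_with_property(s: Scene, o: str, p: str) -> bool:
    return any(name == o and p in props for name, props in s)


# Operators O. Underscores avoid clashing with Python keywords;
# eq and gt stand in for the figure's eq? and gt?.
def and_(a: bool, b: bool) -> bool:
    return a and b


def or_(a: bool, b: bool) -> bool:
    return a or b


def xor_(a: bool, b: bool) -> bool:
    return a != b


def not_(a: bool) -> bool:
    return not a


def eq(a: int, b: int) -> bool:
    return a == b


def gt(a: int, b: int) -> bool:
    return a > b


# Hand-written stand-in for a VLM result (hypothetical data).
scene: Scene = [
    ("cake", frozenset({"decorated"})),
    ("candles", frozenset({"lit"})),
    ("candles", frozenset()),
]

# The figure's best program p*, composed from the primitives above.
result = and_(
    exists_object(scene, "cake"),
    exists_obj_with_property(scene, "candles", "lit"),
)
```

Keeping every primitive a small typed function is what lets a typed grammar enumerate only well-formed compositions, for example passing a `bool` to `and_` but an `int` to `gt`.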
Search Tree (PCFG)
bool
└─exists_object(s, ·)
└─get_objects(IMG)
├─cake 0.86
├─candles 0.67
└─person 0.08
├─or(·, ·)
├─exists_object(s, cake)
└─exists_object(s, candles)
└─and(·, ·)
├─exists_object(s, cake)
└─exists_object(s, candles)
└─exists_object(s, cake)
└─exists_obj_with_property(s, candles, lit)
Occurrence-based Prior
P(symbol | positives)
cake 0.86
candles 0.67
person 0.08
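
One plausible way to obtain a prior like P(symbol | positives) is to average each symbol's VLM score over the positive examples. The figure does not state the exact formula, so the sketch below is an assumption, and the function name `occurrence_prior` and the toy scores are hypothetical. The toy scores are not intended to reproduce the values shown above.

```python
from typing import Dict, List


def occurrence_prior(positive_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Average per-symbol score over the positive images.

    ASSUMPTION: each dict maps a symbol to a VLM score in [0, 1] for one
    positive image; a symbol missing from an image counts as 0.0.
    """
    if not positive_scores:
        raise ValueError("need at least one positive example")
    symbols = sorted({sym for scores in positive_scores for sym in scores})
    n = len(positive_scores)
    return {
        sym: sum(scores.get(sym, 0.0) for scores in positive_scores) / n
        for sym in symbols
    }


# Toy scores for three positive images (made up for illustration).
toy_scores = [
    {"cake": 1.0, "candles": 1.0, "person": 0.0},
    {"cake": 1.0, "candles": 0.5, "person": 0.0},
    {"cake": 0.5, "candles": 0.5, "person": 0.5},
]

prior = occurrence_prior(toy_scores)

# Symbols ordered from most to least likely; a search could expand
# high-prior symbols first.
ranked = sorted(prior, key=prior.get, reverse=True)
```

Ranking symbols by such a prior would let the search try programs that mention cake or candles before programs that mention person.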
Program Evaluation — Search
Program Accuracy
exists_object(get_objects(IMG), cake)

exists_object(get_objects(IMG), candles)

exists_object(get_objects(IMG), person)

or(
 exists_object(get_objects(IMG), cake),
 exists_object(get_objects(IMG), candles))

and(
 exists_object(get_objects(IMG), cake),
 exists_object(get_objects(IMG), candles))

and(exists_object(..., cake),
    exists_obj_with_property(
      get_objects(IMG), candles, lit))
✓ Best Program — p*
p* = (and
 (exists_object (get_objects IMG) cake)
 (exists_obj_with_property (get_objects IMG) candles lit))
Balanced accuracy: 100% on 6-shot task
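
Selecting p* by balanced accuracy can be sketched as below. Balanced accuracy is the mean of the true-positive rate and the true-negative rate. Everything else here is an assumption for illustration: the dict-based scene representation, the three-positive, three-negative split of the 6-shot task, the hand-written example scenes, and the lambda encodings of the candidate programs.

```python
from typing import Callable, Dict, List, Set, Tuple

# ASSUMPTION: a scene maps each detected object to its set of properties.
Scene = Dict[str, Set[str]]
Program = Callable[[Scene], bool]


def balanced_accuracy(program: Program, examples: List[Tuple[Scene, bool]]) -> float:
    """Mean of true-positive rate and true-negative rate."""
    n_pos = sum(1 for _, label in examples if label)
    n_neg = len(examples) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = sum(1 for s, label in examples if label and program(s))
    tn = sum(1 for s, label in examples if not label and not program(s))
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical 6-shot task: three positives, three negatives.
examples: List[Tuple[Scene, bool]] = [
    ({"cake": {"decorated"}, "candles": {"lit"}}, True),
    ({"cake": set(), "candles": {"lit"}, "person": set()}, True),
    ({"cake": {"colorful"}, "candles": {"lit"}}, True),
    ({"cake": set(), "candles": set()}, False),  # candles present but not lit
    ({"cake": {"decorated"}}, False),  # no candles
    ({"person": set(), "candles": {"lit"}}, False),  # no cake
]

# Three of the figure's candidate programs, written as Python lambdas.
candidates: Dict[str, Program] = {
    "cake": lambda s: "cake" in s,
    "cake and candles": lambda s: "cake" in s and "candles" in s,
    "cake and lit candles": lambda s: "cake" in s
    and "lit" in s.get("candles", set()),
}

scores = {name: balanced_accuracy(p, examples) for name, p in candidates.items()}
best = max(scores, key=scores.get)
```

On these toy examples only the program that requires lit candles separates every positive from every negative, which mirrors the refinement from `exists_object` to `exists_obj_with_property` shown in the search tree.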