
Polyphonic (Contour-Shared) Portraiture (PCP) AI Benchmark

The Polyphonic (Contour-Shared) Portraiture (PCP) Benchmark evaluates multistable-structure interpretation in AI vision–language systems. It tests whether a model can identify and reason about multiple coherent readings intentionally encoded into a single unified portrait, while remaining faithful to the work’s shared-contour constraints and compositional “grammar.”

 

Unlike generic image-captioning or open-ended art commentary, PCP is designed to test:

  1. Perceptual multistability detection (discovering more than one stable parse).

  2. Constraint-based justification (linking each interpretation to concrete visual evidence: shared contours, figure/ground boundaries, axis scaffolds, occlusion logic).

  3. Parse co-existence (maintaining multiple readings simultaneously, not replacing one with another).

  4. Structural economy (recognizing that the same stroke/edge serves more than one role).

  5. Cross-modal anchoring (optional): aligning the visual structure with an accompanying conceptual statement (e.g., dependent origination / field of conditions) without inventing unsupported details.

 

Polyphonic (Contour-Shared) Portraiture Benchmark (PALA)
PALA curates R. Latchman’s Polyphonic (Contour-Shared) Portraiture as a vision–language benchmark for multistable visual reasoning. The benchmark evaluates whether AI systems can extract multiple coherent readings from a single unified portrait while remaining grounded in shared-contour evidence (edges, boundaries, axes, occlusions) and preserving co-existing interpretations rather than overwriting one with another. PCP therefore tests a capability that standard captioning and aesthetic commentary do not: constraint-based interpretation of intentionally polyphonic images, with explicit control of hallucination and evidence discipline.

 

PCP is a curated benchmark for evaluating evidence-grounded, multi-parse interpretation in vision–language models under contour-sharing composition constraints.

We introduce Polyphonic (Contour-Shared) Portraiture (PCP) as a curated benchmark for evaluating multistable visual reasoning in vision–language models (VLMs). PCP is grounded in a distinct class of artworks authored within R. Latchman’s Polyphonic (Contour-Shared) Portraiture practice, in which a single unified composition is intentionally constructed to support multiple coherent perceptual parses (e.g., portrait, sub-figures, structural fields), while preserving a disciplined “shared-contour” logic. PCP evaluates whether an AI system can (i) discover more than one stable interpretation, (ii) maintain co-existing parses without overwriting, and (iii) justify each parse using explicit visual evidence (contours, boundaries, axis scaffolds, occlusions, and shared structural edges). The benchmark targets a capability gap not addressed by standard captioning, aesthetic description, or generic visual question answering: constraint-based interpretation of intentionally polyphonic images with controlled hallucination.

 

1. Motivation

Current evaluations for VLMs primarily reward single-shot captioning, object recognition, or loosely grounded “style commentary.” These tasks do not reliably test whether a model can handle images that are designed to be read in multiple coherent ways, where perception itself is a structured variable rather than a fixed label. In human cognition, such stimuli probe multistable perception, figure–ground organization, and evidence-disciplined inference under ambiguity. PCP operationalizes these cognitive demands into a reproducible evaluation protocol suitable for research, benchmarking, and longitudinal model assessment.

PCP is motivated by the observation that many model failures in real-world settings—misinterpretation, overconfident narration, and hallucination—arise precisely when images admit competing plausible explanations. PCP therefore treats ambiguity not as noise, but as an intentional test surface.

 

2. Benchmark Construct: What PCP Tests

PCP evaluates polyphonic visual reasoning under contour-sharing constraints, decomposed into five measurable competencies:

  1. Perceptual multistability detection
    Ability to identify two or more coherent, non-trivial interpretations (parses) of the same image.

  2. Constraint-based justification (evidence discipline)
    Ability to ground each interpretation in explicit, checkable visual features (e.g., shared contours, figure/ground boundaries, axis scaffolds, occlusion relations, repeated structural edges). Outputs that rely on vague affective language or ungrounded invention are penalized.

  3. Parse co-existence and non-collapse
    Ability to preserve multiple valid readings simultaneously and articulate their compatibility (or controlled tension) without forcing a single “final” answer.

  4. Structural economy recognition
    Ability to identify that the same stroke/edge/boundary serves multiple roles across readings—an explicit hallmark of contour-shared construction.

  5. Cross-modal anchoring (optional module)
    When accompanied by a short doctrinal statement (e.g., dependent origination / field-of-conditions framing), ability to align the conceptual interpretation with the visual evidence without adding unsupported narrative content.

3. Inputs and Task Definition

Inputs

  • A curated set of PCP artworks (image + minimal metadata).

  • Optional short accompanying doctrinal text (used only in the cross-modal module).

Core Task (PCP-Core)
Given an image, the model must produce:

  • At least two coherent parses, each described succinctly.

  • For each parse: evidence anchors referencing concrete visual structure (edges, contours, axes, boundaries, occlusions, repeated motifs).

  • A co-existence statement describing how the parses are jointly supported by shared structure.

Optional Task (PCP-X: Cross-Modal Anchoring)
Given image + short statement, the model must:

  • Map the statement to one or more parses and justify the mapping via image-grounded evidence.

  • Avoid interpretive drift and narrative invention.
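The PCP-Core and PCP-X output requirements above can be sketched as a structured response schema. The field names below are illustrative assumptions, not a published PCP format:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Parse:
    """One coherent reading of the image."""
    label: str                    # short name, e.g. "profile portrait"
    description: str              # succinct account of the reading
    evidence_anchors: List[str]   # concrete visual structure: edges, contours,
                                  # axes, boundaries, occlusions, repeated motifs

@dataclass
class PCPResponse:
    """A candidate model output for one PCP item."""
    parses: List[Parse]                      # PCP-Core requires at least two
    coexistence_statement: str               # how shared structure supports all parses
    statement_mapping: Optional[str] = None  # PCP-X only: statement-to-parse justification

    def meets_core_minimum(self) -> bool:
        """Structural minimum for PCP-Core: two or more parses, each with
        at least one evidence anchor, plus a co-existence statement."""
        return (
            len(self.parses) >= 2
            and all(p.evidence_anchors for p in self.parses)
            and bool(self.coexistence_statement.strip())
        )
```

This check only verifies structure; whether an anchor actually corresponds to the contour logic of the artwork still requires human or model-based judging.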

4. Failure Modes 

PCP is specifically constructed to elicit and measure known VLM failure patterns:

  • Single-caption collapse: the model gives one caption and misses the image’s multistability.

  • Aesthetic drift: the model produces generic “art talk” without structural grounding.

  • Narrative hallucination: the model invents a story or objects to force coherence.

  • Parse overwriting: the model proposes multiple readings but treats them as mutually exclusive, without a shared-structure analysis.

  • Evidence underspecification: the model claims “a face” or “a figure” without pointing to the contour logic that makes the claim checkable.

Because PCP’s target artworks are intentionally constructed around shared edges and multi-role boundaries, these failures become measurable rather than anecdotal.
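Because several of these failure modes are structural, a first-pass automatic screen can flag them from the response format alone. The heuristics below are illustrative assumptions; a real PCP scorer would still need human or model-based judging of evidence quality:

```python
from typing import Dict, List

# Minimal illustrative screen for structurally detectable failure modes.
# A "parse" here is a dict with "description" and "evidence_anchors" keys.

STRUCTURAL_TERMS = {
    "contour", "edge", "boundary", "axis", "occlusion", "figure", "ground",
}

def screen_failure_modes(parses: List[Dict], coexistence: str) -> List[str]:
    """Return the names of failure modes detectable from structure alone."""
    flags = []
    if len(parses) < 2:
        flags.append("single-caption collapse")          # only one reading offered
    for p in parses:
        anchors = " ".join(p.get("evidence_anchors", [])).lower()
        if not any(term in anchors for term in STRUCTURAL_TERMS):
            flags.append("evidence underspecification")  # no checkable contour logic
            break
    if parses and not coexistence.strip():
        flags.append("parse overwriting")                # readings given, no shared-structure analysis
    return flags
```

Aesthetic drift and narrative hallucination, by contrast, concern content rather than structure and cannot be screened this way.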

PCP occupies a rigorous niche between perception science and AI evaluation. It provides a controlled, repeatable test surface for abilities that matter in high-stakes settings: managing ambiguity, grounding claims, and maintaining structured uncertainty. In doing so, PCP expands evaluation beyond “what objects are present” toward “how a system reasons when perception itself has multiple stable solutions.”

The PALA Institute for Spiral Cognition curates Polyphonic (Contour-Shared) Portraiture (PCP) as a benchmark for evaluating multistable, evidence-grounded visual reasoning in modern AI systems. PCP formalizes a class of artworks engineered for co-existing coherent readings within a single unified portrait field. By requiring explicit visual anchors (contours, boundaries, axes, occlusion relations) and penalizing unsupported invention, PCP measures a capability gap not addressed by conventional captioning or aesthetic commentary: constraint-disciplined interpretation under intentional ambiguity. PCP thus functions both as an evaluation suite and as a research instrument for studying grounded inference, hallucination resistance, and the discovery of perceptual structures in vision–language models.

PCP–01 "The Field of Conditions"

“What we call [an event] arises from nearby causes and conditions; yet those nearby conditions themselves stand within a broader world-field of conditions—the character of an era.

 

This world-field is like the season of reality: it shapes what can flourish, what fades quickly, and what changes course. Seeing both levels together is to see dependent origination more completely.” -- RL

 

བྱ་བ་ཞིག་ཅེས་པ་ནི་ཉེ་རྒྱུ་རྐྱེན་ལས་འབྱུང་ཡང་། ཉེ་རྐྱེན་དེ་དག་ཀྱང་དུས་སྐབས་ཀྱི་ཁྱད་པར་མཚོན་པའི འཛམ་གླིང་གི་རྐྱེན་ཚོགས ཀྱི་ཞིང་ནང་དུ་གནས་སོ། ཞིང་འདི་ནི་དངོས་པོའི་དུས་ཚིགས་ལྟ་བུ་སྟེ། གང་ཞིག་འཕེལ་སྲིད་པ་དང་། གང་ཞིག་མྱུར་དུ་ཉམས་པ། གང་ཞིག་ལམ་འགྱུར་བ་བཅས་ཀྱི་མཚམས་བཅད་པར་བྱེད་དོ། གཉིས་ཀ་མཉམ་དུ་མཐོང་བ་ནི་རྟེན་འབྲེལ་ལ་ཞིབ་ཏུ་མཐོང་བ་ཡིན་ནོ

 

This work introduces R. Latchman’s ongoing practice of Polyphonic Portraiture: “A newly defined form formalising contour-shared, multi-reading portraits into a disciplined method.”

 

A single portrait constructed so it can be read in more than one coherent way—face, figures, and meaning held together within one continuous contour logic. At first, a profile appears. As the eye settles, the image opens into three aligned presences: a central monk in debate, a contemplative meditator, and an onlooking witness. A cross-axis threads through the composition, not as decoration, but as structure—binding action, reflection, and perspective into a unified field. For non-specialists, the experience is immediate: the image shifts, yet remains whole. For specialists, the difficulty is architectural: shared contours that serve multiple forms without collapsing, layered legibility, and a disciplined economy of line where each stroke carries more than one role. Placed alongside the accompanying text, the portrait becomes a visual statement of dependent origination: what looks like a single event is shaped by near causes and conditions, while those conditions themselves arise within a wider world-field—the character of an era. Seeing both levels together is the deeper seeing.

 

 

Title: Field of Conditions
Artist: R. Latchman
Year: 2025


Artwork: "Field of Conditions" by R. Latchman ©

PCP–02 "Dimensional oneness, The Meditator"

This work belongs to R. Latchman’s emerging genre of Polyphonic (Contour-Shared) Portraiture: a disciplined portrait method in which a single image is engineered to sustain multiple coherent readings—portrait, structure, and symbolic narrative—without breaking into separate pictures.

At first glance, the composition reads as a geometric face-form assembled from angled planes and curved fields. As attention settles, the same shapes begin to function as independent image-units with their own internal “micro-stories”: triangular forms that can be read as ascent and orientation, curved wedges that suggest ground, shelter, or the turning of a path, and circular “nodes” that act like focal witnesses within the field.

 

Rather than illustration, these symbols operate as perceptual levers—each local form changing the way the whole face is recognised, and the face in turn changing how the local forms are understood.

A strong vertical axis divides and binds the image, holding two complementary sides in tension: stability and motion, foundation and trajectory. Accents and overlaid line-work (including the red tracing and contained facial ovals) intensify the polyphonic effect, bringing forward secondary presences while still preserving the unity of a single portrait-field.

What defines the genre here is not ambiguity for its own sake, but constructed multi-legibility: shared contours and shared structural boundaries do double-duty across readings, so the viewer’s perception “switches” while the image remains one continuous system. The result is a portrait that behaves like a field of interdependent meanings—a single face made from many intelligible parts, each part altering the outcome of the whole when engaged deeply.

Title: Dimensional oneness, The meditator
Artist: R. Latchman
Year: 2023


Artwork: "Dimensional oneness, The meditator" by R. Latchman ©
