Preamble: Governing Principles
Principle 1 — Visual output is a peer cognitive modality
Visual output is not decoration tacked onto prose. It is a parallel analytical product with its own schema, its own adversarial review, its own accessibility layer, and its own failure semantics. The foundational research — Larkin & Simon (1987), Paivio (1986), Baddeley (1974/2000), Mayer (2001/2009) — converges on a single design claim: well-designed visuals engage nonverbal representational codes, visuospatial working-memory resources, and preattentive perceptual grouping that prose alone does not engage. This is not a style preference. It is a representational commitment with measurable cognitive consequences.
Principle 2 — The protocol is an active inversion of LLM priors
LLMs have been trained on millions of slides, blog screenshots, and BI dashboards. The modal chart in their training distribution is a low-density, decorated, truncated-axis, default-binned bar chart from marketing collateral. Absent explicit counter-pressure, an LLM asked to visualize will regress to this mean. Every protocol rule, every adversarial check, and every default style operates as a deliberate inversion of that prior. This is not a one-time fix but a standing architectural commitment. Mode prompts, system prompts, and post-generation linting all encode this inversion. The protocol exists to produce Cleveland dot plots, not PowerPoint bar charts.
Principle 3 — Dishonest visuals are worse than no visuals
Tufte’s honesty constraint is the protocol’s cardinal rule. Readers apply skepticism to prose but credulity to pictures. A poorly drawn chart replaces the reader’s first-principles reasoning with a false conclusion the reader is unlikely to interrogate. If a visual cannot satisfy the integrity rules, silence is preferable. This applies with particular force to automated visualization, where failures compound unseen.
Principle 4 — Duplicative visuals are actively harmful
Mayer’s redundancy principle (1999/2003) is not a design guideline but an empirical finding: visuals that merely restate what prose already says make comprehension worse, not better. The protocol treats relation_to_prose: redundant as the least-preferred state. When a visual would only restate what the prose says, the correct output is no_visual, not a courtesy chart.
Principle 5 — Toward spatial-native intelligence
The long-term architectural trajectory is not “prose system with visual outputs” but “spatial-native intelligence system with prose as one rendering format among others.” Tversky’s research establishes that spatial structure is the foundational representational system for abstract thought. Prose is a lossy translation of spatial insight. This protocol is the first step toward restoring cognitive fidelity: letting insights be represented in formats closer to their native spatial structure. The visual output protocol, the visual input processing layer (sketches, diagrams, whiteboard photos), and the direct annotation interface are components of a single spatial-native architecture. This version specifies the output protocol; the input and annotation layers are documented in the companion Spatial-Native Architecture specification and will converge into a unified bidirectional protocol.
1. In-scope visual techniques
The protocol supports 22 diagram types organized into eight schema families (QUANT, CAUSAL, DECISION, RISK, ARGUMENT, RELATIONAL, PROCESS, SPATIAL). This is an increase from v0.1’s 19 types, adding tornado/sensitivity diagrams, influence diagrams, and bow-tie risk diagrams. The conceptual research identified all three as high-tractability, visually native techniques serving modes that are among the most degraded by prose-only execution.
1.1 Technique table
| # | Technique | Family | Rendering target | Tier | Rationale |
|---|---|---|---|---|---|
| 1 | Comparison chart (bar, column, dot plot, slope graph) | QUANT | Vega-Lite | Low | Cleveland–McGill position/length ranking; grammar-of-graphics deterministic |
| 2 | Time series with uncertainty (line + band, fan, sparkline) | QUANT | Vega-Lite | Low | Small multiples, fan bands, error layers; banking-to-45° via aspect field |
| 3 | Distribution plot (box, violin, strip, histogram) | QUANT | Vega-Lite | Low | Reveals shape hidden by bar+error-bar; Weissgerber applies |
| 4 | Scatter with annotation | QUANT | Vega-Lite | Low | Bivariate relation; supports facet/repeat for small multiples |
| 5 | Heatmap / small-multiple heatmap | QUANT | Vega-Lite | Low | Also the ACH rendering target |
| 6 | Tornado / sensitivity diagram | QUANT | Vega-Lite | Low | Sorted horizontal bar with center axis; sensitivity ranking is perceptual. Added v0.2 |
| 7 | Causal loop diagram (CLD) | CAUSAL | Semantic JSON → SVG/DOT | High | No DSL carries polarity or R/B loop semantics |
| 8 | Stock-and-flow diagram (XMILE-aligned) | CAUSAL | Semantic JSON → SVG | High | Stocks/flows/clouds/auxiliaries have conservation semantics |
| 9 | Causal DAG (Pearl/Hernán) | CAUSAL | DAGitty DSL | Low | Mature DSL; exposure/outcome/latent typing; acyclicity checkable |
| 10 | Fishbone / Ishikawa | CAUSAL | Semantic JSON → SVG | High | Typed tree with declared category framework |
| 11 | Decision tree / probability tree | DECISION | Semantic JSON → SVG | High | Probability-sum and payoff invariants demand validation |
| 12 | Influence diagram (Howard & Matheson) | DECISION | Semantic JSON → DOT | High | Encodes conditional independence structure trees cannot show. Added v0.2 |
| 13 | ACH matrix | DECISION | Semantic JSON → Vega-Lite heatmap | High | Typed table with CC/C/N/I/II/NA cell vocabulary and diagnosticity scoring |
| 14 | 2×2 / scenario-planning matrix | DECISION | Semantic JSON → SVG | High | Axis independence is epistemic claim requiring explicit rationale |
| 15 | Bow-tie risk diagram | RISK | Semantic JSON → SVG | High | Threat/event/consequence symmetry is the method’s point; invisible in prose. Added v0.2 |
| 16 | IBIS argument diagram | ARGUMENT | Semantic JSON → DOT | High | Tiny typed graph with grammar constraints |
| 17 | Pro–con tree | ARGUMENT | Semantic JSON → SVG | High | Degenerate argument map |
| 18 | Concept map (Novak) | RELATIONAL | Semantic JSON (CXL-shaped) → DOT | High | Labeled edges carry propositional semantics |
| 19 | Sequence diagram | PROCESS | Mermaid / PlantUML | Low | Mature DSLs; LLM reliability high |
| 20 | Flowchart / swimlane | PROCESS | Mermaid (subgraphs) | Low | Subgraph-as-lane idiom works |
| 21 | State diagram (FSM) | PROCESS | Mermaid stateDiagram-v2 | Low | Harel statecharts excluded; FSM covers most needs |
| 22 | C4 architecture (Context + Container) | SPATIAL | Structurizr DSL | Low | Reference DSL by the C4 author |
1.2 Excluded techniques, with rationale
- Mind maps (Buzan) — No propositional semantics. Use concept maps with `linking_phrase` optional. Explicit exclusion prevents confusion.
- Venn diagrams (n > 3) — Cannot be area-proportional in general (proven). Automatic dishonesty.
- Full Toulmin argument maps — IBIS + pro-con cover the common case. Full Toulmin has unresolved rebuttal-target ambiguity in the literature. Candidate for v0.3.
- Harel statecharts — Layout benefits sharply from human judgment. FSM subset covers the vast majority of needs.
- Gantt charts — Redundant with timeline + swimlane; the specific task/dependency/resource semantics rarely appear in analytical modes.
- Parallel coordinates — Fold into QUANT family as a Vega-Lite `repeat + fold` variant rather than a separate type.
- ER diagrams, org charts, tree diagrams — Subsumed as specializations of Mermaid class/flowchart or concept map.
- Sparklines — Treat as `compact: true` variant on time-series schema, not a distinct type.
- Sankey / alluvial — Deferred to v0.3. Neither Vega-Lite native nor cleanly semantic.
- Evidence-weight diagrams, Wigmore charts, full AIF — Deferred as niche.
- Pie charts — Not explicitly banned, but never selected by default. Cleveland-McGill ranks angle/area among the least accurate encodings. If the protocol’s encoding-selection logic ever proposes a pie chart, the adversarial reviewer should flag it for replacement with a bar or dot plot.
1.3 Rejection criteria
Three gates: (a) formal structure cannot be captured in structured text without losing the thing that makes it useful; (b) high dishonesty risk by construction; (c) rendering is aesthetically sensitive enough that automated output will reliably produce degraded results.
2. Protocol envelope structure
2.1 Format
Visual specification blocks are fenced JSON code blocks with a typed marker, embedded in the response Markdown:
```ora-visual
{
"schema_version": "0.2",
"id": "fig-1",
"type": "causal_loop_diagram",
"mode_context": "systems_dynamics",
"relation_to_prose": "integrated",
"title": "Feedback loops in team velocity",
"spec": { … type-specific fields … },
"semantic_description": { … four-level description … },
"spatial_representation": { … optional, for spatial-native pipeline … },
"render_hints": { … optional, all ignorable … },
"integrity_declarations": { … optional honesty assertions … }
}
```
Why fenced JSON. XML closing-tag drift is a known LLM failure mode. YAML indentation is brittle when interleaved with prose. Sidecar files break the stateless pipeline invariant. The discriminated union on type combined with additionalProperties: false and enum-constrained vocabularies eliminates the majority of LLM schema drift.
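As a minimal sketch of how a consumer might pull these blocks out of a response, the following Python uses only the standard library. The function name `extract_visual_blocks` and the sample text are illustrative assumptions, not part of the protocol; the fence delimiter is built programmatically only so that the example can itself be shown inside a fence.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built at runtime (illustrative convenience)

# A fence whose info string is exactly "ora-visual", with a JSON body.
BLOCK_RE = re.compile(
    rf"^{FENCE}ora-visual[ \t]*\n(.*?)\n{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)

def extract_visual_blocks(markdown: str) -> list[dict]:
    """Return the parsed JSON body of every ora-visual fence, in document order."""
    # json.loads raises json.JSONDecodeError on malformed bodies; a real
    # pipeline would presumably route that into its repair or fallback logic.
    return [json.loads(match.group(1)) for match in BLOCK_RE.finditer(markdown)]

sample = "\n".join([
    "Velocity is governed by two loops (see fig-1).",
    FENCE + "ora-visual",
    '{"schema_version": "0.2", "id": "fig-1", "type": "causal_loop_diagram"}',
    FENCE,
    "Further prose.",
])
blocks = extract_visual_blocks(sample)
```

Because the marker is part of the fence info string, ordinary JSON fences in the same response are left untouched by this pattern.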
2.2 Required envelope fields
- `schema_version` — string, semver. Consumers fail-closed on unknown major versions.
- `id` — string, stable within a canonical document. Used for prose cross-references (“see fig-1”) and annotation targeting.
- `type` — enum, one of the 22 in-scope types.
- `mode_context` — string, the Ora mode that generated this visual (for adversarial routing and default configuration lookup).
- `relation_to_prose` — enum: `integrated` | `visually_native` | `redundant`. See §5.
- `spec` — type-dispatched object.
- `semantic_description` — object per §8.
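A pre-check over the required fields might look like the sketch below. It is not the normative JSON Schema validation; the function name `check_envelope` and the choice of `"0"` as the only supported major version are assumptions made for illustration.

```python
REQUIRED_FIELDS = (
    "schema_version", "id", "type", "mode_context",
    "relation_to_prose", "spec", "semantic_description",
)
RELATIONS = {"integrated", "visually_native", "redundant"}
SUPPORTED_MAJOR = "0"  # assumption: this consumer only understands 0.x envelopes

def check_envelope(envelope: dict) -> list[str]:
    """Return a list of problems; an empty list means the pre-check passed."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in envelope]
    # Fail closed: any major version other than the supported one is rejected.
    version = str(envelope.get("schema_version", ""))
    if version.split(".")[0] != SUPPORTED_MAJOR:
        errors.append(f"unsupported schema major version: {version!r}")
    relation = envelope.get("relation_to_prose")
    if relation is not None and relation not in RELATIONS:
        errors.append(f"unknown relation_to_prose: {relation!r}")
    return errors

example = {
    "schema_version": "0.2", "id": "fig-1", "type": "causal_loop_diagram",
    "mode_context": "systems_dynamics", "relation_to_prose": "integrated",
    "spec": {}, "semantic_description": {},
}
problems = check_envelope(example)
```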
2.3 Optional envelope fields
- `title` — short string (alt-text label and `aria-labelledby`).
- `caption` — longer attribution string (source, period, n).
- `render_hints` — object, all ignorable by any renderer: `{ preferred_engine, aspect_ratio, compact }`.
- `integrity_declarations` — object: `{ non_zero_baseline_justified, inverted_axis_justified, axes_independence_rationale, log_scale_base }`. Populated only when the visual triggers a Tufte rule that allows a justified exception.
- `memorability_goal` — boolean, default `false`. When `true` (set by user or mode configuration, never by the model), relaxes T4/T5 chartjunk rules to allow limited embellishment constrained by integrity rules. Based on Bateman et al. (CHI 2010) finding that embellished charts showed no worse comprehension and significantly better long-term recall.
- `fallback` — alternative representation if render fails: `{ type: "table", data: [[…]] }` or `{ type: "prose_only" }`.
- `spatial_representation` — optional spatial-native format for bidirectional communication pipeline. See §10.
2.4 Tiering
The spec field is polymorphic on type. For low-tier types, spec contains the declarative DSL directly or a validated Vega-Lite subset. For high-tier types, spec is a semantic JSON object that a deterministic compiler translates to a renderer format.
| Tier | Techniques | Why |
|---|---|---|
| Low (Vega-Lite) | Comparison, time series, distribution, scatter, heatmap, tornado | Grammar-of-graphics is already the semantic tier |
| Low (DAGitty) | Causal DAG | DSL trivially short; acyclicity checkable |
| Low (Mermaid) | Sequence, flowchart/swimlane, state (FSM) | LLM reliability empirically highest; repair loops converge in 1-2 retries |
| Low (Structurizr) | C4 architecture | Reference DSL; model/view separation matches mode context |
| High (semantic JSON) | CLD, stock-flow, fishbone, decision tree, influence diagram, ACH, 2×2/scenario, bow-tie, IBIS, pro-con, concept map | No mainstream DSL carries required semantics |
The wrong alternative, worth naming: emitting Mermaid for everything. A CLD drawn as a Mermaid flowchart has no polarity to validate. A decision tree drawn as a flowchart has no probability sum to check. The protocol must preserve the semantics that make the technique worth using.
3. Per-type schemas
Schemas are informal here; the normative form is JSON Schema 2020-12 with additionalProperties: false everywhere.
3.1 QUANT family (comparison, time series, distribution, scatter, heatmap)
A conservative Vega-Lite subset. Required: $schema, data, mark (from enumerated marks), encoding with typed channels. Required metadata: title, caption.source, caption.period, caption.n, caption.units. Uncertainty field mandatory when encoding.y.field.statistic is point-estimate or when mode_context involves forecast/projection.
Banned without integrity_declarations justification: non-zero baseline on bar/area, inverted y-scale on conventional quantities, log scale without base disclosure, dual y-axes with independent zero points, rainbow/jet colormaps for ordered data.
3.2 Tornado / sensitivity diagram (QUANT family)
{
"base_case_label": "string",
"base_case_value": number,
"outcome_variable": "string",
"outcome_units": "string",
"parameters": [
{
"label": "string",
"low_value": number,
"high_value": number,
"low_label?": "string",
"high_label?": "string",
"outcome_at_low": number,
"outcome_at_high": number
}
],
"sort_by": "swing" // enum: "swing" | "high_impact" | "custom"
}
Invariants: parameters sorted by swing (|outcome_at_high - outcome_at_low|) descending unless sort_by overrides; base_case_value rendered as vertical center line; each parameter renders as a horizontal bar spanning [outcome_at_low, outcome_at_high].
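The swing ordering and bar extents can be sketched as follows. The parameter values are invented for illustration, and the handling of `sort_by` values other than `"swing"` (leaving the authored order untouched) is an assumption, since their semantics are not defined here.

```python
def swing(parameter: dict) -> float:
    """Absolute change in the outcome across the parameter's range."""
    return abs(parameter["outcome_at_high"] - parameter["outcome_at_low"])

def order_parameters(spec: dict) -> list[dict]:
    """Largest swing first, unless sort_by overrides the default."""
    if spec.get("sort_by", "swing") != "swing":
        return list(spec["parameters"])  # assumption: keep authored order
    return sorted(spec["parameters"], key=swing, reverse=True)

def bar_extent(parameter: dict) -> tuple[float, float]:
    """Left and right ends of the horizontal bar on the outcome axis."""
    low, high = parameter["outcome_at_low"], parameter["outcome_at_high"]
    return (min(low, high), max(low, high))

# Illustrative data only.
tornado = {
    "base_case_value": 100,
    "sort_by": "swing",
    "parameters": [
        {"label": "volume", "outcome_at_low": 95, "outcome_at_high": 110},
        {"label": "price", "outcome_at_low": 80, "outcome_at_high": 130},
        {"label": "unit cost", "outcome_at_low": 115, "outcome_at_high": 85},
    ],
}
ordered_labels = [p["label"] for p in order_parameters(tornado)]
```

Note that a parameter whose outcome falls as the input rises (here "unit cost") still gets a positive swing and a left-to-right bar; only the low/high labels swap sides.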
3.3 Causal loop diagram (CAUSAL family)
{
"variables": [{ "id", "label", "description?" }],
"links": [{ "from", "to", "polarity": "+|-", "delay": false, "note?" }],
"loops": [{ "id": "R1|B1|…", "type": "R|B", "members": ["varId", …], "label", "narrative?" }]
}
Invariants: every link has polarity; every declared loop is a genuine cycle in the graph; loop type matches sign-product of edge polarities (even count of − → R; odd → B); every variables.id unique; no orphan nodes unless allow_isolated: true.
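The cycle and sign-product invariants can be checked per declared loop roughly as below. This sketch assumes `members` lists the variables in traversal order and that the loop closes from the last member back to the first; the function name `check_loop` is hypothetical.

```python
def check_loop(spec: dict, loop: dict) -> list[str]:
    """Verify that a declared loop is a real cycle and that its R/B type is right."""
    polarity = {(link["from"], link["to"]): link["polarity"] for link in spec["links"]}
    members = loop["members"]
    negatives = 0
    # Pair each member with its successor, wrapping around to close the cycle.
    for src, dst in zip(members, members[1:] + members[:1]):
        sign = polarity.get((src, dst))
        if sign is None:
            return [f"loop {loop['id']}: no link {src} -> {dst}"]
        if sign == "-":
            negatives += 1
    # Even number of negative links: reinforcing. Odd: balancing.
    expected = "R" if negatives % 2 == 0 else "B"
    if loop["type"] != expected:
        return [f"loop {loop['id']}: declared {loop['type']}, polarities imply {expected}"]
    return []

# Illustrative three-variable diagram with one negative link.
cld = {
    "links": [
        {"from": "workload", "to": "fatigue", "polarity": "+"},
        {"from": "fatigue", "to": "velocity", "polarity": "-"},
        {"from": "velocity", "to": "workload", "polarity": "+"},
    ],
}
declared = {"id": "B1", "type": "B", "members": ["workload", "fatigue", "velocity"]}
result = check_loop(cld, declared)
```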
3.4 Stock-and-flow (CAUSAL family, XMILE-aligned)
{
"stocks": [{ "id", "label", "initial?", "unit?" }],
"flows": [{ "id", "label", "from": "stockId|cloudId", "to": "stockId|cloudId", "rate?", "unit?" }],
"clouds": [{ "id" }],
"auxiliaries": [{ "id", "label", "expression?" }],
"info_links": [{ "from": "stockId|auxId", "to": "flowId|auxId" }]
}
Invariants: each flow endpoint resolves to a stock or cloud; stocks have ≥ 1 flow; auxiliaries form a DAG over info_links; units dimensionally consistent if provided.
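Two of these invariants (endpoint resolution and every stock having at least one flow) are simple set checks, sketched below. The auxiliary-DAG and unit-consistency checks are deliberately left out, and the function name `check_stock_flow` is an illustrative assumption.

```python
def check_stock_flow(spec: dict) -> list[str]:
    """Check flow endpoints and stock connectivity; returns a list of problems."""
    stock_ids = {stock["id"] for stock in spec["stocks"]}
    cloud_ids = {cloud["id"] for cloud in spec.get("clouds", [])}
    endpoints = stock_ids | cloud_ids
    errors = []
    connected = set()
    for flow in spec["flows"]:
        for end in ("from", "to"):
            if flow[end] not in endpoints:
                errors.append(
                    f"flow {flow['id']}: {end} endpoint {flow[end]!r} is not a stock or cloud"
                )
        # Remember which stocks are touched by at least one flow.
        connected.update({flow["from"], flow["to"]} & stock_ids)
    for stock_id in sorted(stock_ids - connected):
        errors.append(f"stock {stock_id} has no flow")
    return errors

# Illustrative model: work enters a backlog from a cloud and drains into "done".
model = {
    "stocks": [{"id": "backlog", "label": "Backlog"}, {"id": "done", "label": "Done"}],
    "clouds": [{"id": "source"}],
    "flows": [
        {"id": "intake", "label": "Intake", "from": "source", "to": "backlog"},
        {"id": "completion", "label": "Completion", "from": "backlog", "to": "done"},
    ],
}
model_errors = check_stock_flow(model)
```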
3.5 Causal DAG (CAUSAL family)
{
"dsl": "dag { x [exposure]; y [outcome]; u [latent]; x -> y; u -> x; u -> y }",
"focal_exposure": "x",
"focal_outcome": "y"
}
Invariants: parser accepts dsl; graph acyclic; focal_exposure and focal_outcome present in dsl.
3.6 Fishbone / Ishikawa (CAUSAL family)
{
"effect": "string",
"framework": "6M|4P|4S|8P|custom",
"categories": [
{ "name", "causes": [ { "text", "sub_causes?": [ { "text", "sub_causes?": […] } ] } ] }
]
}
Invariants: if framework ≠ custom, categories[].name drawn from framework’s canonical set; depth ≤ 3; effect stated as a problem, not a solution (soft lint).
3.7 Decision tree / probability tree (DECISION family)
{
"mode": "decision|probability",
"root": { node },
"utility_units": "USD|QALY|utils|…" // required if mode=decision
}
node := {
"kind": "decision|chance|terminal",
"label",
"children?": [ { "edge_label", "probability?", "payoff?", "node": node } ]
}
Invariants: chance-node children’s probabilities sum to 1 ± 1e-6; probabilities in [0,1]; decision nodes have ≥ 1 child; terminals have payoff when mode=decision; no probabilities on decision-node edges. Compiler computes rollback EV.
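A rollback pass that also enforces the probability invariants could look like this sketch. It makes two assumptions that the schema does not settle: payoffs sit on edges and are summed along the path (so a terminal's payoff is the one on its incoming edge), and a decision node picks the child with the largest value.

```python
def rollback(node: dict, tol: float = 1e-6) -> float:
    """Expected value of a subtree, validating invariants along the way."""
    if node["kind"] == "terminal":
        return 0.0  # assumption: the payoff lives on the incoming edge
    children = node.get("children", [])
    values = [edge.get("payoff", 0.0) + rollback(edge["node"], tol) for edge in children]
    if node["kind"] == "chance":
        probabilities = [edge["probability"] for edge in children]
        if any(not 0.0 <= p <= 1.0 for p in probabilities):
            raise ValueError(f"{node['label']}: probability outside [0, 1]")
        if abs(sum(probabilities) - 1.0) > tol:
            raise ValueError(f"{node['label']}: probabilities do not sum to 1")
        return sum(p * v for p, v in zip(probabilities, values))
    # Decision node: needs at least one child, and no probabilities on its edges.
    if not children or any("probability" in edge for edge in children):
        raise ValueError(f"{node['label']}: invalid decision node")
    return max(values)  # assumption: larger utility is preferred

terminal = {"kind": "terminal", "label": "end"}
# Illustrative tree: launch (risky) versus hold (certain).
tree = {
    "kind": "decision", "label": "Launch?",
    "children": [
        {"edge_label": "launch", "node": {
            "kind": "chance", "label": "Market response",
            "children": [
                {"edge_label": "success", "probability": 0.6, "payoff": 100, "node": terminal},
                {"edge_label": "failure", "probability": 0.4, "payoff": -50, "node": terminal},
            ],
        }},
        {"edge_label": "hold", "payoff": 10, "node": terminal},
    ],
}
expected_value = rollback(tree)
```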
3.8 Influence diagram (DECISION family) — new in v0.2
{
"nodes": [
{ "id", "label", "kind": "decision|chance|value|deterministic", "description?" }
],
"arcs": [
{ "from", "to", "type": "informational|functional|relevance", "note?" }
],
"temporal_order?": ["nodeId", …] // decision sequence if relevant
}
Invariants: exactly one value node; no arcs into decision nodes from later-decided nodes (temporal consistency when temporal_order provided); the graph implied by functional arcs from chance/deterministic nodes into the value node forms a DAG; informational arcs represent information availability at decision time. Compiler checks d-separation readability.
3.9 ACH matrix (DECISION family)
{
"hypotheses": [{ "id", "label", "description?" }],
"evidence": [{ "id", "text", "credibility": "H|M|L", "relevance": "H|M|L", "source?" }],
"cells": { "<evidence_id>": { "<hypothesis_id>": "CC|C|N|I|II|NA" } },
"scoring_method": "heuer_tally|bayesian|weighted"
}
Invariants: every (evidence × hypothesis) cell populated; cell values from enum; non-diagnostic evidence flagged.
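The completeness and vocabulary checks are mechanical; flagging non-diagnostic evidence needs a definition. The sketch below assumes the common reading that a piece of evidence is non-diagnostic when it receives the same rating against every hypothesis. The function name `check_ach` is hypothetical.

```python
CELL_VALUES = {"CC", "C", "N", "I", "II", "NA"}

def check_ach(spec: dict) -> tuple[list[str], list[str]]:
    """Return (errors, ids of evidence rated identically for all hypotheses)."""
    hypothesis_ids = [h["id"] for h in spec["hypotheses"]]
    errors, non_diagnostic = [], []
    for evidence in spec["evidence"]:
        row = spec["cells"].get(evidence["id"], {})
        ratings = []
        for hypothesis_id in hypothesis_ids:
            rating = row.get(hypothesis_id)
            if rating not in CELL_VALUES:
                errors.append(
                    f"cell ({evidence['id']}, {hypothesis_id}) missing or invalid: {rating!r}"
                )
            ratings.append(rating)
        # Same valid rating everywhere: the evidence cannot separate hypotheses.
        if len(set(ratings)) == 1 and ratings[0] in CELL_VALUES:
            non_diagnostic.append(evidence["id"])
    return errors, non_diagnostic

# Illustrative matrix with two hypotheses and two pieces of evidence.
matrix = {
    "hypotheses": [{"id": "h1", "label": "H1"}, {"id": "h2", "label": "H2"}],
    "evidence": [{"id": "e1", "text": "..."}, {"id": "e2", "text": "..."}],
    "cells": {"e1": {"h1": "C", "h2": "I"}, "e2": {"h1": "N", "h2": "N"}},
}
ach_errors, flagged = check_ach(matrix)
```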
3.10 2×2 / scenario quadrant (DECISION family)
{
"subtype": "strategic_2x2|scenario_planning",
"x_axis": { "label", "low_label", "high_label", "description?" },
"y_axis": { "label", "low_label", "high_label", "description?" },
"quadrants": {
"TL": { "name", "narrative?", "action?", "indicators?": [] },
"TR": { … }, "BL": { … }, "BR": { … }
},
"items?": [{ "label", "x": 0..1, "y": 0..1, "note?" }],
"axes_independence_rationale": "string (required)"
}
Invariants: all four quadrants named; axes_independence_rationale non-empty; items in [0,1]; for scenario_planning, each quadrant narrative non-empty.
3.11 Bow-tie risk diagram (RISK family) — new in v0.2
{
"hazard_event": { "label", "description?" },
"threats": [
{
"id", "label",
"pathway?": "string",
"preventive_controls": [{ "id", "label", "type": "eliminate|reduce|detect", "effectiveness?": "H|M|L" }]
}
],
"consequences": [
{
"id", "label",
"severity?": "H|M|L",
"mitigative_controls": [{ "id", "label", "type": "reduce|recover|contain", "effectiveness?": "H|M|L" }]
}
],
"escalation_factors?": [{ "from_control_id", "label", "escalation_control?": { "id", "label" } }]
}
Invariants: hazard_event is the center node; threats render left of center, consequences right of center; preventive controls sit on threat-to-event pathways, mitigative controls sit on event-to-consequence pathways. The visual symmetry — the whole reason the form exists — is enforced by layout. At least one threat and one consequence required.
3.12 IBIS argument diagram (ARGUMENT family)
{
"nodes": [ { "id", "type": "question|idea|pro|con", "text" } ],
"edges": [ { "from", "to", "type": "responds_to|supports|objects_to|questions" } ]
}
Grammar invariants: idea.responds_to → question; pro.supports → idea; con.objects_to → idea; question.questions → any. Violations are blocking.
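These grammar rules translate directly into a lookup table. The sketch below assumes the four rules are exhaustive, so any other combination of source type and edge type counts as a violation; `ibis_violations` is an illustrative name.

```python
# (source node type, edge type) -> allowed target node types
IBIS_GRAMMAR = {
    ("idea", "responds_to"): {"question"},
    ("pro", "supports"): {"idea"},
    ("con", "objects_to"): {"idea"},
    ("question", "questions"): {"question", "idea", "pro", "con"},
}

def ibis_violations(spec: dict) -> list[str]:
    """List every edge that the IBIS grammar does not permit."""
    node_type = {node["id"]: node["type"] for node in spec["nodes"]}
    violations = []
    for edge in spec["edges"]:
        source_type = node_type.get(edge["from"])
        target_type = node_type.get(edge["to"])
        allowed = IBIS_GRAMMAR.get((source_type, edge["type"]))
        if allowed is None or target_type not in allowed:
            violations.append(
                f"{edge['from']} ({source_type}) -{edge['type']}-> "
                f"{edge['to']} ({target_type}) is not allowed"
            )
    return violations

# Illustrative diagram: one question, one idea, one pro, one con.
diagram = {
    "nodes": [
        {"id": "q1", "type": "question", "text": "..."},
        {"id": "i1", "type": "idea", "text": "..."},
        {"id": "p1", "type": "pro", "text": "..."},
        {"id": "c1", "type": "con", "text": "..."},
    ],
    "edges": [
        {"from": "i1", "to": "q1", "type": "responds_to"},
        {"from": "p1", "to": "i1", "type": "supports"},
        {"from": "c1", "to": "i1", "type": "objects_to"},
    ],
}
found = ibis_violations(diagram)
```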
3.13 Pro–con tree (ARGUMENT family)
{
"claim": "string",
"pros": [ { "text", "weight?": 1..5, "source?", "children?": [ … ] } ],
"cons": [ … same shape … ],
"decision?": "string"
}
3.14 Concept map (RELATIONAL family, CXL-shaped)
{
"focus_question": "string",
"concepts": [ { "id", "label", "hierarchy_level": 0..N } ],
"linking_phrases": [ { "id", "text" } ],
"propositions": [ { "from_concept", "via_phrase", "to_concept", "is_cross_link?": false } ]
}
Invariants: every proposition resolves to declared concept/phrase IDs; soft warning if no cross-links (cross-links are the Novak-specific insight).
3.15 Process family (sequence, flowchart/swimlane, state)
spec.dsl is a Mermaid string; spec.dialect names the diagram kind. Compiler runs a Mermaid parse; on failure, bounded repair loop (2 retries). Known-failure-prone tokens pre-scanned and escaped by the compiler, not the model.
3.16 C4 architecture (SPATIAL family)
spec.dsl is Structurizr DSL. Compiler rejects forward references and mixing of C4 levels within a single view. Level declared in spec.level ∈ {context, container}.
4. Mode-to-visual configuration table
The tables in this section map Ora’s 19 modes (18 answer-seeking + 1 question-seeking) to their native modality classification, default visual types, default relation_to_prose, and adversarial strictness. The classifications are grounded in the Larkin-Simon computational-equivalence framework applied mode-by-mode in the conceptual research.
4.1 Visually native modes
These modes’ core inferences — loop polarity, diagnosticity, conditional independence, simultaneity, dominance ranking, spatial segmentation — are cheap in spatially indexed representations and expensive in sequential ones. Prose-only execution imposes measurable inferential cost.
| Mode | Default visual types | Default relation_to_prose | Adversarial strictness |
|---|---|---|---|
| Systems Dynamics | CLD, stock-and-flow | visually_native | Critical |
| Competing Hypotheses | ACH matrix | visually_native | Critical |
| Decision Under Uncertainty | Decision tree, influence diagram, tornado | integrated | Critical |
| Root Cause Analysis | Fishbone, CLD (when loops present) | integrated | Standard |
| Relationship Mapping | Concept map, causal DAG, network diagram | integrated | Standard |
| Consequences and Sequel | Causal DAG, flowchart | integrated | Standard |
| Constraint Mapping | 2×2 matrix, pro-con tree | integrated | Standard |
| Scenario Planning | 2×2 scenario matrix | integrated | Standard |
| Strategic Interaction | Decision tree (game tree), influence diagram | integrated | Critical |
| Benefits Analysis | Pro-con tree, tornado (for quantified benefits) | integrated | Standard |
4.2 Bimodal modes
These modes decompose into structure identification (visual) and structure interpretation (linguistic). Both representations carry essential, non-redundant information.
| Mode | Default visual types | Default relation_to_prose | Adversarial strictness |
|---|---|---|---|
| Synthesis | Concept map (structural parallels) | integrated | Standard |
| Dialectical Analysis | IBIS (thesis/antithesis structure) | integrated | Standard |
| Terrain Mapping | Concept map (known/unknown/open) | integrated | Standard |
| Passion Exploration | Concept map (exploration nodes, potential projects) | integrated — but visual is for navigation, not argument | Relaxed |
| Cui Bono | Flowchart (interest flows), concept map | integrated | Standard |
4.3 Linguistically native modes
Core inferences depend on operators graphics cannot express compactly: negation, counterfactual conditionals, modal qualifiers, normative predicates. Visual output is supplementary at best; forcing a diagram falsifies the task through Stenning-Oberlander over-specificity.
| Mode | Visual types (if any) | Default relation_to_prose | Notes |
|---|---|---|---|
| Steelman Construction | None by default | no_visual | The steelman is a piece of prose; reducing it to a node label destroys the steelman |
| Deep Clarification | Optional: flowchart (for mechanistic processes) | redundant → prefer no_visual | Visual only when the mechanism is itself spatial/procedural |
| Paradigm Suspension | None by default | no_visual | The questioning of assumptions is linguistic; diagram would force premature commitment |
| Project Mode | Varies by deliverable | Varies | Project Mode inherits visual configuration from the analytical mode it serves |
4.4 Usage rules
- The table is user-editable per mode and stored in Ora’s vault as a canonical configuration document.
- `visually_native` relation is permitted only for modes marked visually native in this table. Other modes may not claim it without user override.
- When `relation_to_prose` defaults to `redundant`, the model should evaluate whether the visual adds anything the prose does not. If the answer is no, emit `no_visual`. The redundancy principle (Mayer 1999/2003) means a courtesy visual that merely restates prose carries real cognitive cost.
- Changes to mode configuration propagate to subsequent analytical invocations without requiring per-output approval.
5. Coexistence with prose — the relation_to_prose field
Four states, assigned per visual per mode. The assignment is prescriptive, not optional.
- `visually_native` — The visual is the primary artifact; prose is caption and context. Used only when the mode is inherently structural and the diagram does the cognitive work. The adversarial reviewer applies stricter integrity checks because dishonesty is more costly when the visual is primary. Assigned modes: Systems Dynamics, Competing Hypotheses.
- `integrated` — Prose and visual are mutually dependent; prose references figure by id; visual carries information that prose summarizes but does not reproduce. Prose must remain interpretable without the visual via the semantic description. Assigned modes: most analytical modes.
- `redundant` — Prose carries the full analytical content; visual reinforces. Health warning (Mayer): this state carries empirical cognitive cost. The protocol treats it as the least-preferred option. Before emitting a `redundant` visual, evaluate whether the visual genuinely adds pattern-recognition, spatial structure, or comparison capability. If it merely restates the prose in diagrammatic form, suppress it.
- `no_visual` — No visual specification block emitted. The default for linguistically native modes. Not a failure state — it is the correct output when prose is the computationally superior representation.
6. Handoff architecture — rendering paths
Three paths, all active, with assignments:
Path A — Direct declarative render
The model’s output is already consumable. Failure surface is parse error only.
Assigned: QUANT family (Vega-Lite), PROCESS family (Mermaid), SPATIAL family (Structurizr DSL), causal DAG (DAGitty DSL).
Path B — Specialized compiler
A compiler wraps a graph-layout library, applies Tufte-aligned styling defaults, computes derived quantities (rollback EV, loop polarity product, ACH diagnosticity, bow-tie symmetry), and emits SVG. This path carries the integrity logic that cannot be delegated to a generic renderer.
Assigned: CAUSAL non-DAG (CLD, stock-flow, fishbone), DECISION (all), RISK (bow-tie), ARGUMENT (IBIS, pro-con), RELATIONAL (concept map).
Path C — Small-model rendering judgment
Reserved for freeform or hybrid outputs the semantic tier cannot express. Not used in v0.2. When activated, operates only on rendered SVG from Path B to refine layout — never invents semantics. The adversarial reviewer still audits against the original specification.
Path C status: Deferred. Including Path C introduces a model call in the rendering pipeline that otherwise has none, affecting latency and stateless-pipeline discipline. Activate when v0.2 Path B outputs prove aesthetically inadequate in a way users notice.
Multi-path routing
Some techniques can flow through multiple paths. Causal DAGs: Path A (DAGitty → Graphviz) is default; Path B available when analytical model is uncertain about syntax. 2×2 matrices: Path B default; Path A Vega-Lite variant exists for scatter-style 2×2 without named quadrants. Protocol records the actual path used in the rendering manifest.
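The default assignments above reduce to a small lookup by family, with the causal DAG as the one exception inside CAUSAL. In this sketch the type identifier `causal_dag` and the manifest field names are hypothetical; only `causal_loop_diagram` appears as a type value elsewhere in this document.

```python
PATH_A_FAMILIES = {"QUANT", "PROCESS", "SPATIAL"}
PATH_B_FAMILIES = {"DECISION", "RISK", "ARGUMENT", "RELATIONAL"}

def default_path(family: str, type_name: str) -> str:
    """Default rendering path for a visual, before any multi-path override."""
    if family in PATH_A_FAMILIES:
        return "A"
    if family in PATH_B_FAMILIES:
        return "B"
    if family == "CAUSAL":
        # The causal DAG has a mature DSL; other CAUSAL types need the compiler.
        return "A" if type_name == "causal_dag" else "B"
    raise ValueError(f"unknown family: {family!r}")

def manifest_entry(figure_id: str, family: str, type_name: str, override=None) -> dict:
    """Record the path actually used, as the rendering manifest requires."""
    default = default_path(family, type_name)
    return {"id": figure_id, "default_path": default, "path_used": override or default}

entry = manifest_entry("fig-1", "CAUSAL", "causal_loop_diagram")
```

Path C never appears as a default because it is not used in v0.2.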
7. Adversarial review for visual output
Visual adversarial review is a distinct stage with its own prompt. It runs in two passes: a spec-level review after the analytical adversarial stage and before rendering, and a second, lighter artifact-level review after rendering.
7.1 Tufte integrity rules (T-rules)
Machine-checkable rules applied at spec level:
- T1 Lie factor. Length/area encoding: ratio-of-pixels / ratio-of-values must be in [0.95, 1.05].
- T2 Zero baseline. Bar/area/column: `scale.domain[0]=0` required, or `integrity_declarations.non_zero_baseline_justified` populated with quantity type (index, z-score, temperature).
- T3 Dimensional conformance. Visual dimensions ≤ data dimensions. Fail on 1D-to-2D area or 3D volume.
- T4 Data-ink ratio proxy. Count decorative elements against data marks; fail on 3D extrusion, drop shadow, gradient fill on categorical mark, decorative image. Exception: relaxed when `memorability_goal: true`, but integrity rules (T1–T3) still hold.
- T5 Chartjunk blacklist. Hard-fail on 3D bar/pie/cylinder/cone, moiré, non-data gradients. Same memorability exception as T4.
- T6 Show the data. If `n/marks > 20` and no distributional layer, require adding one or justifying aggregation.
- T7 Labelling completeness. Axis titles, units, scale type, n, source, period — all required.
- T8 Scale-type disclosure. Log/symlog/pow must be labelled with base.
- T9 Axis orientation. Inverted y-scale on conventional quantities fails unless declared intentional.
- T10 Banking to 45°. For line marks, aspect ratio within 2× of Cleveland-banked optimum.
- T11 Small-multiples trigger. ≥ 7 categorical colors on one panel triggers a facet suggestion.
- T12 Currency standardization. Nominal currency over > 3 years warns; require real/deflated unless declared.
- T13 Event labelling. Long time series: warn if major-event metadata present but unlabelled.
- T14 Tick consistency. Constant tick step on quantitative axes; log axes label powers of base.
- T15 Caption-source-n present. Hard requirement.
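To make T1 and T2 concrete, the sketch below follows the wording of the rules as stated here: the lie factor is the ratio of drawn sizes divided by the ratio of data values, and a non-zero baseline passes only with a populated justification. The function names and the pixel figures are illustrative assumptions.

```python
def lie_factor(pixels_a: float, pixels_b: float, value_a: float, value_b: float) -> float:
    """Ratio of drawn sizes divided by ratio of data values (T1 wording)."""
    return (pixels_b / pixels_a) / (value_b / value_a)

def passes_t1(pixels_a, pixels_b, value_a, value_b, low=0.95, high=1.05) -> bool:
    return low <= lie_factor(pixels_a, pixels_b, value_a, value_b) <= high

def passes_t2(domain_min: float, integrity_declarations=None) -> bool:
    """Zero baseline, or a populated non_zero_baseline_justified declaration."""
    declarations = integrity_declarations or {}
    return domain_min == 0 or bool(declarations.get("non_zero_baseline_justified"))

# Honest bars: values 10 and 20 drawn 100 px and 200 px long.
honest = lie_factor(100, 200, 10, 20)

# Truncated baseline at 89: values 90 and 100 drawn 10 px and 110 px long.
truncated = lie_factor(10, 110, 90, 100)
```

The truncated example shows why T2 exists as a separate rule: a baseline of 89 turns an 11% difference in the data into an elevenfold difference in bar length.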
7.2 Structural integrity rules (per-family)
- QUANT: Uncertainty required when quantity is inferential, forecast, model output, or drives a decision. Dual y-axes blocked unless mathematically linked.
- CAUSAL: CLD polarity on every edge; declared loop type matches edge-sign product. DAG acyclic. Stock-flow: stocks ≥ 1 flow; units consistent. Fishbone: categories from declared framework; depth ≤ 3.
- DECISION: Tree chance-node probabilities sum to 1; terminals have payoffs. Influence diagram: exactly one value node; temporal consistency. 2×2: axes_independence_rationale non-empty. ACH: cells complete and from vocabulary; non-diagnostic evidence flagged. Tornado: parameters sorted by swing.
- RISK: Bow-tie: at least one threat, one consequence; preventive controls on left pathways, mitigative on right. Symmetry preserved in layout.
- PROCESS: Flowchart decision nodes have ≥ 2 labelled, mutually exclusive, exhaustive outgoing edges. Sequence: every message has sender and receiver. State: initial state declared; unreachable states flagged.
- SPATIAL: C4 level declared and not mixed.
- ARGUMENT: IBIS grammar enforced. Warrant ≠ evidence.
7.3 Severity tiers
- Critical (auto-block render): T1 beyond 2×; T3; T5; T9 undisclosed; log without label; Venn ≥ 4 sets; cherry-picked time range reversing trend sign; CLD missing polarity; decision tree missing probability/payoff; IBIS grammar violations; bow-tie with controls on wrong side.
- Major (warn; require human sign-off if `integrated` or `visually_native`): aspect ratio > 2× off optimum; rainbow on ordered data; aggregation hiding distribution; missing uncertainty on inferential quantities; false precision; non-orthogonal 2×2 axes (|corr| > 0.7); chart-type mismatch to task per Mackinlay ranking.
- Minor (informational log): missing legend title; inconsistent tick intervals; redundant data-ink; heavy gridlines; untested CVD palette.
7.4 Artifact-level review
After rendering, a lightweight adversarial pass checks: overlapping nodes, text truncation, illegible contrast (WCAG 1.4.11 ≥ 3:1), visual-chartjunk the spec layer cannot see. Does not re-litigate structural correctness.
7.5 LLM-prior-inversion checks
In addition to the T-rules and structural checks, the adversarial reviewer flags:
- Template-trap regression: output that looks like the modal BI dashboard (default palette, generic layout, low density). Not a blocking violation but a prompt to consider higher-density alternatives.
- Chart-type misselection: encoding selection must follow Bertin/Cleveland/Munzner decision procedure (data type × task × cardinality), not the first chart the model emits. If the model proposes a bar chart and a dot plot would be more accurate for the task, the reviewer flags it.
- Default-settings passthrough: if the model emits a Vega-Lite spec where all optional fields are at library defaults (bin, scale, axis, legend), flag for explicit authorship. Every default is an authored choice.
8. Accessibility and semantic description
8.1 Four-level semantic description (mandatory)
Following Lundgard & Satyanarayan (MIT Vis Group, IEEE TVCG 2022):
"semantic_description": {
"level_1_elemental": "string", // required — chart type, encodings, axis ranges
"level_2_statistical": "string", // required — extrema, trends, correlations, counts
"level_3_perceptual": "string", // required — synthesized patterns, notable exceptions
"level_4_contextual": "string|null", // optional — domain interpretation
"short_alt": "string", // required, ≤ 150 chars
"data_table_fallback": { … } | null
}
Rules:
- short_alt follows the Cesal formula: “[chart type] of [data], where [key takeaway].”
- Level 4 is optional and should be omitted when in doubt. Lundgard-Satyanarayan found blind readers ranked Level 4 least useful (63% emphatically opposed it).
- For quantitative visuals with ≤ 50 data points, data_table_fallback populated.
- For non-quantitative diagrams, type-specific description fields augment Level 1 (loops for CLD, optimal path for decision trees, leading hypothesis for ACH, items-per-quadrant for 2×2, threat/consequence counts for bow-tie, actors/steps for sequence diagrams).
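These rules lend themselves to a mechanical check. The sketch below validates a `semantic_description` dict against the required fields, the 150-character limit, and the 50-point table rule. The function name and its `is_quantitative` and `n_data_points` parameters are assumptions for illustration; the field names come from the schema.

```python
REQUIRED_FIELDS = ("level_1_elemental", "level_2_statistical",
                   "level_3_perceptual", "short_alt")
SHORT_ALT_MAX = 150
TABLE_FALLBACK_MAX_POINTS = 50


def validate_semantic_description(desc, is_quantitative, n_data_points):
    """Return a list of human-readable problems; an empty list means the block passes."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = desc.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name} is required and must be a non-empty string")
    short_alt = desc.get("short_alt")
    if isinstance(short_alt, str) and len(short_alt) > SHORT_ALT_MAX:
        problems.append(f"short_alt exceeds {SHORT_ALT_MAX} characters")
    # Small quantitative datasets must ship their data as a table fallback.
    needs_table = is_quantitative and n_data_points <= TABLE_FALLBACK_MAX_POINTS
    if needs_table and desc.get("data_table_fallback") is None:
        problems.append("data_table_fallback must be populated for <= 50 data points")
    return problems
```

Level 4 is deliberately not checked, since the schema allows it to be null.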
8.2 Redundancy guard
When relation_to_prose = redundant, short_alt is the Cesal one-liner and Levels 2-4 may be “See surrounding prose.” When relation_to_prose = integrated or visually_native, full four-level description required.
8.3 Rendering accessibility
- SVG wrapped with role="img", aria-labelledby="<title-id> <desc-id>". <title> from short_alt, <desc> from concatenated Levels 1-3.
- Decorative shapes: aria-hidden="true".
- Complex SVG: parallel navigable representation following Olli/ARIA TreeView pattern.
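A minimal sketch of the wrapping rule, using Python's standard `xml.etree.ElementTree`. The function name, the id naming scheme (`<figure_id>-title`, `<figure_id>-desc`), and the decorative background rectangle are assumptions for illustration. The Olli-style navigable representation is not shown.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def wrap_accessible_svg(figure_id, short_alt, levels_1_to_3, width=400, height=300):
    """Build an <svg> root carrying role, aria-labelledby, <title>, and <desc>."""
    ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix
    title_id = f"{figure_id}-title"
    desc_id = f"{figure_id}-desc"
    svg = ET.Element(f"{{{SVG_NS}}}svg", {
        "role": "img",
        "aria-labelledby": f"{title_id} {desc_id}",
        "viewBox": f"0 0 {width} {height}",
    })
    title = ET.SubElement(svg, f"{{{SVG_NS}}}title", {"id": title_id})
    title.text = short_alt
    desc = ET.SubElement(svg, f"{{{SVG_NS}}}desc", {"id": desc_id})
    desc.text = " ".join(levels_1_to_3)  # concatenated Levels 1-3
    # A purely decorative shape, hidden from assistive technology.
    ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
        "width": str(width), "height": str(height),
        "fill": "none", "aria-hidden": "true",
    })
    return svg
```

Calling `ET.tostring(svg, encoding="unicode")` on the result yields markup that a compiler could extend with the actual chart marks.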
8.4 Contrast and color rules (WCAG 2.2)
- Text ≥ 4.5:1 (AA).
- Graphical objects essential to meaning ≥ 3:1 (SC 1.4.11).
- Never encode via color alone (SC 1.4.1).
- Categorical: Okabe-Ito (≤ 8) or similar CVD-safe.
- Sequential: viridis-family or ColorBrewer sequential.
- Diverging: ColorBrewer RdBu/PuOr or Crameri vik/roma.
- Interactive element target size ≥ 24×24 CSS px (SC 2.5.8).
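The 4.5:1 and 3:1 thresholds refer to the WCAG contrast ratio, which is computed from sRGB relative luminance. The sketch below implements that published formula for six-digit hex colors. The function names are illustrative, and colors with alpha or shorthand hex are assumed not to occur.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel_8bit / 255.0
    # Older WCAG text lists 0.03928 as the threshold; for 8-bit channel values
    # both thresholds select the same branch.
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a '#RRGGBB' color."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg, bg):
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def meets_text_aa(fg, bg):
    return contrast_ratio(fg, bg) >= 4.5


def meets_graphical_minimum(fg, bg):
    return contrast_ratio(fg, bg) >= 3.0
```

A palette linter could run `meets_graphical_minimum` for every mark color against the chart background and `meets_text_aa` for every label color.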
8.5 Fallback when rendering fails
Three-tier graceful degradation:
- Render fails → display data_table_fallback if present, else full four-level description in bordered block labelled “Figure unavailable — description follows.”
- Artifact fails artifact-level adversarial review at Critical severity → same fallback.
- User opts for prose-only view → semantic_description serves as primary content.
Cardinal rule: never render a degraded visual just because a slot was allocated.
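The degradation logic can be sketched as a single decision function. The name `choose_display`, its boolean parameters, and the returned kind tokens are hypothetical; only the ordering of fallbacks follows the text.

```python
def choose_display(render_ok, critical_artifact_failure, prose_only, semantic_description):
    """Return (kind, payload) describing what the reader is shown."""
    if prose_only:
        # The user opted out of visuals: the description is the primary content.
        return ("description", semantic_description)
    if render_ok and not critical_artifact_failure:
        return ("figure", None)
    # Render failure and Critical artifact failure share the same fallback.
    table = semantic_description.get("data_table_fallback")
    if table is not None:
        return ("data_table", table)
    # No table available: show the full description in the bordered block.
    return ("figure_unavailable", semantic_description)
```

There is deliberately no branch that returns a partially rendered figure, which is how this sketch honors the cardinal rule.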
9. Human-in-the-loop points
Four intervention surfaces, none requiring per-output approval. Design principle: human is in authority but not in the loop for the common case.
9.1 Specification-stage edit
The canonical document exposes visual spec blocks as editable JSON. User modifies spec directly and triggers re-render without re-running the analytical model. Primary fast-path; seconds, not minutes.
9.2 Artifact review and regeneration
Compiled artifact displayed alongside spec. “Regenerate” re-runs compiler only. “Re-analyze” re-runs analytical model with directive to emit different visual.
9.3 Technique selection override
Mode configuration (permitted visual types, default relation_to_prose, adversarial strictness) is user-editable per mode, stored in Ora’s vault. Changes propagate to subsequent invocations.
9.4 Visual suppression
User can disable visuals globally, per mode, or per visual type. When suppressed, analytical model still emits semantic_description as prose (because the description often carries information prose did not) but omits spec block.
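A rough sketch of the three suppression scopes follows. The `SuppressionSettings` class and the envelope keys `mode`, `type`, and `spec` are assumptions made for illustration; the protocol's actual envelope field names are defined elsewhere in the document.

```python
from dataclasses import dataclass, field


@dataclass
class SuppressionSettings:
    all_visuals: bool = False                         # global switch
    modes: set = field(default_factory=set)           # per-mode suppression
    visual_types: set = field(default_factory=set)    # per-visual-type suppression

    def suppresses(self, mode, visual_type):
        return self.all_visuals or mode in self.modes or visual_type in self.visual_types


def apply_suppression(envelope, settings):
    """Return a copy of the envelope, dropping only the spec block when suppressed."""
    if not settings.suppresses(envelope["mode"], envelope["type"]):
        return dict(envelope)
    # semantic_description is kept: it may carry information the prose lacks.
    return {key: value for key, value in envelope.items() if key != "spec"}
```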
10. Spatial-native extensions (v0.2 forward architecture)
This section documents the architectural direction toward spatial-native intelligence. These extensions are not implemented in v0.2 but their data structures are forward-compatible with the protocol envelope.
10.1 The spatial representation field
The optional spatial_representation field in the envelope captures Tversky’s correspondence principles in a format usable by the bidirectional pipeline:
"spatial_representation": {
"entities": [
{ "id": "A", "position": [x, y], "label": "concept A", "spec_ref?": "node_id_in_spec" }
],
"relationships": [
{ "source": "A", "target": "B", "type": "causal|associative|hierarchical|temporal",
"strength?": 0.0..1.0, "spec_ref?": "edge_id_in_spec" }
],
"clusters": [
{ "members": ["A", "B", "C"], "label": "core processes" }
],
"hierarchy": [
{ "parent": "F", "children": ["A", "B", "C"], "type": "abstraction|containment|composition" }
]
}
The spec_ref fields link spatial entities to typed-diagram elements, so the spatial representation and the typed spec remain synchronized. This enables:
- Extracting spatial structure from visual inputs (sketches, diagrams, whiteboard photos) and injecting it into the analytical pipeline
- Maintaining spatial continuity across conversation turns — the spatial arrangement persists even when the typed diagram changes
- Feeding spatial position data into the annotation interface so user markup targets specific spatial regions
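Keeping the spatial representation synchronized presupposes that every id it mentions resolves to a declared entity. The sketch below is one hypothetical consistency check over a `spatial_representation` dict shaped like the schema above; the function name is an assumption, and whether hierarchy parents must themselves be declared entities is a guess the source does not settle.

```python
def dangling_references(spatial):
    """Return ids used in relationships, clusters, or hierarchy that no entity declares."""
    known = {entity["id"] for entity in spatial.get("entities", [])}
    used = set()
    for rel in spatial.get("relationships", []):
        used.update((rel["source"], rel["target"]))
    for cluster in spatial.get("clusters", []):
        used.update(cluster["members"])
    for node in spatial.get("hierarchy", []):
        used.add(node["parent"])
        used.update(node["children"])
    return sorted(used - known)
```

An empty result means every reference resolves; a non-empty result names the ids that a structure-extraction step would need to repair or drop.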
10.2 Visual input processing (forward-looking)
Three levels of visual input, each feeding into the analytical pipeline:
Level 1 — Structure extraction. Vision-language model parses visual inputs (napkin sketches, Excalidraw exports, whiteboard photos, Obsidian Canvas files) and populates the spatial_representation format. Boxes become entities, lines become relationships, spatial clusters become clusters, vertical position maps to hierarchy level.
Level 2 — Spatial reasoning. The analytical model applies Tversky’s correspondence principles to the extracted structure: are entities positioned according to their conceptual relationships? Are there missing connections that spatial layout suggests? Are hierarchical relationships captured? This is fog-diagnosis applied to visual input — the model helps the user see what their spatial intuition was encoding.
Level 3 — Collaborative spatial refinement. The system proposes refinements to spatial structure with both spatial suggestions and prose explanation: “You’ve placed X near Y, suggesting relationship, but there’s no connecting line. Should there be a causal connection?”
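The Level 3 example (X placed near Y with no connecting line) can be approximated with a proximity scan over the extracted structure. This is a sketch under stated assumptions: the function name and the `max_distance` threshold are hypothetical, positions are treated as plain Euclidean coordinates, and relationships are treated as undirected for the purpose of the check.

```python
from itertools import combinations
from math import dist


def near_but_unconnected(spatial, max_distance):
    """Return id pairs positioned within max_distance that have no relationship."""
    positions = {e["id"]: e["position"] for e in spatial["entities"]}
    linked = {frozenset((r["source"], r["target"]))
              for r in spatial.get("relationships", [])}
    suggestions = []
    for a, b in combinations(sorted(positions), 2):
        if frozenset((a, b)) in linked:
            continue  # already connected, nothing to ask about
        if dist(positions[a], positions[b]) <= max_distance:
            suggestions.append((a, b))
    return suggestions
```

Each returned pair is a candidate for a clarifying question to the user, not an assertion that a relationship exists.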
10.3 Direct annotation interface (forward-looking)
The companion Spatial-Native Architecture specification details the annotation interface. For protocol purposes, the key requirement is that every id in the spec and spatial_representation is a valid annotation target. User annotations are parsed into structured feedback keyed to spec elements:
{
"annotation_type": "expand|connect|correct|insert|delete",
"target_id": "fig-1.node-A",
"content": "string",
"spatial_position?": [x, y]
}
This structured feedback enters the analytical pipeline as input to the next invocation without requiring the user to translate their spatial markup into prose instructions.
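A minimal sketch of how an incoming annotation might be validated before it enters the pipeline. It assumes the trailing `?` in `spatial_position?` marks the key as optional and that the stored key is `spatial_position`. `parse_annotation` and `valid_ids` are hypothetical names.

```python
ANNOTATION_TYPES = {"expand", "connect", "correct", "insert", "delete"}


def parse_annotation(raw, valid_ids):
    """Validate one annotation dict and return it normalized, or raise ValueError."""
    kind = raw.get("annotation_type")
    if kind not in ANNOTATION_TYPES:
        raise ValueError(f"unknown annotation_type: {kind!r}")
    target = raw.get("target_id")
    # Every id in the spec and spatial_representation is a valid target.
    if target not in valid_ids:
        raise ValueError(f"target_id is not a known spec element: {target!r}")
    position = raw.get("spatial_position")
    if position is not None and len(position) != 2:
        raise ValueError("spatial_position must be an [x, y] pair")
    return {
        "annotation_type": kind,
        "target_id": target,
        "content": str(raw.get("content", "")),
        "spatial_position": None if position is None else tuple(position),
    }
```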
10.4 Visual conversation continuity
The spatial_representation field, when populated, persists across conversation turns. On subsequent invocations, the analytical model receives the prior turn’s spatial layout and either preserves it (maintaining the user’s spatial mental model) or explicitly declares spatial changes with rationale. This prevents the disorienting effect of spatial arrangements shifting silently between turns.
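The continuity requirement can be checked by diffing layouts between turns. In the sketch below, `declared_changes` is a hypothetical mapping from entity id to rationale; the protocol text does not specify how declared changes are represented, so that shape is an assumption.

```python
def undeclared_moves(previous, current, declared_changes, tolerance=0.0):
    """Return ids whose position changed between turns without a declared rationale."""
    before = {e["id"]: tuple(e["position"]) for e in previous["entities"]}
    moved = []
    for entity in current["entities"]:
        old = before.get(entity["id"])
        if old is None:
            continue  # new entity: there is no prior position to preserve
        new = tuple(entity["position"])
        shifted = max(abs(n - o) for n, o in zip(new, old)) > tolerance
        if shifted and entity["id"] not in declared_changes:
            moved.append(entity["id"])
    return moved
```

A non-empty result signals exactly the silent shift the section warns against.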
11. Unresolved forks
Fork A — Semantic JSON vs. direct Mermaid for process family
Currently routes sequence/flowchart/state through Path A (direct Mermaid). Alternative: semantic JSON tier compiling to Mermaid, adding validation at cost of reinventing a grammar. Decision rule: instrument v0.2 rollout; switch if Mermaid repair loops exceed 3 retries on > 10% of outputs.
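The decision rule is simple enough to state as code. The sketch below assumes the rollout instrumentation yields one retry count per Mermaid output; the function and parameter names are hypothetical.

```python
def should_switch_to_semantic_json(retry_counts, max_retries=3, output_share=0.10):
    """True when more than output_share of outputs needed more than max_retries repairs."""
    if not retry_counts:
        return False  # no data yet, keep the current path
    over_limit = sum(1 for n in retry_counts if n > max_retries)
    return over_limit / len(retry_counts) > output_share
```

Both comparisons are strict, matching the wording "exceed 3 retries on > 10% of outputs": exactly 3 retries, or exactly 10% of outputs, does not trigger the switch.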
Fork B — Adversarial review: blocking vs. annotating
Currently blocks on Critical, warns on Major. Alternative: annotate-only with no blocking. Recommended resolution: per-mode configuration. Analytical modes default to blocking. Passion Exploration and Project Mode (depending on deliverable) default to annotate-only. User-configurable.
Fork C — Small-model rendering activation
Deferred from v0.1 and v0.2. Activate when Path B outputs prove aesthetically inadequate in user testing. Path C introduces a model call in the rendering pipeline; only justified by measurable aesthetic failure.
Fork D — Spatial representation: required or optional (new in v0.2)
The spatial_representation field is optional in v0.2. The question is whether it should become required when the mode is visually native. Making it required would enable visual-input and annotation features immediately for those modes but would increase spec size and model output cost. Decision rule: make required for visually native modes in v0.3 if the annotation interface enters implementation.
12. Vault integration and versioning
Spec blocks are canonically stored in the vault alongside the analytical document, with compiled artifacts as sidecar files referenced by id. Users can revise a spec and re-render against an updated compiler without losing provenance. The schema_version field ensures forward compatibility: a v0.1 spec remains renderable under v0.2 compilers via migration rules. New v0.2 types (tornado, influence diagram, bow-tie) are not available in v0.1 specs; the compiler rejects unknown types cleanly.
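The rejection behavior might look like the sketch below. The type identifiers (`tornado`, `influence_diagram`, `bow_tie`), the version strings, and the exception class are all assumptions for illustration. The actual migration rules are not given in this section and are not modeled.

```python
# Types introduced in v0.2 (identifier spellings are hypothetical).
V02_ONLY_TYPES = {"tornado", "influence_diagram", "bow_tie"}


class UnknownVisualType(ValueError):
    """Raised when a spec names a type its schema_version does not define."""


def check_type_for_version(spec):
    """Raise UnknownVisualType if a v0.1 spec uses a type added in v0.2."""
    version = spec.get("schema_version")
    visual_type = spec.get("type")
    if version == "0.1" and visual_type in V02_ONLY_TYPES:
        raise UnknownVisualType(
            f"type {visual_type!r} is not available in schema_version {version}"
        )
    return spec
```

Raising a dedicated exception type lets the caller route the failure to the fallback path instead of crashing the compiler.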
The mode-to-visual configuration table (§4) is itself a canonical document in the vault, editable and version-controlled. Changes to the table take effect on the next analytical invocation.
13. Changelog from v0.1
| Change | Rationale |
|---|---|
| Added tornado/sensitivity diagram | Conceptual research: visually native for sensitivity analysis, high-tractability, serves a mode ranked among most degraded by prose-only |
| Added influence diagram | Encodes conditional independence structure decision trees cannot show; Howard & Matheson (1981); serves Decision Under Uncertainty and Strategic Interaction |
| Added bow-tie risk diagram | Symmetry of preventive vs. mitigative controls is invisible in prose; serves Risk Analysis mode if added, and in the interim serves Consequences and Sequel |
| Added Principle 2 (LLM prior inversion) | Conceptual research: the modal chart in LLM training data is the wrong chart; protocol must actively invert this prior, not just check violations post-hoc |
| Added Principle 4 (duplicative visuals harmful) | Mayer’s redundancy principle: courtesy visuals that restate prose carry measured cognitive cost |
| Added memorability_goal flag | Bateman et al. (CHI 2010): embellished charts show better long-term recall; T4/T5 relaxation bounded by integrity rules |
| Added no_visual as explicit relation_to_prose state | Linguistically native modes should default to no visual, not to a redundant one |
| Created mode-to-visual configuration table (§4) | Maps all 19 Ora modes to modality classification, visual types, prose relation, and adversarial strictness using actual mode names |
| Added spatial_representation field (§10) | Forward-compatible with spatial-native architecture; enables visual input, annotation, and cross-turn continuity |
| Added LLM-prior-inversion checks to adversarial layer (§7.5) | Template-trap regression, chart-type misselection, default-settings passthrough |
| Added Fork D (spatial representation required vs. optional) | New design fork arising from spatial-native integration |