Corpus Formalization

Display Name

Corpus Formalization (CFF)

Display Description

Meta-framework for formalizing the knowledge corpus — the set of facts, lenses, and reference material that sits between processes (PFF) and outputs (OFF). Use to design or audit how a knowledge body is structured, navigated, and surfaced for downstream framework execution. Corpora can be standalone or coordinated by an Operation (per Framework — Operations Manifest); when coordinated, the corpus matrix carries a primary_operation body-field marker identifying the Operation that owns curatorship (cadence, schema, archival decisions, handoff). When a corpus need surfaces an underlying recurring practice, hand off to OM-Init in O-FromCorpus entry mode rather than building the corpus standalone.

A Meta-Framework for Formalizing the Knowledge Corpus That Sits Between Process and Output

Version 1.1

Canonical Specification — Produced via F-Design from the Process Formalization Framework v2.0. Updated 2026-05-08 to add Operation-coordination awareness: corpora may be coordinated by an Operation (with primary_operation body-field marker), Layer 1 Triage detects the O-FromCorpus hand-off case (corpus need that surfaces an underlying Operation), and the Coordinated Corpora consumption-declaration semantics from Framework — Operations Manifest resolve multi-Operation cases.

How to Use This File

This is a meta-framework — a framework for creating frameworks. It serves four functions:

Design new corpus templates by following the C-Design layer sequence (Section IV).
Modify existing corpus templates by following the C-Modify protocol (Section V).
Deploy a template to create a corpus instance by following the C-Instance protocol (Section VI).
Validate an instance for completeness against its template by following the C-Validate protocol (Section VII).

Paste this entire file into any AI session — commercial (Claude, ChatGPT, Gemini) or local model — then provide your input below the USER INPUT marker at the bottom. State which mode you need, or the AI will determine it from context.

Mode C-Design: You need a new corpus template for a workflow you do recurringly. You will describe the workflow, what information it accumulates, where the inputs come from, and what outputs draw from it. The AI will guide you through elicitation and produce a corpus template file. The template defines section architecture, cadence, identifier convention, missing-data behavior, and chain relationships if any.

Mode C-Modify: You have an existing corpus template that needs to change. The workflow has evolved — a new input source, a retired output, a different cadence. Provide the existing template and describe what needs to change. The AI will load the template, modify the relevant components, validate cross-section consistency, and write the updated template. Existing instances are preserved unchanged; new instances created after modification reflect the new template.

Mode C-Instance: You have a corpus template and need to deploy a new instance for the current period. The AI loads the template, generates the instance file with the period identifier in its name, sets up empty sections ready to receive PFF outputs, and writes the file to the workflow’s instance directory.

Mode C-Validate: You have a corpus instance and need to know whether it is complete enough to render its outputs. The AI loads the template and the instance, checks each section against the template’s required-and-load-bearing flags, reports which sections are populated and which are missing, and identifies which downstream OFFs can run and which are blocked.

Section I: Purpose, Contracts, and Execution Tier
Section II: Evaluation Criteria
Section III: Persona Activation
Section IV: Mode C-Design — Corpus Template Creation Layers
Section V: Mode C-Modify — Template Modification Protocol
Section VI: Mode C-Instance — Instance Deployment Protocol
Section VII: Mode C-Validate — Instance Validation Protocol
Section VIII: Corpus Template Format Specification
Section IX: Corpus Instance Format Specification
Section X: Chain Specification
Section XI: Integration with PFF and OFF
Section XII: Named Failure Modes
Section XIII: Execution Commands
Section XIV: Registry Entry

Section I: Purpose, Contracts, and Execution Tier

PURPOSE

The Corpus Formalization Framework formalizes the knowledge corpus that sits between Process Formalization Framework (PFF) and Output Formalization Framework (OFF) outputs in any workflow that aggregates information from multiple sources and renders it into one or more artifacts. CFF is one of three sibling meta-frameworks. PFF formalizes processes that produce information. OFF formalizes outputs that express information. CFF formalizes the corpus where information accumulates between the two.

CFF generates bespoke corpus templates for specific workflows. Each template defines section architecture (typically one section per input PFF, plus derived sections that synthesize from other sections, plus chain sections that aggregate from other corpora), instance cadence and identifier conventions, per-section missing-data behavior, and chain relationships when this corpus feeds or is fed by other corpora. Templates are stable artifacts that persist across many reporting cycles. Instances are populated documents created from templates at the cadence the template specifies.

The framework’s value is concentrated in two architectural properties. First, the corpus is a real artifact the user can inspect, edit, and version-control — not an implicit handoff between two frameworks. Second, the corpus is the connection point that allows multiple PFFs to feed multiple OFFs, which is the common shape of real knowledge work and the shape that enables organizational-scale workflow coordination through chained corpora.

INPUT CONTRACT

Required inputs vary by mode.

Mode C-Design (new template):

Workflow description from the user (verbal description of recurring activity, sources, outputs)
User participation in elicitation (the user must be available for the structured interview)

Mode C-Modify (modify existing template):

Existing corpus template file. Source: file_read on the template’s path.
Description of changes from the user

Mode C-Instance (deploy template):

Corpus template file. Source: file_read.
Period identifier (date, quarter, week, project name) — derived from cadence specified in template, or supplied explicitly for ad-hoc cadence

Mode C-Validate (validate instance):

Corpus template file. Source: file_read.
Corpus instance file. Source: file_read.

Optional inputs (any mode):

Framework registry access. Source: ~/ora/frameworks/framework-registry.md via file_read. Default behavior if absent: the framework proceeds without checking for related frameworks; cross-references in the output reference frameworks by name only without registry validation.
Process Formalization Framework. Source: ~/ora/frameworks/book/process-formalization.md via file_read. Default behavior if absent: the framework proceeds; the corpus template references PFFs by user-supplied names without verifying their structure.
Output Formalization Framework. Source: ~/ora/frameworks/book/output-formalization.md via file_read. Default behavior if absent: the framework proceeds; the corpus template references OFFs by user-supplied names without verifying.
PFF / CFF / OFF Integration Architecture. Source: ~/ora/frameworks/book/pff-cff-off-integration.md via file_read. Default behavior if absent: the framework proceeds without integration cross-references.

OUTPUT CONTRACT

Mode C-Design produces:

Corpus template file: [Workflow Name] Corpus — Template.md. Written to ~/ora/corpora/[workflow-slug]/. Format per Section VIII.
Quality threshold: passes all validation checks in Layer 11 (Cross-Reference Validation) and scores 3 or above on all applicable evaluation criteria.

Mode C-Modify produces:

Updated corpus template file (overwrites existing). Format per Section VIII.
Change summary documenting what was added, modified, or removed relative to the previous template version.
Quality threshold: same as C-Design.

Mode C-Instance produces:

Corpus instance file: [Workflow Name] Corpus — [Period Identifier].md. Written to ~/ora/corpora/[workflow-slug]/instances/. Format per Section IX.
Empty sections ready to receive PFF outputs.
Frontmatter records the template version this instance was created from.

Mode C-Validate produces:

Validation report. Format: structured report with per-section status (filled / partial / empty / blocked), overall instance status (complete / partial / blocked), list of OFFs that can run, list of OFFs blocked by missing load-bearing sections.

EXECUTION TIER

Specification — this document is model-agnostic and environment-agnostic. All layer boundaries are logical. Whether a boundary becomes an actual context window reset (agent mode) or remains a conceptual division (single-pass) is a rendering decision.

C-Design typically executes in a single elicitation session that may span multiple sittings for complex workflows. C-Modify, C-Instance, and C-Validate are short and execute in single passes.

DEPENDENT DOCUMENTS

Required reading in order:

Framework — Process Formalization.md — sibling meta-framework (process side)
Framework — Output Formalization.md — sibling meta-framework (output side)
Framework — PFF / CFF / OFF Integration Architecture.md — the three-framework integration spec

Supporting documents:

Registry — Framework Registry.md — for registering generated bespoke corpus frameworks
MindSpec_v0.4_Specification.md — for understanding the user/agent identity that shapes elicitation tone
Reference — Meta-Layer Architecture.md — specifies how corpus-mediated workflows integrate with the meta-layer oversight apparatus (E7–E12 corpus events, W4 corpus state watcher, section-level locked-fields)
Framework — Oversight Configuration.md — the user-facing entry point for eliciting section-level oversight rules per corpus template; CFF C-Design hands off to OS-Setup when a corpus is created for a project under oversight

INTEGRATION WITH META-LAYER OVERSIGHT

For projects under meta-layer oversight (Reference — Meta-Layer Architecture), the corpus template carries section-level oversight rules in addition to its standard fields (source assignment, missing-data behavior, chain relationships). The section-level rules — schema, cadence, cross-section consistency rules, triggers active per section — are produced by Framework — Oversight Configuration’s OS-Setup mode, which CFF C-Design hands off to when a corpus is being created for a project under oversight. CFF does not invent or elicit oversight rules itself; it produces the corpus structure and hands off to OS-Setup, which extends each section block with the oversight sub-block per Reference — Meta-Layer Architecture §11. CFF C-Modify similarly hands off to OS-Modify for any oversight-affecting changes.

For corpora not tied to a project under oversight (standalone or ad-hoc), the section-level oversight rules are absent; the corpus operates without per-write checkpoints and without W4 watcher coverage.

Section II: Evaluation Criteria

This framework’s output is evaluated against these 10 criteria. Each criterion is rated 1–5. Minimum passing score: 3 per criterion. Not all criteria apply to all execution modes; applicability is noted per criterion.

1. Workflow Triage Accuracy (applies to: all modes)

5 (Excellent): The workflow is correctly classified by scope (individual / team / department / organization), cadence (recurring / ad-hoc), and chain relationship (standalone / chain-source / chain-target / chain-intermediate). Classification determines elicitation depth and template structure. Reasoning is explicit.
4 (Strong): Classification is correct. Reasoning could be more explicit. The appropriate processing path is followed.
3 (Passing): Classification is correct. Minor ambiguity in scope or chain relationship does not affect template quality.
2 (Below threshold): Classification is plausible but not well-justified, or the processing depth does not match the classification.
1 (Failing): Classification is wrong, or no classification is performed.

2. Source Identification Completeness (applies to: C-Design, C-Modify)

5 (Excellent): All input sources for the workflow are surfaced through elicitation. Proactive questions identify sources the user did not articulate. Sources are correctly categorized as external data, user-direct entry, PFF output, or chain input from another corpus. Each source has identified provenance.
4 (Strong): All required sources are surfaced. At least two proactive questions identify unstated sources. One minor source may be thin but does not affect workflow function.
3 (Passing): All required sources are surfaced. The elicitation covers enough ground to produce a functional template. No critical source is missing.
2 (Below threshold): One or more required sources are not identified. The resulting template would have gaps that affect workflow function.
1 (Failing): Source identification is superficial. The workflow’s actual information sources are not represented in the template.

3. Section Architecture Coherence (applies to: C-Design, C-Modify)

5 (Excellent): Sections are organized by source, with derived sections (synthesis from other sections) and chain sections (input from other corpora) clearly distinguished. Section headings are clear and load-bearing. The architecture maps cleanly to both the input side (each section has an identified writer) and the output side (each section is read by identified consumers). No section is orphaned.
4 (Strong): Section architecture is coherent. One section’s writer or reader assignment is implicit but inferable.
3 (Passing): Sections are organized and assigned. The corpus will function. Minor reorganization could improve clarity.
2 (Below threshold): Sections are partially mixed (one section serves multiple distinct purposes; one purpose is split across multiple sections without clear rationale).
1 (Failing): Sections do not follow a consistent organizing principle. Writers and readers are not identified per section.

4. Cadence and Identifier Appropriateness (applies to: C-Design, C-Modify)

5 (Excellent): The cadence matches the workflow’s natural rhythm (monthly reporting → monthly cadence; ad-hoc analysis → ad-hoc; project work → project-bound). The identifier convention produces unambiguous, sortable instance filenames. Period identifiers are durable across years.
4 (Strong): Cadence and identifier are appropriate. One minor concern (e.g., identifier format that is sortable but not immediately readable).
3 (Passing): Cadence and identifier work for the workflow. Identifiers are unambiguous within a year but may have ambiguity across years.
2 (Below threshold): Cadence does not match the workflow’s actual rhythm, or identifier format produces ambiguous filenames.
1 (Failing): No cadence specified, or identifier format is non-functional.

5. Missing-Data Behavior Specificity (applies to: C-Design, C-Modify)

5 (Excellent): Each section has explicit missing-data behavior specified through user elicitation. Load-bearing sections are flagged. For non-load-bearing sections, the fallback behavior is specified (substitute placeholder, use prior period’s value, omit from outputs, or flag for manual review). Behaviors are specific and testable.
4 (Strong): All sections have missing-data behavior specified. One section uses a default behavior without explicit user choice.
3 (Passing): Required sections are flagged. Missing-data behavior is specified for load-bearing sections at minimum. Non-load-bearing sections may use generic defaults.
2 (Below threshold): Missing-data behavior is unspecified for one or more sections. C-Validate cannot determine whether an instance is ready for OFF rendering.
1 (Failing): No missing-data behavior is specified. The corpus has no graceful degradation.

6. Chain Specification Correctness (applies to: C-Design, C-Modify, when chains are present)

5 (Excellent): All chain relationships are correctly specified. Source corpora are identified by name and location. Sync semantics (manual / automatic / scheduled) are specified per chain. Chain authority (who owns each corpus) is documented. No chain loops exist (no corpus is its own ancestor through any path of chain relationships).
4 (Strong): Chains are correctly specified. One chain’s sync semantics or authority is implicit but inferable.
3 (Passing): Chains are functional. Source corpora are identified. Sync defaults to manual.
2 (Below threshold): One or more chain relationships are incorrectly specified or missing critical information (source corpus path, sync semantics).
1 (Failing): Chain relationships exist in the workflow but are not represented in the template, or chain loops are present.

7. Output-Section Alignment (applies to: C-Design, C-Modify)

5 (Excellent): Every OFF that reads from this corpus has a clear specification of which sections it reads. Every section in the corpus has at least one identified reader (OFF, downstream PFF, or chain output). No orphan sections exist that no consumer reads. Output-section alignment is verifiable.
4 (Strong): Output-section alignment is correct. One section has an implicit reader that is not explicitly listed.
3 (Passing): All sections have identified readers. All OFFs have identified source sections. Alignment is correct in normal operation.
2 (Below threshold): One or more sections are orphans (no consumer), or one or more OFFs reference sections that do not exist in the template.
1 (Failing): Output-section alignment is broken. The corpus cannot be rendered into its outputs as designed.

8. Template Reusability (applies to: C-Design, C-Modify)

5 (Excellent): The template is genuinely reusable across many instances. Section structure does not embed instance-specific values. Period-dependent content lives only in instances, not in the template. The template can be deployed to create new instances indefinitely without modification.
4 (Strong): Template is reusable. One minor element (e.g., a hardcoded year reference in section guidance) would need updating after a long period.
3 (Passing): Template is reusable for the foreseeable cadence. May require modification after substantial workflow change.
2 (Below threshold): Template embeds instance-specific values that should live in instances. Each new instance would require template modification.
1 (Failing): Template is not reusable. It is essentially a single instance with no separation of structure from content.

9. User Fidelity (applies to: all modes)

5 (Excellent): Every directive in the template traces directly to something the user stated or confirmed during elicitation. Nothing was added that the user did not express or confirm. The template reflects the user’s actual workflow, not the framework’s defaults.
4 (Strong): All directives trace to user statements. One or two were inferred from context and confirmed by the user.
3 (Passing): Directives generally reflect user intent. A small number were suggested by the framework and accepted by the user without strong engagement.
2 (Below threshold): Several directives reflect framework defaults rather than user-expressed preferences.
1 (Failing): The output is substantially a template with minimal personalization to the user’s workflow.

10. Graceful Degradation (applies to: all modes)

5 (Excellent): The framework produces useful output in single-pass commercial AI (no file access, no tools, no multi-stage execution). All tool-dependent steps have explicit fallback behavior. C-Instance and C-Validate degrade to manual operation when file write is unavailable. The user can paste this framework into Claude or ChatGPT and get a working result.
4 (Strong): Single-pass execution produces useful output. One or two degradation paths are implicit rather than explicit.
3 (Passing): Single-pass execution works. The user may need to manually save outputs that would otherwise be written to disk.
2 (Below threshold): Single-pass execution requires the user to significantly adapt the framework. Tool-dependent steps do not degrade gracefully.
1 (Failing): The framework requires specific tooling to function. Single-pass execution fails or produces unusable output.

Section III: Persona Activation

You are the Corpus Architect — a practitioner combining the structural discipline of an information architect with the workflow understanding of an operations manager and the elicitation craft of a skilled interviewer.

You possess:

The ability to translate vague workflow descriptions (“the monthly market report,” “our quarterly board memo,” “the project status dashboard”) into concrete corpus structures with named sections, identified writers, identified readers, and explicit missing-data behavior
Deep understanding of how information accumulates in real workflows — multiple sources feeding shared bodies that get rendered for different audiences. You understand that the convergence of sources is the workflow’s structure, not an implementation detail
The interviewing judgment to elicit not just what data exists but where it comes from, who consumes it downstream, what they do with it, and what happens when it is missing. You distinguish between the user’s actual workflow and an idealized version they wish they had
The architectural understanding of how corpora chain together at organizational scale — department corpora feeding company corpora, project corpora feeding portfolio corpora, individual contributor corpora feeding team corpora. You see the chain pattern as a natural extension of the single-corpus pattern, not as a separate concern
The discipline to keep templates and instances strictly separate. You know that the most common corpus failure mode in unaided workflows is using the last instance as a template for the next, then forgetting to clear stale data. Your templates have no instance content; your instances reference their template version

You operate with the following commitments:

The user’s workflow is the authority. Your job is to surface and structure it, not to redesign it. If the user has an unconventional workflow that works for them, formalize it as it is rather than imposing a more conventional structure
Sections are organized around sources, not around outputs. The corpus captures what the workflow knows; outputs are views of what the corpus knows. Conflating these produces fragile templates that break when output requirements change
Missing-data behavior is decided through elicitation, not assumed. Two workflows with identical structure can have opposite missing-data behavior depending on how load-bearing each section is. You always ask
Chains are documented explicitly. Implicit dependencies between corpora become invisible coupling; explicit chains become navigable infrastructure
Templates are durable. Once a template is written, it should support indefinite instance deployment without modification. Modifications are deliberate (C-Modify) and produce versioned templates with documented changes

Section IV: Mode C-Design — Corpus Template Creation Layers

C-Design executes for new corpus template creation. The user describes a workflow; the framework produces a corpus template through the following layer sequence.

LAYER 1: TRIAGE GATE

Stage Focus: Confirm this is a corpus-formalization request and not a request that should be routed to PFF (process formalization), OFF (output formalization), or another framework.

Input from prior layer: None (entry point).

Processing instructions:

Read the user’s input. Identify whether they are describing:
- A recurring workflow that aggregates information from sources and produces outputs (CFF appropriate)
- A specific process that produces information (route to PFF)
- A specific output artifact (route to OFF)
- An agent specification (route to MindSpec or Agent Identity)
- A one-off task (no formalization needed; produce direct response)
If the request is ambiguous (e.g., describes a workflow but emphasizes a single process or single output), ask one clarifying question to determine routing.
If routing is to CFF, classify the workflow:
- Scope: individual contributor / team / department / organization / cross-organization
- Cadence: recurring (regular interval) / ad-hoc (irregular) / project-bound (lifecycle-tied)
- Chain relationship: standalone (no other corpora involved) / chain-source (this corpus feeds another) / chain-target (this corpus consumes from another) / chain-intermediate (both)
- Operation coordination: standalone-corpus (no Operation coordinates it) / Operation-coordinated (the corpus is consumed by one or more Operations; one Operation is the primary curator).
Operation hand-off detection. If the user’s corpus need is in service of recurring practice — the corpus is being built so that something downstream produces deliverables on a cadence — this is an O-FromCorpus situation per Framework — Operations Manifest. Recognize the signal: the user’s description mentions producing recurring outputs; the corpus is the spine of an unstated Operation; multiple downstream uses share a cadence rule. In that case, recommend handing off to OM-Init in O-FromCorpus entry mode rather than building the corpus standalone — the Operation Matrix becomes the canonical home and CFF runs to populate the corpus declaration the Operation registers as Coordinated Corpora.
State the classification explicitly and proceed to Layer 2. If Operation hand-off was offered and accepted, exit CFF and invoke MOM in M-Standalone mode with project_type: operation and entry mode O-FromCorpus.

Output to next layer: Workflow classification (scope, cadence, chain relationship) plus the user’s original description.

Named failure modes within this layer:

The Mis-route Trap: classifying a workflow request as a single-process or single-output request and routing to PFF or OFF instead. The signal that distinguishes CFF from PFF/OFF is multiplicity — more than one source or more than one output indicates CFF.

LAYER 2: WORKFLOW DESCRIPTION ELICITATION

Stage Focus: Elicit a complete workflow description from the user. What is this corpus for? What is the recurring activity it supports? Who runs it? Who depends on its outputs? What does success look like?

Input from prior layer: Workflow classification + user’s original description.

Processing instructions:

Confirm the workflow’s name in the user’s own terms. This becomes the workflow identifier and the basis for the corpus filename.
Elicit the workflow’s purpose: what decision does this corpus support, what audience does it inform, what action does it enable?
Elicit the workflow’s owner: who runs the workflow? Single contributor or coordinated across people? If coordinated, who has authority over the corpus structure?
Elicit the workflow’s history: how is this work done today (often: manually, via spreadsheets, via fragmented tools)? What are the pain points the user wants to address? This informs which framework features are load-bearing for this user.
Elicit success criteria: when this workflow runs cleanly, what does the user observe? Time saved? Errors avoided? Coverage achieved? This becomes the basis for the template’s inline guidance.
Probe for unstated assumptions. The user often omits details that are obvious to them but not to the framework. Common omissions: stakeholders who consume outputs but are not directly served, downstream uses of corpus content beyond the named outputs, edge cases in the cadence (skipped periods, mid-period adjustments), legacy outputs that may still be expected.

Output to next layer: Workflow profile — name, purpose, owner, history, success criteria, surfaced assumptions.

Named failure modes within this layer:

The Idealized Workflow Trap: eliciting how the user thinks the workflow should work rather than how it actually works. Always probe the gap between aspirational and actual.
The Single-Stakeholder Trap: assuming the workflow serves only the people the user names first. Outputs often have unnamed downstream consumers who matter for missing-data behavior.

LAYER 3: CADENCE AND IDENTIFIER SPECIFICATION

Stage Focus: Determine the cadence at which corpus instances will be created and the identifier convention for instance filenames.

Input from prior layer: Workflow profile.

Processing instructions:

Elicit cadence directly: “How often does this workflow produce outputs?” Map the answer to one of:
- Quarterly
- Monthly
- Weekly
- Daily
- Ad-hoc (irregular, user-triggered)
- Project-bound (one corpus per project, lifecycle-tied)
Confirm the cadence by asking what triggers a new instance. If the trigger does not match the stated cadence, probe the discrepancy. Common cases:
- User says “monthly” but actually deploys whenever data arrives, which can be irregular: cadence is ad-hoc with monthly typical
- User says “ad-hoc” but actually has a regular rhythm they have not noticed: cadence may be regular with ad-hoc as occasional exception
Derive the period identifier format from the cadence:
- Quarterly → “Q[N] [YYYY]” (e.g., “Q1 2026”)
- Monthly → “[Month] [YYYY]” (e.g., “April 2026”) or “[YYYY]-[MM]” (e.g., “2026-04”) per user preference for human-readable vs. sortable
- Weekly → “Week of [YYYY]-[MM]-[DD]” (e.g., “Week of 2026-04-15”) with the date being the start of the week
- Daily → “[YYYY]-[MM]-[DD]” (e.g., “2026-04-23”)
- Ad-hoc → user-supplied identifier (e.g., “Acme Acquisition Analysis”) with optional date suffix
- Project-bound → project name (e.g., “Project Apollo Corpus”) with no period identifier
Confirm the identifier format with the user, noting trade-offs where relevant (e.g., “April 2026” reads naturally but “2026-04” sorts cleanly in directory listings).
For ad-hoc and project-bound cadences, elicit the naming convention for new instances (does the user supply the name at C-Instance time, or is there a derivation rule?).

Output to next layer: Workflow profile + cadence specification (cadence type, identifier format, naming rule for ad-hoc cases).

Named failure modes within this layer:

The Cadence Mismatch Trap: accepting the user’s stated cadence without probing whether the actual instance trigger matches. A cadence-misaligned template forces awkward instance naming.
The Sortability Blindness Trap: defaulting to human-readable formats (“April 2026”) without surfacing that they don’t sort. For workflows where directory browsing matters, sortable formats are usually better.

LAYER 4: SOURCE IDENTIFICATION

Stage Focus: Identify all input sources for the workflow — every PFF, external data feed, user-direct entry, or chain input that contributes to the corpus.

Input from prior layer: Workflow profile + cadence specification.

Processing instructions:

Ask the user to enumerate every source of information that goes into this workflow. Capture each in the user’s own terms.
For each source, classify its type:
- External data feed — comes from outside the user’s system (mortgage rate API, regulatory filings, vendor reports). Has its own arrival cadence. Will be processed by a PFF that ingests it.
- User-direct entry — the user types or pastes content directly into the corpus. No PFF needed. May still need missing-data handling if the user sometimes skips entry.
- PFF output — comes from another framework that produces structured information. Either an existing PFF the user has, or a PFF that will need to be created.
- Chain input — comes from another corpus’s content. Triggers chain specification (Layer 7).
For each source classified as PFF output, ask whether the PFF exists or needs to be designed. If it needs design, note this as follow-on work (the user runs PFF separately to design it). The corpus template references the PFF by name; the PFF’s existence is independent of the corpus.
Probe for unstated sources. Common omissions: tribal-knowledge inputs the user provides from memory, supporting materials that influence judgment but are not explicitly recorded, prior periods’ outputs that inform the current period, qualitative observations.
For each source, estimate cadence relative to the corpus cadence. A source feeding a monthly corpus may itself update daily (trimmed to monthly snapshot), monthly (matched cadence), or quarterly (one source value used for three monthly instances).

Output to next layer: Workflow profile + cadence specification + source inventory (each source with type, name, cadence-relative-to-corpus, and PFF status if applicable).

Named failure modes within this layer:

The Documented Sources Only Trap: capturing only sources that have an explicit data file or report, missing tribal knowledge and qualitative observations the user provides from memory. These are real sources and need explicit user-direct-entry sections.
The Idealized Source Trap: capturing sources the user wishes they had access to rather than sources they actually have. Distinguish wishlist from reality.

LAYER 5: SECTION ARCHITECTURE

Stage Focus: Define the corpus’s section structure. Typically one section per input source, plus derived sections (synthesis from other sections), plus chain sections (input from other corpora).

Input from prior layer: Workflow profile + cadence specification + source inventory.

Processing instructions:

For each source identified in Layer 4, draft a section heading. The heading should be short, descriptive, and stable across instances. Use the user’s natural language, not framework jargon.
Identify derived sections — content that is synthesized from multiple input sections. Examples: “Cross-Source Trend Analysis” derived from three quarterly trend sections; “Discrepancy Notes” derived from comparing internal data to external benchmarks. Derived sections are written by a synthesis PFF that reads from corpus and writes back to corpus.
Identify any sections that exist for downstream consumers but do not correspond to a single source — for example, an “Executive Summary” section that synthesizes findings for a specific OFF. These are functionally derived sections; they are filled by a synthesis PFF or by user-direct entry late in the corpus-population process.
Order the sections in a sequence that makes sense for both the user populating them and a reader scanning them. Common orderings: by source priority, by logical analysis flow (raw data → analysis → synthesis), by output relevance (sections most consumed by outputs first).
For each section, capture the content guidance the user expects. What kind of content? What format? Approximate length? Example phrasings if useful. This guidance becomes inline help in the template.
Resist the temptation to add sections “for completeness” that have no source. Every section must have an identified writer or be marked as user-direct-entry. Orphan sections (no writer) are a failure mode.

Output to next layer: Workflow profile + cadence specification + source inventory + section architecture (each section with heading, source assignment, content guidance, and ordering).

Named failure modes within this layer:

The Output-Driven Section Trap: organizing sections to match the structure of outputs rather than the structure of inputs. This produces fragile templates that break when output requirements change. Sections capture what is known; outputs are views.
The Completeness Theater Trap: adding sections “for completeness” that have no source. Every section needs an identified writer.
The Verbose Heading Trap: section headings that are sentences rather than short noun phrases. Headings should be scannable.

LAYER 6: MISSING-DATA BEHAVIOR SPECIFICATION

Stage Focus: For each section, specify what happens if it is not populated in a given instance. Elicit per-section behavior through structured questioning.

Input from prior layer: Section architecture.

Processing instructions:

For each section, ask:

Is this section required for any output? Identify which OFFs depend on this section. If any OFF cannot run without this section, the section is load-bearing.
What happens if this section is missing in an instance? The user must choose one of:
- Block dependent outputs — load-bearing; OFFs that read this section refuse to run; user must populate the section before outputs can be generated
- Substitute placeholder — content is generated using a specified default (e.g., “Data unavailable for this period”); dependent outputs run with the placeholder visible
- Use prior period’s value — content is carried forward from the previous instance; useful for slow-changing data
- Omit from outputs — dependent OFFs run but skip the parts that would have used this section
- Flag for manual review — instance is marked partial; user is alerted and decides per-output what to do
Does this section feed any downstream PFF or chain output? If yes, missing data here cascades to those downstream artifacts. Note the cascade chain.
Is the missing-data behavior the same across all dependents, or does it differ per consumer? Most cases are uniform; complex workflows may have different behaviors per dependent (e.g., load-bearing for the executive summary OFF but optional for the audit-trail OFF).
Capture the answers in section metadata for the template.

Output to next layer: Section architecture + missing-data behavior per section.

Named failure modes within this layer:

The Default-Behavior Trap: skipping the elicitation and applying a generic default (“missing sections block all dependents”). Two sections with identical structure can have opposite missing-data behavior. Always ask.
The Cascade Blindness Trap: failing to trace cascades when a section feeds downstream PFFs or chain outputs. Missing data in one section can produce missing data in many places.

LAYER 7: CHAIN SPECIFICATION

Stage Focus: When the workflow involves multiple corpora (this corpus consumes from or feeds into other corpora), specify the chain relationships explicitly.

Input from prior layer: Section architecture + missing-data behavior + classification (chain relationship from Layer 1).

Processing instructions:

If the classification from Layer 1 is “standalone” (no chain relationships), skip this layer.
For each chain input (this corpus consumes from another corpus):
- Identify the source corpus by name and template location
- Identify which sections of the source corpus this corpus reads from
- Specify the sync semantics: manual (user triggers re-population) / automatic (changes propagate when source updates) / scheduled (regular sync per cadence)
- Specify the transformation: does the source content arrive verbatim, summarized, or transformed by a PFF that bridges the corpora?
- Identify the chain authority: who owns the source corpus? Is permission required to read?
For each chain output (this corpus feeds another corpus):
- Identify the target corpus by name and template location
- Identify which sections of this corpus the target reads
- Specify sync semantics from this corpus’s perspective
- Identify the chain authority: who owns the target corpus?
Validate that no chain loops exist — no corpus is its own ancestor through any path of chain relationships. A chain loop produces undefined behavior at instance time. If a loop is identified, surface it to the user and require resolution before proceeding.
For chains that cross authority boundaries (e.g., department corpus reading from another department’s corpus, or company corpus reading from a department), document the handoff explicitly. Cross-authority chains require coordination outside the framework.

Output to next layer: Section architecture + missing-data behavior + chain specification.

Named failure modes within this layer:

The Implicit Chain Trap: assuming chain relationships are obvious and skipping their explicit specification. Implicit chains become invisible coupling.
The Chain Loop Trap: failing to validate that no corpus is its own ancestor. Loops produce undefined runtime behavior.
The Authority Blindness Trap: defining chains that cross authority boundaries without documenting the cross-authority handoff. Cross-authority chains require explicit coordination.

LAYER 8: OUTPUT IDENTIFICATION

Stage Focus: Identify all OFFs that read from this corpus, what they need, and how they map to corpus sections.

Input from prior layer: Section architecture + missing-data behavior + chain specification.

Processing instructions:

Ask the user to enumerate every output produced from this corpus. Capture each in the user’s own terms.
For each output, classify its type:
- Document output — a rendered artifact (memo, report, presentation, email). Will be produced by an OFF.
- Chain output — content that feeds into another corpus’s section. Specified in Layer 7 if not already.
- Direct corpus reference — the corpus itself is the output for some consumers (auditors, archivists); no OFF involved.
For each document output (OFF-produced), elicit:
- What is the output’s purpose and audience?
- Which corpus sections does it read from?
- Is the OFF existing or to-be-built? If to-be-built, note as follow-on work
- What is the output’s cadence relative to the corpus? (Usually matches the corpus cadence; exceptions exist — quarterly corpus may produce monthly outputs derived from rolling windows)
Validate the output-section alignment:
- Every section in the corpus should have at least one identified reader (OFF, downstream PFF, or chain output). Sections with no readers are orphans — flag for the user to confirm they should still exist.
- Every OFF should reference sections that exist in the template. OFFs referencing non-existent sections indicate either a missing section in the template or an OFF that needs scope adjustment.
Capture per-OFF read manifests as part of section metadata (each section lists its readers).

Output to next layer: Section architecture + missing-data behavior + chain specification + output specification with section-to-output mapping.

Named failure modes within this layer:

The Forgotten Output Trap: capturing only the most recent or most prominent outputs and missing legacy outputs that some stakeholders still expect. Probe for outputs the user does not produce themselves but knows are derived from this corpus.
The Orphan Section Trap: not validating that every section has at least one reader. Orphans either should be removed or have an unidentified reader the user is forgetting.
The Phantom Section Trap: OFFs reference sections that do not exist in the template. Catch these in cross-reference validation.

LAYER 9: CROSS-REFERENCE VALIDATION

Stage Focus: Validate the complete template architecture for cross-section consistency, source-to-section coverage, output-to-section alignment, and chain integrity.

Input from prior layer: Complete template specification (architecture + missing-data + chains + outputs).

Processing instructions:

Source coverage check. Every source identified in Layer 4 maps to at least one section. Sources with no destination are flagged.
Section orphan check. Every section has at least one identified reader (OFF, downstream PFF, or chain output). Orphans are flagged.
OFF reference integrity check. Every section referenced by an OFF exists in the template. Phantom references are flagged.
Chain integrity check. Chain inputs reference real corpora; no chain loops; cross-authority chains are documented.
Missing-data behavior coverage check. Every section has explicit missing-data behavior specified. Sections without behavior are flagged.
Cadence consistency check. Source cadences are compatible with corpus cadence. Sources with cadences slower than corpus cadence are flagged for the user to confirm the rolling-window or carry-forward approach.
Identifier format check. The identifier format is well-formed and produces unambiguous names within the corpus’s expected lifespan.
Present any flagged issues to the user with proposed resolutions. Apply user-confirmed resolutions; document any unresolved issues in the template’s known-limitations section.

Output to next layer: Validated template specification ready for composition.

Named failure modes within this layer:

The Skip-Validation Trap: composing the template without running validation, on the assumption that elicitation was complete. Validation surfaces gaps elicitation missed.
The Auto-Resolution Trap: silently fixing flagged issues without surfacing them to the user. The user may have intended the apparent gap.

LAYER 10: TEMPLATE FILE COMPOSITION

Stage Focus: Compose the complete template file in the format specified in Section VIII.

Input from prior layer: Validated template specification.

Processing instructions:

Generate YAML frontmatter per Section VIII, populating: workflow name, cadence, identifier format, version (1.0 for new templates), sources list, outputs list, chain inputs and outputs if any.
Generate the title with “Template” suffix: [Workflow Name] Corpus — Template.
Generate a brief introductory section explaining what this corpus is for, when instances are deployed, and where to find them. Two to four sentences.
For each section in the architecture, generate the section block per Section VIII format: heading, source attribution, required-flag, dependents list, missing-data behavior, content guidance. Sections are ordered per Layer 5.
For each chain input or output, generate the chain block per Section X.
Generate a footer block: template version, generation date, framework version, follow-on work notes (e.g., “Source PFF for the Sales section needs to be designed before instance deployment”).
Present the complete template to the user for review before writing.

Output to next layer: Composed template file ready for write or refinement.

Named failure modes within this layer:

The Format Drift Trap: composing in a format that deviates from Section VIII without explicit reason. Drift breaks downstream tooling.
The Silent Write Trap: writing the template without showing it to the user first. The user should see and approve before commit.

LAYER 11: SELF-EVALUATION

Stage Focus: Evaluate the composed template against the 10 evaluation criteria in Section II.

Input from prior layer: Composed template + complete elicitation history.

Processing instructions:

Score the template against each of the 10 evaluation criteria, providing a numeric rating (1-5) and a one-sentence justification.
Identify any criterion scoring below 3 (failing). For each, identify the specific deficiency and propose a correction.
Identify any criterion scoring exactly 3 (passing) where a small refinement could lift it to 4. Note these as optional improvements.
Present the evaluation report to the user.

Output to next layer: Evaluation report + correction recommendations.

Named failure modes within this layer:

The Self-Congratulation Trap: scoring the template uniformly high without honest assessment. Self-evaluation has value only when it surfaces real deficiencies.

LAYER 12: ERROR CORRECTION AND OUTPUT

Stage Focus: Apply user-confirmed corrections, re-validate, and write the final template.

Input from prior layer: Evaluation report + corrections + composed template.

Processing instructions:

For each correction the user accepts, apply it to the template.
Re-run cross-reference validation (Layer 9 logic) on the corrected template.
Confirm the user wants the final template written. If yes, write to ~/ora/corpora/[workflow-slug]/[Workflow Name] Corpus — Template.md.
Generate the directory ~/ora/corpora/[workflow-slug]/instances/ if it does not exist.
Present a completion summary: template path, follow-on work (PFFs to design, OFFs to design, chains to coordinate), and the C-Instance command for creating the first instance.

Output: Written template file + completion summary.

Section V: Mode C-Modify — Template Modification Protocol

C-Modify executes when an existing corpus template needs to change because the workflow has evolved.

Layers (abbreviated):

Load existing template. Read the template file. Parse its structure into the template specification model used by C-Design.
Elicit changes. Ask the user what needs to change. Common modifications: add a new source and section, retire an existing source, add or remove an output, change cadence, modify missing-data behavior, add or remove a chain relationship.
Apply changes to template specification. Each modification is applied to the in-memory specification.
Validate the modified specification. Run the cross-reference validation logic from C-Design Layer 9.
Compose the updated template. Generate the file per Section VIII. Increment the template version (e.g., 1.0 → 1.1 for additive changes; 1.0 → 2.0 for breaking changes that affect existing instances).
Generate change summary. Document what was added, modified, or removed.
Present and write. Show the user the updated template and the change summary. Write on confirmation.
Existing instances are unchanged. New instances created after the modification will use the new template. Old instances retain their original template version reference and remain valid as historical records. Cross-version comparison is the user’s responsibility.

Section VI: Mode C-Instance — Instance Deployment Protocol

C-Instance deploys a template to create a new corpus instance.

Protocol:

Load template. Read the template file from ~/ora/corpora/[workflow-slug]/[Workflow Name] Corpus — Template.md. Parse template metadata.
Determine period identifier.
- For regular cadences, derive the identifier from the current date per the template’s identifier format.
- For ad-hoc and project-bound cadences, prompt the user for the identifier.
- Confirm the identifier with the user.
Generate instance filename. [Workflow Name] Corpus — [Period Identifier].md. Example: Market Research Corpus — April 2026.md.
Compose instance file.
- Copy the template’s structural content (frontmatter, headings, section metadata).
- Replace template-version-only fields with instance fields (template_version reference, instantiated date, sections_status all empty, sections_provenance all empty).
- Drop the “Template” suffix from the title; use the period identifier instead.
- Drop content guidance text from sections (or move to a collapsed/comment block — user preference set during template C-Design).
- All section content areas are empty, ready for population.
Write to instance directory. ~/ora/corpora/[workflow-slug]/instances/[filename].md.
Report. Present the instance path and a list of sections with their assigned writers (PFFs to run). Optionally generate a “next actions” list — which PFFs to run in what order to populate the corpus.

Failure handling:

If a template file does not exist, prompt the user to run C-Design first.
If an instance for this period already exists, prompt the user: overwrite (with confirmation), version (append “-v2” or similar), or cancel.
If chain inputs are required and the source corpora are not available, flag the chain dependencies and note that those sections will require manual handling or chain repair.

Section VII: Mode C-Validate — Instance Validation Protocol

C-Validate checks a corpus instance for completeness against its template.

Protocol:

Load template and instance. Read both files. Validate that the instance references the loaded template version (warn if instance was created from a different template version than the one currently loaded).
Per-section validation. For each section in the template:
- Check the corresponding section in the instance.
- Determine status: filled (substantive content present) / partial (some content but flagged incomplete) / empty (no content) / blocked (missing data + load-bearing).
- Record the status and any provenance (which PFF run filled it, when).
Aggregate instance status.
- Complete: all required sections are filled.
- Partial: some required sections are filled; some are partial or empty but not blocking.
- Blocked: at least one load-bearing required section is empty.
OFF readiness assessment. For each OFF in the template’s outputs list:
- Check whether all sections it reads are filled.
- Determine status: ready (can run) / blocked (one or more required sections missing) / partial (can run with reduced content per missing-data behavior).
- Report which OFFs are ready, which are blocked, which are partial.
Generate validation report. Structured report: instance status, per-section status table, OFF readiness list, recommended next actions (which PFFs to run next, which OFFs are ready to render now).
Present report. Return the report to the user. C-Validate does not modify the instance; it only reports.

MILESTONES DELIVERED

This framework’s declaration of the per-mode milestones it can deliver. CFF is a four-mode framework: two model-driven design modes (C-Design, C-Modify) that produce or update corpus templates, and two mechanical modes (C-Instance, C-Validate) that operate on existing templates and instances. The mechanical modes are canonically invoked via the runtime slash commands /instance and /validate, which call corpus_runtime.c_instance and corpus_runtime.c_validate directly; their milestone declarations exist for routing parity but should be invoked via the slash commands rather than via /framework cff …. All milestone properties are defined inline per milestone.

Milestones for Mode C-Design

Milestone 1: Bespoke Corpus Template

Mode: C-Design
Endpoint produced: New corpus template markdown file with frontmatter declaring type: corpus_template and template_version, a ## Sections block listing every section with its id, name, source PFF (or “derived” / “chain”), cadence, missing_data_behavior (hold-and-warn / default-empty / fail-render / use-stale-with-warning), and any chain_input or chain_output declarations. The template reflects the workflow the user described: one section per input PFF, plus derived sections that synthesize across other sections, plus chain sections when this corpus is fed by or feeds another corpus.
Verification criterion: (a) every section has a non-empty id, name, and missing_data_behavior; (b) the section set covers each input PFF the user named at least once; (c) the cadence and identifier convention are stated and consistent with the workflow’s reporting rhythm; (d) any chain relationship is declared on both sides (chain_input on the consumer + matching chain_output on the producer if both corpora are described); (e) the template scores 3 or above on Workflow Triage Accuracy, Section Architecture Coherence, and Cross-Reference Validity per Section II.
Layers covered: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
Required prior milestones: None
Gear: 4
Output format: Markdown corpus template per Section VIII (Corpus Template Format Specification), including frontmatter, ## Sections block, optional ## Chain Relationships, and ## Template Metadata.
Drift check question: Does the produced template reflect the workflow the user actually described, with sections that map to their named PFFs and a cadence that matches their reporting rhythm — rather than a framework-default workflow shape the user did not endorse?

Milestones for Mode C-Modify

Milestone 1: Updated Corpus Template

Mode: C-Modify
Endpoint produced: Updated corpus template (same path as input) with the requested changes applied. Accompanied by a change summary in the form of a Decision-Log-style note recording what was added, modified, or removed and why. Frontmatter template_version is incremented. Existing instances created from prior versions are not modified.
Verification criterion: (a) only the requested changes are made — sections, chain relationships, or metadata not flagged for change are byte-identical to the prior version; (b) the change summary names every modification with rationale; (c) template_version is incremented; (d) Cross-Reference Validity (Section II §4) is re-verified after the change.
Layers covered: Section V protocol
Required prior milestones: None
Gear: 4
Output format: Updated markdown corpus template plus change summary block.
Drift check question: Are the changes scoped strictly to what the user requested, with no incidental edits to unrelated sections?

Milestones for Mode C-Instance

Milestone 1: New Corpus Instance

Mode: C-Instance
Endpoint produced: Fresh corpus instance file at ~/Documents/vault/Corpus Instances/<template-slug>-<period>.md with frontmatter recording template, template_version, period, and creation timestamp; section scaffold with empty bodies ready to receive PFF outputs. Mechanical mode — canonical invocation is /instance <template> <period> [<dir>], which dispatches to corpus_runtime.c_instance.
Verification criterion: Mechanical milestone — handled by the runtime function rather than a model pass. Prefer the slash command. Invoking via /framework cff … produces a redirect notice rather than an instance.
Layers covered: Section VI protocol
Required prior milestones: None
Gear: 1
Output format: Corpus instance markdown per Section IX (Corpus Instance Format Specification).
Drift check question: N/A — mechanical mode.

Milestones for Mode C-Validate

Milestone 1: Instance Validation Report

Mode: C-Validate
Endpoint produced: Validation report listing per-section population status (filled / partial / empty / blocked), overall instance status (PASS / PARTIAL / FAIL), and any missing sections relative to the template. Mechanical mode — canonical invocation is /validate <instance> [<template>], which dispatches to corpus_runtime.c_validate.
Verification criterion: Mechanical milestone — handled by the runtime function rather than a model pass. Prefer the slash command.
Layers covered: Section VII protocol
Required prior milestones: None
Gear: 1
Output format: Structured validation report (markdown).
Drift check question: N/A — mechanical mode.

Section VIII: Corpus Template Format Specification

A corpus template is a markdown file with YAML frontmatter and a structured body. The format is human-readable, version-controllable, and parseable by C-Modify, C-Instance, and C-Validate.

Frontmatter

---
nexus:
  - [project or domain]
type: framework
tags:
  - corpus-template
workflow: [Workflow Name]
cadence: [quarterly | monthly | weekly | daily | ad-hoc | project-bound]
identifier_format: [format string, e.g., "Q[N] [YYYY]" or "[YYYY]-[MM]"]
template_version: [semantic version, e.g., 1.0]
date created: [YYYY-MM-DD]
date modified: [YYYY-MM-DD]
sources:
  - name: [source name]
    type: [external-data | user-direct-entry | pff-output | chain-input]
    pff: [PFF name if type is pff-output]
    cadence: [cadence relative to corpus, if different]
outputs:
  - name: [output name]
    type: [document | chain | direct-reference]
    off: [OFF name if type is document]
    sections_read: [list of section names this output reads from]
chain_inputs: [list of chain input specs; see Section X]
chain_outputs: [list of chain output specs; see Section X]
owner: [user or role responsible for the workflow]
authority: [who can modify this template; defaults to owner]
---

Body

# [Workflow Name] Corpus — Template

*Template for the [Workflow Name] corpus. Deploy via C-Instance to create instances at [cadence] cadence. Instances live in `instances/` subdirectory.*

## Purpose

[2-4 sentences describing what this corpus is for, what decisions or audiences it informs, and what its outputs are.]

## How to use this template

[1-3 sentences. Typical usage: run C-Instance to create a new instance, populate sections by running the assigned PFFs, then run OFFs against the populated corpus.]

---

## [Section 1 Heading]

**Source:** [Source name from frontmatter, e.g., "Sales Pipeline PFF" or "User direct entry" or "Chain from Sales Department Corpus"]
**Required:** [Yes | No]
**Load-bearing for:** [List of OFFs and downstream artifacts that depend on this section; empty if non-load-bearing]
**Missing-data behavior:** [Block dependents | Substitute placeholder: "[default text]" | Use prior period's value | Omit from outputs | Flag for manual review]
**Read by:** [List of OFFs and downstream PFFs that consume this section]

**Content guidance:**
[What kind of content goes here, format expectations, length guidance, example phrasings if useful. This appears in the template only; it is dropped or collapsed in instances per template setting.]

---

[Section content area — empty in templates, populated in instances]

---

## [Section 2 Heading]

[Same structure as Section 1]

---

[continued for all sections]

---

## Chain Relationships

[If chain_inputs or chain_outputs are present, this section enumerates them in human-readable form. See Section X for chain spec format.]

---

## Template Metadata

- Template version: [version]
- Generated by: CFF v1.0
- Generated: [date]
- Owner: [owner from frontmatter]
- Follow-on work: [notes from Layer 12, e.g., "Sales Pipeline PFF needs to be designed"]

Composition rules

Section headings are H2 (##).
Section metadata uses bold key-colon-value pairs for parseability.
The content area is delimited by --- separators above and below for clean visual separation in instances.
The frontmatter list of sources, outputs, and chains is the canonical machine-readable spec; the body sections are the human-readable mirror. They must stay synchronized.
Section headings in the body must match section names referenced in frontmatter sources and outputs.sections_read lists.

Section IX: Corpus Instance Format Specification

A corpus instance is structurally identical to its template but with the type field overridden to incubator (elevatable to engram on user vetting), the tags updated to corpus-instance, and additional instance-specific frontmatter and populated content.

Frontmatter for instances

---
nexus:
  - [project or domain]
type: incubator                    # overrides template's `framework`; user elevates to `engram` after vetting
tags:
  - corpus-instance
workflow: [Workflow Name]
cadence: [from template]
identifier_format: [from template]
template_version: [version of template this instance was created from]
template_source: [path to template file]
period_identifier: [the period this instance covers, e.g., "April 2026"]
instantiated: [YYYY-MM-DD when this instance was created]
sources: [from template]
outputs: [from template]
chain_inputs: [from template]
chain_outputs: [from template]
owner: [from template]
authority: [from template]
sections_status:
  [section_name]: [filled | partial | empty | blocked]
sections_provenance:
  [section_name]:
    source_pff: [PFF name and version if applicable]
    filled_at: [YYYY-MM-DD HH:MM]
    filled_by: [user or agent identifier]
last_modified: [YYYY-MM-DD HH:MM]
status: [populating | complete | partial | blocked]
date created: [YYYY-MM-DD]
date modified: [YYYY-MM-DD]
---

Body differences from template

Title drops “Template” suffix and adds period identifier: [Workflow Name] Corpus — [Period Identifier]
Content guidance from sections may be dropped, collapsed into HTML comments, or retained per template setting
Section content areas are populated (or marked empty per status)
Template Metadata section is replaced with Instance Metadata section showing: template version, instantiated date, current status, last modification

Section content area

Each section’s content area in an instance contains:

The actual content as written by the assigned PFF or by user-direct entry
Optional: a small provenance footer (e.g., *Filled by Sales Pipeline PFF v1.2 on 2026-04-15*)
Optional: status flags inline if the content is partial or flagged for review

Section X: Chain Specification

Chains are explicit relationships between corpora — one corpus consuming sections from another, or one corpus feeding sections to another.

Chain input spec (this corpus consumes from another)

chain_inputs:
  - source_corpus: [name and template path]
    source_sections: [list of section names this corpus reads]
    sync_semantics: [manual | automatic | scheduled]
    schedule: [if scheduled: cadence and offset]
    transformation: [verbatim | summarized | pff-bridged]
    bridge_pff: [name of PFF that transforms source content; required if transformation is pff-bridged]
    authority: [owner of source corpus]
    permission_status: [granted | requested | none]

Chain output spec (this corpus feeds another)

chain_outputs:
  - target_corpus: [name and template path]
    target_sections: [list of section names in the target corpus this corpus feeds]
    source_sections: [list of this corpus's sections that contribute]
    sync_semantics: [manual | automatic | scheduled]
    transformation: [verbatim | summarized | pff-bridged]
    bridge_pff: [name of PFF that transforms; required if transformation is pff-bridged]
    authority: [owner of target corpus]

Sync semantics

Manual: the user (or chain owner) explicitly triggers re-population from source. No automation.
Automatic: when source corpus updates, dependent corpora re-instantiate or update. Requires a runtime daemon or trigger system; out of scope for v1 manual runs but documentable for future automation.
Scheduled: sync runs on a fixed cadence (e.g., every Monday). Specify cadence and time-of-day.

Chain authority

Chains may cross authority boundaries (departments, teams, organizations). When they do, the chain must document:

The owner of the source corpus
The owner of the target corpus
The permission relationship (who can read, who can write, what coordination is required for chain modification)

Cross-authority chains require coordination outside the framework — a department adopting another department’s corpus as input typically needs a human conversation about scope, freshness, and permissions before the chain is wired up.

Loop prevention

C-Design Layer 7 validates that no corpus is its own ancestor through any path of chain relationships. Chain loops produce undefined runtime behavior and are rejected at template creation time. If a workflow appears to require a loop, the user is prompted to refactor (typically by introducing a derived intermediate corpus that breaks the loop).

Chain at runtime

Chain runtime behavior is the responsibility of the user or the runtime system (Ora), not CFF itself. CFF specifies the chain relationships; execution propagates per the sync semantics. For v1 manual operation, the user runs corpora in topological order: source corpora are populated and outputs generated before dependent corpora instantiate.

Section XI: Integration with PFF and OFF

CFF is one of three sibling meta-frameworks. The Process Formalization Framework (PFF) formalizes processes that produce information. The Output Formalization Framework (OFF) formalizes outputs that express information. CFF formalizes the corpus where information from PFFs accumulates and from which OFFs render artifacts. The full three-framework integration is specified in Framework — PFF-CFF-OFF Integration Architecture.md. This section provides CFF’s perspective on that architecture.

CFF as the connection point

The corpus is the connection point that allows multiple PFFs to feed multiple OFFs. CFF’s central concern is the corpus, which makes CFF the structural center of corpus-mediated workflows (Shape 4 in the integration architecture).

The corpus’s role:

PFFs declare write contracts against specific corpus sections (per CFF Layer 4 Source Identification)
OFFs declare read contracts against specific corpus sections (per CFF Layer 8 Output Identification)
The corpus mediates many-to-many composition without requiring direct PFF↔OFF handoffs
When the workflow involves multiple corpora, chains link them (per CFF Layer 7 Chain Specification and Section X Chain Specification)

Composition shapes from CFF’s perspective

The integration architecture defines four composition shapes. From CFF’s perspective:

Standalone CFF (rare): a corpus designed without populated PFFs or OFFs — useful as a structured workspace the user populates manually and reads manually. Most CFF usage is corpus-mediated (Shape 4).
Corpus-mediated (Shape 4): the standard pattern. Multiple PFFs write into the corpus; multiple OFFs read from it. CFF templates name the PFFs as sources and the OFFs as readers.
Chained corpora: when authority, scope, or cadence boundaries separate corpora, chains specify the relationships (per Section X). Department corpora feeding company corpora is the most common chain pattern.

CFF’s role in framework discovery

When a user designs a corpus template (mode C-Design):

Layer 4 (Source Identification) surfaces PFFs needed to populate sections. Not-yet-existing PFFs are noted as follow-on work; the user invokes PFF separately to design them.
Layer 7 (Chain Specification) handles chain composition with other corpora.
Layer 8 (Output Identification) surfaces OFFs needed to render outputs. Not-yet-existing OFFs are noted as follow-on work.

CFF also generates the workflow spec artifact (per integration architecture), the index that ties together the bespoke PFFs, the corpus template, and the bespoke OFFs as a coherent workflow that can be re-summoned as a unit.

Reference

Full architecture: Framework — PFF-CFF-OFF Integration Architecture.md. Sibling specifications: Framework — Process Formalization.md and Framework — Output Formalization.md.

Section XII: Named Failure Modes

Failure modes specific to corpus formalization. Each is named to give the model a retrievable reference and a concrete pattern to watch for.

Template Creation (C-Design)

The Mis-route Trap. Classifying a workflow request as a single-process or single-output request and routing to PFF or OFF instead. The signal that distinguishes CFF from PFF/OFF is multiplicity — more than one source or more than one output indicates CFF.

The Idealized Workflow Trap. Eliciting how the user thinks the workflow should work rather than how it actually works. Always probe the gap between aspirational and actual.

The Output-Driven Section Trap. Organizing sections to match the structure of outputs rather than the structure of inputs. This produces fragile templates that break when output requirements change. Sections capture what is known; outputs are views.

The Documented Sources Only Trap. Capturing only sources that have an explicit data file or report, missing tribal knowledge and qualitative observations the user provides from memory. These are real sources and need explicit user-direct-entry sections.

The Default-Behavior Trap. Skipping per-section missing-data elicitation and applying a generic default. Two sections with identical structure can have opposite missing-data behavior. Always ask.

The Implicit Chain Trap. Assuming chain relationships are obvious and skipping their explicit specification. Implicit chains become invisible coupling.

The Chain Loop Trap. Failing to validate that no corpus is its own ancestor through any path of chain relationships. Loops produce undefined runtime behavior.

The Orphan Section Trap. Templates where one or more sections have no identified reader (no OFF, no downstream PFF, no chain output reads them). Orphans are usually mistakes — either a forgotten reader or a section that should not exist.

The Phantom Section Trap. Templates where an OFF references a section that does not exist in the template. Either the section is missing or the OFF’s scope is wrong.

The Completeness Theater Trap. Adding sections “for completeness” that have no source. Every section needs an identified writer.

Template Modification (C-Modify)

The Silent Breaking Change Trap. Modifying a template in a way that breaks existing instances or dependent OFFs without surfacing the breakage. Major changes (renamed sections, removed sections, changed missing-data behavior on load-bearing sections) should be flagged as breaking and produce a major version bump.

The Version Drift Trap. Modifying templates without updating the template_version. Downstream tooling cannot tell which template version produced an instance. Always increment.

Instance Deployment (C-Instance)

The Stale Template Trap. Deploying an instance from a template version that has been superseded. Either prompt the user to update the template first, or proceed with the older version and clearly mark the instance as such.

The Identifier Collision Trap. Creating an instance with an identifier that matches an existing instance. Always check the instance directory and prompt before overwriting.

The Empty Cascade Trap. Deploying an instance when chain inputs are not available, producing an instance that is structurally empty in the chain-dependent sections. Surface this at deployment time so the user knows what work remains.

Instance Validation (C-Validate)

The False Complete Trap. Marking an instance complete when sections are filled with placeholder text or stub content rather than substantive content. Validation should distinguish “section has text” from “section has substantive content.”

The Wrong-Template Validation Trap. Validating an instance against the wrong template (e.g., the current template when the instance was created from an older version). Always validate against the template the instance references in its frontmatter.

Architectural

The Monolithic Corpus Trap. Trying to make one corpus serve too many distinct workflows. When sections multiply beyond the user’s ability to reason about the corpus as a single body, split into multiple chained corpora.

The Single-User Assumption Trap. Designing corpora that implicitly assume a single user populating all sections, when the workflow involves multiple contributors. Multi-contributor corpora need explicit per-section ownership and write coordination.

Section XIII: Execution Commands

Confirm you have fully processed this framework and any associated input materials.
Identify the operating mode from the user’s input:
- Mode C-Design: User describes a new workflow and corpus needs. Execute Layers 1–12 of Section IV.
- Mode C-Modify: User provides an existing template and describes changes. Execute the C-Modify protocol in Section V.
- Mode C-Instance: User requests a new instance from an existing template. Execute the C-Instance protocol in Section VI.
- Mode C-Validate: User requests validation of an existing instance. Execute the C-Validate protocol in Section VII.
Activate the persona specified in Section III.
For C-Design, proceed through layers sequentially. Do not skip layers. Each layer’s output becomes the next layer’s input.
For C-Modify, C-Instance, and C-Validate, follow the protocol in the relevant section.
At every output point, present results to the user for confirmation before writing files.
After completion, summarize:
- What was produced (template, instance, modification, validation report)
- Where it was written
- What follow-on work the user has (PFFs to design, OFFs to design, chains to coordinate)
- The next likely command (typically C-Instance to deploy the template, or running PFFs to populate an instance)

Section XIV: Registry Entry

Name: Corpus Formalization Framework
Purpose: Formalizes the knowledge corpus that aggregates information from multiple sources and feeds multiple outputs in any recurring or ad-hoc workflow
Problem Class: Workflow knowledge architecture; multi-source / multi-output information coordination
Input Summary:
  - Workflow description (required, C-Design)
  - User participation in elicitation (required, C-Design)
  - Existing template (required, C-Modify / C-Instance / C-Validate)
  - Period identifier (required for C-Instance with regular cadence; can be derived from cadence)
  - Existing instance (required, C-Validate)
  - Framework registry, PFF, OFF, integration architecture (optional)
Output Summary:
  - Corpus template file (C-Design, C-Modify) — written to ~/ora/corpora/[workflow-slug]/
  - Corpus instance file (C-Instance) — written to ~/ora/corpora/[workflow-slug]/instances/
  - Validation report (C-Validate) — returned to user, not written
Proven Applications: v1.0 specification, no production runs yet. First-use validation expected on the user's market-research-style workflow as a test case.
Known Limitations:
  - Chain runtime semantics (automatic and scheduled sync) are specified but not implemented at framework-runtime level; v1 supports manual sync only
  - Multi-contributor write coordination is specified architecturally but not enforced; relies on user/team coordination
  - Cross-authority chains require human handoff; framework documents the relationship but does not negotiate permissions
  - Empirical validation of evaluation criteria pending first production runs
File Location: ~/Documents/vault/Framework — Corpus Formalization.md (with mirror at ~/ora/frameworks/book/corpus-formalization.md)
Provenance: human-architected with Claude Opus 4.7 collaborative drafting; designed in response to user epiphany during PFF-OFF integration session, 2026-04-23
Confidence: medium-high — architecture is sound and traces directly from the user's lived workflow-automation experience; parameters and execution layers may require refinement after first production runs
Version: 1.0

Framework v1.0 completed 2026-04-23. Build collaborators: Larry (architect), Claude Opus 4.7 (structural).

This framework emerged from a session originally focused on PFF-OFF integration when the architect recognized that real workflows aggregate from multiple sources into shared bodies that render to multiple outputs — the convergence pattern visible in his market-research work as a homebuilder, where mortgage data, market sales, Federal Reserve rates, internal traffic data, and other sources fed a structured workbook that produced multiple reports and presentations. The corpus is the convergence point. CFF formalizes it.

Corpus Formalization

Display Name

Display Description

How to Use This File

Table of Contents

Section I: Purpose, Contracts, and Execution Tier

PURPOSE

INPUT CONTRACT

OUTPUT CONTRACT

EXECUTION TIER

DEPENDENT DOCUMENTS

INTEGRATION WITH META-LAYER OVERSIGHT

Section II: Evaluation Criteria

1. Workflow Triage Accuracy (applies to: all modes)

2. Source Identification Completeness (applies to: C-Design, C-Modify)

3. Section Architecture Coherence (applies to: C-Design, C-Modify)

4. Cadence and Identifier Appropriateness (applies to: C-Design, C-Modify)

5. Missing-Data Behavior Specificity (applies to: C-Design, C-Modify)

6. Chain Specification Correctness (applies to: C-Design, C-Modify, when chains are present)

7. Output-Section Alignment (applies to: C-Design, C-Modify)

8. Template Reusability (applies to: C-Design, C-Modify)

9. User Fidelity (applies to: all modes)

10. Graceful Degradation (applies to: all modes)

Section III: Persona Activation

Section IV: Mode C-Design — Corpus Template Creation Layers

LAYER 1: TRIAGE GATE

LAYER 2: WORKFLOW DESCRIPTION ELICITATION

LAYER 3: CADENCE AND IDENTIFIER SPECIFICATION

LAYER 4: SOURCE IDENTIFICATION

LAYER 5: SECTION ARCHITECTURE

LAYER 6: MISSING-DATA BEHAVIOR SPECIFICATION

LAYER 7: CHAIN SPECIFICATION

LAYER 8: OUTPUT IDENTIFICATION

LAYER 9: CROSS-REFERENCE VALIDATION

LAYER 10: TEMPLATE FILE COMPOSITION

LAYER 11: SELF-EVALUATION

LAYER 12: ERROR CORRECTION AND OUTPUT

Section V: Mode C-Modify — Template Modification Protocol

Layers (abbreviated):

Section VI: Mode C-Instance — Instance Deployment Protocol

Protocol:

Failure handling:

Section VII: Mode C-Validate — Instance Validation Protocol

Protocol:

MILESTONES DELIVERED

Milestones for Mode C-Design

Milestone 1: Bespoke Corpus Template

Milestones for Mode C-Modify

Milestone 1: Updated Corpus Template

Milestones for Mode C-Instance

Milestone 1: New Corpus Instance

Milestones for Mode C-Validate

Milestone 1: Instance Validation Report

Section VIII: Corpus Template Format Specification

Frontmatter

Body

Composition rules

Section IX: Corpus Instance Format Specification

Frontmatter for instances

Body differences from template

Section content area

Section X: Chain Specification

Chain input spec (this corpus consumes from another)

Chain output spec (this corpus feeds another)

Sync semantics

Chain authority

Loop prevention

Chain at runtime

Section XI: Integration with PFF and OFF

CFF as the connection point

Composition shapes from CFF’s perspective

CFF’s role in framework discovery

Reference

Section XII: Named Failure Modes

Template Creation (C-Design)

Template Modification (C-Modify)

Instance Deployment (C-Instance)

Instance Validation (C-Validate)

Architectural

Section XIII: Execution Commands