Constraint-Based Architecture
The extraction engine doesn't use hardcoded prompts. It resolves constraints from a database, layered by scope. Adding a new extraction field or changing how a field is extracted requires zero code changes — only a row in prompt_specifications.
The Three-Tier Resolution
Every extraction prompt is assembled at runtime from constraints stored in the prompt_specifications table. Constraints resolve in a strict hierarchy:
GLOBAL constraints ← Platform-wide defaults (every tenant gets these)
↓ override / extend / disable
INDUSTRY constraints ← Vertical-specific (construction, financial, CRM)
↓ override / extend / disable
TENANT constraints ← Per-customer customization (PAD's specific fields)
↓
RESOLVED CONSTRAINT SET → Passed to buildPromptFromConstraints()
↓ → Generates Claude Vision API prompt
EXTRACTION RESULT → Structured JSON matching the resolved fields
Each constraint row defines a field: its group (component, group, page, or keying), data type, extraction instructions, validation rules, and sort order. More specific scopes (industry, then tenant) can replace individual inherited fields, add new ones, or disable inherited ones.
Traditional document extraction products hardcode their prompts. When a customer needs a new field (e.g., "panic hardware rating"), it's a code change, a deploy, and a prayer. In this architecture, it's an INSERT into prompt_specifications. The extraction engine resolves it automatically on the next run.
This is also the bridge to multi-vertical: construction fields live at industry scope. CRM fields live at a different industry scope. The engine is identical — only the constraints differ.
Implementation: resolveConstraints()
Located at hardware-schedule-extractor.js:2871. The function:
- Loads all scope_level = 'global' constraints from prompt_specifications
- Resolves industry_id from the tenant (if not explicitly provided)
- Applies scope_level = 'industry' overrides: merge, replace, or delete fields
- Applies scope_level = 'tenant' overrides: same merge logic
- Sorts by field_group and sort_order
- Returns the resolved constraint set with spec_version, scope_chain, and resolved_at
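The steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code at hardware-schedule-extractor.js:2871. It assumes the rows are already loaded into an array, that a field is identified by field_group plus field_name, and that a row with active = 0 at a more specific scope disables the inherited field. The names resolveConstraintsSketch and applyScope are hypothetical, and spec_version is left out because the source does not say how it is derived.

```javascript
// Hypothetical sketch of three-tier constraint resolution over in-memory rows.
// Assumption: a field is identified by field_group + field_name.
function applyScope(resolved, rows) {
  for (const row of rows) {
    const key = `${row.field_group}.${row.field_name}`;
    if (row.active === 0) {
      resolved.delete(key); // disable an inherited field
    } else {
      resolved.set(key, row); // add a new field or replace an inherited one
    }
  }
}

function resolveConstraintsSketch(rows, { industryId, tenantId }) {
  const resolved = new Map();
  const scopeChain = ['global'];

  applyScope(resolved, rows.filter((r) => r.scope_level === 'global'));

  if (industryId) {
    applyScope(resolved, rows.filter(
      (r) => r.scope_level === 'industry' && r.industry_id === industryId));
    scopeChain.push(`industry:${industryId}`);
  }
  if (tenantId) {
    applyScope(resolved, rows.filter(
      (r) => r.scope_level === 'tenant' && r.tenant_id === tenantId));
    scopeChain.push(`tenant:${tenantId}`);
  }

  // Sort by field_group, then sort_order (missing sort_order counts as 0).
  const constraints = [...resolved.values()].sort(
    (a, b) => a.field_group.localeCompare(b.field_group)
      || (a.sort_order ?? 0) - (b.sort_order ?? 0));

  return { constraints, scope_chain: scopeChain, resolved_at: new Date().toISOString() };
}

// Example rows; 'tenant-a' is a made-up tenant id.
const exampleRows = [
  { scope_level: 'global', field_group: 'component', field_name: 'manufacturer', sort_order: 1, active: 1 },
  { scope_level: 'global', field_group: 'component', field_name: 'panic', sort_order: 2, active: 1 },
  { scope_level: 'industry', industry_id: 'construction', field_group: 'component', field_name: 'fire_rating', sort_order: 3, active: 1 },
  { scope_level: 'tenant', tenant_id: 'tenant-a', field_group: 'component', field_name: 'panic', sort_order: 2, active: 0 },
];
const result = resolveConstraintsSketch(exampleRows, { industryId: 'construction', tenantId: 'tenant-a' });
console.log(result.constraints.map((c) => c.field_name)); // [ 'manufacturer', 'fire_rating' ]
```

In this example the tenant row disables the global panic field, and the industry row adds fire_rating on top of the global defaults.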
The resolved set is then passed to buildPromptFromConstraints() (:2983), which generates the Claude Vision API prompt. Every extraction is audited in constraint_executions with the prompt hash and input/output hashes for reproducibility.
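A rough sketch of the prompt assembly and audit hashing described above. It assumes SHA-256 from Node's built-in crypto module (the source does not name the hash algorithm) and a simple line-per-field prompt layout. buildPromptSketch and auditRecordSketch are hypothetical names and do not reproduce the function at :2983 or the real constraint_executions columns.

```javascript
const { createHash } = require('node:crypto');

// Assumption: SHA-256 hex digests; the source only says "hashes".
const sha256 = (text) => createHash('sha256').update(text, 'utf8').digest('hex');

// Hypothetical prompt layout: one instruction line per resolved constraint.
function buildPromptSketch(constraints) {
  const lines = constraints.map(
    (c) => `- ${c.field_group}.${c.field_name} (${c.field_type ?? 'string'}): ${c.extraction_instruction ?? ''}`);
  return ['Extract the following fields as JSON:', ...lines].join('\n');
}

// Hypothetical audit row: only the idea of prompt, input, and output hashes
// comes from the text above.
function auditRecordSketch(prompt, input, output) {
  return {
    prompt_hash: sha256(prompt),
    input_hash: sha256(input),
    output_hash: sha256(JSON.stringify(output)),
  };
}

const prompt = buildPromptSketch([
  {
    field_group: 'component',
    field_name: 'manufacturer',
    field_type: 'string',
    extraction_instruction: 'Name of the hardware manufacturer.',
  },
]);
const audit = auditRecordSketch(prompt, 'page-image-bytes-placeholder', { manufacturer: 'Example Co' });
console.log(audit.prompt_hash.length); // 64 hex characters for SHA-256
```

Because the same constraint set always produces the same prompt text, equal prompt hashes across two runs indicate the runs used identical instructions.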
The Schema: prompt_specifications
id                      TEXT PRIMARY KEY
spec_version            TEXT NOT NULL         -- Schema version for backward compat
scope_level             TEXT NOT NULL         -- 'global' | 'industry' | 'tenant' DEFAULT 'global'
industry_id             TEXT                  -- NULL for global, industry code for industry scope
tenant_id               TEXT                  -- NULL for global/industry, tenant ID for tenant scope
field_group             TEXT NOT NULL         -- 'component' | 'group' | 'page' | 'keying'
field_name              TEXT NOT NULL         -- e.g., 'manufacturer', 'fire_rating', 'panic'
field_type              TEXT DEFAULT 'string' -- Data type hint for extraction
extraction_instruction  TEXT                  -- Natural language instruction for the AI
validation_rule         TEXT                  -- Optional validation regex/rule
sort_order              INTEGER DEFAULT 0     -- Ordering within group
active                  INTEGER DEFAULT 1     -- Soft delete
field_aliases           TEXT                  -- Alternative names (migration 033)
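To make the column defaults concrete, here is a small sketch that fills in the defaults listed above and checks that the scope columns agree with scope_level. normalizeSpecRow is a hypothetical helper: in the real system the database defaults do this work, and the source does not say whether the scope rules in the comments are enforced anywhere.

```javascript
// Hypothetical helper mirroring the column defaults and scope comments above.
function normalizeSpecRow(row) {
  const full = {
    scope_level: 'global',
    industry_id: null,
    tenant_id: null,
    field_type: 'string',
    extraction_instruction: null,
    validation_rule: null,
    sort_order: 0,
    active: 1,
    field_aliases: null,
    ...row, // caller-supplied values win over defaults
  };
  for (const required of ['id', 'spec_version', 'field_group', 'field_name']) {
    if (full[required] == null) throw new Error(`missing required column: ${required}`);
  }
  if (full.scope_level === 'global' && (full.industry_id || full.tenant_id)) {
    throw new Error('global rows leave industry_id and tenant_id NULL');
  }
  if (full.scope_level === 'industry' && (!full.industry_id || full.tenant_id)) {
    throw new Error('industry rows need industry_id and a NULL tenant_id');
  }
  if (full.scope_level === 'tenant' && !full.tenant_id) {
    throw new Error('tenant rows need a tenant_id');
  }
  return full;
}

// The id and spec_version values here are made up for illustration.
const normalized = normalizeSpecRow({
  id: 'spec-001',
  spec_version: '1',
  field_group: 'component',
  field_name: 'manufacturer',
});
console.log(normalized.scope_level, normalized.field_type, normalized.active); // global string 1
```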
SCHEDULE_TYPE_REGISTRY
The constraint architecture extends to schedule type routing. SCHEDULE_TYPE_REGISTRY in hardware-schedule-extractor.js maps detected document types to extraction functions, target tables, and constraint scopes. Adding a new document type is one registry entry plus the constraint rows for its fields.
| Schedule Type | Extraction Function | Target Table | Status |
|---|---|---|---|
| hardware_schedule | extractFromPageImage() | hardware_page_extractions | DEPLOYED |
| door_schedule | extractDoorSchedule() | door_schedule_entries | DEPLOYED |
| electrical_panel | extractGenericSchedule() | hardware_page_extractions | EXTENSIBLE |
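The routing idea behind the table above might look something like this. The entry shape, the stubbed extractor bodies, and the routeSchedule helper are assumptions for illustration. Only the type names, function names, and target tables come from the table, and the constraint scope that the real registry also carries is omitted here.

```javascript
// Stub extractors standing in for the real functions named in the table.
const extractFromPageImage = (page) => ({ via: 'extractFromPageImage', page });
const extractDoorSchedule = (page) => ({ via: 'extractDoorSchedule', page });
const extractGenericSchedule = (page) => ({ via: 'extractGenericSchedule', page });

// Hypothetical registry shape: detected type -> extractor + target table.
const SCHEDULE_TYPE_REGISTRY_SKETCH = new Map([
  ['hardware_schedule', { extract: extractFromPageImage, targetTable: 'hardware_page_extractions' }],
  ['door_schedule', { extract: extractDoorSchedule, targetTable: 'door_schedule_entries' }],
  ['electrical_panel', { extract: extractGenericSchedule, targetTable: 'hardware_page_extractions' }],
]);

// Look up the entry for a detected document type; unknown types fail loudly.
function routeSchedule(detectedType) {
  const entry = SCHEDULE_TYPE_REGISTRY_SKETCH.get(detectedType);
  if (!entry) throw new Error(`no registry entry for schedule type: ${detectedType}`);
  return entry;
}

const route = routeSchedule('door_schedule');
console.log(route.targetTable); // door_schedule_entries
console.log(route.extract(3).via); // extractDoorSchedule
```

Supporting a new document type in this sketch means adding one more Map entry, which matches the "one registry entry" claim above.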
Zero-Code Extensibility in Practice
| Want to... | Do this | Code changes? |
|---|---|---|
| Add a field to all extractions | INSERT into prompt_specifications with scope_level='global' | Zero |
| Add a construction-specific field | INSERT with scope_level='industry', industry_id='construction' | Zero |
| Override a field for one customer | INSERT with scope_level='tenant', tenant_id=customer's ID | Zero |
| Disable a field for one customer | INSERT with active=0 at tenant scope (overrides global) | Zero |
| Add a new document type | Add entry to SCHEDULE_TYPE_REGISTRY + constraint rows | One registry entry |
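As an illustration of the "disable a field for one customer" row in the table above, the sketch below builds a parameterized INSERT from a plain object. buildSpecInsert is a hypothetical helper, and the id, spec_version, and tenant_id values are made up. Only the column names come from the schema.

```javascript
// Columns from the prompt_specifications schema; anything else is rejected
// so that object keys are never interpolated into SQL unchecked.
const SPEC_COLUMNS = new Set([
  'id', 'spec_version', 'scope_level', 'industry_id', 'tenant_id',
  'field_group', 'field_name', 'field_type', 'extraction_instruction',
  'validation_rule', 'sort_order', 'active', 'field_aliases',
]);

// Hypothetical helper: turns a row object into a parameterized INSERT.
function buildSpecInsert(row) {
  const columns = Object.keys(row);
  for (const c of columns) {
    if (!SPEC_COLUMNS.has(c)) throw new Error(`unknown column: ${c}`);
  }
  const placeholders = columns.map(() => '?').join(', ');
  return {
    sql: `INSERT INTO prompt_specifications (${columns.join(', ')}) VALUES (${placeholders})`,
    params: columns.map((c) => row[c]),
  };
}

// A tenant-scope row with active = 0 switches off the inherited 'panic' field.
const disablePanicForTenant = buildSpecInsert({
  id: 'spec-tenant-a-panic-off', // made-up id
  spec_version: '1', // made-up version value
  scope_level: 'tenant',
  tenant_id: 'tenant-a', // made-up tenant id
  field_group: 'component',
  field_name: 'panic',
  active: 0,
});
console.log(disablePanicForTenant.sql);
console.log(disablePanicForTenant.params);
```

With a D1 binding, a statement like this would typically be run through prepare(sql).bind(...params).run(). How the real system issues its inserts is not shown in the source.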
Consent & Sovereignty
The core innovation is the affirmation boundary. Nothing enters production data until a human consents. Every consent is immutably logged with entity snapshots, user identity, and timestamps. The audit log is the product.
The Affirmation Boundary
AI extraction is probabilistic. Hardware schedules are messy. OCR makes mistakes. The system handles this with a simple rule: AI proposes, humans affirm. Data lives in staging tables (hardware_page_extractions) until explicitly approved. Only then does it materialize into production tables (hardware_sets, hardware_components).
PDF Upload
↓
AI Extraction (Claude Vision API)
↓
hardware_page_extractions ← STAGING (AI output lives here)
↓
Human Review & Correction ← Frontend card-by-card review
↓
Affirm (click) ← BOUNDARY CROSSING
↓
affirm_audit_log ← WHO affirmed WHAT, WHEN, with SNAPSHOT
↓
hardware_sets + hardware_components ← PRODUCTION (only affirmed data)
↓
takeoff_line_items ← Auto-materialized from affirmed sets
↓
takeoff_quotes ← Immutable quote snapshots
Seven Affirm Levels
The affirmation system operates at seven granularity levels, each independently logged:
| Level | What's Affirmed | What Happens |
|---|---|---|
| L1: Field | Individual data points | Field-level corrections applied |
| L2: Component | Line items within a group | Component data validated |
| L3: Group | Hardware group (triggers materialization) | Creates hardware_sets + hardware_components in production tables. Auto-pricing fires. Line items auto-materialize. |
| L4: Region | Detected document areas | Schedule region locked for extraction |
| L5: Entity | Project metadata | Project data locked. Smart reset if data changes post-affirm. |
| L6: Reference | Product/cut sheet mappings | Cut sheet association locked to component |
| L7: Template | Output document templates | Quote/submittal template locked for generation |
Every affirm action writes to affirm_audit_log with: entity_type, entity_id, action (affirm/unaffirm), user_id, user_email, reason, and a full entity_snapshot JSON. The snapshot captures the EXACT data at the moment of affirmation. If someone asks "who approved this quote with these numbers?" — the answer is in the log, with the data as it appeared when they clicked.
This isn't a feature. It's the architectural foundation that makes Consenta possible — because consent-based services require auditable consent.
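The audit write and boundary crossing described above can be sketched in memory like this. The store object, the affirmGroup function, and the created_at field are assumptions. The remaining audit fields follow the list given for affirm_audit_log. The snapshot is serialized at affirm time so that later edits to staging cannot change what was logged.

```javascript
// Hypothetical in-memory stand-ins for staging, production, and the audit log.
function affirmGroup(store, groupId, user, reason) {
  const staged = store.staging.get(groupId);
  if (!staged) throw new Error(`nothing staged for group ${groupId}`);

  // Serialize now: the log keeps the data exactly as it was when affirmed.
  const snapshot = JSON.stringify(staged);

  store.auditLog.push(Object.freeze({
    entity_type: 'group',
    entity_id: groupId,
    action: 'affirm',
    user_id: user.id,
    user_email: user.email,
    reason,
    entity_snapshot: snapshot,
    created_at: new Date().toISOString(), // assumed timestamp column
  }));

  // Only affirmed data crosses into production, as an independent copy.
  store.production.set(groupId, JSON.parse(snapshot));
}

const store = {
  staging: new Map([
    ['group-1', { name: 'Set 01', components: [{ manufacturer: 'Example Co' }] }],
  ]),
  production: new Map(),
  auditLog: [],
};

affirmGroup(store, 'group-1', { id: 'user-1', email: 'reviewer@example.com' }, 'reviewed against schedule');

// A later edit to staging does not touch the logged snapshot.
store.staging.get('group-1').components.push({ manufacturer: 'Another Co' });
console.log(JSON.parse(store.auditLog[0].entity_snapshot).components.length); // 1
```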
Smart Reset
If affirmed data changes after affirmation (e.g., a re-extraction updates the underlying data), the affirm status resets automatically. No stale sign-offs. The dual-card interlock (ticket 26L) preserves the old extraction alongside the new one until the user explicitly chooses.
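One way to picture the reset rule is a fingerprint comparison, sketched below. The source does not say how the real system detects a change, so the fingerprint function, the affirmed_fingerprint field, and the flat record shape are all assumptions.

```javascript
// Assumption: a key-sorted JSON string is a good-enough fingerprint for flat records.
function fingerprint(record) {
  const sorted = {};
  for (const key of Object.keys(record).sort()) sorted[key] = record[key];
  return JSON.stringify(sorted);
}

// Mark an entity affirmed and remember what its data looked like at that moment.
function affirm(entity) {
  return { ...entity, affirmed: true, affirmed_fingerprint: fingerprint(entity.data) };
}

// Apply new data; an affirmed entity stays affirmed only if nothing changed.
function applyUpdate(entity, newData) {
  const changed = fingerprint(newData) !== entity.affirmed_fingerprint;
  return { ...entity, data: newData, affirmed: Boolean(entity.affirmed) && !changed };
}

const project = affirm({ id: 'project-1', data: { name: 'Example Project', bid_date: '2026-03-01' } });

// Same values in a different key order: no reset.
const unchanged = applyUpdate(project, { bid_date: '2026-03-01', name: 'Example Project' });
// A re-extraction that changes a value: affirm status resets.
const reextracted = applyUpdate(project, { name: 'Example Project', bid_date: '2026-04-01' });

console.log(unchanged.affirmed, reextracted.affirmed); // true false
```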
User Sovereignty Schema (Planned)
The affirmation boundary operates at the data level. The sovereignty schema (ticket CH-2026-0129-SOVEREIGNTY-001) extends consent to the access level:
Constitutional pattern, by analogy with the 10th Amendment's reserved powers: "Access not explicitly granted remains with the user." Enterprise SaaS typically assumes organizational ownership. This platform assumes user sovereignty. The distinction is legally defensible and marketable.
Three proposed tables establish the foundation:
| Table | Purpose | Status |
|---|---|---|
| user_spheres | Sovereignty declaration — user owns their sphere | PENDING |
| access_grants | Explicit consent grants with grantor, grantee, scope, time bounds, revocation, contract reference | PENDING |
| consent_events | Audit trail of every grant lifecycle event | PENDING |
The default is denial: no grant means no access. Payment acts as a consent mechanism (via a Stripe contract_reference). Grants are revocable, time-bounded, and auditable.
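Since these tables are still PENDING, the following is only a sketch of how a default-deny check over access_grants might behave. The grant field names (valid_from, valid_until, revoked_at) and the hasAccess function are assumptions based on the description of grantor, grantee, scope, time bounds, revocation, and contract reference.

```javascript
// Hypothetical default-deny check: access exists only if some grant matches,
// is inside its time bounds, and has not been revoked at the time of the check.
function hasAccess(grants, { grantee, scope, at = new Date() }) {
  return grants.some((g) =>
    g.grantee === grantee
    && g.scope === scope
    && new Date(g.valid_from) <= at
    && (g.valid_until == null || at < new Date(g.valid_until))
    && (g.revoked_at == null || at < new Date(g.revoked_at)));
}

// Made-up grant for illustration.
const grants = [{
  grantor: 'user-1',
  grantee: 'tenant-b',
  scope: 'project:read',
  valid_from: '2026-01-01T00:00:00Z',
  valid_until: '2026-12-31T00:00:00Z',
  revoked_at: null,
  contract_reference: 'stripe-ref-placeholder',
}];

const midYear = new Date('2026-06-01T00:00:00Z');
console.log(hasAccess(grants, { grantee: 'tenant-b', scope: 'project:read', at: midYear })); // true
console.log(hasAccess(grants, { grantee: 'tenant-b', scope: 'project:write', at: midYear })); // false
console.log(hasAccess([], { grantee: 'tenant-b', scope: 'project:read', at: midYear })); // false
```

An empty grant list returns false without any special casing, which is the "no grant = no access" rule.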
Mutual Consent & Shared Spaces
From the Platform Architecture Spec v2.0: all cross-organization data sharing requires bilateral agreement. Three visibility layers:
PRIVATE SPACE   Only users within the owning tenant
↓ (explicit share offer + acceptance)
MUTUAL SPACE    Two+ tenants with bilateral consent
                Neither sees the other's PRIVATE content
↓ (public flag, rare)
PUBLIC SPACE    Anyone with link (no consent required)
Shared spaces (shared_spaces + shared_space_members) track bilateral consent with full lifecycle: pending → accepted → active, or declined/withdrawn. Either party can withdraw without destroying the other's data.
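The lifecycle above reads naturally as a small state machine. The sketch below is one possible reading: the state names come from the text, but the exact set of allowed transitions (for example, that withdrawal is possible from pending, accepted, and active) is an assumption, and transitionSpace is a hypothetical helper.

```javascript
// Assumed transition table for a shared space's consent status.
const SPACE_TRANSITIONS = new Map([
  ['pending', ['accepted', 'declined', 'withdrawn']],
  ['accepted', ['active', 'withdrawn']],
  ['active', ['withdrawn']],
  ['declined', []],
  ['withdrawn', []],
]);

// Return a new space object in the next status, or throw if the move is not allowed.
function transitionSpace(space, next) {
  const allowed = SPACE_TRANSITIONS.get(space.status) ?? [];
  if (!allowed.includes(next)) {
    throw new Error(`cannot move shared space from ${space.status} to ${next}`);
  }
  return { ...space, status: next };
}

let space = { id: 'space-1', status: 'pending' }; // made-up id
space = transitionSpace(space, 'accepted');
space = transitionSpace(space, 'active');
space = transitionSpace(space, 'withdrawn');
console.log(space.status); // withdrawn
```

Withdrawal only changes the consent status here. Each party's own records stay untouched, in line with the rule that withdrawing does not destroy the other side's data.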
Platform Northstar
MHS operates a B2B platform serving businesses across verticals. SubmittalExpress is the first product, skinned for construction. Consenta is the second, skinned for CRM/ERP. The platform underneath is the same — and it's 70% domain-agnostic.
Corporate Structure
MHS (Holding Company)
├── WeylandAI (Construction Vertical)
│ ├── SubmittalExpress (Product — PAD customer)
│ └── TakeoffExpress (Product — quoting engine)
│
├── Consenta (CRM/ERP — Agreement Lifecycle)
│ ├── Consenta Docs ("Upload anything. Extract everything. Trust what matters.")
│ ├── Consenta Flow ("Approval workflows that remember everything.")
│ ├── Consenta Forge ("Beautiful documents from messy data.")
│ └── Consenta CRM ("The CRM that reads your documents.")
│
└── Quanticfork (RIA/Financial Vertical — future)
The Platform Audit: 70/15/15
An audit of weyland-worker.js (~14K lines) revealed the portability split:
| Tier | % | What | Examples |
|---|---|---|---|
| Direct carry | 70% | Auth, projects, quotes, line items, audit log, templates, telemetry | users, projects, vendor_profile, takeoff_quotes, affirm_audit_log |
| Rename only | 15% | Generic patterns with construction names | hardware_extraction_sessions → document_sessions, submittals → deliverables |
| Domain replace | 15% | Construction-specific entities → CRM entities | door_schedule_entries → extracted_entries, hardware_sets → entity_groups |
Consenta: The Fork in Practice
Consenta (consenta.cc) is a 1:1 fork of Weyland with the construction skin replaced. Phase 1 (generic rename) is COMPLETE. Phase 2 (registry swap) is COMPLETE.
| Weyland Name | Consenta Name | What It Actually Is |
|---|---|---|
| HGSE | AIDE (AI Document Extraction) | Upload → detect → affirm → extract → review → materialize |
| Affirmation Boundary | Consent Engine | Seven-level hierarchical approval with immutable audit |
| Submittal Assembler | Document Forge | Affirmed data + templates → branded PDF packages |
| CPS + Cut Sheet Discovery | Knowledge Base | FTS5 reference library with AI-powered matching |
| Takeoff Express | Pricing & Quoting Engine | Multi-source line items → grouped pricing → branded quotes |
| 3-tier constraints | Enterprise Platform Layer | Global → industry → tenant configuration without code |
Consenta takes its name from the Latin "consentire" (to agree, to feel together). The Weyland platform's core innovation IS the affirmation boundary: nothing enters production until a human consents. This isn't a construction feature being dragged into CRM. It's a trust substrate that happens to be skinned as construction. Consenta makes the trust substrate the PRODUCT.
Viral Growth Model
The freemium sub-tenant model creates network effects:
PAD (paying) ——shares submittal——> Smith GC (freemium sub-tenant)
↓
experiences value
↓
CONVERTS to paying
↓
Smith GC (paying) —shares submittal—> Jones Masonry (freemium sub-tenant)
↓
[repeats]
Sub-tenants convert to peer clients with operational continuity — shared spaces and consent records persist across conversion. The business relationship encoded in data survives the commercial relationship change.
10 CRM Document Types (Consenta Phase 2)
The SCHEDULE_TYPE_REGISTRY pattern extends from 3 construction types to 10 CRM document types with zero extraction engine changes:
| Document Type | Extracted Fields | Status |
|---|---|---|
| invoice | vendor, amounts, line items, dates, payment terms | DEPLOYED |
| purchase_order | buyer, seller, items, quantities, delivery | DEPLOYED |
| contract | parties, terms, obligations, dates, amounts | DEPLOYED |
| receipt | merchant, items, total, date, payment method | DEPLOYED |
| resume | name, contact, skills, experience, education | DEPLOYED |
| business_card | name, title, company, phone, email, address | DEPLOYED |
| compliance_cert | issuer, holder, type, dates, scope | DEPLOYED |
| tax_form | form type, TIN, amounts, period, filing status | DEPLOYED |
| shipping_manifest | origin, destination, items, weights, carrier | DEPLOYED |
| insurance_policy | insurer, holder, coverage, limits, dates | DEPLOYED |
Each type uses the same constraint-driven extraction via extractWithConstraints, with 87 prompt_specifications rows defining the fields per document type. The extraction engine, affirmation boundary, materialization bridge, and document assembly pipeline are all reused without modification.
Tactical Vision
"We built a telescope lens by lens, but never assembled the tube — the light never reaches the eyepiece." The pipeline works. The plumbing works. The critical path is connecting the data flows end-to-end for a customer demo.
What's Deployed Today
| Capability | Ticket | Status |
|---|---|---|
| Project data model with billing metadata | 26G | DEPLOYED |
| Vendor profile with branding | 26H | DEPLOYED |
| Hardware set page generator | 26I | DEPLOYED |
| Quote persistence + server-side PDF + line item edit | 26J | DEPLOYED |
| Materialization bridge (DSE → hardware sets) | 26K | DEPLOYED |
| Re-extraction with dual-card interlock | 26L | DEPLOYED |
| Card-level re-extraction via region pipeline | 26M | DEPLOYED |
| Quote HTML engine (CAPT pivot from pdf-lib) | 26N | DEPLOYED |
| Shareable quote links with access tokens | 26O | DEPLOYED |
| Project-scoped takeoff (cross-session aggregation) | Pipeline Phase 1 | DEPLOYED |
| Bulk session link + backfill bridge + auto-link | Pipeline Phase 2 | DEPLOYED |
| Project-scoped quote generation | Pipeline Phase 3 | DEPLOYED |
| CPS price-informed takeoff with pricing options | CPS Integration | DEPLOYED |
The Pipeline End-to-End
UPLOAD    User uploads hardware schedule PDF
↓
DETECT    System detects schedule pages, creates extraction session
↓ (auto-links to project if name matches)
EXTRACT   Claude Vision API extracts per-page data using resolved constraints
↓
REVIEW    Frontend shows card-by-card extraction results
↓ (re-extract individual cards if needed; 26M)
AFFIRM    User affirms groups → materializes to hardware_sets/components
↓ (audit logged with entity snapshots)
DISCOVER  CPS catalogue + web search finds cut sheet PDFs
↓ (user affirms or rejects matches)
PRICE     Auto-pricing from CPS catalogue, user overrides, per-set pricing
↓
QUOTE     Generate branded HTML quote → headless Chrome → PDF
↓ (shareable link with access token; 26O)
ASSEMBLE  Submittal package: cover + TOC + schedule + set pages + cut sheets
↓
DELIVER   R2-stored PDF, download endpoint, quote history
Revenue Context
PAD paid $15K for the buildout. Customers are lined up for monthly seat demos, and Andrew (PAD) has validated pricing of $2K/month/seat. The critical path is getting demo-ready: a customer uploads a hardware schedule, and the system extracts → affirms → prices → generates a branded quote with cut sheets, all in one flow.
Production Data Status
| Project | Sessions | DSE Entries | HW Sets | Quotes | Status |
|---|---|---|---|---|---|
| One Camino Real (OCC) | 9 linked | 42 | 17 (with pricing) | #3 ($18,787), #8 ($24,787 project-scope) | Demo-ready |
| Kaiser Sunset | 47 (35 door, 7 hardware) | 0 (never persisted) | 2 skeletal | #1, #2 ($0) | Needs session linking + backfill |
What Ships Next
| Item | Why | Status |
|---|---|---|
| Extraction flow unification (Family 29) | Three extraction paths (A, B, C) need unified entry point and consistent behavior | SPECIFIED |
| Orphan merge (29B) | Cross-page continuation detection — hardware groups split across pages | SPECIFIED |
| Kaiser data recovery | 47 sessions with extraction data that never hit door_schedule_entries | Needs investigation |
| Stripe integration | Payment as consent mechanism, sub-tenant conversion trigger | PENDING |
| Sovereignty schema foundation | user_spheres + access_grants + consent_events tables | PENDING |
Infrastructure
| Component | Service | Details |
|---|---|---|
| Worker | Cloudflare Workers | weyland-worker.js (~14K lines), routes at api.weylandai.com |
| Frontend | Cloudflare Pages | subx.html (~1.1MB SPA), serves at subx.weylandai.com |
| Database | Cloudflare D1 | weyland_db, 88+ tables, 42 migrations applied |
| Storage | Cloudflare R2 | Uploaded PDFs, generated submittals, quote PDFs, region renders |
| Browser | Cloudflare Browser Rendering | Headless Chromium for quote HTML → PDF conversion |
| Queue | Cloudflare Queues | Cut sheet discovery queue consumer |
| AI | Anthropic API (Claude) | Vision API for document extraction, constraint-driven prompts |
| Dev tooling | HASCOM | 17 providers, 166 capabilities, 3,637 symbols, source reference |
Source Documents
| Document | Location | Covers |
|---|---|---|
| Platform Architecture Spec v2.0 | MHS_PLATFORM_ARCHITECTURE_SPEC_v2.md (49KB) | Entity model, mutual consent, shared spaces, subscription tiers, compliance, viral growth |
| Consenta Platform Brainstorm | CE_NOTE_2026-0210_ConsentaPlatformBrainstorm.json | Engine abstraction map, product lines, fork plan, 10 CRM document types |
| Sovereignty Schema Foundation | CH-2026-0129-SOVEREIGNTY-001 | user_spheres, access_grants, consent_events, default denial, payment as consent |
| Pipeline Unification Strategy | CE_NOTE_2026-0209_TakeoffPipelineUnification.json | Three disconnected data streams, project-scoped aggregation, 4-phase execution |
| Extraction Pipeline Source | weylandai.com/handoff-product-source | All 5 pipeline files with exports and flow diagram |
| HASCOM Toolkit Source | weylandai.com/handoff-hascom-source | Developer tooling: core.py, analyzers.py, UAT framework |
| Infrastructure Config | weylandai.com/handoff-infrastructure | wrangler.toml, deployment routing, Cloudflare bindings |