Apex Cluster — Dual-Machine Autonomous AI Operations

Capability Stack

Eleven Pillars of Autonomous Operations

Each pillar is a self-contained capability domain. Together, they form a system that operates, decides, creates, markets, and researches — independently.

🧠

Multi-Model AI Cognition

13 models · 6 cloud + 7 local · intelligent routing

6 cloud models · 7 local models · ~$62/mo total spend · Diversity-enforced routing · DeepSeek V4-Flash added 2026-08-02

☁️ Cloud Models · 6 models

Six cloud models routed by task type and pipeline, enforcing model diversity to prevent single-provider lock-in.

GLM-5.1 PRIMARY — Default for ALL sessions, ALL channels. Everyday operations, daily tasks, general queries.
GLM-5.2 CODE PIPELINE — Architecture Reviewer only in the code pipeline. Rate-limit fallback to Sonnet.
DeepSeek V4-Flash EMERGE PIPELINE — emergE pipeline Architect + Coder roles. Added 2026-08-02. 90% cheaper than v1 pipeline.
Claude Opus 5 REVIEWER — emergE pipeline final Reviewer + Vibestreet Coder.
Claude Sonnet 4.6 CODER — General pipeline Coder role. Rate-limit fallback for GLM-5.2.
Claude Haiku 4.5 COMPACTION — Context compaction (30K token floor) + Tester in emergE and Vibestreet pipelines.

🖥️ Forge Local (Ollama)

Lightweight local inference on Forge — always-on embeddings, on-demand testing.

qwen3.5:9B — General pipeline Tester role, on-demand
nomic-embed-text — gBrain embeddings, always-on
jina-embeddings-v5-text-small — Future embedding migration (dormant)

⚡ Anvil Local (Ollama)

Heavy LLM inference on Anvil's 48GB M4 Pro — daily tasks, vision, code generation.

qwen3.6:35B — Daily/swarm/simulation tasks
qwen3-coder:30B — Code generation
qwen3-vl:8b — Vision tasks, always-on
gemma3:12b — Media generation
nomic-embed-text + jina-v5 — Embedding (legacy + new)

🔀 Model Routing Rules

GLM-5.1 = default everywhere — all sessions, all channels, all day-to-day tasks
DeepSeek V4-Flash = emergE pipeline (Architect + Coder roles)
GLM-5.2 = Code Pipeline Arch Reviewer only
Opus 5 = emergE Reviewer + Vibestreet Coder
Sonnet 4.6 = General pipeline Coder
Rate-limit fallbacks: GLM-5.1 → Haiku 4.5 (regular), GLM-5.2 → Sonnet 4.6 (code pipeline), DeepSeek → GLM-5.2 (emergE)
Local LLMs NEVER used for trading, math, financial analysis, or knowledge-critical tasks
Ollama Anvil models delegated via Forge → Anvil routing
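
The routing rules above amount to a lookup table plus a fallback map. A minimal sketch, assuming hypothetical role keys (`emerge.coder` and so on) and a caller-supplied set of rate-limited models; only the model names and the fallback pairs come from the list above. Chaining fallbacks when the fallback itself is rate-limited is also an assumption.

```python
# Hypothetical routing table: role keys are invented for illustration,
# model names are taken from the routing rules.
ROUTES = {
    "default": "GLM-5.1",
    "emerge.architect": "DeepSeek V4-Flash",
    "emerge.coder": "DeepSeek V4-Flash",
    "emerge.arch_reviewer": "GLM-5.2",
    "emerge.reviewer": "Claude Opus 5",
    "vibestreet.coder": "Claude Opus 5",
    "general.coder": "Claude Sonnet 4.6",
}

# Rate-limit fallbacks as listed in the routing rules.
FALLBACKS = {
    "GLM-5.1": "Claude Haiku 4.5",
    "GLM-5.2": "Claude Sonnet 4.6",
    "DeepSeek V4-Flash": "GLM-5.2",
}


def pick_model(role: str, rate_limited: frozenset = frozenset()) -> str:
    """Return the model for a role, walking the fallback map while rate-limited."""
    model = ROUTES.get(role, ROUTES["default"])
    seen = set()  # guards against a cyclic fallback map
    while model in rate_limited and model in FALLBACKS and model not in seen:
        seen.add(model)
        model = FALLBACKS[model]
    return model
```

Unknown roles fall through to the default model, which matches the "GLM-5.1 = default everywhere" rule.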

💻

Autonomous Software Engineering

2 pipelines · 8.5-stage process · $0.14–$0.95/run

2 pipelines · 8.5-stage process · ~$0.14–$0.95 per run · Model diversity enforced · emergE default pipeline

🔧 Code Pipelines

Two pipelines — emergE is the default for ALL projects (Fulcrum AI, Nexdex, VerySmart, new ventures). Vibestreet keeps its own tuned pipeline until it ships:

🚀 emergE Compute Pipeline · ~$0.14/run · DEFAULT

The default for all new projects. 90% cheaper than v1. Model diversity: DeepSeek → GLM → DeepSeek → Anthropic → Anthropic.

Stage 1: Architect (DeepSeek V4-Flash)

Generates a full technical specification from the Paperclip issue. Tech stack decisions, module breakdown, API contracts, data models. Outputs structured spec doc for the Arch Reviewer.

Stage 1.5: Arch Reviewer (GLM-5.2)

A different model reviews the architecture spec before any code is written. Catches spec-level issues: missing edge cases, wrong abstractions, security gaps in design, over-engineering. Prevents downstream rework. emergE-only stage.

Stage 2: Coder (DeepSeek V4-Flash)

Implements the reviewed spec. DeepSeek V4-Flash writes production-quality code against the GLM-reviewed architecture. Outputs complete implementation with inline documentation.

Stage 3: Reviewer (Claude Opus 5)

Final quality + security gate. Opus 5 does the deepest review: logic errors, security vulnerabilities, spec adherence, code quality, documentation completeness. Can reject with feedback → loops back to Coder (max 3 iterations).

Stage 4: Tester (Claude Haiku 4.5)

Generates and runs test suites. Unit + integration + edge cases. Coverage ≥80% required. Haiku 4.5 writes fast, comprehensive tests. All tests must pass green before marking ship-ready.
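
The Reviewer stage's reject-and-retry behavior can be sketched as a bounded loop. This is an illustration only: `coder` and `reviewer` are hypothetical callables standing in for the model calls, and the sketch assumes "max 3 iterations" means three Coder attempts in total.

```python
from typing import Callable, Optional, Tuple

MAX_REVIEW_ITERATIONS = 3  # from the Reviewer stage description


def code_review_loop(
    spec: str,
    coder: Callable[[str, Optional[str]], str],
    reviewer: Callable[[str, str], Tuple[bool, str]],
) -> Tuple[bool, str, int]:
    """Run Coder -> Reviewer until approval or the iteration cap is reached.

    Returns (approved, last_code, attempts_used).
    """
    feedback: Optional[str] = None
    code = ""
    for attempt in range(1, MAX_REVIEW_ITERATIONS + 1):
        code = coder(spec, feedback)           # feedback is None on the first pass
        approved, feedback = reviewer(spec, code)
        if approved:
            return True, code, attempt
    return False, code, MAX_REVIEW_ITERATIONS
```

A run that exhausts the cap returns `approved=False` so the caller can escalate instead of shipping unreviewed code.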

🎨 Vibestreet Pipeline · ~$0.95/run

Architect (GLM-5.2)

Breaks down Vibestreet requirements into tech specs, architecture, and implementation roadmap. Marketplace patterns, UI-first approach. Outputs design docs and module breakdown.

Coder (Claude Opus 5)

Opus 5 writes Vibestreet-specific production code. Marketplace patterns, UI-first, mobile-ready. Highest quality output for the flagship marketplace product.

Reviewer (GLM-5.2)

Quality gate. Reviews against Vibestreet spec. Can reject with feedback → loops back to Coder (max 3 iterations). Only passes complete, documented, spec-aligned code.

Tester (Haiku 4.5)

Runs test suites. All tests must pass green. Verifies coverage ≥80%. Ship-ready on approval.

🏗️ 8.5-Stage Development Process

End-to-end workflow from idea to deployed feature — now with Architecture Review Gate for emergE:

1. GrillMe — Requirements interrogation, edge case stress-test
2. Requirements + Paperclip — Spec written, issue created with Done-When criteria
3. Architecture — Architect generates technical spec and module map
3.5. Arch Review Gate (emergE pipeline only) — A different model reviews the spec before any code is written
4. Code Pipeline — Specialist execution: 5 stages in emergE, 4 in Vibestreet (above)
5. Review Gate — Final quality + security gate
6. Testing — Unit + integration coverage verification
7. Integration Testing — Cross-service end-to-end flows
8. Deployment — Canary deploy, land-and-verify, monitoring

🏗️ Sprint Lifecycle (gstack)

Composable engineering sub-skills loaded on demand:

Spec-driven development & incremental implementation
Design review, engineering review, CEO review gates
QA automation, security audit, SQL safety checks
Canary deployment monitoring & land-and-deploy strategies
Engineering retrospectives with commit analysis
Spike/prototyping validation framework

🐛 Debugging & Diagnostics

Node.js inspector debugger integration
Python debugpy live debugging
Session log analysis & error classification
Automated timeout, rate-limit, and auth error detection

📦 Project Management Integration

Paperclip issue tracking (create → assign → execute → ship)
GitHub issue sync with messaging channels
Automated "Done When" criteria verification
Multi-company portfolio management

📊

Financial Analysis & Trading

7 modules · SEC EDGAR · quant models

7 finance modules · 300+ research docs · 4 quant models · SEC EDGAR integrated

📋 Financial Statements Engine

Full SEC filing extraction and 3-statement modeling.

10-K, 10-Q, 8-K parsing from SEC EDGAR
3-statement Excel models (Base / Upside / Downside)
Historical financial data via yfinance API
Automated ratio analysis & trend detection

🎯 Earnings & Valuation

Earnings analysis with bullish/bearish signal scoring
DCF, comparables, and peer benchmarking
Investment pitch one-pagers with target prices
Peer comparison engine (MAG7, SEMIS, Cloud, Banks)

📈 Nexdex Trading Intelligence

Quantitative trading research and model development.

Markov chain price prediction
Black-Scholes options pricing
Kelly Criterion position sizing
Bayesian probability modeling
Top edges: weather markets (94% WR), stat arb, oracle latency arb
300+ research files across 64+ strategy documents
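
Two of the models named above have compact closed forms. The sketch below uses the textbook Black-Scholes price for a European call and the basic Kelly fraction for a binary bet. It is not Nexdex's implementation; the function and parameter names are illustrative.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float, vol: float, years: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * math.sqrt(years))
    d2 = d1 - vol * math.sqrt(years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def kelly_fraction(win_prob: float, net_odds: float) -> float:
    """Kelly stake fraction f* = p - (1 - p) / b for a bet paying b-to-1."""
    return win_prob - (1.0 - win_prob) / net_odds


# At-the-money call: spot 100, strike 100, 5% rate, 20% vol, 1 year
print(round(black_scholes_call(100, 100, 0.05, 0.20, 1.0), 4))
# Even-money bet with a 60% win probability
print(round(kelly_fraction(0.60, 1.0), 4))
```

A negative Kelly fraction means the bet has negative expected value and should be skipped.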

🔍 Sector & Market Research

Industry competitive analysis & due diligence
Target screening & acquisition pipeline
Real-time price data feeds
Python finance stack: pandas, numpy, openpyxl, xlsxwriter

🧠

Knowledge & Memory Architecture

2,700+ pages · 16,056 chunks · PostgreSQL + pgvector

2,700+ knowledge pages · 16,056 embedded chunks · 6+ Obsidian vaults · 171+ memory files

🔗 gBrain Knowledge Graph · Anvil Primary

Semantic knowledge engine — PostgreSQL 17 + pgvector on Anvil, WAL-streamed to Forge standby.

2,700+ pages indexed across 6+ vaults
16,056 embedded chunks (nomic-embed-text, 768-dim)
PostgreSQL 17 + pgvector — hybrid vector + full-text search
Jina-embeddings-v5-text-small queued for migration
WAL streaming to Forge — hot standby always in sync
Auto-re-embeds stale content on every import
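
Hybrid search needs some way to merge the vector ranking with the full-text ranking. The page does not say how gBrain combines them. Reciprocal rank fusion (RRF) is one common option, sketched here purely as an assumption, with illustrative chunk IDs.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several best-first rankings into one list.

    Each document scores sum(1 / (k + rank)) over the rankings it appears in;
    k = 60 is the constant commonly used with RRF.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["chunk-a", "chunk-b", "chunk-c"]    # e.g. nearest neighbours by embedding
fulltext_hits = ["chunk-b", "chunk-d", "chunk-a"]  # e.g. full-text search matches
print(reciprocal_rank_fusion([vector_hits, fulltext_hits]))
```

Chunks that appear in both rankings rise to the top, which is the point of combining the two signals.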

📓 Document Knowledge Base

6 Obsidian vaults organized by venture and domain:

Fulcrum AI — automation agency docs
Nexdex — trading research & models
Vibestreet — marketplace architecture
Inclination — AI shopping assistant
Infrastructure — system docs
Strategy — business strategy & planning

💾 Three-Tier Memory System

Different persistence guarantees for different needs:

Working: Current session context (60K token window)
Daily logs: Raw chronological events, append-only
Long-term: Curated, deduplicated permanent knowledge

🔄 State Persistence Layer (CPL)

Bridges session memory and permanent storage — no context lost across restarts.

Living entity ontology (people, companies, projects)
Active thread tracker (WIP tasks & decisions)
Bidirectional cross-reference index
Pre-compaction session snapshots
rsync'd to Anvil every 5 min — both machines always current

🎨

Creative & Media Generation

3 engines · image · video · music

3 media engines · Multi-provider routing · Up to 4K resolution · 20 images per analysis

🖼️ Image Generation

Multi-provider routing to OpenAI GPT-Image, Fal.ai Flux, Google, and more.

Transparent backgrounds (PNG/WebP)
1-4 images per call, aspect ratios 1:1 through 8:1
Reference images for style transfer & editing (up to 10)
Resolutions up to 4K, quality control (low/medium/high)
Provider-specific: OpenAI moderation, Fal creativity levels

🎬 Video Generation

Text-to-video, image-to-video, and video-to-video with multi-modal references.

First frame, last frame, and reference image support
Up to 9 reference images, 4 reference videos, 3 audio refs
Aspect ratios: 1:1, 16:9, 9:16, adaptive
Resolutions: 360P through 4K
Audio-conditioned generation (reference music/audio)
Provider options: seeds, watermark control, duration

🎵 Music Generation

Multi-provider audio creation including Google Lyria.

Genre, mood, tempo, instrument, purpose prompts
Sung lyrics support or instrumental-only mode
Reference images for visual mood injection
MP3/WAV output, configurable duration

👁️ Vision & Image Analysis · Anvil

Configured vision model for inspection and understanding — qwen3-vl:8b always-on on Anvil.

Up to 20 images per analysis call
Custom inspection prompts
20MB max per image
Cross-modal review capabilities

🛡️

Security & Defense Systems

4-layer defense · 2-min health checks · auto-failover

4-layer defense · 2-min health checks · Auto-failover after 3 failures · Thermal protection

🔒 Pre-Tool-Use Defense Layer

Blocks dangerous operations before they execute.

Secret detection: Anthropic keys (sk-ant-), OpenAI keys (sk-), AWS keys (AKIA), JWT tokens, GitHub tokens (ghp_), Discord tokens
Risky command blocking: rm -rf /, rm -rf ~, > /dev/sda, dd if=/dev/zero, chmod -R 777 /, curl|bash, wget|bash
Destructive SQL blocking: DROP TABLE, DROP DATABASE, DELETE FROM...WHERE 1
Returns block decision JSON — execution never starts
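
A pre-execution guard of this kind is usually a set of regular expressions checked before the tool runs. The sketch below covers a few of the listed patterns. The JSON shape (`decision`, `reason`) and the exact regexes are assumptions, not the cluster's actual hook format; the Anthropic key pattern in particular only checks the `sk-ant-` prefix plus a guessed minimum length.

```python
import json
import re

# Illustrative subset of the secret patterns listed above
SECRET_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_\-]{8,}"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

# Illustrative subset of the risky commands listed above
RISKY_COMMANDS = [
    re.compile(r"\brm\s+-rf\s+(/|~)(\s|$)"),          # rm -rf / or rm -rf ~
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b"),  # curl|bash, wget|bash
    re.compile(r"\bchmod\s+-R\s+777\s+/(\s|$)"),       # chmod -R 777 /
    re.compile(r"\bDROP\s+(TABLE|DATABASE)\b", re.IGNORECASE),
]


def pre_tool_use(command: str) -> str:
    """Return a JSON decision for a command before it is executed."""
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(command):
            return json.dumps({"decision": "block", "reason": f"secret:{name}"})
    for pattern in RISKY_COMMANDS:
        if pattern.search(command):
            return json.dumps({"decision": "block", "reason": "risky_command"})
    return json.dumps({"decision": "allow"})
```

Anchoring the `rm -rf` pattern to a bare `/` or `~` keeps ordinary cleanups such as `rm -rf /tmp/build` from being blocked.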

📊 Post-Tool-Use Observability

Full audit trail after every tool execution.

JSONL logging to tool-usage.jsonl
Latency tracking (start/end timestamps → ms)
Error classification: timeout, rate_limit, connection, permission, not_found, auth
Auto-format Python files after writes
Post-execution secret scanning
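
The audit trail can be sketched as one JSON object appended per tool call. The field names and keyword lists below are assumptions; only the file format (JSONL), the millisecond latency, and the six error classes come from the list above. The `unknown` class is an added catch-all.

```python
import json

# Keyword heuristics per error class; the keywords themselves are illustrative
ERROR_KEYWORDS = [
    ("timeout", ("timed out", "timeout")),
    ("rate_limit", ("rate limit", "429")),
    ("connection", ("connection refused", "connection reset")),
    ("permission", ("permission denied",)),
    ("not_found", ("not found", "no such file")),
    ("auth", ("unauthorized", "401", "invalid api key")),
]


def classify_error(message: str) -> str:
    """Map an error message to one of the error classes, or 'unknown'."""
    lowered = message.lower()
    for label, needles in ERROR_KEYWORDS:
        if any(needle in lowered for needle in needles):
            return label
    return "unknown"


def log_tool_use(path: str, tool: str, started: float, ended: float, error: str = None) -> dict:
    """Append one JSONL record with latency in milliseconds and an error class."""
    record = {
        "tool": tool,
        "started": started,
        "ended": ended,
        "latency_ms": round((ended - started) * 1000),
        "error_class": classify_error(error) if error else None,
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Appending one line per call keeps the log safe to tail and easy to load line by line for later analysis.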

🔍 Pre-Commit Secret Scanner

18-pattern scanner prevents secrets from entering version control.

API key patterns for all major providers
Private key detection (RSA, EC, PGP)
Database connection string detection
Generic high-entropy string detection
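
Generic high-entropy detection typically relies on Shannon entropy: random keys use many distinct characters fairly evenly, while ordinary identifiers do not. A minimal sketch, assuming a 20-character minimum token length and a threshold of 4.0 bits per character; both values are illustrative and not taken from the scanner itself.

```python
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Candidate tokens: long runs of characters typical for keys and base64
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def find_high_entropy_strings(line: str, threshold: float = 4.0) -> list:
    """Return candidate tokens in a line whose entropy meets the threshold."""
    return [tok for tok in TOKEN_RE.findall(line) if shannon_entropy(tok) >= threshold]
```

Entropy checks produce false positives on hashes and minified assets, so they usually sit behind the provider-specific patterns rather than replace them.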

🌡️ Cross-Machine Health & Auto-Failover

Continuous monitoring with automatic failover — not just restart.

Health check runs every 2 minutes
Forge monitors Anvil — 3 consecutive failures → auto-failover
GBRAIN_DATABASE_URL switches to localhost standby on Forge
CPU thermal monitoring & throttling defense on both machines
Alerts to Discord + WhatsApp on failover events
If Anvil dies → Forge keeps full operations on standby DB
If Forge dies → Anvil has full workspace + compute, self-sufficient
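
The "3 consecutive failures" rule is a small state machine: a healthy check resets the counter, and the third failure in a row flips the database URL to the standby. In the sketch below the class name and URLs are hypothetical, and staying on the standby until an operator intervenes is an assumption.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # consecutive failed checks before failover


@dataclass
class FailoverMonitor:
    primary_url: str
    standby_url: str
    consecutive_failures: int = 0
    failed_over: bool = False

    def record_check(self, healthy: bool) -> str:
        """Record one health-check result and return the database URL to use."""
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURE_THRESHOLD:
                self.failed_over = True  # no automatic failback in this sketch
        return self.standby_url if self.failed_over else self.primary_url


# Placeholder URLs, not the cluster's real connection strings
monitor = FailoverMonitor(
    primary_url="postgresql://anvil.example/gbrain",
    standby_url="postgresql://localhost/gbrain",
)
```

With a 2-minute check interval, three consecutive failures mean failover triggers roughly four to six minutes after the primary stops responding.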

📡

Communication & Human Interface

Multi-channel · voice · co-worker access

3+ comms channels · Voice wake word · Scoped co-worker access · Platform-aware formatting

💬 Discord Multi-Channel Hub

Venture-scoped channels for focused operational context.

Dedicated channels: Infrastructure, Strategy, Brand, Trading, Agency, Marketplace
Real-time agent monitoring & session management
Project management bridge — issue status syncs to chat
Thread-bound sub-agent spawning for parallel work
Rich components: buttons, selects, forms, polls, reactions

📱 WhatsApp Direct Line

Time-sensitive alerts and briefings to leadership.

Heartbeat alerts for urgent items
Quiet hours enforcement (23:00–08:00)
Platform-aware formatting (no markdown tables)
Daily briefing delivery + failover event alerts
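
A quiet-hours window that crosses midnight (23:00–08:00) cannot be checked with a single `start <= now < end` comparison. A minimal sketch of the wrap-around test, assuming the start is inclusive and the end exclusive:

```python
from datetime import time

QUIET_START = time(23, 0)
QUIET_END = time(8, 0)


def in_quiet_hours(now: time, start: time = QUIET_START, end: time = QUIET_END) -> bool:
    """True if `now` falls inside the quiet window, including windows that wrap midnight."""
    if start <= end:
        # Same-day window, e.g. 13:00 to 15:00
        return start <= now < end
    # Overnight window: late evening OR early morning
    return now >= start or now < end


print(in_quiet_hours(time(23, 30)))  # late evening
print(in_quiet_hours(time(12, 0)))   # midday
```

Non-urgent messages that arrive inside the window can be queued and released at 08:00.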

🌐 Web Control Panel

Browser-based administration and monitoring.

Session listing & history inspection
Agent configuration & model overrides
Tool testing & approval management
Gateway status & health monitoring for both machines
Scheduled job management

🎙️ Voice Interface

Wake-word activated voice assistant.

Wake word: "Apex" — hands-free interaction
Text-to-speech output (voice selection configurable)
Seamless integration with all agent capabilities

👥 Co-Worker Access System

Scoped permissions for team collaboration.

Brenda — sandboxed workspace, image generation access, strict permission model
Alizain — full technical CRUD, deploy/config changes require Chairman approval
Role-based access control with workspace isolation

⚙️

Automation & Operations

8 cron jobs · 27 services · browser automation

27 managed services · 8 scheduled jobs · Stealth browser · 3-gen backups

⏰ Scheduled Task Engine

Cron-driven automation for self-maintaining operations.

System backup: weekly full backup to external drive
Git backup: daily version-controlled state backup
Health check: every 2 minutes — cross-machine verification
Knowledge distiller: weekly structural analysis
Knowledge graph sync: every 6 hours — re-imports & re-embeds
Workspace sync: every 5 minutes — rsync Forge ↔ Anvil
WAL cleanup: periodic maintenance on Anvil
Context snapshots: as needed before compaction

🦊 Stealth Browser Automation

Anti-detection web automation with full session control.

Anti-fingerprint patches — interacts as a real user
Cookie injection for authenticated sessions
Screenshot capture, DOM interaction, form filling
Use cases: job scraping, social media posting, competitive research
Bypasses API limits by operating through the browser

📎 Background Agent

Dedicated agent for project management and issue processing.

Issue status synchronization
Priority-based sorting & automated triage
Runs on local model — zero incremental cost
Offloads routine management from primary agent

💾 Backup & Recovery System

3-generation rolling backup with multiple storage tiers.

Weekly full system backup to external drive
Daily git version-controlled state backup
3-generation rolling rotation for disaster recovery
Pre-compaction session snapshots
Continuous rsync cross-machine workspace mirror
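
Three-generation rotation means keeping the newest three full backups and deleting the rest. A minimal sketch, assuming backups are files named `backup-YYYY-MM-DD.tar.gz` so that sorting by name is also sorting by date; the naming scheme and function are hypothetical.

```python
from pathlib import Path
from typing import List


def rotate_backups(backup_dir: Path, keep: int = 3, pattern: str = "backup-*.tar.gz") -> List[Path]:
    """Delete all but the newest `keep` backups and return the deleted paths."""
    # ISO dates in the file name sort chronologically, so newest comes first
    backups = sorted(backup_dir.glob(pattern), key=lambda p: p.name, reverse=True)
    stale = backups[keep:]
    for path in stale:
        path.unlink()
    return stale
```

Running the rotation only after a new backup has been written and verified avoids ever dropping below three good generations.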

🔩

Infrastructure Core

2× Apple Silicon · Thunderbolt 4 · WAL replication

2× Apple Silicon · Thunderbolt 4 direct link · WAL replication · ~22W combined idle

🍎 Hardware Platform

Two Apple Silicon Mac Minis — energy-efficient, silent, neural engine on-chip.

Forge: Apple M4 (16GB) — gateway, routing, embeddings. ~15W idle.
Anvil: Apple M4 Pro (48GB) — LLM compute, brain, builds. ~7W idle.
Thunderbolt 4 direct link: 10.0.0.1/30 ↔ 10.0.0.2/30, ~1–3ms latency
~22W combined idle — both machines run 24/7 cost-effectively
No moving parts beyond fans, so each machine has few mechanical failure points
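
The /30 prefix on the Thunderbolt link is the classic point-to-point choice: it leaves exactly two usable host addresses, one per machine. Python's standard `ipaddress` module confirms this:

```python
import ipaddress

# The network that contains both link addresses from the hardware list
link = ipaddress.ip_network("10.0.0.0/30")

usable_hosts = [str(host) for host in link.hosts()]
print(usable_hosts)            # the two usable addresses on the link
print(link.num_addresses)      # total addresses, including network and broadcast

# Both endpoints sit inside the same /30
assert ipaddress.ip_address("10.0.0.1") in link
assert ipaddress.ip_address("10.0.0.2") in link
```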

🏠 Local-First Architecture

Core operations can run with no external dependencies.

Gateway binds to loopback only (127.0.0.1) on Forge
All AI inference can run on-device (Ollama on both machines)
Knowledge graph runs locally (PostgreSQL@17 + pgvector on Anvil)
Hot standby on Forge — database survives Anvil failure
External APIs only for premium cloud models & web research

🔌 Service Stack — Forge (22 services)

22 LaunchDaemons on Forge — all auto-restart capable.

OpenClaw Gateway — agent orchestration, all channel routing
Ollama — local LLM inference (nomic-embed-text, jina-v5, qwen3.5)
PostgreSQL@17 (standby) — WAL receiver from Anvil
pg-replication-tunnel — SSH tunnel for WAL (macOS bridge0 workaround)
workspace-sync — rsync to Anvil every 5 min
health-check-anvil — heartbeat every 2 min, auto-failover on 3× failure
thunderbolt-ip, thunderbolt-monitor — direct link management
Redis, Apex Voice Bot, Finance Python env, git hooks, Paperclip

🔌 Service Stack — Anvil (5 services)

5 LaunchDaemons on Anvil — compute-focused, self-contained.

Ollama — heavy LLM inference (qwen3.6:35B, qwen3-coder:30B, vision)
PostgreSQL@17 (primary) — gBrain database, WAL sender to Forge
thunderbolt-ip — direct link IP management
wal-cleanup — periodic WAL maintenance
OpenClaw client — receives delegated tasks from Forge gateway

🏗️ emergE Compute Framework

25-layer architecture specification for edge AI compute nodes.

Multi-tier hardware support: Hub Node → Standard → Lite → Edge → Micro
3 open protocols: NATS (event bus), MCP (context), LDPM (device management)
5 intelligence pillars: Core Runtime, Deployment, Intelligence, Security, Operations
Designed for multi-node federation — Apex Cluster is a live reference deployment

🚀

Marketing & Content Engine

5 skills · 9-stage pipeline · $0.05–$1.05/campaign · LIVE 2026-08-01

5 skills · 9-stage pipeline · Multi-model routing · $0.05–$1.05 per campaign · Live 2026-08-01

🎤 Voice-to-Brief · Skill 1

Capture skill. Converts raw input into a structured brief. Preserves energy, quotables, and intent.

Accepts voice notes, raw text, or URLs as input
Extracts quotables and detects energy level (conviction / hot take / teaching mode)
Produces structured brief with context, audience, and content type
Routes automatically: single post vs. full campaign
Preserves the raw voice so the final content still sounds human

🗺️ Campaign Pipeline · Skill 2 · Orchestrator

9-stage orchestration graph that chains all other marketing skills into a coherent campaign.

Stage 1: Capture — Voice-to-Brief input
Stage 2: Ideation — Angle generation, hook variations
Stage 3: Research — Market context, trending signals
Stage 4: Synthesis — Combine research into content strategy
Stage 5: Sign-off — Chairman approval gate (optional for quick posts)
Stage 6: Build — Content generation via Voice-Calibrated Content Generator + Strategic Repurposer
Stage 7: Routing — Content Model Routing assigns model per piece
Stage 8: Ship — ContentLoop distributes to all platforms
Stage 9: Feedback — Performance Feedback Loop closes the loop into gBrain

🎯 Content Model Routing · Skill 3

Explicit model assignment per content type. No one model writes everything.

Claude Opus 5 → Strategy docs, landing pages, long-form thought leadership
Claude Sonnet 4.6 → Blog posts, LinkedIn articles, email sequences
GLM-5.1 → X/Twitter threads, video scripts, short-form social
Claude Haiku 4.5 → Social cuts, repurposed snippets, quick posts
Cost per campaign: $0.05 (quick X thread) to $1.05 (full multi-platform major campaign)

🔍 Marketing Eval Gate · Skill 4

7-check quality gate. A different model evaluates than the one that drafted. Score below 56 = blocked.

Check 1: Voice calibration — Does it sound like you?
Check 2: Hook strength — Would the first line stop the scroll?
Check 3: ICP alignment — Does it speak to the right audience?
Check 4: Content quality — Is it substantive, not filler?
Check 5: Platform fit — Right format and length for the channel?
Check 6: Brand safety — Nothing that creates risk?
Check 7: Differentiation — Does it say something the market isn't already saying?
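
The gate logic can be sketched as a sum over the seven checks compared against the blocking threshold. The page gives only the threshold (56) and the checks. Scoring each check from 0 to 10 and summing to a maximum of 70 is an assumption made here for illustration, as are the key names.

```python
CHECKS = (
    "voice_calibration",
    "hook_strength",
    "icp_alignment",
    "content_quality",
    "platform_fit",
    "brand_safety",
    "differentiation",
)
PASS_THRESHOLD = 56  # "score below 56 = blocked"


def run_eval_gate(scores: dict, drafter: str, evaluator: str) -> dict:
    """Sum the seven check scores (assumed 0-10 each) and decide whether to block."""
    if drafter == evaluator:
        raise ValueError("evaluator must differ from the drafting model")
    missing = [check for check in CHECKS if check not in scores]
    if missing:
        raise ValueError(f"missing checks: {missing}")
    total = sum(scores[check] for check in CHECKS)
    return {"total": total, "blocked": total < PASS_THRESHOLD}
```

Under this assumed scale a piece needs an average of 8 out of 10 per check to pass.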

📊 Performance Feedback Loop · Skill 5

Closes the loop. Every campaign gets measured and fed back into the next one.

Pulls metrics per platform: X/Twitter, LinkedIn, Instagram, blog
Scores each piece against rolling baseline
Extracts insights with hypotheses (not just numbers)
Stores patterns in gBrain for future campaign context
Rolling pattern analysis feeds next Campaign Pipeline invocation

🔗 Pipeline Flow

End-to-end campaign execution:

Voice-to-Brief → structured brief
Campaign Pipeline orchestrates → research, ideation, synthesis
Content Model Routing assigns model per piece type
Marketing Eval Gate scores output → blocks if score < 56
ContentLoop ships to all platforms
Performance Feedback Loop measures → stores in gBrain
gBrain patterns feed next Campaign Pipeline — system improves with every campaign

🔬

Research & Intelligence

6 skills · multi-source · LLM council · signal detection

6 research skills · Multi-source collection · LLM Council consensus engine · Competitive intelligence

📊 Data Research Engine

Multi-source data collection, synthesis, and structured output for any topic.

Quantitative + qualitative research combined
Structured output to Obsidian vaults or gBrain directly
Source triangulation — multiple sources before conclusions
Domain coverage: market, technology, competitor, regulatory

📡 Signal Detector

Scans incoming data streams for actionable patterns before they become obvious.

Market shift detection — early trend identification
Competitor move monitoring — product launches, pricing changes, messaging shifts
Trending topic surfacing — relevant to active ventures
Anomaly detection — flags outliers for Chairman review
Stores detected signals in gBrain for pattern correlation

🌐 Browse & Learn

Autonomous web research agent. Given a topic, it browses, extracts, structures, and stores knowledge.

Multi-page research sessions with coherent synthesis
Extracts key facts, quotes, and data points
Builds structured research portfolios automatically
Feeds directly into gBrain for future retrieval
Can be chained into larger research pipelines

⚔️ Competitor Battle Card Generator

Deep competitor analysis producing structured battle cards for sales and positioning.

Positioning map: where they play vs. where we play
Strengths & weaknesses (verified, not assumed)
Pricing intelligence and packaging analysis
Counter-arguments ready for sales calls
Updated automatically when Signal Detector flags competitor moves

📈 Trading Signal Validator

Multi-model ensemble validation for trading signals. No signal acts without cross-model agreement.

Fans signal to multiple independent models for validation
Cross-checks against existing Nexdex research in gBrain
Requires consensus before escalating to Chairman
Local LLMs explicitly excluded from trading validation
Outputs confidence score + rationale, not just yes/no

🤝 LLM Council · Multi-model consensus

High-stakes decisions deserve more than one opinion. LLM Council fans each query out to 3–5 models, collects their independent positions, then synthesizes a final answer.

Fan-out: query sent to 3–5 LLMs simultaneously with no shared context
Independent opinions collected — models cannot see each other's answers
Blind peer review: each model ranks the others' answers anonymously
Synthesis: final consolidated answer with minority views surfaced
PII shield: mandatory before any council query leaves the machine
Use cases: strategy decisions, dispute resolution, high-stakes analysis, contentious architecture choices
Never used for routine tasks — reserved for decisions that matter
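
The blind peer-review step needs a rule for turning per-model rankings into one order. The page does not name one. A Borda count is a simple option and is used below as an assumption, with anonymous labels standing in for the models.

```python
from collections import defaultdict
from typing import Dict, List


def borda_aggregate(rankings: Dict[str, List[str]]) -> List[str]:
    """Combine blind peer rankings into one order, best first.

    `rankings` maps each reviewer's label to its best-first ranking of the
    OTHER answers. An answer ranked first among n candidates earns n - 1
    points, and the last earns 0.
    """
    points: Dict[str, int] = defaultdict(int)
    for reviewer, ranked in rankings.items():
        if reviewer in ranked:
            raise ValueError("a model may not rank its own answer")
        for position, answer in enumerate(ranked):
            points[answer] += len(ranked) - 1 - position
    # Highest points first; ties broken by label for a stable order
    return sorted(points, key=lambda answer: (-points[answer], answer))


peer_rankings = {
    "A": ["B", "C"],
    "B": ["C", "A"],
    "C": ["B", "A"],
}
print(borda_aggregate(peer_rankings))
```

The lowest-ranked answers are the natural source for the minority views surfaced in the synthesis step.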

📄 Scientific Research Pool · 88M+ papers

Direct access to original academic papers and textbooks — no more relying on secondhand blog summaries.

Sci-Hub: 88M+ research papers — original Black-Scholes, Markov Chain, Bayesian inference, ML architecture papers
Anna's Archive: World's largest open book/textbook archive
Nexdex: Pull original quant model papers instead of summaries
Fulcrum AI: Read actual ML/agent architecture papers for build decisions
Policy work: ADB, World Bank, digital economy original publications
RWA tokenization: Academic papers on blockchain, tokenomics, digital assets
RAG grounding insight: the same corpus-grounded pattern as gBrain, demonstrated at 95M-document scale by Sci-Bot. Grounding AI answers in real sources reduces hallucination by tying each claim to a retrievable document.
Content extraction: Cobalt.tools + TinyWow for pulling source material

Two Machines. One Autonomous Operation.

Live Cluster Profile

Forge + Anvil: The Dual-Machine Stack

Forge

Anvil