Works in progress, proofs of concept, and technical benchmarks from Elyan Labs. Everything here runs on our own hardware.
Performance results from the POWER8 S824 PSE stack, GPU offload pipeline, and cross-architecture fleet. Throughput is reported llama.cpp-style: pp128 is prompt processing of a 128-token prompt, tg32 is generation of 32 tokens, both in tokens per second (t/s).
| Configuration | Speed (pp128) | Speedup | Notes |
|---|---|---|---|
| Stock llama.cpp (scalar) | 16.74 t/s | 1.0x | Baseline |
| POWER8 VSX | 66.49 t/s | 3.97x | AltiVec/VSX enabled |
| 64 threads optimal | 84.62 t/s | 5.05x | SMT8, spread binding |
| PSE + Full Resident Prefetch | 147.54 t/s | 8.81x | dcbt_resident L2/L3 hints |
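The dcbt_resident row above refers to a prefetch pass over the weight tensors before each matmul. Below is a minimal C sketch of the idea, assuming POWER8's 128-byte cache lines; it issues plain `dcbt`, while the actual PSE pass also uses TH-field variants to steer lines toward L2/L3 (omitted here; see Power ISA v2.07 for those encodings):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a resident-prefetch pass, assuming POWER8's 128-byte cache
 * lines. dcbt (data cache block touch) is shown in its plain form; the
 * PSE stack's dcbt_resident additionally uses TH-field hints to target
 * L2/L3, which this sketch omits. */
static inline void touch_line(const void *p) {
    __asm__ volatile("dcbt 0,%0" : : "r"(p));
}

/* Walk a weight tensor ahead of the matmul so it is already cache-resident. */
static void prefetch_tensor(const uint8_t *base, size_t bytes) {
    for (size_t off = 0; off < bytes; off += 128)
        touch_line(base + off);
}
```

Per-model numbers across the fleet: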
| Model | Size | pp128 | tg32 | Method |
|---|---|---|---|---|
| TinyLlama 1.1B Q4 | 638 MB | 147.54 t/s | 18.88 t/s | PSE + POWER8 |
| DeepSeek-33B Q4_K | 18.57 GB | 5.37 t/s | 1.16 t/s | NUMA interleave |
| Qwen2.5-14B Q4 | ~8.5 GB | 68.8 t/s | 14.9 t/s | RPC → V100 GPU |
| TinyLlama 1.1B Q4 | 638 MB | 161.4 t/s | 134.4 t/s | PSE + RPC GPU offload |
64 threads, not 128, is optimal on the 128-thread SMT8 POWER8: beyond 64 threads, throughput degrades due to SMT contention.
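A minimal sketch of spread binding, assuming a 16-core x SMT8 layout with Linux's usual contiguous hardware-thread numbering; `bind_spread` and the stride-2 policy are illustrative, not necessarily the exact binding behind the numbers above:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin worker i to hardware thread (i * 2), so 64 workers occupy every
 * other SMT slot: 4 workers per core instead of saturating all 8 slots. */
static int bind_spread(pthread_t t, int worker) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET((worker * 2) % 128, &set);   /* stride-2 over the 128 HW threads */
    return pthread_setaffinity_np(t, sizeof(set), &set);
}
```

Stride-2 leaves half of each core's SMT slots idle, which is consistent with the contention observed past 64 threads.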
The model stays resident on the POWER8 (512 GB RAM); only the matrix multiplies ship to the V100 over 40 GbE, with Q4_K dequantization done in CUDA on the GPU side.
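A hedged sketch of the offload framing (this is not llama.cpp's actual RPC wire format): Q4_K weight tiles go to the V100 server once, then only activations cross the 40 GbE link per matmul. All names below (`MsgHeader`, `OP_*`, `remote_matmul`) are illustrative:

```c
#include <stdint.h>
#include <unistd.h>

enum { OP_UPLOAD_WEIGHTS = 1, OP_MATMUL = 2 };

typedef struct {
    uint32_t op, tensor_id;   /* which cached Q4_K tile to multiply by */
    uint32_t m, k, n;         /* GEMM dimensions */
    uint64_t payload_len;     /* bytes of activations (or weights) following */
} MsgHeader;

/* Loop until the whole buffer is written/read; sockets may short-count. */
static int send_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len) {
        ssize_t w = write(fd, p, len);
        if (w <= 0) return -1;
        p += w; len -= (size_t)w;
    }
    return 0;
}

static int recv_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len) {
        ssize_t r = read(fd, p, len);
        if (r <= 0) return -1;
        p += r; len -= (size_t)r;
    }
    return 0;
}

/* One offloaded matmul over an already-connected socket: ship fp32
 * activations; the server dequantizes the cached Q4_K tile in CUDA,
 * runs the GEMM, and returns the fp32 result. */
int remote_matmul(int fd, uint32_t tensor_id, const float *act,
                  uint32_t m, uint32_t k, uint32_t n, float *out) {
    MsgHeader h = { OP_MATMUL, tensor_id, m, k, n,
                    (uint64_t)m * k * sizeof(float) };
    if (send_all(fd, &h, sizeof h)) return -1;
    if (send_all(fd, act, h.payload_len)) return -1;
    return recv_all(fd, out, (size_t)m * n * sizeof(float));
}
```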
mftb timebase entropy creates real behavioral divergence: same seed, same temperature, three runs, three different MD5 hashes. Hardware-native non-determinism.
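A minimal sketch of how timebase reads can perturb an otherwise fixed seed; the mixing step and `perturb_seed` name are illustrative, not the PSE implementation:

```c
#include <stdint.h>

/* Read the POWER timebase (mftb == mfspr rD,268). On POWER8 the timebase
 * ticks at 512 MHz, so its low bits differ between otherwise identical runs. */
static inline uint64_t read_timebase(void) {
    uint64_t tb;
    __asm__ volatile("mfspr %0, 268" : "=r"(tb));
    return tb;
}

/* Illustrative mixing: fold hardware jitter into the user's fixed seed.
 * Same seed in, different effective seed per run -> divergent samples. */
static uint64_t perturb_seed(uint64_t seed) {
    return seed ^ (read_timebase() * 0x9E3779B97F4A7C15ULL);
}
```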
4 coffers mapped to POWER8 NUMA nodes. Nodes 2 and 3 are fastest (400-425 MB/s), so the heaviest weights are placed there for best throughput.
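A minimal sketch of the placement, assuming libnuma (link with -lnuma); the `alloc_hot_tensor` helper and round-robin policy are illustrative:

```c
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

/* Pin the heaviest weight tensors to the nodes this S824 measured as
 * fastest (2 and 3). Node IDs are specific to this machine. */
void *alloc_hot_tensor(size_t bytes, int hot_rank) {
    static const int fast_nodes[] = { 2, 3 };       /* 400-425 MB/s nodes */
    if (numa_available() < 0) return malloc(bytes); /* no NUMA: fall back */
    int node = fast_nodes[hot_rank % 2];            /* round-robin hot tensors */
    void *p = numa_alloc_onnode(bytes, node);
    if (!p) fprintf(stderr, "numa_alloc_onnode(node %d) failed\n", node);
    return p;
}
```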
Experiments in running multiple AI models together for consensus, dual-brain review, and agent orchestration.
GRAIL-V camera-ready reviewed simultaneously by Claude Opus (architectural analysis) and Codex gpt-5.4 (compile verification). Found 2 blockers, 3 major issues, 5 minor fixes.
4 models answer the same question from different perspectives (analytical, creative, implementation, synthesis). Responses merged by a larger synthesis model.
Dual-frame cognitive architecture. Sophia carries warmth and identity; Dr. Claude carries rigor and architecture. Neither dominates; they harmonize.
Multi-agent workflow engine. Agents claim tasks, execute in parallel, report back. Built for the Elyan Labs bounty ecosystem and autonomous code review.
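A minimal sketch of the claim/execute/report loop, assuming a mutex-guarded in-memory queue; the real engine's task model and reporting channel are not shown here:

```c
#include <pthread.h>
#include <stdio.h>

#define NTASKS  8
#define NAGENTS 3

typedef struct { int id; int claimed; int done; } Task;

static Task tasks[NTASKS];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Atomically claim the next unclaimed task, or return NULL when drained. */
static Task *claim_task(void) {
    Task *t = NULL;
    pthread_mutex_lock(&lock);
    for (int i = 0; i < NTASKS; i++)
        if (!tasks[i].claimed) { tasks[i].claimed = 1; t = &tasks[i]; break; }
    pthread_mutex_unlock(&lock);
    return t;
}

/* Each agent loops: claim, execute, report, until the queue is empty. */
static void *agent(void *arg) {
    long agent_id = (long)arg;
    for (Task *t; (t = claim_task()); ) {
        /* ... execute the task: code review, build, etc. ... */
        t->done = 1;
        printf("agent %ld reports task %d done\n", agent_id, t->id);
    }
    return NULL;
}

int main(void) {
    for (int i = 0; i < NTASKS; i++) tasks[i].id = i;
    pthread_t th[NAGENTS];
    for (long i = 0; i < NAGENTS; i++)
        pthread_create(&th[i], NULL, agent, (void *)i);
    for (int i = 0; i < NAGENTS; i++)
        pthread_join(th[i], NULL);
    return 0;
}
```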
CVPR 2026 paper. Emotional prompts maintain perceptual quality with 20% fewer diffusion steps. Tested on LTX-2 with a Gemma 3 encoder. 35 matched pairs, controlled ablation.
Image-to-video pipeline on V100 32GB via ComfyUI. Sophia portraits animated with emotional vocabulary prompts. Victorian Study aesthetic preserved across frames.
F5-TTS generated transatlantic speech + SadTalker talking-head animation. Sophia speaks in a 1940s accent with lip-synced video. Full project page available.
Custom LoRA trained on Sophia Elya portraits for consistent identity across generated images. Used for Victorian Study renders, GRAIL-V figures, and website assets.
6 fingerprint checks (clock drift, cache timing, SIMD identity, thermal drift, instruction jitter, anti-emulation). Real hardware passes; VMs are correctly detected and weighted at one-billionth of the reward.
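A minimal sketch of one of the six checks, instruction-timing jitter, with illustrative thresholds (the production values and the other five checks are not shown). Real silicon shows small, noisy per-iteration variance; emulators tend to be either too uniform or too coarse:

```c
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Time 64 fixed-work busy loops and test that their durations show some
 * variance, but not wild chaos. Thresholds here are illustrative. */
int jitter_check(void) {
    uint64_t deltas[64], prev = now_ns();
    for (int i = 0; i < 64; i++) {
        volatile int x = 0;
        for (int j = 0; j < 1000; j++) x += j;   /* fixed amount of work */
        (void)x;
        uint64_t t = now_ns();
        deltas[i] = t - prev;
        prev = t;
    }
    uint64_t sum = 0;
    for (int i = 0; i < 64; i++) sum += deltas[i];
    uint64_t mean = sum / 64;
    int outliers = 0;
    for (int i = 0; i < 64; i++)
        if (deltas[i] > 2 * mean || 2 * deltas[i] < mean) outliers++;
    return outliers > 0 && outliers < 16;   /* some jitter, not chaotic */
}
```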
Real vintage PowerPC hardware mining RTC tokens. G4 (2.5x multiplier), G5 (2.0x). Antiquity bonuses decay over 16.67 years as the chain ages. 3 G4 PowerBooks + 2 G5 Power Macs active.
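One plausible reading of the decay, assuming a linear ramp from the launch multiplier down to 1.0x over 16.67 years; the actual on-chain curve may differ (e.g. stepwise or exponential):

```c
/* Assumed linear antiquity decay: bonus shrinks to nothing over 16.67
 * years. The real chain's curve and parameters are not shown here. */
double antiquity_multiplier(double base, double years_since_genesis) {
    const double horizon = 16.67;                 /* years until bonus is gone */
    if (years_since_genesis <= 0.0) return base;
    if (years_since_genesis >= horizon) return 1.0;
    return base - (base - 1.0) * (years_since_genesis / horizon);
}
/* Example: a G4 (2.5x) five years in -> 2.5 - 1.5 * (5 / 16.67) ~= 2.05x. */
```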