Phase 1 used synthetic correlated latents — vectors designed to have structure. The honest question was whether quantum advantage survives on real pretrained representations, which are richer and harder to predict. Phase 2 answers that with frozen DistilBERT embeddings from text8.
The result: the quantum advantage grew, from +63.9% in Phase 1 to +72.9% at n=100.
## Architecture
The full pipeline is: text8 chunk → frozen DistilBERT encoder (768-dim embedding) → trainable projection down to the bottleneck → quantum circuit → trainable projection back up to 768 dims.
The quantum circuit and two projection layers are the only trainable components — ~60K parameters regardless of the base model size.
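A minimal NumPy sketch of the pipeline shape. The quantum circuit is stood in for by a toy 8-qubit statevector simulation with one RY rotation per qubit; the ansatz, layer sizes, and all names here are illustrative assumptions, not the actual implementation (in particular, these projection shapes do not match the ~60K parameter budget):

```python
import numpy as np

rng = np.random.default_rng(0)
N_QUBITS = 8
DIM = 2 ** N_QUBITS          # 256-dim bottleneck, as in the sanity check

def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def circuit(state, thetas):
    """Toy ansatz: one RY per qubit, combined via Kronecker product."""
    U = ry(thetas[0])
    for t in thetas[1:]:
        U = np.kron(U, ry(t))
    return U @ state

# Trainable parts only: two projections plus the circuit angles.
W_down = rng.normal(scale=0.02, size=(DIM, 768))   # 768 -> 256
W_up = rng.normal(scale=0.02, size=(768, DIM))     # 256 -> 768
thetas = rng.normal(size=N_QUBITS)

def forward(embedding):
    z = W_down @ embedding
    z = z / np.linalg.norm(z)        # amplitude encoding: unit-norm state
    z = circuit(z, thetas)
    return W_up @ z

x = rng.normal(size=768)   # stands in for one frozen DistilBERT embedding
out = forward(x)
print(out.shape)           # (768,)
```

The frozen encoder never appears in the trainable set; only `W_down`, `W_up`, and `thetas` would receive gradients.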
The encoder runs once per dataset and caches results — 60,000 text8 chunks encoded in ~19 minutes, stored as a 768-dim tensor. All subsequent training reads from disk.
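The encode-once-then-cache pattern can be sketched as follows (the cache path and helper names are hypothetical, and the slow DistilBERT pass is replaced by random vectors so the sketch is self-contained):

```python
import os
import numpy as np

CACHE = "embeddings_text8.npy"   # hypothetical cache path

def encode_corpus(chunks):
    """Stand-in for the one-time DistilBERT pass (here: random vectors)."""
    rng = np.random.default_rng(42)
    return rng.normal(size=(len(chunks), 768)).astype(np.float32)

def load_embeddings(chunks):
    if os.path.exists(CACHE):
        return np.load(CACHE)       # every later run: a disk read, no encoder
    emb = encode_corpus(chunks)     # slow path, runs once per dataset
    np.save(CACHE, emb)
    return emb

chunks = [f"chunk-{i}" for i in range(100)]   # toy stand-in for 60,000 chunks
emb = load_embeddings(chunks)
print(emb.shape)                              # (100, 768)
```

All subsequent training epochs hit only the cached tensor, which is what makes the many-epoch sweeps cheap.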
## Sanity Check Result
n=100, 50 epochs, 8 qubits (256-dim bottleneck):
| | Classical | Quantum | Δ |
|--|:---------:|:-------:|:-:|
| val loss | 0.00154 | 0.00042 | +72.9% ▲ |
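Assuming Δ is the relative reduction in validation loss (my reading of the table; the document does not define it explicitly), it falls out of the two losses directly:

```python
classical, quantum = 0.00154, 0.00042   # final val losses from the table

delta = (classical - quantum) / classical * 100
print(f"+{delta:.1f}%")   # ~+72.7% from these rounded losses; the reported
                          # +72.9% presumably comes from the unrounded values
```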
Figure 1 — Validation loss over 50 epochs (n=100, 8 qubits, real DistilBERT embeddings).
Figure 2 — Quantum advantage (%) across phases. The advantage grew when moving from synthetic latents to real text representations.
## What This Tells Us
Three things stand out from the Phase 2 sanity check.
The advantage grew. +72.9% on real DistilBERT embeddings vs +63.9% on synthetic latents. The original concern was that the synthetic latents were too simple — purpose-built to have correlations that quantum could exploit. Real transformer representations are richer and more complex. That the quantum advantage increased anyway is the opposite of what a skeptic would expect.
Classical overfits harder on real representations. Classical epoch-1 val loss is 0.02086, final val loss is 0.00154 — a 13.6× drop that still leaves a 7× gap between train (0.00022) and val (0.00154). Real DistilBERT embeddings have more structure to memorize. Quantum doesn't have this problem: it ends at 0.00042 with a train/val gap of only 1.6×.
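The gap ratios can be reproduced from the reported losses (the ratios computed here may differ slightly from the text where the text used unrounded values):

```python
# Reported losses from the n=100 sanity check.
clf_epoch1_val, clf_val, clf_train = 0.02086, 0.00154, 0.00022
q_val = 0.00042

drop = clf_epoch1_val / clf_val   # classical val-loss drop over training
gap = clf_val / clf_train         # classical train/val generalization gap

print(f"classical drop: {drop:.1f}x, classical train/val gap: {gap:.1f}x")
```

The quantum model's train loss is not reported directly, only its 1.6× train/val gap, so it is omitted here.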
Quantum starts below classical's final value. By epoch 10, quantum val loss is 0.00063 — already better than classical ever gets (0.00154). The circuit structure provides a prior that's immediately well-calibrated to real language representations.
## What's Next
This is a sanity check at n=100, 50 epochs. The full Phase 2 experiment runs the same n_train sweep as Phase 1 — [100, 500, 1000, 5000] — to find where the crossover lands on real text.
Phase 1 crossover was at ~n=200 on synthetic latents. The hypothesis is that richer real representations push the crossover right — potentially to n=1K-5K, which is squarely in the range of real low-resource NLP tasks. That's what gets run next.
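One way the crossover point could be read off the sweep, sketched with placeholder numbers (these are not results; only the n=100 pair comes from the table above):

```python
# Hypothetical sweep results: n_train -> (classical_val_loss, quantum_val_loss).
results = {
    100:  (0.00154, 0.00042),   # real, from the sanity check
    500:  (0.00090, 0.00055),   # placeholder
    1000: (0.00060, 0.00058),   # placeholder
    5000: (0.00030, 0.00050),   # placeholder
}

def crossover(results):
    """Smallest n_train where classical matches or beats quantum, else None."""
    for n in sorted(results):
        c, q = results[n]
        if c <= q:
            return n
    return None

print(crossover(results))   # -> 5000 for these placeholder numbers
```

If the hypothesis holds, `crossover` on the real sweep would return something in the 1K–5K range rather than the ~200 seen on synthetic latents.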