DARPA's CyPhER Forge Wants to Cut Defense Test Points by 10x with a Real-Time Digital Twin and an AI Test Agent. Oral Proposals Are Due June 15.
May 23, 2026 · 8 min read
Arthur Griffin
The bottleneck that DARPA's Tactical Technology Office is trying to break with the CyPhER Forge program — formally Cyber Physical Systems Executing in Real Time, solicitation DARPA-PS-26-04 — is one of the least-discussed and most economically consequential constraints on U.S. defense modernization. Modern combat platforms with adaptive flight controls, autonomous behaviors, or AI-enabled mission systems require an order of magnitude more test points to qualify than legacy platforms with the same physical envelope. The added test burden is not because the new systems are unreliable. It is because their behavior depends on the interaction between physical dynamics, software state, and operational context in ways that traditional design-of-experiments methods do not efficiently sample.
The downstream consequence is that warfighter access to advanced platforms is throttled by test schedule, not engineering progress. CyPhER Forge is DARPA's attempt to break that throttle by compressing the test campaign itself, pairing a real-time digital twin with an AI test agent that plans and executes the test sequence, and by doing so on a procurement timeline tight enough to matter.
Oral proposal packages are due June 15, 2026. The solicitation was published February 25, 2026, with an abstract step in April. For aerospace, controls, AI/ML, and uncertainty-quantification teams that have the technical credibility to compete here, this is one of the most operationally consequential DARPA-TTO solicitations open in 2026 — and one of the few that pairs a hard digital-twin systems problem with a flight-test deliverable on a 36-month clock.
What CyPhER Forge Actually Is
The program is a three-phase structured acquisition. Phase 0 Backbone runs six months and is the systems-engineering and integration spine: the period in which performer teams stand up the digital-twin software architecture, the AI-test-agent control interface, the data assimilation pipeline, and the demonstration platform integration plan. Phase 1 Base runs twelve months and is where the technical work happens — the multi-physics-informed surrogate models, the uncertainty-quantification framework, the statistical-safety-guarantee machinery for the AI test agent, and the first end-to-end demonstrations against an instrumented test article. Phase 2 Option runs eighteen months and is the flight-test culmination: an accelerated flight-sciences test campaign on an instrumented, experimental aircraft, with the digital twin running in real time, the AI test agent planning and executing the test sequence, and statistical evidence that the resulting qualification dataset is equivalent to a conventional campaign requiring roughly ten times as many test points.
The total program length is 36 months. The structure is meaningful for two reasons. First, the Phase 2 flight-test deliverable forces the technical work to converge on a real system rather than a paper demonstration. Second, the option-exercise gate between Phase 1 and Phase 2 gives DARPA a clean off-ramp for performers whose technology does not mature on schedule — and gives performers a clean on-ramp to scale resources when their technology does.
The program manager is James Valpiani, sitting in TTO. That placement matters. CyPhER Forge could have been structured as an AI-research program out of DSO or I2O. By placing it in TTO, DARPA has signaled that the deliverable is not a research artifact — it is a working capability with a transition customer, and the success criterion is whether DoD test ranges adopt the resulting tooling.
The Real-Time Digital Twin Is the Hard Part
The program documentation describes two technical innovations. The first is a real-time digital twin built on three pillars: multi-physics-informed surrogate modeling, uncertainty quantification, and continuous data assimilation. Each of those phrases hides a substantial open research problem.
Multi-physics-informed surrogate modeling means building computationally cheap stand-ins for first-principles physics simulations — aerodynamics, structures, propulsion, controls — that are accurate enough to drive real-time decision-making but fast enough to run inside a test loop. Modern surrogate techniques (neural operators, Gaussian-process regression, reduced-order models) can hit either accuracy or speed, but the combinations that hit both at the fidelity needed for flight-qualification work are still an open problem. Whatever the winning teams propose for this layer is going to look like an architectural commitment, not a single algorithm.
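To make the accuracy-versus-speed trade concrete, here is a minimal sketch of the surrogate idea: a handful of calls to a costly model are cached in a table, and every later query is answered by cheap interpolation. Everything in it is an assumption for illustration, including the `expensive_lift_coefficient` function, the made-up curve it returns, and the 2-degree grid. None of it comes from the solicitation, and a real flight-sciences surrogate would be a neural operator, Gaussian process, or reduced-order model over many more dimensions.

```python
import bisect
import math


def expensive_lift_coefficient(alpha_deg: float) -> float:
    """Hypothetical stand-in for a costly first-principles solver."""
    a = math.radians(alpha_deg)
    return 2.0 * math.pi * a - 12.0 * a ** 3  # made-up smooth nonlinear curve


class TableSurrogate:
    """Piecewise-linear surrogate built from a handful of solver runs."""

    def __init__(self, xs, f):
        self.xs = sorted(xs)
        self.ys = [f(x) for x in self.xs]  # the only calls to the expensive model

    def predict(self, x: float) -> float:
        # Refuse to extrapolate: the table says nothing outside its samples.
        if not self.xs[0] <= x <= self.xs[-1]:
            raise ValueError("query outside the sampled range")
        i = min(bisect.bisect_right(self.xs, x) - 1, len(self.xs) - 2)
        x0, x1 = self.xs[i], self.xs[i + 1]
        t = (x - x0) / (x1 - x0)
        return (1 - t) * self.ys[i] + t * self.ys[i + 1]


grid = [float(a) for a in range(0, 21, 2)]  # 11 solver runs, 0 to 20 degrees
surrogate = TableSurrogate(grid, expensive_lift_coefficient)

# Worst-case surrogate error on a dense check grid inside the sampled range.
check = [i / 10 for i in range(201)]
max_err = max(abs(surrogate.predict(a) - expensive_lift_coefficient(a)) for a in check)
print(f"max abs error over 0-20 deg: {max_err:.4f}")
```

The refusal to extrapolate is the design choice worth noticing: a surrogate that answers confidently outside its sampled range is exactly what the rest of the stack has to guard against.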
Uncertainty quantification means rigorously characterizing not just the surrogate model's predictions but the confidence intervals around those predictions, propagating those intervals through the test-planning and decision-making layers, and producing statistical-safety guarantees the AI test agent can actually use. The UQ literature in machine learning has matured significantly over the past five years — conformal prediction, Bayesian deep learning, calibrated ensembles — but the integration of these techniques into a real-time test-control loop with formal safety properties is not a solved engineering problem.
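Of the techniques named above, split conformal prediction is the simplest to sketch. The example below wraps an imperfect surrogate in a fixed-width band sized from held-out residuals. The `truth` and `surrogate` functions, the noise level, and the 90% target are all hypothetical. Note that the coverage guarantee is marginal and assumes the calibration and test points are exchangeable, an assumption that an adaptively chosen flight-test sequence may well violate.

```python
import math
import random


def conformal_halfwidth(residuals, alpha):
    """Split-conformal interval half-width from held-out absolute residuals."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the calibration score to use
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(residuals)[k - 1]


rng = random.Random(0)


def truth(x):
    """Hypothetical noisy measurement of the real system."""
    return math.sin(x) + rng.gauss(0.0, 0.1)


def surrogate(x):
    """Hypothetical, deliberately imperfect surrogate (truncated series)."""
    return x - x ** 3 / 6.0


# Calibration set: points the surrogate was not fitted on.
cal_x = [rng.uniform(0.0, 1.5) for _ in range(200)]
cal_scores = [abs(truth(x) - surrogate(x)) for x in cal_x]
q = conformal_halfwidth(cal_scores, alpha=0.1)

# Empirical check on fresh points: surrogate(x) +/- q should cover roughly 90%.
test_x = [rng.uniform(0.0, 1.5) for _ in range(2000)]
covered = sum(abs(truth(x) - surrogate(x)) <= q for x in test_x)
print(f"half-width {q:.3f}, empirical coverage {covered / len(test_x):.3f}")
```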
Continuous data assimilation means updating the digital twin's state in real time as test data streams in from the physical system, in the Kalman-filter tradition but for high-dimensional nonlinear models. This is the layer that closes the loop between the physical aircraft and its digital counterpart. It is also the layer most likely to break under operational conditions, because real test data is noisy, intermittent, and arrives at rates that the assimilation pipeline must match without producing instability.
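The Kalman-filter idea referenced above can be shown in its simplest form: a scalar filter that keeps running when samples are dropped. This is a textbook sketch, not the program's assimilation method. The noise variances, the measurement list, and the use of `None` to mark a missing sample are illustrative assumptions, and the high-dimensional nonlinear case the program targets would need ensemble or other nonlinear variants.

```python
def kalman_step(x, p, z, *, q, r, a=1.0):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : prior state estimate and its variance
    z    : new measurement, or None if the sample was dropped
    q, r : process and measurement noise variances
    a    : state transition coefficient
    """
    # Predict: propagate the estimate and grow its uncertainty.
    x_pred = a * x
    p_pred = a * a * p + q
    if z is None:
        return x_pred, p_pred  # no data: carry the prediction forward
    # Update: blend prediction and measurement by their relative confidence.
    k = p_pred / (p_pred + r)
    return x_pred + k * (z - x_pred), (1.0 - k) * p_pred


x, p = 0.0, 1.0  # vague initial guess
measurements = [1.1, 0.9, None, 1.05, None, None, 0.98]  # None = dropped sample
for z in measurements:
    x, p = kalman_step(x, p, z, q=0.01, r=0.04)
    print(f"z={z!s:>5}  estimate={x:.3f}  variance={p:.4f}")
```

Each dropped sample leaves the estimate where it was but inflates the variance, so the twin's stated confidence degrades honestly during telemetry gaps instead of silently going stale.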
Teams that can credibly cover all three pillars in a single proposal are rare. Most strong proposals are likely to be consortium structures with a lead integrator and one or two specialist subcontractors filling in gaps. The non-traditional-defense-contractor pathway is open — the program is structured as an Other Transaction acquisition — but the depth of expertise required is non-trivial.
The AI Test Agent Is the Reason the Test-Point Count Drops
The second technical innovation is the AI test agent itself. Its job is to look at the current state of the digital twin, identify the test points most informative about regions of the operating envelope where uncertainty is highest, plan the next physical test, execute it with machine precision, update the twin with the resulting data, and then repeat. The agent provides "statistical safety guarantees": it is constrained by the UQ framework not to plan tests outside the validated region of the surrogate model, and it must justify each planned test point with the marginal information gain expected from running it.
This is the layer that drops the test-point count. A traditional design-of-experiments campaign sweeps the operating envelope according to a pre-planned matrix of test conditions, most of which produce confirmatory data rather than novel information. An adaptive agent that selects the next test based on real-time uncertainty mapping should — in principle — concentrate the campaign on the regions of the envelope where data is actually missing, and skip the regions where the digital twin already has high-confidence predictions.
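The adaptive loop described here can be sketched as a toy planner over a one-dimensional envelope. Two stand-ins are assumptions chosen purely for illustration: uncertainty is approximated by distance to the nearest tested station, and the safety rule forbids stepping more than `trust_radius` stations beyond existing data. A real agent would draw both from the twin's UQ layer, and the point counts printed are an artifact of these toy thresholds, not evidence of any achievable reduction.

```python
def nearest_distance(x, tested):
    """Uncertainty proxy (assumption): distance to the nearest tested station."""
    return min(abs(x - t) for t in tested)


def plan_next_test(candidates, tested, trust_radius):
    """Pick the most informative candidate that stays inside the trusted region.

    Safety rule (assumption): never step more than `trust_radius` beyond data.
    Returns None when no untested candidate is admissible.
    """
    safe = [c for c in candidates
            if 0 < nearest_distance(c, tested) <= trust_radius]
    if not safe:
        return None
    return max(safe, key=lambda c: nearest_distance(c, tested))


candidates = list(range(21))  # envelope stations: 0 = benign, 20 = envelope edge
tested = [0]                  # start from one benign condition
trust_radius = 5              # largest allowed step beyond existing data
target_uncertainty = 2        # stop once no station is farther than this from data

plan = []
while max(nearest_distance(c, tested) for c in candidates) > target_uncertainty:
    nxt = plan_next_test(candidates, tested, trust_radius)
    if nxt is None:
        break
    plan.append(nxt)
    tested.append(nxt)  # in a real loop: fly the point, then assimilate the data

print("planned sequence:", plan)
print(f"{len(plan)} points planned vs {len(candidates) - 1} in the full sweep")
```

The planner expands the envelope in bounded steps and stops as soon as the remaining uncertainty falls under the target, which is the build-up logic a pre-planned test matrix cannot express.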
The "order of magnitude" reduction in test points is the program's headline ambition. Even hitting a 3x or 4x reduction on a real flight-test campaign would be a meaningful win for the Department of Defense's test infrastructure. Hitting 10x would qualitatively change the economics of fielding adaptive and AI-enabled combat systems.
Eligibility, Structure, and the OT Pathway
The solicitation is structured as a Program Solicitation under DARPA's Other Transaction authority for prototype agreements, similar in posture to other recent TTO procurements. Performer eligibility is broad: traditional defense contractors, non-traditional defense contractors, academic institutions, small businesses, and consortia of the above are all eligible. The OT pathway means the awards do not flow through standard FAR contracting and the deliverable structure is more flexible than under a conventional cost-plus or fixed-price contract.
That flexibility comes with a corresponding burden on performer organizations. OT agreements are negotiated documents; the cost share, IP, deliverable, and milestone terms are not boilerplate. Performers new to DARPA OTs should expect a multi-week post-selection negotiation, and the time-to-first-dollar can extend several months past award notification. Universities in particular should engage their sponsored-programs offices early; small businesses without prior DARPA experience may want to consider partnering with a prime integrator that has navigated the OT pathway before.
The application mechanism for CyPhER Forge is unusual: oral proposal packages, not the conventional written proposal. Performer teams pitch their technical approach in a structured oral presentation to DARPA evaluators, with supporting documentation. This format favors teams that have done the technical work in advance and can articulate the approach concisely under questioning — it is much less forgiving of teams that intend to write themselves into clarity during the proposal-writing process.
The oral-package deadline is June 15, 2026 at the time specified in the solicitation. The pitch itself is scheduled into the evaluation window that follows. Teams that have not already started building their pitch are running short on time.
Where This Fits in the Broader DoD AI Test Posture
CyPhER Forge is the most visible piece of a larger DoD push to modernize the test-and-evaluation infrastructure that underwrites every weapons-system procurement decision. The Office of the Under Secretary of Defense for Research and Engineering has emphasized digital-twin-enabled T&E in multiple recent strategy documents. The Air Force Test Center, the Naval Air Warfare Center, and the Army Combat Capabilities Development Command all run their own T&E modernization programs. CyPhER Forge sits upstream of all of them — the goal is to produce middleware that the operating commands can adopt rather than to deliver a system to any single one.
For teams trying to read the DoD AI procurement landscape, CyPhER Forge is also a useful counterpoint to DSO's MATHBAC mathematical-foundations program and the application-driven AI programs out of I2O. DSO is funding the theory of agentic AI. TTO is funding the tooling that will determine whether AI-enabled combat platforms can be fielded at all. Both bets need to pay off for the broader DoD AI thesis to work.
The transition pathway is the part of the program that should reassure performers about long-term funding sustainability. CyPhER Forge is being structured with explicit attention to the question of who picks up the resulting capability after DARPA's involvement ends. The flight-test demonstration in Phase 2 is not theater; it is the artifact that lets a DoD test range say "yes, this works, we will adopt it." For performers building post-Phase 2 commercialization or transition plans, the relevant downstream customers are the service test centers and the major defense system integrators whose internal T&E processes the new tooling would augment.
What Strong Proposals Will Look Like
A competitive CyPhER Forge proposal will demonstrate three things the casual reader of the solicitation might miss. First, that the team has thought carefully about the integration architecture between the surrogate-model, UQ, and assimilation layers — because the strongest individual components do not necessarily integrate into the strongest system. Second, that the team has identified a credible demonstration platform partner and has a concrete plan for what the Phase 2 flight-test campaign actually looks like — because the program manager will be reading every proposal against the question of whether it ends in flight rather than slideware. Third, that the team has internalized the statistical-safety-guarantee requirement and can articulate what its AI test agent will not do — because the failure mode DARPA is most worried about is an over-eager agent that schedules a test point outside the validated envelope and breaks an experimental aircraft.
For aerospace primes, the question is whether to lead or to partner. For mid-size AI/ML firms with expertise in surrogate modeling, UQ, or active learning, the question is which prime to attach to and whether to also submit a standalone proposal. For academic teams with deep expertise in any of the three real-time-twin pillars, the question is whether to pitch into a consortium or to wait for the SBIR/STTR pathways that often follow major DARPA programs.
The June 15 deadline is real and the program is well-funded enough that early-stage teams who fit the technical profile should treat the next three weeks as a forced-march pitch-development window. The work CyPhER Forge will fund over the next 36 months is the kind of work that, if it succeeds, shows up in every adaptive-controls or AI-enabled platform DoD fields for the following decade. The teams in the program will be the ones building that tooling. The teams that miss this solicitation will be the ones integrating it.