Trainator gives downstream booking, corporate-travel and connection-routing systems a full probability distribution over how late a train will arrive — not a binary on-time / delayed flag.
Delay forecasts drawn directly into search results and trip details — no re-typing, no tab-switching.
A 12-minute transfer in Frankfurt or Zürich is the difference between a kept appointment and a missed one — and today the passenger only finds out which one they bought after they're already on the platform. Live feeds tell you whether a train is "late" once it already is. They do not tell you, at booking time, how late, how confident, and how to price the risk.
A missed connection turns into a 1-star review and a support ticket — even when the booking platform did nothing wrong. The brand wears the delay.
One late arrival to a meeting, wedding or flight is enough to push a high-value traveller back to the airline app for the next booking. Trust is hard to rebuild.
Missed connections trigger compensation payouts, free rebookings and live-agent support calls — costs that quietly compound across every itinerary the platform sells.
Because the forecast exists before the itinerary is committed, the platform gets to act on it — quietly re-routing, surfacing safer alternatives, or pricing the risk into a guarantee instead of absorbing it later.
Risky transfers get flagged or hidden at booking. The traveller arrives when they expected to — and credits the platform for the smooth trip, not the train operator.
Show the realistic arrival window — not just the schedule — on every itinerary. Transparent expectations are something competitors don't surface, and they're a reason to come back to your app for the next booking.
Steering bookings away from high-risk transfers shrinks the long tail of compensation, rebookings and live-agent calls — a margin improvement that compounds across every itinerary sold.
Rail delay data has a stubborn shape: a sharp spike at zero — most trains really do arrive exactly on time — followed by a heavy positive tail of occasional severe delays. No standard textbook distribution fits both halves at once. The Zero-Inflated q-Exponential (ZIqE) does, by construction: a point mass at zero handles the on-time spike, and a q-exponential tail handles the rare-but-large delays. The result is the closest match to the empirical reality of European rail we have been able to find — and the entire curve is described by just three parameters. Move the sliders to see how the shape changes.
Probability the train arrives exactly on time — the spike at t = 0.
Decay rate of the tail when a delay occurs. Larger λ → faster decay, shorter tail.
Shape of the tail — fitted globally across the fleet and held constant. Higher q means a heavier, slower-decaying tail.
The jump at t = 0 is p₀ — the probability of being exactly on time. From there the curve climbs along the heavy q-exponential tail toward 1.
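The curve described above can be sketched in a few lines. This is a minimal illustration, not the production model: it assumes the standard Tsallis q-exponential parameterisation with 1 < q < 2, which the page does not confirm, and the function name `ziqe_cdf` is hypothetical.

```python
def ziqe_cdf(t: float, p0: float, lam: float, q: float) -> float:
    """P(delay <= t minutes) under a zero-inflated q-exponential.

    Assumed parameterisation (not confirmed by the source): the tail
    survival function is [1 + (q - 1) * lam * t] ** (-(2 - q) / (q - 1)),
    valid for 1 < q < 2. As q approaches 1 it tends to exp(-lam * t).
    """
    if not (0.0 <= p0 <= 1.0 and lam > 0.0 and 1.0 < q < 2.0):
        raise ValueError("need 0 <= p0 <= 1, lam > 0 and 1 < q < 2")
    if t < 0:
        return 0.0  # no probability mass below zero delay
    # Survival of the tail component: share of delayed trains still late at t.
    survival = (1.0 + (q - 1.0) * lam * t) ** (-(2.0 - q) / (q - 1.0))
    # Point mass p0 at zero plus the (1 - p0)-weighted tail CDF.
    return p0 + (1.0 - p0) * (1.0 - survival)
```

At t = 0 the function returns p₀ exactly, which is the jump in the chart; larger λ pulls the curve toward 1 faster, and larger q slows that approach.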
Drop in the parts you have — trip, station, optional minute. Trainator returns exactly the resolution you asked for: a full delay distribution for the whole trip, a station-specific forecast, or a single scalar probability when all your booking flow needs is a yes/no answer. Sub-400 ms latency means responses arrive in real time, and clients can compute multiple probabilities from one payload without refetching.
Cumulative probability that the arrival delay is ≤ t minutes, computed from the returned distribution parameters.
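As a rough sketch of what "multiple probabilities from one payload" could look like on the client side: the payload keys (`p0`, `lambda`, `q`), the parameter values and both helper names below are hypothetical, and the tail formula assumes the standard q-exponential with 1 < q < 2 rather than anything the API documents.

```python
from typing import Mapping


def delay_cdf(params: Mapping[str, float], t: float) -> float:
    """P(arrival delay <= t minutes) from one set of returned parameters."""
    if t < 0:
        return 0.0
    p0, lam, q = params["p0"], params["lambda"], params["q"]
    survival = (1.0 + (q - 1.0) * lam * t) ** (-(2.0 - q) / (q - 1.0))
    return p0 + (1.0 - p0) * (1.0 - survival)


def delay_quantile(params: Mapping[str, float], u: float) -> float:
    """Smallest delay t (minutes) with P(delay <= t) >= u, for 0 <= u < 1."""
    if not 0.0 <= u < 1.0:
        raise ValueError("u must satisfy 0 <= u < 1")
    p0, lam, q = params["p0"], params["lambda"], params["q"]
    if u <= p0:
        return 0.0  # the on-time spike alone already covers level u
    s = (1.0 - u) / (1.0 - p0)  # tail survival required at the answer
    return (s ** (-(q - 1.0) / (2.0 - q)) - 1.0) / ((q - 1.0) * lam)


# Illustrative payload; field names and values are assumptions.
payload = {"p0": 0.56, "lambda": 0.12, "q": 1.3}

# Several thresholds answered from the same parameters, no refetch.
probabilities = {t: delay_cdf(payload, t) for t in (0, 5, 12, 30)}
p_late_over_12 = 1.0 - probabilities[12]
arrival_window_p90 = delay_quantile(payload, 0.90)
```

The inverse function is what turns the same three parameters into a "realistic arrival window", for example the delay that 90% of arrivals stay under.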
Trip numbers, station IDs and reliability figures shown here are illustrative — inspired by real Deutsche Bahn services and IFOPT identifiers but not live. The production API exposes the live fleet.
Calibration is what makes probabilistic forecasts usable downstream. Sharpness is what makes them better than a coin-flip. Trainator delivers both, measured on n = 27,000 German long-distance connections from May 2026.
If we say 80% of trains will arrive within X minutes, roughly 80% actually do. Worst deviation: 5.9 pp at the 30th percentile — the model is slightly conservative at the low end and converges to perfect calibration above the 80th.
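One way an evaluator might check a calibration claim of this kind is to compare the nominal level with the observed coverage. The sketch below assumes you already hold, per connection, the model's predicted quantile at that level and the realised delay; the function names and the sign convention are illustrative choices, not the published evaluation code.

```python
from typing import Sequence


def coverage(pred_quantiles: Sequence[float], observed: Sequence[float]) -> float:
    """Share of connections whose realised delay is within the predicted quantile."""
    if len(pred_quantiles) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for limit, delay in zip(pred_quantiles, observed) if delay <= limit)
    return hits / len(observed)


def calibration_gap_pp(
    nominal: float, pred_quantiles: Sequence[float], observed: Sequence[float]
) -> float:
    """Observed coverage minus nominal level, in percentage points."""
    return 100.0 * (coverage(pred_quantiles, observed) - nominal)
```

Repeating the calculation for each nominal level (10%, 20%, and so on) gives the points of a reliability curve; the largest absolute gap is the worst deviation.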
The model's predicted cumulative probability of delay ≤ t (averaged across the holdout) plotted against the empirical CDF we actually observed. The two curves stay within ∼1 pp of each other across the entire 0–60 min range — the predicted distribution describes reality on a per-minute level, not just at the headline percentiles.
A reference for evaluators who want to reproduce or extend the comparison. Each metric below was computed on the same May 2026 German long-distance holdout (n = 27,000).
The Brier score averages (p − y)² across all (probability, outcome) pairs, where y ∈ {0, 1} is the realised event "train was on time". It rewards confident-and-correct probabilities and punishes confident-and-wrong ones symmetrically. Scores here are weighted against the empirical event frequency, so a model that just predicts the base rate cannot get a free pass on the easy cases.
Trainator's probabilities aren't just more accurate on average; they're sharper. The naive base-rate predictor sits safely in the middle of the (0,1) interval and hedges; Trainator commits closer to 0 or 1 when the data warrants it, and that confidence, when correctly placed, is what drives the Brier gap. A coin-flip predictor that always outputs 0.5 scores 0.25, the no-information reference that any useful model has to beat.
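For reference, the plain (unweighted) Brier score is short enough to write out. The sketch below deliberately leaves out the event-frequency weighting mentioned above, because the page does not spell out its formula; the function name is illustrative.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of (p - y)^2 over paired forecasts p and outcomes y in {0, 1}."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

On a toy set of outcomes [1, 0, 1, 1], always predicting 0.5 scores 0.25 and always predicting the base rate 0.75 scores 0.1875, while a forecaster that commits correctly, say [0.9, 0.1, 0.8, 0.7], scores 0.0375. Lower is better.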
For any choice of "delayed = late by more than X minutes", the ROC curve plots true-positive rate against false-positive rate as the model's decision threshold sweeps from 0 to 1. AUC — the area under that curve — is the probability that a randomly chosen delayed connection scores higher than a randomly chosen on-time connection.
The chart shows the curve at the strictest definition (X = 0 min — "any delay at all"), where Trainator hits AUC = 0.854. Across X from 0 to 30 min the AUC stays in the 0.80–0.85 band: the model discriminates well no matter where you draw the line. A random classifier sits at 0.50 on the diagonal.
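The pairwise definition of AUC given above translates directly into code. This is a small illustrative sketch that compares every delayed connection with every on-time one; it is quadratic in the sample size, so a rank-based routine would be the practical choice on the full holdout. Here `scores` is assumed to be the predicted probability of a delay greater than X, and `labels` marks whether that delay occurred.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random delayed case > score of a random on-time case).

    Ties between a delayed and an on-time score count as half a win.
    """
    delayed = [s for s, y in zip(scores, labels) if y == 1]
    on_time = [s for s, y in zip(scores, labels) if y == 0]
    if not delayed or not on_time:
        raise ValueError("need at least one delayed and one on-time case")
    wins = sum(
        1.0 if d > o else 0.5 if d == o else 0.0
        for d in delayed
        for o in on_time
    )
    return wins / (len(delayed) * len(on_time))
```

Re-labelling the same connections for a different X and recomputing gives one point of the AUC-versus-threshold band.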
For each minute t we have a predicted cumulative probability and an observed one. The residual is the gap between them: r(t) = F_pred(t) − F_obs(t). Mean absolute error and root mean squared error each summarise that residual series in a single number.
Weighted versions multiply each t's residual by the empirical share of trains arriving in that minute. Because 56% of trains arrive on time, the bin at t = 0 dominates the weights, so weighted RMSE / MAE measure how well the model fits the bins customers actually book around, rather than treating a residual at the rare 50-min tail as equally important as the residual at the on-time spike.
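A minimal sketch of these summaries, assuming the predicted and observed CDFs are sampled on the same minute grid and that the weights are the empirical arrival shares per minute. The function name and the normalisation by the weight total are illustrative choices.

```python
import math
from typing import Optional, Sequence, Tuple


def residual_summary(
    f_pred: Sequence[float],
    f_obs: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[float, float]:
    """Return (MAE, RMSE) of r(t) = F_pred(t) - F_obs(t).

    With weights=None every minute counts equally; otherwise each
    minute is weighted by its share and the weights are normalised.
    """
    if len(f_pred) != len(f_obs) or not f_pred:
        raise ValueError("CDF series must be non-empty and of equal length")
    residuals = [p - o for p, o in zip(f_pred, f_obs)]
    if weights is None:
        weights = [1.0] * len(residuals)
    if len(weights) != len(residuals):
        raise ValueError("weights must match the CDF series in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mae = sum(w * abs(r) for w, r in zip(weights, residuals)) / total
    rmse = math.sqrt(sum(w * r * r for w, r in zip(weights, residuals)) / total)
    return mae, rmse
```

With a weight vector that puts most of its mass on t = 0, a residual at the on-time spike moves both numbers far more than the same residual deep in the tail.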
We're open to conversations that go beyond this page — whether you need more rigorous benchmarks, want to walk through a specific workflow, or are evaluating direct model access for your platform.
Holdout breakdowns by route, operator, season, and delay magnitude. Raw prediction logs available under NDA for technical due diligence.
Custom evaluation runs on your historical data, plus end-to-end integration walkthroughs tailored to your existing passenger information stack.
Dedicated inference endpoints, higher rate limits, SLA tiers, and on-premise deployment options for operators with strict data residency requirements.
I respond to all serious enquiries as soon as possible, but please be aware that I prioritise existing customers and technical conversations over general interest. If you don't get a reply within a week, feel free to send a follow-up email.