BetOnline Cheating Detection: Architecture, Signals, Failure Modes
By Raul Moriarty · Poker Software Expert
Reverse-engineered notes on what BetOnline's security stack looks like from outside — behavioural fingerprinting and play-pattern analysis at lower offline cadence than GGPoker, aggressive collusion graphs on the regulatory-exposure side, and a human-review layer with a smaller queue. Plus the documented 2014 and 2018 bot cleanups and what they imply.
Summary
- BetOnline runs the same four-layer detection model as every operator (behavioural fingerprinting, statistical play-pattern analysis, anti-collusion graph models, human review) but on a smaller budget. The system is more reactive than proactive.
- Collusion and multi-accounting are detected aggressively because they carry direct regulatory and financial exposure. Solver-anchored single-account bots usually only surface on review triggers — large withdrawals, formal complaints, anomalous long-sample winrates.
- Two documented historical bot cleanups: a 2014 sweep of a single ring caught after forum-led public pressure, and a larger 2018 action with refunds issued to affected players. Both cleanups were human-review-driven batch decisions, not realtime detection firing.
- HUD policy is permissive in practice. Holdem Manager 3 and PokerTracker 4 run unobstructed, screen names are stable, and the long-horizon data-mined HUD attack that died at GGPoker still works here. This shifts the opponent-modelling picture for bots, and it shifts the operator's threat surface too.
- The bot-account lifetime distribution is bimodal: most accounts run for months or years uncaught; a minority are caught in batched human-review waves. The right account-risk model is not a stationary detection probability but a bursty one.
- Anti-detection is an adversarial-classification problem (Dalvi et al. 2004, Lowd & Meek 2005), not a checklist of "human-looking" behaviours.
What counts as cheating in BetOnline's terms
BetOnline's terms of service prohibit the same set of categories as every other operator, with each category mapping to a different signal stack, false-positive budget, and consequence path. The categories matter because the operator does not spend equal effort on all of them: priority follows regulatory exposure rather than direct harm to the player experience.
| Category | Operator priority | Detection difficulty | Typical signal |
|---|---|---|---|
| Collusion / chip dumping | Highest (regulatory + financial) | Medium | Account graph + suspicious hand sequences |
| Multi-accounting | High | Low–Medium | Device fingerprint + crypto-wallet join + KYC |
| Botting (single account) | Medium | Medium-High at smaller scale | Behavioural fingerprint + play-pattern + review |
| Real-time assistance (RTA) | Medium | High | Statistical play-pattern over volume |
| Bot farms | High (caught via collusion graph) | Low–Medium | Shared fingerprint across many accounts |
| Ghosting in MTTs | Medium (spikes around Sunday flagship) | High | Win-rate vs known-skill baseline + IP joins |
Collusion sits at the top because dumping rings drain the room's recreational pool and trigger regulatory complaints from licensed jurisdictions. Bot farms get caught primarily through the collusion graph, not the behavioural layer: when one person runs fifty bot accounts on one fingerprint, the graph layer catches the farm and the bots fall with it as a side effect. Carefully run single-account bots are not on the same priority tier.
The four-layer detection model at a smaller operator
The structure of the detection stack at BetOnline is the same as at GGPoker, PokerStars, or any other room. The difference is in the budget: how often each layer runs, how big its queue is, and how much compute and human time it consumes per night.
- Layer 1: Behavioural fingerprinting
- Client telemetry on input timing, mouse-path geometry on desktop, touch dwell on mobile, action-confirmation latency, idle behaviour between hands. Cheap to compute, runs continuously, feeds a per-session behavioural score. The Chico client collects this at lower fidelity than the GGPoker client — fewer signals, less aggressive instrumentation. Naive constant-latency bots still get flagged here; carefully shaped behavioural noise passes consistently.
- Layer 2: Statistical play-pattern analysis
- Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute. At a smaller operator the cadence is slower — anecdotally weekly to monthly rather than nightly — and the per-account risk score decays before it converts to action unless human review escalates.
- Layer 3: Anti-collusion graph models
- Account graphs joined by IP, device fingerprint, deposit method, KYC document, table co-occurrence, action correlation within hands. On BetOnline specifically, deposit method matters more than at fiat-only rooms because shared crypto wallets are a strong join key. This layer is where the operator spends; it catches the high-impact multi-accounting and chip-dumping cases that hurt the room financially.
- Layer 4: Human review
- The decisive layer. Reviewers consult the model output and read hand history, chat logs, sit-out behaviour, session patterns. Volume is the differentiator at BetOnline — fewer reviewers, longer queue, slower cycle. Most bot bans here are signed off by a person, often after a triggering event has moved the account up the queue.
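The Layer 3 description above amounts to clustering accounts that share a join key. A minimal operator-side sketch of that idea, using union-find, follows. The record layout, the key names (`ip`, `device`, `wallet`), and the function `cluster_accounts` are illustrative assumptions, not BetOnline's schema or method; a real system would also weight keys and add table co-occurrence and action correlation.

```python
from collections import defaultdict


def cluster_accounts(records):
    """Group account ids that share any join-key value (union-find).

    `records` maps account id -> {key_type: value}. The key types are
    hypothetical examples, not a real operator schema.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # (key_type, value) -> first account seen with it
    for account, keys in records.items():
        find(account)  # register the account even if it has no keys
        for key_type, value in keys.items():
            if value is None:
                continue
            k = (key_type, value)
            if k in first_seen:
                union(account, first_seen[k])
            else:
                first_seen[k] = account

    clusters = defaultdict(set)
    for account in records:
        clusters[find(account)].add(account)
    # Only clusters with more than one account are interesting to review.
    return [c for c in clusters.values() if len(c) > 1]


records = {
    "acct_a": {"ip": "10.0.0.1", "device": "d1", "wallet": "w1"},
    "acct_b": {"ip": "10.0.0.2", "device": "d2", "wallet": "w1"},
    "acct_c": {"ip": "10.0.0.2", "device": "d3", "wallet": "w2"},
    "acct_d": {"ip": "10.0.0.9", "device": "d4", "wallet": "w3"},
}
linked = cluster_accounts(records)
```

In this toy data `acct_a` and `acct_b` share a wallet and `acct_b` and `acct_c` share an IP, so all three land in one cluster even though `acct_a` and `acct_c` share nothing directly. That transitive linkage is why a shared crypto wallet works as such a strong join key.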
The layers run asynchronously, and that matters. Layer 1 fires continuously and mostly stays below threshold. Layer 2 produces a slowly decaying per-account score. Layer 3 is event-driven, firing on graph changes (new accounts, shared fingerprint, suspicious table co-occurrence). Layer 4 is the bottleneck; its queue is prioritised by combined risk score, expected revenue impact, recent withdrawal activity, and, uniquely visible at BetOnline, external pressure events (forum complaints, media coverage, public refund demands).
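The queue prioritisation described above can be pictured as a weighted score feeding a priority queue. The sketch below is purely illustrative: the `Flag` fields mirror the four factors named in the text, but the weights in `WEIGHTS` are invented for the example and say nothing about any operator's real values.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Flag:
    account: str
    risk_score: float         # combined model score, assumed in [0, 1]
    revenue_impact: float     # assumed normalised to [0, 1]
    recent_withdrawal: bool
    external_pressure: bool   # forum thread, media coverage, refund demand


# Hypothetical weights; external pressure dominates, as the text suggests.
WEIGHTS = {"risk": 1.0, "revenue": 0.5, "withdrawal": 0.6, "pressure": 1.5}


def priority(flag):
    """Higher value means earlier review."""
    return (
        WEIGHTS["risk"] * flag.risk_score
        + WEIGHTS["revenue"] * flag.revenue_impact
        + WEIGHTS["withdrawal"] * flag.recent_withdrawal
        + WEIGHTS["pressure"] * flag.external_pressure
    )


def review_order(flags):
    """Return account ids in the order a reviewer would see them."""
    # heapq is a min-heap, so negate the priority; the index breaks ties.
    heap = [(-priority(f), i, f.account) for i, f in enumerate(flags)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


flags = [
    Flag("quiet_outlier", 0.8, 0.2, False, False),
    Flag("named_in_thread", 0.4, 0.2, False, True),
    Flag("big_cashout", 0.5, 0.3, True, False),
]
order = review_order(flags)
```

With these assumed weights the account named in a public thread jumps ahead of an account with twice its model score, which is the reactive pattern the rest of this article describes.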
2014 and 2018: the documented bot cleanups
Two public events anchor the empirical picture of BetOnline's enforcement. Both were batched human-review actions, both were triggered by external pressure rather than the system firing in real time, and both led to refunds being issued — which is the strongest public signal of an operator admitting that detection lagged behind reality.
The 2014 incident involved a single botting ring detected after a series of TwoPlusTwo forum threads documenting suspiciously similar play patterns across multiple accounts. BetOnline acknowledged the issue, banned the accounts, and processed limited refunds. The scale was small — a handful of accounts — but the timeline (forum pressure for weeks before action) revealed the reactive cadence of the review queue.
The 2018 action was larger. Multiple bot rings were caught in a coordinated sweep after extended forum and media coverage. The cleanup ran over several weeks, ended in account closures, balance confiscations, and refunds to opponents who had played significant volume against the offending accounts. The operator did not publish detection-system internals (no operator ever does), but the pattern matched 2014: the behavioural and play-pattern signals had likely been visible for months; the queue advanced under external pressure.
Two structural inferences follow. First, the per-account detection probability inside a quiet stretch is meaningfully lower than the literature on adversarial classification would predict for a stationary classifier. Second, the per-account detection probability inside a sweep is meaningfully higher — accounts running an obvious play-pattern signal that had been ignored for months get caught in a single batch action.
Signal weights and observable failure modes
Exact signal weights are operator-confidential. The relative weights below are inferred from the observable pattern of which accounts get caught, in what sequence, and after what triggering events — pieced together over years of operator-side conversations, forum reports, and bust post-mortems.
| Signal | Layer | Relative weight | Naive failure mode |
|---|---|---|---|
| Action-timing variance < population | L1 | Medium-High | Constant-latency action emission |
| Mouse-path linearity on desktop | L1 | Medium | Straight-line cursor on every action |
| Idle behaviour between hands too uniform | L1 | Low-Medium | No tab-switch, no chat, no micro-movement |
| VPIP/PFR at population mass with low variance | L2 | High | Pure GTO baseline, no human-noise overlay |
| Bet sizing clustered on exact pot fractions | L2 | High | Solver output without sizing perturbation |
| Win rate persistently outside skill-pool envelope | L2 | Very High | Sustained high winrate at mid stakes with no human sessions |
| Shared crypto wallet across accounts | L3 | Very High | Bot farm funded from one BTC/ETH address |
| Shared device fingerprint across accounts | L3 | Very High | Bot farm on one IP / device |
| Large first withdrawal after long quiet period | L3+L4 | High | Patient grind for 6 months, then big-bang cashout |
| External forum complaint or media coverage | L4 | Very High (event-driven) | Account becomes a named example in a public thread |
| Chat behaviour: zero outgoing messages over 5k+ hands | L4 | Medium | Bot never says "nh" |
The signal pattern that gets accounts caught is consistent across both 2014 and 2018 cleanups: an L2 statistical-outlier score that had accumulated quietly for months, plus an L4 triggering event — usually external — that pushed the account from "flagged in long tail" to "actioned." The accounts that survive long-term are accounts that stay near population distributions on L1 and L2 and avoid the L4 triggers. That is not a checklist; it is a description of where the EV-detection frontier sits empirically.
Action-timing fingerprints
Action-timing distributions are the most-discussed and worst-implemented signal in the bot literature. A naive implementation fires at constant intervals or with uniform noise around a centroid — both are statistically catastrophic.
Real human action-timing distributions are log-normal in shape, with heavy right tails, and the location parameter conditions on decision difficulty. A snap-fold on a clearly trash hand resolves in 600–1200ms. A routine flop continuation-bet on a clean board lands in 1.5–4 seconds. A boundary river call against a triple-barrel sits in the 5–30 second range. Distraction events — phone notification, conversation, bathroom break — produce an independent 8–25 second tail at roughly 3% per action. The shape of the distribution is the fingerprint, not the mean.
```python
# Schematic: behaviourally-shaped action timing
# Conceptual sketch, not the production implementation
import math
import random


def sample_action_delay(decision_difficulty, action_type, hand_state):
    """Return seconds-to-act drawn from a state-conditional log-normal."""
    # decision_difficulty in [0,1]: 0 = trivial fold, 1 = boundary call
    # hand_state is unused in this schematic; see the note below the block.
    mu_base = {
        'fold_trivial': math.log(0.9),
        'cbet_routine': math.log(2.4),
        'check_routine': math.log(1.6),
        'river_boundary': math.log(8.5),
        'all_in_decision': math.log(12.0),
    }[action_type]
    mu = mu_base + 0.7 * decision_difficulty
    sigma = 0.35 + 0.55 * decision_difficulty
    delay = random.lognormvariate(mu, sigma)
    # ~3% chance of independent distraction tail
    if random.random() < 0.03:
        delay += random.uniform(8, 25)
    # Floor at a non-zero minimum: humans cannot react under 250ms
    return max(0.25, delay)
```

This is schematic. Production systems condition on more state: stack depth, opponent action sequence, position, multiway versus heads-up, table count, and a session-alertness parameter that drifts down over long sessions to mimic fatigue. The right behaviour is not "add noise"; it is "draw from a distribution whose shape matches the population, conditioned on state."
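Seen from the operator's side, the "action-timing variance < population" signal in the table above can be approximated by comparing the spread of an account's log-delays against a population reference. The sketch below assumes a reference spread of 0.6 and a flagging ratio of 0.5; both numbers and the function names are invented for illustration, not taken from any operator.

```python
import math
import random
import statistics


def log_delay_spread(delays):
    """Sample standard deviation of log(seconds-to-act)."""
    return statistics.stdev([math.log(d) for d in delays])


def timing_too_uniform(delays, population_spread, ratio=0.5):
    """Flag an account whose log-delay spread is far below the pool's.

    `population_spread` and `ratio` are hypothetical tuning values.
    """
    return log_delay_spread(delays) < ratio * population_spread


rng = random.Random(0)
# Synthetic data only: a log-normal sample versus near-constant latency.
lognormal_like = [rng.lognormvariate(math.log(2.4), 0.6) for _ in range(500)]
near_constant = [2.0 + rng.uniform(-0.1, 0.1) for _ in range(500)]

flag_lognormal = timing_too_uniform(lognormal_like, population_spread=0.6)
flag_constant = timing_too_uniform(near_constant, population_spread=0.6)
```

A single-number spread test like this only catches the naive constant-latency case. Testing the full shape of the distribution, conditioned on decision difficulty, is what the surrounding text argues actually matters.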
HUD policy and what it implies for collusion
BetOnline's stated terms permit read-only HUD tools and prohibit real-time advice software. In practice, the standard tracker products (Holdem Manager 3, PokerTracker 4) run unobstructed against the Chico client. Hand histories save to disk. Long-horizon stats accumulate against stable screen names. There is no operator-side anti-HUD process detection of the kind GGPoker ships.
That policy has two consequences for the detection picture. For bot authors, the long-horizon HUD is an opponent-modelling prior with no extra acquisition cost; the data-mined HUD that died at GGPoker survives here. For the operator, the permissive HUD environment widens the collusion surface — colluding players can share hand histories cheaply, and the operator has to lean harder on the L3 graph layer to compensate. The 2018 cleanup notably involved both bots and collusion patterns, which is consistent with this structural feature: at an operator with stable IDs and tolerated trackers, collusion becomes easier and detection has to absorb that.
Anti-detection as adversarial classification
The standard mistake among bot builders is to treat detection as a feature checklist — add latency noise, vary touch coordinates, randomise schedule, never play more than eight hours. This is the wrong frame. Detection is an adversarial classifier: the operator builds a model that distinguishes bot behaviour from human behaviour, and the bot's task is to produce a behaviour distribution the classifier cannot separate from the population while preserving EV.
The formal literature begins with Dalvi et al. (2004), Adversarial Classification, KDD, and Lowd & Meek (2005), Adversarial Learning, KDD. The setting maps cleanly: an attacker chooses an action distribution that maximises expected utility under a classifier whose decision boundary it can probe but not fully observe. The modern adversarial-ML literature (Goodfellow et al. 2014 onward, the certified-robustness lineage) extends this with neural classifiers and gradient-based probing.
Three operational consequences fall out:
- The classifier's decision boundary is non-stationary
- Operators retrain. Behaviour that passed in 2024 may not pass in 2026, and behaviour that passes in BetOnline's quiet stretch may not pass during a public-pressure-driven sweep. The right risk model is non-stationary detection probability, not a fixed per-account number.
- Population baseline is the reference, not "looking human"
- The classifier separates your distribution from the population distribution — not from some general notion of "human-looking." If the NL50 6-max population at BetOnline has a specific bet-sizing histogram with an extended tail on small overbets, the bot needs that shape too. The target is not anthropomorphism; the target is statistical indistinguishability from the pool.
- EV vs detection is the right optimisation
- Pure-GTO output maximises EV under fixed opponents. Behaviourally-shaped output gives up some EV for a lower detection score. The right operating point is not zero detection — it is EV-maximising under a budgeted detection probability over the account's expected lifetime, with the lifetime model accounting for BetOnline's bursty enforcement pattern.
This frame resolves a frequent apparent contradiction. Pure-GTO bots tend to get caught faster than well-noised slightly-sub-GTO bots. The GTO bot makes more EV per hand but is more separable from the population on bet-sizing and frequency, so its expected hand count before action is lower. The optimisation is not "win more per hand" — it is "win more in expectation over the account lifetime, accounting for the detection-induced lifetime distribution."
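The population-baseline point above can be made concrete from the classifier's side. One simple way to measure how separable an account's bet-sizing histogram is from the pool is a two-sample Kolmogorov-Smirnov statistic. The sketch below is an assumption about how such a comparison could look, run on synthetic data; it is not a description of any operator's model.

```python
import bisect
import math
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both ECDFs are step functions, so the max gap occurs at a data point.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


rng = random.Random(1)
# Synthetic pool: bet sizes as pot fractions with a right tail (assumed shape).
pool = [rng.lognormvariate(math.log(0.6), 0.4) for _ in range(2000)]
# An account drawn from the same distribution as the pool.
similar = [rng.lognormvariate(math.log(0.6), 0.4) for _ in range(500)]
# An account whose sizes cluster on exact pot fractions.
clustered = [rng.choice([0.33, 0.5, 0.75, 1.0]) for _ in range(500)]

d_similar = ks_statistic(pool, similar)
d_clustered = ks_statistic(pool, clustered)
```

The clustered account produces a much larger statistic than the account drawn from the pool's own distribution, which is the sense in which exact-pot-fraction sizing is "more separable" regardless of how strong the underlying strategy is.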
References and related work
Selected sources. Names and identifiers provided; URLs are stable (arXiv) and persistent (Science / KDD proceedings).
- Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). Reference result for 6-max NLH at superhuman level. arXiv:1905.10311.
- Moravčík et al., 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356. arXiv:1701.01724.
- Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS (Libratus core technique).
- Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD. Foundational paper on the adversarial-classifier framing.
- Lowd & Meek, 2005. Adversarial Learning. KDD. Probing the decision boundary of a deployed classifier.
- Heinrich & Silver, 2016. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.