
BetOnline cheating detection: my breakdown of what actually fires

Updated 27 May 2026

By Raul Moriarty · Poker Software Expert

Reverse-engineered notes on what BetOnline's security stack looks like from the outside: behavioural fingerprinting and play-pattern analysis at a lower offline cadence than GGPoker, aggressive collusion graphs on the regulatory-exposure side, and a human-review layer with a smaller queue. Plus the documented 2014 and 2018 bot cleanups and what they imply.

Summary

BetOnline runs the same four-layer detection model as every operator (behavioural fingerprinting, statistical play-pattern analysis, anti-collusion graph models, human review), but on a smaller budget. The system is more reactive than proactive.
Collusion and multi-accounting are detected aggressively because they carry direct regulatory and financial exposure. Solver-anchored single-account bots usually only surface on review triggers — large withdrawals, formal complaints, anomalous long-sample winrates.
Two documented historical bot cleanups: a 2014 sweep of a single ring caught after forum-led public pressure, and a larger 2018 action with refunds issued to affected players. Both cleanups were human-review-driven batch decisions, not real-time detection firing.
HUD policy is permissive in practice. Holdem Manager 3 and PokerTracker 4 run unobstructed, screen names are stable, and the long-horizon data-mined HUD attack that died at GGPoker still works here. This shifts the opponent-modelling picture for bots, and it shifts the operator's threat surface too.
The bot-account lifetime distribution is bimodal: most accounts run for months or years uncaught; a minority are caught in batched human-review waves. The right account-risk model is not a stationary detection probability but a bursty one.
Anti-detection is an adversarial-classification problem (Dalvi et al. 2004, Lowd & Meek 2005), not a checklist of "human-looking" behaviours.

What counts as cheating in BetOnline's terms

BetOnline's terms of service prohibit the same categories as every other operator's, and each category maps to a different signal stack, false-positive budget, and consequence path. The categories matter because the operator does not spend equal effort on all of them: regulatory exposure, not direct harm to the player experience, determines priority.

Prohibited categories — operator priority and detection difficulty at BetOnline
Category	Operator priority	Detection difficulty	Typical signal
Collusion / chip dumping	Highest (regulatory + financial)	Medium	Account graph + suspicious hand sequences
Multi-accounting	High	Low–Medium	Device fingerprint + crypto-wallet join + KYC
Botting (single account)	Medium	Medium–High at smaller scale	Behavioural fingerprint + play-pattern + review
Real-time assistance (RTA)	Medium	High	Statistical play-pattern over volume
Bot farms	High (caught via collusion graph)	Low–Medium	Shared fingerprint across many accounts
Ghosting in MTTs	Medium (spikes around Sunday flagship)	High	Win-rate vs known-skill baseline + IP joins

Collusion sits at the top because dumping rings drain the room's recreational pool and trigger regulatory complaints from licensed jurisdictions. Bot farms get caught primarily through the collusion graph, not the behavioural layer: when one bot runner puts fifty accounts on a single fingerprint, the graph layer flags the farm, and the bots fall out as a side effect. Single-account bots running carefully are not on the same priority tier.

The four-layer detection model at a smaller operator

The structure of the detection stack at BetOnline is the same as at GGPoker, PokerStars, or any other room. The difference is budget: how often each layer runs, how big its queue is, and how much compute and human time it consumes per night.

Layer 1: Behavioural fingerprinting: Client telemetry on input timing, mouse-path geometry on desktop, touch dwell on mobile, action-confirmation latency, idle behaviour between hands. Cheap to compute, runs continuously, feeds a per-session behavioural score. The Chico client collects this at lower fidelity than the GGPoker client — fewer signals, less aggressive instrumentation. Naive constant-latency bots still get flagged here; carefully shaped behavioural noise passes consistently.
Layer 2: Statistical play-pattern analysis: Per-account distributional analysis on VPIP, PFR, 3-bet by position, fold-to-cbet by board texture, bet-sizing histograms, river aggression, all-in equity at showdown. Heavy compute. At a smaller operator the cadence is slower — anecdotally weekly to monthly rather than nightly — and the per-account risk score decays before it converts to action unless human review escalates.
Layer 3: Anti-collusion graph models: Account graphs joined by IP, device fingerprint, deposit method, KYC document, table co-occurrence, and action correlation within hands. On BetOnline specifically, deposit method matters more than at fiat-only rooms because shared crypto wallets are a strong join key. This is where the operator concentrates its spending; it catches the high-impact multi-accounting and chip-dumping cases that hurt the room financially.
Layer 4: Human review: The decisive layer. Reviewers consult the model output and read hand history, chat logs, sit-out behaviour, session patterns. Volume is the differentiator at BetOnline — fewer reviewers, longer queue, slower cycle. Most bot bans here are signed off by a person, often after a triggering event has moved the account up the queue.
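On the operator's side, the Layer 3 join can be sketched with union-find. This is a minimal sketch under stated assumptions. It takes a hypothetical link table of (account_id, key_type, key_value) rows. Any accounts that share a strong key (a wallet address, a device fingerprint, a KYC document) collapse into one cluster, so a farm funded from one wallet surfaces as a single connected component. The key names and the schema are illustrative assumptions, and so is the choice to treat IP as too noisy to join on alone. None of this is BetOnline's configuration.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen nodes as singletons on first lookup.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]


def cluster_accounts(links, strong_keys=("wallet", "device", "kyc")):
    """Group accounts that share any strong join key.

    links: iterable of (account_id, key_type, key_value) tuples (hypothetical schema).
    strong_keys: key types trusted enough to join on (hypothetical; IP excluded by default).
    Returns sorted clusters with more than one account.
    """
    uf = UnionFind()
    first_owner = {}  # (key_type, key_value) -> first account seen with that key
    for account, key_type, key_value in links:
        uf.find(account)  # make sure every account is registered
        if key_type not in strong_keys:
            continue
        key = (key_type, key_value)
        if key in first_owner:
            uf.union(first_owner[key], account)
        else:
            first_owner[key] = account

    groups = defaultdict(list)
    for account in list(uf.parent):
        groups[uf.find(account)].append(account)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Joins are transitive. If a1 and a2 share a wallet, and a2 and a3 share a device, all three land in one cluster, even though a1 and a3 share nothing directly. A production graph would more plausibly weight edges by key type than treat every shared key as a hard join.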

The layers run asynchronously, and that matters. Layer 1 fires continuously and mostly stays below threshold. Layer 2 produces a slowly decaying per-account score. Layer 3 is event-driven, firing on graph changes (new accounts, shared fingerprints, suspicious table co-occurrence). Layer 4 is the bottleneck. Its queue is prioritised by combined risk score, expected revenue impact, recent withdrawal activity and, uniquely visible at BetOnline, external pressure events (forum complaints, media coverage, public refund demands).
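The queue ordering can be sketched as a weighted priority score over those four inputs. Everything in this sketch is hypothetical: the `Case` fields, the weights, and the $10k revenue cap. They are illustrative values chosen to reproduce the qualitative behaviour, with external pressure dominating. They are not measured parameters.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Case:
    """One review-queue entry (hypothetical schema)."""
    account_id: str
    risk_score: float        # combined L1-L3 model output, assumed in [0, 1]
    revenue_impact: float    # estimated exposure in dollars (hypothetical)
    recent_withdrawal: bool  # large withdrawal inside the look-back window
    external_pressure: bool  # named in a forum thread, complaint, or press piece


# Hypothetical weights, chosen only to illustrate the ordering; not BetOnline's.
W_RISK, W_REVENUE, W_WITHDRAWAL, W_PRESSURE = 1.0, 0.5, 0.3, 2.0
REVENUE_CAP = 10_000.0  # dollars; caps the revenue term at 1.0


def priority(case: Case) -> float:
    """Weighted sum of the four prioritisation inputs."""
    revenue_term = min(case.revenue_impact / REVENUE_CAP, 1.0)
    return (W_RISK * case.risk_score
            + W_REVENUE * revenue_term
            + W_WITHDRAWAL * float(case.recent_withdrawal)
            + W_PRESSURE * float(case.external_pressure))


def review_order(cases, capacity):
    """Account ids a queue with `capacity` reviewer slots would handle first."""
    return [c.account_id for c in heapq.nlargest(capacity, cases, key=priority)]
```

Consider three accounts. A quiet account has risk 0.8. A withdrawal-heavy account has risk 0.5. A publicly named account has risk 0.4. With these weights, a two-slot queue reviews the named and withdrawing accounts first. The highest-risk quiet account waits, which mirrors the reactive cadence described here.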

2014 and 2018: the documented bot cleanups

Two public events anchor the empirical picture of BetOnline's enforcement. Both were batched human-review actions, both were triggered by external pressure rather than the system firing in real time, and both led to refunds being issued — which is the strongest public signal of an operator admitting that detection lagged behind reality.

The 2014 incident involved a single botting ring detected after a series of TwoPlusTwo forum threads documenting suspiciously similar play patterns across multiple accounts. BetOnline acknowledged the issue, banned the accounts, and processed limited refunds. The scale was small — a handful of accounts — but the timeline (forum pressure for weeks before action) revealed the reactive cadence of the review queue.

The 2018 action was larger. Multiple bot rings were caught in a coordinated sweep after extended forum and media coverage. The cleanup ran over several weeks and ended in account closures, balance confiscations, and refunds to opponents who had played significant volume against the offending accounts. The operator did not publish detection-system internals (no operator ever does), but the pattern matched 2014. The behavioural and play-pattern signals had likely been visible for months, and the queue advanced under external pressure.

Two structural inferences follow. First, the per-account detection probability inside a quiet stretch is meaningfully lower than the literature on adversarial classification would predict for a stationary classifier. Second, the per-account detection probability inside a sweep is meaningfully higher — accounts running an obvious play-pattern signal that had been ignored for months get caught in a single batch action.

Signal weights and observable failure modes

Exact signal weights are operator-confidential. The relative weights below are inferred from the observable pattern of which accounts get caught, in what sequence, and after what triggering events — pieced together over years of operator-side conversations, forum reports, and bust post-mortems.

Detection signals × observable weight × failure mode (BetOnline)
Signal	Layer	Relative weight	Naive failure mode
Action-timing variance < population	L1	Medium–High	Constant-latency action emission
Mouse-path linearity on desktop	L1	Medium	Straight-line cursor on every action
Idle behaviour between hands too uniform	L1	Low–Medium	No tab-switch, no chat, no micro-movement
VPIP/PFR at population mass with low variance	L2	High	Pure GTO baseline, no human-noise overlay
Bet sizing clustered on exact pot fractions	L2	High	Solver output without sizing perturbation
Win rate persistently outside skill-pool envelope	L2	Very High	Sustained high winrate at mid stakes with no human sessions
Shared crypto wallet across accounts	L3	Very High	Bot farm funded from one BTC/ETH address
Shared device fingerprint across accounts	L3	Very High	Bot farm on one IP / device
Large first withdrawal after long quiet period	L3+L4	High	Patient grind for 6 months, then big-bang cashout
External forum complaint or media coverage	L4	Very High (event-driven)	Account becomes a named example in a public thread
Chat behaviour: zero outgoing messages over 5k+ hands	L4	Medium	Bot never says "nh"
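On the detection side, the "bet sizing clustered on exact pot fractions" row can be turned into a one-sided binomial test. This sketch assumes the pool's near-canonical rate is already known. It also assumes that the canonical sizing menu is the hypothetical `CANONICAL` tuple below. Neither the tolerance nor the menu comes from BetOnline.

```python
import math

# Hypothetical solver sizing menu (fractions of the pot); not BetOnline data.
CANONICAL = (1 / 3, 1 / 2, 2 / 3, 3 / 4, 1.0)


def near_canonical(bet, pot, tol=0.005):
    """True if bet/pot lands within `tol` of a canonical fraction. Assumes pot > 0."""
    ratio = bet / pot
    return any(abs(ratio - f) <= tol for f in CANONICAL)


def sizing_clustering_z(bets, population_rate):
    """One-sided binomial z-score for canonical-size clustering.

    bets: list of (bet_amount, pot_before_bet) pairs for one account.
    population_rate: share of pool bets near a canonical fraction (assumed known).
    Positive values mean the account hits exact fractions more often than the pool.
    """
    if not 0.0 < population_rate < 1.0:
        raise ValueError("population_rate must be strictly between 0 and 1")
    n = len(bets)
    if n == 0:
        return 0.0
    hits = sum(near_canonical(b, p) for b, p in bets)
    expected = n * population_rate
    sd = math.sqrt(n * population_rate * (1.0 - population_rate))
    return (hits - expected) / sd
```

Take an account with 180 of 200 bets on exact fractions, against an assumed pool rate of 0.35. It scores around z ≈ 16. An account matching the pool rate scores zero. A single z-score like this only flags the account; in the layered model above, it would feed a Layer 2 score rather than trigger action on its own.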

The signal pattern that gets accounts caught is consistent across the 2014 and 2018 cleanups. It combines an L2 statistical-outlier score that had accumulated quietly for months with an L4 triggering event, usually external. That event pushed the account from "flagged in the long tail" to "actioned." The accounts that survive long-term are those that stay near population distributions on L1 and L2 and avoid the L4 triggers. That is not a checklist; it is a description of where the EV-detection frontier sits empirically.

Action-timing fingerprints

Action-timing distributions are the most-discussed and worst-implemented signal in the bot literature. A naive implementation fires at constant intervals or with uniform noise around a centroid — both are statistically catastrophic.

Real human action-timing distributions are log-normal in shape, with heavy right tails, and the location parameter shifts with decision difficulty. A snap-fold with clear trash resolves in 600–1200 ms. A routine flop continuation bet on a clean board lands in 1.5–4 seconds. A boundary river call against a triple-barrel sits in the 5–30 second range. Distraction events (a phone notification, a conversation, a bathroom break) produce an independent 8–25 second tail at roughly 3% per action. The shape of the distribution is the fingerprint, not the mean.

# Schematic: behaviourally-shaped action timing
# Conceptual sketch, not the production implementation
import math
import random

def sample_action_delay(decision_difficulty, action_type, hand_state):
    """Return seconds-to-act drawn from a state-conditional log-normal."""
    # decision_difficulty in [0,1]: 0 = trivial fold, 1 = boundary call

    mu_base = {
        'fold_trivial':    math.log(0.9),
        'cbet_routine':    math.log(2.4),
        'check_routine':   math.log(1.6),
        'river_boundary':  math.log(8.5),
        'all_in_decision': math.log(12.0),
    }[action_type]

    mu = mu_base + 0.7 * decision_difficulty
    sigma = 0.35 + 0.55 * decision_difficulty

    delay = random.lognormvariate(mu, sigma)

    # ~3% chance of independent distraction tail
    if random.random() < 0.03:
        delay += random.uniform(8, 25)

    # Floor at a non-zero minimum — humans cannot react under 250ms
    return max(0.25, delay)

This is schematic. Production systems condition on more state — stack depth, opponent action sequence, position, multiway versus heads-up, table count, a session-alertness parameter that drifts down over long sessions to mimic fatigue. The right behaviour is not "add noise"; it is "draw from a distribution whose shape matches the population, conditioned on state."
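From the detection side, the "shape, not mean" point gives a crude Layer 1 check: compare the spread of an account's log action delays against the pool's. This is a sketch, with `population_sigma`, `ratio_threshold`, and `min_actions` as hypothetical inputs. It only catches the constant-latency case. A fuller check would compare the whole distribution, for instance with a two-sample Kolmogorov–Smirnov test such as `scipy.stats.ks_2samp`.

```python
import math
import statistics


def log_delay_sigma(delays):
    """Sample standard deviation of log action delays (needs >= 2 positive delays)."""
    logs = [math.log(d) for d in delays if d > 0]
    return statistics.stdev(logs)


def flag_low_timing_variance(delays, population_sigma,
                             ratio_threshold=0.5, min_actions=200):
    """Flag an account whose log-delay spread is far narrower than the pool's.

    delays: seconds-to-act for one account's actions.
    population_sigma: log-delay std measured on the player pool (assumed known).
    ratio_threshold, min_actions: hypothetical tuning values.
    """
    if len(delays) < min_actions:
        return False  # too little volume to judge either way
    return log_delay_sigma(delays) < ratio_threshold * population_sigma
```

The check works on log delays because the target distribution is log-normal: a multiplicative jitter of a few percent around 1.2 s has a log spread near 0.01. Any log-normal pool with a sigma in the 0.5–1.0 range leaves that far below threshold.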

HUD policy and what it implies for collusion

BetOnline's stated terms permit read-only HUD tools and prohibit real-time advice software. In practice, the standard tracker products (Holdem Manager 3, PokerTracker 4) run unobstructed against the Chico client. Hand histories save to disk. Long-horizon stats accumulate against stable screen names. There is no operator-side anti-HUD process detection of the kind GGPoker ships.

That policy has two consequences for the detection picture. For bot authors, the long-horizon HUD is an opponent-modelling prior with no extra acquisition cost; the data-mined HUD that died at GGPoker survives here. For the operator, the permissive HUD environment widens the collusion surface — colluding players can share hand histories cheaply, and the operator has to lean harder on the L3 graph layer to compensate. The 2018 cleanup notably involved both bots and collusion patterns, which is consistent with this structural feature: at an operator with stable IDs and tolerated trackers, collusion becomes easier and detection has to absorb that.

Anti-detection as adversarial classification

The standard mistake among bot builders is to treat detection as a feature checklist — add latency noise, vary touch coordinates, randomise schedule, never play more than eight hours. This is the wrong frame. Detection is an adversarial classifier: the operator builds a model that distinguishes bot behaviour from human behaviour, and the bot's task is to produce a behaviour distribution the classifier cannot separate from the population while preserving EV.

The formal literature begins with Dalvi et al. (2004), Adversarial Classification, KDD, and Lowd & Meek (2005), Adversarial Learning, KDD. The setting maps cleanly: an attacker chooses an action distribution that maximises expected utility under a classifier whose decision boundary it can probe but not fully observe. The modern adversarial-ML literature (Goodfellow et al. 2014 onward, the certified-robustness lineage) extends this with neural classifiers and gradient-based probing.

Three operational consequences fall out:

The classifier's decision boundary is non-stationary: Operators retrain. Behaviour that passed in 2024 may not pass in 2026, and behaviour that passes in BetOnline's quiet stretch may not pass during a public-pressure-driven sweep. The right risk model is non-stationary detection probability, not a fixed per-account number.
Population baseline is the reference, not "looking human": The classifier separates your distribution from the population distribution — not from some general notion of "human-looking." If the NL50 6-max population at BetOnline has a specific bet-sizing histogram with an extended tail on small overbets, the bot needs that shape too. The target is not anthropomorphism; the target is statistical indistinguishability from the pool.
EV vs detection is the right optimisation: Pure-GTO output maximises EV under fixed opponents. Behaviourally-shaped output gives up some EV for a lower detection score. The right operating point is not zero detection — it is EV-maximising under a budgeted detection probability over the account's expected lifetime, with the lifetime model accounting for BetOnline's bursty enforcement pattern.

This frame resolves a frequent apparent contradiction. Pure-GTO bots tend to get caught faster than well-noised slightly-sub-GTO bots. The GTO bot makes more EV per hand but is more separable from the population on bet-sizing and frequency, so its expected hand count before action is lower. The optimisation is not "win more per hand" — it is "win more in expectation over the account lifetime, accounting for the detection-induced lifetime distribution."


References and related work

Selected sources, listed by name and identifier (arXiv where available; otherwise the Science, KDD, or NeurIPS proceedings).

Brown & Sandholm, 2019. Superhuman AI for multiplayer poker. Science 365 (Pluribus). Reference result for 6-max NLH at superhuman level. arXiv:1905.10311.
Moravčík et al., 2017. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356. arXiv:1701.01724.
Brown & Sandholm, 2017. Safe and nested subgame solving for imperfect-information games. NeurIPS (Libratus core technique).
Dalvi, Domingos, Mausam, Sanghai & Verma, 2004. Adversarial Classification. KDD. Foundational paper on the adversarial-classifier framing.
Lowd & Meek, 2005. Adversarial Learning. KDD. Probing the decision boundary of a deployed classifier.
Heinrich & Silver, 2016. Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. NIPS DRL workshop. arXiv:1603.01121.

The companion notes cover the broader picture: why "BetOnline hacks" do not exist, and the homepage overview of what we mean by "poker bot" in 2026 for this specific room. The FAQ answers the implementation questions that come up most often in chat.