Errors and Retries

The SDK maps every YDB server status code to a Python exception class. All exceptions inherit from ydb.Error, so you can catch the whole hierarchy with a single except ydb.Error clause or handle individual error types precisely.

Exception Hierarchy

ydb.Error
├── ydb.BadRequest          — malformed query or invalid argument
├── ydb.Unauthorized        — authentication succeeded but operation not permitted
├── ydb.Unauthenticated     — authentication failed
├── ydb.InternalError       — server-side internal error (usually transient)
├── ydb.Aborted             — transaction aborted due to a conflict; safe to retry
├── ydb.Unavailable         — service temporarily unavailable; safe to retry
├── ydb.Overloaded          — server is overloaded; retry with backoff
├── ydb.SchemeError         — schema-related error (e.g. table not found, already exists)
├── ydb.GenericError        — unclassified server error
├── ydb.Timeout             — server-side deadline exceeded
├── ydb.BadSession          — session is invalid or expired; create a new one
├── ydb.PreconditionFailed  — operation precondition not met
├── ydb.AlreadyExists       — object already exists (DDL)
├── ydb.NotFound            — object not found
├── ydb.SessionExpired      — session TTL expired
├── ydb.Cancelled           — operation was cancelled
├── ydb.Undetermined        — outcome is unknown (e.g. network lost before ack)
├── ydb.Unsupported         — operation not supported by this server version
├── ydb.SessionBusy         — session is executing another request
├── ydb.ExternalError       — error in an external data source
├── ydb.TruncatedResponseError — result set was truncated by the server
├── ydb.SessionPoolEmpty    — all sessions are busy; pool exhausted
├── ydb.SessionPoolClosed   — session pool has been stopped
├── ydb.ConnectionError     — base class for transport-level errors
│   ├── ydb.ConnectionFailure  — could not establish connection
│   ├── ydb.ConnectionLost     — connection dropped mid-request
│   └── ydb.Unimplemented      — server does not support this RPC
└── ydb.DeadlineExceed      — client-side deadline exceeded

Catching Errors

import ydb

try:
    pool.execute_with_retries("SELECT * FROM users")
except ydb.SchemeError:
    print("table does not exist")
except ydb.Unauthorized:
    print("access denied")
except ydb.Unavailable:
    print("service temporarily unavailable")
except ydb.Error as e:
    print(f"other YDB error: {e}")

Each exception exposes:

  • str(e) — human-readable message from the server.

  • e.issues — list of structured IssueMessage objects with message, issue_code, and severity.

  • e.status — the ydb.StatusCode enum value.

try:
    pool.execute_with_retries("BAD QUERY")
except ydb.Error as e:
    print(e.status)    # e.g. StatusCode.BAD_REQUEST
    print(e.message)
    if e.issues:
        for issue in e.issues:
            print(issue.message, issue.severity)
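
The attributes above can be folded into a single log line. This is a minimal sketch: describe_error, FakeError, and FakeIssue are hypothetical names, and the two stand-in classes only mimic the status, issues, message, and severity attributes so the snippet runs without a database.

```python
class FakeIssue:
    """Stand-in for an IssueMessage: only the two fields used below."""

    def __init__(self, message, severity):
        self.message = message
        self.severity = severity


class FakeError(Exception):
    """Stand-in for a ydb.Error subclass (assumption: same attribute names)."""

    status = "BAD_REQUEST"

    def __init__(self, message, issues=None):
        super().__init__(message)
        self.issues = issues


def describe_error(err):
    """Render an exception that has `status` and `issues` as one line."""
    status = getattr(err, "status", None)
    # Enum members expose .name; anything else is shown via str().
    status_text = getattr(status, "name", str(status))
    parts = [f"{type(err).__name__}[{status_text}]: {err}"]
    for issue in getattr(err, "issues", None) or []:
        parts.append(f"{issue.message} (severity={issue.severity})")
    return " | ".join(parts)


line = describe_error(FakeError("boom", [FakeIssue("bad token", 1)]))
print(line)
```

With a real exception the same helper would be called as describe_error(e) inside the except ydb.Error block.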

Retriable vs Non-Retriable Errors

Not every error is worth retrying. The SDK classifies errors into four groups:

Always retriable (fast backoff):

  • Unavailable — service is temporarily unavailable.

  • ClientInternalError — internal SDK error.

  • SessionExpired — session TTL passed; the SDK opens a new one.

  • NotFound — by default retried (configurable via retry_not_found).

  • Cancelled — only when retry_cancelled=True is set.

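Retriable with slow backoff:

  • Overloaded: the server is overloaded and needs time to recover.

  • Aborted: the transaction was aborted due to a conflict.

  • Connection failures: transport-level errors.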

Retriable only for idempotent operations (slow backoff):

  • Undetermined — outcome unknown; only safe to retry if the operation is idempotent (i.e. repeating it has the same effect as running it once). Set idempotent=True in RetrySettings to enable.

Never retried:

  • Errors that repeating the request cannot fix, such as BadRequest, Unauthorized, and SchemeError.
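
As a reading aid, the grouping can be modeled as a small lookup. This is a sketch of the rules as described in this section, not the SDK's own code: backoff_kind is a hypothetical name, it works on exception class names rather than real exception objects, and the slow group contains only the two classes (Overloaded, Aborted) that the BackoffSettings section names explicitly.

```python
from typing import Optional

# Groups as described in the text; deliberately incomplete.
FAST = {"Unavailable", "ClientInternalError", "SessionExpired"}
SLOW = {"Overloaded", "Aborted"}


def backoff_kind(
    error_name: str,
    *,
    idempotent: bool = False,
    retry_not_found: bool = True,
    retry_cancelled: bool = False,
) -> Optional[str]:
    """Return "fast", "slow", or None (do not retry) for an exception class name."""
    if error_name in FAST:
        return "fast"
    if error_name == "NotFound":
        return "fast" if retry_not_found else None
    if error_name == "Cancelled":
        return "fast" if retry_cancelled else None
    if error_name in SLOW:
        return "slow"
    if error_name == "Undetermined":
        # Only safe when repeating the operation has no extra effect.
        return "slow" if idempotent else None
    return None


for name in ("Unavailable", "Overloaded", "Undetermined", "SchemeError"):
    print(name, backoff_kind(name))
```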

RetrySettings

All pool methods that perform retries (execute_with_retries, retry_operation_sync, retry_tx_sync) accept an optional RetrySettings object:

import ydb

retry = ydb.RetrySettings(
    max_retries=5,               # max retry attempts (default: 10)
    idempotent=False,            # set True to also retry Undetermined errors
    retry_cancelled=False,       # set True to retry Cancelled errors
)

pool.execute_with_retries("SELECT 1", retry_settings=retry)

BackoffSettings

RetrySettings uses two separate backoff curves:

  • fast backoff — used for errors expected to clear quickly (Unavailable, SessionExpired, etc.).

  • slow backoff — used for errors where the server needs more breathing room (Overloaded, Aborted, connection failures, etc.).

Both are instances of BackoffSettings:

fast = ydb.BackoffSettings(
    ceiling=10,           # exponent cap: max wait slot = 2^ceiling * slot_duration
    slot_duration=0.005,  # base time unit in seconds (default: 5 ms)
    uncertain_ratio=0.5,  # fraction of the window that is randomised (jitter)
)

slow = ydb.BackoffSettings(
    ceiling=6,
    slot_duration=1.0,    # 1 second base (default)
    uncertain_ratio=0.5,
)

retry = ydb.RetrySettings(
    max_retries=10,
    fast_backoff_settings=fast,
    slow_backoff_settings=slow,
)

The actual sleep duration for retry n is:

slots  = 2 ^ min(n, ceiling)
max_ms = slots * slot_duration * 1000
sleep  = max_ms * (random() * uncertain_ratio + (1 - uncertain_ratio)) / 1000
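
The formula can be checked with a few lines of plain Python. This is a sketch that transcribes the three lines above; backoff_sleep is a hypothetical name, and the rand parameter exists only so the random term can be pinned to 0 or 1 to show the bounds.

```python
import random


def backoff_sleep(n, ceiling, slot_duration, uncertain_ratio=0.5, rand=random.random):
    """Sleep time in seconds before retry n, following the formula above."""
    slots = 2 ** min(n, ceiling)
    max_ms = slots * slot_duration * 1000
    return max_ms * (rand() * uncertain_ratio + (1 - uncertain_ratio)) / 1000


def bounds(n, ceiling, slot_duration, uncertain_ratio=0.5):
    """Smallest and largest possible sleep for retry n."""
    low = backoff_sleep(n, ceiling, slot_duration, uncertain_ratio, rand=lambda: 0.0)
    high = backoff_sleep(n, ceiling, slot_duration, uncertain_ratio, rand=lambda: 1.0)
    return low, high


# Fast settings from the example above: ceiling=10, slot_duration=0.005.
for n in (0, 3, 10, 50):
    print("fast", n, bounds(n, 10, 0.005))

# Slow settings from the example above: ceiling=6, slot_duration=1.0.
for n in (0, 3, 6, 50):
    print("slow", n, bounds(n, 6, 1.0))
```

With uncertain_ratio=0.5 each sleep falls between half of the maximum and the maximum. Once n reaches the ceiling the window stops growing: at most 2^10 * 0.005 = 5.12 seconds for the fast settings and 2^6 * 1.0 = 64 seconds for the slow settings.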

Error Callback

To log or instrument every retry attempt, pass on_ydb_error_callback:

import logging

import ydb

logger = logging.getLogger(__name__)

def log_retry(err: ydb.Error):
    logger.warning("YDB error, will retry: %s", err)

retry = ydb.RetrySettings(
    max_retries=10,
    on_ydb_error_callback=log_retry,
)

pool.execute_with_retries("SELECT 1", retry_settings=retry)
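
The callback can be any callable that takes the error, so it can also feed a metric instead of a log. The sketch below is one way to do that: RetryCounter is a hypothetical helper, and the demo calls it directly with built-in exceptions rather than through the SDK.

```python
import collections
import threading


class RetryCounter:
    """Callable that counts the errors it is given, keyed by exception class name."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = collections.Counter()

    def __call__(self, err):
        # The lock keeps the counter consistent if several threads share one instance.
        with self._lock:
            self.counts[type(err).__name__] += 1


counter = RetryCounter()

# Stand-in calls; in real use the retry loop would invoke the callback.
counter(TimeoutError("first"))
counter(TimeoutError("second"))
counter(ConnectionResetError("third"))

print(dict(counter.counts))
```

Passing the instance as on_ydb_error_callback=counter in RetrySettings should then tally every error the retry loop reports, under the assumption that the callback receives the exception object as in the log_retry example.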

@ydb_retry Decorator

For functions that run outside a session pool, use the @ydb.ydb_retry decorator. It wraps both synchronous and asynchronous functions:

import ydb

@ydb.ydb_retry(max_retries=5, idempotent=True)
def fetch_user(driver: ydb.Driver, user_id: int):
    with ydb.QuerySessionPool(driver) as pool:
        result = pool.execute_with_retries(
            "SELECT name FROM users WHERE id = $id",
            parameters={"$id": user_id},
        )
        return result[0].rows[0]["name"]

# Async version — the decorator detects coroutines automatically:
@ydb.ydb_retry(max_retries=5, idempotent=True)
async def fetch_user_async(driver: ydb.aio.Driver, user_id: int):
    async with ydb.aio.QuerySessionPool(driver) as pool:
        result = await pool.execute_with_retries(
            "SELECT name FROM users WHERE id = $id",
            parameters={"$id": user_id},
        )
        return result[0].rows[0]["name"]

Note

The decorator retries the entire function on failure, not just the individual query. Only use idempotent=True when repeating the full function body is safe.
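
To see why the whole function body matters, here is a generic retry decorator in plain Python. It is not the implementation of ydb_retry, only a sketch of the same idea; retry_whole_call, flaky, and side_effects are hypothetical names, and ConnectionError here is the Python built-in.

```python
import asyncio
import functools
import inspect
import time


def retry_whole_call(max_retries, retriable=(Exception,), delay=0.0):
    """Re-run the entire decorated function on failure (sync or async)."""

    def decorator(func):
        if inspect.iscoroutinefunction(func):

            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                for attempt in range(max_retries + 1):
                    try:
                        return await func(*args, **kwargs)
                    except retriable:
                        if attempt == max_retries:
                            raise
                        await asyncio.sleep(delay)

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except retriable:
                    if attempt == max_retries:
                        raise
                    time.sleep(delay)

        return sync_wrapper

    return decorator


side_effects = []


@retry_whole_call(max_retries=3, retriable=(ConnectionError,))
def flaky():
    side_effects.append("audit row written")  # runs again on every attempt
    if len(side_effects) < 3:
        raise ConnectionError("transient")
    return "ok"


result = flaky()
print(result, len(side_effects))
```

The call succeeds on the third attempt, and the side effect before the failing step has happened three times. Anything in the function body that is not safe to repeat has the same problem.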

Handling Undetermined

Undetermined means the network was lost before the server could confirm whether the operation succeeded or failed. The server may have applied the write — or not.

For read-only queries this is always safe to retry. For writes, only retry if you can tolerate duplicates or the query is naturally idempotent (e.g. UPSERT):

retry = ydb.RetrySettings(
    max_retries=10,
    idempotent=True,  # enables retry on Undetermined
)

# Safe: UPSERT is idempotent
pool.execute_with_retries(
    "UPSERT INTO events (id, data) VALUES (42, 'payload')",
    retry_settings=retry,
)

# Not idempotent: if the first attempt was applied but its reply was lost,
# the retried INSERT fails on the duplicate key. That error is not retriable,
# so the call raises even though the row was written.
pool.execute_with_retries(
    "INSERT INTO events (id, data) VALUES (42, 'payload')",
    retry_settings=retry,
)
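
The difference between the two statements can be reproduced with a toy in-memory model. Nothing here talks to YDB: the dict plays the table, LostAck stands in for Undetermined, DuplicateKey stands in for the server rejecting a second INSERT, and all names are hypothetical.

```python
class DuplicateKey(Exception):
    """Stand-in for the server rejecting an INSERT on an existing key."""


class LostAck(Exception):
    """Stand-in for Undetermined: the write was sent, the reply never arrived."""


table = {}


def upsert(key, value):
    table[key] = value


def insert(key, value):
    if key in table:
        raise DuplicateKey(key)
    table[key] = value


def send(write, key, value, lose_ack):
    """Apply the write on the 'server', then optionally lose the reply."""
    write(key, value)
    if lose_ack:
        raise LostAck()


def run_with_one_retry(write, key, value):
    try:
        send(write, key, value, lose_ack=True)   # applied, but the client never hears back
    except LostAck:
        send(write, key, value, lose_ack=False)  # blind retry


run_with_one_retry(upsert, 42, "payload")
upsert_state = dict(table)  # same state as one successful call

table.clear()
try:
    run_with_one_retry(insert, 42, "payload")
    insert_outcome = "ok"
except DuplicateKey:
    insert_outcome = "duplicate key"  # the row exists, yet the caller sees a failure

print(upsert_state, insert_outcome, table)
```

In this model the retried upsert leaves exactly one row and reports success, while the retried insert leaves the same row but surfaces an error that the caller has to interpret.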

Common Patterns

Fail fast on connection errors during startup:

try:
    driver.wait(timeout=5, fail_fast=True)
except TimeoutError:
    raise SystemExit("Could not reach YDB — check endpoint and credentials")

Distinguish schema errors (fix the code) from transient errors (retry):

try:
    pool.execute_with_retries("SELECT * FROM nonexistent_table")
except ydb.SchemeError as e:
    raise RuntimeError(f"Schema problem: {e}") from e
except ydb.Error:
    # Transient errors were already retried inside execute_with_retries.
    # Reaching this point means the retries ran out or the error is not retriable.
    raise