Production Guide

This guide collects the operational concerns that come up when running aiographql-client in a long-lived service: session lifecycle, connection pool sizing, timeouts, retries, observability, and shutdown.

Sections build on Configuring Transport, Authentication, and Errors and Exceptions; each one is referenced where relevant rather than restated.

Session Lifecycle

For any process that handles more than a handful of requests, provide your own transport session and reuse it across the lifetime of the client. The client does not close sessions it did not create; ownership stays with the caller.

The recommended shape is an application-scoped session bound to the application lifecycle (FastAPI startup/shutdown hooks, the entry coroutine passed to asyncio.run, a contextlib.AsyncExitStack, etc.):

import aiohttp
from aiographql.client import GraphQLClient

async def main():
    async with aiohttp.ClientSession(
        connector=aiohttp.TCPConnector(limit=200),
    ) as session:
        client = GraphQLClient(
            endpoint="https://api.example.com/graphql",
            session=session,
        )
        await run_application(client)
    # Session closes here; in-flight requests are aborted.

For httpx:

import httpx
from aiographql.client import GraphQLClient

async def main():
    async with httpx.AsyncClient(
        limits=httpx.Limits(max_connections=200, max_keepalive_connections=50),
    ) as session:
        client = GraphQLClient(
            endpoint="https://api.example.com/graphql",
            session=session,
        )
        await run_application(client)
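
When several resources share the application lifecycle, contextlib.AsyncExitStack keeps ownership in one place. The following is a minimal sketch that runs without a network. StubSession and run_with_owned_session are hypothetical names used only for illustration; a real service would enter aiohttp.ClientSession or httpx.AsyncClient on the stack instead.

```python
import asyncio
from contextlib import AsyncExitStack


class StubSession:
    """Hypothetical stand-in for aiohttp.ClientSession / httpx.AsyncClient."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "StubSession":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.closed = True


async def run_with_owned_session(work):
    """Open the session once, hand it to `work`, and close it on the way out."""
    async with AsyncExitStack() as stack:
        session = await stack.enter_async_context(StubSession())
        # A GraphQLClient(endpoint=..., session=session) would be built here.
        result = await work(session)
    # The stack has unwound, so the caller-owned session is closed by now.
    return session, result


async def _demo():
    async def work(session):
        return session.closed  # still open while the application runs

    return await run_with_owned_session(work)


session, closed_during_work = asyncio.run(_demo())
```

The stack closes everything registered on it in reverse order, so additional resources (a second session, a database pool) can be added without nesting more async with blocks.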

Connection Pool Sizing

The default aiohttp connector limit is 100, and the httpx default for max_connections is also 100. Two patterns push past it:

  • High request concurrency. Size the pool above your steady-state request fan-out, leaving headroom for retries.

  • Subscriptions. Each subscription holds an open WebSocket for its lifetime and counts against the pool. Add the expected concurrent subscription count to the request fan-out estimate.

Set the limit on the connector for aiohttp or on Limits for httpx. Going too low causes new requests to queue; going too high wastes file descriptors and lets a stalled backend retain more in-flight work.
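
The sizing rule above is simple arithmetic and can be sketched as a helper. estimate_pool_limit and its retry_headroom parameter are hypothetical; the 0.25 default is a placeholder assumption, not a value taken from the library or from either transport.

```python
import math


def estimate_pool_limit(
    request_fan_out: int,
    concurrent_subscriptions: int,
    retry_headroom: float = 0.25,
) -> int:
    """Return a connection limit covering requests, retries, and subscriptions.

    retry_headroom is the assumed fraction of the request fan-out reserved
    for retried requests.
    """
    if request_fan_out < 0 or concurrent_subscriptions < 0:
        raise ValueError("counts must be non-negative")
    if retry_headroom < 0:
        raise ValueError("retry_headroom must be non-negative")
    requests_with_retries = math.ceil(request_fan_out * (1 + retry_headroom))
    # Each subscription is counted as one long-lived connection.
    return requests_with_retries + concurrent_subscriptions


limit = estimate_pool_limit(request_fan_out=120, concurrent_subscriptions=30)
```

The result would be passed as aiohttp.TCPConnector(limit=...) or httpx.Limits(max_connections=...). Measure the actual fan-out before trusting the estimate.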

Timeouts

The library does not impose timeouts. Configure them on the session:

timeout = aiohttp.ClientTimeout(total=30, connect=5, sock_read=10)
async with aiohttp.ClientSession(timeout=timeout) as session:
    ...

For httpx:

timeout = httpx.Timeout(30.0, connect=5.0, read=10.0)
async with httpx.AsyncClient(timeout=timeout) as session:
    ...

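With aiohttp, total caps end-to-end latency for a request, while connect and sock_read catch unresponsive peers without waiting out the full budget. httpx has no overall budget: the first argument to httpx.Timeout is the default applied separately to the connect, read, write, and pool phases, and the connect and read keywords override it for those phases. Pick values that match the slowest acceptable user-facing latency for the operation.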
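
Where a single end-to-end deadline is needed regardless of transport, one option is to wrap the call in asyncio.wait_for. The following is a minimal sketch; with_deadline is a hypothetical helper, and slow_query is a stand-in for client.query(request) that sleeps instead of doing I/O.

```python
import asyncio


async def with_deadline(awaitable, seconds: float):
    """Cap the end-to-end time of one call, whatever the transport does."""
    return await asyncio.wait_for(awaitable, timeout=seconds)


async def slow_query(delay: float) -> str:
    # Stand-in for `client.query(request)`; sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    return "ok"


async def _demo():
    fast = await with_deadline(slow_query(0.01), seconds=1.0)
    try:
        await with_deadline(slow_query(1.0), seconds=0.05)
    except asyncio.TimeoutError:
        timed_out = True
    else:
        timed_out = False
    return fast, timed_out


fast, timed_out = asyncio.run(_demo())
```

asyncio.wait_for cancels the wrapped call when the deadline passes, so the transport releases the connection rather than leaving the request running in the background.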

Retries

The client does not retry. Wrap calls with tenacity (or any retry library) and limit retries to transient failures only: connection errors, timeouts, and HTTP 5XX. Do not retry validation errors or 4XX responses.

import asyncio

from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)
from aiographql.client import GraphQLRequestException
from aiographql.client.exceptions import GraphQLTransportException

@retry(
    retry=retry_if_exception_type((GraphQLTransportException, asyncio.TimeoutError)),
    wait=wait_exponential(multiplier=0.5, min=0.5, max=8),
    stop=stop_after_attempt(4),
    reraise=True,
)
async def query_with_retry(client, request):
    return await client.query(request)

For HTTP 5XX-specific retries, catch GraphQLRequestException and inspect exc.response.json to decide whether to retry.
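
The transient-only rule can be sketched without a retry library. UpstreamError and its status attribute are hypothetical stand-ins, since how the HTTP status is exposed on the library's exceptions is not shown here. is_transient and call_with_retry are likewise illustrative names.

```python
import asyncio


class UpstreamError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned {status}")
        self.status = status


def is_transient(exc: BaseException) -> bool:
    """Retry connection errors, timeouts, and 5XX responses; never 4XX."""
    if isinstance(exc, (ConnectionError, asyncio.TimeoutError)):
        return True
    return isinstance(exc, UpstreamError) and 500 <= exc.status < 600


async def call_with_retry(func, attempts: int = 4, base_delay: float = 0.5):
    """Call `func` until it succeeds, fails permanently, or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            # Exponential backoff: base_delay, 2x, 4x, ... capped at 8 seconds.
            await asyncio.sleep(min(base_delay * 2 ** (attempt - 1), 8.0))


async def _demo():
    calls = {"flaky": 0, "bad": 0}

    async def flaky():
        calls["flaky"] += 1
        if calls["flaky"] < 3:
            raise UpstreamError(503)
        return "ok"

    async def bad_request():
        calls["bad"] += 1
        raise UpstreamError(400)

    result = await call_with_retry(flaky, base_delay=0.001)
    try:
        await call_with_retry(bad_request, base_delay=0.001)
    except UpstreamError:
        pass  # 4XX is not retried, so this fails on the first attempt
    return result, calls["flaky"], calls["bad"]


result, flaky_calls, bad_calls = asyncio.run(_demo())
```

The same predicate could be handed to tenacity through retry_if_exception in place of retry_if_exception_type.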

Schema Validation

Pre-flight validation catches programmer errors before they hit the network, but it requires introspection on the server.

  • Development: keep validate=True. Bad queries fail fast with a clear message.

  • Production with introspection disabled: set validate=False on the client. Surface server-side validation as GraphQLRequestException or via response.errors.

  • Production with introspection enabled: consider setting schema_ttl on the client so the introspected schema refreshes periodically without restarting the service.

See Errors and Exceptions for handling each layer.
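
The three cases above can be folded into one place where the client is configured. The following is a sketch only: validation_kwargs is a hypothetical helper, it assumes validate and schema_ttl are accepted as client keyword arguments as described above, and any schema_ttl value is passed through unchanged.

```python
from typing import Any, Dict, Optional


def validation_kwargs(
    production: bool,
    introspection_enabled: bool,
    schema_ttl: Optional[int] = None,
) -> Dict[str, Any]:
    """Map the deployment scenario to the client's validation settings."""
    if not introspection_enabled:
        # Without introspection there is no schema to validate against.
        return {"validate": False}
    kwargs: Dict[str, Any] = {"validate": True}
    if production and schema_ttl is not None:
        # Refresh the introspected schema periodically in long-lived services.
        kwargs["schema_ttl"] = schema_ttl
    return kwargs


dev = validation_kwargs(production=False, introspection_enabled=True)
locked_down = validation_kwargs(production=True, introspection_enabled=False)
refreshing = validation_kwargs(
    production=True, introspection_enabled=True, schema_ttl=300
)
```

The returned dictionary would be unpacked into the client constructor alongside endpoint and session.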

Observability

Logging and tracing belong on the transport session, not on the GraphQL client. Use aiohttp’s TraceConfig or httpx’s event hooks to capture request/response metadata for every call, including ones that fail before the GraphQL layer sees them.

Minimal aiohttp example:

import asyncio
import logging

import aiohttp
from aiographql.client import GraphQLClient

log = logging.getLogger("graphql")

async def on_request_start(session, ctx, params):
    ctx.start = asyncio.get_running_loop().time()

async def on_request_end(session, ctx, params):
    elapsed = asyncio.get_running_loop().time() - ctx.start
    log.info(
        "graphql %s %s %.3fs",
        params.method,
        params.url,
        elapsed,
    )

trace_config = aiohttp.TraceConfig()
trace_config.on_request_start.append(on_request_start)
trace_config.on_request_end.append(on_request_end)

async with aiohttp.ClientSession(trace_configs=[trace_config]) as session:
    client = GraphQLClient(endpoint="...", session=session)

For tracing, install an OpenTelemetry instrumentation for the underlying transport (opentelemetry-instrumentation-aiohttp-client or opentelemetry-instrumentation-httpx); spans cover every call the client makes.

Graceful Shutdown

The client itself is cheap to construct and discard. The expensive part is the session. On shutdown:

  1. Stop accepting new work that would call the client.

  2. Cancel or drain any subscriptions (GraphQLSubscription.unsubscribe).

  3. Await in-flight queries with a deadline.

  4. Exit the async with block (or call await session.close()) so the transport closes its connector and releases file descriptors.

Skipping step 4 in a long-running service leaks connections. In a short-lived script the operating system reclaims them at process exit, but aiohttp still emits Unclosed client session warnings.
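
Steps 1 and 3 can be sketched with plain asyncio. InFlightTracker is a hypothetical helper, not part of the library. Subscription handling (step 2) and closing the session (step 4) are left to the caller.

```python
import asyncio
from typing import Set


class InFlightTracker:
    """Tracks tasks that use the client so shutdown can drain them."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()
        self._accepting = True

    def submit(self, coro) -> asyncio.Task:
        if not self._accepting:
            coro.close()  # avoid a "coroutine was never awaited" warning
            raise RuntimeError("shutting down; no new work accepted")
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self, deadline: float) -> int:
        """Stop accepting work, wait up to `deadline` seconds, cancel the rest.

        Returns the number of tasks that had to be cancelled.
        """
        self._accepting = False  # step 1: refuse new work
        if not self._tasks:
            return 0
        # Step 3: give in-flight work a bounded amount of time to finish.
        _done, pending = await asyncio.wait(set(self._tasks), timeout=deadline)
        for task in pending:
            task.cancel()
        # Wait for the cancellations to be delivered before returning.
        await asyncio.gather(*pending, return_exceptions=True)
        return len(pending)


async def _demo():
    tracker = InFlightTracker()
    quick = tracker.submit(asyncio.sleep(0.01, result="done"))
    tracker.submit(asyncio.sleep(10))  # will not finish before the deadline
    cancelled = await tracker.drain(deadline=0.1)
    # Step 4 (leaving the session's `async with` block) would follow here.
    return quick.result(), cancelled


quick_result, cancelled = asyncio.run(_demo())
```

Draining before the session closes lets queries that are nearly finished complete normally instead of being aborted mid-response.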

Health Checks

A liveness probe should not call the upstream GraphQL endpoint. A readiness probe can, but should issue a minimal query (often { __typename }) and treat any failure (a GraphQLClientException subclass, a transport error, or a timeout) as “not ready” without bringing the pod down.

import asyncio

async def graphql_ready(client) -> bool:
    try:
        await asyncio.wait_for(
            client.query("{ __typename }"),
            timeout=2,
        )
    except Exception:
        # Any failure (client exception, transport error, timeout) means "not ready".
        return False
    return True