Write API Best Practices

Author: Siddharth Agrawal | Date: Sep 18, 2025

This document describes a set of best practices to follow when using the BigQuery Write API. These practices are being incorporated into the Aiven BigQuery Sink Connector.

TABLE OF CONTENTS

Enable Retries

Enable Multiplexing

Enable Java Client-Side Retries for Storage Write API

Author: Mariia Podgaietska | Date: Aug 5, 2025

Objective

Enable the built-in retry mechanism provided by the Storage Write API Java client library to handle request-level errors within the BigQuery Sink Connector.

Motivation

The connector currently implements its own retry mechanism to handle failures encountered during AppendRows calls. The BigQuery Storage Write API Java client library, however, already includes built-in, configurable retries (see documentation) for handling transient request-level errors.

By deferring retries of request-level errors to the Java client's internal retry mechanism, the connector can benefit from more advanced and tunable retry behaviour, such as exponential backoff. This would also simplify the connector's logic and provide a more standardized way of handling transient errors.

Design

Java Client Retry Configuration

When instantiating a JsonStreamWriter, the connector will construct and apply a RetrySettings object to enable the Java client's built-in retry mechanism. The existing connector configurations bigQueryRetry and bigQueryRetryWait, which currently control connector-level retries, will be reused to set maxAttempts and initialRetryDelay, respectively. Other useful settings, such as the retry delay multiplier (for example, setRetryDelayMultiplier(1.1)), can be given default values for now and exposed later through additional connector configs if needed.
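
To illustrate how maxAttempts, initialRetryDelay, and the delay multiplier interact, here is a minimal stdlib-only sketch of an exponential backoff schedule. It does not call the client library. BackoffSchedule, delaysMillis, and the maxDelayMs cap are hypothetical names introduced for illustration. The sketch assumes that maxAttempts counts the first attempt and that no jitter is applied, which may differ from the client's actual behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: models an exponential backoff schedule, not the client library's implementation. */
final class BackoffSchedule {

    /**
     * Returns the wait (in milliseconds) before each retry.
     * Assumption: maxAttempts includes the initial attempt, so there are maxAttempts - 1 waits.
     * Assumption: no jitter; each wait is capped at maxDelayMs (a hypothetical parameter).
     */
    static List<Long> delaysMillis(int maxAttempts, long initialDelayMs,
                                   double multiplier, long maxDelayMs) {
        List<Long> delays = new ArrayList<>();
        double delay = initialDelayMs;
        for (int retry = 1; retry < maxAttempts; retry++) {
            delays.add(Math.min(Math.round(delay), maxDelayMs));
            delay *= multiplier; // grow the wait for the next retry
        }
        return delays;
    }
}
```

Under these assumptions, a maxAttempts of 4, an initial delay of 1000 ms, a multiplier of 2.0, and a 60000 ms cap give waits of 1000, 2000, and 4000 ms before the task would see the failure.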

Error Handling Behaviour

The connector's existing retry logic remains important for handling error scenarios that require intervention and then a re-attempt of the write. Examples of such scenarios are:

- Schema mismatch: connector updates the table schema (if configured to do so) and reattempts the write.
- Missing table: connector creates the table (if configured to do so) and reattempts the write.
- Malformed records: connector reroutes the malformed records to the DLQ (if configured to do so) and reattempts the write.
- Closed stream: connector recreates the stream and reattempts the write.
- Request too large: connector reduces the batch size and reattempts the write.

For request-level errors (e.g., transient gRPC failures), however, we want to avoid compounded retries, where both the Java client and the connector retry the same request one after another. To achieve this, we will ensure that:

- If the error returned by the append call is a known logical error (one of those listed above), the connector will proceed with its own retry mechanism as before.
- If the error does not match any of the above cases, the task will fail immediately, as we can assume that either:
  - the append failed due to a non-retryable error, or
  - the append failed due to a request-level error, and the Java client has already exhausted all of its retry attempts.
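
The decision rule above can be sketched as a small classification step. This is an illustration under stated assumptions, not the connector's code: AppendErrorPolicy, ErrorKind, Action, and decide are hypothetical names, and how a real append exception is mapped to an ErrorKind is left out.

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative only: hypothetical types modelling the retry decision described above. */
final class AppendErrorPolicy {

    /** The known logical errors from the list above, plus a catch-all for everything else. */
    enum ErrorKind {
        SCHEMA_MISMATCH, MISSING_TABLE, MALFORMED_RECORDS, CLOSED_STREAM, REQUEST_TOO_LARGE, OTHER
    }

    enum Action { REMEDIATE_AND_RETRY, FAIL_TASK }

    // Errors the connector fixes itself (update schema, create table, and so on) before retrying.
    private static final Set<ErrorKind> CONNECTOR_HANDLED = EnumSet.of(
            ErrorKind.SCHEMA_MISMATCH,
            ErrorKind.MISSING_TABLE,
            ErrorKind.MALFORMED_RECORDS,
            ErrorKind.CLOSED_STREAM,
            ErrorKind.REQUEST_TOO_LARGE);

    /**
     * Anything outside the known set is either non-retryable or a request-level
     * error the Java client has already retried, so the task fails immediately.
     */
    static Action decide(ErrorKind kind) {
        return CONNECTOR_HANDLED.contains(kind) ? Action.REMEDIATE_AND_RETRY : Action.FAIL_TASK;
    }
}
```

Keeping the known cases in one explicit set means that any unrecognised error falls through to FAIL_TASK, which is what prevents the connector from retrying on top of the client's own retries.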

Enable Multiplexing for Storage Write API

As explained in the official documentation, the multiplexing or “connection management” feature is available when using the Write API default stream. This feature is currently turned off by default in the Java client library. Enabling it provides the following benefits:

- Minimizing the number of connections that are open simultaneously when multiple JsonStreamWriter objects write to one or more BigQuery tables. Existing connections are automatically shared by traffic intended for multiple tables that reside in the same destination region. Using fewer connections makes the data transfer more efficient and also avoids hitting BigQuery connection quota limits.
- Automatically scaling the number of connections used to send traffic to one or more destination tables up and down. The scaling is managed entirely within the Java client library, which simplifies client-side code. For example, suppose the connector is processing a single topic. As the rate of traffic from this topic increases, a single JsonStreamWriter might not be enough, because a single connection can become saturated at roughly 10 MB/s. With multiplexing enabled, the Java client library automatically adds new connections in the background to handle the increasing load.
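
As a rough, back-of-the-envelope illustration of both benefits, the sketch below compares connection counts with and without sharing. It is a toy model, not the client library's scaling policy, which is internal to the library. ConnectionEstimate and TableLoad are hypothetical names, the 10 MB/s figure is treated as a hard per-connection cap, and the model assumes one writer per table with perfectly even sharing inside a region.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a toy estimate of connection counts, not the client's real behaviour. */
final class ConnectionEstimate {

    // Approximate saturation point of one connection (assumption: treated as a hard cap).
    static final double PER_CONNECTION_MB_PER_SEC = 10.0;

    /** Hypothetical description of the traffic sent to one destination table. */
    static final class TableLoad {
        final String table;
        final String region;
        final double mbPerSec;

        TableLoad(String table, String region, double mbPerSec) {
            this.table = table;
            this.region = region;
            this.mbPerSec = mbPerSec;
        }
    }

    /** Connections needed to carry the given throughput; always at least one. */
    static int connectionsFor(double mbPerSec) {
        return Math.max(1, (int) Math.ceil(mbPerSec / PER_CONNECTION_MB_PER_SEC));
    }

    /** Without sharing: every table's writer needs its own connection(s). */
    static int withoutMultiplexing(List<TableLoad> loads) {
        int total = 0;
        for (TableLoad load : loads) {
            total += connectionsFor(load.mbPerSec);
        }
        return total;
    }

    /** With sharing: traffic for tables in the same region is pooled before sizing. */
    static int withMultiplexing(List<TableLoad> loads) {
        Map<String, Double> perRegion = new HashMap<>();
        for (TableLoad load : loads) {
            perRegion.merge(load.region, load.mbPerSec, Double::sum);
        }
        int total = 0;
        for (double mbPerSec : perRegion.values()) {
            total += connectionsFor(mbPerSec);
        }
        return total;
    }
}
```

In this model, three tables in one region receiving 2, 3, and 4 MB/s need three connections when each writer has its own, but a single shared connection carries the combined 9 MB/s. The same function also shows scale-up: once the pooled traffic exceeds the cap, the estimate grows by one connection per additional 10 MB/s.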

We recommend turning multiplexing on whenever the default stream is used. The Spark Dataproc connector, for example, follows this practice.