
Design Notes

The BigQuery sink connector supports two distinct paths for inserting data into BigQuery. The original BatchLoader path uses GCS to store intermediate files before writing them to tables in BigQuery. The second path uses the Storage Write API to stream the data to BigQuery.
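
Which path is used is driven by the connector configuration. Below is a minimal sketch; the connector class name, topic, project, and dataset keys are illustrative assumptions and not part of this note, while useStorageWriteApi is the option discussed here:

```java
import java.util.HashMap;
import java.util.Map;

public class SinkConfigExample {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();

        // Assumed boilerplate settings, shown only to make the example complete.
        props.put("connector.class", "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector");
        props.put("topics", "orders");
        props.put("project", "my-gcp-project");
        props.put("defaultDataset", "kafka_data");

        // Path selection: set useStorageWriteApi=true for the Storage Write API
        // streaming path; leave it false for the GCS-based batch-loading path.
        props.put("useStorageWriteApi", "false");

        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```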

The general flow

  1. Records come into the connector from Kafka.
  2. They are processed and converted into BigQuery table data.
  3. The table data is written to temporary files in GCS.
  4. The data from the files is written to BigQuery through either:
    1. Batch loading
    2. The Storage Write API
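
A rough sketch of this flow, in the shape of a Kafka Connect SinkTask, is shown below. The helper types (RecordConverter, GcsStager, BigQueryLoader, TableRowData) are hypothetical placeholders for the connector's internal components, not its actual class names:

```java
import java.util.Collection;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Sketch only: the helper types below are placeholders, not the connector's real classes.
public abstract class FlowSketchTask extends SinkTask {

    private RecordConverter converter;   // step 2: Kafka record -> BigQuery table data
    private GcsStager stager;            // step 3: temporary files in GCS
    private BigQueryLoader loader;       // step 4: write to BigQuery
    private boolean useStorageWriteApi;  // chooses between the two write paths

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {                // step 1: records arrive from Kafka
            TableRowData row = converter.convert(record);  // step 2: convert to table data
            stager.stage(row);                             // step 3: stage in GCS
        }
        if (useStorageWriteApi) {
            loader.streamStagedData();                     // step 4b: Storage Write API
        } else {
            loader.runBatchLoadJob();                      // step 4a: batch loading
        }
    }

    // Placeholder interfaces so the sketch compiles on its own.
    interface TableRowData {}
    interface RecordConverter { TableRowData convert(SinkRecord record); }
    interface GcsStager { void stage(TableRowData row); }
    interface BigQueryLoader { void streamStagedData(); void runBatchLoadJob(); }
}
```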

Configuration options that are influenced by other options

Each option below is listed with the other options that affect its behavior.

bigQueryPartitionDecorator

  • useStorageWriteApi

commitInterval

  • useStorageWriteApi

enableBatchMode

  • useStorageWriteApi

intermediateTableSuffix

  • deleteEnabled

  • upsertEnabled

kafkaKeyFieldName

  • deleteEnabled

  • upsertEnabled

mergeIntervalMs

  • deleteEnabled

  • upsertEnabled

useStorageWriteApi

  • deleteEnabled

  • upsertEnabled
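
To make these relationships concrete, the sketch below shows the shape of cross-option validation. The specific rules encoded here (for example, that kafkaKeyFieldName must be set when upsertEnabled or deleteEnabled is on, or that bigQueryPartitionDecorator is not combined with useStorageWriteApi) are assumptions for illustration; the connector's own configuration validation is authoritative:

```java
import java.util.Map;

// Illustrative sketch of cross-option checks; the exact rules may differ in the connector.
public class ConfigDependencySketch {

    static void validate(Map<String, String> p) {
        boolean storageWriteApi = flag(p, "useStorageWriteApi");
        boolean upsertOrDelete = flag(p, "upsertEnabled") || flag(p, "deleteEnabled");

        // Assumed relationship: upsert/delete identifies rows by the Kafka key,
        // so kafkaKeyFieldName must be set when either mode is enabled.
        if (upsertOrDelete && !p.containsKey("kafkaKeyFieldName")) {
            throw new IllegalArgumentException(
                "kafkaKeyFieldName is required when upsertEnabled or deleteEnabled is true");
        }

        // Assumed relationship: intermediateTableSuffix and mergeIntervalMs only take
        // effect when upsert/delete mode is enabled, so they are ignored otherwise.

        // Assumed relationship: the Storage Write API path does not use partition
        // decorators, while commitInterval and enableBatchMode apply only on that path.
        if (storageWriteApi && flag(p, "bigQueryPartitionDecorator")) {
            throw new IllegalArgumentException(
                "bigQueryPartitionDecorator cannot be combined with useStorageWriteApi");
        }
    }

    private static boolean flag(Map<String, String> p, String key) {
        return Boolean.parseBoolean(p.getOrDefault(key, "false"));
    }

    public static void main(String[] args) {
        validate(Map.of("upsertEnabled", "true", "kafkaKeyFieldName", "key"));
        System.out.println("configuration looks consistent");
    }
}
```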