Design Notes
The BigQuery sink connector supports two distinct paths for inserting data into BigQuery. The original BatchLoader path uses GCS to store intermediate files before loading them into BigQuery tables. The second path uses the BigQuery Storage Write API to stream data directly to BigQuery.
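The path is selected with the useStorageWriteApi option. The sketch below shows a minimal sink configuration; property names other than useStorageWriteApi (connector.class, topics, project, defaultDataset, keyfile) are the commonly used ones for this connector and may differ between versions.

```properties
# Sketch of a minimal sink configuration; names and values are illustrative.
name=bigquery-sink
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
topics=orders
project=my-gcp-project
defaultDataset=my_dataset
keyfile=/path/to/credentials.json

# false: batch load through intermediate files in GCS
# true: stream rows to BigQuery with the Storage Write API
useStorageWriteApi=true
```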
The general flow
- Records come into the connector from Kafka.
- They are processed and converted into BigQuery table data.
- The table data is written to BigQuery through one of two paths:
  - Batch loading: the table data is first written to temporary files in GCS and then loaded into BigQuery tables (see the configuration sketch after this list).
  - Storage Write API: the table data is streamed directly to BigQuery.
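For the batch-load path, the GCS staging step above is controlled by a few additional options. The sketch below uses the property names enableBatchLoad, gcsBucketName, and batchLoadIntervalSec, which are assumptions here and may vary by connector version.

```properties
# Batch-load path: rows are staged as blobs in a GCS bucket and then
# loaded into BigQuery with periodic load jobs.
useStorageWriteApi=false

# Topics routed through GCS batch loading (assumed property name).
enableBatchLoad=orders

# Bucket that holds the intermediate files (assumed property name).
gcsBucketName=my-staging-bucket

# How often, in seconds, staged files are loaded into BigQuery (assumed property name).
batchLoadIntervalSec=120
```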
Configuration options that are influenced by other options
bigQueryPartitionDecorator
- useStorageWriteApi
commitInterval
- useStorageWriteApi
enableBatchMode
- useStorageWriteApi
intermediateTableSuffix
- deleteEnabled
- upsertEnabled
kafkaKeyFieldName
- deleteEnabled
- upsertEnabled
mergeIntervalMs
- deleteEnabled
- upsertEnabled
useStorageWriteApi
- deleteEnabled
- upsertEnabled
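As an illustration of the interplay, the options grouped under deleteEnabled and upsertEnabled above are typically set together. The sketch below assumes that upsert/delete mode writes to intermediate tables that are merged into the destination tables on an interval, and that it uses the batch-load path rather than the Storage Write API; the values and that interaction are assumptions for illustration.

```properties
# Upsert/delete mode sketch; values are illustrative.
upsertEnabled=true
deleteEnabled=true

# Field that carries the Kafka record key, assumed to be used for matching rows.
kafkaKeyFieldName=kafkaKey

# How often intermediate tables are assumed to be merged into the destination tables.
mergeIntervalMs=60000

# Suffix assumed to be appended to the intermediate table names.
intermediateTableSuffix=_tmp

# Assumed: upsert/delete uses the batch-load path rather than the Storage Write API.
useStorageWriteApi=false
```

By contrast, bigQueryPartitionDecorator, commitInterval, and enableBatchMode are listed above as interacting only with useStorageWriteApi, so their valid settings depend on which write path is selected.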