
Design Notes

The BigQuery sink connector supports two distinct paths for inserting data into BigQuery. The original BatchLoader path uses GCS to store intermediate files before writing them to tables in BigQuery. The second path uses the Storage Write API to stream the data to BigQuery.
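
Which path is used is driven by the connector configuration. Below is a minimal sketch; the connector class name, topic, project, and dataset keys are illustrative assumptions and not part of this note, while useStorageWriteApi is the option discussed here:

```java
import java.util.HashMap;
import java.util.Map;

public class SinkConfigExample {
    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();

        // Assumed boilerplate settings, shown only to make the example complete.
        props.put("connector.class", "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector");
        props.put("topics", "orders");
        props.put("project", "my-gcp-project");
        props.put("defaultDataset", "kafka_data");

        // Path selection: set useStorageWriteApi=true for the Storage Write API
        // streaming path; leave it false for the GCS-based batch-loading path.
        props.put("useStorageWriteApi", "false");

        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```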

The general flow

  1. Records come into the connector from Kafka.
  2. They are processed and converted into BigQuery table data.
  3. The table data is written to temporary files in GCS.
  4. The data from the files is written to BigQuery through either:
    1. Batch loading
    2. The Storage Write API
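
A rough sketch of this flow, in the shape of a Kafka Connect SinkTask, is shown below. The helper types (RecordConverter, GcsStager, BigQueryLoader, TableRowData) are hypothetical placeholders for the connector's internal components, not its actual class names:

```java
import java.util.Collection;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Sketch only: the helper types below are placeholders, not the connector's real classes.
public abstract class FlowSketchTask extends SinkTask {

    private RecordConverter converter;   // step 2: Kafka record -> BigQuery table data
    private GcsStager stager;            // step 3: temporary files in GCS
    private BigQueryLoader loader;       // step 4: write to BigQuery
    private boolean useStorageWriteApi;  // chooses between the two write paths

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {                // step 1: records arrive from Kafka
            TableRowData row = converter.convert(record);  // step 2: convert to table data
            stager.stage(row);                             // step 3: stage in GCS
        }
        if (useStorageWriteApi) {
            loader.streamStagedData();                     // step 4b: Storage Write API
        } else {
            loader.runBatchLoadJob();                      // step 4a: batch loading
        }
    }

    // Placeholder interfaces so the sketch compiles on its own.
    interface TableRowData {}
    interface RecordConverter { TableRowData convert(SinkRecord record); }
    interface GcsStager { void stage(TableRowData row); }
    interface BigQueryLoader { void streamStagedData(); void runBatchLoadJob(); }
}
```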

Configuration options that are influenced by other options

Each option below is listed with the other options that affect its behavior.

bigQueryPartitionDecorator

  • useStorageWriteApi

commitInterval

  • useStorageWriteApi

enableBatchMode

  • useStorageWriteApi

intermediateTableSuffix

  • deleteEnabled

  • upsertEnabled

kafkaKeyFieldName

  • deleteEnabled

  • upsertEnabled

mergeIntervalMs

  • deleteEnabled

  • upsertEnabled

useStorageWriteApi

  • deleteEnabled

  • upsertEnabled
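
To make these relationships concrete, the sketch below shows the shape of cross-option validation. The specific rules encoded here (for example, that kafkaKeyFieldName must be set when upsertEnabled or deleteEnabled is on, or that bigQueryPartitionDecorator is not combined with useStorageWriteApi) are assumptions for illustration; the connector's own configuration validation is authoritative:

```java
import java.util.Map;

// Illustrative sketch of cross-option checks; the exact rules may differ in the connector.
public class ConfigDependencySketch {

    static void validate(Map<String, String> p) {
        boolean storageWriteApi = flag(p, "useStorageWriteApi");
        boolean upsertOrDelete = flag(p, "upsertEnabled") || flag(p, "deleteEnabled");

        // Assumed relationship: upsert/delete identifies rows by the Kafka key,
        // so kafkaKeyFieldName must be set when either mode is enabled.
        if (upsertOrDelete && !p.containsKey("kafkaKeyFieldName")) {
            throw new IllegalArgumentException(
                "kafkaKeyFieldName is required when upsertEnabled or deleteEnabled is true");
        }

        // Assumed relationship: intermediateTableSuffix and mergeIntervalMs only take
        // effect when upsert/delete mode is enabled, so they are ignored otherwise.

        // Assumed relationship: the Storage Write API path does not use partition
        // decorators, while commitInterval and enableBatchMode apply only on that path.
        if (storageWriteApi && flag(p, "bigQueryPartitionDecorator")) {
            throw new IllegalArgumentException(
                "bigQueryPartitionDecorator cannot be combined with useStorageWriteApi");
        }
    }

    private static boolean flag(Map<String, String> p, String key) {
        return Boolean.parseBoolean(p.getOrDefault(key, "false"));
    }

    public static void main(String[] args) {
        validate(Map.of("upsertEnabled", "true", "kafkaKeyFieldName", "key"));
        System.out.println("configuration looks consistent");
    }
}
```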