Kafka Connect BigQuery Connector
This is an implementation of a sink connector from Apache Kafka to Google BigQuery, built on top of Apache Kafka Connect.
Download
The current release is v2.8.0.
We provide the following convenience packages
See the release notes for information on all releases.
The Kafka Connect BigQuery Connector is dependent upon or uses the following:
History
This connector was originally developed by WePay. In late 2020 the project moved to Confluent, with both companies taking on maintenance duties. In 2024, Aiven created its own fork of the Confluent project in order to continue maintaining an open source, Apache 2-licensed version of the connector.
Configuration
Sample
An example connector configuration that reads records from Kafka with JSON-encoded values and writes those values to BigQuery:
{
  "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
  "topics": "users, clicks, payments",
  "tasks.max": "3",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "project": "kafka-ingest-testing",
  "defaultDataset": "kcbq-example",
  "keyfile": "/tmp/bigquery-credentials.json"
}
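One way to apply a configuration like the one above is through the Kafka Connect REST API, whose PUT /connectors/{name}/config endpoint accepts a flat configuration map as the request body. The following is a minimal sketch, not part of this project: the function name register_connector, the connector name, the file name, and the worker address http://localhost:8083 are all assumptions to adapt to your environment.

```shell
# Create or update a connector from a JSON file holding its configuration.
# Arguments: connector name, config file, optional Connect REST URL.
# All names and the default URL are illustrative assumptions.
register_connector() {
  name="$1"
  config_file="$2"
  connect_url="${3:-http://localhost:8083}"  # assumed worker address

  # PUT /connectors/{name}/config takes the flat configuration map as its body.
  curl --fail --silent --show-error \
    -X PUT \
    -H "Content-Type: application/json" \
    --data @"$config_file" \
    "$connect_url/connectors/$name/config"
}
```

With the sample saved as kcbq-sink.json (a hypothetical file name), the call would be: register_connector kcbq-sink kcbq-sink.json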
Configuration options documentation
See the Configuration options for a list of the connector's configuration properties.
Building from source
This project uses the Maven build tool.
To compile the project without running the integration tests, execute mvn package -DskipITs.
To build the documentation, execute the following steps:
mvn install -DskipITs
mvn -f tools
mvn -f docs
Once the documentation is built, it can be served locally by executing mvn -f docs site:run.
Integration test setup
Integration tests require a live BigQuery and Kafka installation. Configuring those components is beyond the scope of this document.
Once the test environment is ready, the integration-specific environment variables must be set.
Local configuration
- GOOGLE_APPLICATION_CREDENTIALS - the path to the JSON file that was downloaded when the GCP account key was created.
- KCBQ_TEST_BUCKET - the name of the bucket to use for testing.
- KCBQ_TEST_DATASET - the name of the dataset to use for testing.
- KCBQ_TEST_KEYFILE - the same path as GOOGLE_APPLICATION_CREDENTIALS.
- KCBQ_TEST_PROJECT - the name of the project to use for testing.
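As a rough illustration, the variables listed above could be exported in a shell session before the integration tests are run. Every value below is a placeholder invented for this sketch, not a real resource; substitute your own key file path, bucket, dataset, and project.

```shell
# Placeholder values only (all hypothetical); replace with your own GCP resources.
export GOOGLE_APPLICATION_CREDENTIALS="${HOME}/keys/kcbq-test-key.json"
# The key file variable points at the same file, per the list above.
export KCBQ_TEST_KEYFILE="${GOOGLE_APPLICATION_CREDENTIALS}"
export KCBQ_TEST_BUCKET="my-kcbq-test-bucket"
export KCBQ_TEST_DATASET="my_kcbq_test_dataset"
export KCBQ_TEST_PROJECT="my-gcp-project"
```

Deriving KCBQ_TEST_KEYFILE from GOOGLE_APPLICATION_CREDENTIALS keeps the two paths from drifting apart.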
GitHub configuration
To run the integration tests from a GitHub action, the following variables must be set:
- GCP_CREDENTIALS - the contents of the JSON file that was downloaded when the GCP account key was created.
- KCBQ_TEST_BUCKET - the bucket to use for the tests.
- KCBQ_TEST_DATASET - the dataset to use for the tests.
- KCBQ_TEST_PROJECT - the project to use for the tests.
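A pre-flight check can make a missing variable fail fast with a clear message instead of surfacing later as a test error. The sketch below is an assumption about how such a check might look, not something this project ships; check_required_vars is a hypothetical helper name.

```shell
# Return non-zero if any of the named variables is unset or empty.
# check_required_vars is a hypothetical helper, not part of this project.
check_required_vars() {
  rc=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

In a workflow step it might be called as: check_required_vars GCP_CREDENTIALS KCBQ_TEST_BUCKET KCBQ_TEST_DATASET KCBQ_TEST_PROJECT || exit 1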