Google Cloud Storage sink connector naming and data formats#
The Apache Kafka Connect® GCS sink connector by Aiven enables you to move data from an Aiven for Apache Kafka® cluster to a Google Cloud Storage bucket for long term storage. The full connector documentation is available in the dedicated GitHub repository.
File name format#
The connector uses the following format for output files (blobs)
<prefix><topic>-<partition>-<start-offset>[.gz]
The file name format has the following building blocks:
<prefix>: the file name prefix, useful, for example, to define subdirectories in the storage bucket<topic>: the source Apache Kafka topic name<partition>: the source Apache Kafka topic’s partition number<start-offset>: the offset of the first record in the file[.gz]: the file suffix, added when compression is enabled and depending on compression type
Data format#
The connector output files are text files that contain one record per line (separated by \n).
There are two types of data format available:
Flat structure: it’s the default data format, where the field values are separated by comma (CSV).
You can use the CSV format by setting the
format.output.typetocsv.Complex structure: the file stores messages in the format of JSON lines. It contains one record per line and each line is a valid JSON object (
jsonl).You can use the JSON format by setting the
format.output.typetojsonl.