Aiven for Apache Kafka® as a source for Aiven for ClickHouse® ============================================================= This article shows by way of example how to integrate Aiven for Apache Kafka® with Aiven for ClickHouse® using `Terraform provider for Aiven <https://registry.terraform.io/providers/aiven/aiven/latest/docs>`_. An Apache Kafka® source topic is used as a data source and Aiven for ClickHouse® is used to filter or transform the raw data with a materialized view before writing it to a regular table. .. topic:: Sample of sensor data First, check out how sensor data can look like for a better understanding of this recipe and the ``clickhouse_kafka_user_config`` Terraform block used in this article. .. code-block:: shell { "sensor_id": 10000001, "ts": "2022-12-01T10:08:24.446369", "key": "cpu_usage", "value": 96 } Let's cook! ----------- .. mermaid:: flowchart LR id1(iot_measurements_topic) id2[(iot_measurements)] id3(edge_measurements_raw_table) id4(cpu_high_usage_table) subgraph Aiven for Apache Kafka id1 end subgraph Aiven for ClickHouse id2 subgraph iot_measurements DB id3-->|Filter|id4 end end id1-->|Service integration|id2 Imagine that you've been collecting IoT measurements from thousands of sensors and these metrics are being populated in an Apache Kafka topic called ``iot_measurements``. Now, you'd like to set up an Aiven for ClickHouse database and write filtered messages into table ``cpu_high_usage``. This recipe calls for the following: 1. Set up an Aiven for ClickHouse database for writing and processing raw data. 2. Insert the measurements data from Apache Kafka topic ``iot_measurements`` into the Aiven for ClickHouse database. 3. Filter the data and save the output to the new ``cpu_high_usage`` table. Configure common files '''''''''''''''''''''' .. dropdown:: Expand to check out the common files needed for this recipe. Navigate to a new folder and add the following files: 1. ``provider.tf`` file .. code-block:: terraform terraform { required_providers { aiven = { source = "aiven/aiven" version = "~> 3.10.0" } } } provider "aiven" { api_token = var.aiven_api_token } .. tip:: You can set environment variable ``TF_VAR_aiven_api_token`` for the ``api_token`` property so that you don't need to pass the ``-var-file`` flag when executing Terraform commands. 2. ``variables.tf`` file Use it for defining the variables to avoid including sensitive information in source control. The ``variables.tf`` file defines the API token, the project name, and the prefix for the service name. .. code-block:: terraform variable "aiven_api_token" { description = "Aiven console API token" type = string } variable "project_name" { description = "Aiven console project name" type = string } 3. ``*.tfvars`` file Use it to indicate the actual values of the variables so that they can be passed (with the ``-var-file=`` flag) to Terraform during runtime and excluded later on. Configure the ``var-values.tfvars`` file as follows: .. code-block:: terraform aiven_api_token = "<YOUR-AIVEN-AUTHENTICATION-TOKEN-GOES-HERE>" project_name = "<YOUR-AIVEN-CONSOLE-PROJECT-NAME-GOES-HERE>" Configure the ``services.tf`` file '''''''''''''''''''''''''''''''''' The following Terraform script initializes both Aiven for Apache Kafka and Aiven for ClickHouse services, creates the service integration, the source Apache Kafka topic, and the Aiven for ClickHouse database. .. code-block:: terraform resource "aiven_kafka" "kafka" { project = var.project_name cloud_name = "google-europe-west1" plan = "business-4" service_name = "kafka-gcp-eu" maintenance_window_dow = "monday" maintenance_window_time = "10:00:00" } resource "aiven_kafka_topic" "source" { project = var.project_name service_name = aiven_kafka.kafka.service_name partitions = 50 replication = 3 topic_name = "iot_measurements" } resource "aiven_clickhouse" "clickhouse" { project = var.project_name cloud_name = "google-europe-west1" plan = "startup-8" service_name = "clickhouse-gcp-eu" maintenance_window_dow = "monday" maintenance_window_time = "10:00:00" } resource "aiven_service_integration" "clickhouse_kafka_source" { project = var.project_name integration_type = "clickhouse_kafka" source_service_name = aiven_kafka.kafka.service_name destination_service_name = aiven_clickhouse.clickhouse.service_name clickhouse_kafka_user_config { tables { name = "edge_measurements_raw" group_name = "clickhouse-ingestion" data_format = "JSONEachRow" columns { name = "sensor_id" type = "UInt64" } columns { name = "ts" type = "DateTime64(6)" } columns { name = "key" type = "LowCardinality(String)" } columns { name = "value" type = "Float64" } topics { name = aiven_kafka_topic.source.topic_name } } } } resource "aiven_clickhouse_database" "measurements" { project = var.project_name service_name = aiven_clickhouse.clickhouse.service_name name = "iot_measurements" } Execute the Terraform files ''''''''''''''''''''''''''' .. dropdown:: Expand to check out how to execute the Terraform files. 1. Run the following command: .. code-block:: shell terraform init The ``init`` command performs initialization operations to prepare the working directory for use with Terraform. For this recipe, ``init`` automatically finds, downloads, and installs the necessary Aiven Terraform Provider plugins. 2. Run the following command: .. code-block:: bash terraform plan -var-file=var-values.tfvars The ``plan`` command creates an execution plan and shows the resources to be created (or modified). This command doesn't actually create any resources but gives you a heads-up on what's going to happen next. 3. If the output of ``terraform plan`` looks as expected, run the following command: .. code-block:: bash terraform apply -var-file=var-values.tfvars The ``terraform apply`` command creates (or modifies) your infrastructure resources. Check out the results --------------------- * Resource ``aiven_clickhouse`` creates an Aiven for ClickHouse service with the project name, the cloud name (provider, region, zone), the Aiven service plan, and the service name as specified in the ``services.tf`` file. * Resource ``aiven_clickhouse_database`` creates a database that can be used to further transform the ingested data and perform analytics on it. * Resource ``aiven_kafka`` creates an Aiven for Apache Kafka cluster. * Resource ``aiven_kafka_topic`` creates Apache Kafka topic ``iot_measurements``. * Resource ``aiven_service_integration`` creates the integration between the Aiven for Apache Kafka and the Aiven for ClickHouse service. The service integration creates a database to insert the ingested data to. In this instance, the database name is ``service_kafka-gcp-eu`` (it depends on the Kafka service name) and the table name is ``edge_measurements_raw`` as specified in the code. Learn more ---------- When you use this recipe, parameters and configurations will vary from those used in this article. For Aiven for Apache Kafka and Aiven for ClickHouse advanced parameters, a related blog, and instructions on how to get started with Aiven Terraform Provider, see `Set up your first Aiven Terraform project <https://docs.aiven.io/docs/tools/terraform/get-started.html>`_. Follow up --------- * You can `create databases and tables <https://docs.aiven.io/docs/products/clickhouse/howto/integrate-kafka.html#update-apache-kafka-integration-settings>`_ so that you can `read and store your data <https://docs.aiven.io/docs/products/clickhouse/howto/integrate-kafka.html#read-and-store-data>`_. * You can also `create a materialized view <https://docs.aiven.io/docs/products/clickhouse/howto/materialized-views.html>`_ to store the Kafka® messages in Aiven for ClickHouse.