# Enable MSK S3 sink

The MSK Connect S3 sink is optional, but the surrounding workflow is strict: the plugin ZIP must already exist in S3 before the connector can be created.

## Operator flow

1. **Create the plugin bucket.** Run the targeted apply first and capture the bucket name:

   ```shell
   cd terraform/staging
   terraform apply -target=module.msk_connect_plugin_bucket
   terraform output -raw msk_connect_plugin_bucket_name
   ```

2. **Upload the connector ZIP.** Download the Confluent S3 sink ZIP manually and upload it to the plugin bucket:

   ```shell
   aws s3 cp confluentinc-kafka-connect-s3-<version>.zip s3://<plugin-bucket>/
   ```

3. **Set the sink variables.** Configure the root variables so Terraform knows which topics to consume, where to write objects, and which plugin object key to use.

4. **Run the full plan.** Apply the environment root again once the ZIP is present.
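Step 3 can be sketched as a few lines in a tfvars file. This is a minimal example, assuming the settings live in `terraform/staging/terraform.tfvars`; the ZIP key keeps the `<version>` placeholder rather than pinning a release:

```hcl
# terraform/staging/terraform.tfvars (sketch)
enable_msk_s3_sink               = true
create_msk_connect_plugin_bucket = true

# Must match the object key uploaded in step 2.
msk_s3_sink_plugin_file_key = "confluentinc-kafka-connect-s3-<version>.zip"
```

With these set, the full apply in step 4 picks up the connector resources.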

## Variables that matter

| Variable | Purpose |
| --- | --- |
| `enable_msk_s3_sink` | Turns the connector on |
| `create_msk_connect_plugin_bucket` | Creates the plugin artifact bucket |
| `msk_s3_sink_topics_regex` | Selects which Atlas topics are exported |
| `msk_s3_sink_s3_prefix` | Base prefix inside the destination bucket |
| `msk_s3_sink_partition_fields` | Partition fields used by the connector |
| `msk_s3_sink_plugin_file_key` | Exact ZIP object key uploaded to the plugin bucket |

## Current defaults

The example values currently use:

- `topics_regex = "atlas\\.events\\..*"`
- `s3_prefix = "raw"`
- `partition_fields = ["organization_id", "brand_id"]`
- `flush_size = 1000`
- `rotate_interval_ms = 300000`
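The defaults above presumably map onto the `msk_s3_sink_*` root variables from the table; a hedged sketch of the corresponding assignments (the `flush_size` and `rotate_interval_ms` variable names are assumptions, since the table does not list them):

```hcl
msk_s3_sink_topics_regex     = "atlas\\.events\\..*"
msk_s3_sink_s3_prefix        = "raw"
msk_s3_sink_partition_fields = ["organization_id", "brand_id"]

# Variable names below are assumed; check variables.tf for the exact spelling.
msk_s3_sink_flush_size         = 1000
msk_s3_sink_rotate_interval_ms = 300000
```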

## What Terraform creates

- one optional export bucket for Kafka records
- one optional plugin bucket
- one `aws_mskconnect_custom_plugin`
- one `aws_mskconnect_connector`
- one CloudWatch log group for connector logs
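For orientation, the connector configuration that such a module typically passes to `aws_mskconnect_connector` looks roughly like the map below. This is a sketch, not the module's actual code: the property names are standard Confluent S3 sink connector settings, and the bucket and region placeholders are illustrative.

```hcl
# Sketch of a Confluent S3 sink connector_configuration map (illustrative only).
connector_configuration = {
  "connector.class"      = "io.confluent.connect.s3.S3SinkConnector"
  "topics.regex"         = "atlas\\.events\\..*"
  "s3.bucket.name"       = "<export-bucket>"
  "s3.region"            = "<region>"
  "storage.class"        = "io.confluent.connect.s3.storage.S3Storage"
  "format.class"         = "io.confluent.connect.s3.format.json.JsonFormat"
  "partitioner.class"    = "io.confluent.connect.storage.partitioner.FieldPartitioner"
  "partition.field.name" = "organization_id,brand_id"
  "topics.dir"           = "raw"
  "flush.size"           = "1000"
  "rotate.interval.ms"   = "300000"
}
```

Note that the Confluent sink expresses the base prefix via `topics.dir`, which is where an `s3_prefix` value like `raw` would land.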
:::tip

The VPC module already provisions an S3 gateway endpoint for private route tables, so S3 traffic from private workloads does not have to use the NAT path.

:::