
Summary

Issue: The S3 exporter produces transient jsonl.gz files that are created and immediately deleted, generating noise in downstream S3 notification pipelines.
Cause: The exporter uses a streaming architecture that uploads files to S3 before the row count is known. If the final batch of an export cycle contains zero rows, the already-uploaded empty file is deleted.
Resolution: This is expected behavior; filter out transient files in your downstream pipeline using one of the approaches below.

When this occurs

Two code paths produce transient files:
  1. End of export cycle — The exporter iterates in batches, and the final batch, once the exporter has caught up to current data, returns zero rows. Because each file is uploaded before its row count is known, the resulting empty file is deleted immediately after upload.
  2. Test automation button — Creates and immediately deletes a test file to verify S3 access permissions.
No data is lost. Only empty (0-row) files and test files are deleted. Files containing actual exported data are never deleted by the exporter.

Resolution steps

Filter by object size

Empty gzipped JSONL files are approximately 20 bytes. Skip processing for files below a safe threshold.
def should_process(s3_record):
    """Skip transient files: an empty gzipped JSONL export is ~20 bytes."""
    size = s3_record["object"]["size"]  # the "s3" portion of one event record
    return size > 100  # bytes; comfortably above the empty-file size
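Tying this to the full notification payload: in the standard S3 event format, the size sits under Records[n]["s3"]["object"]["size"]. A sketch of applying the same filter at the event level (the key names follow AWS's documented event structure; the bucket and key values are illustrative):

```python
def event_has_real_file(event):
    """True if any record in the S3 notification references an object
    large enough to contain exported rows (empty .jsonl.gz is ~20 bytes)."""
    return any(
        record["s3"]["object"]["size"] > 100
        for record in event.get("Records", [])
    )

# Abridged S3 notification for a transient (empty) file
transient = {"Records": [{"s3": {"object": {"key": "run/part-9.jsonl.gz",
                                            "size": 20}}}]}
print(event_has_real_file(transient))  # → False
```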

Check file existence before processing

The delete follows the create almost immediately. A HEAD request confirms the file still exists before triggering downstream logic.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")  # create the client once and reuse it across calls

def file_exists(bucket, key):
    """Return True if the object is still present in S3."""
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError:
        return False
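The size filter and the existence check compose naturally. A minimal sketch with the existence check passed in as a callable, so the file_exists helper above (or a stub in tests) can be plugged in; the record shape follows the standard S3 notification format, and the bucket and key names are placeholders:

```python
def should_trigger(record, exists):
    """Decide whether one S3 notification record warrants downstream processing.

    `record` is one entry from the event's Records list; `exists` is a
    callable (bucket, key) -> bool, e.g. the file_exists helper above.
    """
    obj = record["s3"]["object"]
    if obj["size"] <= 100:  # empty gzipped JSONL files are ~20 bytes
        return False
    bucket = record["s3"]["bucket"]["name"]
    return exists(bucket, obj["key"])  # skip files already deleted

record = {"s3": {"bucket": {"name": "exports"},
                 "object": {"key": "run-1/part-0001.jsonl.gz", "size": 5120}}}
print(should_trigger(record, lambda b, k: True))  # → True
```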

Filter S3 event types

If your notification listener handles both ObjectCreated and ObjectRemoved events, restrict it to ObjectCreated only, then add the existence check above. In your S3 event notification configuration, set the event filter to:
s3:ObjectCreated:*
Remove any s3:ObjectRemoved:* triggers.
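For reference, a bucket notification configuration restricted to create events might look like the fragment below (the queue ARN and filter prefix are placeholders; adapt to your topic, queue, or Lambda target):

```json
{
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:export-events",
      "Events": ["s3:ObjectCreated:*"],
      "Filter": {
        "Key": {
          "FilterRules": [
            {"Name": "suffix", "Value": ".jsonl.gz"}
          ]
        }
      }
    }
  ]
}
```

Note that there is no s3:ObjectRemoved:* entry, so deletes of transient files generate no events at all.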

Use the export status API instead of S3 notifications

Poll the automation run history to confirm a successful export with actual rows before processing files. This avoids reacting to transient S3 events entirely. See the automations documentation for run result details.
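A sketch of the gating logic on a run record is below. The `status` and `row_count` field names are hypothetical, used here only to illustrate the check; consult the automations documentation for the actual response shape:

```python
def export_completed_with_rows(run):
    """True when an automation run finished successfully and exported
    at least one row. Field names are illustrative, not the real API."""
    return run.get("status") == "success" and run.get("row_count", 0) > 0

run = {"status": "success", "row_count": 1250}  # sample run record
print(export_completed_with_rows(run))  # → True
```

Only when this returns True would you list and process the run's files in S3, sidestepping transient-file events entirely.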