Elementary’s Google Cloud Storage (GCS) integration enables streaming audit logs and system logs directly to your GCS bucket for long-term storage, analysis, and integration with other Google Cloud services.

Overview

When enabled, Elementary automatically streams your workspace’s audit logs (user activity) and system logs to your GCS bucket using the Google Cloud Storage API. This allows you to:
  • Store logs in your own GCS bucket for long-term retention
  • Integrate logs with BigQuery, Dataflow, or other Google Cloud analytics services
  • Maintain full control over log storage and access policies
  • Process logs using Google Cloud data processing tools
  • Archive logs for compliance and audit requirements

Prerequisites

Before configuring log streaming to GCS, you’ll need:
  1. GCS Bucket - A Google Cloud Storage bucket where logs will be stored
    • The bucket must exist and be accessible
    • You’ll need the bucket path (e.g., gs://my-logs-bucket)
  2. Authentication - Either a Google Cloud service account or a Workload Identity Federation setup, with the Storage Object User (roles/storage.objectUser) role granted on the bucket. See Authentication methods below for both options.

Authentication methods

Elementary supports two authentication methods for GCS. Pick the one that fits your security model:
  • Service account — create a service account, download its JSON key, and upload the key to Elementary. Simplest to set up.
  • Workload Identity Federation (WIF) — Elementary authenticates from its AWS role through a federated identity. No long-lived credentials are stored in Elementary.
Select a tab below and follow the steps for your chosen method.
  1. Go to Google Cloud Console > IAM & Admin > Service Accounts and create a service account (or select an existing one).
  2. Grant the service account the Storage Object User (roles/storage.objectUser) role on your GCS bucket.
  3. Generate a JSON key for the service account:
    1. Select your service account.
    2. Click the three dots menu and select ‘Manage keys’.
    3. Click ‘ADD KEY’ and select ‘Create new key’.
    4. Choose ‘JSON’ format and click ‘CREATE’. The JSON file downloads automatically.
  4. You will upload this JSON key to Elementary in the connection form below, in the Service account file field.

Configuring Log Streaming to GCS

  1. Navigate to the Logs page:
    • Click on your account name in the top-right corner of the UI
    • Open the dropdown menu
    • Select Logs
  2. In the External Integrations section, click the Connect button
  3. In the modal that opens, select Google Cloud Storage (GCS) as your log streaming destination
  4. Enter your GCS configuration:
    • Bucket Path: The full GCS bucket path (e.g., gs://my-logs-bucket)
    • Authentication method: Use the toggle to select Service account or Workload Identity Federation, matching the method you set up in Authentication methods above.
    • Service account file (Service account method) or WIF credential file (Workload Identity Federation method): Upload the JSON file you prepared.
  5. Click Save to enable log streaming
The log streaming configuration applies to your entire workspace. Both user activity logs and system logs will be streamed to your GCS bucket in batches.

Log Batching

Logs are automatically batched and written to GCS files based on the following criteria:
  • Time-based batching: A new file is created every 15 minutes
  • Size-based batching: A new file is created when the batch reaches 100MB
A new file is created as soon as either condition is met, whichever comes first. This keeps storage efficient while producing files of a practical size for downstream processing.
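The flush decision can be sketched as a simple two-condition check. The 15-minute and 100MB thresholds come from the criteria above; the function and its signature are illustrative, not Elementary’s actual implementation:

```python
from datetime import datetime, timedelta, timezone

MAX_BATCH_AGE = timedelta(minutes=15)   # time-based threshold
MAX_BATCH_BYTES = 100 * 1024 * 1024    # size-based threshold (100MB)

def should_flush(batch_started_at: datetime, batch_bytes: int, now: datetime) -> bool:
    """A new file is written when either threshold is reached, whichever comes first."""
    return (now - batch_started_at) >= MAX_BATCH_AGE or batch_bytes >= MAX_BATCH_BYTES
```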

File Path Format

Logs are stored at the root of your bucket using a Hive-based partitioning structure for efficient querying and organization:
log_type={log_type}/date={YYYY-MM-DD}/hour={HH}/file_{timestamp}_{batch_id}.ndjson
Where:
  • {log_type}: Either audit (for user activity logs) or system (for system logs)
  • {YYYY-MM-DD}: Date in ISO format (e.g., 2024-01-15)
  • {HH}: Hour in 24-hour format (e.g., 14)
  • {timestamp}: Unix timestamp when the file was created
  • {batch_id}: Unique identifier for the batch

Example File Paths

log_type=audit/date=2024-01-15/hour=14/file_1705320000_batch_abc123.ndjson
log_type=system/date=2024-01-15/hour=14/file_1705320900_batch_def456.ndjson
This Hive-based structure allows you to:
  • Efficiently query logs by date and hour using BigQuery or other tools
  • Filter logs by type (audit or system)
  • Process logs in parallel by partition

Log Format

Logs are stored as line-delimited JSON (NDJSON), where each line represents a single log entry as a JSON object.

User Activity Logs

Each user activity log entry includes:
{
  "timestamp": "2024-01-15T14:30:45.123456Z",
  "log_type": "audit",
  "event_name": "user_login",
  "success": true,
  "user_email": "john.doe@example.com",
  "user_name": "John Doe",
  "env_id": "env_7890123456abcdef",
  "env_name": "Production",
  "event_content": {
    "additional": "context"
  }
}

System Logs

Each system log entry includes:
{
  "timestamp": "2024-01-15T14:30:45.123456Z",
  "log_type": "system",
  "event_name": "dbt_data_sync_completed",
  "success": true,
  "env_id": "env_7890123456abcdef",
  "env_name": "Production",
  "event_content": {
    "environment_id": "env_789",
    "environment_name": "Production"
  }
}

Field Descriptions

  • timestamp: ISO 8601 timestamp of the event (UTC)
  • log_type: Either "audit" for user activity logs or "system" for system logs
  • event_name: The specific action that was performed (e.g., user_login, create_test, dbt_data_sync_completed)
  • success: Boolean indicating whether the action completed successfully
  • user_email: User email address (only present in audit logs)
  • user_name: User display name (only present in audit logs)
  • env_id: Environment identifier (empty string for account-level actions)
  • env_name: Environment name (empty string for account-level actions)
  • event_content: Additional context-specific information as a JSON object
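Because each file is line-delimited JSON, the logs can be read back with the standard library alone. The sketch below is illustrative (the `success` and other field names come from the descriptions above) and filters for entries whose action failed:

```python
import json
from typing import Iterable, Iterator

def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def failed_events(entries: Iterable[dict]) -> list[dict]:
    """Keep entries whose action did not complete successfully."""
    return [e for e in entries if not e.get("success", True)]
```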

Disabling Log Streaming

To disable log streaming to GCS:
  1. Navigate to the Logs page
  2. In the External Integrations section, find your GCS integration
  3. Click Disable or remove the GCS configuration
  4. Confirm the action
Disabling log streaming will stop sending new logs to GCS immediately. Historical logs already written to GCS will remain in your bucket.