Integrating Databrain with Grafana Cloud

This guide explains how to send OpenTelemetry traces, metrics, and logs from your self-hosted Databrain instance to Grafana Cloud.

Prerequisites

  • Databrain self-hosted version with OpenTelemetry support
  • Grafana Cloud account (free tier available)
  • Grafana Cloud API token

What You’ll Get

Grafana Cloud provides three integrated observability products:
Product          Purpose                What It Shows
Grafana Tempo    Distributed tracing    Request traces with spans and timing
Grafana Loki     Log aggregation        Structured logs with trace correlation
Prometheus       Metrics                Request rates, latency histograms, error rates

Configuration

1. Get Your Grafana Cloud Credentials

  1. Log into Grafana Cloud
  2. Navigate to your stack (e.g., yourstack.grafana.net)
  3. Go to Connections → Add new connection → OpenTelemetry
  4. Note the following (a sketch for storing these in a .env file follows this list):
    • Tempo endpoint: tempo-<region>.grafana.net:443
    • Prometheus endpoint: prometheus-<region>.grafana.net:443
    • Loki endpoint: logs-<region>.grafana.net
    • Instance ID: Your unique instance identifier
    • API Token: Generate one under Access Policies
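
Before wiring up the collector, keep these credentials out of your compose file. A minimal sketch, assuming the variable name used in the Docker Compose configuration below (the token value is a placeholder):

# Store the Grafana Cloud token in a .env file next to docker-compose.yml;
# Docker Compose loads it automatically and substitutes ${GRAFANA_CLOUD_API_KEY}.
cat > .env <<'EOF'
GRAFANA_CLOUD_API_KEY=glc_replace_with_your_token
EOF
chmod 600 .env  # the token grants write access to your stack; keep it private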
2. Configure Grafana Alloy

Grafana Alloy is the easiest way to send telemetry to Grafana Cloud. It is Grafana's distribution of the OpenTelemetry Collector, so it accepts the same OTLP data while shipping with first-class support for Grafana Cloud endpoints.

Docker Compose Configuration

services:
  grafana-alloy:
    image: grafana/alloy:latest
    restart: always
    volumes:
      - ./alloy-config.alloy:/etc/alloy/config.alloy
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP
      - "12345:12345"  # Alloy UI
    environment:
      - GRAFANA_CLOUD_TEMPO_ENDPOINT=tempo-us-central1.grafana.net:443
      - GRAFANA_CLOUD_PROMETHEUS_ENDPOINT=prometheus-us-central1.grafana.net/api/prom/push
      - GRAFANA_CLOUD_LOKI_ENDPOINT=logs-prod-us-central1.grafana.net/loki/api/v1/push
      - GRAFANA_CLOUD_INSTANCE_ID=123456  # replace with your stack's Instance ID
      - GRAFANA_CLOUD_API_KEY=${GRAFANA_CLOUD_API_KEY}
    networks:
      - databrain

  databrainbackend:  # existing service; add these variables to its environment
    environment:
      OTEL_ENABLED: "true"
      OTEL_EXPORTER_OTLP_ENDPOINT: "http://grafana-alloy:4318"
      OTEL_SERVICE_NAME: "databrain-api"
      LOG_LEVEL: "info"
    depends_on:
      - grafana-alloy
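
To apply the changes, a quick smoke test (service names as defined above):

# Start Alloy and the backend, then tail Alloy's logs for export errors.
docker compose up -d grafana-alloy databrainbackend
docker compose logs -f grafana-alloy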

Alloy Configuration

Create alloy-config.alloy:
// OTLP Receiver
otelcol.receiver.otlp "default" {
  grpc {
    endpoint = "0.0.0.0:4317"
  }
  http {
    endpoint = "0.0.0.0:4318"
  }

  output {
    traces  = [otelcol.processor.batch.default.input]
    metrics = [otelcol.processor.batch.default.input]
    logs    = [otelcol.processor.batch.default.input]
  }
}

// Batch Processor
otelcol.processor.batch "default" {
  timeout          = "1s"
  send_batch_size  = 1024

  output {
    traces  = [otelcol.exporter.otlp.tempo.input]
    metrics = [otelcol.exporter.prometheus.default.input]
    logs    = [otelcol.exporter.loki.default.input]
  }
}

// Tempo (Traces) Exporter
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = env("GRAFANA_CLOUD_TEMPO_ENDPOINT")
    auth     = otelcol.auth.basic.grafana_cloud.handler
  }
}

// Prometheus (Metrics) Exporter
otelcol.exporter.prometheus "default" {
  forward_to = [prometheus.remote_write.grafana_cloud.receiver]
}

prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://${env("GRAFANA_CLOUD_PROMETHEUS_ENDPOINT")}"
    basic_auth {
      username = env("GRAFANA_CLOUD_INSTANCE_ID")
      password = env("GRAFANA_CLOUD_API_KEY")
    }
  }
}

// Loki (Logs) Exporter
otelcol.exporter.loki "default" {
  forward_to = [loki.write.grafana_cloud.receiver]
}

loki.write "grafana_cloud" {
  endpoint {
    url = "https://${env("GRAFANA_CLOUD_LOKI_ENDPOINT")}"
    basic_auth {
      username = env("GRAFANA_CLOUD_INSTANCE_ID")
      password = env("GRAFANA_CLOUD_API_KEY")
    }
  }
}

// Basic Auth
otelcol.auth.basic "grafana_cloud" {
  username = env("GRAFANA_CLOUD_INSTANCE_ID")
  password = env("GRAFANA_CLOUD_API_KEY")
}
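
You can parse-check the file before restarting, using the same image as the compose file (a sketch; alloy fmt parses the configuration and exits non-zero on syntax errors):

# Parse the Alloy configuration without starting a collector.
docker run --rm -v "$PWD/alloy-config.alloy:/etc/alloy/config.alloy" \
  grafana/alloy:latest fmt /etc/alloy/config.alloy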

3. Alternative: Direct OTLP to Grafana Cloud

You can also send data directly without Alloy (less flexible):
services:
  databrainbackend:
    environment:
      OTEL_ENABLED: "true"
      OTEL_EXPORTER_OTLP_ENDPOINT: "https://tempo-us-central1.grafana.net:443"
      OTEL_SERVICE_NAME: "databrain-api"
      # Compose does not execute $(...) command substitution inside YAML;
      # precompute the base64 value (see the sketch below) and reference it here.
      OTEL_EXPORTER_OTLP_HEADERS: "authorization=Basic ${GRAFANA_CLOUD_BASIC_AUTH}"
Note: This sends traces only. For metrics and logs, use Alloy.
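
A minimal sketch for precomputing that header value (variable names are illustrative; tr strips the trailing newline on both GNU and BSD base64):

# Append the precomputed Basic auth value to .env for Compose to substitute.
GRAFANA_CLOUD_BASIC_AUTH=$(printf '%s' "${GRAFANA_INSTANCE_ID}:${GRAFANA_API_KEY}" | base64 | tr -d '\n')
echo "GRAFANA_CLOUD_BASIC_AUTH=${GRAFANA_CLOUD_BASIC_AUTH}" >> .env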

4. Kubernetes with Grafana Agent

For Kubernetes, you can use the Grafana Agent Operator (note that Grafana Agent has been deprecated in favor of Alloy; the Operator still works, but prefer Alloy for new deployments):
apiVersion: v1
kind: Secret
metadata:
  name: grafana-cloud-credentials
  namespace: monitoring
type: Opaque
stringData:
  instance-id: "123456"
  api-key: "your-grafana-cloud-api-key"
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: GrafanaAgent
metadata:
  name: grafana-agent
  namespace: monitoring
spec:
  image: grafana/agent:latest
  integrations:
    selector:
      matchLabels:
        agent: grafana-agent
  logs:
    instanceSelector:
      matchLabels:
        agent: grafana-agent
  metrics:
    instanceSelector:
      matchLabels:
        agent: grafana-agent
---
apiVersion: monitoring.grafana.com/v1alpha1
kind: Integration
metadata:
  name: opentelemetry
  namespace: monitoring
spec:
  name: otel
  type: otel
  config:
    receivers:
      otlp:
        protocols:
          http:
            endpoint: "0.0.0.0:4318"
          grpc:
            endpoint: "0.0.0.0:4317"
    extensions:
      basicauth/grafana_cloud:
        client_auth:
          username: "123456"
          password: "${GRAFANA_API_KEY}"
    exporters:
      otlp/tempo:
        endpoint: "tempo-us-central1.grafana.net:443"
        auth:
          authenticator: basicauth/grafana_cloud
      prometheusremotewrite:
        endpoint: "https://prometheus-us-central1.grafana.net/api/prom/push"
        basic_auth:
          username: "123456"
          password: "${GRAFANA_API_KEY}"
    service:
      extensions: [basicauth/grafana_cloud]
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp]
          exporters: [prometheusremotewrite]
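
To roll this out (the manifest file name is a placeholder):

# Apply the Secret, GrafanaAgent, and Integration manifests, then watch the pod come up.
kubectl apply -f grafana-agent-otel.yaml
kubectl -n monitoring get pods -w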

Verification

1. Check Alloy Status

Visit the Alloy UI at http://localhost:12345. You should see:
  • Active OTLP receivers
  • Successful exports to Grafana Cloud

2. Generate Test Traffic

curl -X GET "https://your-databrain-instance.com/api/health"
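
A single request may not make trends visible; a short loop helps (the URL is a placeholder for your instance):

# Send 20 health-check requests, printing each response status code.
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" "https://your-databrain-instance.com/api/health"
  sleep 1
done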

3. View in Grafana Cloud

Tempo (Traces)

  1. Go to Explore in Grafana
  2. Select Tempo as the data source
  3. Query: {resource.service.name="databrain-api"}
  4. You should see traces within 1-2 minutes

Prometheus (Metrics)

  1. Go to Explore
  2. Select Prometheus as the data source
  3. Query: rate(http_server_duration_milliseconds_count{service_name="databrain-api"}[5m])

Loki (Logs)

  1. Go to Explore
  2. Select Loki as the data source
  3. Query: {service_name="databrain-api"}
  4. Click on any log to see correlated traces

Creating Dashboards

Pre-built Dashboard

Import the OpenTelemetry APM dashboard:
  1. Go to Dashboards → New → Import
  2. Use dashboard ID: 19419 (OpenTelemetry APM)
  3. Select your Tempo and Prometheus data sources
  4. Filter by service_name = databrain-api

Custom Dashboard Example

Create a new dashboard with these panels.

Request Rate:
rate(http_server_duration_milliseconds_count{service_name="databrain-api"}[5m])
Average Latency:
rate(http_server_duration_milliseconds_sum{service_name="databrain-api"}[5m])
/
rate(http_server_duration_milliseconds_count{service_name="databrain-api"}[5m])
Error Rate:
rate(http_server_duration_milliseconds_count{service_name="databrain-api",http_status_code=~"5.."}[5m])
P95 Latency:
histogram_quantile(0.95, 
  rate(http_server_duration_milliseconds_bucket{service_name="databrain-api"}[5m])
)
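
You can sanity-check any of these queries against the hosted Prometheus HTTP API before building panels (the endpoint path assumes the us-central1 stack used throughout this guide):

# POST a PromQL query to Grafana Cloud's Prometheus query endpoint with Basic auth.
curl -s -u "${GRAFANA_CLOUD_INSTANCE_ID}:${GRAFANA_CLOUD_API_KEY}" \
  "https://prometheus-us-central1.grafana.net/api/prom/api/v1/query" \
  --data-urlencode 'query=rate(http_server_duration_milliseconds_count{service_name="databrain-api"}[5m])'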

Setting Up Alerts

Grafana Alerting

Create alert rules in Grafana.

High Error Rate:
  1. Go to Alerting → Alert rules → New alert rule
  2. Query:
(rate(http_server_duration_milliseconds_count{service_name="databrain-api",http_status_code=~"5.."}[5m])
/
rate(http_server_duration_milliseconds_count{service_name="databrain-api"}[5m]))
> 0.05
  3. Threshold: Alert when value > 0.05 (5% errors)
  4. Add notification channel (email, Slack, PagerDuty)
High Latency:
histogram_quantile(0.95,
  rate(http_server_duration_milliseconds_bucket{service_name="databrain-api"}[5m])
) > 2000

Trace Exemplars

Link metrics to traces using exemplars:
// In alloy-config.alloy, extend the prometheus.remote_write block
prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://" + env("GRAFANA_CLOUD_PROMETHEUS_ENDPOINT")
    basic_auth {
      username = env("GRAFANA_CLOUD_INSTANCE_ID")
      password = env("GRAFANA_CLOUD_API_KEY")
    }

    // Enable exemplars
    send_exemplars = true
  }
}
Now when viewing metrics, you can click on data points to see example traces.

LogQL Queries

Use LogQL to query structured logs in Loki.

All errors:
{service_name="databrain-api"} | json | level="error"
Slow requests (>1s):
{service_name="databrain-api"} | json | duration > 1000
Logs for specific user:
{service_name="databrain-api"} | json | userId="123"
Logs with trace context:
{service_name="databrain-api"} | json | trace_id!=""
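
The same queries work from the command line with logcli, Loki's CLI client (a sketch; logcli reads its endpoint and credentials from environment variables):

# Point logcli at the hosted Loki endpoint using Basic auth, then run a query.
export LOKI_ADDR="https://logs-prod-us-central1.grafana.net"
export LOKI_USERNAME="${GRAFANA_CLOUD_INSTANCE_ID}"
export LOKI_PASSWORD="${GRAFANA_CLOUD_API_KEY}"
logcli query --since=1h '{service_name="databrain-api"} | json | level="error"'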

Troubleshooting

Issue                       Solution
No data in Grafana Cloud    Check Alloy logs: docker logs grafana-alloy
401 Unauthorized            Verify Instance ID and API Key are correct
Connection timeout          Check firewall allows outbound HTTPS (443)
Missing traces in Tempo     Wait 2-3 minutes for ingestion delay
High costs                  Implement sampling in Alloy configuration

Debug Alloy

Check Alloy logs:
docker logs grafana-alloy | grep -i error
View Alloy UI metrics:
http://localhost:12345/
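
Alloy also exposes its own Prometheus metrics on the UI port; exporter failure counters are a quick health signal (metric names come from the embedded OpenTelemetry Collector and may vary by version):

# Look for non-zero send-failure counters from the exporters.
curl -s http://localhost:12345/metrics | grep -i 'send_failed'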

Test Connectivity

# Test Tempo endpoint
curl -v https://tempo-us-central1.grafana.net:443

# Test with auth
curl -u "${INSTANCE_ID}:${API_KEY}" \
  https://tempo-us-central1.grafana.net:443/status

Cost Optimization

Grafana Cloud free tier includes:
  • Tempo: 50 GB traces/month
  • Loki: 50 GB logs/month
  • Prometheus: 10k active series
Tips to stay within limits:
  1. Sampling: Sample traces in Alloy
otelcol.processor.probabilistic_sampler "default" {
  // Route the receiver's traces output here instead of straight to the batch processor.
  sampling_percentage = 10  // Sample 10%
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
  2. Filter low-value logs: Exclude health checks

otelcol.processor.filter "exclude_healthchecks" {
  traces {
    span = ["attributes[\"http.target\"] == \"/health\""]
  }

  // Alloy processors require an output block; forward surviving spans onward.
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}
  3. Set retention: Adjust in Grafana Cloud settings
    • Tempo: Default 7 days
    • Loki: Default 30 days
    • Prometheus: Default 13 months

Best Practices

1. Use Service Graph

Enable service graph in Tempo to visualize service dependencies:
  1. Go to Tempo → Service Graph
  2. View request flow between services
  3. Identify bottlenecks and failures

2. Correlate Logs with Traces

With OpenTelemetry enabled, Databrain's Winston logger automatically includes trace context in each entry:
{
  "level": "info",
  "message": "Processing order",
  "orderId": "123",
  "trace_id": "abc123",
  "span_id": "def456"
}
In Loki, click the trace ID to jump to the full trace in Tempo.

3. Use Grafana OnCall

Integrate with Grafana OnCall for advanced alert management:
  • On-call rotations
  • Escalation policies
  • Alert grouping and deduplication

Support

For Databrain configuration issues, contact your Databrain support team.