Telemetry

UCS hosts run an embedded observability stack (Grafana / Loki / Tempo / Prometheus / OpenTelemetry Collector). Applications should ship metrics, logs, and traces to the local OpenTelemetry Collector, which forwards each signal to the appropriate backend.

For the operator-facing overview, see Observability.

Endpoints

The OpenTelemetry Collector listens locally on:

Endpoint                  Protocol      Use it for
localhost:4317            OTLP / gRPC   Server-side apps with an OTel SDK
localhost:4318            OTLP / HTTP   Server-side apps that prefer HTTP, or curl/manual sends
https://<host>/otel/v1/*  OTLP / HTTPS  External clients (browsers, mobile apps), routed through Traefik

All three endpoints accept all three signals (traces, metrics, logs). The OTLP/HTTP paths are the standard ones:

  • POST /v1/traces
  • POST /v1/metrics
  • POST /v1/logs

The Traefik route strips the /otel prefix before forwarding, so POST /otel/v1/traces is forwarded as POST /v1/traces to the local collector.
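As a rough illustration of the path mapping described above, the sketch below builds the OTLP/HTTP URL for a signal and mimics the prefix strip. The helper names `otlp_http_url` and `forwarded_path` and the `public_host` parameter are hypothetical and not part of any SDK or of Traefik itself.

```python
from typing import Optional

# Signal names accepted on the OTLP/HTTP paths listed above.
SIGNALS = ("traces", "metrics", "logs")


def otlp_http_url(signal: str, public_host: Optional[str] = None) -> str:
    """Return the OTLP/HTTP URL for a signal.

    Without public_host, target the local collector on port 4318.
    With public_host, target the public Traefik route under /otel.
    """
    if signal not in SIGNALS:
        raise ValueError(f"unknown signal: {signal!r}")
    if public_host is None:
        return f"http://localhost:4318/v1/{signal}"
    return f"https://{public_host}/otel/v1/{signal}"


def forwarded_path(external_path: str) -> str:
    """Mimic the prefix strip: /otel/v1/traces becomes /v1/traces."""
    prefix = "/otel"
    if external_path.startswith(prefix + "/"):
        return external_path[len(prefix):]
    return external_path
```

For example, `otlp_http_url("logs", "your-ucs-host")` yields the public logs URL, and `forwarded_path("/otel/v1/logs")` shows the path the collector would see.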

Signal → backend routing

The collector's pipelines (defined in /etc/otelcol-contrib/config.yaml) route each signal as follows:

Signal   OTLP path    Backend     Where to view
Traces   /v1/traces   Tempo       Grafana → Explore → Tempo
Metrics  /v1/metrics  Prometheus  Grafana → Explore → Prometheus (via remote-write)
Logs     /v1/logs     Loki        Grafana → Explore → Loki

Use the same single endpoint regardless of signal type — the collector demultiplexes by the HTTP path or gRPC method.

Instrumenting an application

Python

Install the OTel SDK and OTLP exporter:

pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

Minimal example — emit a trace from a Python service:

from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

resource = Resource.create({"service.name": "ucs-api"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("handle-request"):
    # ... your code ...
    pass

For Python apps, prefer auto-instrumentation for HTTP frameworks, databases, etc. — it adds no boilerplate per call site:

pip install opentelemetry-instrumentation
opentelemetry-bootstrap --action=install # installs instrumentations for detected libs
opentelemetry-instrument --traces_exporter otlp \
  --metrics_exporter otlp \
  --logs_exporter otlp \
  --exporter_otlp_endpoint http://localhost:4317 \
  python your_app.py

Node.js

npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-grpc

const { NodeSDK } = require('@opentelemetry/sdk-node')
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc')
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node')

const sdk = new NodeSDK({
  serviceName: 'ucs-operator',
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4317' }),
  instrumentations: [getNodeAutoInstrumentations()],
})
sdk.start()

Other languages

OpenTelemetry has SDKs for most major languages — Java, Go, .NET, Rust, PHP, Ruby. The OTLP endpoint URL is the same in all of them: http://localhost:4317 (gRPC) or http://localhost:4318 (HTTP).
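Many SDKs can also pick up the endpoint from the standard OpenTelemetry environment variables instead of code. The sketch below builds such an environment for a child process. The variable names come from the OpenTelemetry specification, but support varies by SDK and version, and the helper name `otel_env` is hypothetical.

```python
import os
from typing import Dict


def otel_env(service_name: str, use_grpc: bool = True) -> Dict[str, str]:
    """Return a copy of the current environment with OTLP settings added."""
    env = dict(os.environ)
    env["OTEL_SERVICE_NAME"] = service_name
    if use_grpc:
        # gRPC receiver of the local collector.
        env["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4317"
        env["OTEL_EXPORTER_OTLP_PROTOCOL"] = "grpc"
    else:
        # HTTP receiver of the local collector (binary protobuf bodies).
        env["OTEL_EXPORTER_OTLP_ENDPOINT"] = "http://localhost:4318"
        env["OTEL_EXPORTER_OTLP_PROTOCOL"] = "http/protobuf"
    return env
```

A launcher could then pass the result to `subprocess.run([...], env=otel_env("my-service"))`, where the command and service name are placeholders.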

Browser / mobile client telemetry

For client-side code (web pages, mobile apps), use the public HTTPS endpoint through Traefik:

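import { Resource } from '@opentelemetry/resources'
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base'
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http'

// Written against the 1.x JS SDK API (new Resource, addSpanProcessor);
// newer major versions may expose these differently.
const provider = new WebTracerProvider({
  resource: new Resource({ 'service.name': 'ucs-admin-ui' }),
})
provider.addSpanProcessor(new BatchSpanProcessor(new OTLPTraceExporter({
  url: 'https://your-ucs-host/otel/v1/traces',
})))
provider.register()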

The Traefik route includes CORS middleware, rate limiting, and a request body size cap — see /etc/traefik/conf.d/07_otel.yaml.

Manual sends (testing / one-offs)

Useful for verifying connectivity, debugging the pipeline, or sending ad-hoc events from shell scripts.

Send a log via curl:

NOW=$(date +%s%N)
curl -X POST -H "Content-Type: application/json" \
  http://localhost:4318/v1/logs \
  -d "{\"resourceLogs\":[{\"resource\":{\"attributes\":[{\"key\":\"service.name\",\"value\":{\"stringValue\":\"shell-script\"}}]},\"scopeLogs\":[{\"scope\":{},\"logRecords\":[{\"timeUnixNano\":\"$NOW\",\"severityText\":\"INFO\",\"severityNumber\":9,\"body\":{\"stringValue\":\"hello from shell\"}}]}]}]}"

Send a trace:

TRACE=$(openssl rand -hex 16); SPAN=$(openssl rand -hex 8)
NOW=$(date +%s%N); END=$((NOW + 1000000))
curl -X POST -H "Content-Type: application/json" \
  http://localhost:4318/v1/traces \
  -d "{\"resourceSpans\":[{\"resource\":{\"attributes\":[{\"key\":\"service.name\",\"value\":{\"stringValue\":\"shell-script\"}}]},\"scopeSpans\":[{\"scope\":{},\"spans\":[{\"traceId\":\"$TRACE\",\"spanId\":\"$SPAN\",\"name\":\"test-span\",\"kind\":1,\"startTimeUnixNano\":\"$NOW\",\"endTimeUnixNano\":\"$END\"}]}]}]}"
echo "TRACE=$TRACE"

Send a metric:

NOW=$(date +%s%N)
curl -X POST -H "Content-Type: application/json" \
  http://localhost:4318/v1/metrics \
  -d "{\"resourceMetrics\":[{\"resource\":{\"attributes\":[{\"key\":\"service.name\",\"value\":{\"stringValue\":\"shell-script\"}}]},\"scopeMetrics\":[{\"scope\":{},\"metrics\":[{\"name\":\"shell_test_counter\",\"sum\":{\"dataPoints\":[{\"asInt\":\"1\",\"timeUnixNano\":\"$NOW\"}],\"aggregationTemporality\":2,\"isMonotonic\":true}}]}]}]}"

Each request should return HTTP 200 with body {"partialSuccess":{}}.
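The same kind of manual send can be done from Python using only the standard library, which avoids shell quoting. This is a minimal sketch: `build_log_payload` and `send_log` are hypothetical helper names, and the payload carries one log record with both severityText and severityNumber set.

```python
import json
import time
import urllib.request


def build_log_payload(service_name: str, message: str) -> dict:
    """Build an OTLP/JSON body containing a single INFO log record."""
    return {
        "resourceLogs": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeLogs": [{
                "scope": {},
                "logRecords": [{
                    # OTLP/JSON encodes 64-bit integers as strings.
                    "timeUnixNano": str(time.time_ns()),
                    "severityText": "INFO",
                    "severityNumber": 9,
                    "body": {"stringValue": message},
                }],
            }],
        }]
    }


def send_log(payload: dict, url: str = "http://localhost:4318/v1/logs") -> int:
    """POST the payload to the collector and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `send_log(build_log_payload("shell-script", "hello from python"))` on a UCS host should return 200 if the collector accepted the record.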

Conventions

Resource attributes

Always set at minimum service.name on every emitted signal — Grafana groups data by this attribute. Strongly recommended additions:

  • service.namespace: e.g. ucs, uphone, operator
  • service.version: release version of your service
  • deployment.environment: e.g. production, staging, development

These are part of the OpenTelemetry Resource semantic conventions.
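One way to apply these attributes without touching application code is the standard OTEL_RESOURCE_ATTRIBUTES environment variable, which takes comma-separated key=value pairs. The sketch below builds that string. The helper name `resource_attributes_env` is hypothetical, and percent-encoding the values is an assumption that should be checked against the SDK in use.

```python
from urllib.parse import quote


def resource_attributes_env(
    service_name: str, namespace: str, version: str, environment: str
) -> str:
    """Build a value for OTEL_RESOURCE_ATTRIBUTES from the recommended keys."""
    attributes = {
        "service.name": service_name,
        "service.namespace": namespace,
        "service.version": version,
        "deployment.environment": environment,
    }
    # Percent-encode values so that "," and "=" cannot break the list format.
    return ",".join(
        f"{key}={quote(value, safe='')}" for key, value in attributes.items()
    )
```

The result can be exported in the service's environment, for instance in a systemd unit or a container definition.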

Span / metric naming

Follow OTel semantic conventions for common attributes (HTTP, DB, RPC). For example, HTTP servers should emit spans named <METHOD> <route> with attributes such as http.request.method and http.response.status_code. Auto-instrumentation packages do this for you on supported libraries.

Sampling

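For high-traffic services, limit volume with head-based sampling in the SDK or with tail-based sampling, which is typically configured in the collector rather than the SDK. The default OTel SDK sampler is parentbased_always_on, which records every span unless a parent span was already sampled out. That is fine for low-traffic backends but too expensive for hot HTTP servers.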
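To make the idea of head-based ratio sampling concrete, the sketch below decides from the trace ID alone whether a trace is kept, so every service that sees the same trace ID reaches the same decision. It is an illustration only, not the SDK's implementation, and `should_sample` is a hypothetical name. In practice most SDKs expose this through the standard OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG environment variables (for example parentbased_traceidratio with 0.1).

```python
def should_sample(trace_id_hex: str, ratio: float) -> bool:
    """Keep roughly `ratio` of all traces, deterministically per trace ID."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Treat the low 64 bits of the 128-bit trace ID as a uniform random value.
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    # Trace IDs whose low bits fall below this bound are kept.
    bound = round(ratio * (1 << 64))
    return low_64_bits < bound
```

Because the decision depends only on the trace ID, a ratio of 1.0 keeps everything and a ratio of 0.0 keeps nothing.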

Log/trace correlation

When emitting logs from a context that has an active trace (typical inside an instrumented HTTP handler), the SDK will automatically attach the trace_id and span_id to log records. In Grafana's Loki view, you can then click a log line and jump to the corresponding trace in Tempo.
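For manual payloads, where no SDK does this for you, correlation comes down to two extra fields on the OTLP log record. The sketch below adds them, assuming OTLP/JSON encoding, in which traceId and spanId are hex strings. The helper name `with_trace_context` is hypothetical.

```python
import string

# Lowercase hexadecimal digits: 0-9 and a-f.
_HEX = set(string.hexdigits.lower())


def _is_lower_hex(value: str, length: int) -> bool:
    """True if value has exactly `length` lowercase hex characters."""
    return len(value) == length and all(char in _HEX for char in value)


def with_trace_context(log_record: dict, trace_id: str, span_id: str) -> dict:
    """Return a copy of an OTLP/JSON log record with trace context attached."""
    if not _is_lower_hex(trace_id, 32):
        raise ValueError("trace_id must be 32 lowercase hex characters")
    if not _is_lower_hex(span_id, 16):
        raise ValueError("span_id must be 16 lowercase hex characters")
    return {**log_record, "traceId": trace_id, "spanId": span_id}
```

The input record is left unchanged, so the same base record can be reused across several spans.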

Viewing the data

Open Grafana at https://<server-address>/grafana/ (or directly, bypassing the proxy, at http://<server-address>:3030/grafana/). The default credentials are admin/admin. The Loki, Prometheus, and Tempo datasources are pre-provisioned.

Explore (free-form queries): ☰ → Explore, then pick a datasource.

  • Loki query example: {service_name="ucs-api"} |= "error"
  • Prometheus query example: rate(http_server_request_duration_seconds_count{service_name="ucs-api"}[5m])
  • Tempo query example: paste a trace_id directly, or use TraceQL.

Troubleshooting

Application appears to send telemetry but nothing shows in Grafana:

  1. Tail the collector logs for export errors:
    journalctl -u otelcol-contrib -n 50 --no-pager
  2. Confirm the collector accepted the data — POST returning 200 means it was received. If the backend exporter (Loki / Tempo / Prometheus) is failing, errors appear in collector logs.
  3. Verify the time window in Grafana includes your test send — by default Explore shows the last hour.
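Step 2 above can be scripted when testing with manual sends. The sketch below interprets an OTLP/HTTP response. The partialSuccess field names follow the OTLP specification, while the helper name `diagnose` and its messages are hypothetical.

```python
import json

# Per-signal rejection counters defined for OTLP partial success responses.
_REJECTED_KEYS = ("rejectedSpans", "rejectedDataPoints", "rejectedLogRecords")


def diagnose(status: int, body: str) -> str:
    """Turn an OTLP/HTTP status code and JSON body into a short hint."""
    if status == 415:
        return "unsupported media type: check the Content-Type header"
    if status != 200:
        return f"collector returned HTTP {status}"
    partial = json.loads(body or "{}").get("partialSuccess", {})
    # Counters are 64-bit integers, which OTLP/JSON may encode as strings.
    rejected = sum(int(partial.get(key, 0)) for key in _REJECTED_KEYS)
    if rejected:
        message = partial.get("errorMessage", "")
        return f"accepted with {rejected} rejected item(s): {message}"
    return "accepted: if data is missing, check collector logs and the time window"
```

A 200 with an empty partialSuccess only confirms receipt. Backend export failures still have to be found in the collector logs.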

OTLP HTTP returns 415 Unsupported Media Type: make sure the Content-Type header matches the payload. Use application/json for JSON bodies (or application/x-protobuf for binary protobuf), not text/plain.

Trace ID lookup returns 404 in Tempo: Tempo has an ingest delay of a few seconds, so wait and retry. Also verify that the trace ID is exactly 32 lowercase hex characters.

Logs show up but the severity field is empty: set both severityText (string) and severityNumber (integer) in the OTLP log record. The OTel SDKs handle this automatically; manual payloads need both.
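For manual payloads, a small lookup keeps the two severity fields consistent. The numbers below are the base values of each severity range in the OpenTelemetry logs data model. The helper name `severity_fields` and the WARNING alias are conveniences assumed for this sketch.

```python
# Base severityNumber for each named level in the OTel logs data model.
SEVERITY_NUMBER = {
    "TRACE": 1,
    "DEBUG": 5,
    "INFO": 9,
    "WARN": 13,
    "ERROR": 17,
    "FATAL": 21,
}


def severity_fields(level: str) -> dict:
    """Return matching severityText and severityNumber for a level name."""
    text = level.upper()
    if text == "WARNING":  # common alias, e.g. from Python's logging module
        text = "WARN"
    if text not in SEVERITY_NUMBER:
        raise ValueError(f"unknown severity level: {level!r}")
    return {"severityText": text, "severityNumber": SEVERITY_NUMBER[text]}
```

The returned dict can be merged straight into a log record, for example `{**record, **severity_fields("error")}`.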