Tracing

This section contains information on configuring OpenTelemetry instrumentation for the Vela server.

This component is responsible for instrumenting OpenTelemetry traces based off the configuration provided.

Configuration

The following options are used to configure the component:

NameDescriptionRequiredDefaultEnvironment Variables
tracing.enableThis variable enables OpenTelemetry tracing for the Vela server. You must provide VELA_OTEL_EXPORTER_OTLP_ENDPOINT when tracing is enabled.falsefalseVELA_OTEL_TRACING_ENABLE
tracing.service.nameThis variable sets the service name applied to traces.falsevela-serverVELA_OTEL_TRACING_SERVICE_NAME
tracing.exporter.endpointThis variable sets the OTel exporter endpoint (ex. scheme://host:port).falseN/AVELA_OTEL_EXPORTER_OTLP_ENDPOINT
tracing.exporter.cert_pathThis variable sets the path to certs used for communicating with the OTel exporter. If nothing is provided the server will use insecure communication.falseN/AVELA_OTEL_TRACING_EXPORTER_SSL_CERT_PATH
tracing.exporter.tls-min-versionThis optional variable sets a TLS minimum version used when exporting traces to the OTel exporter.false1.2VELA_OTEL_TRACING_TLS_MIN_VERSION
tracing.sampler.persecondThis variable sets OTel tracing head-sampler rate-limiting to N per second.false100VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND
tracing.sampler.tasksThis variable sets an (optional) filepath to the OTel tracing head-sampler configurations json to alter how certain tasks (API endpoints, queries, etc) are sampled.falseN/AVELA_OTEL_TRACING_SAMPLER_TASKS_CONFIG_FILEPATH
tracing.resource.attributesThis variable sets OTel resource (span) attributes as a list of key=value pairs. each one will be attached to each span as a ‘process’ attribute.falseN/AVELA_OTEL_TRACING_RESOURCE_ATTRIBUTES
tracing.resource.env_attributesThis variable sets OTel resource (span) attributes as a list of key=env_variable_key pairs. each one will be attached to each span as a ‘process’ attribute where the value is retrieved from the environment using the pair value.falseN/AVELA_OTEL_TRACING_RESOURCE_ENV_ATTRIBUTES
tracing.span.attributesThis variable sets trace span attributes as a list of key=value pairs. Each pair will be attached to each span as a ’tag’ attribute.falseN/AVELA_OTEL_TRACING_SPAN_ATTRIBUTES
tracing.tracestate.attributesThis variable sets OTel tracestate (span) attributes as a list of key=value pairs. Each pair will be inserted into the tracestate for each sampled span.falseN/AVELA_OTEL_TRACING_TRACESTATE_ATTRIBUTES

Exporter

To start using tracing you first need to set VELA_OTEL_TRACING_ENABLE=true in the runtime environment.

Enabling tracing requires that VELA_OTEL_EXPORTER_OTLP_ENDPOINT be set to an exporter host that is reachable over HTTP.

If the exporter requires SSL then VELA_OTEL_TRACING_EXPORTER_SSL_CERT_PATH must be set to a filepath that contains valid certificates. If no certificate filepath is set, then the server will communicate with the exporter over HTTP (insecure).

The Vela local stack is configured to export traces to Jaeger using their “all-in-one” Docker image, making it easy to view traces out of the box.

+   jaeger:
+     image: jaegertracing/all-in-one:latest
...

$ docker run \
  --detach=true \
  --env=VELA_ADDR=https://vela-server.example.com \
+ --env=VELA_OTEL_TRACING_ENABLE: true \
+ --env=VELA_OTEL_EXPORTER_OTLP_ENDPOINT: http://jaeger:4318 \
...

From the Jaeger official website:

This image, designed for quick local testing, launches the Jaeger UI, collector, query, and agent, with an in memory storage component.

Sampling

The server uses a combination of “shared” and “per-task” head sampling to control how traces are recorded or dropped.

“Shared” Samplers

The following samplers are utilized by all traces produced by the server.

Global Rate-limiting

All traces share a global rate limit controlled by VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND.

Use VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND to set a maximum threshold of “N traces per second”. The default is 100 traces per second.

“Task” Samplers

Set VELA_OTEL_TRACING_SAMPLER_TASKS_CONFIG_FILEPATH to point to a JSON filepath to control sampler configurations on a per-task basis.

tracing.json
{
    "task-name": {
        "active": bool
    }
}

A task is basically the “span name” based on the work being performed. A task can be an API endpoint interaction, a Gorm query, etc. See OTel span docs for more information.

If a task is not represented in the configuration file then the task will be treated normally, with tracing enabled using the “shared” samplers.

FieldTypeDescription
activeboolSet to false to completely disable traces for a particular task.

Examples of trace tasks include API endpoints, gorm queries, etc. The list of tasks will change as functionality is added to the server.

See the configuration file examples below.

Example - No Tasks

{}

Because tasks that do not exist in the configuration file will be treated as enabled, this file will enable all tracing.

Example - Disable /health Endpoint

{
    "/health": {
        "active": false
    }
}

Because active can be used to disable tracing, this configuration file will enable tracing for all tasks except for the /health endpoint.

Example - Mixed Tasks

{
    "/health": {
        "active": false
    },
    "/api/v1/deployments/:org/:repo": {
        "active": false
    },
    "/api/v1/:worker": {
        "active": true
    },
    "gorm.query": {
        "active": false
    }
}

The task /health with active: false will disable tracing on /health.

/api/v1/deployments/:org/:repo with active: false will disable tracing on /api/v1/deployments/:org/:repo for ALL :org and :repo parameters.

/api/v1/:worker with active: true will do nothing at this point, because any tasks that are not present in the configuration file will automatically be sampled normally. For now this is slightly confusing, but in the future there will be more configuration fields that will determine how an active: true task is sampled.

The task gorm.query with active: false would disable tracing for raw gorm queries. This is meant to show that the config applies to all trace tasks and not just API endpoints.

All other tasks will be sampled as normal using the global shared samplers (like rate limiting)! This includes all API endpoints and gorm queries.

Last modified October 2, 2024: feat: otel tracing (#427) (9623a0f1)