Tracing
This component instruments the Vela server with OpenTelemetry tracing based on the provided configuration.
Configuration
The following options are used to configure the component:
Name | Description | Required | Default | Environment Variables |
---|---|---|---|---|
tracing.enable | This variable enables OpenTelemetry tracing for the Vela server. You must provide VELA_OTEL_EXPORTER_OTLP_ENDPOINT when tracing is enabled. | false | false | VELA_OTEL_TRACING_ENABLE |
tracing.service.name | This variable sets the service name applied to traces. | false | vela-server | VELA_OTEL_TRACING_SERVICE_NAME |
tracing.exporter.endpoint | This variable sets the OTel exporter endpoint (e.g., scheme://host:port). It must be set when tracing.enable is true. | false | N/A | VELA_OTEL_EXPORTER_OTLP_ENDPOINT |
tracing.exporter.cert_path | This variable sets the path to the certificates used to communicate with the OTel exporter. If no path is provided, the server uses insecure communication. | false | N/A | VELA_OTEL_TRACING_EXPORTER_SSL_CERT_PATH |
tracing.exporter.tls-min-version | This optional variable sets the minimum TLS version used when exporting traces to the OTel exporter. | false | 1.2 | VELA_OTEL_TRACING_TLS_MIN_VERSION |
tracing.sampler.persecond | This variable sets the OTel head sampler's rate limit to N traces per second. | false | 100 | VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND |
tracing.sampler.tasks | This variable sets an optional filepath to a JSON file of OTel head-sampler configurations that alter how certain tasks (API endpoints, queries, etc.) are sampled. | false | N/A | VELA_OTEL_TRACING_SAMPLER_TASKS_CONFIG_FILEPATH |
tracing.resource.attributes | This variable sets OTel resource (span) attributes as a list of key=value pairs. Each pair will be attached to each span as a ‘process’ attribute. | false | N/A | VELA_OTEL_TRACING_RESOURCE_ATTRIBUTES |
tracing.resource.env_attributes | This variable sets OTel resource (span) attributes as a list of key=env_variable_key pairs. Each pair will be attached to each span as a ‘process’ attribute whose value is read from the environment variable named by the pair value. | false | N/A | VELA_OTEL_TRACING_RESOURCE_ENV_ATTRIBUTES |
tracing.span.attributes | This variable sets trace span attributes as a list of key=value pairs. Each pair will be attached to each span as a ‘tag’ attribute. | false | N/A | VELA_OTEL_TRACING_SPAN_ATTRIBUTES |
tracing.tracestate.attributes | This variable sets OTel tracestate (span) attributes as a list of key=value pairs. Each pair will be inserted into the tracestate for each sampled span. | false | N/A | VELA_OTEL_TRACING_TRACESTATE_ATTRIBUTES |
Note:
For more information on how the runtime properties are consumed, please see the server reference.
Exporter
To start using tracing, set VELA_OTEL_TRACING_ENABLE=true in the runtime environment. Enabling tracing also requires that VELA_OTEL_EXPORTER_OTLP_ENDPOINT be set to an exporter host that is reachable over HTTP.
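The rule that enabling tracing requires an endpoint can be expressed as a startup check. This minimal Go sketch assumes both settings are read directly from the environment. The names loadTracingEnv and tracingEnv are hypothetical, and the real server reads these values through its own configuration handling.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// tracingEnv (hypothetical) holds the two settings this sketch validates.
type tracingEnv struct {
	Enabled  bool
	Endpoint string
}

// loadTracingEnv reads VELA_OTEL_TRACING_ENABLE (default false) and
// VELA_OTEL_EXPORTER_OTLP_ENDPOINT, and rejects an enabled configuration
// that has no exporter endpoint. getenv is injected so it can be tested.
func loadTracingEnv(getenv func(string) string) (tracingEnv, error) {
	cfg := tracingEnv{Endpoint: getenv("VELA_OTEL_EXPORTER_OTLP_ENDPOINT")}
	if raw := getenv("VELA_OTEL_TRACING_ENABLE"); raw != "" {
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			return cfg, fmt.Errorf("invalid VELA_OTEL_TRACING_ENABLE %q: %w", raw, err)
		}
		cfg.Enabled = enabled
	}
	if cfg.Enabled && cfg.Endpoint == "" {
		return cfg, errors.New("VELA_OTEL_EXPORTER_OTLP_ENDPOINT must be set when tracing is enabled")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadTracingEnv(os.Getenv)
	if err != nil {
		fmt.Println("tracing config error:", err)
		return
	}
	fmt.Printf("tracing enabled=%t endpoint=%q\n", cfg.Enabled, cfg.Endpoint)
}
```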
If the exporter requires SSL, VELA_OTEL_TRACING_EXPORTER_SSL_CERT_PATH must be set to a filepath that contains valid certificates. If no certificate filepath is set, the server communicates with the exporter over HTTP (insecure).
The Vela local stack is configured to export traces to Jaeger using their “all-in-one” Docker image, making it easy to view traces out of the box.
```diff
+  jaeger:
+    image: jaegertracing/all-in-one:latest
   ...
```

```diff
 $ docker run \
   --detach=true \
   --env=VELA_ADDR=https://vela-server.example.com \
+  --env=VELA_OTEL_TRACING_ENABLE=true \
+  --env=VELA_OTEL_EXPORTER_OTLP_ENDPOINT=http://jaeger:4318 \
   ...
```
From the Jaeger official website:
This image, designed for quick local testing, launches the Jaeger UI, collector, query, and agent, with an in memory storage component.
Sampling
The server uses a combination of “shared” and “per-task” head sampling to control how traces are recorded or dropped.
“Shared” Samplers
The following samplers apply to all traces produced by the server.
Global Rate-limiting
All traces share a global rate limit controlled by VELA_OTEL_TRACING_SAMPLER_RATELIMIT_PER_SECOND, which sets a maximum threshold of “N traces per second”. The default is 100 traces per second.
“Task” Samplers
Set VELA_OTEL_TRACING_SAMPLER_TASKS_CONFIG_FILEPATH to the path of a JSON file to control sampler configuration on a per-task basis.
tracing.json
```
{
  "task-name": {
    "active": bool
  }
}
```
A task is the span name associated with the work being performed. A task can be an API endpoint interaction, a Gorm query, etc. See the OpenTelemetry span documentation for more information.
If a task is not present in the configuration file, it is traced normally using the “shared” samplers.
Field | Type | Description |
---|---|---|
active | bool | Set to false to completely disable traces for a particular task. |
The set of trace tasks will change as functionality is added to the server.
See the configuration file examples below.
Example - No Tasks
```json
{}
```
Because tasks that do not exist in the configuration file are treated as enabled, this file enables tracing for all tasks.
Example - Disable /health Endpoint
```json
{
  "/health": {
    "active": false
  }
}
```
Because active: false disables tracing for a task, this configuration file enables tracing for all tasks except the /health endpoint.
Example - Mixed Tasks
```json
{
  "/health": {
    "active": false
  },
  "/api/v1/deployments/:org/:repo": {
    "active": false
  },
  "/api/v1/:worker": {
    "active": true
  },
  "gorm.query": {
    "active": false
  }
}
```
The task /health with active: false disables tracing on /health.
The task /api/v1/deployments/:org/:repo with active: false disables tracing on /api/v1/deployments/:org/:repo for all values of the :org and :repo parameters.
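One entry covers every :org and :repo because, for API endpoints, the task name is the route template rather than the concrete request path. Routers such as Gin expose the matched template (Gin's Context.FullPath). The standard-library sketch below imitates that lookup. The helpers matchTemplate and taskName are hypothetical, and first-match ordering simplifies real router precedence.

```go
package main

import (
	"fmt"
	"strings"
)

// matchTemplate reports whether a concrete path matches a route template in
// which segments starting with ":" match any single non-empty segment.
func matchTemplate(template, path string) bool {
	t := strings.Split(strings.Trim(template, "/"), "/")
	p := strings.Split(strings.Trim(path, "/"), "/")
	if len(t) != len(p) {
		return false
	}
	for i := range t {
		if strings.HasPrefix(t[i], ":") {
			if p[i] == "" {
				return false
			}
			continue
		}
		if t[i] != p[i] {
			return false
		}
	}
	return true
}

// taskName returns the first matching template, or the raw path if none match.
func taskName(templates []string, path string) string {
	for _, tmpl := range templates {
		if matchTemplate(tmpl, path) {
			return tmpl
		}
	}
	return path
}

func main() {
	templates := []string{"/health", "/api/v1/deployments/:org/:repo"}
	disabled := map[string]bool{"/api/v1/deployments/:org/:repo": true}
	for _, path := range []string{"/api/v1/deployments/octocat/hello-world", "/api/v1/deployments/vela/server"} {
		name := taskName(templates, path)
		fmt.Printf("%s -> task %s, traced=%t\n", path, name, !disabled[name])
	}
}
```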
The task /api/v1/:worker with active: true currently has no effect: a task marked active: true is sampled exactly like a task that is absent from the configuration file, using the shared samplers. This is slightly confusing for now, but more configuration fields are planned that will determine how an active: true task is sampled.
The task gorm.query with active: false disables tracing for raw Gorm queries. This shows that the configuration applies to all trace tasks, not just API endpoints.
All other tasks are sampled normally using the global shared samplers, such as rate limiting. This includes all other API endpoints and Gorm queries.
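The order of evaluation described above can be sketched as a small composite sampler. The per-task check runs first, and only tasks that are not disabled reach the shared samplers. This is a hypothetical Go illustration. The type taskSampler is not a Vela type, and its shared field stands in for the global rate limiter. This sketch also assumes a disabled task does not consume rate-limit budget, which may not match the real server.

```go
package main

import "fmt"

// taskSampler (hypothetical) drops tasks configured with active=false and
// hands every other task to the shared samplers.
type taskSampler struct {
	active map[string]bool // task name -> "active" value from the tasks file
	shared func() bool     // stand-in for the shared samplers (e.g. rate limit)
}

// ShouldSample reports whether a trace for the given task should be recorded.
func (s taskSampler) ShouldSample(task string) bool {
	if active, ok := s.active[task]; ok && !active {
		return false // disabled task: never reaches the shared samplers
	}
	// Absent tasks and active=true tasks are treated the same way.
	return s.shared()
}

func main() {
	budget := 1
	s := taskSampler{
		active: map[string]bool{"/health": false, "gorm.query": false, "/api/v1/:worker": true},
		shared: func() bool { // toy shared sampler that admits a single trace
			if budget == 0 {
				return false
			}
			budget--
			return true
		},
	}
	for _, task := range []string{"/health", "/api/v1/:worker", "gorm.query", "/api/v1/:worker"} {
		fmt.Printf("%-16s sampled=%t\n", task, s.ShouldSample(task))
	}
}
```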