chore(docs): Add Log Namespacing docs (#16571)
This updates documentation and adds a blog-post announcing the log
namespacing feature (as a beta release).

---------

Co-authored-by: Spencer Gilbert <spencer.gilbert@datadoghq.com>
fuchsnj and spencergilbert authored Jun 30, 2023
1 parent b8e3dbe commit 7d098e4
Showing 5 changed files with 263 additions and 4 deletions.
169 changes: 169 additions & 0 deletions website/content/en/blog/log-namespacing.md
@@ -0,0 +1,169 @@
---
title: Log Namespacing
short: Log Namespacing
description: Changing Vector's data model
authors: ["fuchsnj"]
date: "2023-06-30"
badges:
type: announcement
domains: ["data model"]
tags: []
---

The Vector team has been hard at work improving the data model of events in Vector. These
changes are now available for beta testing for those who want to try them out and give feedback.
This is an opt-in feature. Nothing should change unless you specifically enable it.

## Why

Currently, all data for events is placed at the root of the event, regardless of where the data came
from or how it was obtained. Not only can that make it confusing to understand what a certain field
represents (eg: was the `timestamp` field generated by Vector when it was ingested, or is it when
the source originally created the event) but it can easily cause data collisions.

Log namespacing also unblocks powerful features being worked on, such as end-to-end type checking
of events in Vector.

## How to enable

The [global config] `schema.log_namespace` can be set to `true` to enable the new
Log Namespacing feature for all components. The default is `false`.

Every source also has a `log_namespace` config option. This will override the global setting,
so you can try out Log Namespacing on individual sources.

The following example enables the `log_namespace` feature globally, then disables it for a single
source.

```toml
schema.log_namespace = true

[sources.input_with_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_with_log_namespace"]
interval = 1

[sources.input_without_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_without_log_namespace"]
interval = 1
log_namespace = false

[sinks.console]
type = "console"
inputs = ["input_with_log_namespace", "input_without_log_namespace"]
encoding.codec = "json"

```
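
The reverse setup, opting in a single source while the global default stays `false`, can be
sketched as follows. The component name `namespaced_input` is a placeholder chosen for
illustration; the options themselves are the same ones used in the example above.

```toml
# Global default left at `false` (shown explicitly for clarity).
schema.log_namespace = false

[sources.namespaced_input]
type = "demo_logs"
format = "shuffle"
lines = ["namespaced_input"]
interval = 1
# Per-source override: only this source uses log namespacing.
log_namespace = true

[sinks.console]
type = "console"
inputs = ["namespaced_input"]
encoding.codec = "json"
```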

## How It Works

### Data Layout

When handling log events, information is categorized into one of the following groups:
(Examples are from the `datadog_agent` source)

- Event Data: The decoded event data. (eg: the log itself)
- Source Metadata: Metadata provided by the source of the event. (eg: hostname / tags)
- Vector Metadata: Metadata provided by Vector. (eg: the time when Vector received the event)

#### Without Log Namespacing

All three of these are placed at the root of the event. The exact layout depends on the source,
some fields are configurable, and the [global log schema] can change the name / location of some
fields.

Example log event from the `datadog_agent` source (with the JSON decoder)

```json
{
"ddsource": "vector",
"ddtags": "env:prod",
"hostname": "alpha",
"foo": "foo field",
"service": "cernan",
"source_type": "datadog_agent",
"bar": "bar field",
"status": "warning",
"timestamp": "1970-02-14T20:44:57.570Z"
}
```

#### With Log Namespacing

When enabled, the layout of this data is well-defined and consistent.

- Event Data (and _only_ Event Data) is placed at the root of the event (eg: `.`).
- Source Metadata is placed in event metadata, prefixed by the source name (eg: `%datadog_agent`).
- Vector Metadata is placed in event metadata, prefixed by `vector` (eg: `%vector`).

Generally sinks will only send the event data. If you want to include any metadata fields,
it's recommended to use a [remap] transform to add data to the event as needed.
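
As a minimal sketch of that recommendation, the [remap] transform below copies two metadata
fields into the event before it reaches a sink. The component names (`my_datadog_agent`,
`add_metadata`) and the destination field names are hypothetical, and the sketch assumes the
event root is an object (for example, when the JSON decoder is used).

```toml
[transforms.add_metadata]
type = "remap"
# Hypothetical source name; replace with the ID of your own source.
inputs = ["my_datadog_agent"]
source = '''
# Copy source metadata into the event so the sink will send it.
.hostname = %datadog_agent.hostname

# Copy Vector metadata into the event as well.
.ingest_timestamp = %vector.ingest_timestamp
'''
```

Only the fields copied this way are added to the event data; the rest of the metadata stays
under `%` and is not sent by sinks that only encode the event.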

It's important to note that previously the type of an event (`.`) would always be an object
with fields. Now it is possible for an event to be any type, such as a string.

Example log event from the `datadog_agent` source. (same data as the example above)

Event root (`.`)

```json
{
"foo": "foo field",
"bar": "bar field"
}
```

Source metadata fields (`%datadog_agent`)

```json
{
"ddsource": "vector",
"ddtags": "env:prod",
"hostname": "alpha",
"service": "cernan",
"status": "warning",
"timestamp": "1970-02-14T20:44:57.570Z"
}
```

Vector metadata fields (`%vector`)

```json
{
"source_type": "datadog_agent",
"ingest_timestamp": "1970-02-14T20:44:58.236Z"
}
```

Here is a sample VRL script accessing different parts of an event when log namespacing is enabled.

```coffee
event = .
field_from_event = .foo

all_metadata = %
tags = %datadog_agent.ddtags
timestamp = %vector.ingest_timestamp

```

### Semantic Meaning

Before Log Namespacing, Vector used the [global log schema] to keep certain types of information
at known locations. This is changing: when log namespacing is enabled, the [global log schema]
is no longer used. It is replaced by a new feature called "semantic meaning", which assigns
a meaning to individual fields of an event so that sinks can find the information they need,
such as timestamps, the hostname, the message, etc.

Semantic meaning will automatically be assigned by all sources. Sinks will check on startup to make
sure a meaning exists for all required fields. If a source does not provide a required field, or
a meaning needs to be manually adjusted for any reason, the VRL function [set_semantic_meaning] can
be used.
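
A manual assignment might look like the sketch below. The component names and the
`.custom_host` field are hypothetical, and the meaning name `"host"` is an assumption used for
illustration; check which meanings your sink requires before relying on it.

```toml
[transforms.assign_meaning]
type = "remap"
# Hypothetical source name; replace with the ID of your own source.
inputs = ["my_source"]
source = '''
# Assumption: `.custom_host` holds the hostname and the sink looks up
# a meaning named "host". The meaning is assigned at Vector startup,
# so this call cannot be made conditional.
set_semantic_meaning(.custom_host, "host")
'''
```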

[global log schema]: /docs/reference/configuration/global-options/#log_schema
[set_semantic_meaning]: /docs/reference/vrl/functions/#set_semantic_meaning
[remap]: /docs/reference/configuration/transforms/remap/
[global config]: /docs/reference/configuration/global-options/#log_namespacing
20 changes: 16 additions & 4 deletions website/cue/reference/configuration.cue
@@ -251,6 +251,17 @@ configuration: {
}
}
}
log_namespacing: {
common: false
description: """
Globally enables / disables log namespacing. See [Log Namespacing](\(urls.log_namespacing_blog))
for more details. If you want to enable it for individual sources, there is a config
option in the source configuration.
"""
required: false
warnings: []
type: bool: default: false
}

telemetry: {
common: false
@@ -274,7 +285,7 @@ configuration: {
common: true
description: """
Add a `source` tag with the source component the event was received from.
If there is no source component, for example if the event was generated by
the `lua` transform a `-` is emitted for this tag.
"""
@@ -309,13 +320,14 @@ configuration: {
}

log_schema: {
common: false
common: false
description: """
Configures default log schema for all events. This is used by
Vector source components to assign the fields on incoming
Vector components to assign the fields on incoming
events.
These values are ignored if log namespacing is enabled. (See [Log Namespacing](\(urls.log_namespacing_blog)))
"""
required: false
required: false
type: object: {
examples: []
options: {
44 changes: 44 additions & 0 deletions website/cue/reference/remap/functions/set_semantic_meaning.cue
@@ -0,0 +1,44 @@
package metadata

remap: functions: set_semantic_meaning: {
category: "Event"
description: """
Sets a semantic meaning for an event. Note that this function assigns
meaning at Vector startup, and has _no_ runtime behavior. It is suggested
to put all calls to this function at the beginning of a VRL function. The function
cannot be conditionally called (eg: using an if statement cannot stop the meaning
from being assigned).
"""

arguments: [
{
name: "target"
description: """
The path of the value that will be assigned a meaning.
"""
required: true
type: ["path"]
},
{
name: "meaning"
description: """
The name of the meaning to assign.
"""
required: true
type: ["string"]
},
]
internal_failure_reasons: [
]
return: types: ["null"]

examples: [
{
title: "Sets custom field semantic meaning"
source: #"""
set_semantic_meaning(.foo, "bar")
"""#
return: null
},
]
}
1 change: 1 addition & 0 deletions website/cue/reference/urls.cue
@@ -313,6 +313,7 @@ urls: {
logfmt_specs: "https://pkg.go.dev/github.com/kr/logfmt#section-documentation"
logstash: "https://www.elastic.co/logstash"
logstash_protocol: "https://github.com/elastic/logstash-forwarder/blob/master/PROTOCOL.md"
log_namespacing_blog: "/blog/log-namespacing/"
loki: "https://grafana.com/oss/loki/"
loki_multi_tenancy: "\(github)/grafana/loki/blob/master/docs/operations/multi-tenancy.md"
log_event_source: "\(vector_repo)/blob/master/src/event/"
33 changes: 33 additions & 0 deletions website/layouts/partials/data.html
@@ -257,6 +257,39 @@
</a>
</span>

<div
class="mt-3 border-2 rounded-md border-yellow-400 flex flex-col space-y-1.5 py-2 px-3">
<span>
<h4
x-data="{ show: false }" x-on:mouseover="show = true" x-on:mouseleave="show = false"
id="warning" class="flex items-center text-dark dark:text-gray-200 js-toc-ignore">
<span>Warning</span>
<a class="ml-2" x-show="show" href="#warning" style="display: none;"><svg
xmlns="http://www.w3.org/2000/svg" class="text-secondary dark:text-primary h-4 w-4"
fill="none"
viewBox="0 0 24 24" stroke="currentColor">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
d="M7 20l4-16m2 16l4-16M6 9h14M4 15h14"></path>
</svg>
</a>
</h4>
</span>
<div class="flex space-x-5 items-center">
<div class="flex-shrink-0">
<svg xmlns="http://www.w3.org/2000/svg" class="text-yellow-500 h-5 w-5" fill="none"
viewBox="0 0 24 24" stroke="currentColor">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
d="M12 8v4m0 4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z"></path>
</svg>
</div>
<div class="prose dark:prose-dark max-w-none leading-snug">The fields shown below will be
different if log namespacing is enabled.
See <a href="/blog/log-namespacing/">Log Namespacing</a> for
more details
</div>
</div>
</div>

<div class="mt-3 border rounded divide-y dark:border-gray-700 dark:divide-gray-700">
{{ template "logs_output" . }}
</div>