[WIP] Registry manifest and Schema diff #400

Draft: wants to merge 35 commits into base: main

@lquerel lquerel commented Oct 3, 2024

Note: The scope of this PR has been reduced to focus only on the schema diff feature. GitHub issues have been created to track the postponed features: #482, #483.

This PR implements the registry diff command. See the following example:

cargo run -- registry diff -r https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.27.0.zip[model] --baseline-registry https://github.com/open-telemetry/semantic-conventions/archive/refs/tags/v1.26.0.zip[model] --diff-format markdown

In this example, the diff is displayed in markdown format. The following formats are supported: json, yaml, markdown, ansi, ansi_stats
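
As a rough illustration of how the five format names listed above could be mapped to a typed value, here is a minimal sketch. The `DiffFormat` enum and its `FromStr` implementation are hypothetical and are not taken from the Weaver code base; only the format names come from this PR description.

```rust
use std::str::FromStr;

/// Output formats named in the PR description.
/// Hypothetical type for illustration, not Weaver's actual definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DiffFormat {
    Json,
    Yaml,
    Markdown,
    Ansi,
    AnsiStats,
}

impl FromStr for DiffFormat {
    type Err = String;

    /// Maps the value of `--diff-format` to a variant, rejecting unknown names.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "json" => Ok(DiffFormat::Json),
            "yaml" => Ok(DiffFormat::Yaml),
            "markdown" => Ok(DiffFormat::Markdown),
            "ansi" => Ok(DiffFormat::Ansi),
            "ansi_stats" => Ok(DiffFormat::AnsiStats),
            other => Err(format!("unsupported diff format: {other}")),
        }
    }
}

fn main() {
    let format: DiffFormat = "markdown".parse().expect("known format");
    println!("{format:?}");
}
```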

Tasks:

  • Track renamings and deletions in SemConv. See the discussion here.
  • Specify the format and structure of the file used to track the SemConv registry name, version, schema location, and other details.
  • Implement the registry manifest struct.
  • Parse/Read the registry-manifest.yaml.
  • Implement the new format for the deprecated field.
  • Create a sub-command to generate a diff from two versions of the same semconv registry.

Notes: Some people would like to generate database migration scripts based on the diff between two registries. This reinforces the need to decouple the registry diff from how the diff is used (e.g., generation of OTEL schema, migration guides (documentation), db migration script, etc.).

Notes:

  • The crate weaver_otel_schema is not essential for this PR; it was initially included as part of the preparations for the registry schema-update command. We have decided to implement this command in a future PR. However, for simplicity, I prefer to keep the preparation code in place instead of removing it. The same applies to all_changes in weaver_version.

List of modifications to apply to the semantic conventions repository after the release of the Weaver containing the current PR:

  • Add a registry-manifest.yaml file with the version of the next release.
  • Update all deprecated fields.

Closes: #186

@lquerel lquerel self-assigned this Oct 3, 2024
@lquerel lquerel added the enhancement New feature or request label Oct 3, 2024
@lquerel lquerel changed the title from "[WIP] Registry manifest and OTEL schema update" to "[WIP] Registry manifest and Schema diff" on Nov 27, 2024
# Conflicts:
#	.clippy.toml
#	Cargo.toml
#	crates/weaver_semconv_gen/src/lib.rs
#	src/registry/search.rs
#	src/registry/stats.rs
#	src/registry/update_markdown.rs
@@ -39,7 +39,7 @@ Brief: {{ resource.brief }}
 - Sampling relevant: {{ attribute.sampling_relevant }}
 {%- endif %}
 {%- if attribute.deprecated %}
-- Deprecated: {{ attribute.deprecated }}
+- Deprecated: {{ attribute.deprecated.note }}
Contributor

Is/will this be a breaking change?

Contributor Author

Yes, it is a breaking change at the rendering level because, without the .note, Jinja renders the new JSON of the deprecated field instead of the text that now corresponds to the note. I couldn’t find another way to handle this at the rendering level. However, at the semconv parsing level, it is not a breaking change since both formats remain supported.

Contributor

Related comment: https://github.com/open-telemetry/weaver/pull/400/files#r1868713601 (I thought we'd only keep the note on the attribute without introducing one on deprecated).

But if we do have deprecated.note, it would still be an optional field that does not need to describe the action.

I think codegen would need to generate something like

/*
 * ...
 * @deprecated renamed to {@link
 *     io.opentelemetry.semconv.FooAttributes#FOO_BAR}.
 *  Here's an additional note.
 */

from

deprecated: 
  action: renamed
  new_name: foo.bar
note: Here's an additional note.

so maybe we should eventually provide some helper filters to render those.

In the scope of this PR, if someone uses {{ attr.deprecated }}, I wonder if jinja can call to_string on the original deprecated object to return a custom string? We'd format it similarly to "Renamed to foo.bar", but someone who wants to use new structured info would be able to obtain it with attr.deprecated.action, etc

WDYT?
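
A minimal sketch of the to_string idea, assuming a hypothetical `Deprecated` enum with a `Display` implementation. The type, its variants, and the exact strings are illustrative assumptions; whether the template engine would use such an implementation when rendering {{ attr.deprecated }} is not confirmed here.

```rust
use std::fmt;

/// Hypothetical model of the structured `deprecated` field discussed above.
#[derive(Debug, Clone, PartialEq)]
enum Deprecated {
    /// The element was renamed; `new_name` mirrors the YAML example above.
    Renamed { new_name: String },
    /// The element is deprecated without a machine-readable replacement.
    Deprecated,
}

impl fmt::Display for Deprecated {
    /// Produces a human-readable summary, so that code which only wants
    /// text can call `to_string()` instead of inspecting the fields.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Deprecated::Renamed { new_name } => write!(f, "Renamed to {new_name}"),
            Deprecated::Deprecated => write!(f, "Deprecated"),
        }
    }
}

fn main() {
    let deprecated = Deprecated::Renamed { new_name: "foo.bar".to_owned() };
    println!("{deprecated}");
}
```

With this shape, existing templates would keep receiving text, while newer templates could still read the structured fields such as action.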

.map(|group| match group_type {
GroupType::AttributeGroup
| GroupType::Event
| GroupType::Span
Contributor

Are we going to require span names now?

Contributor Author

Good question, we still have this thorn in our side with spans. How do you see things from the perspective of semantic conventions?

Contributor

We can start relying on group.id across all signals now: open-telemetry/semantic-conventions#1512

I'll keep making slow progress on spans to improve it further (open-telemetry/semantic-conventions#1513), but it should not be a blocker.

Contributor Author

Okay, now that we have a form of uniqueness for group IDs within the same registry, I’m fine with it. We’ll need to scope these IDs by registry once we add multi-registry support.

}

// Attributes in the registry
self.diff_attributes(baseline_schema, &mut changes);
Contributor

I have two concerns here:

  1. Overall, attribute renames rely on the registry concept. We can only look at renamed/added/removed attributes from attribute groups there. We should probably avoid looking at deprecated on a per-signal basis in that instance.
  2. On a per-signal basis, we should be tracking added/removed attributes. (We can ignore changed attributes, as those should be caught in the overall "registry" renames.)

Example:

  • There are two attributes error.type and http.status_code.
  • There is an event http.my.count. It currently uses error.type.
  • Example1: http.my.count event is updated to use http.status_code instead, using some kind of "ref"/"deprecated" combo.
  • Example2: http.my.count updates error.type from opt-in to recommended.

Perhaps we don't want to handle this yet. Curious on @lmolkova's thoughts.
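
To make concern 2 concrete, here is a minimal sketch of per-signal added/removed tracking using plain set differences. The function name `diff_attribute_refs` and the data layout are assumptions for illustration; this is not how the PR implements the diff.

```rust
use std::collections::BTreeSet;

/// Returns (added, removed) attribute names for a single signal,
/// comparing the attributes it referenced in the baseline and latest versions.
fn diff_attribute_refs<'a>(
    baseline: &BTreeSet<&'a str>,
    latest: &BTreeSet<&'a str>,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let added = latest.difference(baseline).copied().collect();
    let removed = baseline.difference(latest).copied().collect();
    (added, removed)
}

fn main() {
    // Example 1: the http.my.count event swaps error.type for http.status_code.
    let baseline = BTreeSet::from(["error.type"]);
    let latest = BTreeSet::from(["http.status_code"]);
    let (added, removed) = diff_attribute_refs(&baseline, &latest);
    println!("added: {added:?}, removed: {removed:?}");
}
```

Under this sketch, Example 2 (error.type moving from opt-in to recommended) yields no added or removed entries, since only names are compared.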

Contributor Author

Concerns 1 and 2 make sense to me. I can remove rename tracking for everything except the attributes in the registry.

Regarding the examples, could you describe more precisely what you’d like to see in the diff?

Contributor

I think this PR detects renames only (for attributes, metrics, etc) and does not analyze what's changed within a group.

I can see cases when we want to go deeper, but they are beyond changelog/schema transformation automation. E.g. we do it in compatibility policy checks.

I wonder if good old weaver.yaml + JQ + jinja with access to baseline and new semconv at the same time can cover all possible complex diff scenarios one would want to do.

Contributor

I think at some point we'll want diffs on what to include in a metric or span. As Liudmila calls out, we can defer that until later.

codecov bot commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 37.19008% with 228 lines in your changes missing coverage. Please review.

Project coverage is 70.9%. Comparing base (33bd40e) to head (6ba427c).

Files with missing lines | Patch % | Lines missing
crates/weaver_resolved_schema/src/lib.rs | 0.0% | 119
crates/weaver_version/src/schema_changes.rs | 0.0% | 49
crates/weaver_common/src/result.rs | 0.0% | 18
crates/weaver_semconv/src/deprecated.rs | 71.6% | 17
crates/weaver_otel_schema/src/lib.rs | 43.7% | 9
crates/weaver_semconv_gen/src/lib.rs | 70.5% | 5
crates/weaver_semconv/src/manifest.rs | 90.3% | 3
crates/weaver_semconv/src/registry.rs | 83.3% | 3
crates/weaver_common/src/diagnostic.rs | 0.0% | 2
crates/weaver_resolver/src/registry.rs | 83.3% | 2
... and 1 more
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #400     +/-   ##
=======================================
- Coverage   74.1%   70.9%   -3.3%     
=======================================
  Files         50      54      +4     
  Lines       3946    4273    +327     
=======================================
+ Hits        2927    3032    +105     
- Misses      1019    1241    +222     

lquerel commented Dec 19, 2024

@lmolkova @jsuereth

Note 1: I have addressed most of the feedback. The main task remaining is the removal of change detection for elements other than attributes. That will be done soon.

Note 2: I will also write a document describing: the format of the schema diff, examples of what can be done with it, the current limitations, and ideas for future development.

I have a question regarding the format of the new deprecated field. In the current version of this PR, a deprecated field can take one of the following three forms:

Old approach (still supported for compatibility reasons):

deprecated: "deprecation message"

or

deprecated:
  action: renamed
  renamed_to: attribute_name

or

deprecated:
  action: deprecated

With this, we can handle simple attribute renaming scenarios, as well as merge scenarios (e.g., A and B are renamed to C; Weaver will detect this automatically). However, we currently have no way to represent a split (e.g., A is renamed to B and C). So with the current implementation, the semconv author will need to set deprecated to action: deprecated and provide a note at the object level to explain the split in textual form.
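
The merge case can be sketched as grouping renames by their target name: any target with two or more sources is a merge. The function `find_merges` and the pair-based input are hypothetical and only illustrate the idea; they do not describe Weaver's actual detection logic.

```rust
use std::collections::BTreeMap;

/// Groups (old_name, new_name) rename pairs by target and keeps only
/// the targets that have two or more sources, i.e. the merges.
fn find_merges<'a>(renames: &[(&'a str, &'a str)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut by_target: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(old, new) in renames {
        by_target.entry(new).or_default().push(old);
    }
    // A target with a single source is a plain rename, not a merge.
    by_target.retain(|_, sources| sources.len() > 1);
    by_target
}

fn main() {
    // A and B are both renamed to C; D is simply renamed to E.
    let renames = [("a", "c"), ("b", "c"), ("d", "e")];
    let merges = find_merges(&renames);
    println!("{merges:?}");
}
```

A split (A renamed to B and C) cannot be expressed with this input shape when each deprecated element carries a single renamed_to value, which matches the limitation described above.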

We could make this explicit in the format of the deprecated field and in the diff output. This would allow for migration documentation that more accurately reflects the desired changes. However, it still wouldn’t enable automatic downgrades in the schema processor for the split scenario (at least without logic taking into account some additional context).

Question: Adding such an advanced definition for the deprecated field isn’t particularly complicated, so I don’t mind including it. What do you think? Are there other types of deprecations you’d like to codify?

Labels: enhancement (New feature or request)
Projects: Status: Next Release
Development: Successfully merging this pull request may close these issues: Automate OTEL Schema Generation and Update Process with Migration Guide Support
3 participants