feat: Added debug IDs to source bundle javascript files #762

mitsuhiko · 2023-02-20T12:43:40Z

This adds a new header debug-id to files stored in a source bundle that is used as artifact bundle. The goal of this is to enable the lookup of JavaScript files by debug ID. As we are not using much of the DebugSession abstraction for this yet, i chose not to add this to the trait but just the particular implementation of a session for source bundles. Lookup requires knowledge of the file type and only returns contents, no meta information.

let sess = bundle.debug_session().unwrap();
let f = sess
    .source_by_debug_id(
        "5e618b9f-54a9-4389-b196-519819dd7c47".parse().unwrap(),
        SourceFileType::MinifiedSource,
    )
    .unwrap()
    .expect("should exist");

Manifest Format Changes

The manifest gains a new well known key in the headers dictionary called debug-id which so far is exclusively used to refer to debug IDs of source maps and minified source files. However it's intentionally kept generic in case there is further use of this in the future. For instance it might be valuable for discovering debug files for .wasm files on the internet via HTTP header.

{
  "files": {
    "foo.min.js": {
      "type":"minified_sourec",
      "url": "https://mycdn.invalid/foo/foo.min.js",
      "headers": {
        "sourcemap": "foo.min.map",
        "debug-id": "eb6e60f1-65ff-4f6f-adff-f1bbeded627b"
      }
    }
  },
  "debug_id": "06f93af5-832b-3bbc-6dfd-27a46912605a"
}

Additionally the use of debug_id in the toplevel manifest is now also used to refer to the artifact bundle for JavaScript. This means that in the JavaScript case we have multiple IDs at play:

toplevel debug_id: refers to the artifact bundle (source bundle) of a particular upload / release
per file debug_id: refers to the particular debug ID of an individual minified source and it's corresponding source map

This also changes the implementation to always lowercase headers before writing and after reading for simplified handling.

Source Maps vs Indexed RAM Bundles

Because we have different SourceFileTypes for normal source maps and indexed RAM bundles they require separate handling for the caller. This is probably sensible as they are nothing alike but it's a bit odd.

Artifact vs Source Bundle

This commit also documents that a source bundle and an artifact bundle are really the same thing, with some minor differences about how we refer to it depending on if it's use to hold sources for a debug file, or if it holds release like artifacts for JavaScript processing.

Other API Changes

This also adds missing functionality to the SourceBundleDebugSession to make this actually useful for JavaScript processing:

source_by_url is added to look up a source by URL. Open question here is if this can just be the same as source_by_path? It's odd that we have two ways to key this really.
source_by_debug_id is the canonical way now to look up sources by debug ID + kind
SourceFileDescriptor is added to better describe what a source is for. The descriptor has all the API necessary to retrieve information from source files (contents and meta data) that is relevant for symbolication and unminification:
- ty returns the SourceFileType. This was previously impossible to read after setting.
- contents returns the contents if available, otherwise url should be used for fetching.
- url doubles as a way to either refer to a source link or the canonical URL of the source file. There is no reason why these need to be separate methods as they are referring to the same thing anyways.
- path is just the path that was used to look up the source for native source bundle companions
- debug_id returns the parsed DebugId for source maps and minified sources in bundles that have debug IDs
- source_mapping_url returns either the sourcemap or x-sourcemap reference in the manifest of the bundle. If neither can be found, it falls back to parsing the embedded contents. This makes it convenient enough to use though one could make an argument that the parsing logic should go elsewhere.

github-actions · 2023-02-20T12:44:40Z

	Messages
📖	Do not forget to update Sentry-docs with your feature once the pull request gets approved.

Generated by 🚫 dangerJS against 9172abc

symbolic-debuginfo/src/sourcebundle.rs

codecov · 2023-02-20T13:26:05Z

Codecov Report

Merging #762 (0b23291) into master (9e55fec) will increase coverage by 0.31%.
The diff coverage is 85.24%.

❗ Current head 0b23291 differs from pull request most recent head 9172abc. Consider uploading reports for the commit 9172abc to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #762      +/-   ##
==========================================
+ Coverage   74.22%   74.53%   +0.31%     
==========================================
  Files          70       70              
  Lines       15117    15317     +200     
==========================================
+ Hits        11220    11417     +197     
- Misses       3897     3900       +3

Swatinem

Its very unfortunate that we use the same file format for both artifact bundles (that need metadata), and source bundles (that only care about files).

I would separate these two things from one another. We could have an accessor for the metadata which lazily parses the JSON manifest.
This metadata then allows you to query things by path or by debug-id, etc.

For applying the native source context, we do not need the metadata at all.
For that native use-case, the file paths in the zip are completely predictable.
Profiling has even shown that unzipping and parsing the JSON manifest (which can be hundreds of KiB) is expensive, especially considering that you mostly want just a couple of files from it.
So there, it might be cheaper to just access the contents directly via the predictable file paths, and do the manifest parsing and lookup as a fallback.

I’m not entirely sure what the best idea is to deal with "contents" vs "url". We could do the same logic there as well.

One optimized code path for:

Give me the contents of the file contained in the zip.

Plus another code path for:

Give me the metadata (including the url where to fetch the file) for file with name or debug-id.

symbolic-debuginfo/src/sourcebundle.rs

Swatinem · 2023-02-21T10:29:52Z

symbolic-debuginfo/src/sourcebundle.rs

+            if possible_sourcemap.starts_with("data:application/json") {
+                return None;
+            }


not sure if we actually want to return the data url here? Otherwise we would have to do that same detection logic in symbolicator again. CC @loewenheim wdyt, as you implemented the base64 encoded data urls?

I think we should expose the data url here, yes.

I'm not sure though if we should leave the data url prefix alone; if the URL is returned as a plain string, we will need to be able to recognize it later.

Co-authored-by: Sebastian Zivota <loewenheim@users.noreply.github.com>

Co-authored-by: Arpad Borsos <swatinem@swatinem.de>

mitsuhiko requested a review from a team February 20, 2023 12:43

mitsuhiko force-pushed the feature/debug-ids branch from 20d4201 to 835a85c Compare February 20, 2023 12:45

mitsuhiko commented Feb 20, 2023

View reviewed changes

symbolic-debuginfo/src/sourcebundle.rs Show resolved Hide resolved

loewenheim reviewed Feb 20, 2023

View reviewed changes

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

Swatinem reviewed Feb 20, 2023

View reviewed changes

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

mitsuhiko commented Feb 20, 2023

View reviewed changes

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

Swatinem reviewed Feb 20, 2023

View reviewed changes

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

symbolic-debuginfo/src/sourcebundle.rs Outdated Show resolved Hide resolved

Swatinem approved these changes Feb 21, 2023

View reviewed changes

mitsuhiko requested a review from Swatinem February 21, 2023 10:48

Swatinem approved these changes Feb 21, 2023

View reviewed changes

mitsuhiko and others added 13 commits February 21, 2023 12:47

feat: Added debug IDs to source bundle javascript files

988ba56

Update symbolic-debuginfo/src/sourcebundle.rs

4da1724

Co-authored-by: Sebastian Zivota <loewenheim@users.noreply.github.com>

Fix header

2d7279a

Fix fastpath

10b0909

Avoid alternative import

9840449

Expose source file types from source file descriptor

b439539

Added URL based lookup and sourceMappingURL parsing

483f001

Update symbolic-debuginfo/src/sourcebundle.rs

217ad68

Co-authored-by: Arpad Borsos <swatinem@swatinem.de>

Update symbolic-debuginfo/src/sourcebundle.rs

45c5d7b

Co-authored-by: Arpad Borsos <swatinem@swatinem.de>

Reuse one large index for source bundle lookups

e1609bf

Kill duplicate code

d6fa0d7

Always return the source mapping URL

a9df5e1

Rename confusing function

fba2b69

mitsuhiko force-pushed the feature/debug-ids branch from 6cd5ee0 to fba2b69 Compare February 21, 2023 11:47

Fix bad merge

9172abc

Swatinem merged commit 68147ad into master Feb 21, 2023

Swatinem deleted the feature/debug-ids branch February 21, 2023 12:27

mitsuhiko mentioned this pull request Feb 21, 2023

feat: Add support for embedded debug IDs in minified files #765

Merged

loewenheim mentioned this pull request Feb 22, 2023

WIP: Return SourceDescriptor in DebugSession #763

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Added debug IDs to source bundle javascript files #762

feat: Added debug IDs to source bundle javascript files #762

mitsuhiko commented Feb 20, 2023 •

edited

Loading

github-actions bot commented Feb 20, 2023 •

edited

Loading

codecov bot commented Feb 20, 2023 •

edited

Loading

Swatinem left a comment

Swatinem Feb 21, 2023

loewenheim Feb 21, 2023

loewenheim Feb 21, 2023

feat: Added debug IDs to source bundle javascript files #762

feat: Added debug IDs to source bundle javascript files #762

Conversation

mitsuhiko commented Feb 20, 2023 • edited Loading

Manifest Format Changes

Source Maps vs Indexed RAM Bundles

Artifact vs Source Bundle

Other API Changes

github-actions bot commented Feb 20, 2023 • edited Loading

codecov bot commented Feb 20, 2023 • edited Loading

Codecov Report

Swatinem left a comment

Choose a reason for hiding this comment

Swatinem Feb 21, 2023

Choose a reason for hiding this comment

loewenheim Feb 21, 2023

Choose a reason for hiding this comment

loewenheim Feb 21, 2023

Choose a reason for hiding this comment

mitsuhiko commented Feb 20, 2023 •

edited

Loading

github-actions bot commented Feb 20, 2023 •

edited

Loading

codecov bot commented Feb 20, 2023 •

edited

Loading