Capture VCS information #147

graphaelli · 2019-09-17T19:58:34Z

To enable code deeper code integration a way to identify the exact version of service/file/dependency would be very useful. #130 introduces some dependency-specific concepts. The proposal here is to capture version control information for all first party/non-dependency code, though a single, resuable definition of fields probably makes sense. elastic/ecs#560 may be useful as well.

Note that service.version may also contain useful information but isn't purpose built for integration purposes - it commonly contains user-defined values like v7, 2019-09-16-001, etc.

I propose we capture the following fields under vcs:

field	definition	example value
`repository`	source code location	`https://github.com/elastic/apm.git`
`revision`	revision identifier, eg git ref	`f009776789e91de1909ca59d8415df64c8db16bb`
`type`	vcs type	`git`
`url`	related web URL	`https://github.com/elastic/apm`

That group would be captured under service initially, and considered for inclusion in each stacktrace.frame later. The result would be service.vcs.revision, etc.

Agents would collect this information automatically if possible, eg if a .git directory is present, and otherwise provide a mechanism for users to specify vcs information.

The text was updated successfully, but these errors were encountered:

axw · 2019-09-18T08:19:10Z

What's the difference between vcs.repository and vcs.url? I mean obviously one has .git on the end and one doesn't, but isn't that a matter of canonicalisation? You can clone with or without the trailing .git.

axw · 2019-09-18T08:34:26Z

To answer my own question (I think): vcs.repository is the VCS URL, and vcs.url is the related web URL.

graphaelli · 2019-09-18T15:57:36Z

Thanks for pointing that out @axw, just added descriptions for each field.

Qard · 2019-09-18T19:01:37Z

Github just happens to have very similar repo and web URLs, some other platforms have quite different URLs.

roncohen · 2019-09-19T09:01:33Z

I think we need to consider carefully if this should be under service. I think we might run the risk of adding all our fields under service because everything seems service related. For example, we have host.name but we could also easily have put it under service.host.name. We did it for service.node.name but I'm not even sure that was right :)

roncohen · 2019-09-19T09:02:31Z

happy to see this move forward btw @graphaelli. Thanks for getting it started.
Will we support customers building a service from multiple repositories?

graphaelli · 2019-09-19T18:42:03Z

The nesting proposed is driven partially by multiple repository support, whether those are multiple first-party or from dependencies. What this proposal is after is a way to specify a default code repository and eventually add an override code repository. service was chosen to align with the current design, which calls for a service to repo mapping. overrides could be specified by adding vcs information to a stacktrace frame, profile event, etc at a later time.

This means that given a given filename and line number, code would resolve the repo using: 1. stack frame 2. service 3. kibana mapping. The ref would have to be specified with the frame and otherwise fall back to the service for that information. We can dig into details later, but I expect if repo is provided at a more specific level, ref must also be provided.

Initially these 3 frames would map to code as follows:

{
  "service": {
    "environment": "production",
    "name": "some-node-service",
    "vcs": {
      "revision": "abcd1234"
    }
  },
  "span": {
    "stacktrace": [
      {
        "abs_path": "/app/server.js",
        "filename": "server.js",
        "library_frame": false,
        "line": {
          "number": 101
        }
      },
      {
        "abs_path": "/app/vendor/handle.js",
        "filename": "handle.js",
        "library_frame": false,
        "line": {
          "number": 67
        },
        "vcs": {
          "repo": "handle.git",
          "revision": "xxx"
        }
      },
      {
        "abs_path": "/app/node_modules/express/lib/router/layer.js",
        "filename": "node_modules/express/lib/router/layer.js",
        "library_frame": true,
        "line": {
          "number": 95
        },
        "vcs": {
          "revision": "4.17.1"
        }
      }
    ]
  }
}

/app/server.js -> kibana-mapped repo @ abcd1234 (from service.vcs.revision)
/app/vendor/handle.js -> handle.git @ xxx

Knowing the final frame is a library_frame provides an opportunity for a kibana side mapping for libraries, but initially code would not be resolved for that bit.

Open to ideas / counter proposals / etc. The goal is to open issues as soon as possible for apm-server to intake and persist this information and agents to provide it.

axw · 2019-09-20T02:48:28Z

@graphaelli that makes sense to me, but I'm a little concerned about increasing the size of stack traces at the API level, given that this is now adding to every frame from a third-party repo, which in some apps/languages may be many. It's already expensive as it is for some agents.

Also, if we add profile events it may be useful to separate the mapping so we can reuse existing profiling formats.

Maybe we could go with optionally adding vcs info to stack frames for now, and later introduce a mapping in the metadata based on module prefixes? If we were to do that, would it still make sense to have vcs nested under service?

graphaelli · 2019-09-20T17:38:13Z

I'm a little concerned about increasing the size of stack traces at the API level

Same here, that's why I originally proposed we save that for later (vcs under stack frame), at the expense of not supporting multiple repositories / library frames.

if we add profile events it may be useful to separate the mapping so we can reuse existing profiling formats.

Agreed. This makes me rethink the proposed resolution chain for which repo to use. Perhaps the kibana mapping should always win if present so that older events that include repo data could still be used, in case the repo is moved after the event occurs. I think that's a discussion for another issue.

Maybe we could go with optionally adding vcs info to stack frames for now, and later introduce a mapping in the metadata based on module prefixes? If we were to do that, would it still make sense to have vcs nested under service?

I'm less focused on the repo mapping then on the revision. service.vcs.revision would still be needed in case repository isn't included in the initial list of fields at all, assuming mapping is only possible via kibana.

The benefit of allowing repo via intake is that agents can automatically report it (in some cases, eg .git/ exists), and then no user invention would be required to show code, beyond configuring the kibana code app with that repository. Eventually it's probably possible to import the code without manual intervention too, for well known repos or in case the package is the path like is common in go.

axw · 2019-09-23T06:10:06Z

I'm less focused on the repo mapping then on the revision. service.vcs.revision would still be needed in case repository isn't included in the initial list of fields at all, assuming mapping is only possible via kibana.

I was thinking that the mapping in the intake metadata would include both the repo and the commit, mapping from module name -> {rep, commit}. This assumes a single version/commit if each dependency though, and some languages do support multiple, so it wouldn't always be sufficient.

I'm still left wondering if service.vcs makes sense in that case. Perhaps vcs.service instead, indicating the vcs info to apply to application stack frames, in lieu of a higher precedence mapping? That would also leave room for, say, vcs.module, which would be a mapping from module names/prefixes to vcs info.

graphaelli · 2019-09-25T16:17:07Z

Discussed with @axw, mostly around ways to communicate the vcs information from agents efficiently while still enabling the UI to reliably map the code for a given frame.

Expanding on the comment above, the idea is to send vcs info via event metadata to perform mapping per frame to repo & revision later, perhaps in apm-server. In that case, it may be that there is nothing special about mapping first party vs dependency code. It's unclear whether that works for all languages / agents. I'm not sure file prefixes are reliable either as that information is probably lost in some languages.

For discussion sake, this is what metadata might look like if vcs were included only there instead of also in stack frames:

{
  "metadata": {
    "service": {"name": "myservice"},
    "vcs": {
      "modules": [
        {"name":"express", "repository": "https://github.com/expressjs/express.git", "revision": "4.17.1"}
      ]
    }
  }
}

It does look like a vcs.service would be useful after all, for non-library frames. I'm planning to continue exploring this option, feedback welcome in the meantime.

felixbarny · 2019-09-25T17:21:43Z

For the lack of better alternatives, the Java agent sets the package name as the module in stacktraces. One repository has multiple packages, however. I currently also don't see how we could detect the vcs information for an arbitrary stack frame.

graphaelli · 2019-09-26T15:58:18Z

Thanks for raising @felixbarny. This is a concern we've discussed a bit but haven't explored deeply. Curious to hear from other @elastic/apm-agent-devs on feasibility.

axw · 2019-09-27T03:35:16Z

One repository has multiple packages, however

Is that problematic? We can map multiple modules to the same repo.

I currently also don't see how we could detect the vcs information for an arbitrary stack frame.

I think this will generally be pretty complicated to set up for Go devs, particularly as more people move towards using Go Modules where vcs info isn't necessarily locally available. If it's helpful I can try to describe what I think would be a more ideal approach for Go, but I don't want to distract the conversation.

dgieselaar mentioned this issue Nov 20, 2019

[APM] Deployment annotations: Show service.version annotations and filter #167

Closed

graphaelli closed this as completed Dec 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Capture VCS information #147

Capture VCS information #147

graphaelli commented Sep 17, 2019 •

edited

Loading

axw commented Sep 18, 2019

axw commented Sep 18, 2019

graphaelli commented Sep 18, 2019

Qard commented Sep 18, 2019

roncohen commented Sep 19, 2019

roncohen commented Sep 19, 2019

graphaelli commented Sep 19, 2019 •

edited by axw

Loading

axw commented Sep 20, 2019 •

edited

Loading

graphaelli commented Sep 20, 2019 •

edited

Loading

axw commented Sep 23, 2019

graphaelli commented Sep 25, 2019 •

edited

Loading

felixbarny commented Sep 25, 2019

graphaelli commented Sep 26, 2019

axw commented Sep 27, 2019

Capture VCS information #147

Capture VCS information #147

Comments

graphaelli commented Sep 17, 2019 • edited Loading

axw commented Sep 18, 2019

axw commented Sep 18, 2019

graphaelli commented Sep 18, 2019

Qard commented Sep 18, 2019

roncohen commented Sep 19, 2019

roncohen commented Sep 19, 2019

graphaelli commented Sep 19, 2019 • edited by axw Loading

axw commented Sep 20, 2019 • edited Loading

graphaelli commented Sep 20, 2019 • edited Loading

axw commented Sep 23, 2019

graphaelli commented Sep 25, 2019 • edited Loading

felixbarny commented Sep 25, 2019

graphaelli commented Sep 26, 2019

axw commented Sep 27, 2019

graphaelli commented Sep 17, 2019 •

edited

Loading

graphaelli commented Sep 19, 2019 •

edited by axw

Loading

axw commented Sep 20, 2019 •

edited

Loading

graphaelli commented Sep 20, 2019 •

edited

Loading

graphaelli commented Sep 25, 2019 •

edited

Loading