incorrect CAPABILITY_NETWORK classification #37

Open
capnspacehook opened this issue Sep 27, 2023 · 2 comments

capnspacehook commented Sep 27, 2023

When running capslock against one of my projects, I noticed some of the CAPABILITY_NETWORK classifications didn't seem to make sense. Digging into it further revealed that they were incorrect.

Running capslock at 29c2da0 against https://github.com/capnspacehook/egress-eddie/tree/faa23e15384d4a7f148e3bcb9fa30f3ab4d37d4c with capslock -packages github.com/capnspacehook/egress-eddie -output j displayed a few classifications like this:

{
  "packageName": "egresseddie",
  "capability": "CAPABILITY_NETWORK",
  "depPath": "github.com/capnspacehook/egress-eddie.parseConfigBytes github.com/BurntSushi/toml.Decode (*github.com/BurntSushi/toml.Decoder).Decode (*github.com/BurntSushi/toml.MetaData).unify (*github.com/BurntSushi/toml.MetaData).unifyText (net.pipeAddr).String",
  "path": [
    {
      "name": "github.com/capnspacehook/egress-eddie.parseConfigBytes"
    },
    {
      "name": "github.com/BurntSushi/toml.Decode",
      "site": {
        "filename": "config.go",
        "line": "90",
        "column": "24"
      }
    },
    {
      "name": "(*github.com/BurntSushi/toml.Decoder).Decode",
      "site": {
        "filename": "decode.go",
        "line": "36",
        "column": "51"
      }
    },
    {
      "name": "(*github.com/BurntSushi/toml.MetaData).unify",
      "site": {
        "filename": "decode.go",
        "line": "169",
        "column": "21"
      }
    },
    {
      "name": "(*github.com/BurntSushi/toml.MetaData).unifyText",
      "site": {
        "filename": "decode.go",
        "line": "213",
        "column": "22"
      }
    },
    {
      "name": "(net.pipeAddr).String",
      "site": {
        "filename": "decode.go",
        "line": "513",
        "column": "19"
      }
    }
  ],
  "packageDir": "github.com/capnspacehook/egress-eddie",
  "capabilityType": "CAPABILITY_TYPE_TRANSITIVE"
}

capslock seems to think toml.Decode eventually calls (net.pipeAddr).String, but digging into the source reveals this is unlikely. (*github.com/BurntSushi/toml.MetaData).unifyText uses a type switch to create a string from an argument of type any. In the fmt.Stringer case, capslock treats the value, which is known only to be a fmt.Stringer at that point, as if its concrete type were net.pipeAddr. Source of the final call in the stack: https://github.com/BurntSushi/toml/blob/v1.2.1/decode.go#L513.

I understand that fmt.Stringer is an interface and apparently net.pipeAddr satisfies it, but it seems like capslock is assuming the concrete type of the fmt.Stringer here.
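To illustrate the pattern, here is a minimal sketch of a type switch with a fmt.Stringer case. It is not the toml source: toText, localName and remoteAddr are hypothetical names, with remoteAddr standing in for a type like net.pipeAddr. A conservative call graph sees only the interface type at the String call, so it can add an edge to every String method in the program, whichever values are actually passed in.

```go
package main

import "fmt"

// localName is a hypothetical type that satisfies fmt.Stringer.
type localName string

func (n localName) String() string { return string(n) }

// remoteAddr is a hypothetical stand-in for a type such as net.pipeAddr.
// It also satisfies fmt.Stringer.
type remoteAddr struct{}

func (remoteAddr) String() string { return "pipe" }

// toText turns an arbitrary value into a string using a type switch,
// similar in shape to the code described above.
func toText(v any) string {
	switch t := v.(type) {
	case string:
		return t
	case fmt.Stringer:
		// The static type here is only fmt.Stringer. The concrete String
		// method is chosen at run time, so a conservative analysis may
		// list both localName.String and remoteAddr.String as callees.
		return t.String()
	default:
		return fmt.Sprint(t)
	}
}

func main() {
	fmt.Println(toText(localName("config"))) // prints "config"
	fmt.Println(toText(remoteAddr{}))        // prints "pipe"
}
```

If only localName values ever reached toText from a given caller, a path from that caller to remoteAddr.String would still appear in such a graph.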

EDIT: after looking into this a bit more, it seems this is just what golang.org/x/tools/go/ssa and golang.org/x/tools/go/callgraph report, and I'm not sure how difficult detecting this situation would be.

I tried to create a minimal reproducer that just called toml.Decode and some net functions so they would be loaded, but unfortunately couldn't reproduce the same behavior.

Thanks for building and open sourcing this tool, I've wanted something like this for a long time!

Collaborator

jcd2 commented Sep 28, 2023

Thanks for the report!

This is one of the current limitations of the analysis. As you said, we use the golang.org/x/tools module's callgraph generators, which find possible calls between pairs of functions. Stitching these calls together can produce stacks of calls that don't happen in practice -- if function A can call function B in one part of a program, and B can call C in another part of the program, that doesn't mean that the path A->B->C can occur, as you've found.
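As a rough sketch of the stitching problem, the snippet below searches a small hand-written edge list for a path. The graph and its function names are made up for illustration and this is not how capslock stores its call graph. Each edge may be individually possible, yet the search has no notion of which caller led to which callee, so it reports a path through a shared helper that no single execution would take.

```go
package main

import "fmt"

// edges is a hypothetical context-insensitive call graph: each entry says
// "this function may call these functions somewhere in the program".
var edges = map[string][]string{
	"parseConfig": {"describe"},
	"dialPeer":    {"describe"},
	// describe calls String on an interface value, so it gets an edge to
	// every implementation, regardless of who called describe.
	"describe": {"(configValue).String", "(netAddr).String"},
}

// findPath runs a breadth-first search and returns one shortest path of
// edges from 'from' to 'to', or nil if 'to' is unreachable.
func findPath(g map[string][]string, from, to string) []string {
	prev := map[string]string{}
	visited := map[string]bool{from: true}
	queue := []string{from}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == to {
			// Walk the predecessor links back to the start.
			path := []string{cur}
			for cur != from {
				cur = prev[cur]
				path = append([]string{cur}, path...)
			}
			return path
		}
		for _, next := range g[cur] {
			if !visited[next] {
				visited[next] = true
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	// Even if parseConfig only ever passes configValue to describe, the
	// stitched path to (netAddr).String is still reported.
	fmt.Println(findPath(edges, "parseConfig", "(netAddr).String"))
}
```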

We have some workarounds for this in limited cases, and we also have plans for more general improvements to the callgraph analysis to tackle this problem in the future!

@capnspacehook
Author

Thanks for the detailed explanation! I'm really curious what your plans for improving this are. I started researching different call graph analysis algorithms and discovered that CHA is guaranteed to produce a sound but not very precise graph. Running the graph through VTA to prune it helps, but there are still a lot of superfluous edges, as you said.

Since I was analyzing a program with an entrypoint instead of a library, I tried using RTA + VTA to create a more precise callgraph. I've read conflicting information about whether RTA produces a sound callgraph, but I found that it doesn't. There are some false negatives compared to using CHA, but fewer false positives.

Because this is a security tool I understand why you aim to avoid false negatives as much as possible. I do think RTA could be used alongside CHA when main packages are being analyzed to help users find false positives. Any capabilities found from the RTA callgraph or in both callgraphs would be considered reliable, and any capabilities solely found by CHA would be marked as a possible false positive in the output.
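A minimal sketch of that labeling rule, assuming the findings from each callgraph have already been collected. The finding type, the classify function and the label strings are hypothetical and do not reflect capslock's actual output format.

```go
package main

import "fmt"

// finding is a hypothetical record of one capability attributed to one
// function. It is comparable, so it can be used as a map key.
type finding struct {
	Function   string
	Capability string
}

// classify labels every finding from the two analyses. Anything RTA
// reports (alone or together with CHA) is "reliable"; anything only CHA
// reports is "possible false positive".
func classify(cha, rta []finding) map[finding]string {
	labels := make(map[finding]string, len(cha)+len(rta))
	for _, f := range rta {
		labels[f] = "reliable"
	}
	for _, f := range cha {
		if _, seen := labels[f]; !seen {
			labels[f] = "possible false positive"
		}
	}
	return labels
}

func main() {
	cha := []finding{
		{"parseConfigBytes", "CAPABILITY_NETWORK"},
		{"openSocket", "CAPABILITY_NETWORK"},
	}
	rta := []finding{
		{"openSocket", "CAPABILITY_NETWORK"},
	}
	labels := classify(cha, rta)
	for _, f := range cha {
		fmt.Printf("%s %s: %s\n", f.Function, f.Capability, labels[f])
	}
}
```

Nothing is dropped from the CHA results under this rule, so the conservative result stays visible and the label only helps a user decide what to review first.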

Callgraph analysis is very new to me so I'm sure whatever ideas you have in mind to improve it are better than what I proposed, but I figured it wouldn't hurt to lay out my thought process.
