Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

:PATH in access logs without query string #7583

Closed
bobby-stripe opened this issue Jul 15, 2019 · 28 comments · Fixed by #15711
Closed

:PATH in access logs without query string #7583

bobby-stripe opened this issue Jul 15, 2019 · 28 comments · Fixed by #15711
Assignees

Comments

@bobby-stripe
Copy link
Contributor

The Envoy access_log documentation has an example with REQ(X-ENVOY-ORIGINAL-PATH?:PATH) for logging the request's path, but both the :PATH and X-ENVOY-ORIGINAL-PATH header contain the query string.

Query strings can sometimes contain sensitive details like access tokens - it would be great to be able to log the path without the query string, but I don't see a way to do this in Envoy currently. Opening this to either see if I'm missing something, or track adding this (small, I think) feature.

@dio
Copy link
Member

dio commented Jul 15, 2019

Currently, we have %REQ(MAIN?ALT_IF_MAIN_DOES_NOT_EXISTS):MAX_LENGTH

Do you have a suggestion on how we express that (remove qs)? Probably it is good to have a "special operator" for PATH e.g. %REQ(REMOVE_QUERY_STRINGS(MAIN, *)?REMOVE_QUERY_STRINGS(ALT_IF_DOES_NOT_EXISTS, token&id&key)). But not sure. WDYT?

@mattklein123 mattklein123 added the enhancement Feature requests. Not bugs or questions. label Jul 16, 2019
@bobby-stripe
Copy link
Contributor Author

something like REMOVE_QUERY_STRING($x) sounds reasonable - it would be nice to be able to say %REMOVE_QUERY_STRING(%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%)% or %REMOVE_QUERY_STRING(:REFERER)% (can logging operators like this currently be nested?)

I think my preference would be for a blunt tool - removing the entire query string. Denylists of fields to remove from the query string are very brittle - its too easy for clients (either internal clients developed by other teams or external users) to purposefully or accidentally change field names and break denylist-based redaction.

@dio
Copy link
Member

dio commented Jul 17, 2019

@bobby-stripe thanks!

I think my preference would be for a blunt tool - removing the entire query string

I agree. That makes sense.

@zuercher do you have advice here?

@snowp
Copy link
Contributor

snowp commented Jul 25, 2019

The current code won't automatically handle nesting afaik, though I imagine you could make that work. Would it be easier to just have a %REQ_NO_QUERY(:path)? The "remove query params" operator is coupled with headers to begin with, so I don't imagine there are other relevant nesting combinations?

Relevant formatting code: https://github.com/envoyproxy/envoy/blob/master/source/common/access_log/access_log_formatter.cc#L226

@cetanu
Copy link
Contributor

cetanu commented Aug 6, 2019

Referring to the following section of the RFC for URIs:
https://tools.ietf.org/html/rfc3986#section-3

   The following are two example URIs and their component parts:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose

Shouldn't the path not include the query string in the first place?

I suppose since the current way to log this information is by the macro %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%, indicating that it's a header set by envoy (?), then it should be okay to make additional macros which make up for it, such as:

%PATH% the path component of the URI
%QUERY% the query component of the URI

Are there concerns over creating more macros like this?

I don't want to break anyone's logs, but I also wish to log the path and query separately.

Thoughts?

@dio
Copy link
Member

dio commented Aug 6, 2019

@cetanu I feel your argument is valid. Let see what @snowp thinks about it.

@lizan
Copy link
Member

lizan commented Aug 7, 2019

@cetanu the :path here is referring to HTTP's :path pseudo-header, described in HTTP/2 spec RFC 7540:

The ":path" pseudo-header field includes the path and query parts
of the target URI (the "path-absolute" production and optionally a
'?' character followed by the "query" production (see Sections 3.3
and 3.4 of [RFC3986]). A request in asterisk form includes the
value '*' for the ":path" pseudo-header field.

@dio
Copy link
Member

dio commented Aug 9, 2019

@cetanu how about referer? And also as per @lizan above the :path contains query string.

I made an attempt with REQ_WITHOUT_QUERY in #7847, it seems straightforward from the implementation perspective. As always feel free to comment about it.

And I also feel like probably REQ(X?Y):Z:POST_PROCESSING somewhat future proof. WDYT?

@cetanu
Copy link
Contributor

cetanu commented Aug 20, 2019

A lot of requests don't have a referer in my experience, plus it can be somewhat falsified. The REQ_WITHOUT_QUERY is probably good enough though. If it fits with the existing convention for logging macros, then why not. I'm not an authority on this :P

@stale
Copy link

stale bot commented Sep 19, 2019

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or other activity occurs. Thank you for your contributions.

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Sep 19, 2019
@stale
Copy link

stale bot commented Sep 26, 2019

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted". Thank you for your contributions.

@stale stale bot closed this as completed Sep 26, 2019
@yks0000
Copy link

yks0000 commented Dec 4, 2019

Any Plan as when this will be available i.e. access log Path without Query Param?

@jpbelanger-mtl
Copy link

We just got hit with this during our last upgrade of istio. Where the latest log format had great information about the path but was leaking information in our logs that we didn't want (I know GET shouldn't send any sensitive information, but 3rd party SDK are not always clean). Would have been perfect to be able to keep the path without the query string.

@cetanu
Copy link
Contributor

cetanu commented Aug 18, 2020

Can we please reopen this :)

@dio dio reopened this Aug 18, 2020
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Aug 18, 2020
@mattklein123 mattklein123 added area/access_log help wanted Needs help! and removed enhancement Feature requests. Not bugs or questions. labels Aug 18, 2020
@dio
Copy link
Member

dio commented Aug 26, 2020

@zuercher do you think we can add another one to the rules in substitution formatter for this?

@awangc
Copy link

awangc commented Sep 14, 2020

We'd like to have this feature too. I'd be glad to work on this if nobody is working on it, or to collaborate in any way needed to make it go faster.

@dio
Copy link
Member

dio commented Sep 15, 2020

Relevant: #12991 (comment) cc. @zuercher @rgs1

@Zsolt-LazarZsolt
Copy link

Hello Everyone.

Is there any development in this issue? Any workaround that helps keep the path without the query?
Would it be supported with json type logging too?

@cetanu
Copy link
Contributor

cetanu commented Mar 10, 2021

I ended up separating this with my logging agent

@Zsolt-LazarZsolt
Copy link

Hi @cetanu

Could you elaborate on that solution?

@cetanu
Copy link
Contributor

cetanu commented Mar 10, 2021

Hi @cetanu

Could you elaborate on that solution?

Sure.

  1. Our envoy proxy lives on an EC2 server
  2. The main listener has an access log filter, which emits a JSON formatted log
  3. There is another process (the log agent) on the machine which reads this log message (either via syslog, or from a file)
  4. The process performs transformation to the fields

For example, we have a field uri which is %REQ(X-ENVOY-ORIGINAL-PATH?:PATH):256%.
The log agent splits this by the '?' character and creates a new field named query with the second part of the split string.

The particular log agent we use is called vector.
There are others, such as fluentd and fluentbit.

@Zsolt-LazarZsolt
Copy link

Hi @cetanu
Thank you very much for your answer.

@Zsolt-LazarZsolt
Copy link

In my opinion to natively support logging the path without the query is an important feature, and missing it poses a certain security risk.

@snowp
Copy link
Contributor

snowp commented Mar 10, 2021

I don't think anyone is currently working on this, so we just need someone willing to pick this up.

@dio
Copy link
Member

dio commented Mar 10, 2021

Yes, this one I think we can use: #14512. cc. @awangc

@tsaarni
Copy link
Member

tsaarni commented Mar 11, 2021

I might be able to help implement this.

I was checking #14512 for the new command formatter extension API. I don't find any examples on custom formatters in the code base yet (outside the test case) but if I understood correctly, a custom formatter would be first (a) registered with CommandParserFactory and then (b) activated by the user, by listing the registered name in config.core.v3.SubstitutionFormatString.formatters. After that, the new command operator can be used in the access log format rules.

Where in the code base one should place new custom formatter implementations? I suppose some will naturally belong to e.g. specific filters, but I think this one is more general.

I suppose (b) is necessary when taking the custom formatter approach? When Envoy is configured via xDS, it seems to me that each custom formatter has small impact on the control plane as well?

@snowp
Copy link
Contributor

snowp commented Mar 11, 2021

@tsaarni Extensions would go in source/extensions/formatters or something like that (maybe source/extensions/access_log_formatters if this is limited to access logging?).

For b) yea it needs to be listed next to the access log string that uses it: presumably this is a small cost since the access log string would also need to be updated to make use of the new formatter.

@tsaarni
Copy link
Member

tsaarni commented Apr 15, 2021

While implementing proposal #15711 I have became bit suspicious if the approach is the best:

  1. Is it optimal that we create specific extensions for trivial things like stripping ?querystring or will that approach result in many extensions with very narrow use case but high overhead of being extensions?
  2. Non-optimal user experience when seemingly trivial thing requires relatively complicated configuration

Regarding (2): Control plane may already provide means for user to configure access log format string. User cannot use the new command in the format string before control plane has activated the formatter extension (xref projectcontour/contour#3576). This is of course how the formatter extensions work in Envoy, but maybe it could be avoided in this specific case.

Alternative proposal:

Stripping the query string is a common need, rather than being an exception. In other proxies one approach seems to be having sanitized variable $uri vs $request without sanitation. This approach is not necessarily suitable for Envoy since there is no such variables, but what about modifying the existing command:

Could we redefine %REQ(X?Y):Z% as %REQ(X?Y)[:Z:STRIP_QUERY_STRING]%?

Was there conclusion for this @dio #7583 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet