Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: wildcard support for public gateways #7319

Merged
merged 3 commits into from
Aug 19, 2020

Conversation

MichaelMure
Copy link
Contributor

This is a preview for feedback. Some things are obviously missing, like the documentation.

Add support for one or more wildcards in the hostname definition
of a public gateway. This is useful for example to support easily
multiples environment.

Wildcarded hostnames are set in the config as for example *.domain.tld or even *.*.domain.tld

This PR address #7317 as well

cc @lidel @Stebalien

@MichaelMure
Copy link
Contributor Author

MichaelMure commented May 15, 2020

Note that this also support more complex things like foo.bar-*-*-foo.domain.tld.

It's also possible to abuse it with something like **.domain.tld which will require two characters minimum for the subdomain.

I'll add more test for the first kind.

@Stebalien Stebalien requested a review from lidel May 18, 2020 07:55
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @MichaelMure

Is my understanding correct, that the purpose of this PR is to support a single gateway that handles: {cid}.ipfs.example.com AND {cid}.ipfs.user1.example.com AND {cid}.ipfs.app1.user1.example.com – all using the same root domain name?

Quick feedback from my end:

  • foo.bar-*-*-foo.domain.tld is a bit of controversial idea
    • I believe left side of URL should be open-ended for content identifier, maintaining the authority hierarchy (used for Origin calculation) from right to left
    • This is not just for "purity" reasons
      • local gateway supports DNSLink identifiers:
        http://en.wikipedia-on-ipfs.org.ipns.localhost:8080/wiki/
      • we may have to start splitting longer CIDs at 63th character due to Subdomain support for CIDs longer than 63 #7318
        • perhaps this is something you'd like to tackle as part of this PR? or at least be aware its something we need to support in the future (touches the same codepaths)
        • for example (sha2-512): bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeghttps://bafkrgqe3ohjcjplc6n4f3fwunlj6upltggn7xqujbsvnvyw.764srszz4u4rshq6ztos4chl4plgg4ffyyxnayrtdi5oc4xb2332g645433aeg.ipfs.example.com
  • nit: if we know that remainder on the left side is always a content identifier, are we able to simplify this PR? perhaps do the detection without regex?
  • When you add tests make sure we have ones that that document behavior when multiple rules are added for with the same root domain: example.com, *.example.com, *.*.example.com

@MichaelMure
Copy link
Contributor Author

MichaelMure commented May 18, 2020

Is my understanding correct, that the purpose of this PR is to support a single gateway that handles: {cid}.ipfs.example.com AND {cid}.ipfs.user1.example.com AND {cid}.ipfs.app1.user1.example.com – all using the same root domain name?

The goal for Infura is:

  • be able to have an undefined user ID as a subdomain while avoiding to reimplement the complex query handling that is in go-ipfs (and maintaining it)
  • way less critical, to simplify the configuration and define a simplified gateway configuration that would support any deployment (dev/staging/prod, possibly different variant at the same time).
* `foo.bar-*-*-foo.domain.tld` is a bit of controversial idea

It was complex on purpose just to show that wildcard within a subdomain label is possible. That said we do use internally subdomains like ipfs-vX-dev and it'd be nice if we don't have to change that.

  * I believe left side of URL should be open-ended for content identifier, maintaining the authority hierarchy (used for Origin calculation) from right to left

Agree on that, not sure where I'm going against it.

      * perhaps this is something you'd like to tackle as part of this PR? or at least be aware its something we need to support in the future (touches the same codepaths)

Err, for now I'm just trying to have things working :). I'm aware and things should be compatible once that land.

     * nit: if we know that remainder on the left side is always a content identifier, are we able to simplify this PR? perhaps do the detection without regex?

Possibly but I tried to follow the original logic of this piece of code with a very minimal change.

Regexes are compiled once when launching go-ipfs. A quick benchmark gives me 265ns and 0 allocation to check a match so performance should not be a problem. Also, exact matches are checked first so they should not even be used unless actually necessary.

I suppose it would be possible to rewrite it without regex, but doing it that way give a nice expressiveness to the gateway operators at a very small cost.

* This needs tests that document behavior when multiple rules are added for  with the same root domain: `example.com`, `*.example.com`, `*.*.example.com`

Ha yes, I can do that once you agree on the logic.

@lidel
Copy link
Member

lidel commented May 19, 2020

I believe left side of URL should be open-ended for content identifier, maintaining the authority hierarchy (used for Origin calculation) from right to left
Agree on that, not sure where I'm going against it.

That comment was aimed at the example of foo.bar-*-*-foo.domain.tld specifically.
In practice, it won't be possible to use - not use foo. prefix:

  • DNS label length is limited to 63 characters, so CID + ipfs namespace + username won't fit if you use - as separator
  • We will (probably) split CIDs at 63th char to work around DNS spec limitations (Subdomain support for CIDs longer than 63 #7318), that means everything on the left of the .ipfs. will be merged (. will be removed): bafy.longhash.ipfs.example.combafylonghash
    If there is foo. prefix, it will break the CID, unless we make it extra smart and aware of PublicGateway wildcards.

This needs tests that document behavior when multiple rules are added for with the same root domain: example.com, *.example.com, *.*.example.com
Ha yes, I can do that once you agree on the logic.

Right now, knownSubdomainDetails will match the shortest hostname in PublicGateways map:

https://github.com/ipfs/go-ipfs/blob/0114869d7e7d43038d1a05cd76fec22aa07f78ff/core/corehttp/hostname.go#L210-L213

If rules for example.com, *.example.com, and *.*.example.com exist simultanously in PublicGateways it may be a problem for cid.ipfs.foo.bar.example.com – it will return false-negative for example.com because isSubdomainNamespace('bar') is false.

I think there are two acceptable ways to resolve this (I am ok with either):

  • (A) refactor knownSubdomainDetails so it tries to match against all matching gateways, and return result for one that pass Host value checks
  • (B) disallow hostname overlaps: throw error on start if there are multiple rules ending with the same FQDN – removing the need for handling gateway overlaps

Do you need gateway overlap support? If not, B may be less work.


@MichaelMure I think its ok for you to continue working on this PR – wildcard support is something we do want to have, and proposed logic seems sound (just need to decide between A and B).

Just FYSA we will want to land fix for #7318 first, and then rebase this PR on top of that. I'd know more details on Thursday/Friday after check in with @Stebalien and others.

@MichaelMure
Copy link
Contributor Author

MichaelMure commented May 19, 2020

Ha, I'm not trying to have the CID being a wildcard. Wildcards in this PR are only for the part right of .ipfs. so even with long hash support it should still work.

I'll have a look at knownSubdomainDetails and isSubdomainNamespace.

@MichaelMure
Copy link
Contributor Author

MichaelMure commented May 22, 2020

FYI, the main usecase for this PR (to pass a user ID in the host) pretty much disappeared when I found out that TLS certificate wildcards are only possible for a single left-most wildcard. It is still useful IMHO but not critical anymore.

@Stebalien Stebalien requested a review from lidel May 22, 2020 16:41
@MichaelMure
Copy link
Contributor Author

FYI, the main usecase for this PR (to pass a user ID in the host) pretty much disappeared when I found out that TLS certificate wildcards are only possible for a single left-most wildcard. It is still useful IMHO but not critical anymore.

^ disregard that, it's still useful for us, although the plan is getting more complex by the hour.

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichaelMure quick heads up: we want to merge #7358 first, to add generic suport for long CIDs.
This PR will have to be rebased on top of that to ensure it passes newly added splitting tests.

@MichaelMure
Copy link
Contributor Author

@lidel if #7358 is parked, can we merge this one ?

@MichaelMure
Copy link
Contributor Author

In dc5e7c5 I made sure that the correct GW spec is returned for each matches. To do so, they are now passed around with a pointer instead of by value.

@lidel lidel added the topic/gateway Topic gateway label Jun 15, 2020
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichaelMure I agree, this is no longer blocked by splitting logic.

AFAIK things missing from this PR:

  • add basic wildcard sharness tests at the end of test/sharness/t0114-gateway-subdomains.sh
    • set config that defines subdomain gateways at *.example.net and *.*.example.com
    • request $CIDv1.ipfs.foo.example.net and $CIDv1.ipfs.foo.bar.example.com and confirm the payload is $CID_VAL
    • request example.net/ipfs/$CIDv1 and example.com/ipfs/$CIDv1 and confirm response includes Location header with redirect to correct subdomain
    • (reuse helper funcrions from existing tests – ping me if something is unclear)
  • add a wildcard example as a new recipe in https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#gateway-recipes

@MichaelMure
Copy link
Contributor Author

MichaelMure commented Jul 1, 2020

@lidel I rebased on master and added some sharness tests

Last step should be the documentation.

@MichaelMure MichaelMure requested a review from lidel July 1, 2020 15:25
@MichaelMure
Copy link
Contributor Author

Now with documentation.

@MichaelMure
Copy link
Contributor Author

@lidel I noticed that I somehow overwrote my own PR, which removed the sharness tests and other improvements. It's fixed now and rebased on master. However, your latest changes are now failing (see https://app.circleci.com/pipelines/github/ipfs/go-ipfs/3177/workflows/9473c54a-5884-4fec-9b02-6f4da71f72ad/jobs/36945/tests). Any idea why ?

@lidel
Copy link
Member

lidel commented Jul 15, 2020

I can confirm, cd test/sharness && ./t0114-gateway-subdomains.sh -v fails for me locally too (this PR) but pass for master @ b98f797

I don't have bandwidth to dig deeper into this, but suspect recent changes related to ED25519 support are the cause (see #7251 (comment)), let's wait for that set of changes to settle in.

lidel referenced this pull request Jul 15, 2020
* add support for choosing a peer key type (e.g. RSA or Ed25519) when initializing the repo
* test all variants of ipfs init: RSA, Ed25519 and default
* update subdomain gateway sharness test to publish IPNS using RSA and
Ed25519 keys
* use default identity bit lengths defined in config repo instead of
having separate defaults in go-ipfs
* update config repo dependency

Co-authored-by: Will Scott <will@cypherpunk.email>
Co-authored-by: Petar Maymounkov <petarm@gmail.com>
@MichaelMure
Copy link
Contributor Author

Any chance to have this merged for go-ipfs 0.7 ?

@lidel
Copy link
Member

lidel commented Jul 24, 2020

@MichaelMure I believe the problem was fixed in master, RSA and ED have now separate tests now:

https://github.com/ipfs/go-ipfs/blob/5b28704e505eb9a65c1ef8d2336da95af8e828c8/test/sharness/t0114-gateway-subdomains.sh#L115-L137

Do you mind rebasing to see if sharness problems go away?

Add support for one or more wildcards in the hostname definition
of a public gateway. This is useful for example to support easily
multiples environment.

Wildcarded hostname are set in the config as for example "*.domain.tld".
@MichaelMure
Copy link
Contributor Author

@lidel it didn't work :( Any other idea?

@lidel
Copy link
Member

lidel commented Aug 6, 2020

I'm blocking some time today to take a closer look.

This ensures implicit defaults are always present, even when
Gateway.PublicGateways is defined in the config.

User still can disable them, but needs to do it per hostname.

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sharness is green.

@MichaelMure FYSA I fixed tests and adjusted the logic in 2ff6f1a (details inline)

@aschmahmann this is ready for your review.

Comment on lines 236 to 245
if len(publicGateways) == 0 {
hosts.exact = make(
map[string]*config.GatewaySpec,
len(defaultKnownGateways),
)
for hostname, gw := range defaultKnownGateways {
hosts.exact[hostname] = gw
}
return hosts
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichaelMure FYI this is why sharness tests were failing: you were applying implicit defaults only when Gateway.PublicGateways was empty. This meant subdomain gateway at localhost was missing when you had any public gateway defined. I fixed it in 2ff6f1a and added explicit test to guard against regressions.

test/sharness/t0114-gateway-subdomains.sh Show resolved Hide resolved
@lidel lidel requested a review from aschmahmann August 6, 2020 12:13
@lidel lidel added need/maintainers-input Needs input from the current maintainer(s) need/review Needs a review labels Aug 6, 2020
@MichaelMure
Copy link
Contributor Author

Thanks :)

@aschmahmann
Copy link
Contributor

@MichaelMure thanks for the PR. @lidel thanks for getting the tests working here and for the ping, I'll try and take a look this this week.

@MichaelMure
Copy link
Contributor Author

@aschmahmann friendly ping :)

@aschmahmann
Copy link
Contributor

@MichaelMure thanks for the ping. I'm a little low bandwidth this week, but I'd like to get this in for the 0.7 RC. Will take a look.

@jacobheun jacobheun added this to the go-ipfs 0.7 milestone Aug 18, 2020
Copy link
Contributor

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few nits and questions for you two @MichaelMure and @lidel.

Thanks for moving this along and the ping. We should be able to get this merged in later today.

core/corehttp/hostname.go Outdated Show resolved Hide resolved
Comment on lines 108 to 112
gwLocalhost := &config.GatewaySpec{}
gwDweb := &config.GatewaySpec{}
gwLong := &config.GatewaySpec{}
gwWildcard1 := &config.GatewaySpec{}
gwWildcard2 := &config.GatewaySpec{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the fact that we no longer need UseSubdomains : true just something that happens to be true now?

It seems to me like knownSubdomainDetails would reasonably not return gateways that did not have UseSubdomains : true, but perhaps I've misunderstood.

We don't necessarily need to change the test here, I'm just asking to understand what's going on and if we should be adding checks to knownSubdomainDetails

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave @lidel answer that one. If I made a change like that it was not on purpose.

Copy link
Member

@lidel lidel Aug 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

knownSubdomainDetails does not check UseSubdomains flag because we perform that check later in:

https://github.com/MichaelMure/go-ipfs/blob/2ff6f1a80d6a070465e8c157a750ecf17687642b/core/corehttp/hostname.go#L149-L153

Without this, knownSubdomainDetails request would fallback to the generic, host-agnostic gateway at 127.0.0.1:8080/ipfs/* and we dont want that to happen on something that looks like a subdomain gateway, but is not one, as illustrated in:
https://github.com/MichaelMure/go-ipfs/blob/2ff6f1a80d6a070465e8c157a750ecf17687642b/test/sharness/t0114-gateway-subdomains.sh#L646-L652

I've updated tests to be explicit and comments in 6b6569f (#7319)

I am open to renaming knownSubdomainDetails to something else if it is still confusing.

Comment on lines +265 to +267
hosts.wildcard = append(hosts.wildcard, wildcardHost{re: re, spec: gw})
} else {
hosts.exact[hostname] = gw
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason why we're using pointers to GatewaySpec now?

Are we concerned at all about the caller doing something silly like modifying the struct after it's been passed in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it was just so that knownSubdomainDetails could be tested as returning the correct gateway config.

https://github.com/ipfs/go-ipfs/pull/7319/files#diff-a2ecbeeb066383b7a375fc1f924cb53dR189

ipfs#7319 (comment)

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>
@aschmahmann aschmahmann merged commit 0c57e9d into ipfs:master Aug 19, 2020
@MichaelMure
Copy link
Contributor Author

🎉 thanks for the help :)

@aschmahmann aschmahmann mentioned this pull request Sep 22, 2020
72 tasks
hacdias pushed a commit to ipfs/boxo that referenced this pull request Jan 27, 2023
ipfs/kubo#7319 (comment)

License: MIT
Signed-off-by: Marcin Rataj <lidel@lidel.org>


This commit was moved from ipfs/kubo@6b6569f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/maintainers-input Needs input from the current maintainer(s) need/review Needs a review topic/gateway Topic gateway
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants