Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data leak between mux.cool connections #3718

Merged
merged 2 commits into from
Aug 25, 2024
Merged

Conversation

mmmray
Copy link
Collaborator

@mmmray mmmray commented Aug 22, 2024

Fix #116

This issue does not appear at all if mux.cool is disabled. Maybe the issue can be avoided if routing is disabled too, but it's possible that the same issue appears elsewhere.

I have debugged this issue in prod with a lot of print statements, and can confirm that this patch fixes it. Tests should be added. The mux server has no tests at all!

Step-by-step:

  1. The mux server is calling handleStatusNew with the same context multiple times. That function then calls w.dispatcher.Dispatch

  2. Dispatch assigns some values specific to the connection onto the outbound here:

    ob.OriginalTarget = destination
    ob.Target = destination

  3. Because the outbounds array on the connection object is the same across multiple connections (it's the same context), the value of outbound.Target flips back and forth between values such as "1.1.1.1" or "chat.facebook.com" (IPAddress vs DomainAddress)

  4. Many layers deeper in routing, as part of this Dispatch call, this function is called:

    if ctx.Outbound.Target.Address.Family().IsIP() {
    return []net.IP{ctx.Outbound.Target.Address.IP()}
    }

    It does two things, in order:

    1. Checks if Target is an IPAddress (IsIP)
    2. If it's an IP, call IP()

    Because Target keeps mutating back and forth between multiple kinds
    of address, it sometimes is an IP for Step 1, and a DomainAddress for
    Step 2

  5. Calling IP() on a DomainAddress panics:

     panic: Calling IP() on a DomainAddress.
    
     goroutine 123018 [running]:
     github.com/xtls/xray-core/common/net.domainAddress.IP(...)
         github.com/xtls/xray-core/common/net/address.go:172
     github.com/xtls/xray-core/features/routing/session.(*Context).GetTargetIPs(0xc004175428)
         github.com/xtls/xray-core/features/routing/session/context.go:54 +0x5f
     github.com/xtls/xray-core/app/router.(*MultiGeoIPMatcher).Apply(0xc0003a63e0, {0x156c770?, 0xc004175428?})
         github.com/xtls/xray-core/app/router/condition.go:143 +0x3e
     github.com/xtls/xray-core/app/router.(*ConditionChan).Apply(0x495c9d?, {0x156c770, 0xc004175428})
         github.com/xtls/xray-core/app/router/condition.go:32 +0x5c
     github.com/xtls/xray-core/app/router.(*Rule).Apply(...)
         github.com/xtls/xray-core/app/router/config.go:30
     github.com/xtls/xray-core/app/router.(*Router).pickRouteInternal(0xc000367a40, {0x156c770, 0xc004175428})
         github.com/xtls/xray-core/app/router/router.go:196 +0x176
     github.com/xtls/xray-core/app/router.(*Router).PickRoute(0x1562810?, {0x156c770?, 0xc004175428?})
         github.com/xtls/xray-core/app/router/router.go:84 +0x25
     github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).routedDispatch(0xc000398fc0, {0x1562810, 0xc0050cd350}, 0xc00418dea0, {{0x15625a8, 0xc005974c10}, 0x1bb, 0x2})
         github.com/xtls/xray-core/app/dispatcher/default.go:420 +0x307
     created by github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).Dispatch in goroutine 123015
         github.com/xtls/xray-core/app/dispatcher/default.go:252 +0x56c
    

Fix XTLS#116

This issue does not appear at all if mux.cool is disabled. Maybe the
issue can be avoided if routing is disabled too, but it's possible that
the same issue appears elsewhere.

Tests should be added. ***The mux server has no tests at all!***

Step-by-step:

1. The mux server is calling handleStatusNew with the same context
   multiple times. That function then calls w.dispatcher.Dispatch

2. Dispatch assigns some values specific to the connection onto the
   outbound here:
   https://github.com/XTLS/Xray-core/blob/83eef6bc1f554be84aeb799417688a070cd32ab8/app/dispatcher/default.go#L241-L242

3. Because the outbounds array on the connection object is the same
   across multiple connections (it's the same context), the value of
   outbound.Target flips back and forth between values such as "1.1.1.1"
   or "chat.facebook.com" (IPAddress vs DomainAddress)

4. Many layers deeper in routing, as part of this Dispatch call, this
   function is called: https://github.com/XTLS/Xray-core/blob/83eef6bc1f554be84aeb799417688a070cd32ab8/features/routing/session/context.go#L53-L55

   It does two things, in order:

   1. Checks if Target is an IPAddress (IsIP)
   2. If it's an IP, call IP()

   Because Target keeps mutating back and forth between multiple kinds
   of address, it sometimes is an IP for Step 1, and a DomainAddress for
   Step 2

5. Calling IP() on a DomainAddress panics:

        panic: Calling IP() on a DomainAddress.

        goroutine 123018 [running]:
        github.com/xtls/xray-core/common/net.domainAddress.IP(...)
            github.com/xtls/xray-core/common/net/address.go:172
        github.com/xtls/xray-core/features/routing/session.(*Context).GetTargetIPs(0xc004175428)
            github.com/xtls/xray-core/features/routing/session/context.go:54 +0x5f
        github.com/xtls/xray-core/app/router.(*MultiGeoIPMatcher).Apply(0xc0003a63e0, {0x156c770?, 0xc004175428?})
            github.com/xtls/xray-core/app/router/condition.go:143 +0x3e
        github.com/xtls/xray-core/app/router.(*ConditionChan).Apply(0x495c9d?, {0x156c770, 0xc004175428})
            github.com/xtls/xray-core/app/router/condition.go:32 +0x5c
        github.com/xtls/xray-core/app/router.(*Rule).Apply(...)
            github.com/xtls/xray-core/app/router/config.go:30
        github.com/xtls/xray-core/app/router.(*Router).pickRouteInternal(0xc000367a40, {0x156c770, 0xc004175428})
            github.com/xtls/xray-core/app/router/router.go:196 +0x176
        github.com/xtls/xray-core/app/router.(*Router).PickRoute(0x1562810?, {0x156c770?, 0xc004175428?})
            github.com/xtls/xray-core/app/router/router.go:84 +0x25
        github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).routedDispatch(0xc000398fc0, {0x1562810, 0xc0050cd350}, 0xc00418dea0, {{0x15625a8, 0xc005974c10}, 0x1bb, 0x2})
            github.com/xtls/xray-core/app/dispatcher/default.go:420 +0x307
        created by github.com/xtls/xray-core/app/dispatcher.(*DefaultDispatcher).Dispatch in goroutine 123015
            github.com/xtls/xray-core/app/dispatcher/default.go:252 +0x56c
@mmmray
Copy link
Collaborator Author

mmmray commented Aug 22, 2024

Without mux, the outbounds array is reset entirely for every connection: https://github.com/mmmray/Xray-core/blob/7a80df07b2aa83685d41d72885d201ab47ef9c05/app/proxyman/inbound/worker.go#L83

maybe it's a more appropriate fix?

@mmmray
Copy link
Collaborator Author

mmmray commented Aug 23, 2024

@yuhan6665 or @RPRX please review, if you have ideas on how to test, tell me as well.

@yuhan6665
Copy link
Member

yuhan6665 commented Aug 24, 2024

Thanks for your great investigation @mmmray!
I have another thought:
These addresses has been confusing. As far as I know, we will only have three different target addresses (at most): original from protocol, sniffed, and for routing. Why do we need to mutate? We can refactor so that these addresses should never change?
Maybe, add a convenient function like check if original is fake IP and return sniffed instead..
What do you think @RPRX?

@mmmray
Copy link
Collaborator Author

mmmray commented Aug 24, 2024

Going even further, I don't know why the context is used to carry Target at all. I suppose that somewhere, it was too annoying to pass as function argument, but I did not notice it. But i'm also new to the code.

@yuhan6665
Copy link
Member

In general, context should contain (as much as) session level information. I think inbound, outbounds so on and so forth info is ok. XTLS also use it pass raw conn object. A bit strange but I think it is also ok.

@mmmray
Copy link
Collaborator Author

mmmray commented Aug 25, 2024

I think some refactoring in this area is really needed. The other one is around casting to PacketConnWrapper (see #3711 #3559). Anyway, maybe for another time.

@mmmray mmmray merged commit 3dd3bf9 into XTLS:main Aug 25, 2024
36 checks passed
@mmmray mmmray deleted the mux-routing-leak branch August 25, 2024 19:02
@sssagsag
Copy link

sssagsag commented Aug 25, 2024

I think some refactoring in this area is really needed. The other one is around casting to PacketConnWrapper (see #3711 #3559). Anyway, maybe for another time.

thanks for investigating and fixing , please build new version of core soon as possible
It has been almost a month that the core with new merges has not been released
@RPRX not planning for build new versions?

@mmmray
Copy link
Collaborator Author

mmmray commented Aug 25, 2024

Indeed I think a new version should be released soon. There are a few bugfixes, and this new fragment PR sitting around in master doesn't help with its lifetime. I was hoping to wait for #3711 though. Let's see.

I think the next version was planned to be related to splithttp mux settings, but... it's ready when it's ready IMO and shouldn't hold up critical bugfixes.

@sssagsag
Copy link

sssagsag commented Aug 25, 2024

Indeed I think a new version should be released soon. There are a few bugfixes, and this new fragment PR sitting around in master doesn't help with its lifetime. I was hoping to wait for #3711 though. Let's see.

The new fragment pr has no problem with ws or httpupgrade any operator in I r a n i used now, please don't make a cover for it 😆

and splithttp almost blocked in iran , beacuse udp blocked in many isp

@mikeesierrah
Copy link

@sssagsag nope , thats not true

splithttp isnt based on quic , udp mode - quic - h3 is optional

@sssagsag
Copy link

sssagsag commented Aug 25, 2024

@sssagsag nope , thats not true

splithttp isnt based on quic , udp mode - quic - h3 is optional

yes iknow , but i cant used without h3 , with many isp

@sssagsag
Copy link

sssagsag commented Aug 25, 2024

Indeed I think a new version should be released soon. There are a few bugfixes, and this new fragment PR sitting around in master doesn't help with its lifetime. I was hoping to wait for #3711 though. Let's see.

I think the next version was planned to be related to splithttp mux settings, but... it's ready when it's ready IMO and shouldn't hold up critical bugfixes.

maybe this pr helped #3677

zxspirit pushed a commit to zxspirit/Xray-core that referenced this pull request Aug 30, 2024
@dyhkwong
Copy link
Contributor

Why not just

func (ctx *Context) GetTargetIPs() []net.IP {
	if ctx.Outbound == nil {
		return nil
	}
	target := ctx.Outbound.Target
	if !target.IsValid() {
		return nil
	}

	if target.Address.Family().IsIP() {
		return []net.IP{target.Address.IP()}
	}

	return nil
}

to prevent ctx.Outbound.Target to be called for multiple times? Is a complex "fix" necessary?

And I think this is introduced in 017f53b, not a bug report from 2020.

@mmmray
Copy link
Collaborator Author

mmmray commented Oct 13, 2024

To me the problem is the racing write in Dispatch, not the racing read in GetTargetIPs. If I only fix GetTargetIPs as you said, the following sequence is still possible. (Sub-)Connection A and B are part of the same physical mux.cool connection:

  • Dispatch from connection A is writing
  • Dispatch from connection B is writing
  • GetTargetIPs from connection A is reading (it reads the result from Dispatch B)
  • GetTargetIPs from connection B i reading (it reads the result from Dispatch B)

So now, xray may route connection A based on information found in connection B! And beside GetTargetIPs, there are many other reads for multiple routing rules. Really, the most problematic part of this bug is not the crash, but the potential mis-routing as a result of the race condition.

leninalive pushed a commit to amnezia-vpn/amnezia-xray-core that referenced this pull request Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

panic: Calling IP() on a DomainAddress
5 participants