Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hangs indefinitely while removing toxic #210

Closed
tekjar opened this issue Apr 2, 2018 · 7 comments
Closed

Hangs indefinitely while removing toxic #210

tekjar opened this issue Apr 2, 2018 · 7 comments

Comments

@tekjar
Copy link

tekjar commented Apr 2, 2018

package main

import (
	"time"

	toxiproxy "github.com/Shopify/toxiproxy/client"
	"github.com/sirupsen/logrus"
)

var log = logrus.New()

func init() {
	log.SetLevel(logrus.DebugLevel)
}

type Toxic struct {
	client *toxiproxy.Client
	proxy  *toxiproxy.Proxy
}

func (toxic *Toxic) Clean() {
	log.Debugln("Cleaning Toxics ......")

	toxic.proxy.RemoveToxic("bandwidth_down")
	log.Debugln("Done cleaning ......")
}

func (toxic *Toxic) HalfOpenConnection() {
	toxic.Clean()
	log.Debugln("Applying Halfopen Connection Toxic")

	toxic.proxy.AddToxic("bandwidth_down", "bandwidth", "downstream", 1.0, toxiproxy.Attributes{
		"rate": 0,
	})
}

func main() {
	var err error

	// connect to toxiproxy server
	var client = toxiproxy.NewClient("localhost:8474")
	proxy, err := client.CreateProxy("toxicbroker", "127.0.0.1:9883", "127.0.0.1:1883")

	if err != nil {
		log.Fatalln("Couldn't create proxy client", err)
	}

	toxic := Toxic{client, proxy}

	var clean1 = time.After(1 * time.Second)
	var halfopen <-chan time.Time

	var toxicTime = 20 * time.Second

	for {
		select {
		case <-clean1:
			halfopen = time.After(toxicTime)
			toxic.Clean()
		case <-halfopen:
			clean1 = time.After(toxicTime)
			toxic.HalfOpenConnection()
		}
	}
}

The Clean methods blocks indefinitely. This is happening on both 2.1.2 and 2.1.3

@xthexder
Copy link
Contributor

xthexder commented Apr 2, 2018

This seems to be due to the bandwidth rate being set to 0. Right now zero / negative are undefined behavior for most toxics.
This should probably be tested and documented in toxiproxy. Some of the toxics can be destructive to data too when removed, which shouldn't be the case.

I believe this was one of the issues I was trying to address in my Bidirectional toxics branch. Maybe I'll spend some time updating it.

@tekjar
Copy link
Author

tekjar commented Apr 2, 2018

@xthexder Thanks. I can help with testing your branch. Is there any other way to simulate half open connections?

@tekjar
Copy link
Author

tekjar commented Dec 14, 2018

Hi. Any update on this?

@Donnerbart
Copy link

Donnerbart commented Oct 11, 2022

Hey folks! We ran frequently into this bug when using shopify/toxiproxy:2.1.4 with Testcontainers. Was this by any chance fixed with ghcr.io/shopify/toxiproxy:2.5.0? Some of the 2.5.0 changes sound at least promising regarding this bug (Add TimeoutHandler for the HTTP API server, On interrupting Bandwidth toxic, use non-blocking writes etc.).

@miry
Copy link
Contributor

miry commented Oct 12, 2022

@Donnerbart It should be fixed in latest version by #436 .

I am going to close this issue, pls raise a new issue in case you have still problem.

@miry miry closed this as completed Oct 12, 2022
@Donnerbart
Copy link

@miry Thanks for the clarification about the deadlock.

@miry @xthexder Is it still unsupported to use the bandwidth proxy with rate 0? This is what the official TestContainers container does, see https://github.com/testcontainers/testcontainers-java/blob/main/modules/toxiproxy/src/main/java/org/testcontainers/containers/ToxiproxyContainer.java#L175

We had big issues with hanging tests and created a workaround using the timeout proxy with 0. That one works fine for us, but we're not 100% sure if it's a valid replacement/use-case. If so we could provide a fix for the Testcontainers code.

@xthexder
Copy link
Contributor

I think the timeout/non-blocking write changes have resolved the deadlocks in the bandwidth toxic.
There's still some edge-case behavior to worry about if you plan on pausing a connection with bandwidth 0. Any time the bandwidth toxic is interrupted it may let some data through (Can happen when another toxic is added or removed, or the bandwidth is updated). Also if for some reason buffers get full and write timeouts start happening, the stream will be corrupted due to dropped data (This should log in the toxiproxy server at least).

If you don't care about resuming the connection after pausing, then the timeout toxic will be the better option. It will not buffer any data, and will immediately drop it, since the connection is expected to be closed after.

I'm not an active user of Toxiproxy these days, so I haven't really tested this scenario with the latest version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants