Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade thespian to fix hang on macOS #1653

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

pquentin
Copy link
Member

Closes #1575

See thespianpy/Thespian#70 for more context.

@pquentin pquentin added the bug Something's wrong label Jan 16, 2023
@pquentin pquentin requested review from inqueue and b-deam January 16, 2023 13:10
@pquentin pquentin self-assigned this Jan 16, 2023
Copy link
Member

@b-deam b-deam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't delved into it, but this upgrade breaks Rally completely on MacOS:

$ esrally race --track=geopoint  --pipeline=benchmark-only --target-hosts=https://localhost:9222 --challenge=append-fast-with-conflicts --telemetry=node-stats --track-params="bulk_indexing_clients:32" --client-options="timeout:180,verify_certs:false,basic_auth_user:'elastic',basic_auth_password:'changeme'" --kill-running-processes --test-mode

    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/

[INFO] Race id is [0557ed5b-3bb2-46e9-a857-df177e17697b]
DEBUG:esrally.actor:Checking capabilities [{'Thespian ActorSystem Name': 'multiprocQueueBase', 'Thespian ActorSystem Version': 1, 'Python Version': (3, 8, 10, 'final', 0), 'Thespian Generation': (3, 10), 'Thespian Version': '1673927835624'}] against requirements [{'coordinator': True}] failed.
[WARNING] Could not terminate all internal processes within timeout. Please check and force-terminate all Rally processes.
[ERROR] Cannot race. This race ended with a fatal crash.

Getting further help:
*********************
* Check the log files in /Users/bradleydeam/.rally/logs for errors.
* Read the documentation at https://esrally.readthedocs.io/en/latest/.
* Ask a question on the forum at https://discuss.elastic.co/tags/c/elastic-stack/elasticsearch/rally.
* Raise an issue at https://github.com/elastic/rally/issues and include the log files in /Users/bradleydeam/.rally/logs.

--------------------------------
[INFO] FAILURE (took 25 seconds)
--------------------------------
Rally logs
23-01-17 03:57:10,587 -not-actor-/PID:98405 esrally.rally INFO Actor system already running locally? [False]
2023-01-17 03:57:10,587 -not-actor-/PID:98405 esrally.actor INFO Starting actor system with system base [multiprocTCPBase] and capabilities [{'coordinator': True, 'ip': '127.0.0.1', 'Convention Address.IPv4': '127.0.0.1:1900'}].
2023-01-17 03:57:10,615 -not-actor-/PID:98420 root INFO ++++ Actor System gen (3, 10) started, admin @ ActorAddr-(T|:1900)
2023-01-17 03:57:15,616 -not-actor-/PID:98405 esrally.actor ERROR Could not initialize internal actor system.
Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py", line 263, in bootstrap_actor_system
    return thespian.actors.ActorSystem(system_base, logDefs=log.load_configuration(), capabilities=capabilities)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 637, in __init__
    systemBase = self._startupActorSys(
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 678, in _startupActorSys
    systemBase = sbc(self, logDefs=logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
    super(ActorSystemBase, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
    super(multiprocessCommon, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 326, in __init__
    self._startAdmin(self.adminAddr,
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 115, in _startAdmin
    raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin
2023-01-17 03:57:15,621 -not-actor-/PID:98405 esrally.rally INFO Falling back to offline actor system.
2023-01-17 03:57:15,636 -not-actor-/PID:98433 root INFO ++++ Actor System gen (3, 10) started, admin @ ActorAddr-Q.ThespianQ
2023-01-17 03:57:15,650 -not-actor-/PID:98405 esrally.racecontrol INFO Race id is [0557ed5b-3bb2-46e9-a857-df177e17697b]
2023-01-17 03:57:15,651 -not-actor-/PID:98405 esrally.racecontrol INFO User specified pipeline [benchmark-only].
2023-01-17 03:57:15,651 -not-actor-/PID:98405 esrally.racecontrol INFO Using configured hosts [{'host': 'localhost', 'port': 9222, 'use_ssl': True}]
2023-01-17 03:57:18,660 -not-actor-/PID:98405 esrally.rally INFO Attempting to shutdown internal actor system.
2023-01-17 03:57:18,684 -not-actor-/PID:98433 root INFO ---- Actor System shutdown
2023-01-17 03:57:18,685 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:19,691 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:20,696 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:21,702 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:22,704 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:23,709 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:24,715 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:25,716 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:26,721 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:27,723 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:28,725 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:29,731 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:30,734 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:31,738 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:32,741 -not-actor-/PID:98405 esrally.rally INFO Actor system is still running. Waiting...
2023-01-17 03:57:33,742 -not-actor-/PID:98405 esrally.rally WARNING Shutdown timed out. Actor system is still running.
2023-01-17 03:57:33,742 -not-actor-/PID:98405 esrally.rally ERROR Cannot run subcommand [race].
Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 369, in run
    pipeline(cfg)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 71, in __call__
    self.target(cfg)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 308, in benchmark_only
    return race(cfg, external=True)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 256, in race
    benchmark_actor = actor_system.createActor(BenchmarkActor, targetActorRequirements={"coordinator": True})
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 704, in createActor
    return self._systemBase.newPrimaryActor(actorClass,
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 205, in newPrimaryActor
    raise NoCompatibleSystemForActor(
thespian.actors.NoCompatibleSystemForActor: No compatible ActorSystem could be found for Actor <class 'esrally.racecontrol.BenchmarkActor'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rally.py", line 1171, in dispatch_sub_command
    race(cfg, args.kill_running_processes)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rally.py", line 919, in race
    with_actor_system(racecontrol.run, cfg)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rally.py", line 949, in with_actor_system
    runnable(cfg)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 378, in run
    raise exceptions.RallyError("This race ended with a fatal crash.").with_traceback(tb)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 369, in run
    pipeline(cfg)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 71, in __call__
    self.target(cfg)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 308, in benchmark_only
    return race(cfg, external=True)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/racecontrol.py", line 256, in race
    benchmark_actor = actor_system.createActor(BenchmarkActor, targetActorRequirements={"coordinator": True})
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 704, in createActor
    return self._systemBase.newPrimaryActor(actorClass,
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 205, in newPrimaryActor
    raise NoCompatibleSystemForActor(
esrally.exceptions.RallyError: This race ended with a fatal crash.

I tried it against a non-loopback interface and got the same error. I wonder if @inqueue has the same issue?

@b-deam
Copy link
Member

b-deam commented Jan 17, 2023

FWIW, I tested with "thespian==3.10.6" and I can't reproduce the thespian.actors.InvalidActorAddress exception.

@pquentin
Copy link
Member Author

Thanks! Interesting. If it's not too much work, would you mind trying again with thespian debug logs activated and share the output?

@b-deam
Copy link
Member

b-deam commented Jan 17, 2023

Thanks! Interesting. If it's not too much work, would you mind trying again with thespian debug logs activated and share the output?

Sure!

I think the key line is here, but I haven't dug into why my system is no longer meeting the capabilities requirements. In addition to testing with 3.10.6 I took a look through the Release Notes to see if there were any changes to the constructor that we hadn't adopted, but the fact that I can only repro this on 3.10.7 leads to to believe that Fix Issue #70: setting TCP_NODELAY on MacOS sometimes fails is probably to blame.

$ tail -f actor-system-internal.log
2023-01-17 17:24:23.715134 p11971 dbg  ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-17 17:24:23.715477 p11971 I    ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)
2023-01-17 17:24:28.757419 p12025 dbg  ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-17 17:24:28.761218 p12025 I    ++++ Admin started @ ActorAddr-Q.ThespianQ / gen (3, 10)
2023-01-17 17:24:28.770882 p12025 dbg  Admin of ReceiveEnvelope(from: ActorAddr-Q.ThespianQ.a, <class 'thespian.system.messages.multiproc.LoggerConnected'> msg: <thespian.system.messages.multiproc.LoggerConnected object at 0x106bd5dc0>)
2023-01-17 17:24:28.771869 p12025 dbg  actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999780-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x106bd5f10>-quit_0:04:59.999757)
2023-01-17 17:24:28.774415 p11963 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999684-<class 'thespian.system.messages.admin.QueryExists'>-<thespian.system.messages.admin.QueryExists object at 0x106bcaf70>-quit_0:04:59.999662)
2023-01-17 17:24:28.775267 p12025 dbg  Admin of ReceiveEnvelope(from: ActorAddr-Q.~, <class 'thespian.system.messages.admin.QueryExists'> msg: <thespian.system.messages.admin.QueryExists object at 0x106be6040>)
2023-01-17 17:24:28.775951 p12025 dbg  Attempting intent TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999735-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x106bd5d60>-quit_0:04:59.999715)
2023-01-17 17:24:28.776373 p12025 dbg  actualTransmit of TransportIntent(ActorAddr-Q.~-pending-ExpiresIn_0:04:59.999303-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x106bd5d60>-quit_0:04:59.999289)
2023-01-17 17:24:28.779637 p11963 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ-pending-ExpiresIn_0:04:59.999913-<class 'thespian.system.messages.admin.PendingActor'>-PendingActor#1_of_None-quit_0:04:59.999898)
2023-01-17 17:24:28.780059 p12025 dbg  Admin of ReceiveEnvelope(from: ActorAddr-Q.~1, <class 'thespian.system.messages.admin.PendingActor'> msg: PendingActor#1_of_None)
2023-01-17 17:24:28.780399 p12025 I    Pending Actor request received for esrally.racecontrol.BenchmarkActor reqs {'coordinator': True} from ActorAddr-Q.~1
2023-01-17 17:24:28.781272 p12025 dbg  actualTransmit of TransportIntent(ActorAddr-Q.ThespianQ.a-pending-ExpiresIn_0:04:59.999898-<class 'logging.LogRecord'>-<LogRecord: esrally.actor, 10, /Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py, 122, "Checking capabilities [{'Thespian ActorSystem N...-quit_0:04:59.999882)
- 2023-01-17 17:24:28.782002 p12025 Warn no system has compatible capabilities for Actor esrally.racecontrol.BenchmarkActor
2023-01-17 17:24:28.782706 p12025 dbg  Attempting intent TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999777-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for ActorAddr-Q.~1 inst# 1) errCode 3586 actual None-quit_0:04:59.999756)
2023-01-17 17:24:28.783091 p12025 dbg  actualTransmit of TransportIntent(ActorAddr-Q.~1-pending-ExpiresIn_0:04:59.999380-<class 'thespian.system.messages.admin.PendingActorResponse'>-PendingActorResponse(for ActorAddr-Q.~1 inst# 1) errCode 3586 actual None-quit_0:04:59.999366)
2023-01-17 17:24:31.786774 p11963 I    ActorSystem shutdown requested.

Also worth adding a stripped down repro using esrallyd:

$ esrallyd start --coordinator-ip 192.168.20.6 --node-ip 192.168.20.6
zsh: /usr/local/bin/esrallyd: bad interpreter: /usr/local/opt/python/bin/python3.7: no such file or directory
multiprocTCPBase {'coordinator': True, 'ip': '192.168.20.6', 'Convention Address.IPv4': '192.168.20.6:1900'}
Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/bin/esrallyd", line 8, in <module>
    sys.exit(main())
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 107, in main
    start(args)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 39, in start
    actor.bootstrap_actor_system(local_ip=args.node_ip, coordinator_ip=args.coordinator_ip)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py", line 264, in bootstrap_actor_system
    return thespian.actors.ActorSystem(system_base, logDefs=log.load_configuration(), capabilities=capabilities)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 637, in __init__
    systemBase = self._startupActorSys(
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 678, in _startupActorSys
    systemBase = sbc(self, logDefs=logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
    super(ActorSystemBase, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
    super(multiprocessCommon, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 326, in __init__
    self._startAdmin(self.adminAddr,
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 115, in _startAdmin
    raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin

@pquentin
Copy link
Member Author

Thanks a lot! The only information left I need to report the bug to Thespian would be the same logs but on thespian 3.10.6.

Also, why specifically 192.168.20.6? Is it you local IP address?

@inqueue
Copy link
Member

inqueue commented Jan 17, 2023

I tried it against a non-loopback interface and got the same error. I wonder if @inqueue has the same issue?

I see the same issue and I also saw it with a patched 3.10.6 during testing. Unfortunately, I actually validated the changes from https://github.com/thespianpy/Thespian/tree/issue70 by patching the version currently in use by Rally (3.10.1). The error has been referenced at thespianpy/Thespian#70 (comment).

@pquentin
Copy link
Member Author

For what it's worth I'm seeing Warn no system has compatible capabilities for Actor esrally.racecontrol.BenchmarkActor on Linux too. Comparing the logs between Thespian 3.10.6 and 3.10.7 would help a lot.

@b-deam
Copy link
Member

b-deam commented Jan 19, 2023

3.10.6

Patch:

diff --git a/pyproject.toml b/pyproject.toml
index 77469e44fceb228f02c9407bbc1f0709c351c1bf..4aee4ecf020899ce2dd0c0e02e2aa45d6674a706 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -58,7 +58,7 @@ dependencies = [
     "Jinja2==2.11.3",
     "markupsafe==2.0.1",
     # License: MIT
-    "thespian==3.10.7",
+    "thespian==3.10.6",
     # recommended library for thespian to identify actors more easily with `ps`
     # "setproctitle==1.1.10",
     # always use the latest version, these are certificate files...

Invocation:

$ make install
[...]
$ export THESPLOG_FILE="${THESPLOG_FILE:-${HOME}/.rally/logs/actor-system-internal.log}"
$ export THESPLOG_FILE_MAXSIZE=${THESPLOG_FILE_MAXSIZE:-204800}
$ export THESPLOG_THRESHOLD="DEBUG"
$ esrallyd start --coordinator-ip 192.168.20.6 --node-ip 192.168.20.6
[INFO] Successfully started actor system on node [192.168.20.6] with coordinator node IP [192.168.20.6].

Actor logs:

2023-01-19 14:18:05.685305 p6863 dbg  ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-19 14:18:05.685794 p6863 I    ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)
2023-01-19 14:18:05.691125 p6863 dbg  Admin of ReceiveEnvelope(from: ActorAddr-(T|:64880), <class 'thespian.system.messages.multiproc.LoggerConnected'> msg: <thespian.system.messages.multiproc.LoggerConnected object at 0x1026c5610>)
2023-01-19 14:18:05.691638 p6863 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:64879)-pending-ExpiresIn_0:04:59.999885-<class 'thespian.system.messages.multiproc.EndpointConnected'>-<thespian.system.messages.multiproc.EndpointConnected object at 0x1026c5a00>-quit_0:04:59.999870)
2023-01-19 14:18:05.693301 p6844 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:1900)-pending-ExpiresIn_0:04:59.999882-<class 'thespian.system.messages.admin.QueryExists'>-<thespian.system.messages.admin.QueryExists object at 0x1026a4c40>-quit_0:04:59.999870)
2023-01-19 14:18:05.693816 p6863 dbg  Admin of ReceiveEnvelope(from: ActorAddr-(T|:64879), <class 'thespian.system.messages.admin.QueryExists'> msg: <thespian.system.messages.admin.QueryExists object at 0x1026c5eb0>)
2023-01-19 14:18:05.694198 p6863 dbg  Attempting intent TransportIntent(ActorAddr-(T|:64879)-pending-ExpiresIn_0:04:59.999883-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x1026c5040>-quit_0:04:59.999874)
2023-01-19 14:18:05.694414 p6863 dbg  actualTransmit of TransportIntent(ActorAddr-(T|:64879)-pending-ExpiresIn_0:04:59.999663-<class 'thespian.system.messages.admin.QueryAck'>-<thespian.system.messages.admin.QueryAck object at 0x1026c5040>-quit_0:04:59.999656)

Rally logs:

2023-01-19 03:48:05,472 -not-actor-/PID:6844 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] with logging.
2023-01-19 03:48:05,483 -not-actor-/PID:6844 esrally.utils.process DEBUG git version 2.37.1 (Apple Git-137.1)

2023-01-19 03:48:05,483 -not-actor-/PID:6844 esrally.utils.process DEBUG Subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] finished with return code [0].
2023-01-19 03:48:05,484 -not-actor-/PID:6844 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally rev-parse HEAD] with output.
2023-01-19 03:48:05,499 -not-actor-/PID:6844 esrally.actor INFO Starting actor system with system base [multiprocTCPBase] and capabilities [{'coordinator': True, 'ip': '192.168.20.6', 'Convention Address.IPv4': '192.168.20.6:1900'}].
2023-01-19 03:48:05,686 -not-actor-/PID:6863 root INFO ++++ Actor System gen (3, 10) started, admin @ ActorAddr-(T|:1900)
2023-01-19 03:48:05,686 -not-actor-/PID:6863 root DEBUG Thespian source: /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py

3.10.7

Invocation:

$ export THESPLOG_FILE="${THESPLOG_FILE:-${HOME}/.rally/logs/actor-system-internal.log}"
$ export THESPLOG_FILE_MAXSIZE=${THESPLOG_FILE_MAXSIZE:-204800}
$ export THESPLOG_THRESHOLD="DEBUG"
$ esrallyd start --coordinator-ip 192.168.20.6 --node-ip 192.168.20.6
Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/bin/esrallyd", line 8, in <module>
    sys.exit(main())
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 107, in main
    start(args)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/rallyd.py", line 39, in start
    actor.bootstrap_actor_system(local_ip=args.node_ip, coordinator_ip=args.coordinator_ip)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py", line 263, in bootstrap_actor_system
    return thespian.actors.ActorSystem(system_base, logDefs=log.load_configuration(), capabilities=capabilities)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 637, in __init__
    systemBase = self._startupActorSys(
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 678, in _startupActorSys
    systemBase = sbc(self, logDefs=logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
    super(ActorSystemBase, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
    super(multiprocessCommon, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 326, in __init__
    self._startAdmin(self.adminAddr,
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 115, in _startAdmin
    raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin

Actor logs:

2023-01-19 14:10:37.967074 p91216 dbg  ++++ Starting Admin from /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-19 14:10:37.967405 p91216 I    ++++ Admin started @ ActorAddr-(T|:1900) / gen (3, 10)

Rally logs:

2023-01-19 03:40:37,917 -not-actor-/PID:91212 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] with logging.
2023-01-19 03:40:37,932 -not-actor-/PID:91212 esrally.utils.process DEBUG git version 2.37.1 (Apple Git-137.1)

2023-01-19 03:40:37,932 -not-actor-/PID:91212 esrally.utils.process DEBUG Subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally --version] finished with return code [0].
2023-01-19 03:40:37,932 -not-actor-/PID:91212 esrally.utils.process DEBUG Running subprocess [git -C /Users/bradleydeam/perf/github.com/b-deam/rally/esrally rev-parse HEAD] with output.
2023-01-19 03:40:37,951 -not-actor-/PID:91212 esrally.actor INFO Starting actor system with system base [multiprocTCPBase] and capabilities [{'coordinator': True, 'ip': '192.168.20.6', 'Convention Address.IPv4': '192.168.20.6:1900'}].
2023-01-19 03:40:37,967 -not-actor-/PID:91216 root INFO ++++ Actor System gen (3, 10) started, admin @ ActorAddr-(T|:1900)
2023-01-19 03:40:37,967 -not-actor-/PID:91216 root DEBUG Thespian source: /Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/__init__.py
2023-01-19 03:40:42,963 -not-actor-/PID:91212 esrally.actor ERROR Could not initialize internal actor system.
Traceback (most recent call last):
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/esrally/actor.py", line 263, in bootstrap_actor_system
    return thespian.actors.ActorSystem(system_base, logDefs=log.load_configuration(), capabilities=capabilities)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 637, in __init__
    systemBase = self._startupActorSys(
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/actors.py", line 678, in _startupActorSys
    systemBase = sbc(self, logDefs=logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocTCPBase.py", line 28, in __init__
    super(ActorSystemBase, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 83, in __init__
    super(multiprocessCommon, self).__init__(system, logDefs)
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/systemBase.py", line 326, in __init__
    self._startAdmin(self.adminAddr,
  File "/Users/bradleydeam/perf/github.com/b-deam/rally/.venv/lib/python3.8/site-packages/thespian/system/multiprocCommon.py", line 115, in _startAdmin
    raise InvalidActorAddress(adminAddr,
thespian.actors.InvalidActorAddress: ActorAddr-(T|:1900) is not a valid ActorSystem admin

@gbanasiak
Copy link
Contributor

I'm hoping to be able to re-spin this PR after thespianpy/Thespian#77 is addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something's wrong
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rally hangs indefinitely on OSX when running 'esrally race --revision'
4 participants