-
Notifications
You must be signed in to change notification settings - Fork 106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Workers not able to be created message error - but machines exist #1410
Comments
This is just running one of the example files in the repo doing some simple calcs. |
timed out
2024-11-23 06:49:50,337 [DEBUG] standalone.py:277 -- Found 0 free workers connected to VM instance lithops-master-dadab3
2024-11-23 06:49:50,337 [DEBUG] standalone.py:281 -- Going to create 2 new workers
{'username': 'ubuntu', 'password': 'c0d72593-5e1f-458b-aaad-74ab52be154d', 'key_filename': 'lithops-key-b597095a.aws_ec2.id_rsa'}
{'username': 'ubuntu', 'password': 'c0d72593-5e1f-458b-aaad-74ab52be154d', 'key_filename': 'lithops-key-b597095a.aws_ec2.id_rsa'}
2024-11-23 06:49:50,339 [DEBUG] aws_ec2.py:1223 -- Creating new VM instance lithops-worker-10c28285 (Spot)
2024-11-23 06:49:50,341 [DEBUG] aws_ec2.py:1223 -- Creating new VM instance lithops-worker-74d89509 (Spot)
2024-11-23 06:49:57,315 [DEBUG] standalone.py:256 -- Total worker VM instances created: 0/2
Traceback (most recent call last):
File lithopstest.py", line 44, in <module>
fexec.map(my_map_function, args)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/executors.py", line 279, in map
futures = self.invoker.run_job(job)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 280, in run_job
futures = self._run_job(job)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 222, in _run_job
raise e
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 219, in _run_job
self._invoke_job(job)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 267, in _invoke_job
activation_id = self.compute_handler.invoke(payload)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/standalone/standalone.py", line 286, in invoke
raise Exception('It was not possible to create any worker')
Exception: It was not possible to create any worker I checked the ssh connection error 'timed out' so possibly there or an interaction between the spot api and something else - two machines were created again. I have several alive from the tests last night too - so this error presumably preventing auto shutdown. |
some testing yesterday I did this, which enabled the master VM to start def get_ssh_client(self):
"""
Creates an ssh client against the VM
"""
#if self.public:
if self.public and 1 == 2: #Richard 20241122
if not self.ssh_client or self.ssh_client.ip_address != self.public_ip:
self.ssh_client = SSHClient(self.public_ip, self.ssh_credentials)
else:
if not self.ssh_client or self.ssh_client.ip_address != self.private_ip:
self.ssh_client = SSHClient(self.private_ip, self.ssh_credentials)
return self.ssh_client |
Here, looking at public IP I think? 2024-11-23 07:10:32,657 [DEBUG] ssh_client.py:59 -- 10.1.111.11 ssh client created
2024-11-23 07:10:34,094 [DEBUG] standalone.py:277 -- Found 0 free workers connected to VM instance lithops-master-dadab3 (53.92.198.72) |
When I changed it from default spot, got further 2024-11-23 07:22:02,997 [DEBUG] aws_ec2.py:1275 -- VM instance lithops-worker-f12ca4e5 created successfully
2024-11-23 07:22:03,973 [DEBUG] aws_ec2.py:1275 -- VM instance lithops-worker-d10f6430 created successfully
2024-11-23 07:22:03,973 [DEBUG] standalone.py:259 -- Total worker VM instances created: 2/2
2024-11-23 07:22:03,973 [DEBUG] standalone.py:291 -- ExecutorID 661f55-0 | JobID M000 - Going to run 3 activations in 2 workers
2024-11-23 07:22:09,590 [DEBUG] invokers.py:271 -- ExecutorID 661f55-0 | JobID M000 - Job invoked (11.244s) - Activation ID: 661f55-0-M000
2024-11-23 07:22:09,590 [INFO] invokers.py:225 -- ExecutorID 661f55-0 | JobID M000 - View execution logs at /tmp/lithops-richard/logs/661f55-0-M000.log
2024-11-23 07:22:09,590 [DEBUG] monitor.py:429 -- ExecutorID 661f55-0 - Starting Storage job monitor
2024-11-23 07:22:09,590 [INFO] executors.py:494 -- ExecutorID 661f55-0 - Getting results from 3 function activations
2024-11-23 07:22:09,590 [INFO] wait.py:101 -- ExecutorID 661f55-0 - Waiting for 3 function activations to complete
2024-11-23 07:22:13,440 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:22:53,439 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:23:34,012 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0 |
I wouldn't think that example takes very long to run though 2024-11-23 07:22:09,590 [INFO] wait.py:101 -- ExecutorID 661f55-0 - Waiting for 3 function activations to complete
2024-11-23 07:22:13,440 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:22:53,439 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:23:34,012 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:24:14,118 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:24:54,269 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:25:36,625 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0 |
and with not spot lithops worker list tells me Worker Name Created Instance Type Processes Runtime Mode Status TTD
----------------------- ----------------------- --------------- ----------- --------- ------ ---------- -------
lithops-worker-f12ca4e5 2024-11-22 20:52:09 UTC t3.medium 2 python3 reuse installing Unknown
lithops-worker-d10f6430 2024-11-22 20:52:09 UTC t3.medium 2 python3 reuse installing Unknown lithops job list has Job ID Function Name Submitted Worker Type Runtime Tasks Done Job Status
------------- ----------------- ----------------------- ------------- --------- ------------ ------------
661f55-0-M000 my_map_function() 2024-11-22 20:51:58 UTC t3.medium python3 0/3 submitted |
I did not change any defaults from anything and default python3 - does installing status mean it is expecting a container called python3? I definitely do not have a runtime set in the config. In invokers if self.backend not in STANDALONE_BACKENDS:
logger.debug(
f'ExecutorID {job.executor_id} | JobID {job.job_id} - Worker processes: '
f'{job.worker_processes} - Chunksize: {job.chunksize}'
)
try:
print("TRYING RUNTIME:",self.runtime_name)
job.runtime_name = self.runtime_name
self._invoke_job(job)
except (KeyboardInterrupt, Exception) as e:
self.stop()
raise e gives TRYING RUNTIME: python3 so should that actually be anything? For this default 'just do some calcs in the OS python3? e.g. I left this going for some time Option 1: By default, Lithops uses an Ubuntu 22.04 image. In this case, no further action is required and you can continue to the next step. Lithops will install all required dependencies in the VM by itself. Notice this can consume about 3 min to complete all installations. Probably 15 minutes plus - I could leave it going for an extended period, but doesn't seem like it was going to work. |
Next test for timeout problems then I guess it to make a simple image or something along those lines. |
try redoing the default and see what happens? |
When I try that 2024-11-23 08:39:18,586 [DEBUG] aws_ec2.py:1276 -- VM instance building-image-lithops-ubuntu-jammy-22.04-amd64-server created successfully
2024-11-23 08:39:18,586 [DEBUG] aws_ec2.py:1132 -- Waiting VM instance building-image-lithops-ubuntu-jammy-22.04-amd64-server to become ready
but ssh create connection
timed out error
and
2024-11-23 08:39:27,666 [DEBUG] aws_ec2.py:1123 -- SSH to 0.0.0.0 failed (publickey): timed out
|
so how to get this def is_ready(self):
"""
Checks if the VM instance is ready to receive ssh connections
"""
login_type = 'password' if 'password' in self.ssh_credentials and \
not self.public else 'publickey'
try:
self.get_ssh_client().run_remote_command('id')
except LithopsValidationError as err:
raise err
except Exception as err:
#logger.debug(f'SSH to {self.public_ip if self.public else self.private_ip} failed ({login_type}): {err}')
#logger.debug(f'SSH to {self.public_ip if self.public else self.private_ip} failed ({login_type}): {err}')
logger.debug(f'SSH to {self.public_ip if self.public else self.private_ip} failed ({login_type}): {err}')
self.del_ssh_client()
return False
return True in aws_ec2 to handle a private case? |
and if I disable the error handling in the above timed out
Traceback (most recent call last):
File "/home/richard/miniconda3/envs/lithops/bin/lithops", line 8, in <module>
sys.exit(lithops_cli())
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/scripts/cli.py", line 851, in build_image
compute_handler.build_image(name, file, overwrite, include, ctx.args)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/standalone/standalone.py", line 88, in build_image
self.backend.build_image(image_name, script_file, overwrite, include, extra_args)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/standalone/backends/aws_ec2/aws_ec2.py", line 659, in build_image
build_vm.get_ssh_client().upload_data_to_file(script, remote_script)
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/util/ssh_client.py", line 145, in upload_data_to_file
self.ssh_client = self.create_client()
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/util/ssh_client.py", line 62, in create_client
raise e
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/util/ssh_client.py", line 52, in create_client
self.ssh_client.connect(
File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/paramiko/client.py", line 386, in connect
sock.connect(addr)
TimeoutError: timed out |
which could be unintended consequence of my tinkering to get some other private_ip things to work? |
so failing to create a ssh_client on the run_remote part above not sure why failing to connect to these build machines when others are ok as far as ssh connection via private ip |
I can definitely connect manually to a build-* machine anyway via the private ip so not sure why code is failing here. |
This run it has got to building an image, taking a while to do, but that is a bit further. |
2024-11-23 12:01:09,849 [DEBUG] aws_ec2.py:696 -- VM Image is being created. Current status: pending
2024-11-23 12:01:31,280 [DEBUG] aws_ec2.py:696 -- VM Image is being created. Current status: available
2024-11-23 12:01:31,280 [DEBUG] aws_ec2.py:1405 -- Deleting VM instance building-image-lithops-ubuntu-jammy-22.04-amd64-server (i-003042a7a1602ea80)
2024-11-23 12:01:31,753 [INFO] aws_ec2.py:707 -- VM Image created. Image ID: ami-06ea522e39999999
2024-11-23 12:01:31,753 [INFO] cli.py:853 -- VM Image built |
but trying that ami- in target_ami, back to solving this problem: 2024-11-23 12:28:29,116 [DEBUG] monitor.py:429 -- ExecutorID 017dd6-0 - Starting Storage job monitor
2024-11-23 12:28:29,116 [INFO] executors.py:494 -- ExecutorID 017dd6-0 - Getting results from 3 function activations
2024-11-23 12:28:29,116 [INFO] wait.py:101 -- ExecutorID 017dd6-0 - Waiting for 3 function activations to complete
2024-11-23 12:28:32,845 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:29:14,093 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:29:55,072 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:30:36,118 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:31:16,790 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:31:56,778 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:32:38,720 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:33:19,958 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:34:01,740 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:34:43,345 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:35:25,381 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0 e.g. this happily runs infinite loop
|
I let it run for an hour then ctrl-C'ed it. 2024-11-23 13:30:40,473 [DEBUG] ssh_client.py:65 -- 10.0.0.0 ssh client created
2024-11-23 13:30:40,683 [DEBUG] monitor.py:457 -- ExecutorID 017dd6-0 - Storage job monitor finished
2024-11-23 13:30:41,571 [INFO] executors.py:618 -- ExecutorID 017dd6-0 - Cleaning temporary data |
job payload has this at the start JOB PAYLOAD: {'config': {'lithops': {'backend': 'aws_ec2', 'log_level': 'DEBUG', 'mode': 'standalone', 'chunksize': 0, 'storage': 'aws_s3', 'monitoring': 'storage', 'monitoring_interval': 2, 'execution_timeout': 1800, 'backend_type': 'batch'}, |
Experiments are here: https://github.com/RichardScottOZ/lithops/tree/private_ip |
These exist - and are using BatchInvoker FUTURES: [<lithops.future.ResponseFuture object at 0x7fc5b3446fb0>, <lithops.future.ResponseFuture object at 0x7fc5b00cc7f0>, <lithops.future.ResponseFuture object at 0x7fc5b00cc850>] |
Will simplify very slightly and just use this https://github.com/lithops-cloud/lithops/blob/master/examples/call_async.py - so one not 3 things. |
The latest test of that, the worker-service-log cat /tmp/lithops-root/worker-service.log
2024-11-23 23:19:26,575 [ERROR] lithops.standalone.worker:107 -- Timeout connecting to server
2024-11-23 23:19:27,361 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:60 -- Creating AWS EC2 client
2024-11-23 23:19:29,680 [INFO] lithops.standalone.backends.aws_ec2.aws_ec2:103 -- AWS EC2 client created - Region: us-east-1
2024-11-23 23:19:29,681 [DEBUG] lithops.standalone.standalone:69 -- Standalone handler created successfully
2024-11-23 23:19:29,681 [DEBUG] lithops.standalone.keeper:53 -- Starting BudgetKeeper for lithops-worker-260b9387 (10.6.131.122), instance ID: i-075b34f8d528f726e
2024-11-23 23:19:29,681 [DEBUG] lithops.standalone.keeper:55 -- Delete lithops-worker-260b9387 on dismantle: True
2024-11-23 23:19:29,682 [DEBUG] lithops.standalone.keeper:72 -- BudgetKeeper started
2024-11-23 23:19:29,682 [DEBUG] lithops.standalone.keeper:75 -- Auto dismantle activated - Soft timeout: 300s, Hard Timeout: 3600s
2024-11-23 23:19:29,682 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3596 seconds
2024-11-23 23:19:29,683 [INFO] lithops.standalone.worker:263 -- Starting Worker - Instace type: t3.medium - Runtime name: python3 - Worker processes: 2
2024-11-23 23:19:29,683 [INFO] lithops.standalone.worker:154 -- Redis consumer process 0 started
2024-11-23 23:19:29,684 [INFO] lithops.standalone.worker:154 -- Redis consumer process 1 started
2024-11-23 23:20:29,743 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3536 seconds |
So does that mean that is the problem, workers can't get to master, or is that just first try and then they do? /tmp/lithops-root/jobs and /logs are empty anyway |
on the worker curl -X GET http://lithops-master:8080/ping
{"response":"3.5.1"} and 8080 and 8081 need to be reachable for master and workers |
when I logged into a worker when starting I did see a python3 task appear via top - so something running but results not being collected? |
If wanted to put additional logging on workers - would have to do that, then make a new master so it would have an updated worker.py ? |
current test from new, master log buntu@ip-10-6-131-104:~$ cat /tmp/lithops-root/master-service.log
2024-11-24 01:27:54,597 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:60 -- Creating AWS EC2 client
2024-11-24 01:27:55,650 [INFO] lithops.standalone.backends.aws_ec2.aws_ec2:103 -- AWS EC2 client created - Region: us-east-1
2024-11-24 01:27:55,651 [DEBUG] lithops.standalone.standalone:69 -- Standalone handler created successfully
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:53 -- Starting BudgetKeeper for lithops-master-dadab3 (10.6.131.104), instance ID: i-021f144b3714bb841
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:55 -- Delete lithops-master-dadab3 on dismantle: False
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:72 -- BudgetKeeper started
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:75 -- Auto dismantle activated - Soft timeout: 300s, Hard Timeout: 3600s
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3597 seconds
2024-11-24 01:27:55,652 [INFO] lithops.standalone.master:573 -- Starting job monitoring thread
2024-11-24 01:28:08,208 [DEBUG] lithops.standalone.master:545 -- Received job 86bf30-0-A000
2024-11-24 01:28:08,209 [DEBUG] lithops.standalone.master:375 -- Going to setup 1 workers
2024-11-24 01:28:08,209 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:60 -- Creating AWS EC2 client
2024-11-24 01:28:08,224 [DEBUG] lithops.standalone.master:526 -- Job 86bf30-0-A000 correctly submitted to work queue 'wq:t3.medium-2-python3'
2024-11-24 01:28:08,611 [INFO] lithops.standalone.backends.aws_ec2.aws_ec2:103 -- AWS EC2 client created - Region: us-east-1
2024-11-24 01:28:08,616 [DEBUG] lithops.standalone.standalone:69 -- Standalone handler created successfully
2024-11-24 01:28:08,617 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1143 -- Waiting VM instance lithops-worker-21337e83 to become ready
2024-11-24 01:28:09,330 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:14,643 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:20,004 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:25,273 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:30,625 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:35,898 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:41,215 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:46,583 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:51,926 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:55,684 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3552 seconds
2024-11-24 01:28:57,202 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:02,486 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:07,782 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:13,156 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:18,572 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:23,855 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:29,257 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:34,674 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:40,083 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:45,422 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:50,803 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:55,729 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3492 seconds
2024-11-24 01:29:56,106 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:01,602 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:06,922 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:12,332 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:17,618 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:22,981 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:28,258 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:33,584 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:38,878 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:44,247 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:49,612 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:54,881 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:55,789 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3432 seconds
2024-11-24 01:31:00,344 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:31:05,689 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:31:10,695 [WARNING] lithops.standalone.master:263 -- Timeout Error. Recreating VM instance lithops-worker-21337e83 (100.1.0.1)
2024-11-24 01:31:10,695 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1407 -- Deleting VM instance lithops-worker-21337e83 (i-057bfea7678c06832)
2024-11-24 01:31:11,159 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1273 -- Creating new VM instance lithops-worker-21337e83
2024-11-24 01:31:12,244 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1288 -- VM instance lithops-worker-21337e83 created successfully
2024-11-24 01:31:12,244 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1143 -- Waiting VM instance lithops-worker-21337e83 to become ready
2024-11-24 01:31:14,436 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:21,576 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:28,741 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:35,927 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:43,078 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:50,271 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:55,445 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): [Errno None] Unable to connect to port 22 on 100.1.0.2
2024-11-24 01:31:55,823 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3372 seconds
2024-11-24 01:32:00,680 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): [Errno None] Unable to connect to port 22 on 100.1.0.2
2024-11-24 01:32:06,350 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:11,750 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:17,202 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:22,584 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:27,972 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:33,419 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:38,912 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:44,293 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:49,733 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:55,320 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:55,883 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3312 seconds
2024-11-24 01:33:00,696 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:06,052 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:11,494 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:16,841 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:22,272 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:27,743 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:33,146 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:38,532 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:43,933 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:49,417 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:54,839 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:55,895 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3252 seconds
2024-11-24 01:34:00,247 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:34:05,598 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:34:10,962 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:34:15,972 [DEBUG] lithops.standalone.master:261 -- Readiness probe expired for VM instance lithops-worker-21337e83 (100.1.0.2)
2024-11-24 01:34:15,973 [ERROR] lithops.standalone.master:411 -- Readiness probe expired on VM instance lithops-worker-21337e83 (100.1.0.2)
2024-11-24 01:34:15,973 [DEBUG] lithops.standalone.master:413 -- 0 of 1 workers started for work queue: wq:t3.medium-2-python3
2024-11-24 01:34:55,955 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3192 seconds
2024-11-24 01:35:56,015 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3132 seconds
2024-11-24 01:36:56,075 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3072 seconds
2024-11-24 01:37:56,135 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3012 seconds
2024-11-24 01:38:56,196 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2952 seconds
2024-11-24 01:39:56,206 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2892 seconds
2024-11-24 01:40:56,263 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2832 seconds
2024-11-24 01:41:56,323 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2772 seconds
2024-11-24 01:42:56,355 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2712 seconds
2024-11-24 01:43:56,415 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2652 seconds
2024-11-24 01:44:56,475 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2592 seconds |
and when I test manually on the master 2024-11-24 02:00:53,336 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:00:58,667 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:04,092 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:07,876 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3247 seconds
2024-11-24 02:01:09,489 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:14,877 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:20,299 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:25,310 [DEBUG] lithops.standalone.master:261 -- Readiness probe expired for VM instance lithops-worker-957f510e (100.1.0.2)
2024-11-24 02:01:25,415 [ERROR] lithops.standalone.master:411 -- Readiness probe expired on VM instance lithops-worker-957f510e (100.1.0.2)
2024-11-24 02:01:25,415 [DEBUG] lithops.standalone.master:413 -- 0 of 1 workers started for work queue: wq:t3.medium-2-0.1.0.23
2024-11-24 02:02:07,906 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3187 seconds
q^C
ubuntu@ip-:~$ tail -f -n 100 /tmp/lithops-*/master-service.log^C
ubuntu@ip-:~$ cd .ssh
ubuntu@ip-:~/.ssh$ ls
authorized_keys id_rsa id_rsa.pub lithops_id_rsa lithops_id_rsa.pub
ubuntu@ip-:~/.ssh$ ssh -i lithops_id_rsa ubuntu@
Warning: Permanently added '' (ED25519) to the list of known hosts.
ubuntu@: Permission denied (publickey).
ubuntu@ip-:~/.ssh$ ssh -i id_rsa ubuntu@
Warning: Permanently added '' (ED25519) to the list of known hosts.
ubuntu@: Permission denied (publickey).
ubuntu@i:~/.ssh$ |
so I can ssh fine to master and workers with the key used in config - so ssh_client problems from my adaptation or an actual key problem |
i uploaded the key used in the config to a test directory or the master - and I could ssh to the worker using those from the master |
As an aside, I thought I would do another test - a completely vanilla install where I let the new lithops (3.51 - was 3.3 when I did my books processing earlier in the year) - and default code with public vpc etc. install the hello world ran fine. |
just as an actual test: I was running this via windows/wsl tried a vanilla setup with my private_ip branch and some tests ran ... still something odd with master finding workers that comes and goes it seems - had that working at one stage yesterday |
Possibly a public/private check issue?
The text was updated successfully, but these errors were encountered: