Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workers not able to be created message error - but machines exist #1410

Open
RichardScottOZ opened this issue Nov 22, 2024 · 36 comments
Open

Comments

@RichardScottOZ
Copy link
Contributor

raceback (most recent call last):
  File lithopstest.py", line 44, in <module>
    fexec.map(my_map_function, args)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/executors.py", line 279, in map
    futures = self.invoker.run_job(job)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 280, in run_job
    futures = self._run_job(job)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 222, in _run_job
    raise e
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 219, in _run_job
    self._invoke_job(job)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 267, in _invoke_job
    activation_id = self.compute_handler.invoke(payload)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/standalone/standalone.py", line 286, in invoke
    raise Exception('It was not possible to create any worker')
Exception: It was not possible to create any worker

image

Possibly a public/private check issue?

@RichardScottOZ
Copy link
Contributor Author

This is just running one of the example files in the repo doing some simple calcs.

@RichardScottOZ
Copy link
Contributor Author

timed out
2024-11-23 06:49:50,337 [DEBUG] standalone.py:277 -- Found 0 free workers connected to VM instance lithops-master-dadab3
2024-11-23 06:49:50,337 [DEBUG] standalone.py:281 -- Going to create 2 new workers
{'username': 'ubuntu', 'password': 'c0d72593-5e1f-458b-aaad-74ab52be154d', 'key_filename': 'lithops-key-b597095a.aws_ec2.id_rsa'}
{'username': 'ubuntu', 'password': 'c0d72593-5e1f-458b-aaad-74ab52be154d', 'key_filename': 'lithops-key-b597095a.aws_ec2.id_rsa'}
2024-11-23 06:49:50,339 [DEBUG] aws_ec2.py:1223 -- Creating new VM instance lithops-worker-10c28285 (Spot)
2024-11-23 06:49:50,341 [DEBUG] aws_ec2.py:1223 -- Creating new VM instance lithops-worker-74d89509 (Spot)
2024-11-23 06:49:57,315 [DEBUG] standalone.py:256 -- Total worker VM instances created: 0/2
Traceback (most recent call last):
  File lithopstest.py", line 44, in <module>
    fexec.map(my_map_function, args)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/executors.py", line 279, in map
    futures = self.invoker.run_job(job)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 280, in run_job
    futures = self._run_job(job)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 222, in _run_job
    raise e
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 219, in _run_job
    self._invoke_job(job)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/invokers.py", line 267, in _invoke_job
    activation_id = self.compute_handler.invoke(payload)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/standalone/standalone.py", line 286, in invoke
    raise Exception('It was not possible to create any worker')
Exception: It was not possible to create any worker

I checked the ssh connection error 'timed out' so possibly there or an interaction between the spot api and something else - two machines were created again. I have several alive from the tests last night too - so this error presumably preventing auto shutdown.

@RichardScottOZ
Copy link
Contributor Author

some testing yesterday I did this, which enabled the master VM to start

def get_ssh_client(self):
        """
        Creates an ssh client against the VM
        """
        #if self.public:
        if self.public and 1 == 2:  #Richard 20241122
            if not self.ssh_client or self.ssh_client.ip_address != self.public_ip:
                self.ssh_client = SSHClient(self.public_ip, self.ssh_credentials)
        else:
            if not self.ssh_client or self.ssh_client.ip_address != self.private_ip:
                self.ssh_client = SSHClient(self.private_ip, self.ssh_credentials)

        return self.ssh_client

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 22, 2024

Here, looking at public IP I think?

2024-11-23 07:10:32,657 [DEBUG] ssh_client.py:59 -- 10.1.111.11 ssh client created
2024-11-23 07:10:34,094 [DEBUG] standalone.py:277 -- Found 0 free workers connected to VM instance lithops-master-dadab3 (53.92.198.72)

@RichardScottOZ
Copy link
Contributor Author

When I changed it from default spot, got further

2024-11-23 07:22:02,997 [DEBUG] aws_ec2.py:1275 -- VM instance lithops-worker-f12ca4e5 created successfully
2024-11-23 07:22:03,973 [DEBUG] aws_ec2.py:1275 -- VM instance lithops-worker-d10f6430 created successfully
2024-11-23 07:22:03,973 [DEBUG] standalone.py:259 -- Total worker VM instances created: 2/2
2024-11-23 07:22:03,973 [DEBUG] standalone.py:291 -- ExecutorID 661f55-0 | JobID M000 - Going to run 3 activations in 2 workers
2024-11-23 07:22:09,590 [DEBUG] invokers.py:271 -- ExecutorID 661f55-0 | JobID M000 - Job invoked (11.244s) - Activation ID: 661f55-0-M000
2024-11-23 07:22:09,590 [INFO] invokers.py:225 -- ExecutorID 661f55-0 | JobID M000 - View execution logs at /tmp/lithops-richard/logs/661f55-0-M000.log
2024-11-23 07:22:09,590 [DEBUG] monitor.py:429 -- ExecutorID 661f55-0 - Starting Storage job monitor
2024-11-23 07:22:09,590 [INFO] executors.py:494 -- ExecutorID 661f55-0 - Getting results from 3 function activations
2024-11-23 07:22:09,590 [INFO] wait.py:101 -- ExecutorID 661f55-0 - Waiting for 3 function activations to complete
2024-11-23 07:22:13,440 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:22:53,439 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:23:34,012 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0

@RichardScottOZ
Copy link
Contributor Author

I wouldn't think that example takes very long to run though

2024-11-23 07:22:09,590 [INFO] wait.py:101 -- ExecutorID 661f55-0 - Waiting for 3 function activations to complete
2024-11-23 07:22:13,440 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:22:53,439 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:23:34,012 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:24:14,118 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:24:54,269 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 07:25:36,625 [DEBUG] monitor.py:144 -- ExecutorID 661f55-0 - Pending: 3 - Running: 0 - Done: 0

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 22, 2024

and with not spot

lithops worker list tells me

Worker Name              Created                  Instance Type      Processes  Runtime    Mode    Status      TTD
-----------------------  -----------------------  ---------------  -----------  ---------  ------  ----------  -------
lithops-worker-f12ca4e5  2024-11-22 20:52:09 UTC  t3.medium                  2  python3    reuse   installing  Unknown
lithops-worker-d10f6430  2024-11-22 20:52:09 UTC  t3.medium                  2  python3    reuse   installing  Unknown

lithops job list has

Job ID         Function Name      Submitted                Worker Type    Runtime    Tasks Done    Job Status
-------------  -----------------  -----------------------  -------------  ---------  ------------  ------------
661f55-0-M000  my_map_function()  2024-11-22 20:51:58 UTC  t3.medium      python3    0/3           submitted

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 22, 2024

I did not change any defaults from anything and default python3 - does installing status mean it is expecting a container called python3?

I definitely do not have a runtime set in the config.

In invokers

        if self.backend not in STANDALONE_BACKENDS:
            logger.debug(
                f'ExecutorID {job.executor_id} | JobID {job.job_id} - Worker processes: '
                f'{job.worker_processes} - Chunksize: {job.chunksize}'
            )

        try:
            print("TRYING RUNTIME:",self.runtime_name)
            job.runtime_name = self.runtime_name
            self._invoke_job(job)
        except (KeyboardInterrupt, Exception) as e:
            self.stop()
            raise e

gives

TRYING RUNTIME: python3

so should that actually be anything? For this default 'just do some calcs in the OS python3?

e.g. I left this going for some time

Option 1: By default, Lithops uses an Ubuntu 22.04 image. In this case, no further action is required and you can continue to the next step. Lithops will install all required dependencies in the VM by itself. Notice this can consume about 3 min to complete all installations.

Probably 15 minutes plus - I could leave it going for an extended period, but doesn't seem like it was going to work.

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 22, 2024

Next test for timeout problems then I guess it to make a simple image or something along those lines.

@RichardScottOZ
Copy link
Contributor Author

try redoing the default and see what happens?

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 22, 2024

When I try that

2024-11-23 08:39:18,586 [DEBUG] aws_ec2.py:1276 -- VM instance building-image-lithops-ubuntu-jammy-22.04-amd64-server created successfully
2024-11-23 08:39:18,586 [DEBUG] aws_ec2.py:1132 -- Waiting VM instance building-image-lithops-ubuntu-jammy-22.04-amd64-server to become ready
but ssh create connection
timed out error

and

2024-11-23 08:39:27,666 [DEBUG] aws_ec2.py:1123 -- SSH to 0.0.0.0 failed (publickey): timed out

@RichardScottOZ
Copy link
Contributor Author

so how to get this

  def is_ready(self):
        """
        Checks if the VM instance is ready to receive ssh connections
        """
        login_type = 'password' if 'password' in self.ssh_credentials and \
            not self.public else 'publickey'
        try:
            self.get_ssh_client().run_remote_command('id')
        except LithopsValidationError as err:
            raise err
        except Exception as err:
            #logger.debug(f'SSH to {self.public_ip if self.public else self.private_ip} failed ({login_type}): {err}')
            #logger.debug(f'SSH to {self.public_ip if self.public else self.private_ip} failed ({login_type}): {err}')
            logger.debug(f'SSH to {self.public_ip if self.public else self.private_ip} failed ({login_type}): {err}')
            self.del_ssh_client()
            return False
        return True

in aws_ec2 to handle a private case?

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 22, 2024

again though it looks like it has made it - in the console

image

@RichardScottOZ
Copy link
Contributor Author

and if I disable the error handling in the above

timed out
Traceback (most recent call last):
  File "/home/richard/miniconda3/envs/lithops/bin/lithops", line 8, in <module>
    sys.exit(lithops_cli())
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/scripts/cli.py", line 851, in build_image
    compute_handler.build_image(name, file, overwrite, include, ctx.args)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/standalone/standalone.py", line 88, in build_image
    self.backend.build_image(image_name, script_file, overwrite, include, extra_args)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/standalone/backends/aws_ec2/aws_ec2.py", line 659, in build_image
    build_vm.get_ssh_client().upload_data_to_file(script, remote_script)
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/util/ssh_client.py", line 145, in upload_data_to_file
    self.ssh_client = self.create_client()
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/util/ssh_client.py", line 62, in create_client
    raise e
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/lithops/util/ssh_client.py", line 52, in create_client
    self.ssh_client.connect(
  File "/home/richard/miniconda3/envs/lithops/lib/python3.10/site-packages/paramiko/client.py", line 386, in connect
    sock.connect(addr)
TimeoutError: timed out

@RichardScottOZ
Copy link
Contributor Author

which could be unintended consequence of my tinkering to get some other private_ip things to work?

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 22, 2024

so failing to create a ssh_client on the run_remote part above

not sure why failing to connect to these build machines when others are ok as far as ssh connection via private ip

@RichardScottOZ
Copy link
Contributor Author

I can definitely connect manually to a build-* machine anyway via the private ip so not sure why code is failing here.

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 23, 2024

This run it has got to building an image, taking a while to do, but that is a bit further.
Don't know what was difference, possibly badly flaky vpn and internet at the time earlier I guess, or master justbat shutdown

@RichardScottOZ
Copy link
Contributor Author

2024-11-23 12:01:09,849 [DEBUG] aws_ec2.py:696 -- VM Image is being created. Current status: pending
2024-11-23 12:01:31,280 [DEBUG] aws_ec2.py:696 -- VM Image is being created. Current status: available
2024-11-23 12:01:31,280 [DEBUG] aws_ec2.py:1405 -- Deleting VM instance building-image-lithops-ubuntu-jammy-22.04-amd64-server (i-003042a7a1602ea80)
2024-11-23 12:01:31,753 [INFO] aws_ec2.py:707 -- VM Image created. Image ID: ami-06ea522e39999999
2024-11-23 12:01:31,753 [INFO] cli.py:853 -- VM Image built

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 23, 2024

but trying that ami- in target_ami, back to solving this problem:

2024-11-23 12:28:29,116 [DEBUG] monitor.py:429 -- ExecutorID 017dd6-0 - Starting Storage job monitor
2024-11-23 12:28:29,116 [INFO] executors.py:494 -- ExecutorID 017dd6-0 - Getting results from 3 function activations
2024-11-23 12:28:29,116 [INFO] wait.py:101 -- ExecutorID 017dd6-0 - Waiting for 3 function activations to complete
2024-11-23 12:28:32,845 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:29:14,093 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:29:55,072 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:30:36,118 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:31:16,790 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:31:56,778 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:32:38,720 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:33:19,958 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:34:01,740 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:34:43,345 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0
2024-11-23 12:35:25,381 [DEBUG] monitor.py:144 -- ExecutorID 017dd6-0 - Pending: 3 - Running: 0 - Done: 0

e.g. this happily runs infinite loop

  • I actually left another running overnight - never timed out - so something up there?

@RichardScottOZ
Copy link
Contributor Author

I let it run for an hour then ctrl-C'ed it.

2024-11-23 13:30:40,473 [DEBUG] ssh_client.py:65 -- 10.0.0.0  ssh client created
2024-11-23 13:30:40,683 [DEBUG] monitor.py:457 -- ExecutorID 017dd6-0 - Storage job monitor finished
2024-11-23 13:30:41,571 [INFO] executors.py:618 -- ExecutorID 017dd6-0 - Cleaning temporary data

@RichardScottOZ
Copy link
Contributor Author

job payload has this at the start

JOB PAYLOAD: {'config': {'lithops': {'backend': 'aws_ec2', 'log_level': 'DEBUG', 'mode': 'standalone', 'chunksize': 0, 'storage': 'aws_s3', 'monitoring': 'storage', 'monitoring_interval': 2, 'execution_timeout': 1800, 'backend_type': 'batch'},

@RichardScottOZ
Copy link
Contributor Author

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 23, 2024

These exist - and are using BatchInvoker

FUTURES: [<lithops.future.ResponseFuture object at 0x7fc5b3446fb0>, <lithops.future.ResponseFuture object at 0x7fc5b00cc7f0>, <lithops.future.ResponseFuture object at 0x7fc5b00cc850>]

@RichardScottOZ
Copy link
Contributor Author

Will simplify very slightly and just use this https://github.com/lithops-cloud/lithops/blob/master/examples/call_async.py - so one not 3 things.

@RichardScottOZ
Copy link
Contributor Author

The latest test of that, the worker-service-log

cat /tmp/lithops-root/worker-service.log
2024-11-23 23:19:26,575 [ERROR] lithops.standalone.worker:107 -- Timeout connecting to server
2024-11-23 23:19:27,361 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:60 -- Creating AWS EC2 client
2024-11-23 23:19:29,680 [INFO] lithops.standalone.backends.aws_ec2.aws_ec2:103 -- AWS EC2 client created - Region: us-east-1
2024-11-23 23:19:29,681 [DEBUG] lithops.standalone.standalone:69 -- Standalone handler created successfully
2024-11-23 23:19:29,681 [DEBUG] lithops.standalone.keeper:53 -- Starting BudgetKeeper for lithops-worker-260b9387 (10.6.131.122), instance ID: i-075b34f8d528f726e
2024-11-23 23:19:29,681 [DEBUG] lithops.standalone.keeper:55 -- Delete lithops-worker-260b9387 on dismantle: True
2024-11-23 23:19:29,682 [DEBUG] lithops.standalone.keeper:72 -- BudgetKeeper started
2024-11-23 23:19:29,682 [DEBUG] lithops.standalone.keeper:75 -- Auto dismantle activated - Soft timeout: 300s, Hard Timeout: 3600s
2024-11-23 23:19:29,682 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3596 seconds
2024-11-23 23:19:29,683 [INFO] lithops.standalone.worker:263 -- Starting Worker - Instace type: t3.medium - Runtime name: python3 - Worker processes: 2
2024-11-23 23:19:29,683 [INFO] lithops.standalone.worker:154 -- Redis consumer process 0 started
2024-11-23 23:19:29,684 [INFO] lithops.standalone.worker:154 -- Redis consumer process 1 started
2024-11-23 23:20:29,743 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3536 seconds

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 23, 2024

So does that mean that is the problem, workers can't get to master, or is that just first try and then they do?

/tmp/lithops-root/jobs and /logs are empty anyway

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 23, 2024

on the worker

curl -X GET http://lithops-master:8080/ping
{"response":"3.5.1"}

and 8080 and 8081 need to be reachable for master and workers

@RichardScottOZ
Copy link
Contributor Author

when I logged into a worker when starting I did see a python3 task appear via top - so something running but results not being collected?

@RichardScottOZ
Copy link
Contributor Author

If wanted to put additional logging on workers - would have to do that, then make a new master so it would have an updated worker.py ?

@RichardScottOZ
Copy link
Contributor Author

current test from new, master log

buntu@ip-10-6-131-104:~$ cat /tmp/lithops-root/master-service.log
2024-11-24 01:27:54,597 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:60 -- Creating AWS EC2 client
2024-11-24 01:27:55,650 [INFO] lithops.standalone.backends.aws_ec2.aws_ec2:103 -- AWS EC2 client created - Region: us-east-1
2024-11-24 01:27:55,651 [DEBUG] lithops.standalone.standalone:69 -- Standalone handler created successfully
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:53 -- Starting BudgetKeeper for lithops-master-dadab3 (10.6.131.104), instance ID: i-021f144b3714bb841
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:55 -- Delete lithops-master-dadab3 on dismantle: False
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:72 -- BudgetKeeper started
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:75 -- Auto dismantle activated - Soft timeout: 300s, Hard Timeout: 3600s
2024-11-24 01:27:55,652 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3597 seconds
2024-11-24 01:27:55,652 [INFO] lithops.standalone.master:573 -- Starting job monitoring thread
2024-11-24 01:28:08,208 [DEBUG] lithops.standalone.master:545 -- Received job 86bf30-0-A000
2024-11-24 01:28:08,209 [DEBUG] lithops.standalone.master:375 -- Going to setup 1 workers
2024-11-24 01:28:08,209 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:60 -- Creating AWS EC2 client
2024-11-24 01:28:08,224 [DEBUG] lithops.standalone.master:526 -- Job 86bf30-0-A000 correctly submitted to work queue 'wq:t3.medium-2-python3'
2024-11-24 01:28:08,611 [INFO] lithops.standalone.backends.aws_ec2.aws_ec2:103 -- AWS EC2 client created - Region: us-east-1
2024-11-24 01:28:08,616 [DEBUG] lithops.standalone.standalone:69 -- Standalone handler created successfully
2024-11-24 01:28:08,617 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1143 -- Waiting VM instance lithops-worker-21337e83 to become ready
2024-11-24 01:28:09,330 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:14,643 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:20,004 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:25,273 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:30,625 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:35,898 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:41,215 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:46,583 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:51,926 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:28:55,684 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3552 seconds
2024-11-24 01:28:57,202 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:02,486 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:07,782 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:13,156 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:18,572 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:23,855 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:29,257 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:34,674 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:40,083 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:45,422 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:50,803 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:29:55,729 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3492 seconds
2024-11-24 01:29:56,106 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:01,602 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:06,922 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:12,332 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:17,618 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:22,981 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:28,258 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:33,584 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:38,878 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:44,247 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:49,612 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:54,881 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:30:55,789 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3432 seconds
2024-11-24 01:31:00,344 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:31:05,689 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.1 failed (publickey): Authentication failed.
2024-11-24 01:31:10,695 [WARNING] lithops.standalone.master:263 -- Timeout Error. Recreating VM instance lithops-worker-21337e83 (100.1.0.1)
2024-11-24 01:31:10,695 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1407 -- Deleting VM instance lithops-worker-21337e83 (i-057bfea7678c06832)
2024-11-24 01:31:11,159 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1273 -- Creating new VM instance lithops-worker-21337e83
2024-11-24 01:31:12,244 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1288 -- VM instance lithops-worker-21337e83 created successfully
2024-11-24 01:31:12,244 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1143 -- Waiting VM instance lithops-worker-21337e83 to become ready
2024-11-24 01:31:14,436 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:21,576 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:28,741 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:35,927 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:43,078 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:50,271 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): timed out
2024-11-24 01:31:55,445 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): [Errno None] Unable to connect to port 22 on 100.1.0.2
2024-11-24 01:31:55,823 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3372 seconds
2024-11-24 01:32:00,680 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): [Errno None] Unable to connect to port 22 on 100.1.0.2
2024-11-24 01:32:06,350 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:11,750 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:17,202 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:22,584 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:27,972 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:33,419 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:38,912 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:44,293 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:49,733 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:55,320 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:32:55,883 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3312 seconds
2024-11-24 01:33:00,696 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:06,052 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:11,494 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:16,841 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:22,272 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:27,743 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:33,146 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:38,532 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:43,933 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:49,417 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:54,839 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:33:55,895 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3252 seconds
2024-11-24 01:34:00,247 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:34:05,598 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:34:10,962 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 01:34:15,972 [DEBUG] lithops.standalone.master:261 -- Readiness probe expired for VM instance lithops-worker-21337e83 (100.1.0.2)
2024-11-24 01:34:15,973 [ERROR] lithops.standalone.master:411 -- Readiness probe expired on VM instance lithops-worker-21337e83 (100.1.0.2)
2024-11-24 01:34:15,973 [DEBUG] lithops.standalone.master:413 -- 0 of 1 workers started for work queue: wq:t3.medium-2-python3
2024-11-24 01:34:55,955 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3192 seconds
2024-11-24 01:35:56,015 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3132 seconds
2024-11-24 01:36:56,075 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3072 seconds
2024-11-24 01:37:56,135 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3012 seconds
2024-11-24 01:38:56,196 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2952 seconds
2024-11-24 01:39:56,206 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2892 seconds
2024-11-24 01:40:56,263 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2832 seconds
2024-11-24 01:41:56,323 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2772 seconds
2024-11-24 01:42:56,355 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2712 seconds
2024-11-24 01:43:56,415 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2652 seconds
2024-11-24 01:44:56,475 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 2592 seconds

@RichardScottOZ
Copy link
Contributor Author

RichardScottOZ commented Nov 24, 2024

and when I test manually on the master

2024-11-24 02:00:53,336 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:00:58,667 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:04,092 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:07,876 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3247 seconds
2024-11-24 02:01:09,489 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:14,877 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:20,299 [DEBUG] lithops.standalone.backends.aws_ec2.aws_ec2:1134 -- SSH to 100.1.0.2 failed (publickey): Authentication failed.
2024-11-24 02:01:25,310 [DEBUG] lithops.standalone.master:261 -- Readiness probe expired for VM instance lithops-worker-957f510e (100.1.0.2)
2024-11-24 02:01:25,415 [ERROR] lithops.standalone.master:411 -- Readiness probe expired on VM instance lithops-worker-957f510e (100.1.0.2)
2024-11-24 02:01:25,415 [DEBUG] lithops.standalone.master:413 -- 0 of 1 workers started for work queue: wq:t3.medium-2-0.1.0.23
2024-11-24 02:02:07,906 [DEBUG] lithops.standalone.keeper:108 -- Time to dismantle: 3187 seconds
q^C
ubuntu@ip-:~$ tail -f -n 100 /tmp/lithops-*/master-service.log^C
ubuntu@ip-:~$ cd .ssh
ubuntu@ip-:~/.ssh$ ls
authorized_keys  id_rsa  id_rsa.pub  lithops_id_rsa  lithops_id_rsa.pub
ubuntu@ip-:~/.ssh$ ssh -i lithops_id_rsa ubuntu@
Warning: Permanently added '' (ED25519) to the list of known hosts.
ubuntu@: Permission denied (publickey).
ubuntu@ip-:~/.ssh$ ssh -i id_rsa ubuntu@
Warning: Permanently added '' (ED25519) to the list of known hosts.
ubuntu@: Permission denied (publickey).
ubuntu@i:~/.ssh$

@RichardScottOZ
Copy link
Contributor Author

so I can ssh fine to master and workers with the key used in config - so ssh_client problems from my adaptation or an actual key problem

@RichardScottOZ
Copy link
Contributor Author

i uploaded the key used in the config to a test directory or the master - and I could ssh to the worker using those from the master

@RichardScottOZ
Copy link
Contributor Author

As an aside, I thought I would do another test - a completely vanilla install where I let the new lithops (3.51 - was 3.3 when I did my books processing earlier in the year) - and default code with public vpc etc. install the hello world ran fine.

@RichardScottOZ
Copy link
Contributor Author

just as an actual test:

I was running this via windows/wsl

tried a vanilla setup with my private_ip branch and some tests ran ... still something odd with master finding workers that comes and goes it seems - had that working at one stage yesterday

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant