Persistent Workers #263

Closed
fkorotkov opened this issue Apr 23, 2019 · 25 comments
Labels
feature high-priority Upcoming features that are prioritized

Comments

@fkorotkov
Contributor

Once #108 is ready and Cirrus CI builds can be executed locally, we should consider extending the CLI so that it can act as a traditional CI agent.

This will allow customers to connect existing on-prem infrastructure to Cirrus CI. Currently several use cases are foreseen:

  • Connect existing pre-configured macOS machines.
  • Connect existing ARM machines or machines with exotic operating systems that are not supported by clouds.

Each worker will have a set of labels, and each task will be able to specify the labels of the workers it should be scheduled on:

task:
  worker:
    os: darwin
    attached_device: true
    device_model: iPhoneXS
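
A minimal sketch of how such label-based scheduling could work, assuming a task is eligible for a worker when every label it requests is present on the worker with the same value. The `Labels` type and `Matches` function are hypothetical names used only for illustration; they are not taken from the Cirrus CLI.

```go
package main

import "fmt"

// Labels is a hypothetical representation of worker or task labels.
type Labels map[string]string

// Matches reports whether a worker's labels satisfy every label a task requests.
// Labels the worker has but the task does not mention are ignored.
func Matches(worker, requested Labels) bool {
	for key, want := range requested {
		if got, ok := worker[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	worker := Labels{"os": "darwin", "attached_device": "true", "device_model": "iPhoneXS"}

	// A task asking for a subset of the worker's labels can be scheduled on it.
	fmt.Println(Matches(worker, Labels{"os": "darwin", "device_model": "iPhoneXS"}))

	// A task asking for a different OS cannot.
	fmt.Println(Matches(worker, Labels{"os": "linux"}))
}
```

Under this assumption, a task that requests no labels at all would match any worker.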
@james-crowley

@fkorotkov I would love to see this functionality fleshed out. If you need help on the s390x or ppc64le side of this, please let me know!

@fkorotkov
Contributor Author

@james-crowley will do! This part will be fully open-sourced, so I'll reach out to you in case of any issues. 🙌

@james-crowley

@fkorotkov Do you have an ETA on when persistent workers might appear? When they do, I am more than willing to test them out on my resources over at IBM. Or I could set you up with resources on LinuxONE that you would have control over. Either way, let me know where I can help!

@fkorotkov
Contributor Author

#108 is a prerequisite for persistent workers since workers will use it to run tasks. The CLI is planned for this quarter, and workers will follow in Q1.

@fkorotkov
Contributor Author

@james-crowley are you still interested in s390x support? Persistent workers will be available in the next few weeks. The front-end part is ready in https://github.com/cirruslabs/cirrus-cli, and I'm working on the backend part right now.

@james-crowley

@fkorotkov Thanks for tagging me again. Super excited to see that persistent workers are closer to being released.

As for the persistent workers, what language will the agents/workers be written in? Golang? If so, the porting work should be next to nothing since Golang has a cross compiler for s390x, ppc64le, arm64, etc.

Let me know how I can help test the feature as it nears release. Happy to spin up a worker on s390x and ppc64le!

@fkorotkov
Contributor Author

@james-crowley wonderful! Yes, the persistent worker parts are written in Go. There are two parts:

  1. Cirrus Agent that executes instructions like running script, downloading/uploading caches, etc.
  2. Cirrus CLI which will act as Persistent Worker: poll for new jobs and call the agent to execute them.

Both are written in pure Go, so they should be very easy to cross-compile. I'm planning to test it on Apple Silicon this week, and once it's ready for further testing I'll ping you with instructions. Thank you for your willingness to help test it out!

@james-crowley

@fkorotkov Thanks for the news. Happy to see that everything is written in Golang!

Looking forward to hearing from you later this week or next!

@fkorotkov
Contributor Author

I'm happy to announce that Persistent Workers are available for beta testing while we are still polishing the functionality.

Now you can create pools of persistent workers for your personal account (https://cirrus-ci.com/settings/profile/) or for an organization (https://cirrus-ci.com/settings/github/). Each pool can be either private (available only for private repositories) or public (available for both private and public repositories).

Here is a guide on how to install Cirrus CLI and run it in Persistent Worker mode: https://github.com/cirruslabs/cirrus-cli/blob/master/PERSISTENT-WORKERS.md

Once everything is configured and you create your first pool, you'll be able to see pools on dedicated URLs like: https://cirrus-ci.com/pool/u5sa29b35b882e3a010a9afe1c72751c0917

Screen Shot 2020-12-16 at 11 57 06 AM

Things that are left to do in the UI:

  1. Show currently running tasks on the workers
  2. Dedicated page for a worker with the whole history of tasks run on it
  3. Options to Delete/Cordon a worker

@james-crowley

james-crowley commented Jan 4, 2021

@fkorotkov I will be able to test s390x and ppc64le functionality towards the end of January. I am excited to see this working! Amazing work.

@fkorotkov
Contributor Author

@james-crowley wonderful! Thank you in advance! We'll also add a few more UI features, such as the currently running task for a worker and a separate worker page with a history of all tasks run on the worker.

@james-crowley

@fkorotkov Didn't get a chance to test a workload against it yet but I had some spare time tonight to play around:

image

Working on getting the s390x worker up and running too, but running into issues with my cluster.

@fkorotkov Is there a simple test I can schedule against the worker? Or any test cases you want me to run to verify it's working?

@james-crowley

james-crowley commented Jan 5, 2021

@fkorotkov A couple of issues I ran into with the workers. First, I tried to manually set os and arch, since I was not sure if these would be auto-populated. Trying to set them does not seem to work, or at least they do not appear in the UI.

Secondly, when playing around with "custom" labels, I made a mistake in my label and re-launched the worker with the correct label. On the UI it seems to save both labels and just smash them together.

image

Seems like Cirrus is storing the labels somewhere, and each time I re-launch the worker it's either overriding the value or adding new values.

As for features I saw you mentioned:

  • Show currently running tasks on the workers
  • Dedicated page for a worker with the whole history of tasks ran on it
  • Options to Delete/Cordon a worker

Those all sound like great improvements. Maybe add a spot to edit/add/delete labels for a given worker? Although I don't know how you would handle re-launching the worker with labels. Which labels do you keep: the ones saved/edited in the UI, or the ones listed in the file/command you just ran?

Is there an API endpoint I can hit to look at more information about the workers?

@james-crowley

Looks like I am running into some issues using persistent workers. It honestly might have to do with my .cirrus.yml file. The issue I see in the logs on the runner is:

ERRO[2809] failed to create an instance for the task 4793466095927296: invalid isolation parameters: unsupported isolation type <nil>

My .cirrus.yml file looks like:

task:
  persistent_worker:
    labels:
      os: linux
      arch: ppc64le
  script: echo "running on-premise"

@RDIL
Contributor

RDIL commented Jan 5, 2021

The endpoint you are looking for may be found in the GQL schema, check the cirrus-ci-web repository.

@fkorotkov
Contributor Author

@james-crowley thank you for testing workers out!

I tried to manually set os and arch, since I was not sure if these would be auto populated. Trying to set them does not seem to work or at least they do not appear on the UI.

These labels are automatically populated and that's how the UI shows the Status column with arch and OS icon. Documenting the default labels in cirruslabs/cirrus-cli#219
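
For illustration only, here is one plausible way a Go binary could derive such default labels from its own build target. This is a sketch under assumptions, not the CLI's actual code: `DefaultLabels` and `MergeLabels` are hypothetical names, and the choice to let defaults win over user-supplied `os` and `arch` values is an assumption made to mirror the behavior described above.

```go
package main

import (
	"fmt"
	"runtime"
)

// DefaultLabels returns labels a worker could derive from its own binary.
// runtime.GOOS and runtime.GOARCH report the platform the binary was built for.
func DefaultLabels() map[string]string {
	return map[string]string{
		"os":   runtime.GOOS,   // e.g. "linux" or "darwin"
		"arch": runtime.GOARCH, // e.g. "ppc64le", "s390x" or "arm64"
	}
}

// MergeLabels copies the user-supplied labels and then applies the defaults
// on top, so user values for "os" and "arch" are ignored (an assumption).
func MergeLabels(user map[string]string) map[string]string {
	merged := make(map[string]string, len(user)+2)
	for key, value := range user {
		merged[key] = value
	}
	for key, value := range DefaultLabels() {
		merged[key] = value
	}
	return merged
}

func main() {
	// The printed os and arch depend on the machine this runs on.
	fmt.Println(MergeLabels(map[string]string{"os": "custom", "rack": "a1"}))
}
```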

when playing around with "custom" labels, I made a mistake in my label and re-launched the worker with the correct label. On the UI it seems to save both labels and just smash them together.

That should be fixed on the server side, and after a worker restart you should see only the latest labels. I'm not sure about edit/add/delete functionality for labels in the UI, since it's not clear how a restart should work in that case: should it just append new labels or update the existing ones? Right now labels are propagated upon worker registration.
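
A small sketch of the "latest labels win" behavior, assuming the server keeps one label set per worker and replaces it wholesale on every registration. The `Registry` type, its methods, and the worker name `power-1` are all hypothetical and are not the actual server implementation.

```go
package main

import "fmt"

// Registry is a hypothetical server-side store of labels keyed by worker name.
type Registry struct {
	labels map[string]map[string]string
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{labels: map[string]map[string]string{}}
}

// Register replaces whatever labels were stored for the worker instead of
// merging them, so a stale label from an earlier launch does not survive.
func (r *Registry) Register(name string, labels map[string]string) {
	fresh := make(map[string]string, len(labels))
	for key, value := range labels {
		fresh[key] = value
	}
	r.labels[name] = fresh
}

// Labels returns the labels currently stored for the worker.
func (r *Registry) Labels(name string) map[string]string {
	return r.labels[name]
}

func main() {
	registry := NewRegistry()

	// First launch with a typo in the label key.
	registry.Register("power-1", map[string]string{"clustr": "ibm"})

	// Re-launch with the corrected label; only this one should remain.
	registry.Register("power-1", map[string]string{"cluster": "ibm"})

	fmt.Println(registry.Labels("power-1")) // prints map[cluster:ibm]
}
```

With replace-on-register semantics, labels edited in a UI would be lost on the next restart, which is the open question raised above.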

invalid isolation parameters: unsupported isolation type

It's a regression in CLI version 0.24.0. It has been fixed on the server side, so everything should be good now.

@james-crowley

@fkorotkov Glad the issue was resolved. Just tested running the job on ppc64le:

image

It works! :)

@fkorotkov
Contributor Author

Nice! Thank you so much for testing it out!

@james-crowley

@fkorotkov Is there any other testing you want me to try out on ppc64le and s390x? Additionally, if you are interested in adding s390x and ppc64le resources to your offering, let me know! It would be cool to have another CI/CD platform with s390x and ppc64le.

@fkorotkov
Contributor Author

@james-crowley I won't bother you much more. I've already extensively tested the workers on ARM Linux and macOS machines, and everything looks good. It seems the power of Go really shines here.

We are interested in adding s390x and ppc64le to the cloud offering for OSS projects, but as far as I can see, IBM Cloud still bills Virtual Servers hourly rather than per second or at least per minute. Cirrus uses a somewhat novel approach of spinning up a VM for each task and tearing it down afterwards instead of having traditional agent pools. You can check my post about that idea. Unfortunately, that design doesn't play well with hourly billing when tasks run for just a few minutes.
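
To make the billing mismatch concrete, here is a small worked example. It assumes usage is rounded up to whole billing increments, and it uses a made-up price of $1.00 per VM-hour and a 3-minute task; none of these numbers come from IBM Cloud or Cirrus CI.

```go
package main

import "fmt"

// billedCost returns the charge for one VM that runs for taskSeconds when
// usage is rounded up to whole increments of incrementSeconds.
func billedCost(taskSeconds, incrementSeconds int, hourlyRate float64) float64 {
	units := (taskSeconds + incrementSeconds - 1) / incrementSeconds // round up
	return float64(units*incrementSeconds) / 3600 * hourlyRate
}

func main() {
	const hourlyRate = 1.00  // hypothetical price per VM-hour
	const taskSeconds = 180  // a 3-minute task on its own short-lived VM

	fmt.Printf("hourly billing:     $%.2f\n", billedCost(taskSeconds, 3600, hourlyRate))
	fmt.Printf("per-second billing: $%.2f\n", billedCost(taskSeconds, 1, hourlyRate))
}
```

Under these assumptions the same task costs $1.00 with hourly billing and $0.05 with per-second billing, a 20x difference that is repeated for every task that gets its own VM.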

This is one of the reasons for having persistent workers: to support environments like this, or cases where it's not necessary to have a clean ephemeral environment for each task.

@james-crowley

@fkorotkov Interesting blog post!

Let me do some digging to see what I can come up with. There might be something out there that will meet your needs, or something in the pipeline.

Let me know if you need any more testing on s390x and ppc64le. Do you have a rough ETA for when persistent workers will move out of beta and be merged?

@fkorotkov
Contributor Author

There is a hard deadline of January 27th. We are moving Cirrus macOS tasks to essentially persistent workers with isolation through Parallels VMs. So by January 27th it will be well tested and load tested. Expecting a blog post about GA of workers and the migration/dogfooding around the same time.

@fkorotkov
Contributor Author

Persistent Workers are now generally available. We even migrated macOS task execution internally to persistent workers which helped to fix a few minor issues when there are many workers in a pool. Here is a blog post with a bit more details https://medium.com/cirruslabs/new-macos-task-execution-architecture-for-cirrus-ci-604250627c94

@james-crowley

james-crowley commented Feb 1, 2021

@fkorotkov Sorry about not getting back to you in a timely fashion. Unfortunately the metering currently does not support seconds on the Power and Z offerings in IBM Cloud.

But there might be another way to get Cirrus Labs access to s390x and ppc64le resources. Do you have an email address I could reach out to? If you don't want to post your email here, you can email me at the address on my GitHub profile.

Congrats on releasing persistent workers! Cirrus Labs is killing it!

@fkorotkov
Contributor Author

Thank you! Feel free to email me at fedor@cirruslabs.org. Looking forward to potentially bringing s390x and ppc64le to OSS 🙌
