
Update Docker Machine Roadmap for 0.4.0 #1239

Closed

Conversation

nathanleclaire
Contributor

This is meant as a high-level overview of where @ehazlett's and my heads are at in terms of goals for the 0.4.0 release.

Would like to get feedback. Which ideas are great and will help with your use cases? Which ideas seem terrible? Have we accurately represented these goals in our ongoing conversations so far?

cc @aanand @bfirsh @ehazlett @hairyhenderson @sthulb @huslage @tianon @chanezon @vieux @vincentbernat @jeffmendoza @frapposelli @samalba @SvenDowideit @ibuildthecloud @amylindburg @aluzzardi for feedback - sorry about the CC bomb but I think all of you have valuable insight about "different parts of the elephant" so to speak and deciding the direction(s) to go with this correctly is very important for next steps in Docker orchestration.

Signed-off-by: Nathan LeClaire nathan.leclaire@gmail.com

@ehazlett
Contributor

Very nice writeup! I think that captures the discussions we've had. I'm worried a bit about the scope, but I think focusing on libmachine would lay good groundwork for the others. Overall, I think this would be awesome :)

> addresses are something which must be tackled in some fashion eventually if
> Machine is to become a reliable, and error-resilient, tool.
>
> ## Support for creating multiple instances at once

maybe move this section up above "Uniform Resource Model" so it's not mistaken as being tentative?

@hairyhenderson
Contributor

👍

Lots of exciting stuff here :)

@tianon
Contributor

tianon commented May 27, 2015

👍

@huslage
Contributor

huslage commented May 27, 2015

+1

@samalba
Contributor

samalba commented May 27, 2015

👍

@chanezon
Contributor

One common use case for provisioning the docker engine on a cloud provider (I have experienced this on Azure) is to create a storage-backed drive and mount /var/lib/docker on it.
I don't see how to accomplish that with the current proposals (except maybe by writing a plugin).
Supporting an optional cloud-init parameter would accomplish that.

@nathanleclaire
Contributor Author

@chanezon What are the steps to do so? Would `docker-machine create -d provider --engine-opt graph=/storage/backed/drive ...` help? The issue I can see is needing to mount that volume after the instance has been created; I am curious to hear more about the use case.
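Roughly, the two-step version would look like this (just a sketch; the driver name, device path, and mount point are placeholders):

```sh
# Point the engine's graph (storage) directory at the future mount point.
docker-machine create -d some-driver --engine-opt graph=/mnt/docker-data my-host

# The attached volume still has to be formatted and mounted after creation,
# which is the part Machine does not handle today.
docker-machine ssh my-host "sudo mkfs.ext4 /dev/sdc && sudo mkdir -p /mnt/docker-data && sudo mount /dev/sdc /mnt/docker-data"
```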

@nathanleclaire
Contributor Author

Changed the order a bit like @hairyhenderson mentioned to put both of the "tentatives" at the bottom.

@chanezon
Contributor

@nathanleclaire no, that would not be enough.

While provisioning, you need to create the drive, then format and mount it.

I documented that process in detail in this doc: https://github.com/chanezon/azure-linux/tree/master/coreos/cluster#data-disk

Look at the --data-disk option doc: the Python provisioning script creates the disk (https://github.com/chanezon/azure-linux/blob/master/coreos/cluster/azure-coreos-cluster#L225), then the cloud-init units format and mount it.
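Conceptually, those units boil down to something like the following (a sketch only; the device name is a placeholder, and the linked doc uses CoreOS systemd units rather than a shell script):

```sh
# First-boot steps that the cloud-init units roughly amount to on the attached data disk.
sudo mkfs.ext4 /dev/sdc
sudo mkdir -p /var/lib/docker
sudo mount /dev/sdc /var/lib/docker
# Persist the mount so the engine's data survives reboots.
echo '/dev/sdc /var/lib/docker ext4 defaults 0 2' | sudo tee -a /etc/fstab
```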

@ehazlett
Contributor

@chanezon I really like the idea of cloud-init. Perhaps we should re-investigate attempting to use it. For those that aren't aware, the reason we delayed on cloud-init is that not every provider fully supports it. After running into issues on the 4th (out of 14) provider, we chose to start with our own model instead. My fear is that we would rely very heavily on the provider for cloud-init: if they upgrade their version, etc., we would have to wait on them for a fix (the issue on the 3rd provider was that they didn't allow custom repositories to be added - it can vary widely from provider to provider). Another option would be to run a version of cloud-init ourselves; however, this usually involves a mounted loopback filesystem, and not all providers support that either.

As an aside, a pre-create hook in the rivet system would allow this as well.

@nathanleclaire
Contributor Author

cc @efrecon I want to make sure you're looped in on this... I know a lot of these things are similar to what machinery does today, and it'd be great to get your insight and contributions in terms of the UX (however you can - if you don't know Go, we're happy to help you learn as well). Hopefully we can free you up to do more high-level things with machinery by taking care of some of the boilerplate stuff, and/or by making machine easier for you to interact with.

@bfirsh
Contributor

bfirsh commented May 29, 2015

Looks broadly good, thanks guys! A few comments:

Quality + UX

This is quite ambitious for 0.4! I would much rather see an explicit focus on improving quality and user experience for what Machine is currently good at than trying to add loads of features. When we make Machine the recommended way of running Docker locally, nearly all of our users are going to be using it to manage a local VirtualBox VM and it needs to be really good at that.

I think there are lots of improvements that can be made here -- I'm still running into really basic UX problems such as #962 and #1266. Adding lots of new features is only going to increase the burden of keeping quality high.

Compose has done this pretty well, I think. It is still mostly focused on making the core experience (running development environments) really really good, and has been conservatively adding new features.

Compose files

I'm concerned about automatically running a Compose file on a Machine. We have some plans about how to manage Compose applications over time, and this feels like a hack that would start to shift this responsibility onto Machine. @aanand might want to chip in here.

Extensions

The extension support should use the same method as extensions in the Docker Engine (moby/moby#13161 moby/moby#13222). I don't think there should be fragmentation here.

@ehazlett
Contributor

Thanks @bfirsh. I think the scope was a bit much too (see my first comment) but I like the general direction.

> When we make Machine the recommended way of running Docker locally, nearly all of our users are going to be using it to manage a local VirtualBox VM and it needs to be really good at that.

I am not opposed to a quality release (we can always use that :)). I would just like clarification on what your ideas are for the direction of Machine. Machine has always supported multiple providers, both local and remote. If the focus is shifting to local providers, then we should probably move to an extension model ASAP, halt new remote drivers, push them to the community, and focus on getting the UX rock solid for local (I'm assuming you mean all local providers: VirtualBox, VMware Fusion, Hyper-V).

Compose files

I am fine leaving compose duties to compose. That makes sense. I think what @nathanleclaire was referring to was the ability to perform actions at certain times during provisioning.

Extensions

Rivet was an experiment in pushing provisioning responsibilities outside of Machine (in hindsight I should have kept it private). I would much rather use an official extension framework.

@aanand

aanand commented May 29, 2015

> I am fine leaving compose duties to compose. That makes sense. I think what @nathanleclaire was referring to was the ability to perform actions at certain times during provisioning.

Makes sense to me too. (If Compose is packaged up in a Docker image - which is something we should look into doing officially - then it's just a very simple special case of this feature.)
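As a rough illustration of that special case (the image name here is made up, not an official one):

```sh
# If Compose shipped as a Docker image, starting an application would just be
# running that container and letting it drive the engine via the mounted socket.
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "$PWD:/project" \
  -w /project \
  example/compose up -d
```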

@efrecon

efrecon commented May 29, 2015

(warning... long post)

Very interesting reading indeed, and as @nathanleclaire wrote, a number of those things may overlap with machinery. I have been looking for a good excuse to start with Go, so yes, there are probably ways to collaborate on this in one form or another!

To start, let me try to describe how machinery came to exist, as I think it provides (opinionated) insight into where machine itself could (or could not) head. This is from the point of view of both a user and a developer, but obviously from the outside, as I had not looked at any part of the code prior to starting work on machinery and I am not part of the docker team.

I took a long and rather unsuccessful foray into CoreOS before deciding to move my clusters to the full docker ecosystem. As I saw it, the ecosystem looked like this (you all know this, obviously, but sometimes it's perhaps good to look at it through "new" eyes):

  • Docker is extremely good at taking an image (downloading it) and creating a container out of it, with all the arguments and resources needed. It does a whole lot more, but in essence, this is what it does from a UX perspective.
  • Compose is good at bundling components together so they perform what your application has to do (or parts of your application).
  • Swarm is all about orchestration and putting a cluster of virtual machines together. Combined with compose, this results in scalable ways to start/stop the components that will form the bits and pieces of your application from a central point.
  • Machine (0.2) is all about creating one virtual machine that will be part of the cluster that swarm is orchestrating.

So to refer back to @bfirsh above, all these tools do one single thing and aim to do it right. I think that this is key to their current success.

Machinery came along from reflecting on vagrant: the Vagrantfile is a description of a whole cluster, though expressed as a program. As machine was about creating one machine, it felt pretty natural to add a tool that would use machine to create a bunch of machines - pretty much the same thing compose does for docker: it takes a single file and from that file creates a number of interrelated components. I largely preferred the YAML route taken by compose, so I went the same way for machinery. YAML is clear, concise and well-defined: nothing other than what the file format expresses can be done, which is deterministic and can be easier to approach and grasp than the Ruby approach (I have nothing against Ruby, don't get me wrong!).

So machinery started with this: take one YAML file, and arrange for the file to describe a number of VMs that machine will create as part of the same cluster (the token command was one of the first commands that I implemented). As I wanted to be able to start specific components on specific machines, I arranged for the YAML format to specify labels that can be attached to the VMs that machinery creates through machine. The YAML format would provide "details that won't change": the size(s) of the machines, where they will be hosted, etc.

I could have stopped there, since the next logical step from a UX perspective was to manually run compose to start a bunch of components wherever needed with the help of the swarm master. But I realised that the YAML file intrinsically carried a key feature: it had the potential to contain (or point at) a complete and deterministic description of a whole cluster and its components. So I went that route and added features like:

  1. Specifying what components to run on a machine of the cluster as soon as it is created. For this, machinery uses compose to arrange for the components to be started up. This is useful during development, but also in production, as @nathanleclaire summarised in his analysis.
  2. To make sure swarm is able to start components quickly, I also added a list of images that can be pre-loaded on the machines. This is handy, since it is one of the uses of the labels, i.e. a way to describe what the created machine will be good at and used for. The current implementation actually copies the images (docker save and docker load; see the sketch after this list), which improves security: you can arrange for "secret" images that you have locally to easily be spread out to some of the machines in the cluster without the VMs needing to know about internal registries or credentials for external registries.
  3. To simplify usage and lifecycle management, and because I felt that having to "jump in" the swarm master to dynamically start up components was easy to forget, I extended the swarm command to handle these cases: killing components and restarting them. It operates on compose project files, as that is most of the time what you want.
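As a rough sketch of that image pre-loading (the actual machinery implementation differs, and the image and machine names are placeholders):

```sh
# Copy a locally built image to a cluster VM without going through any registry;
# roughly what the docker save / docker load copying amounts to.
docker save secret/internal-image | docker-machine ssh node-0 "docker load"
```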

Note that machinery does a little bit more, for example around shares/volumes (vboxsf, rsync) or port openings (VirtualBox-specific), but this is out of the scope of this discussion. The only additional feature worth mentioning is the ability to substitute environment variables in compose project files. Substitution is handy in order to propagate information from the central point that machinery represents to the "edges", i.e. the components themselves. As this steps outside the official compose file format, the implementation is a bit hacky and relies on temporary files. But I think the feature can be of interest. @aanand?

So, back to where machine should or should not be heading. I think that you might be at a turning point:

  • Either you decide that machine should become what boot2docker is, but for "the cloud": focused on creating one VM, using plenty of local/remote drivers and supporting several OSes on those machines. This rather vague description contains a lot of possible combinations, corner cases and possible bugs to squash before stability is reached. My feeling is that this is more in line with the rest of the docker toolset: one tool for a well-defined purpose.
  • Or you go for a more complex route and make sure that machine is able to create more than one VM. You would still have to squash all the corner cases and stability issues mentioned above, but you would also need to decide where machine would stop in its feature set. This is pretty much why I have been trying to summarise the history of machinery: my design has been incremental, obviously with a focus on my particular needs (mostly). But as the evolution of this design has felt natural given the set of existing tools in the docker ecosystem, it is very likely that you would take similar routes and perhaps end up with a tool that slightly loses focus. I was alone, you are a team. Teams are great at reaching consensus, so I trust you would stabilise before that!

I fully understand that I am biased. If you go for the "one machine" route, then there is still a need for a tool on top of machine, and, well... that tool would look very much like machinery (or provide the same kind of abstractions and facilities).

This post has tried to take some distance from the current state of affairs. I have little insight into the way you work as a team and the "grand docker goals", so I might have completely misunderstood. I would be happy to dig much more into the feature set and discuss it in more detail in further posts. Was I of any help?

@ehazlett
Contributor

@efrecon Thanks for the awesome write up! It absolutely helped.

You bring up tremendous points. After reflecting a bit on this, I agree with your comment about creating a single VM reliably and stably. We have a lot of work to do around the creation process, provisioning, progress reporting, error reporting / recovery, etc., as @bfirsh mentioned. I agree with @bfirsh that moving to more features only spreads us out more and exposes a much greater surface for maintenance and issues. I think we could still support the community using Machine as a tool (i.e. machinery, rancher, etc.) while still focusing on making that process insanely stable.

It is easy to get caught up in wanting to add features (we've had some very interesting brainstorming discussions about the future) - especially when we have several avenues of input, each wanting different things. It is awesome to have the community help drive what we want Machine to become. I am truly grateful for feedback like this; it helps me personally see other sides.

With this being said, it doesn't mean Machine won't try to achieve some or all of the goals listed above. I think we just need to put more into the "foundation" before expanding.

@efrecon your feedback was invaluable. Thank you very much!

@efrecon

efrecon commented Jun 3, 2015

This is slightly off-topic, but I thought it would be of interest to you since we are talking roadmap. I have just pushed an initial implementation of a Web API to operate on machinery from a distance. This has been on my roadmap for a while and I thought that now would be a good time. This makes machinery a flexible "machine to create machines", and is in line with docker itself. The reason I am posting this is that it could be one of the things you would consider in the future for machine itself.

@nathanleclaire
Contributor Author

This is different from what we're actually going to focus on for this release, but we will revisit most of these ideas at some point.
