Allow instance number to be passed in as an environment variable #1242
Comments
Hi @SEJeff, thanks for your idea. That seems to be a pretty specific use case and I am not sure that I understand it well. Please elaborate if you think I am missing something crucial. We do export the Mesos Task ID as the environment variable MESOS_TASK_ID. As of now, I would like to close the issue.
I need something like this for creating volumes in networked storage that are immediately available when a crashed instance comes back up on another server. Right now, if an instance comes up on a different server, the data isn't available. If I could identify individual instances of a single application, this would become insanely easy. At the moment, I have two options: no shared filesystem, or using the same shared path for all instances, which works for some applications but not most of them.
Hi @lusid, can you elaborate, please? What do you mean by identifying the instances?
https://mesosphere.github.io/marathon/docs/task-environment-vars.html
Let's say I have 10 physical servers. I run an app that creates a Docker container with 5 instances that are constrained uniquely by hostname. I want each instance to attach to a volume on the physical server that includes the number of the instance in the scaling group that it represents. Instance 1 = ID 1. Now, let's say Instance 3 dies and is recreated on a completely different server. I want that newly created instance to be able to take over the storage volume it created originally before it died. If I used the MESOS_TASK_ID, I would get a completely unique ID that is in no way related to the previous task that died. Because I have a networked file system between all servers in my cluster, this would basically solve the problem of not being able to locate data when a crashed instance returns on a completely different server, especially when the data stored by each instance must be stored in a different location to avoid data corruption.
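To illustrate (a minimal sketch, assuming the requested variable exists and a made-up shared-storage layout), a task could then always reattach to "its" directory no matter which host it lands on:

```python
import os

# MARATHON_APP_ID is set by Marathon today; MARATHON_APP_INSTANCE_NUMBER is the
# variable being requested in this issue and does not exist yet.
app_id = os.environ["MARATHON_APP_ID"].strip("/").replace("/", "_")
instance = os.environ["MARATHON_APP_INSTANCE_NUMBER"]

# The path layout on the shared filesystem is made up for illustration.
data_dir = f"/mnt/shared/{app_id}/instance-{instance}"
os.makedirs(data_dir, exist_ok=True)
```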
Let me summarize: You want all tasks to have some kind of sequentially assigned ID. If a task fails, you want its replacement task to get the same sequentially assigned ID. So that if you specify "instances": 10, you want to make sure that you always have tasks with the IDs 1-10 running somewhere. You assign network volumes using these IDs. Thus you always have one task per network volume. @mhausenblas / @air What's our current best practice for dealing with persistence in cases like this?
Correct. It would be nice if it worked automatically with scaling to different sizes as well, but I can see where difficulties would start appearing in those instances. I've been thinking about this a lot, and having an ID like this is the only thing I've been able to come up with for this use case. I have no idea how anyone runs long running processes on Marathon in its current state when they require persistence and when they aren't scaling to the full capacity of the cluster. If I could find a reliable alternative that worked in most cases, I would be happy. I would prefer to not have to constrain an app to X machines by hostname, and never be able to scale them up further than that. I'm sure I'm missing something, but it is driving me crazy. As soon as I need to store persistent data, all the awesomeness of Marathon starts to turn into crazy tedious Bash hacking tricks, or constraining myself to one machine which defeats the purpose altogether.
This has come up again a number of times. Maybe this idea has more applications than I originally thought.
Another use case where having a strong 'I am instance N of M' identity is useful: Cassandra nodes. e.g. instances 1 and 2 know that they are the leaders (their instance numbers are lowest) and configure themselves as seeds. I'm not convinced Marathon is the right level to provide this level of guaranteed identity. It seems like something a minority of apps would benefit from - the implementation weight would be wasted on other apps that scale horizontally with true independence. Marathon's current guarantee is, 'I'll run N of these for you and uniquely identify them' - but they are cattle, and the sense of 'being instance 5' is not carried over if #5 dies and is replaced. That feels more like a pet. Technically, do we see difficulties? I wonder if - in the event of network partitions or restarts - we might run into issues where e.g. there are two 'instance 5s'.
I am +infinity for this feature. I can think of several places where having an instance number would make things much simpler. Most prominent for me is monitoring. Let's say that I have 7 Foos. Typically I'd then want to see a graph with 7 Foo metrics (lines) that I can compare and contrast. The fact that they are ephemeral doesn't really matter. Conceptually I have 7 Foos — that may move about. I don't want to see disjointed (and likely different colored) lines and multiple instances on the legend of my graph. I want to see 7 lines. And if I spot an anomaly I want to be able to overlay "event bars" that show me when an instance moved. Something like: "Whoa, what happened to 7. Oh, it flipped onto that spotty server…"

And more important: a named instance (what we are asking for with "lasting instance numbers") helps to keep the number of metric datasources from becoming ridiculously large. Rather than having a zillion instances in the history of a given metric, I can have 7. I.e. 7 datasources versus a brand new one every time that Docker instance is redeployed. In fact, we have had to create exactly this capability (instance numbers) on top of Mesos/Marathon, which is a real PITA.

Honestly, I believe that most people think in terms of "instances of services". Yeah sure, maybe you don't name your cattle like that. But really, I think we all kinda do. (I've never raised cattle. But I have raised chickens, and while they certainly weren't pets, I could tell them apart. And it was the same chicken whether it was in the yard or in the coop :~) Most of us don't run 1000s of Foos. We run 10s or 100s. And clustered solutions (e.g. Elasticsearch, Cassandra, a bank of proxy servers, …) often want us to conceptually identify nodes, so we can do things like traffic shaping (e.g. hot spots are routed to specific nodes, etc.). I don't particularly care where Node 7 lives, but it is servicing only XYZ or is operating on this set of shards.

I like to think of these things as workers — not cattle vs. pets. (Personally, I think the whole cattle/pets analogy misses the mark somewhat.) My workers should be relatively interchangeable — think check-out people at the big-box store. But they do have names. And if Bill is working aisle 1 today and aisle 2 tomorrow, I don't care. But I do care about Bill's productivity, or whether he died last night. And it would be problematic if every time Bill worked a different aisle, he had a different name…

Thanks,
Instances may live like cattle, but we need to treat them a little bit like pets when they get ill. Even real cattle are numbered. Even if this were only to function as a way to make it easier for humans to keep instances straight in their heads for a few minutes, it would be worth the effort (as the DNS was, and for much the same reason). Human-friendly naming imposes no burden upon automation, and it eases the cognitive load on the humans involved.
Hey @BenWhitehead, have you thoughts on this? We were discussing similar issues recently.
+1 Without this requested feature, is there a way to have an instance know its 'shard id' and load its own data when coming up?
@air, @mwasn This should be reasonably easy to implement. Since there is only one Marathon instance that is currently leader and starting tasks, there should be no problems with network partitions (except of course those unrelated to this feature, e.g. that we don't restart tasks in that case). Implementation Proposal:
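Roughly, the assignment could work like this (an illustrative sketch of one possible rule, not a committed design): whenever Marathon launches a task for an app, look at the instance numbers held by the app's currently running or staged tasks and hand out the lowest number that is free.

```python
def next_instance_number(used_numbers, instance_count):
    """Pick an instance number for a new task.

    used_numbers:   numbers currently held by running/staged tasks of the app
    instance_count: the app's configured "instances" value

    Illustrative sketch only - not Marathon's actual implementation.
    """
    for n in range(instance_count):
        if n not in used_numbers:
            return n
    # All nominal slots are taken (e.g. extra tasks during a rolling upgrade
    # with maximumOverCapacity > 0): fall back to numbers beyond the target count.
    n = instance_count
    while n in used_numbers:
        n += 1
    return n
```

A failed task's number becomes free again, so its replacement naturally picks it up as the lowest available slot.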
@kolloch does it mean that after restarting (or killing) the application, Marathon will remember the assigned instance numbers?
Conceptually it's not difficult for Marathon to pass a value as an environment variable. The complicated part is what that value should be and what is done with it during failure scenarios. For example, what should the instance number be for new instances of an app that are being started for an updated app? Once the apps are healthy the old instances will be torn down; should those numbers be re-used, or is it safe to abandon them? If numbers are supposed to be re-used, what are the semantics around re-use? Here is a more concrete example that exposes some questions:

Managing state of distributed systems is a very challenging thing to do well. Marathon (currently) is first and foremost a system for running stateless applications. If your application has a lot of complex state that needs to be managed/coordinated, it would be a good idea to look into what it would take to write a Mesos framework, where you will have full control over managing the specific considerations of your app. This is why there are frameworks specific to Kafka, Cassandra, HDFS and other stateful apps. If you're asking for anything more than "In the history of my app, what task number am I?" I don't think it's a good idea for Marathon to support it. Marathon already creates a task id that is available as an environment variable (MESOS_TASK_ID).

To the point about pets vs. cattle vs. Bill: from the standpoint of Mesos, the thing running here is a task. Attempting to further map the analogy to Mesos, Bill is a worker (mesos-slave) whose resources, when available (he's at work), are used to perform a task (checking out customers). This task has the same shape of work day-to-day but it is not the exact same every day. It could also be argued that Bill is a pretty stateless task that could easily be taken over by someone else if Bill was no longer able to perform his task for the day (sickness, break, etc.).
If I understand your concern correctly, I think the problems you are raising can be overcome by leaving these things at the application level rather than pulling them into the framework (Marathon) level. Instead of supporting a "which task number am I" query via the framework, just let the job creation API specify small variations between the replicas' args. For example:

```
{
  "id": "/product/service/myApp",
  "instances": 3,
  "cmd": "cp /path/to/remote/data/shard_$INSTANCE_NUMBER /local/data && run_my_service --data /local/data",
  ...
}
```

And let $INSTANCE_NUMBER be replaced with a running number for each instance.
@sielaq: What I specified would actually not reuse the MARATHON_APP_INSTANCE_NUMBER in that case.
This would be achievable by updating the rules I provided above to only consider tasks with the same configuration. It would NOT ensure that a task with a certain MARATHON_APP_INSTANCE_NUMBER is restarted in place on the same node, though.

There are plans in Marathon to use "dynamic reservations" to allow sticky tasks that are restarted in-place on failure or upgrade. It would definitely be nice if the MARATHON_APP_INSTANCE_NUMBER were preserved in this case. But I would consider that a distinct issue.
+1
+1, my use case is limited to monitoring as well. We just want a way of enumerating app tasks that doesn't have duplicates but is otherwise as small as possible. I think it is also worth noting that when scaling down, this feature would mean that the highest-numbered tasks would need to be terminated first. That might make satisfying placement constraints difficult. Also, when deploying, if upgradeStrategy.maximumOverCapacity > 0 you have a problem. (I wouldn't actually care about these aspects of correctness, but I'd assume others would.)
I agree that “instance numbering” is not all that simple when one considers failure scenarios, but I don't think that necessarily means it isn't worth doing. In fact, I think a “best effort” solution is completely adequate. I have enumerated some scenarios below. Apps that don't care about instance naming can simply ignore it altogether. Also, I believe that “host affinity” is a separate concern (although related by the common underlying use case), though I do think it is another valuable addition to the ecosystem.

AFAICT, the different scenarios for instance numbering are as follows:

A) We scale down: bc is destroyed (2 is now free).
B) We scale back up, adding de and ef. We reuse the free slot (2) and add a new one.
C) de & ab die, and are replaced by aa, bb.
D) Version X --> Y. Optionally, if you have blue/green.

About Mesos frameworks (per Ben's comment above): my problem with them is that they are often not layered on top of Docker. They use whatever OS, JVM, etc. is already host-resident. Docker's promise and raison d'être is to bring repeatability all the way down to the OS level. We have all been bitten by an OS that has a different set of patches, or has swap turned on, etc. IMHO, when we step away from that vision, it is a step backwards.

Cheers,
It sounds like people are discussing two different use cases here. I'd also dearly love a way to get metrics consolidated for an app rather than at either the task or slave level. But re-using instance numbers seems a bit wrong - taking the cattle/pet analogy, this is akin to renaming your new cat 'Mr Tiddles' because that was the old cat's name. Doesn't anyone else think it might be confusing to operators to notice Mr Tiddles suddenly grew his leg back and lost 10 pounds?
I am looking for INSTANCE_NUMBER to be able to assign the correct Flocker volume. Maybe this will be handled some other way soon?
Hi, I think this would be a really nice feature and I have another use case: we are logging app metrics (CPU, RSS, event-loop hangs, etc...) into Graphite. Our app is usually a long-running service with a stable instance count between 2 and 8 instances. The metrics mentioned definitely need to be logged per instance (= per task in Marathon terminology). And when one of the tasks fails/restarts/whatever, we want the line in Graphite to continue. We definitely don't want to have hundreds of metrics in Graphite (it's difficult to read them and it takes too much disk space). So this feature would be really helpful - one sequential number that gets recycled (if it's free) on a new task start.
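To make the difference concrete (a toy sketch; the metric paths and values are made up), the same measurement ends up under very different Graphite paths depending on which identifier is available:

```python
instance_number = 3                               # the stable number requested here
mesos_task_id = "myapp_4f8ab347-0f5e-11e6-a30f"   # made-up, sanitized task id

# With a stable instance number, the series survives task restarts:
stable_metric = f"myapp.instance.{instance_number}.rss"
# With only the task id, every restart creates a brand-new series:
per_task_metric = f"myapp.{mesos_task_id}.rss"
```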
Adding a scenario inline with this request.
We now deploy a task called queue.consumer. We want to scale queue.consumer using Marathon to 2 instances. It would be great if there was a way in Marathon to either:
Also see Cardinal Service idea in Kubernetes kubernetes/kubernetes#260 (comment) ...which on further reading became the PetSet proposal https://github.com/smarterclayton/kubernetes/blob/petset/docs/proposals/petset.md
+1
3 similar comments
+1
+1
+1
Good news everyone! This is officially on the radar and we'll look at prioritizing it. Thank you for all the excellent use case examples. Internal tracker: https://mesosphere.atlassian.net/browse/MARATHON-983
Has anyone tried using https://github.com/spacejam/zk-glove to coordinate/track the number of instances? It may be trivial to hack up zk-glove to provide an INSTANCE_ID environment variable.
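Something in this spirit can also be hand-rolled directly against ZooKeeper (a rough sketch, not zk-glove itself; the hosts and paths are made up): each task claims the lowest free slot with an ephemeral znode, so the slot is released automatically when the task dies.

```python
from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError

def claim_instance_id(zk_hosts, app_path, max_instances):
    """Claim the lowest free instance slot under app_path using ephemeral znodes."""
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    zk.ensure_path(app_path)
    for n in range(max_instances):
        try:
            # Ephemeral: the node (and thus the slot) disappears if this task's session dies.
            zk.create(f"{app_path}/{n}", ephemeral=True)
            return n
        except NodeExistsError:
            continue  # slot already taken, try the next one
    raise RuntimeError("no free instance slot")

# Example (hypothetical addresses):
# instance_id = claim_instance_id("zk1:2181,zk2:2181", "/instance-ids/my-app", 10)
```

The catch is that the ZooKeeper session has to stay alive for the lifetime of the task, otherwise the slot can be handed to another task while the original is still running.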
@air any update on this issue? This is quite an important requirement for our production use case.
Copying from the marathon-framework post: The next release is mostly planned at this point, so this feature would realistically be a couple of months out. Two things in the interim:
For what it's worth, with PetSets, Kubernetes now supports this exact thing. Clearly there is demand for a feature such as this, or there wouldn't be so many comments on this issue. It makes managing stateful services much nicer.
Thanks @SEJeff - spotted that a few comments back. This feature is on the backlog and awaiting prioritization - it's officially a good idea!
+1
1 similar comment
+1
+infinity The args section in the container definition can probably help to work around this. But we may have to create N very similar apps/containers, just with different ids in their args section. I am wondering whether there is a more elegant way to handle this.
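For the record, that workaround looks roughly like this (a sketch; the Marathon URL, app id, image and args are made up): generate N near-identical app definitions that differ only in their id and args, and POST each one to Marathon's /v2/apps endpoint.

```python
import json
import urllib.request

MARATHON = "http://marathon.example.com:8080"  # placeholder address

def create_sharded_apps(base_id, image, instances):
    """Create one single-instance Marathon app per shard, differing only in id and args."""
    for n in range(instances):
        app = {
            "id": f"{base_id}-{n}",
            "instances": 1,
            "container": {"type": "DOCKER", "docker": {"image": image}},
            "args": ["--shard-id", str(n)],
        }
        req = urllib.request.Request(
            f"{MARATHON}/v2/apps",
            data=json.dumps(app).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(req)

# create_sharded_apps("/product/queue-consumer", "mycorp/consumer:1.0", 2)
```

The obvious downside is that scaling then means adding or removing whole apps instead of just changing the instance count.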
+1
"Good news everyone! This is officially on the radar and we'll look at prioritizing it." ...from April 2016. Are there any updates on this?
Can someone be assigned to this issue, pretty please? By the way, Kubernetes has already improved upon the initial PetSet concept in the form of StatefulSet.
+1. Would be a great feature.
+1 - configuring app clusters across multiple Docker containers, for all kinds of reasons, would benefit from this feature.
Note: This issue has been migrated to https://jira.mesosphere.com/browse/MARATHON-3602. For more information see https://groups.google.com/forum/#!topic/marathon-framework/khtvf-ifnp8.
Say I have a Docker container, i.e.:
kafka:0.8.2.0
and want to run it under Mesos. In Marathon terminology, for each app, I want 10 instances. I need an integer that is unique amongst all instances in that app, but only for that app. Currently I've got a start script in Python which does terrible black magic along the lines of:
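(A sketch of the kind of black magic meant here, not the exact script: ask Marathon which tasks exist for the app and use the position of this task's own MESOS_TASK_ID in that list.)

```python
import json
import os
import urllib.request

# Illustrative only. MARATHON_URL is a made-up variable; MARATHON_APP_ID and
# MESOS_TASK_ID are set by Marathon/Mesos for each task.
marathon = os.environ.get("MARATHON_URL", "http://marathon.example.com:8080")
app_id = os.environ["MARATHON_APP_ID"]       # e.g. "/kafka", includes the leading slash
my_task_id = os.environ["MESOS_TASK_ID"]

with urllib.request.urlopen(f"{marathon}/v2/apps{app_id}/tasks") as resp:
    tasks = json.load(resp)["tasks"]

# Fragile: the ordering shifts as tasks come and go.
broker_id = sorted(t["id"] for t in tasks).index(my_task_id)
```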
This gives me a unique integer I can pass in as a Kafka broker id. However, I'm having Marathon start up 10 instances of said brokers. It would be super nice if the instance number was passed from Marathon to the container. Then that above code could be more like:
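(Assuming the variable ends up being called MARATHON_APP_INSTANCE_NUMBER, as proposed elsewhere in this thread:)

```python
import os

# Hypothetical: MARATHON_APP_INSTANCE_NUMBER is the requested variable, not one Marathon sets today.
broker_id = int(os.environ["MARATHON_APP_INSTANCE_NUMBER"])
```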
That way I get a per-app unique integer for each instance of said app. It seems like this wouldn't be super difficult to expose.
Thoughts?