Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(up): Spawn multiple trigger commands #2213

Merged
merged 5 commits into from
Jan 23, 2024

Conversation

carlokok
Copy link
Contributor

@carlokok carlokok commented Jan 8, 2024

This PR does a few things and fixes #1755:

  • Fix the HTTP/Redis plugin to use the lock file's metadata.triggers.http/redis path instead of metadata.trigger field.
  • Use tokio to async wait for the sub processes to end (I didn't want to introduce a new dependency on async processes)

End result:
WindowsTerminal_fSadslpDYA

One note: I cannot get the integration tests to run well on Windows. Probably a local issue, so I was unable to verify if the integration tests work.

@carlokok
Copy link
Contributor Author

carlokok commented Jan 8, 2024

Fixed the print! that should never have been there.

lann
lann previously requested changes Jan 8, 2024
Copy link
Collaborator

@lann lann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I think this would benefit from some discussion around a few key points:

  • spin up UX:
    • How should --help work for multiple trigger types
    • If a single trigger executor dies does spin up quit completely?
  • Backwards compatibility in locked app manifest

@@ -154,7 +156,14 @@ impl UpCommand {
.with_context(|| format!("Couldn't find trigger executor for {app_source}"))?;

if self.help {
return self.run_trigger(trigger_cmd, None).await;
return join_all(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to best handle this; as-is I believe it will print complete spin up usage info for each trigger type. Worst case it may even interleave usage output for different triggers.

Copy link
Contributor Author

@carlokok carlokok Jan 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can run them sequentially, but that will still print a lot more than it should. Suggestions on this one would be appreciated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For --help in particular I think the ideal UX would be to print trigger-specific flags in separate sections, but that would be a bigger chunk of work. Also I'm realizing (as I think @itowlson alluded to earlier) that there is a more fundamental problem here with how to parse trigger-specific flags and I'm really not sure how to handle that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it comes to it, I'd argue we can punt on help printing more than it should - it will be untidy, but we can track it as a "known issue with the first release" and it's not worth blocking getting the feature into people's hands.

Re trigger specific flags, one thing I tried in the composite trigger plugin experiment was prefixing flags with the type of the trigger for which they were intended e.g. --http:listen / --http-listen. The plugin would look for prefixes matching the composed triggers and would selectively pass the de-munged flags on. It's probably tedious to require that on all flags, but perhaps it could be used to disambiguate when you want to pass different flag values with the same name e.g. --http-listen 3001 --websocket-listen 3002? Just thinking out loud here though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this one should be something you guys have to get to agreement on. It's now not properly working due to timers not supporting listen, but this is more of a choice on approach than something I can fix myself.

};

self.run_trigger(trigger_cmd, Some(run_opts)).await
join_all(trigger_cmd.iter().map(|cmd| {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if one of the trigger executors fails to run? If I'm not missing anything I believe spin up would happily continue executing the other triggers which probably (?) isn't what we want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it.

crates/app/src/lib.rs Show resolved Hide resolved
@@ -2,13 +2,11 @@ use serde::{Deserialize, Serialize};
use spin_locked_app::MetadataKey;

/// Http trigger metadata key
pub const METADATA_KEY: MetadataKey<Metadata> = MetadataKey::new("trigger");
pub const METADATA_KEY: MetadataKey<Metadata> = MetadataKey::new("triggers.http");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto re: backward compat.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't used anymore in my pr, bad leftover. If we can wrap up above I can remove this one.

crates/app/src/lib.rs Show resolved Hide resolved
&'this self,
trigger_type: &'a str,
) -> Result<T> {
Some(&self.locked.metadata["triggers"][trigger_type])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if "triggers" or the specific trigger_type don't exist. This whole function should probably return Result<Option<T>> instead, with Ok(None) if the key(s) are missing.

Copy link
Contributor

@itowlson itowlson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of comments, mostly at the nit level. I'm really excited to see this!

};

self.run_trigger(trigger_cmd, Some(run_opts)).await
join_all(trigger_cmd.iter().map(|cmd| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should exit as soon as any trigger exits - otherwise the user could end up with only half their application running.

That said, there has been vague talk of a run or once trigger, which would invoke its guest and then exit. If that trigger existed, we'd need to be more nuanced about cancelling other triggers in the same application. That's speculative at the moment though!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to try_join_all that seems to give this effect.

@@ -235,7 +249,7 @@ impl UpCommand {
})?;
}

let status = child.wait()?;
let status = tokio::task::spawn(async move { child.wait() }).await??;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Would it be more natural to change this function to use a tokio::process::Command, so that you merely had to await child.wait() rather than explicitly spawn a task and awaiting that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a lot better, had I known that existed :) This is pretty much my first serious venture into Rust. I'll fix.

@@ -83,7 +83,7 @@ pub enum ResolvedAppSource {
}

impl ResolvedAppSource {
pub fn trigger_type(&self) -> anyhow::Result<&str> {
pub fn trigger_type(&self) -> anyhow::Result<Vec<&str>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I am getting into the weeds at this point, but this will need to be renamed trigger_types

@carlokok
Copy link
Contributor Author

carlokok commented Jan 9, 2024

I did a new push and think I fixed all comments you provided. Let me know if you see anything else that needs fixing.

@dicej
Copy link
Contributor

dicej commented Jan 9, 2024

Lann and Ivan asked me to comment on this from the perspective of the WebSocket proposal. It looks fine to me, and I think it's pretty much orthogonal to the addition of a websocket trigger.

The current plan is for the websocket trigger type to be an alias for the http trigger type with an additional feature enabled (the ability to upgrade HTTP connections to WebSocket connections). I don't expect that supporting multiple trigger types in a single spin up command will affect that one way or another.

@itowlson
Copy link
Contributor

itowlson commented Jan 10, 2024

@carlokok I think I see what is happening with the timer trigger. Because TRIGGER_METADATA_KEY has switched over to triggers, the deserialisation type now need to reflect the timer nesting e.g.

// This is the "triggers" element of the lockfile, which can contain multiple
// objects e.g. "http", "timer", etc. Ignore all the ones the timer trigger
// doesn't care about.
#[derive(Clone, Debug, Default, Deserialize, Serialize)]
struct TriggerMetadataParent {
    timer: Option<TriggerMetadata>,
}

#[derive(Clone, Debug, Default, Deserialize, Serialize)]
#[serde(deny_unknown_fields)]
struct TriggerMetadata {
    speedup: Option<u64>,
}

// This now needs to point to the *parent* schema not the timer-specific schema
const TRIGGER_METADATA_KEY: MetadataKey<TriggerMetadataParent> = MetadataKey::new("triggers");

#[async_trait]
impl TriggerExecutor for TimerTrigger {
    // ...
    async fn new(engine: spin_trigger::TriggerAppEngine<Self>) -> anyhow::Result<Self> {
        let speedup = engine
            .app()
            .require_metadata(TRIGGER_METADATA_KEY)?
            .timer               // <------- new element here
            .unwrap_or_default()
            .speedup
            .unwrap_or(1);
        // ...
    }
}

Looking into some of the other troubles you noted as well - thanks for persevering with this. The devil was always going to end up in the details...!

@carlokok
Copy link
Contributor Author

Thanks! I'll try to fix the linting issue when I get back in the office tomorrow.

@itowlson
Copy link
Contributor

@carlokok I think that 'linting' issue is actually a compiler error that child.id() is now an Option rather than a plain number (a side effect I didn't realise when suggesting changing to tokio::process::Command, sorry).

Although this is reasonably easy to fix (e.g. using if let or Option::map), there's also going to be a problem with callint ctrlc::set_handler in a loop. That function can only be called once for a process: when there are two trigger types, the second call panics.

I think the way this ends up looking is moving the Ctrl+C handling out of the run_trigger function, and instead collecting the futures from the calls to run_trigger and having a single Ctrl+C handler that loops over them. This may have its own ramifications in the design of run_trigger though.

Does that all make sense? I'm happy to sketch this out in more detail if that would be useful. (Or I can send you a PR for it, if that's better for you - I already had to rough this out because I am on WSL, so had to make these tweaks anyway!)

@carlokok
Copy link
Contributor Author

If you can help me a bit with a PR that would be greatly appreciated. I'm not currently setup with Linux development and the error does not trigger on Windows.

@itowlson
Copy link
Contributor

@carlokok Is there any further support we can give you to land this? I ask this not in order to hound you, but just wanted to check in, in case you're stuck or need more information or discussion from us.

@carlokok
Copy link
Contributor Author

I was actually wondering if anything else needs to be fixed or changed. But now I see some of the checks failed so I will try to reproduce it locally tomorrow and fix it

@itowlson
Copy link
Contributor

Thanks @carlokok! There's also the issue I mentioned in my second PR, where the HTTP trigger seemed to be working in multi-trigger apps but not in single-trigger apps - I'm not sure how that's even possible and it makes me anxious that the two cases are going down different code paths. (I was able to work around it, and I shared the workaround in the PR description, but would like to be sure that we understand what's happening.)

@lann
Copy link
Collaborator

lann commented Jan 18, 2024

I also created a PR against your branch that reorganizes the metadata code: carlokok#3

@lann lann dismissed their stale review January 18, 2024 17:13

original concerns addressed; needs re-review

@carlokok
Copy link
Contributor Author

Would it be possible that someone does another quick review & run the workflows? Iann did a good fix I think and it shows at least what testcases I need ot look at.

@carlokok
Copy link
Contributor Author

@itowlson I think @lann s pr might have fixed your concerns too. All tests pass too now. Is there anything else you want me change further?

@itowlson
Copy link
Contributor

@carlokok Thanks for the update. I've kicked the tyres on this and it seems good. What I tried:

  • HTTP (built in) single and with timer
  • Timer (updated) single and with HTTP
  • SQS (not updated) single and with HTTP and timer

Everything worked except for SQS in the multi-trigger configuration. This is because SQS, being linked to the existing trigger crates, looks at trigger instead of triggers, which is not set in the multi-trigger case. There's nothing we can do about this, and unfortunately I don't think we can even improve the error handling because the unwrap() is baked into the existing trigger crate.

There is still something amiss in the SQS + timer + HTTP case where, in one of my tests, timer did not exit after SQS crashed. (HTTP did exit, so this is likely a race condition.) I am okay with continuing to work on this as a separate PR - I don't think it's a blocker for this one, but it's something to consider for the release.

There are also a few supporting changes that don't block this PR but need picking up, e.g. the template experience for "add a X trigger to a Y application". (Plus of course docs but we know to do those as part of the relevant release docs.)

@melissaklein24 @lann Given that status, and the upcoming 2.2 release, how do you want to play this? We can either:

  1. Merge now, include ir in 2.2, document any known bugs, and announce it as an experimental feature in 2.2.
  2. Merge now, include it in 2.2, but don't announce it immediately - give it a few weeks to shake out bugs and UX stuff like templates, and announce it in 2.3
  3. Hold off merging until after 2.2, just in case there are any lurking nasties in the single-trigger scenario. (Which I don't think there are, but I wouldn't stake my life on it.) Then merge immediately after 2.2 so we can start the shaking-out process ready for 2.3.
  4. Something else

What do you reckon?

@itowlson
Copy link
Contributor

@carlokok Could you GPG-sign the commits please? We require GPG signing, but unfortunately GitHub doesn't seem to check it until the PR is approved and ready to merge. Sorry about this.

carlokok and others added 4 commits January 23, 2024 08:35
Signed-off-by: Carlo Kok <carlo@rb2.nl>
Signed-off-by: Carlo Kok <carlo@rb2.nl>
Signed-off-by: itowlson <ivan.towlson@fermyon.com>
Signed-off-by: itowlson <ivan.towlson@fermyon.com>
Signed-off-by: Lann Martin <lann.martin@fermyon.com>
@carlokok
Copy link
Contributor Author

@itowlson I had to rebase the whole thing to resign it but I think it should be good now.

Copy link
Collaborator

@lann lann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its fine to merge. We can always feature-gate it behind a flag/env var if we aren't confident enough to have it enabled by default in 2.2.

@itowlson
Copy link
Contributor

@carlokok Thanks - all green now!

@lann 👍

@itowlson itowlson merged commit 4ed8f53 into fermyon:main Jan 23, 2024
11 checks passed
@carlokok carlokok deleted the feat/multiple-triggertypes branch January 24, 2024 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support for multiple trigger types in a single Spin application
4 participants