Prioritized handling of specific routes (for health checks) #223

Open · arlowhite opened this issue Sep 27, 2024 · 4 comments

arlowhite commented Sep 27, 2024

When deploying to a cloud environment such as AWS ECS Fargate, the load balancer pings the containers to see if they're still responding and kills them if they do not report back within a certain amount of time. If all :default worker threads are busy, then the /health check will hang, resulting in the container being killed even though it is still doing useful work.

Solution

Oxygen should have a way to prioritize certain routes so that they are handled even when all threads in the :default threadpool are busy.

To be determined: should it use the main interactive thread or another threadpool? Maybe this should be configurable?
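
(As a point of reference, and purely my own sketch rather than part of this proposal: Julia 1.9+ lets Threads.@spawn target a specific threadpool, which is one way prioritized work could bypass a saturated :default pool. handle_stream below is just a stand-in for Oxygen's per-connection handler.)

    using Base.Threads

    # stand-in for Oxygen's per-connection handler (hypothetical)
    handle_stream(target) = "handled $target on thread $(threadid()) ($(Threads.threadpool()))"

    # Julia 1.9+ syntax: @spawn can target a specific threadpool,
    # so a prioritized request could skip a saturated :default pool
    prioritized = Threads.@spawn :interactive handle_stream("/health")
    regular     = Threads.@spawn :default     handle_stream("/block")

    fetch(prioritized), fetch(regular)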

API ideas

router("health", prioritized=true)

or

my_router = router()
serve(prioritized_routers=[my_router])

However, a router/route-based approach would require significant refactoring of the Oxygen code.

Simple solution

A simple solution is proposed in PR #224

    serveparallel(
        is_prioritized = (req::HTTP.Request) -> req.target == "/health"
    )

Example Server

An example server to demonstrate the problem. Run the server with julia -t 1,1 (one :default thread and one :interactive thread), then request GET http://localhost:8080/block followed by GET http://localhost:8080/health.

/health won't respond until /block finishes.

using Base.Threads
using Oxygen

println("$(nthreadpools()) threadpools. default=$(nthreads(:default)) interactive=$(nthreads(:interactive))")

# busy-wait for n seconds without yielding to the scheduler
function block(n::Int)
    start_time = time()
    while (time() - start_time) < n
    end
    return "Blocking complete after $n seconds"
end


@get "/health" function()
    @info "entered /health threadid=$(Threads.threadid())"
    json(Dict(:status => "healthy"))
end

@get "/block" function ()
    block_time = 10
    @info "blocking for $block_time seconds. threadid=$(Threads.threadid())"
    # using block because sleep() would yield
    block(block_time)
    @info "done blocking threadid=$(Threads.threadid())"
    text("done")
end

serveparallel()
arlowhite changed the title from "Prioritized handling certain requests (for health checks)" to "Prioritized handling of specific routes (for health checks)" on Sep 27, 2024
arlowhite (Author) commented

I have a proof of concept here #224

The easiest thing would be to just look at HTTP.Request.target.

core.jl

function parallel_stream_handler(handle_stream::Function)
    function (stream::HTTP.Stream)
        # parse request before spawning handler
        @debug "parallel_stream_handler threadid=$(Threads.threadid())"

        req::HTTP.Request = stream.message

        # if prioritized path
        if (req.target == "/health")
            @debug "prioritized request to $(req.target), running in Main interactive thread"
            handle_stream(stream)
        else
            task = Threads.@spawn begin
                @debug "within @spawn begin threadid=$(Threads.threadid())"
                handle = @async handle_stream(stream)
                @debug "wait on @async handle_stream threadid=$(Threads.threadid())"
                wait(handle)
                @debug "after wait threadid=$(Threads.threadid())"
            end
            @debug "root wait threadid=$(Threads.threadid())"
            wait(task)
            @debug "after root wait threadid=$(Threads.threadid())"
        end
    end
end

Configuration could work like this (commit a22d8f3):

    serveparallel(
        is_prioritized = (req::HTTP.Request) -> req.target == "/health"
    )
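
(The predicate also keeps this flexible; for example, purely as an illustration on my side and not part of the PR, it could match several common health-check paths:)

    serveparallel(
        is_prioritized = (req::HTTP.Request) -> req.target in ("/health", "/healthz", "/ping")
    )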

ndortega (Member) commented

Hi @arlowhite,

Thanks for raising this issue; we definitely don't want running Oxygen apps to be mislabeled as unhealthy by external monitoring systems.

For cases like this, I think the most succinct way to solve this issue is through an app-level middleware function:

using HTTP
using Oxygen

# busy-wait helper from the example server above, included so this snippet runs standalone
function block(n::Int)
    start_time = time()
    while (time() - start_time) < n
    end
end

@get "/" function()
    text("welcome to the api")
end

@get "/block" function ()
    block_time = 10
    @info "blocking for $block_time seconds. threadid=$(Threads.threadid())"
    # using block because sleep() would yield
    block(block_time)
    @info "done blocking threadid=$(Threads.threadid())"
    text("done")
end

function HealthCheck(handler)
    return function(req::HTTP.Request)
       # case 1: return early on health checks
        if req.target == "/health"
            return HTTP.Response(200, "ok")
        # case 2: handle all other incoming requests
        else 
            return handler(req)
        end
    end
end

# set application level middleware
serve(middleware=[HealthCheck])

Middleware functions run before we pass the HTTP.Request to the web server's internal router, so they can also act as short-circuit logic for important routes like health checks.
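
(As a quick illustration of the short-circuit, my own sketch exercising the wrapped handler directly rather than going through serve:)

    using HTTP

    inner(req::HTTP.Request) = HTTP.Response(200, "handled by the router")
    wrapped = HealthCheck(inner)

    wrapped(HTTP.Request("GET", "/health"))  # returns 200 "ok" without reaching the router
    wrapped(HTTP.Request("GET", "/"))        # falls through to the inner handler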

We could go one step further and provide a helper function to bootstrap this middleware:

function HealthCheck(route::String)
    return function(handler)
        return function(req::HTTP.Request)
            # case 1: return early on health checks
            if req.target == route
                return HTTP.Response(200, "ok")
            # case 2: handle all other incoming requests
            else 
                return handler(req)
            end
        end
    end
end

serve(middleware=[HealthCheck("/health")])

I believe this should solve this issue, but let me know if there are any important edge cases I haven't addressed.

arlowhite (Author) commented

@ndortega the problem is that the thread is spawned (in parallel_stream_handler) before the middleware chain executes. So if all threads in the :default threadpool are in use, requests cannot be handled until a thread frees up.

If you want something more flexible, in case there are other use cases for running code before the thread spawn, then maybe the thread-spawning code could be moved into internal middleware: anything placed before the ThreadSpawn middleware would run on the interactive thread, and middleware placed after it would run in the worker thread (see the sketch below).
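
(A rough sketch of that idea, purely my own illustration and not Oxygen's API; ThreadSpawn is a hypothetical name, and HealthCheck("/health") is the helper from ndortega's comment above:)

    using HTTP
    using Oxygen

    # hypothetical middleware: everything downstream of it runs in a task on the
    # :default pool, while middleware placed before it (like the health check)
    # stays on the thread that accepted the connection
    function ThreadSpawn(handler)
        return function(req::HTTP.Request)
            task = Threads.@spawn :default handler(req)
            return fetch(task)  # fetch yields, so the accepting thread isn't blocked while waiting
        end
    end

    serve(middleware=[HealthCheck("/health"), ThreadSpawn])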

ndortega (Member) commented Oct 1, 2024

Ah, I see what you mean. Out of curiosity, how many threads are in each threadpool for your application? If your API primarily deals with longer-lived requests, it could just be a resource-allocation issue. In that case, interactive threads aren't going to do much to improve performance, since they're designed for handling short-lived tasks.
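
(For reference, not from the original comment: threadpool sizes are fixed when Julia starts and can be inspected at runtime.)

    # e.g. start Julia with 8 :default worker threads and 1 :interactive thread:
    #   julia -t 8,1
    using Base.Threads
    nthreads(:default), nthreads(:interactive)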

If that's not the case, then we'll need to see where we can address this. You bring up a good point: we could move the thread spawning around and let the routing be done exclusively on the main thread. We could also go with some kind of prioritized router or middleware approach.
