Prioritized handling of specific routes (for health checks) #223

Open · arlowhite opened this issue Sep 27, 2024 · 4 comments

arlowhite commented Sep 27, 2024

When deploying to a cloud environment such as AWS ECS Fargate, the load balancer pings the containers to see if they're still responding and kills them if they do not report back within a certain amount of time. If all :default worker threads are busy, then the /health check will hang, resulting in the container being killed even though it is still doing useful work.

Solution

Oxygen should have a way to prioritize certain routes so that they are handled even when all threads in the :default threadpool are busy.

To be determined: should it use the main interactive thread or another threadpool? Maybe this should be configurable?
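
(As a point of reference, and purely my own sketch rather than part of this proposal: Julia 1.9+ lets Threads.@spawn target a specific threadpool, which is one way prioritized work could bypass a saturated :default pool. handle_stream below is just a stand-in for Oxygen's per-connection handler.)

    using Base.Threads

    # stand-in for Oxygen's per-connection handler (hypothetical)
    handle_stream(target) = "handled $target on thread $(threadid()) ($(Threads.threadpool()))"

    # Julia 1.9+ syntax: @spawn can target a specific threadpool,
    # so a prioritized request could skip a saturated :default pool
    prioritized = Threads.@spawn :interactive handle_stream("/health")
    regular     = Threads.@spawn :default     handle_stream("/block")

    fetch(prioritized), fetch(regular)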

API ideas

router("health", prioritized=true)

or

my_router = router()
serve(prioritized_routers=[my_router])

However, a router/route-based approach would require significant refactoring of the Oxygen code.

Simple solution

A simple solution is proposed in PR #224

    serveparallel(
        is_prioritized = (req::HTTP.Request) -> req.target == "/health"
    )

Example Server

An example server to demonstrate the problem. Run the server with julia -t 1,1 (one :default thread and one :interactive thread), then request GET http://localhost:8080/block followed by GET http://localhost:8080/health.

/health won't respond until /block finishes.

using Base.Threads
using Oxygen

println("$(nthreadpools()) threadpools. default=$(nthreads(:default)) interactive=$(nthreads(:interactive))")

# busy-wait for n seconds without yielding to the scheduler
function block(n::Int)
    start_time = time()
    while (time() - start_time) < n
    end
    return "Blocking complete after $n seconds"
end


@get "/health" function()
    @info "entered /health threadid=$(Threads.threadid())"
    json(Dict(:status => "healthy"))
end

@get "/block" function ()
    block_time = 10
    @info "blocking for $block_time seconds. threadid=$(Threads.threadid())"
    # using block because sleep() would yield
    block(block_time)
    @info "done blocking threadid=$(Threads.threadid())"
    text("done")
end

serveparallel()
arlowhite changed the title from "Prioritized handling certain requests (for health checks)" to "Prioritized handling of specific routes (for health checks)" on Sep 27, 2024
arlowhite (Author) commented

I have a proof of concept here #224

The easiest thing would be to just look at HTTP.Request.target.

core.jl

function parallel_stream_handler(handle_stream::Function)
    function (stream::HTTP.Stream)
        # parse request before spawning handler
        @debug "parallel_stream_handler threadid=$(Threads.threadid())"

        req::HTTP.Request = stream.message

        # if prioritized path
        if (req.target == "/health")
            @debug "prioritized request to $(req.target), running in Main interactive thread"
            handle_stream(stream)
        else
            task = Threads.@spawn begin
                @debug "within @spawn begin threadid=$(Threads.threadid())"
                handle = @async handle_stream(stream)
                @debug "wait on @async handle_stream threadid=$(Threads.threadid())"
                wait(handle)
                @debug "after wait threadid=$(Threads.threadid())"
            end
            @debug "root wait threadid=$(Threads.threadid())"
            wait(task)
            @debug "after root wait threadid=$(Threads.threadid())"
        end
    end
end

Configuration could work like this (commit a22d8f3):

    serveparallel(
        is_prioritized = (req::HTTP.Request) -> req.target == "/health"
    )
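
(The predicate also keeps this flexible; for example, purely as an illustration on my side and not part of the PR, it could match several common health-check paths:)

    serveparallel(
        is_prioritized = (req::HTTP.Request) -> req.target in ("/health", "/healthz", "/ping")
    )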

ndortega (Member) commented

Hi @arlowhite,

Thanks for raising this issue; we definitely don't want running Oxygen apps to be mislabeled as unhealthy by external monitoring systems.

For cases like this, I think the most succinct way to solve this issue is through an app-level middleware function:

using HTTP
using Oxygen

# busy-wait helper from the example server above, included so this snippet runs standalone
function block(n::Int)
    start_time = time()
    while (time() - start_time) < n
    end
end

@get "/" function()
    text("welcome to the api")
end

@get "/block" function ()
    block_time = 10
    @info "blocking for $block_time seconds. threadid=$(Threads.threadid())"
    # using block because sleep() would yield
    block(block_time)
    @info "done blocking threadid=$(Threads.threadid())"
    text("done")
end

function HealthCheck(handler)
    return function(req::HTTP.Request)
       # case 1: return early on health checks
        if req.target == "/health"
            return HTTP.Response(200, "ok")
        # case 2: handle all other incoming requests
        else 
            return handler(req)
        end
    end
end

# set application level middleware
serve(middleware=[HealthCheck])

Middleware functions run before we pass the HTTP.Request to the web server's internal router, so they can also act as short-circuit logic for important routes like health checks.
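
(As a quick illustration of the short-circuit, my own sketch exercising the wrapped handler directly rather than going through serve:)

    using HTTP

    inner(req::HTTP.Request) = HTTP.Response(200, "handled by the router")
    wrapped = HealthCheck(inner)

    wrapped(HTTP.Request("GET", "/health"))  # returns 200 "ok" without reaching the router
    wrapped(HTTP.Request("GET", "/"))        # falls through to the inner handler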

We could go one step further and provide a helper function to bootstrap this middleware:

function HealthCheck(route::String)
    return function(handler)
        return function(req::HTTP.Request)
            # case 1: return early on health checks
            if req.target == route
                return HTTP.Response(200, "ok")
            # case 2: handle all other incoming requests
            else 
                return handler(req)
            end
        end
    end
end

serve(middleware=[HealthCheck("/health")])

I believe this should solve this issue, but let me know if there are any important edge cases I haven't addressed.

arlowhite (Author) commented

@ndortega the problem is that the thread is spawned (in parallel_stream_handler) before the middleware chain executes. So if all threads in the :default threadpool are in use, requests cannot be handled until a thread frees up.

If you want something more flexible, in case there are other use cases for running code before the thread spawn, then maybe the thread-spawning code could be moved into internal middleware: anything placed before the ThreadSpawn middleware would run on the interactive thread, and middleware placed after it would run in the worker thread (see the sketch below).
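
(A rough sketch of that idea, purely my own illustration and not Oxygen's API; ThreadSpawn is a hypothetical name, and HealthCheck("/health") is the helper from ndortega's comment above:)

    using HTTP
    using Oxygen

    # hypothetical middleware: everything downstream of it runs in a task on the
    # :default pool, while middleware placed before it (like the health check)
    # stays on the thread that accepted the connection
    function ThreadSpawn(handler)
        return function(req::HTTP.Request)
            task = Threads.@spawn :default handler(req)
            return fetch(task)  # fetch yields, so the accepting thread isn't blocked while waiting
        end
    end

    serve(middleware=[HealthCheck("/health"), ThreadSpawn])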

ndortega (Member) commented Oct 1, 2024

Ah, I see what you mean. Out of curiosity, how many threads are in each threadpool for your application? If your API primarily deals with longer-lived requests, it could just be a resource-allocation issue. In that case, interactive threads aren't going to do much to improve performance, since they're designed for handling short-lived tasks.
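
(For reference, not from the original comment: threadpool sizes are fixed when Julia starts and can be inspected at runtime.)

    # e.g. start Julia with 8 :default worker threads and 1 :interactive thread:
    #   julia -t 8,1
    using Base.Threads
    nthreads(:default), nthreads(:interactive)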

If that's not the case, then we'll need to see where we can address this. You bring up a good point: we could move the thread spawning around and let the routing be done exclusively on the main thread. We could also go with some kind of prioritized router or middleware approach.
