Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Definite memory leak when calling dynamic function #1341

Open
ccleve opened this issue Oct 17, 2023 · 7 comments
Open

Definite memory leak when calling dynamic function #1341

ccleve opened this issue Oct 17, 2023 · 7 comments
Assignees

Comments

@ccleve
Copy link
Contributor

ccleve commented Oct 17, 2023

In an attempt to diagnose #1330 I installed the new 0.11.0 and tried out the new dynamic function calls. I got a leak. Perhaps a fix for this one will also fix my issue.

Run this and watch memory usage climb to multiple gb:

use pgrx::fn_call::fn_call;
use pgrx::fn_call::Arg;
use pgrx::pg_extern;

#[pg_extern]
fn tester() {
    let bar = "bar";
    for _ in 0..10000000 {
        let result = fn_call::<String>("myfunc", &[&Arg::Value(bar)]);
    }
}

#[pg_extern]
fn myfunc(bar: &str) -> String {
    let mut out = "foo".to_owned();
    out.push_str(bar);
    out
}

Just run select tester(); at the command line, and if you're on a Mac open up Activity Monitor / Memory tab and watch the memory usage grow for the postgres process.

@eeeebbbbrrrr
Copy link
Contributor

Thanks for the report. We’ll investigate

@eeeebbbbrrrr
Copy link
Contributor

I rewrote this to be even more aggressive:

#[pg_extern]
fn tester() {
    let bar = "bar";
    for _ in 0..10000000 {
        check_for_interrupts!();

        let s = fn_call::<String>("myfunc", &[&Arg::Value(bar)]);
        assert!(matches!(s, Ok(Some(_))))
    }
}

#[pg_extern]
fn myfunc(bar: &str) -> String {
    static CAPACITY: usize = 1024 * 1024 * 500; // 500 meg
    let mut s = String::with_capacity(CAPACITY);
    while s.len() < CAPACITY {
        s.push_str(bar)
    }
    s
}

What's happening here is that when myfunc is called and returns that 500 meg String, it's converted into a postgres-allocated (via palloc) text datum. Then fn_call takes that and converts it back into an owned String. We're going through Postgres' internal function calling interfaces here, and things are passed around as datums.

Anyways, that, what is now an intermediate palloc'd text datum, gets lost. pgrx can't know to free it when it returns from fn_call because it doesn't really know where it came from.

As such, it's the user's responsibility to manage that memory, in an unsafe manner using (at least) a transient memory context, like so:

#[pg_extern]
fn tester() {
    let bar = "bar";
    for _ in 0..10000000 {
        check_for_interrupts!();

        unsafe {
            PgMemoryContexts::switch_to(
                &mut PgMemoryContexts::Transient {
                    parent: PgMemoryContexts::CurrentMemoryContext.value(),
                    name: "per-call",
                    min_context_size: 4096,
                    initial_block_size: 4096,
                    max_block_size: 4096,
                },
                |_| {
                    let s = fn_call::<String>("myfunc", &[&Arg::Value(bar)]);
                    assert!(matches!(s, Ok(Some(_))));
                },
            )
        }
    }
}

This pattern is common in the Postgres sources, and is the reason something like the below doesn't leak -- there's a temporary memory context that manages the allocations within the FOR loop's scope.

create function make_big_string(s text) returns text language sql AS $$
    SELECT repeat(s, (1024 * 1024 * 500) / length(s))
$$;

DO LANGUAGE plpgsql $$
BEGIN

    FOR i in 0..10000000 LOOP
        PERFORM make_big_string('bar');
    END LOOP;

    RETURN;
END;
$$;

I don't know what pgrx or fn_call could do here, other than actually execute the target function in a temporary context, and then copy that Datum to the outer context. I'm not sure if that's a good idea and I'm not sure how this kinda pattern might interact with the work @thomcc and @workingjubilee are doing, re #1342.

@workingjubilee
Copy link
Member

workingjubilee commented Oct 18, 2023

This seems like basically the same problem as with SPI: we want to operate in a temporary context, but then return the result into a longer-lived context. This could be safely done, possibly by accepting arguments to fn_call for relevant memory contexts.

@eeeebbbbrrrr
Copy link
Contributor

eeeebbbbrrrr commented Oct 19, 2023

If fn_call() takes a memory context, all that means is that it's the memory context in which the Datum should be copied -- it wouldn't be the memory context under which the desired function is executed. We'd still need to wrap that function call in a temporary context and copy out pass-by-ref return Datums into the context provided to fn_call.

And really, I don't see how any of our existing "owned datum" approaches can survive this. Like, creating a Rust String means we've lost the source Datum. From over here it seems like we will need to instead create "zero-copy" wrappers for all pass-by-ref types, and they'd maintain the backing Datum. And their construction mechanism will definitely need a memory context reference so we can tell the compiler the lifetime to which they've borrowed memory.

Then it makes sense that fn_call() would need a memory context argument. How the user gets that I suppose is more for #1342 than it is here...

@eeeebbbbrrrr
Copy link
Contributor

eeeebbbbrrrr commented Oct 19, 2023

Thinking on this issue a bit more, I can have it create a temporary context to execute the function in, and use our FromDatum::from_datum_in_context() function to make sure its return value is allocated in the caller's memory context. Like Spi, that would still allow leaks of &str and &[u8] as they'd be assumed 'static, but that's a different problem.

Thoughts?

@ccleve
Copy link
Contributor Author

ccleve commented Oct 19, 2023

Some thoughts:

  1. I'm unlikely to use fn_call() very much except for one-offs, calls that get made once or a small number of times during a transaction. In that case, memory usage doesn't matter very much if the memory gets cleared at the end of the transaction.

  2. If I call a dynamic function frequently, in a tight loop, then memory matters a lot. My current use for it is to call an op class support function once per column per row when creating a custom index. My code for doing this is in Possible memory leak #1330. If I'm calling the function a lot, then I won't want to use fn_call() because of the overhead of doing the function lookup every time. Instead, we'll want to cache some kind of handle to the function. Perhaps we could have a function wrapper that has a convenient method for calling it and a local memory context for temporary use during the call.

  3. Long term, we really should think about unifying Postgres and Rust memory. Rust allows you to implement custom memory allocators. It would be cool if Rust allocations on the heap actually went into a pg memory context, so I could create something like a Vec knowing that if it grew, it would grow in the right context. I don't know how this would work, exactly, but there is some discussion of it in Design sketch for potential new {alloc,memcxt,drop,borrow} abstractions #891.

@eeeebbbbrrrr
Copy link
Contributor

eeeebbbbrrrr commented Oct 19, 2023

3 is actively being worked on from different angles. It's an incredibly hard problem, as you can imagine, and a lot of the work is exploratory, trying to mind-meld with Postgres.

Regarding 2, we still need to see the exact code you're running for us to help with #1330. If pgrx has another leak somewhere, I'd like to know.

FWIW, when implementing this "fn_call" feature, I did consider a "prepared" version that only needed to do the function lookup and such once, but decided to hold off on that until users thought such a thing would be helpful.

So for 1, yes, I think the general use case for "fn_call" is exactly that -- just call some other function once. FWIW, if "fn_call" is used via a #[pg_extern], the leak isn't as long as the transaction, it's only as long as the #[pg_extern] call (at least that should be the case with how Postgres sets up function call memory contexts).

Knowing this, I'm not super motivated to make more changes here. There's lots of ways to leak memory in Rust and in Postgres.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants