Cgo Memory Leak #132
Thanks for your report. It would be much appreciated if you could provide the model.
Can you give me an email? I will send this model to you. I tried to find which function causes it, but it is hard for me; I am not familiar with C++.
But I think any model may have memory leaks.
Please fork the repo and add your model file to a release, or make an entry in the …
I have quickly tested your example, putting the forward pass inside ts.NoGrad():

package main

import (
    "fmt"

    "github.com/sugarme/gotch"
    "github.com/sugarme/gotch/ts"
)

func main() {
    TestModel()
}

func TestModel() {
    N := 1_000_000_000

    // Load the TorchScript module and switch it to eval mode.
    m, err := ts.ModuleLoad("test_full_save.pt")
    if err != nil {
        panic(err)
    }
    m.SetEval()

    for i := 0; i < N; i++ {
        // tf := ts.MustRand([]int64{1, 7}, gotch.Float, gotch.CPU)
        tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)

        ts.NoGrad(func() {
            res, err := m.Forward(tf)
            if err != nil {
                panic(err)
            }
            // Drop the output tensor to release its C-allocated memory.
            res.MustDrop()
        })

        // Drop the input tensor as well.
        tf.MustDrop()

        if i%1000 == 0 {
            fmt.Printf("Done %d \n", i)
        }
    }
}

Please always handle errors as well. Let me know if that is fine on your box. Note that when putting forward() in a for loop, particularly for Go on CPU, we should see some spiky fluctuation in memory consumption.
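To see that fluctuation in numbers, one quick option is to print the process RSS every few thousand iterations. This is a minimal sketch, not part of the original example: it is Linux-only (it scans /proc/self/status), and the helper name printRSS is made up here.

import (
    "fmt"
    "os"
    "strings"
)

// Sketch only: print the VmRSS line from /proc/self/status.
// printRSS is a hypothetical helper, not part of gotch.
func printRSS() {
    data, err := os.ReadFile("/proc/self/status")
    if err != nil {
        return
    }
    for _, line := range strings.Split(string(data), "\n") {
        if strings.HasPrefix(line, "VmRSS:") {
            fmt.Println(line) // e.g. "VmRSS:   123456 kB"
            return
        }
    }
}

Calling printRSS() right next to the fmt.Printf in the loop makes the spikes, and any steady growth, easy to chart over time.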
@sugarme I used valgrind and it still finds a memory leak. I rewrote your code and found through stress testing that the memory is still growing, but the QPS has not increased.
@sugarme My service handles over 5000 QPS per node, so it is easy to reach 1M cycles. It is a 20-core/32 GB node.
I would try the following: move the tensor creation out of the loop, so that only the forward pass runs inside it.

tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)

for i := 0; i < N; i++ {
    ts.NoGrad(func() {
        res, err := m.Forward(tf)
        if err != nil {
            panic(err)
        }
        res.MustDrop()
    })
    // tf.MustDrop()

    if i%1000 == 0 {
        fmt.Printf("Done %d \n", i)
    }
}

If there is no leak, then the problem is at tensor initialization.
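A complementary isolation test, sketched here rather than taken from the thread, uses only the ts.MustRand and MustDrop calls already shown: keep the tensor creation inside the loop but skip the forward pass entirely. If memory grows here too, the leak is on the tensor creation/drop side rather than inside Forward.

// Sketch: exercise only tensor creation and drop, with no forward pass.
for i := 0; i < N; i++ {
    tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)
    tf.MustDrop()

    if i%1000 == 0 {
        fmt.Printf("Done %d \n", i)
    }
}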
@sugarme Sorry, I was on my way home just now. The service using gotch is already live; for now the cluster is restarted every day to make sure it does not OOM. The service code is not a for loop; the forward pass is computed once per request. I ran valgrind over 100 loops and it detected a memory leak of 18B. I did a stress test for 2 days last week; the service memory grew to 95% and then the process was OOM-restarted.
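For reference, the per-request pattern described above comes down to the same drop discipline as the loop examples: every tensor created for a request has to be dropped before the handler returns. A minimal sketch (requires net/http), assuming m is the module loaded as in the examples above; the /predict route is made up here, and the random input stands in for whatever the real service builds from the request payload.

// Sketch of a per-request handler using the same calls as the loop examples.
http.HandleFunc("/predict", func(w http.ResponseWriter, r *http.Request) {
    // Placeholder input; a real service would build this from the request body.
    tf := ts.MustRand([]int64{1024, 7}, gotch.Float, gotch.CPU)
    defer tf.MustDrop() // release the input tensor when the request ends

    ts.NoGrad(func() {
        res, err := m.Forward(tf)
        if err != nil {
            http.Error(w, err.Error(), http.StatusInternalServerError)
            return
        }
        defer res.MustDrop() // release the output tensor as well
        // ... write the prediction from res into the response ...
    })
})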
I suspect it's related to closed issue #102. The code in question looks roughly like this:

func Rand(...) (...) {
    var untypedPtr uintptr
    ptr := (*lib.Ctensor)(unsafe.Pointer(&untypedPtr))

    // Some C call that stores an allocated tensor at *ptr.

    retVal = &Tensor{ctensor: *ptr}
    return retVal, err
}
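For what it's worth, one common safety net for C-allocated handles is to register a finalizer when the Go wrapper is created, so the C-side allocation is released even if MustDrop is never reached. This is only an illustrative sketch, not the library's actual cleanup path; freeCTensor is a hypothetical stand-in for whatever C-side free function the binding exposes.

// Sketch only: attach a finalizer to the Go wrapper so the C-side allocation
// is freed if the caller forgets to drop the tensor explicitly.
// freeCTensor is hypothetical, standing in for the binding's real free call.
func wrapCTensor(ctensor lib.Ctensor) *Tensor {
    t := &Tensor{ctensor: ctensor}
    runtime.SetFinalizer(t, func(t *Tensor) {
        freeCTensor(t.ctensor)
    })
    return t
}

A finalizer only runs when the Go GC notices the wrapper is unreachable, so explicit MustDrop calls remain the reliable way to bound memory in a hot loop; the finalizer just catches cases that slip through.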
Thank you.
@sugarme I tried it, but it does not work. If you run valgrind you can see it still leaks memory, and when I ran a stress test the memory still went up.
@sugarme With the changes above, there is still a memory leak. Please help me solve it.
We use gotch for our online services, but we find that the server's RSS (that is, the memory usage indicator) keeps rising. I suspect a memory leak: the Go heap reported by pprof is only a few dozen MB, yet the RSS keeps growing. I tried many approaches and finally confirmed through valgrind that there was indeed a cgo memory leak.

Here is my test code:

Valgrind reports memory-leak information. The following is the command that was executed:

valgrind --leak-check=full ./model_test
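As background on why pprof and RSS disagree here (general cgo behaviour, not something established in this thread): tensors are allocated by libtorch's C/C++ allocator outside the Go heap, so Go-side profilers only see the small wrapper structs. Printing the Go heap next to the process RSS makes the gap visible; a minimal sketch using the standard runtime package:

// Sketch: HeapInuse covers only the Go heap, which is what pprof sees.
// cgo/libtorch allocations are not included, so process RSS can keep
// growing while this number stays flat.
var ms runtime.MemStats
runtime.ReadMemStats(&ms)
fmt.Printf("Go HeapInuse: %d MB\n", ms.HeapInuse/1024/1024)
// Compare with VmRSS from /proc/self/status (Linux) for the full picture.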