Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Py_NewInterpreterFromConfig end with hard crash #123488

Closed
aotto1968 opened this issue Aug 30, 2024 · 41 comments
Closed

Py_NewInterpreterFromConfig end with hard crash #123488

aotto1968 opened this issue Aug 30, 2024 · 41 comments
Labels
type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@aotto1968
Copy link

aotto1968 commented Aug 30, 2024

Crash report

What happened?

I started using Py_NewInterpreterFromConfig to add an PER-THREAD interpreter with my library.
The library is thread safe and supports many languages

  • C C++ JAVA C# VB.Net Perl Python GO TCL Ruby Php

including other languages with thread support

  • C C++ JAVA C# TCL GO

are working fine. Even python works fine without thread support.

Task: now I started to use thread support in python.
I had to change my code to to Multi-phase initialization, isolate the python data on an per-thread-level and finally get an easy thread example working -> ok

I test a client server application. the server create a thread in my library and this thread is initialized with

  if (create == MQ_FACTORY_NEW_THREAD) {
//mk_debug_color(MK_YELLOW,"%s","MQ_FACTORY_NEW_THREAD");
    static PyInterpreterConfig config = {
        .use_main_obmalloc = 0,
        .allow_fork = 0,
        .allow_exec = 0,
        .allow_threads = 1,
        .allow_daemon_threads = 0,
        .check_multi_interp_extensions = 1,
        .gil = PyInterpreterConfig_OWN_GIL,
    };

    PyThreadState *tstate = NULL;
    PyStatus status = Py_NewInterpreterFromConfig(&tstate, &config);
    MK_RT_REF.threadData = tstate;

//printV("#2 PyThreadState_Get<%p>", PyThreadState_Get())

    if (PyStatus_Exception(status)) {
      return MkErrorSetC_3_NULL(status.err_msg, status.func, 300+status.exitcode);
    }

    // mark new interpreter as "MQ_STARTUP_IS_THREAD"
    PyObject *mainM = PyImport_AddModule("__main__");
    PyObject *strO = PyUnicode_FromString("MQ_STARTUP_IS_THREAD");
    PyObject_SetAttrString(mainM,"__name__", strO);
    Py_CLEAR(strO);
    // source initial script
    FILE *FH = fopen(MqInitGetArg0()->data[1]->storage.first.C, "r");
    check_LNG(PyRun_SimpleFileEx(FH,MqInitGetArg0()->data[1]->storage.first.C,true)) {
      return MkErrorSetV_1E("script '%s' failed",MqInitGetArg0()->data[1]->storage.first.C);
    }
  }

I figure out that always after 4 successful simple round trip clinet → server → client the server crash with the message below

  • The stack-trace comes from my MqDisasterSignal handler.
exec[#6] -> 'NHI1_EXT/debug/bin/python3' 'NHI1_HOME/example/py/MyServer.py' '--thread' '--uds' '--file' '/tmp/test.uds'
Debug memory block at address p=0x7f0012d1a730: API '^@'
    18302063728033398269 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0x00 *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00 *** OUCH
        at p-4: 0x00 *** OUCH
        at p-3: 0x00 *** OUCH
        at p-2: 0x00 *** OUCH
        at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0xfdfe7cfe10cfa52d are X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }: BackTrace {
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ library              : filename                       : lineno ] function
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ -------              : --------                       : ------ ] --------
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ theLink              : c/sys_mq.c                     : 705    ] MqDisasterSignal
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ unknown              : unknown                        : 0      ] unknown
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 2408   ] _PyObject_DebugDumpAddress
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 2326   ] _PyMem_DebugCheckAddress
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 2159   ] _PyMem_DebugRawFree
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 685    ] PyMem_RawFree
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 1853   ] _PyObject_Free
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 2163   ] _PyMem_DebugRawFree
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 2296   ] _PyMem_DebugFree
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/obmalloc.c             : 830    ] PyObject_Free
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/dictobject.c           : 1569   ] dictresize
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/dictobject.c           : 1194   ] insertion_resize
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/dictobject.c           : 1261   ] insertdict
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/dictobject.c           : 1865   ] _PyDict_SetItem_Take2
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/dictobject.c           : 1883   ] PyDict_SetItem
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 7618   ] add_subclass
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 7352   ] type_ready_add_subclasses
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 7515   ] type_ready
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 7553   ] PyType_Ready
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 3795   ] type_new_impl
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 3929   ] type_new
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 1664   ] type_call
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/call.c                 : 240    ] _PyObject_MakeTpCall
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/typeobject.c           : 3949   ] type_vectorcall
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/call.c                 : 133    ] _PyObject_FastCallDictTstate
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/call.c                 : 157    ] PyObject_VectorcallDict
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/bltinmodule.c           : 208    ] builtin___build_class__
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/methodobject.c         : 438    ] cfunction_vectorcall_FASTCALL_KEYWORDS
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Include/internal/pycore_call.h : 92     ] _PyObject_VectorcallTstate
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Objects/call.c                 : 325    ] PyObject_Vectorcall
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/bytecodes.c             : 2714   ] _PyEval_EvalFrameDefault
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : nclude/internal/pycore_ceval.h : 89     ] _PyEval_EvalFrame
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/ceval.c                 : 1683   ] _PyEval_Vector
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/ceval.c                 : 578    ] PyEval_EvalCode
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/pythonrun.c             : 1722   ] run_eval_code_obj
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/pythonrun.c             : 1743   ] run_mod
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/pythonrun.c             : 1643   ] pyrun_file
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/pythonrun.c             : 433    ] _PyRun_SimpleFileObject
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ system               : Python/pythonrun.c             : 466    ] PyRun_SimpleFileExFlags
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ theLink              : py/MqFactoryC_py.c             : 195    ] py_mqmsgque_sFactoryCTor
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ theLink              : c/MqFactoryS_mq.c              : 289    ] MqFactoryInvoke_RT
X> {PRINT               :pid(66272):tid(0x7f000f479700):X:dlv(0):ctxId( 0):rc(1):ctx(0x1b87fb8     ):MqDisasterSignal              }:   [ theLink              : c/sys_mq.c                     : 269    ] MqSysServerThreadMain

The server code (file sourced into the new interpreter) is quite simple.

# Add a code block here, if required
import sys
from pymqmsgque import *

# package-item
class MyServer(MqContextC):
  
  # factory startup
  def __init__(self, tmpl=None):
    self.ConfigSetServerSetup(self.ServerSetup)
    super().__init__(tmpl)
    
  # service to serve all incoming requests for token "HLWO"
  def MyFirstService(self):
    self.SendSTART()
    self.SendSTR(self.ReadSTR() + " World")
    self.SendRETURN()
      
  # define a service as link between the token "HLWO" and the callback "MyFirstService"
  def ServerSetup(self):
    self.ServiceCreate("HLWO",self.MyFirstService)
  
# package-main
if __name__ == "__main__":

  # create the "MyServer" factory… and the object
  srv = MqFactoryC.Add(MyServer).New()

  try:
    srv.LinkCreate(sys.argv)
    srv.ProcessEvent(MqWaitOnEventE.FOREVER)
  except Exception as ex:
    srv.ErrorCatch(ex)
  finally:
    srv.Exit()

the problem is that running this example with a server with valgrind nothing happen (no crash)
the same example using fork or using spawn to startup the server instances works fine.
the problem is only with thread support.

additional Information: because my library only support one server-instance per thread, I choose PyInterpreterConfig_OWN_GIL to be clear that I don't want to have interaction between different python interpreters.
all the communication (between processes and threads) is done by my library. my library uses thread-local-storage to isolate the data per-thread.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.12.4 (main, Aug 29 2024, 18:00:55) [GCC 13.3.0]

@aotto1968 aotto1968 added the type-crash A hard crash of the interpreter, possibly with a core dump label Aug 30, 2024
@vstinner
Copy link
Member

cc @ericsnowcurrently

@ZeroIntensity
Copy link
Member

I wonder if this is related to #123134. However, have you called PyGILState_Ensure before calling Py_NewInterpreterFromConfig? According to the docs, you still need the GIL to create a subinterpreter.

@aotto1968
Copy link
Author

the docs say also…

Note that the PyGILState_* functions assume there is only one global interpreter (created automatically by Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple interpreters and the PyGILState_* API is unsupported.

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Aug 30, 2024

Right, mixing them is unsupported. You still need to have Python initialized, and then you can create subinterpreters with the main interpreter's GIL -- once the sub interpreter is initialized, it will have its own GIL.

@aotto1968
Copy link
Author

aotto1968 commented Aug 30, 2024

First success … If I use a "release" build, (non debug python) than valgrind report an error

==22661== Thread 2:
==22661== Invalid free() / delete / delete[] / realloc()
==22661==    at 0x483B18B: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==22661==    by 0x4A7C51A: _PyMem_RawFree (obmalloc.c:73)
==22661==    by 0x4A7C51A: PyMem_RawFree (obmalloc.c:685)
==22661==    by 0x4A7C51A: _PyObject_Free (obmalloc.c:1853)
==22661==    by 0x4A7C51A: PyObject_Free (obmalloc.c:830)
==22661==    by 0x4A729F7: dictresize (dictobject.c:1569)
==22661==    by 0x4A721EA: insertion_resize (dictobject.c:1194)
==22661==    by 0x4A721EA: insertdict (dictobject.c:1261)
==22661==    by 0x4AFC275: add_subclass (typeobject.c:7618)
==22661==    by 0x4AFA491: type_ready_add_subclasses (typeobject.c:7352)
==22661==    by 0x4AFA491: type_ready (typeobject.c:7515)
==22661==    by 0x4AFD1B9: type_new_impl (typeobject.c:3795)
==22661==    by 0x4AFD1B9: type_new (typeobject.c:3929)
==22661==    by 0x4A838BD: type_call (typeobject.c:1664)
==22661==    by 0x4A575BA: _PyObject_MakeTpCall (call.c:240)
==22661==    by 0x4A57882: _PyObject_FastCallDictTstate (call.c:133)
==22661==    by 0x4A57882: PyObject_VectorcallDict (call.c:157)
==22661==    by 0x4B21EE4: builtin___build_class__ (bltinmodule.c:208)
==22661==    by 0x4A783AE: cfunction_vectorcall_FASTCALL_KEYWORDS (methodobject.c:438)
==22661==    by 0x4A57D2E: _PyObject_VectorcallTstate (pycore_call.h:92)
==22661==    by 0x4A57D2E: PyObject_Vectorcall (call.c:325)
==22661==    by 0x49D573B: _PyEval_EvalFrameDefault.cold (bytecodes.c:2714)
==22661==    by 0x4B236B1: PyEval_EvalCode (ceval.c:578)
==22661==    by 0x4B430C6: run_eval_code_obj (pythonrun.c:1722)
==22661==    by 0x4B4303A: run_mod (pythonrun.c:1743)
==22661==    by 0x4B437C1: pyrun_file (pythonrun.c:1643)
==22661==    by 0x4B433AE: _PyRun_SimpleFileObject (pythonrun.c:433)
==22661==    by 0x4A00EE2: PyRun_SimpleFileExFlags (pythonrun.c:466)
==22661==    by 0x58BD664: py_mqmsgque_sFactoryCTor (MqFactoryC_py.c:195)
==22661==    by 0x593D9EF: MqFactoryInvoke_RT (MqFactoryS_mq.c:289)
==22661==    by 0x595CE22: MqSysServerThreadMain (sys_mq.c:269)
==22661==    by 0x595CE22: MqSysServerThreadMain (sys_mq.c:219)
==22661==    by 0x595D6F8: sSysServerThreadInit (sys_mq.c:324)
==22661==    by 0x4ECA6E9: start_thread (in /lib64/libpthread-2.31.so)
==22661==    by 0x515358E: clone (in /lib64/libc-2.31.so)
==22661==  Address 0x601f7d0 is in a rw- anonymous segment

@aotto1968
Copy link
Author

aotto1968 commented Aug 30, 2024

I have an initialized main python interpreter which loads my library and in the library a thread is created etc.. this is the reason why it is workc but only for 4 instances :-(

Task → my current research is to make the type itself thread-local I mean the new type I define in my extension but now the thread startup does not understand the type template (because the original type is from the main-thread and this type has no meaning in the new thread. :-(

> Nhi1Exec -r=uds MyServer.py --thread
exec[#6] -> 'NHI1_EXT/debug/bin/python3' 'NHI1_HOME/example/py/MyServer.py' '--thread' '--uds' '--file' '/tmp/test.uds'
X> {MqFactoryC_py.c     :pid(32901):tid(0x7f975a3e3700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x7f9759b99950>
X> {MkErrorC            :pid(32901):tid(0x7f975a3e3700):X:dlv(0):ctxId( 0):rc(1):ctx(0x2421b10     ):MkErrorPrintln_RT             }:
BACKGROUND ERROR >>>
X> (MkErrorC)       [WrongTypeOfArgError] for argument 'tmpl'...expect type 'pymqmsgque.MqContextC'...but got 'MyServer'
X> (MkErrorC)         | 211   | mk_misc_check_lng.h  | py_mkkernel_CheckClass
X> (MkErrorC)         | 3568  | MqContextC_py.c      | MqContextC MqContextC.new(?tmpl:MqContextC=None?)
X> (MkErrorC)         | 258   | MqFactoryC_py.c      | py_mqmsgque_sFactoryCTor
X> (MkErrorC)         | 327   | MqFactoryS_mq.c      | MqFactoryInvoke_RT
BACKGROUND ERROR <<<

The "MyServer" is the initial server from the main thread

import sys
from pymqmsgque import *
#import gc

#gc.set_debug(gc.DEBUG_LEAK)

# package-item
class MyServer(MqContextC):

  # factory startup
  def __init__(self, tmpl=None):
    self.ConfigSetServerSetup(self.ServerSetup)
    super().__init__(tmpl)

  # service to serve all incoming requests for token "HLWO"
  def MyFirstService(self):
    self.SendSTART()
    self.SendSTR(self.ReadSTR() + " World")
    self.SendRETURN()

  # define a service as link between the token "HLWO" and the callback "MyFirstService"
  def ServerSetup(self):
    self.ServiceCreate("HLWO",self.MyFirstService)

# package-main
if __name__ == "__main__":

  # create the "MyServer" factory… and the object
  srv = MqFactoryC.Add(MyServer).New()

  try:
    srv.LinkCreate(sys.argv)
    srv.ProcessEvent(MqWaitOnEventE.FOREVER)
  except Exception as ex:
    srv.ErrorCatch(ex)
  finally:
    srv.Exit()

@ZeroIntensity
Copy link
Member

Yeah, this looks like it's probably a duplicate of #123134. I haven't reproduced this on Linux yet, but on Windows, there seems to be a thread safety issue inside the allocator when using a per-interpreter GIL. I'll investigate this further later today. In the meantime, try adding some synchronization between your threads to prevent them from calling Py_NewInterpreterFromConfig concurrently.

@aotto1968
Copy link
Author

even if I do a one thread per step my python is failing after the fourth thread was created (and closed) → there is no "concurrency" right now

@ZeroIntensity
Copy link
Member

You'll have to share more of the C code, but that's probably just a side-effect of a prior memory error.

@aotto1968
Copy link
Author

question : do you use a TYPE per THREAD or a TYPE per process , I mean the C-Python-Type definition

I add the macro OT_CLASS_TYPE_REF this is __thread PyTypeObject or just PyTypeObject
→ I test this setup.

OT_CLASS_TYPE_REF NS(MqContextCR) = {
  PyVarObject_HEAD_INIT(NULL, 0)
  .tp_name        = "pymqmsgque.MqContextC",
  .tp_basicsize   = sizeof(MqContextC_Obj),
  .tp_itemsize    = 0,
  .tp_new         = (newfunc)     NS(MqContextC_new),
  .tp_doc         = MqContextC_Doc,
  .tp_methods     = NS(MqContextC_Methods),
  .tp_str         = (reprfunc)    NS(sMqContextC_Str),
  .tp_repr        = (reprfunc)    NS(sMqContextC_Repr),
  .tp_flags       = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
};

@aotto1968
Copy link
Author

new test case with parallel access . looks like the "race" thing

exec[#6] -> 'NHI1_EXT/debug/bin/python3' 'NHI1_HOME/example/py/perfserver.py' '--thread' '--uds' '--file' '/tmp/test.uds'
X> {MqFactoryC_py.c     :pid(72897):tid(0x7f85a1251700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x7f85a0a07950>
X> {MqFactoryC_py.c     :pid(72897):tid(0x7f859fc63700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x7f859daf2950>
X> {MqFactoryC_py.c     :pid(72897):tid(0x7f859f462700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x7f859d8d4950>
X> {MqFactoryC_py.c     :pid(72897):tid(0x7f859e385700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x7f859da94950>
X> {MqFactoryC_py.c     :pid(72897):tid(0x7f859ebcf700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x7f859d554950>
X> {MqFactoryC_py.c     :pid(72897):tid(0x7f85a0464700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x7f859d714950>
X> {MqContextC_py.c     :pid(72897):tid(0x7f859fc63700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sThreadExit       }: thread EXIT, current state<0x7f859daf2950>, rt state<0x7f859daf2950>
X> {MqFactoryC_py.c     :pid(72897):tid(0x7f859c19a700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: MQ_FACTORY_NEW_THREAD -> thread<0x180c3b0>
Debug memory block at address p=0x7f85a16431b0: API ''
    18302063728033398269 bytes originally requested
    The 7 pad bytes at p-7 are Debug memory block at address p=0x7f85a16431b0: API ''
not all FORBIDDENBYTE (0xfd):
        at p-7: 0x00    18302063728033398269 bytes originally requested
 *** OUCH    The 7 pad bytes at p-7 are
        at p-6: 0x00not all FORBIDDENBYTE (0xfd):
        at p-7: 0x00 *** OUCH
        at p-5: 0x00 *** OUCH
 *** OUCH
        at p-6: 0x00        at p-4: 0x00 *** OUCH
 *** OUCH
        at p-5: 0x00 *** OUCH
        at p-3: 0x00        at p-4: 0x00 *** OUCH
 *** OUCH        at p-3: 0x00 *** OUCH
        at p-2: 0x00
        at p-2: 0x00 *** OUCH
 *** OUCH
        at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
        at p-1: 0x00 *** OUCH
    The 8 pad bytes at tail=0xfdfe7d839f622fad are     Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0xfdfe7d839f622fad are X> {PRINT 

@aotto1968
Copy link
Author

aotto1968 commented Aug 31, 2024

OK - now a success news (with smell)

In the docu about the PyInterpreterConfig_OWN_GIL (A Per-Interpreter GIL) there was a paragraph about isolation ...

Using an isolated interpreter requires vigilance in preserving that isolation. That especially means not sharing any objects
or mutable state without guarantees about thread-safety. Even objects that are otherwise immutable (e.g. None, (1, 5)) 
can’t normally be shared because of the refcount. One simple but less-efficient approach around this is to use a global lock
around all use of some state (or object). Alternately, effectively immutable objects (like integers or strings) can be made safe
in spite of their refcounts by making them “immortal”. In fact, this has been done for the builtin singletons, small integers,
and a number of other built in objects.

and my first question was , does the type object have to be isolated as well?

I started a research about thread local type object

switching to thread-local-types the interpreter does not crash anymore BUT ...

  1. this thread-local-type will probably break old python thread code which sill assume one-type-per-interpreter even if multiple threads per interpreter are supported.

analysis

  1. using my simple example from above works fine.
  2. the concurrency problem is still real
    2.1. tests done with my performance test tool → parallel creating of a per-thread-session
    2.2. test up to 3 workers → everything is OK
    2.3. test 4 or more workers → a race crash happen…
  3. the performance of starting a session via thread versa starting a session vie spawn is almost identical !!
    3.1. there is no (performance)-reason to use a thread at all in python.
    3.2. the startup-speed for spawn depends on the speed of the filesystem, a 10 times difference between a mounted file system and a local filesystem is possible. the info from 3.1. was related to an local filesystem.

race crash

Program terminated with signal SIGABRT, Aborted.
#0  0x00007fc0e3999d2b in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7fc0e0a19700 (LWP 41435))]
#0  0x00007fc0e3999d2b in raise () from /lib64/libc.so.6
#1  0x00007fc0e399b3e5 in abort () from /lib64/libc.so.6
#2  0x00007fc0e39dfc87 in __libc_message () from /lib64/libc.so.6
#3  0x00007fc0e39e7d2a in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fc0e39ebde6 in free_check () from /lib64/libc.so.6
#5  0x00007fc0e3ebe832 in _PyMem_RawFree (_unused_ctx=<optimized out>, ptr=<optimized out>) at Objects/obmalloc.c:73
#6  0x00007fc0e3ec047f in PyMem_RawFree (ptr=ptr@entry=0x25c2bb0) at Objects/obmalloc.c:685
#7  0x00007fc0e3fe3808 in PyThread_free_lock (lock=0x25c2bb0) at Python/thread_pthread.h:418
#8  0x00007fc0e404dee2 in rlock_dealloc (self=0x7fc0df702870) at ./Modules/_threadmodule.c:348
#9  0x00007fc0e3ebb83c in _Py_Dealloc (op=op@entry=0x7fc0df702870) at Objects/object.c:2625
#10 0x00007fc0e3ea538b in Py_DECREF (filename=filename@entry=0x7fc0e4065c24 "./Include/object.h", lineno=lineno@entry=798, op=0x7fc0df702870) at ./Include/object.h:690
#11 0x00007fc0e3ea53aa in Py_XDECREF (op=<optimized out>) at ./Include/object.h:798
#12 0x00007fc0e3eaac58 in _PyObject_FreeInstanceAttributes (self=self@entry=0x7fc0df715910) at Objects/dictobject.c:5571
#13 0x00007fc0e3eda1eb in subtype_dealloc (self=0x7fc0df715910) at Objects/typeobject.c:2020
#14 0x00007fc0e3ebb83c in _Py_Dealloc (op=op@entry=0x7fc0df715910) at Objects/object.c:2625
#15 0x00007fc0e3ea538b in Py_DECREF (filename=filename@entry=0x7fc0e4065c24 "./Include/object.h", lineno=lineno@entry=798, op=0x7fc0df715910) at ./Include/object.h:690
#16 0x00007fc0e3ea53aa in Py_XDECREF (op=<optimized out>) at ./Include/object.h:798
#17 0x00007fc0e3eaac58 in _PyObject_FreeInstanceAttributes (self=self@entry=0x7fc0df7158c0) at Objects/dictobject.c:5571
#18 0x00007fc0e3eda1eb in subtype_dealloc (self=0x7fc0df7158c0) at Objects/typeobject.c:2020
#19 0x00007fc0e3ebb83c in _Py_Dealloc (op=op@entry=0x7fc0df7158c0) at Objects/object.c:2625
#20 0x00007fc0e3e6f549 in Py_DECREF (filename=filename@entry=0x7fc0e4065c24 "./Include/object.h", lineno=lineno@entry=798, op=0x7fc0df7158c0) at ./Include/object.h:690
#21 0x00007fc0e3e6f568 in Py_XDECREF (op=<optimized out>) at ./Include/object.h:798
#22 0x00007fc0e3e6f67d in method_dealloc (im=0x7fc0df7029f0) at Objects/classobject.c:240
#23 0x00007fc0e3ebb83c in _Py_Dealloc (op=op@entry=0x7fc0df7029f0) at Objects/object.c:2625
#24 0x00007fc0e3f5a22b in Py_DECREF (filename=filename@entry=0x7fc0e4073f78 "Python/bytecodes.c", lineno=lineno@entry=2737, op=op@entry=0x7fc0df7029f0) at ./Include/object.h:690
#25 0x00007fc0e3f6d264 in _PyEval_EvalFrameDefault (tstate=0x7fc0dfc67950, frame=0x7fc0df677398, throwflag=0) at Python/bytecodes.c:2737
#26 0x00007fc0e3f73246 in _PyEval_EvalFrame (tstate=tstate@entry=0x7fc0dfc67950, frame=<optimized out>, throwflag=throwflag@entry=0) at ./Include/internal/pycore_ceval.h:89
#27 0x00007fc0e3f7337b in _PyEval_Vector (tstate=0x7fc0dfc67950, func=func@entry=0x7fc0df688f50, locals=locals@entry=0x0, args=args@entry=0x7fc0e0a17cf0, argcount=argcount@entry=2, kwnames=kwnames@entry=0x0) at Python/ceval.c:1683
#28 0x00007fc0e3e6bddf in _PyFunction_Vectorcall (func=0x7fc0df688f50, stack=0x7fc0e0a17cf0, nargsf=<optimized out>, kwnames=0x0) at Objects/call.c:419
#29 0x00007fc0e3e6c14e in _PyObject_VectorcallTstate (tstate=tstate@entry=0x7fc0dfc67950, callable=callable@entry=0x7fc0df688f50, args=args@entry=0x7fc0e0a17cf0, nargsf=nargsf@entry=2, kwnames=kwnames@entry=0x0) at ./Include/internal/pycore_call.h:92
#30 0x00007fc0e3e6d04c in object_vacall (tstate=tstate@entry=0x7fc0dfc67950, base=base@entry=0x0, callable=0x7fc0df688f50, vargs=vargs@entry=0x7fc0e0a17d60) at Objects/call.c:850
#31 0x00007fc0e3e6d16d in PyObject_CallMethodObjArgs (obj=0x0, name=<optimized out>) at Objects/call.c:911
#32 0x00007fc0e3fa4f18 in import_find_and_load (tstate=tstate@entry=0x7fc0dfc67950, abs_name=abs_name@entry=0x7fc0e433f160 <const_str_abc>) at Python/import.c:2779
#33 0x00007fc0e3fa7d37 in PyImport_ImportModuleLevelObject (name=name@entry=0x7fc0e433f160 <const_str_abc>, globals=<optimized out>, locals=locals@entry=0x7fc0df6db650, fromlist=fromlist@entry=0x7fc0e42a2460 <_Py_NoneStruct>, level=level@entry=0) at Python/import.c:2862
#34 0x00007fc0e3f5bf7b in import_name (tstate=tstate@entry=0x7fc0dfc67950, frame=frame@entry=0x7fc0df677320, name=0x7fc0e433f160 <const_str_abc>, fromlist=fromlist@entry=0x7fc0e42a2460 <_Py_NoneStruct>, level=level@entry=0x7fc0e438de88 <_PyRuntime+19272>) at Python/ceval.c:2482
#35 0x00007fc0e3f6a90b in _PyEval_EvalFrameDefault (tstate=0x7fc0dfc67950, frame=<optimized out>, throwflag=0) at Python/bytecodes.c:2143
#36 0x00007fc0e3f73246 in _PyEval_EvalFrame (tstate=tstate@entry=0x7fc0dfc67950, frame=<optimized out>, throwflag=throwflag@entry=0) at ./Include/internal/pycore_ceval.h:89
#37 0x00007fc0e3f7337b in _PyEval_Vector (tstate=tstate@entry=0x7fc0dfc67950, func=func@entry=0x7fc0df705c10, locals=locals@entry=0x7fc0df6db650, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at Python/ceval.c:1683
#38 0x00007fc0e3f73433 in PyEval_EvalCode (co=co@entry=0x25f7340, globals=globals@entry=0x7fc0df6db650, locals=locals@entry=0x7fc0df6db650) at Python/ceval.c:578
#39 0x00007fc0e3f58686 in builtin_exec_impl (module=module@entry=0x7fc0dfbbf9b0, source=0x25f7340, globals=0x7fc0df6db650, locals=0x7fc0df6db650, closure=0x0) at Python/bltinmodule.c:1096
#40 0x00007fc0e3f58777 in builtin_exec (module=0x7fc0dfbbf9b0, args=<optimized out>, args@entry=0x7fc0df715878, nargs=nargs@entry=2, kwnames=kwnames@entry=0x0) at Python/clinic/bltinmodule.c.h:586
#41 0x00007fc0e3eb72b2 in cfunction_vectorcall_FASTCALL_KEYWORDS (func=0x7fc0dfbc8050, args=0x7fc0df715878, nargsf=<optimized out>, kwnames=0x0) at Objects/methodobject.c:438
#42 0x00007fc0e3e6d9fa in _PyVectorcall_Call (tstate=tstate@entry=0x7fc0dfc67950, func=0x7fc0e3eb7251 <cfunction_vectorcall_FASTCALL_KEYWORDS>, callable=callable@entry=0x7fc0dfbc8050, tuple=tuple@entry=0x7fc0df715860, kwargs=kwargs@entry=0x7fc0df702750) at Objects/call.c:271

@ZeroIntensity
Copy link
Member

After playing around with this, I was unable to reproduce the data race. Though, I did encounter a deadlock when trying to call Py_NewInterpreterFromConfig without the GIL and the proper Py_BEGIN_ALLOW_THREADS section -- are those present in your code? If not, then this is a user error!

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

Just to be clear what the flow is:

  1. I start python running my server.py
  2. server.py load an external library
  3. external library create new thread from C
  4. new thread start Py_NewInterpreterFromConfig with PyInterpreterConfig_OWN_GIL

If I do…

  if (create == MQ_FACTORY_NEW_THREAD) {
    PyThreadState *threadstate = PyEval_SaveThread();
    PyGILState_STATE gilstate = PyGILState_Ensure();

    static PyInterpreterConfig config = {
        .use_main_obmalloc = 0,
        .allow_fork = 0,
        .allow_exec = 0,
        .allow_threads = 1,
        .allow_daemon_threads = 0,
        .check_multi_interp_extensions = 1,
        .gil = PyInterpreterConfig_OWN_GIL,
    };

    PyThreadState *tstate = NULL;
    PyStatus status = Py_NewInterpreterFromConfig(&tstate, &config);
    MK_RT_REF.threadData = tstate;
...

than I get :

Fatal Python error: PyEval_SaveThread: the function must be called with the GIL held, after Python initialization and before Python finalization, but the GIL is released (the current Python thread state is NULL)
Python runtime state: initialized

I think Py_NewInterpreterFromConfig→PyInterpreterConfig_OWN_GIL create and set threadstate and gil.

If I skip PyEval_SaveThread and PyGILState_Ensure everything works for easy tests BUT In a load test I get after a while

Debug memory block at address p=0x483a870: API ''
    14988823984819142663 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0x0d *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00 *** OUCH
        at p-4: 0x00 *** OUCH
        at p-3: 0x00 *** OUCH
        at p-2: 0x00 *** OUCH
        at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0xd00300000483a877 are 

@ZeroIntensity
Copy link
Member

The error is this line: PyThreadState *threadstate = PyEval_SaveThread()

PyEval_SaveThread fails because there is no thread state yet! You need to move that to after the call to PyGILState_Ensure

@ZeroIntensity
Copy link
Member

(FWIW, the call to PyEval_SaveThread isn't needed here, you would use that to release the GIL, which you don't want to do.)

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

If I skip the PyEval_SaveThread and just do the PyGILState_Ensure the library is blocking at PyGILState_Ensure

  if (create == MQ_FACTORY_NEW_THREAD) {
M0
    PyGILState_STATE gilstate = PyGILState_Ensure();
    PyThreadState *threadstate = PyEval_SaveThread();

    static PyInterpreterConfig config = {
        .use_main_obmalloc = 0,
        .allow_fork = 0,
        .allow_exec = 0,
        .allow_threads = 1,
        .allow_daemon_threads = 0,
        .check_multi_interp_extensions = 1,
        .gil = PyInterpreterConfig_OWN_GIL,
    };

    PyThreadState *tstate = NULL;
    PyStatus status = Py_NewInterpreterFromConfig(&tstate, &config);
    MK_RT_REF.threadData = tstate;
...

block

In the "Non-Python created threads" docu they say:

Note that the PyGILState_* functions assume there is only **one** global interpreter (created automatically by
Py_Initialize()). Python supports the creation of additional interpreters (using Py_NewInterpreter()), but mixing multiple
interpreters and the PyGILState_* API is unsupported.

@aotto1968
Copy link
Author

Question before I delete a thread I call Py_EndInterpreter with the thread state returned from Py_NewInterpreterFromConfig

→ is this ok?

@ZeroIntensity
Copy link
Member

Yeah, that's what you're supposed to do. I'm guessing we need to document subinterpreters a little better 😅

@aotto1968
Copy link
Author

Why I need to to a PyGILState_Ensure if

  1. the thread is new and a non python thread
  2. the Py_NewInterpreterFromConfig→PyInterpreterConfig_OWN_GIL create its own GIl etc?

@ZeroIntensity
Copy link
Member

You need to call it because it's a non-Python thread. Py_NewInterpreterFromConfig will create its own GIL for the new interpreter, but it needs the GIL of the current interpreter to do that.

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

and why the PyGILState_Ensure is blocking?
basically the thread is created in a callback → does I need to do a PyGILState_Release befor I create a thread in my library?

from the docu:

Releasing the GIL from extension code
Most extension code manipulating the GIL has the following simple structure:

Save the thread state in a local variable.
Release the global interpreter lock.                 <<<<<<<<<<< !! release !!
... Do some blocking I/O operation ...
Reacquire the global interpreter lock.
Restore the thread state from the local variable.

@ZeroIntensity
Copy link
Member

You need to release the GIL in the main thread, in a Py_BEGIN_ALLOW_THREADS block.

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

so I changed my thread creating function

static typeof(MqLal.SysServerThread) save_SysServerThread = NULL;
static int num = 0;

static enum MkErrorE NS(PyServerThread) (
  MK_RT_ARGS
  MQ_CTX const context,
  struct MqSysServerThreadMainS * const argP,
  MK_STRN  name,    
  int thread_status,    
  struct MkIdS * idP  
) {
  enum MkErrorE ret;
printI(num);
  PyGILState_STATE gilstate = PyGILState_Ensure();
  Py_BEGIN_ALLOW_THREADS
  ret = (*save_SysServerThread)(MK_RT_CALL context, argP, name, thread_status, idP);
  Py_END_ALLOW_THREADS
  PyGILState_Release(gilstate);
  return ret;
}

...

OT_CLASS_TYPE_REF NS(MqContextCR) = {
  PyVarObject_HEAD_INIT(NULL, 0)
  .tp_name        = "pymqmsgque.MqContextC",
  .tp_basicsize   = sizeof(MqContextC_Obj),
  .tp_itemsize    = 0,
  .tp_new         = (newfunc)     NS(MqContextC_new),
  .tp_doc         = MqContextC_Doc,
  .tp_methods     = NS(MqContextC_Methods),
  .tp_str         = (reprfunc)    NS(sMqContextC_Str),
  .tp_repr        = (reprfunc)    NS(sMqContextC_Repr),
  .tp_flags       = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
  .tp_base        = NULL
};

int
NS(pMqContextC_Init)(MQ_RT_ARGS OT_LNG_T m) {

  // define type
  MqContextCTT = MkTypeDup2(MqContextC_TT,"pymqmsgque_MqContextC");
  MqContextCT->fIncrSelf = NS(sIncrSelf);
  MqContextCT->fDecrSelf = NS(sDecrSelf);
//  >> replaced by os_fork
//  MqContextCT->fParentBeforeFork = NS(sParentBeforeFork);
//  MqContextCT->fParentAfterFork = NS(sParentAfterFork);
//  MqContextCT->fChildAfterFork = NS(sChildAfterFork);
  MqContextCT->fProcessExit = NS(sProcessExit);
  MqContextCT->fThreadExit = NS(sThreadExit);
  //MqContextCT->ignoreThread = true;

  ClassInit

  save_SysServerThread = MqLal.SysServerThread;
  MqLal.SysServerThread = NS(PyServerThread);

  return 0;
error:
  return -1;
}

and the result is still blocking :

> Nhi1Exec -r=uds perfserver.py --thread --debug 3
exec[#8] -> 'NHI1_EXT/debug/bin/python3' 'NHI1_HOME/example/py/perfserver.py' '--thread' '--debug' '3' '--uds' '--file' '/tmp/test.uds'
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):pLinkPrepare                  }: PREPARE: factory<perfserver>, io->com<UDS>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):MqLinkCreate_RT               }: START: io->com<UDS>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):UdsServer                     }: create UNIX uds socket file</tmp/test.uds>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):GenericServer                 }: server accept incoming calls
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):GenericServer                 }: server get incoming call
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):pIoStartServer                }: MQ_START_SERVER_AS_THREAD
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(0):ctxId( 0):rc(2):ctx(0x204a980     ):pIoStartServer                }: FACTORY[perfserver]
...
X> {MqContextC_py.c     :pid(55814):tid(0x7fb72957fb80):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_PyServerThread    }: num<1>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):GenericServer                 }: server accept incoming calls
X> {MqFactoryC_py.c     :pid(55814):tid(0x7fb728ccd700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: 11111111111111111 [k/py/MqFactoryC_py.c:171]

@ZeroIntensity
Copy link
Member

Yeah, you're still calling PyEval_SaveThread without the GIL. Did you move it to after the PyGILState_Ensure call?

@aotto1968
Copy link
Author

already changed.. I update the code.

@ZeroIntensity
Copy link
Member

Oh wait, I see the problem. Py_BEGIN_ALLOW_THREADS makes a call to PyEval_SaveThread. You need to call it from the main thread, not a created one. I suggest taking a look at the docs.

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

I call the Py_BEGIN_ALLOW_THREADS on the main thread because there are no other threads before the PyServerThread → just look to the tid column and the py_mqmsgque_PyServerThread

the py_mqmsgque_sFactoryCTor is still blocking…

  if (create == MQ_FACTORY_NEW_THREAD) {
M1
     PyGILState_STATE gilstate = PyGILState_Ensure();
M2
> Nhi1Exec -r=uds perfserver.py --thread --debug 3
exec[#8] -> 'NHI1_EXT/debug/bin/python3' 'NHI1_HOME/example/py/perfserver.py' '--thread' '--debug' '3' '--uds' '--file' '/tmp/test.uds'             
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):pLinkPrepare                  }: PREPARE: factory<perfserver>, io->com<UDS>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):MqLinkCreate_RT               }: START: io->com<UDS>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):UdsServer                     }: create UNIX uds socket file</tmp/test.uds>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):GenericServer                 }: server accept incoming calls 
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):GenericServer                 }: server get incoming call
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):pIoStartServer                }: MQ_START_SERVER_AS_THREAD
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(0):ctxId( 0):rc(2):ctx(0x204a980     ):pIoStartServer                }: FACTORY[perfserver]
...
X> {MqContextC_py.c     :pid(55814):tid(0x7fb72957fb80):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_PyServerThread    }: num<1>
S> {perfserver          :pid(55814):tid(0x7fb72957fb80):L:dlv(2):ctxId( 0):rc(2):ctx(0x204a980     ):GenericServer                 }: server accept incoming calls
X> {MqFactoryC_py.c     :pid(55814):tid(0x7fb728ccd700):X:dlv(0):ctxId( 0):rc(0):ctx((nil)         ):py_mqmsgque_sFactoryCTor      }: 11111111111111111 [k/py/MqFactoryC_py.c:171]

@ZeroIntensity
Copy link
Member

It's difficult for me to follow the code due to the use of macros, but if it's truly the main thread, then I think the problem is the PyGILState_* calls -- they aren't necessary to call Py_BEGIN_ALLOW_THREADS. Releasing the GIL state in this case might prevent any future API calls (such as Py_Finalize). Though, I could be wrong on that, the GIL state APIs might have a mechanism to deal with that.

Regardless, I don't think this is a CPython issue :(

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

Found something special under "load" !! → Q: how to check if "ts" is still valid?

The main problem is that every thread is a server that can terminate itself. So if a server terminates between PyThreadState_Swap#1 and PyThreadState_Swap#2 and that server "happens" to be returned as ts by the first PyThreadState_Swap#1, then it just CRASHes.

  1. this is probably the reason why it works without load and with load it crashes because a PyThreadState disappears.
  2. just to be clear that python has no control when a thread is deleted
  3. the global variable PyThreadState saved as local variable ts can not follow an external thread delete.
MK_UNUSED static void NS(sThreadExit) (MK_RT_ARGS int num) {
mk_debug_color(MK_RED,"%s, current state<%p>, rt state<%p>", "thread EXIT", PyThreadState_Get(), MK_RT_REF.threadData);

  PyThreadState* ts = PyThreadState_Swap(MK_RT_REF.threadData);

printV("ts start: %p → %p", ts, MK_RT_REF.threadData);
  Py_EndInterpreter(MK_RT_REF.threadData);
printV("ts end  : %p → %p", MK_RT_REF.threadData, ts);

  // CRASH because of "ts" disappeared between "PyThreadState_Swap#1" and "PyThreadState_Swap#2"
  // Q: how to check if "ts" is still valid?
  PyThreadState_Swap(ts);

  MK_RT_REF.threadData = NULL;
}

If I switch to Py_BEGIN_ALLOW_THREADS instead of PyThreadState_Swap the Py_EndInterpreter(interpreter_ts)
fails with Fatal Python error: Py_EndInterpreter: thread is not current

update

even if I disable the Py_EndInterpreter and the pthread_exit the python crash with
Debug memory block at address ... FORBIDDENBYTE

@ZeroIntensity
Copy link
Member

AFAIK, Py_EndInterpreter will deallocate the thread state, so trying to use ts past that point is invalid. Why are you calling PyThreadState_Swap instead of just using the actual thread state for the interpreter?

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

the problem is that this is just an example → this lost ts could happen everywhere in the code because of the PyThreadState_Swap api design.

> grep -r --include=*.[ch] PyThreadState_Swap
Include/internal/pycore_ceval.h:extern PyThreadState * _PyThreadState_SwapNoGIL(PyThreadState *);
Include/internal/pycore_pystate.h:PyAPI_FUNC(PyThreadState *) _PyThreadState_Swap(
Include/pystate.h:PyAPI_FUNC(PyThreadState *) PyThreadState_Swap(PyThreadState *);
Modules/_xxsubinterpretersmodule.c:        save_tstate = PyThreadState_Swap(tstate);
Modules/_xxsubinterpretersmodule.c:        PyThreadState_Swap(save_tstate);
Modules/_xxsubinterpretersmodule.c:    PyThreadState_Swap(save_tstate);
Modules/_xxsubinterpretersmodule.c:        save_tstate = PyThreadState_Swap(tstate);
Modules/_xxsubinterpretersmodule.c:        PyThreadState_Swap(save_tstate);
Modules/_xxsubinterpretersmodule.c:    PyThreadState *save_tstate = PyThreadState_Swap(tstate);
Modules/_xxsubinterpretersmodule.c:    PyThreadState_Swap(save_tstate);
Modules/_testcapimodule.c:    PyThreadState_Swap(NULL);
Modules/_testcapimodule.c:    PyThreadState_Swap(orig_tstate);
Modules/_testcapimodule.c:    PyThreadState_Swap(NULL);
Modules/_testcapimodule.c:        PyThreadState_Swap(mainstate);
Modules/_testcapimodule.c:    PyThreadState_Swap(mainstate);
Modules/_testcapimodule.c:    PyThreadState_Swap(NULL);
Modules/_testcapimodule.c:        PyThreadState_Swap(mainstate);
Modules/_testcapimodule.c:    PyThreadState_Swap(mainstate);
Modules/_testcapimodule.c:    PyThreadState *oldts = PyThreadState_Swap(NULL);
Modules/_testcapimodule.c:    PyThreadState_Swap(oldts);
PC/python3dll.c:EXPORT_FUNC(PyThreadState_Swap)
Programs/_testembed.c:        PyThreadState_Swap(NULL);
Programs/_testembed.c:        PyThreadState_Swap(mainstate);
Python/getargs.c:            save_tstate = PyThreadState_Swap(temp_tstate);
Python/getargs.c:            (void)PyThreadState_Swap(save_tstate);
Python/pylifecycle.c:    (void) _PyThreadState_SwapNoGIL(tstate);
Python/pylifecycle.c:    PyThreadState *save_tstate = _PyThreadState_SwapNoGIL(tstate);
Python/pylifecycle.c:        PyThreadState_Swap(save_tstate);
Python/pylifecycle.c:        _PyThreadState_SwapNoGIL(save_tstate);
Python/pystate.c:   The stored thread state is set by PyThreadState_Swap().
Python/pystate.c:    PyThreadState *tstate = _PyThreadState_Swap(runtime, NULL);
Python/pystate.c:    _PyThreadState_Swap(runtime, tstate);
Python/pystate.c:        PyThreadState *save_tstate = _PyThreadState_Swap(runtime, tstate);
Python/pystate.c:        _PyThreadState_Swap(runtime, save_tstate);
Python/pystate.c:_PyThreadState_SwapNoGIL(PyThreadState *newts)
Python/pystate.c:_PyThreadState_Swap(_PyRuntimeState *runtime, PyThreadState *newts)
Python/pystate.c:PyThreadState_Swap(PyThreadState *newts)
Python/pystate.c:    return _PyThreadState_Swap(&_PyRuntime, newts);
Python/ceval_gil.c:           under our feet using PyThreadState_Swap(). Fix the GIL last
Python/ceval_gil.c:    if (_PyThreadState_SwapNoGIL(tstate) != NULL) {
Python/ceval_gil.c:    PyThreadState *new_tstate = _PyThreadState_SwapNoGIL(NULL);
Python/ceval_gil.c:    PyThreadState *tstate = _PyThreadState_SwapNoGIL(NULL);
Python/ceval_gil.c:    _PyThreadState_SwapNoGIL(tstate);
Python/ceval_gil.c:        if (_PyThreadState_SwapNoGIL(NULL) != tstate) {
Python/ceval_gil.c:        if (_PyThreadState_SwapNoGIL(tstate) != NULL) {

@ZeroIntensity
Copy link
Member

Ok, but it's not obvious what the clear "fix" would be here. PyThreadState_Swap invalidates the old thread state (which is why your example fails), we can't change that.

@aotto1968
Copy link
Author

I just testing around the ts is the thread-state before the thread-exit is called →it is NOt the thread stata I want delete bot could be.

@ZeroIntensity
Copy link
Member

ZeroIntensity commented Sep 1, 2024

In the above example, you're using PyThreadState_Swap incorrectly.

  // I'm not totally sure why you're swapping here, you'll have to clarify.
  // If MK_RT_REF.threadData *is* the current thread state, as I suspect, then
  // the call to Py_EndInterpreter will also fail, since it will invalidate it.
  PyThreadState* ts = PyThreadState_Swap(MK_RT_REF.threadData);
  // At best, adding a check to turn PyThreadState_Swap into a no-op
  // if the passed thread state matches the current thread state
  // could be viable, but I'm not sure how useful it would be. 
 

printV("ts start: %p → %p", ts, MK_RT_REF.threadData);
  // MK_RT_REF.threadData is invalidated, as well as ts (since they're the same pointer)
  // You can't dereference either of them past this point.
  Py_EndInterpreter(MK_RT_REF.threadData);
printV("ts end  : %p → %p", MK_RT_REF.threadData, ts);

  // This call does nothing -- ts would already be the thread state.
  // The segfault comes from ts being freed by Py_EndInterpreter.
  PyThreadState_Swap(ts);

Again, I don't think there's anything here that's a CPython issue :(

I'll submit a PR for some extra documentation on subinterpreters, but I suggest opening a help thread and closing this issue.

@aotto1968
Copy link
Author

I already delete this test code because it was just a test → the original problem of crash somewhere is still an issue.

@ZeroIntensity
Copy link
Member

I can't help with that unless I see the code, but let's move this to discourse. If we find that this is indeed a CPython issue, we can reopen (or better yet, make a new one with a more solid reproducer).

@aotto1968
Copy link
Author

aotto1968 commented Sep 1, 2024

Again, I don't think there's anything here that's a CPython issue :(

well I already use my library as an extension for many languages and obvious able I'm not smart enough to use the python thread api. → I'll think this is an python problem :-)

After trying for the second time in my limited lifetime to do something with thread on python and being miserably wiser twice, I must now admit that python is apparently incapable of providing any usable thread-api.

→ the smartest thing to do with thread on python… activate the ignoreThread switch

int
NS(pMqContextC_Init)(MQ_RT_ARGS OT_LNG_T m) {

  // define type
  MqContextCTT = MkTypeDup2(MqContextC_TT,"pymqmsgque_MqContextC");
  MqContextCT->fIncrSelf = NS(sIncrSelf);
  MqContextCT->fDecrSelf = NS(sDecrSelf);
//  >> replaced by os_fork
//  MqContextCT->fParentBeforeFork = NS(sParentBeforeFork);
//  MqContextCT->fParentAfterFork = NS(sParentAfterFork);
//  MqContextCT->fChildAfterFork = NS(sChildAfterFork);
  MqContextCT->fProcessExit = NS(sProcessExit);
  //MqContextCT->fThreadExit = NS(sThreadExit);
  //MqContextCT->ignoreDisasterSetup = true;
  MqContextCT->ignoreThread = true;

  ClassInit

  return 0;
error:
  return -1;
}

@ZeroIntensity
Copy link
Member

Not disagreeing that multithreading from the C API is difficult! We need to document it better. Though, if you want to get this working, I can help you on discourse, instead of polluting the issues page with relentless back-and-forth debugging.

@ZeroIntensity
Copy link
Member

@encukou, this should be closed as not planned. I'll create a new issue sometime tomorrow to add some better documentation on using PEP 684.

@encukou encukou closed this as not planned Won't fix, can't repro, duplicate, stale Sep 4, 2024
@ZeroIntensity
Copy link
Member

I've created #123672 to address the documentation problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-crash A hard crash of the interpreter, possibly with a core dump
Projects
None yet
Development

No branches or pull requests

4 participants