Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-116738: Make _csv module thread-safe #118344

Merged
merged 17 commits into from
Oct 11, 2024
Merged

Conversation

aisk
Copy link
Contributor

@aisk aisk commented Apr 27, 2024

This extension module has some module state values which may be read or written in different threads:

  • field_limit: A simple integer value to store the maximum field size for the parser;
  • dialects: A dict value to store all the dialects by string names.

field_limit

There is a public function field_size_limit that can read and write it, using a critical section to protect it. For other internal usages, _Py_atomic_load_long and _Py_atomic_store_long are used to make it thread safe.

dialects

There are three public functions get_dialect, unregister_dialect and list_dialects that can access this field. For the first two functions, a critical section is used to protect it.

And the list_dialects function just returns the dict's keys with PyDict_Keys. I think its result is readonly and PyDict_Keys itself is thread safe, so we don't need to do anything.
Other internal usages of dialects involve accessing its value with get_dialect_from_registry, which simply calls PyDict_GetItemRef and returns the result. I think we don't need to change it too.

Update: I think these APIs is just call dict APIs which is already thread-safe, and there are no other state changes inside these APIs, and no intermediate variables need to protect in these functions, so we don't need add locks on them.

Modules/_csv.c Outdated Show resolved Hide resolved
Modules/_csv.c Outdated Show resolved Hide resolved
Modules/_csv.c Outdated Show resolved Hide resolved
Modules/_csv.c Outdated Show resolved Hide resolved
@aisk aisk requested a review from colesbury May 17, 2024 09:46
Include/cpython/pyatomic_msc.h Outdated Show resolved Hide resolved
Modules/_csv.c Outdated Show resolved Hide resolved
Copy link
Member

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Sorry for the late.

@colesbury Do you have any comments about this?

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay in reviewing this. Two requests:

  • Let's change field_limit to a Py_ssize_t. That seems more appropriate here and avoids the need for adding new atomic bindings. I'd prefer to avoid adding additional atomic bindings because its more code to maintain and its easy to add subtle bugs in the atomic bindings. Additionally, field_limit is compared to field_len, which is already a Py_ssize_t.
  • Let's use relaxed atomics for reading and writing field_limit, e.g., FT_ATOMIC_STORE_SSIZE_RELAXED. Relaxed ordering is fine for these sorts of global settings.

For the data type change:

  • %ld -> %zd
  • PyLong_AsLong -> PyLong_AsSsize_t

@aisk
Copy link
Contributor Author

aisk commented Jul 10, 2024

Sorry for the long delay in reviewing this. Two requests:

* Let's change `field_limit` to a `Py_ssize_t`. That seems more appropriate here and avoids the need for adding new atomic bindings. I'd prefer to avoid adding additional atomic bindings because its more code to maintain and its easy to add subtle bugs in the atomic bindings. Additionally, `field_limit` is compared to `field_len`, which is already a `Py_ssize_t`.

* Let's use relaxed atomics for reading and writing `field_limit`, e.g., `FT_ATOMIC_STORE_SSIZE_RELAXED`. Relaxed ordering is fine for these sorts of global settings.

For the data type change:

* `%ld` -> `%zd`

* `PyLong_AsLong` -> `PyLong_AsSsize_t`

Thanks for the review, updated!

Modules/_csv.c Outdated Show resolved Hide resolved
@kumaraditya303 kumaraditya303 self-assigned this Oct 11, 2024
@kumaraditya303 kumaraditya303 added the needs backport to 3.13 bugs and security fixes label Oct 11, 2024
@kumaraditya303 kumaraditya303 merged commit a00221e into python:main Oct 11, 2024
35 checks passed
@miss-islington-app
Copy link

Thanks @aisk for the PR, and @kumaraditya303 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 11, 2024
(cherry picked from commit a00221e)

Co-authored-by: AN Long <aisk@users.noreply.github.com>
@bedevere-app
Copy link

bedevere-app bot commented Oct 11, 2024

GH-125328 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Oct 11, 2024
kumaraditya303 pushed a commit that referenced this pull request Oct 11, 2024
gh-116738: Make `_csv` module thread-safe (GH-118344)
(cherry picked from commit a00221e)

Co-authored-by: AN Long <aisk@users.noreply.github.com>
@aisk aisk deleted the csv-thread-safe branch October 12, 2024 05:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants