Vault CSI support v2 #828

pnovotnak · 2024-02-27T05:38:40Z

💸 TL;DR

This PR fixes Vault CSI support by reimplementing the CSI driver. The new driver is a bit simpler and has been tested against mocked CSI behavior (provided in test cases), as well as integration tested against a real Vault CSI driver.

📜 Details

This driver relies on the following behavior of the Secrets Store CSI driver's atomic writer, which the Vault CSI provider runs on:

https://github.com/kubernetes-sigs/secrets-store-csi-driver/blob/c697863c35d5431ec048b440d36550eb3ceb338f/pkg/util/fileutil/atomic_writer.go#L60-L62

By taking advantage of this behavior (every update atomically replaces the ..data symlink, so its modification time changes), we greatly simplify the caching logic: a cache entry is invalidated when the symlink's modification time changes. Furthermore, since the CSI driver populates the volume before the pod starts, we have dropped the file watcher on this symlink.

Secret files are rewritten atomically using this algorithm. The upshot is that as long as we don't resolve the ..data symlink, we can read files through it each time and always get the most recent version from disk.
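To make the sequence concrete, here is a rough Python sketch of what the atomic writer does on each update, simplified from the linked Go code. The helper name publish, the ..version_ directory prefix, and the absence of error handling are assumptions of this sketch, not the driver's exact behavior. The demo also shows the upshot above: reading through the unresolved ..data path returns the newest data, while a path resolved before the update points at a directory that no longer exists.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def publish(root: Path, files: Dict[str, str]) -> None:
    """Atomically publish a new version of `files` under `root` (sketch).

    Hypothetical, simplified rendering of the atomic writer's steps.
    """
    data_link = root / "..data"
    old_version = os.readlink(data_link) if data_link.is_symlink() else None

    # 1. Write the complete payload into a fresh, uniquely named hidden dir.
    new_version = Path(tempfile.mkdtemp(prefix="..version_", dir=root))
    for name, contents in files.items():
        (new_version / name).write_text(contents)

    # 2. Point a temporary symlink at the new dir, then rename it over ..data.
    #    rename(2) swaps the link in one step, so readers see either the old
    #    version or the new one, never a mix of both.
    tmp_link = root / "..data_tmp"
    os.symlink(new_version.name, tmp_link)
    os.replace(tmp_link, data_link)

    # 3. User-visible paths are stable symlinks into ..data (created once).
    for name in files:
        user_path = root / name
        if not user_path.is_symlink():
            os.symlink(os.path.join("..data", name), user_path)

    # 4. Delete the previous version's directory.
    if old_version is not None:
        shutil.rmtree(root / old_version)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    publish(root, {"example-secret": "v1"})
    pinned = (root / "..data").resolve()  # resolving pins the current version
    publish(root, {"example-secret": "v2"})
    print((root / "example-secret").read_text())  # read through ..data
    print(pinned.exists())  # the pinned directory has been removed
```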

In my testing environment I observed roughly 2 minutes between refreshes. I haven't tracked down the configuration for this yet, but that appears to be the default.
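The ..data-based invalidation described under Details could look roughly like the following minimal sketch. CSISecretCache and swap_in are hypothetical names, not the baseplate API. As an assumption of this sketch, the cache key also includes the ..data symlink's target in addition to its mtime: every update points ..data at a new directory, so this keeps the example correct even if two swaps land within the same timestamp tick.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict, NamedTuple, Tuple


class CacheEntry(NamedTuple):
    version: Tuple[int, str]  # (..data mtime in ns, ..data target)
    data: Dict[str, Any]


class CSISecretCache:
    """Hypothetical reader that re-parses a secret only when ..data changes."""

    def __init__(self, root: Path) -> None:
        self.data_link = root / "..data"
        self.entries: Dict[str, CacheEntry] = {}
        self.disk_reads = 0

    def _current_version(self) -> Tuple[int, str]:
        # lstat() describes the symlink itself; stat() would follow it.
        st = os.lstat(self.data_link)
        return (st.st_mtime_ns, os.readlink(self.data_link))

    def get(self, name: str) -> Dict[str, Any]:
        version = self._current_version()
        entry = self.entries.get(name)
        if entry is not None and entry.version == version:
            return entry.data  # cache hit
        # Cache miss: read through the unresolved ..data path. If a swap
        # happens between these two steps, the newer data is stored under the
        # older version key and is simply re-read on the next call.
        data = json.loads((self.data_link / name).read_text())
        self.disk_reads += 1
        self.entries[name] = CacheEntry(version, data)
        return data


def swap_in(root: Path, name: str, payload: Dict[str, Any]) -> None:
    """Test helper: publish a new version by atomically replacing ..data."""
    new_dir = Path(tempfile.mkdtemp(prefix="..version_", dir=root))
    (new_dir / name).write_text(json.dumps(payload))
    tmp_link = root / "..data_tmp"
    os.symlink(new_dir.name, tmp_link)
    os.replace(tmp_link, root / "..data")
```

Note that the test exercises several consecutive updates rather than one, since a cache that only notices the first change is a classic failure mode.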

🧪 Testing Steps / Validation

This was tested, using some added print statements, in a test environment where a Baseplate.py Thrift service was serving test traffic while the Vault CSI driver provided its secrets. I monitored cache hits vs. misses to ensure that the files weren't being reloaded more often than expected.

Cache hits vs misses log

Feb 27 14:35:45 test-environment: INFO     Listening on ('0.0.0.0', 9090)
Feb 27 14:42:58 test-environment: cache miss: secret/example/secret-value-1@mtime=1709073701.164487, cache_entry=None, secret_data={'current': 'xxxxxxxxxxx==', 'encoding': 'base64', 'type': 'versioned'}
Feb 27 14:42:58 test-environment: cache miss: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=None, secret_data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}
Feb 27 14:42:58 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:42:58 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:42:58 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:42:58 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:42:58 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-1@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxxxxx==', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-1@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxxxxx==', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-1@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxxxxx==', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:43:19 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073701.164487, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:44:10 test-environment: cache hit: secret/example/secret-value-1@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxxxxx==', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:44:10 test-environment: cache miss: secret/example/secret-value-1@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxxxxx==', 'encoding': 'base64', 'type': 'versioned'}, updating=False), secret_data={'current': 'xxxxxxxxxxx==', 'encoding': 'base64', 'type': 'versioned'}
Feb 27 14:44:10 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:44:10 test-environment: cache miss: secret/example/secret-value-2@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073701.164487, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False), secret_data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}
Feb 27 14:44:10 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073820.1701584, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:44:10 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073820.1701584, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:44:10 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073820.1701584, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
Feb 27 14:44:10 test-environment: cache hit: secret/example/secret-value-2@mtime=1709073820.1701584, cache_entry=VaultCSIEntry(mtime=1709073820.1701584, data={'current': 'xxxxxxxx', 'encoding': 'base64', 'type': 'versioned'}, updating=False)
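To check the hit/miss pattern in a log like this without eyeballing it, a small hypothetical helper can tally outcomes per secret and list the distinct ..data mtimes seen. The regex assumes exactly the debug-print format shown above. For the two mtimes in this log, 1709073820.1701584 - 1709073701.164487 ≈ 119 s, consistent with the roughly 2-minute refresh noted earlier.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Matches the debug prints above, e.g.
# "cache hit: secret/example/secret-value-2@mtime=1709073701.164487, ..."
CACHE_LINE = re.compile(r"cache (hit|miss): ([^@\s]+)@mtime=([0-9.]+)")


def summarize_cache_log(lines: Iterable[str]) -> Tuple[Counter, List[float]]:
    """Return per-(secret, outcome) counts and distinct ..data mtimes in first-seen order."""
    counts: Counter = Counter()
    mtimes: List[float] = []
    for line in lines:
        match = CACHE_LINE.search(line)
        if match is None:
            continue  # e.g. the "Listening on ..." line
        outcome, secret, mtime = match.group(1), match.group(2), float(match.group(3))
        counts[(secret, outcome)] += 1
        if mtime not in mtimes:
            mtimes.append(mtime)
    return counts, mtimes
```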

✅ Checks

CI tests (if present) are passing
Adheres to code style for repo
Contributor License Agreement (CLA) completed if not a Reddit employee

pnovotnak · 2024-02-27T23:38:49Z

baseplate/lib/secrets.py

    if options.provider == "vault_csi":
-        parser = parse_vault_csi
-        return DirectorySecretsStore(options.path, parser, timeout=timeout, backoff=backoff)
+        return VaultCSISecretsStore(
+            options.path, parser=parse_vault_csi, timeout=timeout, backoff=backoff
+        )


@chriskuehl We are a little torn here WRT backward incompatibility. WDYT about removing the old implementation straight away?

PS If we decide to replace the current implementation, we can probably remove DirectorySecretsStore & associated tests.

This would be transparent to users, right? Looking in Sourcegraph, I don't see any references to DirectorySecretsStore outside of this repo, so that seems reasonable to me.

Internally we seem fine.

The issue is that theoretically there are open source users of baseplate.py that might see breakage. Is it ok to risk that?

I think it's OK to remove the old unused implementation. I did some cursory checks (searching GitHub for public users of Baseplate) and also checked in with the rest of the team, and our general impression is that nobody is really using Baseplate.py outside of Reddit.

This isn't necessarily great open source hygiene, but I don't think it's worth increasing our maintenance burden for a benefit we're pretty sure isn't actually there. We should call out in the release notes that it's technically a breaking change though.

Ok, I've ripped out the old implementation based on this discussion.

pnovotnak · 2024-02-27T23:39:19Z

baseplate/lib/secrets.py

@@ -119,7 +120,9 @@ def _decode_secret(path: str, encoding: str, value: str) -> bytes:
    raise CorruptSecretError(path, f"unknown encoding: {encoding!r}")


-SecretParser = Callable[[Dict[str, Any], str], Dict[str, str]]


The second parameter has a default value, which can't be expressed with Callable.
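For reference, the usual workaround is a callback Protocol, which can declare a default that Callable cannot. A minimal sketch, assuming a second parameter named path with a default of "" (both are placeholders, not necessarily what baseplate's parsers use); example_parser and parse_with are hypothetical:

```python
from typing import Any, Dict, Protocol


class SecretParser(Protocol):
    """Structural type for parsers whose second argument is optional.

    Callable[[Dict[str, Any], str], Dict[str, str]] would force every caller
    to pass the second argument; a Protocol's __call__ can declare a default.
    """

    def __call__(self, data: Dict[str, Any], path: str = "") -> Dict[str, str]:
        ...


def example_parser(data: Dict[str, Any], path: str = "") -> Dict[str, str]:
    # Hypothetical parser: stringify values and record the path if given.
    parsed = {key: str(value) for key, value in data.items()}
    if path:
        parsed["_source"] = path
    return parsed


def parse_with(parser: SecretParser, data: Dict[str, Any]) -> Dict[str, str]:
    # A type checker accepts the one-argument call because the Protocol
    # declares a default for `path`.
    return parser(data)
```

A type checker will also reject a parser whose second parameter lacks a default, since it would not match the Protocol's signature.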

KTAtkinson

Thanks for getting this together!

kylelemons · 2024-02-29T23:23:48Z

tests/unit/lib/secrets/vault_csi_tests.py

+        simulate_secret_update(self.csi_dir)
+        assert original_data_path != self.csi_dir.joinpath("..data").resolve()
+        data = secrets_store.get_credentials("secret/example-service/example-secret")
+        assert data.username == "reddit"
+        assert data.password == "password"


It's important to validate more than one secret update. Historically, the main failure mode of our naive file-watching implementations was that they worked correctly for the first update but failed to notice the second.

Done, and the tests now cover secret value updates as well.

kylelemons · 2024-02-29T23:25:15Z

Also, a meta question: did we validate this in snoodev with the real Vault CSI driver yet?

pnovotnak · 2024-03-01T17:56:32Z

Yes! Tested pretty extensively in snoodev

kylelemons

Left a comment about the mutex

kylelemons

I think this looks workable to me, but I am definitely neither a python nor baseplate expert so please also wait for the other reviewers.

chriskuehl

Looks reasonable to me, just one question about whether a race condition can actually happen.

chriskuehl · 2024-03-01T22:09:50Z

tests/unit/lib/secrets/vault_csi_tests.py

+def new_fake_csi(data: typing.Dict[str, SecretType]) -> Path:
+    """Creates a simulated CSI directory with data and symlinks.
+    Note that this would already be configured before the pod starts."""
+    csi_dir = Path(tempfile.mkdtemp())


nit (optional): since we're using pytest as our test runner already, it might be nice to use pytest's tmp_path fixture rather than creating temporary directories manually: https://docs.pytest.org/en/latest/how-to/tmp_path.html

You also wouldn't need to clean it up manually, since pytest handles cleanup (by default it keeps the directories from the last few test runs around, which is helpful if you want to inspect failures).

While I'd like to apply this suggestion, I think it would require overhauling the test suite pretty significantly, so I'll leave this change out. TIL about this feature, though!
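For future reference, a rough sketch of the tmp_path approach: pytest passes a fresh pathlib.Path to any test that declares a tmp_path parameter. The root-taking new_fake_csi variant, the version directory name, and the secret payload below are hypothetical, not the helper in this PR.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict


def new_fake_csi(root: Path, data: Dict[str, Dict[str, Any]]) -> Path:
    """Hypothetical variant of the helper that builds the fake CSI layout
    inside a caller-supplied directory instead of calling tempfile.mkdtemp()."""
    version_dir = root / "..2024_03_01_00_00_00.000000000"  # illustrative name
    for secret_path, payload in data.items():
        target = version_dir / secret_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload))
    os.symlink(version_dir.name, root / "..data")
    return root


def test_reads_through_data_link(tmp_path: Path) -> None:
    # pytest supplies tmp_path (a unique directory per test) and prunes old
    # ones itself, so the test needs no manual cleanup.
    csi_dir = new_fake_csi(
        tmp_path,
        {"secret/example-service/example-secret": {"username": "reddit", "password": "password"}},
    )
    secret_file = csi_dir / "..data" / "secret/example-service/example-secret"
    assert json.loads(secret_file.read_text())["username"] == "reddit"
```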

chriskuehl · 2024-03-01T22:37:13Z

tests/unit/lib/secrets/vault_csi_tests.py

+    def test_secret_updated(self):
+        secrets_store = get_secrets_store(str(self.csi_dir))
+        data = secrets_store.get_credentials("secret/example-service/example-secret")
+        gevent.sleep(0.1)  # prevent gevent shenanigans


What's the purpose of these sleeps?

Tests were flaking if they weren't present. My theory is that gevent is passing control back to the tests. By sleeping, I ensure the IO is complete before the remainder of the test.

Were the flakes locally or in CI? I've been running the tests a bunch locally and can't seem to reproduce any flakes. I'd like to try to dig into this if possible before we merge.

I only saw it in CI

Example: https://github.com/reddit/baseplate.py/actions/runs/8116056546/job/22185234411

I looked into this a little bit and was able to replicate the issues frequently in GitHub Actions CI but never locally.

I added some debugging prints and noticed cases where the file mtime did not change but the file contents did, which caused the test failures.

I'm not sure whether this is something odd about the CI environment (maybe low-resolution timestamps in the filesystem or the system clock) or something about gevent I'm not understanding, but I think it's safe to go ahead and merge.

Co-authored-by: Chris Kuehl <chris.kuehl@reddit.com>

Vault CSI support v2

fc877c9

pnovotnak mentioned this pull request Feb 27, 2024 in Vault CSI support v2 #827 (closed)

TylerLubeck and others added 3 commits February 27, 2024 14:05

CSI Additions (#829)

b979cdc

* Ensure VaultCSI secrets are sourcing from a directory
* Utilize the parser mechanisms
* Update error messages

Fix data race causing null return (#830)

8e9d91a

lint and make fmt

a9015bf

pnovotnak force-pushed the vault-csi-support-v2 branch from 6742582 to a9015bf February 27, 2024 23:19

pnovotnak marked this pull request as ready for review February 27, 2024 23:36

pnovotnak requested a review from a team as a code owner February 27, 2024 23:36

pnovotnak requested a review from chriskuehl February 27, 2024 23:36

pnovotnak commented Feb 27, 2024

pnovotnak requested a review from kylelemons February 28, 2024 17:08

remove useless comment

d7ed31c

pnovotnak requested review from TylerLubeck and KTAtkinson February 28, 2024 17:18

KTAtkinson reviewed Feb 28, 2024

baseplate/lib/secrets.py (outdated, resolved)

baseplate/lib/secrets.py (outdated, resolved)

baseplate/lib/secrets.py (outdated, resolved)

TylerLubeck reviewed Feb 28, 2024

baseplate/lib/secrets.py (outdated, resolved)

KA and TL review points

a696704

pnovotnak requested review from TylerLubeck and KTAtkinson February 28, 2024 23:17

KTAtkinson approved these changes Feb 29, 2024

TylerLubeck reviewed Feb 29, 2024

baseplate/lib/secrets.py (outdated, resolved)

TylerLubeck approved these changes Feb 29, 2024

TL review point: enrich stack trace

fba0109

Co-authored-by: Tyler Lubeck <tyler@tylerlubeck.com>

kylelemons reviewed Feb 29, 2024

pnovotnak added 2 commits March 1, 2024 18:36

KL review points

9e39e3b

prevent gevent returning values out-of-order

cdc7d38

pnovotnak requested a review from kylelemons March 1, 2024 18:49

kylelemons reviewed Mar 1, 2024

pnovotnak added 4 commits March 1, 2024 19:05

remove cache mutex

88e0e07

reintroduce sleep

ddbc377

more sleep

d6967d6

add sleep for cache test

3bdb034

pnovotnak requested a review from kylelemons March 1, 2024 19:27

kylelemons approved these changes Mar 1, 2024

chriskuehl approved these changes Mar 1, 2024

pnovotnak and others added 3 commits March 5, 2024 09:10

simpler test secret writer

1f6bda5

Co-authored-by: Chris Kuehl <chris.kuehl@reddit.com>

simplify link management in test

23c31ae

Co-authored-by: Chris Kuehl <chris.kuehl@reddit.com>

CK review point: remove race handling

2c7dcdb

pnovotnak requested a review from chriskuehl March 5, 2024 17:27

remove old implementation based on discussion

b073d14

chriskuehl approved these changes Mar 14, 2024

chriskuehl merged commit 82dd952 into develop Mar 14, 2024
5 checks passed

chriskuehl deleted the vault-csi-support-v2 branch March 14, 2024 16:17
