Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tap 14 draft pr #2049

Closed
wants to merge 14 commits into from
Closed

Tap 14 draft pr #2049

wants to merge 14 commits into from

Conversation

abs007
Copy link
Contributor

@abs007 abs007 commented Jul 6, 2022

• Commits fc5bd5c to 88c140a are based around setting up a sample metadata structure inside the repository folder (which is inside repository_data) for testing

774a039 is for reverting repository back to its original state and 8845ebd stores the metadata inside a new TAP 14 folder

df46e38 and 0f50945 add test functions that check inside the TAP 14 folder

• Currently working on implementing changes to the client update process inside the updater.py file

abs007 added 8 commits July 6, 2022 13:27
Removed the folders "metadata.staged" and "metadata" and
added all the metadata inside the "1.0.0" and "2.0.0"

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Updated the spec version inside the metadata files inside
the "2.0.0" folder

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Added the all the metadata files alongside the "1.0.0",
"2.0.0" and "targets" folders

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Reverted the directory structure to keep all metadata
inside "metadata.staged" and "metadata" folders

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Created a new "TAP 14" folder wherein I added all the
metadata inside the "1.0.0" and  "2.0.0" folders and
added the "targets" folder

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Added the test case which checks for the TAP 14 folder

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Added a test function that check the contents inside the
TAP 14 folder.

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Added a function to select specification version folder
to download metadata from

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
@coveralls
Copy link

coveralls commented Jul 6, 2022

Pull Request Test Coverage Report for Build 2621327940

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 20 of 20 (100.0%) changed or added relevant lines in 2 files are covered.
  • 9 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.5%) to 97.584%

Files with Coverage Reduction New Missed Lines %
tuf/ngclient/updater.py 9 94.21%
Totals Coverage Status
Change from base Build 2609032759: -0.5%
Covered Lines: 1244
Relevant Lines: 1265

💛 - Coveralls

@abs007
Copy link
Contributor Author

abs007 commented Jul 6, 2022

Should I squash commits fc5bd5c to 774a039 ?

@MVrachev
Copy link
Collaborator

MVrachev commented Jul 6, 2022

@abs007 as the changes are a lot could you chime in and explain your thoughts on the TAP 14 implementation?
Now as you have some experience with that you have a valuable perspective on that and as it seems there is no full consensus on that feature #2040 it will be good to hear your opinion.

@abs007
Copy link
Contributor Author

abs007 commented Jul 6, 2022

So currently I along with @mnm678 and @znewman01 are working on implementing the changes to how a client would be downloading the metadata according to the format in the TAP 14 specification. This means tinkering with the updater.py file to add new functionality.

@@ -34,7 +34,7 @@
]
},
"expires": "2030-01-01T00:00:00Z",
"spec_version": "1.0.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep in mind that you can't just replace the value of any of the fields in the signed dictionary without re-signing the files again or the files will fail verification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I believe was made as a temporary change rather and the original state of repository_data/repository was brought back again in 774a039.

That being said, all of the metadata was copied over inside the repository_data/TAP 14/1.0.0 & repository_data/TAP 14/2.0.0 folders and metadata inside the 2.0.0 had the "spec_version" changed to 2.0.0 in a similar manner (just to show that this was metadata belonging to 2.0.0).
That seems like something I'll have to take a look at, thanks.

Added the logic for the new client update process inside
tuf/ngclient/updater.py. Added test functions for the new
process inside tests/test_updater_ng.py. Also made changes
to some test files.

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
@abs007 abs007 force-pushed the repo-test branch 2 times, most recently from 8c5d8e0 to e77d93e Compare August 7, 2022 07:13
@jku
Copy link
Member

jku commented Aug 7, 2022

I know this is still a draft but I'll leave a quick comment: I don't understand why the client needs this functionality and before this PR is made ready I would like to see a comment in #2040 that explains the real world use case: how is this useful for python-tuf users?

Background for my confusion: If a repository starts offering multiple versions of the repository, why can't clients decide to "upgrade" using a plain old software update on the client, so that the client just switches from old to new repository URL? (I'm obviously assuming that the python-tuf version used supports both versions of the spec -- but so does this proposal I think)

In practice, if a future TUF-enabled pypi.org made available a second repository using a newer TUF spec features I think it would be totally appropriate for pip, the client, to simply change the used repo URL to the new repository in a pip software update. What advantage is there in making this repository selection dynamic?

@znewman01
Copy link

znewman01 commented Aug 7, 2022 via email

@mnm678 mnm678 mentioned this pull request Aug 8, 2022
abs007 added 2 commits August 25, 2022 00:38
Worked upon adding better tests in test_updater_ng.py
and worked on the code structure for updater.py

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Made changes to _get_repository_versions() to read the
supported-versions file and also removed _look() from
fetcher.py and requests_fetcher.py

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Copy link

@znewman01 znewman01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I have a lot of comments, but you're making progress

@@ -85,6 +88,7 @@ def __init__(
fetcher: Optional[FetcherInterface] = None,
config: Optional[UpdaterConfig] = None,
):
self.spec_version = None # spec_version is the last used version by the client to get metadata

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this private: self._spec_version

url = f"{self._metadata_base_url}supported-versions.json"

with self._fetcher.download_file(
url, "length placeholder"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want self.config.supported_versions_max_length in place of "length placeholder". Need to add a line to tuf/ngclient/config.py as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering my comment above, should supported_versions_max_length have the length in bytes equivalent?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should use 'bytes' as the unit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could be a possible value for supported_versions_max_length in that case?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think about an example supported-versions.json file:

{"supported_versions": ["1", "2", "3", "4"]}

That's got 45 bytes in it. So let's maybe round up to 1000 bytes? That's a pretty huge margin of error but well short of anything that could cause issues

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could be the approx len() of the list in case of 1000 bytes?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, a list with 1000 elements would definitely serialize to have > 1000 bytes. The real threshold would probably be a couple hundred.

Why do you ask?

) as target_file:
repository_versions = json.loads(target_file)

return repository_versions["supported_versions"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just return repository_versions. We don't need to put it inside a JSON dictionary

Copy link
Contributor Author

@abs007 abs007 Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can understand, supported-versions.json is a JSON dictionary itself :

{ "supported_versions" : [VERSION, ...],  //From the TAP14 page

So, If I return repository_versions instead then won't that return the dictionary itself? What I was trying to do was index the [VERSION, ...] list using the supported_versions key.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you're right! I misremembered the proposed update to the specification.

repository_versions: List[str],
spec_version: str,
supported_versions: List[str],
) -> str:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we need to return the actual spec version too!

Maybe make the return type Tuple[Optional[str], Optional[str]].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about Tuple[[str], Optional[str]] ? spec_version is supposed to be either "" or [versions, ...] whereas warning can be None.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see return (str(spec_version), warning) below, which suggests:

Tuple[str, Optional[str]].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I meant that, oops. Will add that then.

spec_version: str,
supported_versions: List[str],
) -> str:
"""Returns the specification version to be used."""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to mention the return type, which is a little funky. When does it return the message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?

Returns the specification version to be used and displays a warning if chosen spec_version
is lower than the highest repository version.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I might add "specification version to be used, following the rules of TAP-14"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, shouldn't I mention the return type in the function definition above and keep the docstring to just an explanation?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point—you don't typically want to repeat the type signature in documentation. However, it's not obvious to me from reading what Tuple[str, Optional[str]] means.

There are two ways to fix that:

  1. Make two NewTypes :

    SpecificationVersion = NewType('SpecificationVersion', str)
    WarningMessage = NewType('WarningMessage', str)

    so the return type becomes Tuple[SpecificationVersion, Optional[WarningMessage]]. That's self-documenting, so there's no need to repeat it in the docstring.

  2. Explain what the str and Optional[str] values are in the docstring

I might actually prefer (1), because I think in the medium term we want to make a full SpecificationVersion class to encapsulate ordering (so you don't have to convert to/from int in _get_spec_version), parsing, and conversion to URL parts. However, (2) is much easier so it's fine to do that for now.

Copy link
Contributor Author

@abs007 abs007 Sep 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused. If SpecificationVersion would essentially be derived from str then won't there be TypeError be thrown when I'm still comparing it with latest_repo_version and performing int type functions on it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right—if you just make SpecificationVersion as a NewType, you can't compare them directly. It's the same as what you have now.

Eventually, you'd make a full class SpecificationVersion which doesn't require conversion to/from int.

os.path.isdir(os.path.join(self.tap14_directory, folder))
)

def test_get_spec_version1(self) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this test_get_spec_version_supported or something. And put the comment below in a docstring.


def test_get_spec_version1(self) -> None:
# This uses the default SUPPORTED_VERSIONS variable from updater.py
with self.assertRaises(exceptions.DownloadError):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertRaises takes an optional second argument: a message. This is really nice way to tell readers of your test code what you're checking:

self.assertRaises(exceptions.DownloadError, "4 is not a supported version").

Go through and do that for every assertRaises or assertEqual call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fyi, test_get_spec_version1 now test_get_spec_version_supported doesn't pass because SUPPORTED_VERSIONS isn't defined correctly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine—in fact, really test_get_spec_version should only handle major version 1. So let's update the test cases

@@ -54,6 +56,7 @@
from tuf.ngclient.fetcher import FetcherInterface

logger = logging.getLogger(__name__)
SUPPORTED_VERSIONS = ["1", "2", "3"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually be SUPPORTED_VERSIONS = [SPECIFICATION_VERSION] for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'm guessing that it'll have to change and be read from the disk?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No—SUPPORTED_VERSIONS is a property of the client library (that is, python-tuf).

The "last used specification version" is what we write to/read from disk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you push changes that address my comments before marking them as resolved? Makes it easier for me to track what we've been talking about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Will do that

("3", None),
)

def test_get_spec_version2(self) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this just test_get_spec_version


def test_get_spec_version2(self) -> None:
warningchecker = "Not using the latest specification version available on the repository"
# Checks with different values

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you have this many test cases, I recommend putting them in a list:

test_cases = [
    (["1", "2", "3"], "3", ["3", "5", "6"],  "3", False),
    (["3", "5", "6"], "3", ["1", "2", "3", "4"], "3", True),
]
for repo_versions, spec_version, supported_versions, expected_version, should_have_warning in test_cases:
    actual_version, warning = _get_spec_version(...)
    self.assertEqual(actual_version, expected_version)
    self.assertEqual(bool(warning), should_have_warning)
# Do something similar for error_test_cases

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you can have a comment for each test case explaining why you have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, should I add a msg or leave a comment? I guess doing both would be repetitive.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, if you're going to put a msg in the asserts you don't need a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad actually. I'll have to put comments if I'll be using the for loop.


url = f"{self._metadata_base_url}supported-versions.json"

with self._fetcher.download_file(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails, we probably want to return ["1"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or [""]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should I use download_bytes() instead because afai can see, download_bytes() just uses download_file() inside of fetcher.py

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good call—do that!

Made changes to updater.py and config.py

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
Added some more changes adhering to the reviews. Also
worked on cleaning the code in the test file.
This is the last pull request as part of GSoC'22

Signed-off-by: Abhisman Sarkar <abhisman.sarkar@gmail.com>
try:
self._spec_version = self._load_local_metadata("spec_version")
except OSError:
self._spec_version = "1"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be: None

@@ -122,12 +126,50 @@ def refresh(self) -> None:
RepositoryError: Metadata failed to verify in some way
DownloadError: Download of a metadata file failed in some way
"""
try:
self._spec_version = self._load_local_metadata("spec_version")
except OSError:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check the errno:

except OSError as err:
    import errno # this should be at top, of course
    if err.errno != errno.ENOENT:
        raise
    self._spec_version = ...

logger.warning(message)
self._spec_version = (
f"{spec_version}/"
if spec_version is not None or spec_version == "1"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO if your ternary operator spans 3 lines, better to just do an if/else

logger.warning(message)
self._spec_version = (
f"{spec_version}/"
if spec_version is not None or spec_version == "1"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec_version is not None or spec_version == "1" <- doesn't the latter condition imply the former? if spec_version == "1" then it's definitely not None


# If supported-versions.json is not found, then look through the root directory to find supported versions
except exceptions.DownloadHTTPError as e:
return ["1"]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we need to decide how to represent:

  1. spec version 1, everything lives in the root /*
  2. spec version 1, everything lives under /1/*

These should be different. I think (1) should be None and (2) should be "1".

That means we should return [None].

return repository_versions["supported_versions"]

# If supported-versions.json is not found, then look through the root directory to find supported versions
except exceptions.DownloadHTTPError as e:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth checking e.status_code == 404; otherwise it could be a real error.

)

def test_get_spec_version_supported(self) -> None:
"""This uses the default SUPPORTED_VERSIONS variable from updater.py"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a good idea to check that SUPPORTED_VERSIONS == ["1"]; it won't change for a long time.

"3 is selected as the spec version and no warning ensues",
)

def test_get_spec_version(self) -> None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think now you need to account for the None case for repository_versions

This is actually starting to get a little confusing: there's "version/path pairs" (which come from the repository versions, and then there are just "versions" (from supported versions). _get_spec_version should pick a matching (version, path) pair.

It might be worth encoding that in the type system.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even just have a function from "path" to version and use that in _get_spec_version.

SpecificationVersion = NewType("SpecificationVersion", str)
WarningMessage = NewType("WarningMessage", str)

repository_versions = [int(i) for i in repository_versions]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is gonna be subtle when we have to worry about None too (just a warning)

@mnm678 mnm678 mentioned this pull request Sep 23, 2022
3 tasks
@lukpueh
Copy link
Member

lukpueh commented Nov 17, 2022

#2114 claims that it supersedes this PR. Can we close here?

@abs007
Copy link
Contributor Author

abs007 commented Nov 17, 2022

Sure

@abs007 abs007 closed this Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants