Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config: Skip copyright substitution for copyright notices declared as constant values #12519

Conversation

jayaddison
Copy link
Contributor

Feature or Bugfix

  • Bugfix

Purpose

  • Even when a SOURCE_DATE_EPOCH reproducible build timestamp is configured, I do not think that Sphinx should attempt automated substitution of constantly-declared copyright notice lines.

Detail

  • When loading the conf.py file, also parse it using Python's built-in AST parser, and keep a note of config key names that were declared at the module-level and values that are parsed as ast.Constant. Skip the substitution logic for any copyright keys configured using these identified constants.

Relates

@jayaddison jayaddison changed the title Draft: Configuration: skip copyright substitution for copyright notices declared as constant values. Draft: [config] skip copyright substitution for copyright notices declared as constant values. Jul 8, 2024
@jayaddison jayaddison marked this pull request as ready for review July 9, 2024 14:38
@jayaddison jayaddison changed the title Draft: [config] skip copyright substitution for copyright notices declared as constant values. [config] skip copyright substitution for copyright notices declared as constant values. Jul 9, 2024
…ng entirely-constant elements to be constants.
@jayaddison
Copy link
Contributor Author

I don't have any further changes planned here, but I'm not 100% confident about the changeset either. This code handles detection of simple constant assignments, but the functionality as-is can falsely give the impression that certain variables are literal constants, when in fact they're later appended-to, re-assigned-to, etc.

It wouldn't be much fun to have to continually evolve and adjust this code to handle edge cases.

@jayaddison jayaddison marked this pull request as draft July 12, 2024 15:33
@jayaddison
Copy link
Contributor Author

Tangential idea, available from Py3.11 onwards: perhaps we could add a type-hint in the templated conf.py.jinja file for copyright of typing.LiteralString.

Sites could still remove or adjust that, but it might encourage usage of configuration of copyright notices using constants.

@jayaddison jayaddison changed the title [config] skip copyright substitution for copyright notices declared as constant values. [config] skip copyright substitution for copyright notices declared as constant values Jul 23, 2024
@jayaddison jayaddison changed the title [config] skip copyright substitution for copyright notices declared as constant values config: skip copyright substitution for copyright notices declared as constant values Aug 5, 2024
@jayaddison jayaddison changed the title config: skip copyright substitution for copyright notices declared as constant values config: Skip copyright substitution for copyright notices declared as constant values Aug 5, 2024
@jayaddison
Copy link
Contributor Author

Since there is some careful logic required to handle epub_copyright, it would make sense to add test coverage for that too before merge. Even so: I think this is ready to open for review.

@jayaddison jayaddison marked this pull request as ready for review September 3, 2024 11:43
@jayaddison jayaddison requested a review from picnixz September 3, 2024 22:32
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many cases where someone can (for debugging purposes) redeclare a configuration variable and thus the detection algorithm must be improved. One could just ignore when a variable is being redeclared but that's probably too harsh since this could be tricky to debug.

I know that @AA-Turner wasn't in favor of having an additional configuration variable but to tackle the reproducbility issue, I think it's needed. However, what I can suggest is a configuration variable which is simply a format string and we will just use datetime.datetime.today().strftime(copyright_format) as is.

On the other hand, a way to make it much simpler is... to create an event so that users can just return whatever strings they want. It's yet another event but maybe it could done.

@@ -278,9 +280,17 @@ class Config:
def __init__(self, config: dict[str, Any] | None = None,
overrides: dict[str, Any] | None = None) -> None:
raw_config: dict[str, Any] = config or {}
constants = raw_config.pop('__constants__', None) or frozenset()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone is using this name for something else, it's a bad idea to actually change it IMO.

# when empty, config.epub_copyright inherits from config.copyright, so
# treat it as constant if the latter is
if 'epub_copyright' not in raw_config and 'copyright' in constants:
constants |= frozenset(['epub_copyright'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I would say that you should explicitly hardcode the allowed constants in a global variable (outside the Config class).

)
for name in constant_assignments
))
except Exception:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tree = ast.parse(code_py) can never fail since compiled the code above.
In addition, if someone writes

x = 1
x = get_x()

and

x = get_x()
x = 1

Then the first one is dynamic while the second can be considered constant. Your filter unfortunately doesn't consider those cases. As such, I think this approach is too complex.

@jayaddison
Copy link
Contributor Author

I know that @AA-Turner wasn't in favor of having an additional configuration variable but to tackle the reproducbility issue, I think it's needed. However, what I can suggest is a configuration variable which is simply a format string and we will just use datetime.datetime.today().strftime(copyright_format) as is.

On the other hand, a way to make it much simpler is... to create an event so that users can just return whatever strings they want. It's yet another event but maybe it could done.

I'm not too keen on continuing along paths where we support dynamic replacement of values, because generally I think that a static/constant declaration is near-optimal for various reasons.

I do admit that some groups will want to ensure that their build output contains the current year in copyright notices; I think that's better achieved by adding a post-build validation step, amenable to automation.

The changes suggested here don't require or enforce any of that, but they are intended to encourage use of literal/constant assignments for the copyright config setting(s).

@AA-Turner
Copy link
Member

Perhaps:

  1. Substitute the year iff it is the current year.
  2. Add some format specifier ({yy} or {yyyy} or %y or %Y or YY or etc) that we support and always replace with either the current year or the year in SOURCE_DATE_EPOCH (this should support four- and two-digit years)
  3. Document (2) as the preferred option for those who want to control substitution under SDE
  4. Do nothing else?

I am really wary of the complexity here, and oftentimes the import of time or datetime is the only dynamic thing in conf.py, whereas we should be promoting static configuration where possible.

A

@jayaddison
Copy link
Contributor Author

  1. Add some format specifier ({yy} or {yyyy} or %y or %Y or YY or etc) that we support and always replace with either the current year or the year in SOURCE_DATE_EPOCH (this should support four- and two-digit years)

I'd vote for strftime option; it's well-established and that seems better than inventing some new custom format.

What do we think about applying that logic to the existing copyright / epub_copyright config fields? I doubt it would break many, if any, existing projects -- and I think it might be preferable than attempting a long-duration migration via a copyright_format setting (partly because if we do that, other unusual configuration patterns might emerge during the transition and foil our plan).

@AA-Turner
Copy link
Member

I'd vote for strftime option; it's well-established and that seems better than inventing some new custom format.

Sure. Though we need to be clear that this is strftime-inspired, and only supports %Y and %y.

What do we think about applying that logic to the existing copyright / epub_copyright config fields?

This was my intent, yes.

A

@jayaddison
Copy link
Contributor Author

Perhaps:

1. Substitute the year iff it is the current year.

2. Add some format specifier (`{yy}` or `{yyyy}` or `%y` or `%Y` or `YY` or etc) that we support and always replace with either the current year or the year in SOURCE_DATE_EPOCH (this should support four- and two-digit years)

3. Document (2) as the preferred option for those who want to control substitution under SDE

4. Do nothing else?

I am really wary of the complexity here, and oftentimes the import of time or datetime is the only dynamic thing in conf.py, whereas we should be promoting static configuration where possible.

A

@picnixz do you also find this plan acceptable? I'd ideally like at least two supporters with some experience of the copyright functionality before closing this and #12516 and commencing work on it.

@picnixz
Copy link
Member

picnixz commented Sep 26, 2024

I'm back to business here Jay (really sorry for just contributing to CPython and forgetting about Sphinx). I like this plan because:

  • if you're lazy, you don't need to do anything and just use yy or yyyy and your projects will always be up-to-date
  • if you're worried about reproducibility, you'll want to make it easy to use. And using the %Y placeholders will be exactly using what you want.
  • "Substitute the year iff it is the current year." I don't understand this one. Could you show me an example please? (for instance, do you mean to say that a copyright notice hardcoded as "2002 - 2010" would not be substituted anymore as "2002 - 2024"?)

@jayaddison
Copy link
Contributor Author

I'm back to business here Jay (really sorry for just contributing to CPython and forgetting about Sphinx). I like this plan because:

No problem! Thank you for taking your time and considering the suggested approach.

Answering the third point first because I think it is the most important:

  • "Substitute the year iff it is the current year." I don't understand this one. Could you show me an example please? (for instance, do you mean to say that a copyright notice hardcoded as "2002 - 2010" would not be substituted anymore as "2002 - 2024"?)

tl;dr - that condition approximates: "only apply substitutions for notice lines where the project itself specifies a dynamically-evaluated end-year"

My thinking about this has changed since writing the bugreport (#12451), however I'll use the same example to explain:

2000-2001 - first copyright holder
2002-2003 - second copyright holder

I had a sense that there was something odd/wrong about updating both of these lines, because they have different end years, so I wrote the bugreport suggesting that the expected output was that only the most recent author's line should be updated.

That expectation was from some kind of probabilisic (and definitely not always true) sense that contributors listed towards the end of the copyright notice tend to be the more recently-active contributors -- and also that sometimes copyright transfer means that the latest-listed holder is effectively the only remaining contributor (again, not always true).

While thinking about disambiguation of active/inactive contributors, I gradually decided that contributor self-interest is likely to motivate them to keep their own copyright notice lines up-to-date and accurate. So in fact we could assume that first copyright holder has not contributed significantly since 2001, nor second copyright holder since 2003.

Based on that, the remaining requirement is to continue to substitute any dynamically computed years, because we don't want to break those, and that's where the iff current year logic arises from: the current year should only appear in a Sphinx copyright value if it was either written statically as the current system-clock year (in which case substitution is a no-op) or if it involved dynamic evaluation of code on the computer to figure out the apparent current year.

In the latter case, the lines where iff current year is true should be the ones that the project authors intentionally added dynamic logic for -- and so we can respect those by performing the substitution.

(in this case perhaps a third author added 2003-{utils.get_current_year()} - third copyright holder -- and that should continue to work and build accurately when SOURCE_DATE_EPOCH is configured)

Your other observations:

  • if you're lazy, you don't need to do anything and just use yy or yyyy and your projects will always be up-to-date

  • if you're worried about reproducibility, you'll want to make it easy to use. And using the %Y placeholders will be exactly using what you want.

To confirm points one and two: yep, this provides a minimal method for dynamic copyright notices, where the output is always year-based and deterministic (at least for a fixed build time and timezone, strictly speaking). Projects that use static/constant copyright notices would be unaffected, by definition.

@jayaddison
Copy link
Contributor Author

Closing this as the detection-of-constants approach seems complex, and also since we are gaining agreement that we can use a simplified template string format instead.

@jayaddison jayaddison closed this Sep 29, 2024
@jayaddison jayaddison deleted the issue-12451/copyright-substitution-skip-constants branch September 29, 2024 17:39
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants