Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make sure to catch jinja2 vars used in single-line for and set statements #5447

Merged
merged 12 commits into from
Aug 7, 2024

Conversation

beckermr
Copy link
Contributor

@beckermr beckermr commented Aug 6, 2024

Description

This PR adds two regexes to catch variables used in single-line jinja2 for and set statements. These have been missed for a long time and appear to be the cause of at least one bug. I have not gone through the bug tracker, but I have a hunch this PR might fix other bugs related to similar situations I've seen in the past.

closes #5445

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Aug 6, 2024
Copy link

codspeed-hq bot commented Aug 6, 2024

CodSpeed Performance Report

Merging #5447 will not alter performance

Comparing beckermr:debug-skips-jinja2 (70d396c) with main (977e609)

Summary

✅ 3 untouched benchmarks

@beckermr beckermr changed the title fix: make sure to catch jinja2 vars used in for and set statements fix: make sure to catch jinja2 vars used in for and set statements Aug 6, 2024
@beckermr beckermr marked this pull request as ready for review August 6, 2024 19:43
@beckermr beckermr requested a review from a team as a code owner August 6, 2024 19:43
@beckermr
Copy link
Contributor Author

beckermr commented Aug 6, 2024

@kenodegard @isuruf or anyone else, can you review this PR?

It is an important bug fix that I'd like to see in 24.7.2 if we can.

@beckermr beckermr changed the title fix: make sure to catch jinja2 vars used in for and set statements fix: make sure to catch jinja2 vars used in single-line for and set statements Aug 6, 2024
@jaimergp
Copy link
Contributor

jaimergp commented Aug 6, 2024

Looking at this code for the first time I wonder why are regexes needed when apparently Jinja will happily give you a list of variables in use with three lines. Are we doing this before the selectors are parsed? Would it make sense to preprocess the selectors and then defer to the Jinja AST instead of regexes? Mentioning this because of the single-line limitations and exotic syntax not covered right now. Documenting these limitations in the function docstring might be enough.

@beckermr
Copy link
Contributor Author

beckermr commented Aug 6, 2024

I looked at that stackoverflow for a while, but didn't do any investigating.

Here is how it behaves:

import jinja2
import jinja2.meta

env = jinja2.Environment()
parsed_content = env.parse(
"""
{% set CL_VERSION = "19.29" %}
{% if VCVER is not defined %}
{% set BLAH = "19.29" %}
{% set FOO = "19.29" %}
{% set VCVER = "" %}
{% else %}
{% set BLAH = "19.29" %}
{% set BAZ = "19.29" %}
{% endif %}
{% set clang_major = (CLANG_VERSION|default("19.29")).split(".")[0] %}
{% set cl_minor = CL_VERSION.split(".")[1] %}
{% set vc_major = VCVER.split(".")[0] %}

package:
  name: clang-win-activation
  version: {{ CLANG_VERSION }}

build:
  number: 0
  {% if clang_major|int == 16 and cl_minor|int >= 40 %}
  skip: true
  {% endif %}

outputs:
  - name: clang_win-64
    build:
      run_exports:
        strong:
          - vc >={{ VCVER }}
    requirements:
      run:
        - clang {{ CLANG_VERSION }}.*
"""
)
tokens = jinja2.meta.find_undeclared_variables(parsed_content)

print(tokens)

This gives

% python jinja.py
{'BAZ', 'VCVER', 'FOO', 'BLAH', 'CLANG_VERSION'}

So the behavior is basically that anything that touches a conditional with a default value or uses a default filter is marked as undefined. I'd argue that BLAH should not be returned since it is in both branches.

As for when this happens, searches for jinja2 variables happens after selectors:

def _get_used_vars_meta_yaml(self, force_top_level=False, force_global=False):
# make variant dict hashable so that memoization works
variant_keys = tuple(sorted(self.config.variant.keys()))
reqs_text, recipe_text = self._get_used_vars_meta_yaml_helper(
force_top_level=force_top_level,
force_global=force_global,
apply_selectors=False,
)
all_used_selectors = find_used_variables_in_text(
variant_keys, recipe_text, selectors_only=True
)
reqs_text, recipe_text = self._get_used_vars_meta_yaml_helper(
force_top_level=force_top_level,
force_global=force_global,
apply_selectors=True,
)
all_used_reqs = find_used_variables_in_text(
variant_keys, recipe_text, selectors_only=False
)
all_used = all_used_reqs.union(all_used_selectors)
# things that are only used in requirements need further consideration,
# for omitting things that are only used in run
if force_global:
used = all_used
else:
requirements_used = find_used_variables_in_text(variant_keys, reqs_text)
outside_reqs_used = all_used - requirements_used
requirements_used = trim_build_only_deps(self, requirements_used)
used = outside_reqs_used | requirements_used
return used

and so using the jinja2 functions won't work unless we change the order (edit see below for more details, jinja2 functions won't ever work in all cases).

@beckermr
Copy link
Contributor Author

beckermr commented Aug 6, 2024

Other details that are fun.

  1. conda-build, depending on the options, will move text blocks around and concatenate them when searching for variables which can break jinja2 as well.
  2. The conda-build docs specify that selectors are processed after jinja2 which is the opposite of the search in the sense that jinja2 vars are searched for after selectors have been applied. :/

conda_build/variants.py Outdated Show resolved Hide resolved
Comment on lines 748 to 751
# NOTE: why use a regex instead of the jinja2 parser/AST?
# One can ask the jinj2 parser for undefined variables, but conda-build moves whole
# blocks of text around when searching for variables and applies selectors to the text.
# So the text that reaches this function is not necessarily valid jinja2 syntax. :/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimergp @wolfv @h-vetinari I have added a comment explaining why we have to use a regex here. Let's merge this one instead of letting the perfect be the enemy of the good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I think at this point it's impossible to fix this properly, sadly. LGTM!

@beckermr beckermr requested a review from jaimergp August 6, 2024 21:13
conda_build/variants.py Outdated Show resolved Hide resolved
@beckermr beckermr enabled auto-merge (squash) August 7, 2024 11:42
@beckermr beckermr disabled auto-merge August 7, 2024 12:00
@beckermr
Copy link
Contributor Author

beckermr commented Aug 7, 2024

@kenodegard @jaimergp @jezdez It appears a dev release of conda on the canary channel is buggy and that is breaking the tests here. What do we do in these cases?

@beckermr beckermr enabled auto-merge (squash) August 7, 2024 12:35
@beckermr beckermr merged commit 242bbce into conda:main Aug 7, 2024
28 checks passed
@beckermr beckermr deleted the debug-skips-jinja2 branch August 7, 2024 14:52
# TODO: this `for` regex won't catch some common cases like lists of vars, multiline
# jinja2 blocks, if filters on the for loop, etc.
for_regex = r"(?:^|[^\{])\{%\s*for\s*.*\s*in\s*" + v_regex + r"(?:[^%]*?)?%\}"
set_regex = r"(?:^|[^\{])\{%\s*set\s*.*\s*=\s*.*" + v_regex + r"(?:[^%]*?)?%\}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex is not very good.
The following yaml matches glib and glibc both.

{% set a = glibc %}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any suggestions to fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_regex = r"(?:^|[^\{])\{%\s*set\s*.*\s*=\s*.*" + v_regex + r"(?![a-zA-Z_0-9])(?:[^%]*?)?%\}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #5514

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

variants not rendered properly w/ jinja2 if statements and skips
5 participants