
Unit/integration testing: Testing graphical and UI code. #1760

Open
bruvzg opened this issue Nov 2, 2020 · 15 comments

@bruvzg
Member

bruvzg commented Nov 2, 2020

Describe the project you are working on:
Godot engine.

Describe the problem or limitation you are having in your project:
Unit testing was introduced in godotengine/godot#40148, but currently there is no way to automatically test GUI- and rendering-related code.

Related proposals: #1307 (testing contexts), #1533 (the old tests had at least some rendering and UI tests)

Describe the feature / enhancement and how it helps to overcome the problem or limitation:
Implement an off-screen DisplayServer for use on headless CI, make it compatible with software Vulkan (SwiftShader) / OpenGL (OSMesa) implementations so it can run on CI without a GPU, and add a testing framework context with an active rendering pipeline (initialized display and rendering servers, and the normal project main loop).

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:

  1. The testing framework renders small, simple scenes for isolated graphical features (materials, shaders, lighting/shadows, etc.) or for the reaction of GUI elements to simulated input events, using fixed time steps for deterministic behavior.
  2. It takes screenshots at predefined points in time (to test multiple rendering steps in succession and to test particles/animations) and stores them, probably downscaled to avoid overly large files and to smooth out minor differences (see the sketch after this list).
  3. Screenshots are compared (by the engine or an external script) to the reference images, and marked for manual inspection if they have substantial differences (for example by adding a thick, red border to the image).
  4. Screenshots are uploaded as a build artifact (an archive with one image per test suite).
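
For illustration only, a minimal GDScript sketch of steps 1–2, assuming the Godot 4 API; the capture frame, output size, and output path are hypothetical placeholders, the scene setup is not shown, and --fixed-fps would be passed on the command line for determinism:

extends Node

# Hypothetical capture script attached to the root of a small test scene.
const CAPTURE_FRAME := 10               # frame at which the screenshot is taken
const OUTPUT_SIZE := Vector2i(256, 144) # downscaled to keep artifacts small

func _process(_delta):
    if Engine.get_frames_drawn() < CAPTURE_FRAME:
        return
    # Grab the last rendered frame, downscale it, and store it for later comparison.
    var img := get_viewport().get_texture().get_image()
    img.resize(OUTPUT_SIZE.x, OUTPUT_SIZE.y, Image.INTERPOLATE_LANCZOS)
    img.save_png("user://test_artifacts/lighting_shadows.png")
    get_tree().quit()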


If this enhancement will not be used often, can it be worked around with a few lines of script?:
It can be used as part of CI to detect rendering, physics and GUI regressions, and to quickly test specific hardware or driver versions for rendering issues (the same context should be usable with the normal DisplayServers as well).

Is there a reason why this should be core and not an add-on in the asset library?:
It should be possible to achieve this with a module or a GDScript project, but it's probably better to have testing-related functionality in core, for cleaner CI configs and to avoid duplicating code across multiple test projects.

@Xrayez
Contributor

Xrayez commented Nov 2, 2020

Test contexts

The minimal testing context was introduced in godotengine/godot#40980 without rendering capabilities, but has been working alright for unit testing specifically so far.

The way I see it, it may be feasible to just introduce another integration test context manually. I've previously attempted to create test contexts using doctest's dynamic filtering in https://github.com/Xrayez/godot/tree/test-contexts, but it may be too complex to maintain and error-prone.

The main challenge is registering setup/teardown methods with doctest, which is not something doctest supports out of the box (at least not without code duplication). The suggested setup/teardown mechanism in doctest is to use SUBCASEs, but I think that works better for avoiding duplication within a test case itself rather than for setting up the test environment.

The entry point for unit and integration testing could be rewritten to accept things like:

  • --test unit
  • --test integration
  • --test project
  • --test rendering

This way, I think it would still be possible to use doctest for those (like godotengine/godot#42938). It means that the entry point would go through an additional interface layer, so to speak.

This kind of setup would also help #1533, because it means no compatibility breakage would be needed in the first place. That said, godotengine/godot#40148 didn't preserve compatibility with the old tests.

Graphical and UI code testing

I think testing graphical and UI code requires a MainLoop to be running. It's totally possible to feed input events via code, as seen in Xrayez/godot-testbed#5:

extends "res://addons/gut/test.gd"

# https://github.com/godotengine/godot/issues/32597

class TabContainerGuiInputCrash extends TabContainer:

    var ev = InputEventMouseButton.new()

    func _ready():
        var pm := PopupMenu.new()
        set_popup(pm)
        pm.queue_free()

        yield(get_tree(), "idle_frame")
        yield(get_tree(), "idle_frame")
        yield(get_tree(), "idle_frame")

        ev.pressed = true
        ev.button_index = BUTTON_LEFT
        ev.button_mask = BUTTON_LEFT
        ev.position = Vector2(0, 14)

        Input.parse_input_event(ev)

        yield(get_tree(), "idle_frame")
        yield(get_tree(), "idle_frame")

        Input.parse_input_event(ev)
        Input.parse_input_event(ev)

var container

func setup():
    var gut_window = get_parent().get_node('Gut')
    gut_window.hide() # need to hide to properly detect input event

    container = TabContainerGuiInputCrash.new()
    add_child(container)


func test_tab_container_gui_input():
    yield(yield_for(1.0, 'Hopefully no crash happens.'), YIELD)
    assert_true(true, "No crash, great!")


func teardown():
    container.queue_free()

The --fixed-fps and --disable-render-loop command-line options could potentially be used to speed up the simulation and to control the rendering loop via code with RenderingServer.force_draw(). See also godotengine/godot#43260; I'm not sure whether those methods would actually be useful for this.
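
If those methods do turn out to be usable, a rough, untested sketch of the idea might look like this, assuming Godot 4 names and that the project is launched with --disable-render-loop:

extends Node

func _ready():
    # With the render loop disabled, nothing is drawn unless we request it,
    # so each frame is advanced explicitly and deterministically.
    for i in 3:
        RenderingServer.force_draw()
        await RenderingServer.frame_post_draw

    # Feed a simulated left click, then draw one more frame to observe the result.
    var ev := InputEventMouseButton.new()
    ev.button_index = MOUSE_BUTTON_LEFT
    ev.pressed = true
    ev.position = Vector2(0, 14)
    Input.parse_input_event(ev)

    RenderingServer.force_draw()
    await RenderingServer.frame_post_draw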

3. Screenshots are compared (by the engine or an external script) to the reference images, and marked for manual inspection if they have substantial differences (for example by adding a thick, red border to the image).

doctest could be used for this, as for GDScript integration tests (#1429), but it may be overkill, so perhaps an extra step would indeed be required.

But in theory, all this could be done from within a Godot project running on CI. This is where testing frameworks like GUT shine, in my opinion. For instance, I've been successfully running unit tests in Goost, but we still need a way to render stuff on CI.

@c0d1f1ed

c0d1f1ed commented Jan 7, 2021

I noticed at https://bruvzg.github.io/using-godot-with-swiftshader-software-vulkan-emulation.html that you had to increase SwiftShader's bound descriptor set limit to 16 to get it to work with Godot. I'm curious why that's required. Currently only just over half of the Vulkan drivers support 16 or more: https://vulkan.gpuinfo.org/displaydevicelimit.php?name=maxBoundDescriptorSets. While this metric does not take deployments into account, it still seems to me that important classes of GPUs only support 8, or 4, bound descriptor sets.

I don't mind upstreaming this change to permanently increase it, but I'd love to understand how an engine like Godot uses more than 4 descriptor sets, and what might be a good balance. It seems like no GPU has 16, so I guess 8 would already suffice? Any significant advantage from increasing it to 32? Thanks!

@bruvzg
Member Author

bruvzg commented Jan 7, 2021

I noticed at https://bruvzg.github.io/using-godot-with-swiftshader-software-vulkan-emulation.html that you had to increase SwiftShader's bound descriptor set limit to 16 to get it to work with Godot.

Godot's RenderingDeviceVulkan supports up to 16 descriptor sets, but 6 should be fine for the current version.

Edit: Actually it might work with 4 since godotengine/godot#44175 was merged.

@bruvzg
Member Author

bruvzg commented Jan 7, 2021

Actually it might work with 4 since godotengine/godot#44175 was merged.

I have checked the current master of Godot, and it's working with a limit of 4 descriptor sets, so this change is not necessary anymore.

@fire
Member

fire commented Jan 18, 2021

Did anyone try Robot Framework to provide visual tests?

We need to evaluate:

https://robotframework.org/#documentation

We could use Robot Framework and pick one of the available libraries that support Vulkan.

@fire
Member

fire commented Jan 19, 2021

I have made a prototype using robotframework.

This sample does two things:

  • Execute Godot on Windows
    • Close project manager
  • Execute notepad
    • Type text
    • Close

https://github.com/fire/robotframework-godot

@fire
Member

fire commented Jan 19, 2021

Added a video recording task.

Using vmaf we were able to get a score of 97.430362 for the same video and 65.083790 for different videos.

.\data\ffmpeg-N-100672-gf3f5ba0bf8-win64-lgpl-shared-vulkan\bin\ffmpeg.exe -i default_1.webm -pix_fmt yuv420p default_1.y4m
.\data\ffmpeg-N-100672-gf3f5ba0bf8-win64-lgpl-shared-vulkan\bin\ffmpeg.exe -i godot_1.webm -pix_fmt yuv420p godot_1.y4m
copy godot_1.y4m godot_2.y4m
.\data\vmaf.exe --reference .\godot_1.y4m --distorted .\default_1.y4m
.\data\vmaf.exe --reference .\godot_1.y4m --distorted .\godot_2.y4m 

I haven't written a script for it yet, but it's also possible to take a screenshot, run comparison stats, and get a visual diff. I used reg-cli built as a single executable.

@Calinou
Member

Calinou commented Feb 23, 2022

I have a proof of concept that uses Nut.js here: https://github.com/Calinou/godot/tree/add-editor-ui-tests/misc/ui_tests

For the editor, I don't know what kind of "workflows" would be best to apply within the automated tests, though. Creating a basic project automatically, running it, then stopping it would be useful, but it wouldn't be testing a whole lot of functionality.

Also, I haven't figured out how to run it on a headless server (with Xvfb + Lavapipe/SwiftShader) yet.

@fire
Member

fire commented Feb 23, 2022

I was using Robot Framework because it can run the editor, using image recognition to find buttons, and then execute the process under SwiftShader.

@nikitalita worked on SwiftShader CI/CD integration.

Edited:

I evaluated Nut.js; it doesn't seem to have support for everything. https://robotframework.org/#resources

@nikitalita

My initial attempts at visual regression testing have revealed that output can vary wildly between video cards and even between driver versions. It's not really noticeable to the human eye, but a 1-to-1 comparison, or even a fuzzy comparison at >95% similarity, of frame captures will fail unless the test environment is set up exactly the same way for the baseline and the subsequent tests (preferably the exact same machine). @myaaaaaaaaa have you encountered this?

@Calinou
Member

Calinou commented Feb 14, 2023

My initial attempts at visual regression testing have revealed that output can vary wildly between video cards and even between driver versions. It's not really noticeable to the human eye, but a 1-to-1 comparison, or even a fuzzy comparison at >95% similarity, of frame captures will fail unless the test environment is set up exactly the same way for the baseline and the subsequent tests (preferably the exact same machine). @myaaaaaaaaa have you encountered this?

See How (not) to test graphics algorithms. A dssim check should be able to work out decently if it has a large enough threshold, but in general, it's recommended to have a few "complete" test images over a lot of "partial" tests covering isolated features. This may be counter-intuitive, but it makes checking for regressions a lot less time-consuming. We should be careful about "alarm fatigue" in general when it comes to this kind of regression testing, as it's an easy trap to fall into.
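
This is not dssim, but as a crude GDScript illustration of the "large enough threshold" idea (the function name and the threshold value are arbitrary and would need tuning):

# Compare a capture against a reference image and only fail past a generous
# threshold, to reduce false positives from GPU/driver differences.
func images_roughly_match(reference: Image, capture: Image, threshold := 0.05) -> bool:
    if reference.get_size() != capture.get_size():
        return false
    var sum := 0.0
    for y in reference.get_height():
        for x in reference.get_width():
            var a := reference.get_pixel(x, y)
            var b := capture.get_pixel(x, y)
            sum += pow(a.r - b.r, 2) + pow(a.g - b.g, 2) + pow(a.b - b.b, 2)
    var rmse := sqrt(sum / float(reference.get_width() * reference.get_height() * 3))
    return rmse <= threshold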

@mariomadproductions

I wonder if this would be useful for "whole game" tests. The developer would record the inputs, RNG seed and movie for a playthrough. The movie or perceptual hash of the movie would be stored, and then the inputs and RNG seed would be used to replay the movie and compare with the developer's playthrough. This could be useful to automatically test if a game still functions correctly when ported to another platform/godot version. A self-test option could also be included in published builds, for players to use. For the self-test, as the full thing might take too long for large and performance-heavy games, there could just be an option for a cut-down playthrough, or playthrough of a test level/test suite.
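
As a very rough sketch of the replay half of that idea (the structure is hypothetical; recording, the storage format, and the determinism caveats discussed below are left out):

extends Node

# Pairs of [frame: int, event: InputEvent], loaded from a previous recording.
var playback: Array = []
var frame := 0

func _ready():
    seed(12345)  # fixed seed so RNG-driven gameplay repeats between runs

func _process(_delta):
    # Re-inject every event that was originally recorded on this frame.
    while not playback.is_empty() and playback[0][0] == frame:
        Input.parse_input_event(playback.pop_front()[1])
    frame += 1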

@Calinou
Member

Calinou commented Apr 24, 2024

I wonder if this would be useful for "whole game" tests.

Godot's physics engines are not deterministic, so this wouldn't be useful unless your game doesn't rely on the physics engine at all (and uses its own deterministic physics implementation).

Subtle differences in rendering (due to different GPU hardware or driver versions) can also be introduced, which would cause the hash to be invalid.

@mariomadproductions

mariomadproductions commented Apr 24, 2024

Makes sense, regarding the physics.

For the differences in rendering, I think perceptual hashes are designed to allow leeway for small changes. And I'd think you'd want to detect large differences in a game when using different GPU/driver configurations?

But maybe this should be a separate discussion thread, actually.

@Calinou
Member

Calinou commented Apr 24, 2024

For the differences in rendering, I think perceptual hashes are designed to allow leeway for small changes. And I'd think you'd want to detect large differences in a game when using different GPU/driver configurations?

Yes, tools like dssim can be used to calculate a similarity score between two images. Tweaking the value threshold is an art in itself though, and you need to record your videos using lossless compression, which results in huge files.
