[3.x] Implement CameraServer on Windows #49763

Open
wants to merge 1 commit into base: 3.x


benjarmstrong
Contributor

@benjarmstrong benjarmstrong commented Jun 20, 2021

This PR implements webcam support on Windows via the Media Foundation API and partially solves #46531.

Tested on both virtual and real cameras, compiled with both MSVC and MinGW/Linux. Supports hot-plugging and performs YUV to RGB conversion on the GPU (massive thanks to whoever wrote those shaders).

The actual code for acquiring/decoding/releasing streams has been tested thoroughly as I’ve been using it for a work project for a few weeks. The main issue is getting MinGW to compile and link with the Media Foundation libraries. I’ve got it to a point where it seems to build on MinGW W64 major version 5 and newer. The MinGW workarounds are as elegant as I could get them but they're still pretty ugly. I would appreciate any testing to find a MinGW version under which it breaks.

I’m aware that there are other pull requests and forks for this; however, they all convert YUV to RGB on the CPU (which rules out higher resolutions), have no hot-plugging support, and require MSVC to compile.

I understand PRs should generally be for the master branch and cherry-picked, but I checked quite recently and a lot of the CameraServer implementation in master was missing due to 4.0 refactoring. If the CameraFeed methods stay the same in 4.0, this should be trivial to port to it.

Some things I’m not entirely sure about:

  • CameraFeed instances “prefer” to select a resolution around 4096 by 4096, but will ultimately use the nearest resolution supported by the device. I chose this value because there doesn’t appear to be a webcam that goes beyond that resolution. My implementation assumes we want the highest resolution up to (and around) 4K. Thoughts?
  • CameraServer gets a virtual update_feeds method that is called from OS_Windows for hot-plugging. This new virtual method means the camera module can still be disabled if the developer chooses. I’m not sure if this was the best way to implement this from a design perspective.
  • CameraFeed instances always have their position set to unspecified (not front or back). I assumed camera position is irrelevant on a desktop OS.
  • Supports cameras whose drivers output RGB, RGBA, NV12 or YUY2. I'd like to know if anyone can find a camera that doesn't output any of those formats.
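
To make the first bullet concrete, here is a minimal sketch of "prefer roughly 4096 by 4096, fall back to the nearest supported resolution". It is not the PR's code: the `Resolution` struct, the `pick_nearest_resolution` name and the squared-distance metric are assumptions chosen for illustration, and the real implementation works with Media Foundation media types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration only.
struct Resolution {
	uint32_t width;
	uint32_t height;
};

// Squared Euclidean distance between two resolutions in (width, height) space.
static uint64_t distance_sq(const Resolution &a, const Resolution &b) {
	const int64_t dw = int64_t(a.width) - int64_t(b.width);
	const int64_t dh = int64_t(a.height) - int64_t(b.height);
	return uint64_t(dw * dw + dh * dh);
}

// Returns the supported resolution closest to `preferred`.
// Ties keep the earliest entry in `supported`.
Resolution pick_nearest_resolution(const std::vector<Resolution> &supported, const Resolution &preferred) {
	assert(!supported.empty());
	Resolution best = supported.front();
	for (const Resolution &candidate : supported) {
		if (distance_sq(candidate, preferred) < distance_sq(best, preferred)) {
			best = candidate;
		}
	}
	return best;
}
```

With a preference of 4096 by 4096, a device that only offers 640x480, 1280x720 and 1920x1080 resolves to 1920x1080 under this metric. A plain distance metric can also pick a mode above the preferred size, so a strict "up to 4K" cap would need an extra filter.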

@Calinou
Member

Calinou commented Jun 20, 2021

> I chose this value because there doesn’t appear to be a webcam that goes beyond that resolution. My implementation assumes we want the highest resolution up to (and around) 4K. Thoughts?

Webcams that go above 1080p are still not that common right now, so I think it's fine to limit the maximum supported resolution to 4K 🙂

@benjarmstrong benjarmstrong force-pushed the 3x_win_cam_implementation branch from 97f3f11 to 71d2ee8 on June 24, 2021 01:32
@benjarmstrong
Contributor Author

Cleaned up a bit, fixed formatting

Contributor

@Ansraer Ansraer left a comment

I am still impressed that you got this running on other compilers. I tried to write this myself a while ago and threw in the towel after a few days.

@benjarmstrong
Contributor Author

> I am still impressed that you got this running on other compilers.

I do have some experience with patchy Windows APIs in MinGW from #38210, but this was so much harder.

> I tried to write this myself a while ago and threw in the towel after a few days.

To be honest, if I had known how difficult this would be I probably wouldn't have done it. The set of Media Foundation functions and GUIDs that are implemented varies wildly across MinGW versions, and in some rare cases symbols are incorrectly defined. In addition, there are sometimes function prototypes without implementations (depending on the MinGW version), which is why I resort to run-time linking for some functions.

I haven't fought a compiler that hard since the time I tried to learn Rust.

@havi05
Contributor

havi05 commented Jun 25, 2021

When I activate the camera in the editor it works, but in the game it is either missing or shows as a green rectangle. If I deactivate it in the editor and run

var cam = CameraServer.get_feed(0)
cam.feed_is_active = true

it works in the game.
Is there a way to have both enabled? Should I make a bug report?
Thank you for adding this feature for Windows.

@benjarmstrong
Contributor Author

> When I activate the camera in the editor it works, but in the game it is either missing or shows as a green rectangle. If I deactivate it in the editor and run
>
> var cam = CameraServer.get_feed(0)
> cam.feed_is_active = true
>
> it works in the game.
> Is there a way to have both enabled? Should I make a bug report?
> Thank you for adding this feature for Windows.

@Gamemap
This seems to be a limitation of Media Foundation: only one program can acquire a given webcam's feed at any time. The device lock is system-wide across all enumerable media sources. For example, when I try to start the 'Logitech Capture' program while Godot is using my webcam, I get this:
[image: logitech_capture_fail2]

I'm guessing you have an active camera feed in the editor while you launch the game. If so then the game fails to access the webcam because the editor is already using it. This is why deactivating the feed in the editor allows your game to use the webcam.

This does potentially present a usability issue, as it seems reasonable that many people may have this issue while running scenes with active camera feeds in them.

The easiest solution would be to update the docs to note this behavior.

A possible slightly more elegant solution: Have the editor deactivate all active camera feeds when the game is launched from it, then resume them once the game stops running. This might be worth doing if other platforms are experiencing this issue.
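
A rough sketch of that suspend-and-resume idea follows. Everything in it is hypothetical: `Feed` and `FeedSuspender` are stand-ins invented for illustration, not Godot classes, and in the engine the deactivation step would have to actually release the capture device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a camera feed; only the active flag matters here.
struct Feed {
	bool active = false;
};

// Remembers which feeds were active, deactivates them, and restores them later.
class FeedSuspender {
	std::vector<bool> was_active;
	bool suspended = false;

public:
	// Call when the game is launched from the editor.
	void suspend(std::vector<Feed> &feeds) {
		was_active.clear();
		for (Feed &feed : feeds) {
			was_active.push_back(feed.active);
			feed.active = false; // A real implementation would release the device here.
		}
		suspended = true;
	}

	// Call once the game has stopped running.
	void resume(std::vector<Feed> &feeds) {
		assert(suspended); // resume() without a prior suspend() is a programming error.
		// Only restore the entries that were recorded; the list may have changed size.
		for (size_t i = 0; i < feeds.size() && i < was_active.size(); i++) {
			feeds[i].active = was_active[i];
		}
		was_active.clear();
		suspended = false;
	}
};
```

Restoring by index is the weak point of this sketch: if a camera is hot-plugged while the game runs, indices can shift, so a real version would probably key the saved state on feed IDs instead.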

Do we know if the macOS and iOS implementations exhibit this behavior?

@benjarmstrong
Contributor Author

benjarmstrong commented Jun 25, 2021

@Gamemap
Regarding behaviour on macOS I haven't found anything official but forum posts like this seem to indicate macOS might have the same issue.
Edit: macOS does not have the same issue

> Should I make a bug report?

I think so, especially if this behavior occurs on both macOS and Windows (and possibly Linux once it gets a CameraServer implementation).

> A possible slightly more elegant solution: Have the editor deactivate all active camera feeds when the game is launched from it, then resume them once the game stops running. This might be worth doing if other platforms are experiencing this issue.

^ I think this would be the solution to the problem. It would fix it on all platforms exhibiting this issue.

@Calinou
Member

Calinou commented Jun 25, 2021

IIRC, the green screen fallback is drawn by Godot. Maybe we could change this fallback texture to have some text like "No cameras available".

@MarioLiebisch
Contributor

MarioLiebisch commented Aug 15, 2021

Stumbled over this before trying to implement something similar. So I toyed around with it a bit and wanted to note a few points:

  • I'm not too familiar with COM and Media Foundation, so I might be completely off here, but it feels odd to me, that you're calling CoInitializeEx() and MFStartup() everywhere rather than in one central constructor for MediaFoundationCapture. It should be enough to do it once for that thread, shouldn't it?

  • In a similar fashion, I think you're missing the corresponding shutdown functions, CoUninitialize() and MFShutdown(), for each successful call of the startup functions. To quote Microsoft Docs (1, 2):

    To close the COM library gracefully on a thread, each successful call to CoInitialize or CoInitializeEx, including any call that returns S_FALSE, must be balanced by a corresponding call to CoUninitialize.

    An application must call this function before using Media Foundation. Before your application quits, call MFShutdown once for every previous call to MFStartup.

  • Would be great, if this wouldn't just list/access "video" sources, but depth sensor sources, too (interpreting MFVideoFormat_D16 as grayscale or encoded in three colors?).
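
One common way to guarantee the balancing described in the first two points is an RAII guard. The sketch below is portable and uses stand-in callables so it can be compiled anywhere; `ScopedInit` is a hypothetical helper, not something from the PR. On Windows the callables would wrap the real CoInitializeEx/CoUninitialize and MFStartup/MFShutdown pairs.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Runs `p_startup` on construction and, only if it reported success,
// runs `p_shutdown` exactly once on destruction.
class ScopedInit {
	std::function<void()> shutdown;
	bool ok;

public:
	ScopedInit(const std::function<bool()> &p_startup, std::function<void()> p_shutdown) :
			shutdown(std::move(p_shutdown)), ok(p_startup()) {
		assert(shutdown != nullptr);
	}
	~ScopedInit() {
		if (ok) {
			shutdown();
		}
	}
	// Non-copyable, so the shutdown call cannot be duplicated by accident.
	ScopedInit(const ScopedInit &) = delete;
	ScopedInit &operator=(const ScopedInit &) = delete;

	bool succeeded() const { return ok; }
};

// On Windows the guards might be set up like this (not compiled here; the
// threading model is an assumption):
//   ScopedInit com([] { return SUCCEEDED(CoInitializeEx(nullptr, COINIT_MULTITHREADED)); },
//                  [] { CoUninitialize(); });
//   ScopedInit mf([] { return SUCCEEDED(MFStartup(MF_VERSION)); },
//                 [] { MFShutdown(); });
```

Because SUCCEEDED() is also true for S_FALSE, the guard would balance that case too, as the quoted documentation requires. Declaring the COM guard before the Media Foundation guard means they are torn down in reverse order.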

@motionmonster

Thanks for doing this. It is a big help. I tested this last night, and it works, but the feed is pretty slow. Many cameras perform much faster at MJPG. If there were a way to work that in, it would help quite a bit.

@petermcgarvey1111

petermcgarvey1111 commented Oct 19, 2021

Was this added to Godot 3.4?

I tried printing CameraServer.get_feed(0) in 3.4 and it is empty, despite my laptop having a webcam.

@Calinou
Member

Calinou commented Oct 20, 2021

> Was this added to Godot 3.4?

This pull request hasn't been merged yet, so no. Since 3.4 is in feature freeze with an RC expected soon, this pull request is more likely to be merged in 3.5 instead.

@akien-mga akien-mga modified the milestones: 3.4, 3.5 Nov 8, 2021
@akien-mga akien-mga force-pushed the 3.x branch 2 times, most recently from 71cb8d3 to c58391c on January 6, 2022 22:40
@benjarmstrong
Contributor Author

benjarmstrong commented Jan 11, 2022

> Stumbled over this before trying to implement something similar. So I toyed around with it a bit and wanted to note a few points:
>
>   • I'm not too familiar with COM and Media Foundation, so I might be completely off here, but it feels odd to me, that you're calling CoInitializeEx() and MFStartup() everywhere rather than in one central constructor for MediaFoundationCapture. It should be enough to do it once for that thread, shouldn't it?
>
>   • In a similar fashion, I think you're missing the corresponding shutdown functions, CoUninitialize() and MFShutdown(), for each successful call of the startup functions. To quote Microsoft Docs (1, 2):
>
>     To close the COM library gracefully on a thread, each successful call to CoInitialize or CoInitializeEx, including any call that returns S_FALSE, must be balanced by a corresponding call to CoUninitialize.
>
>     An application must call this function before using Media Foundation. Before your application quits, call MFShutdown once for every previous call to MFStartup.
>
>   • Would be great, if this wouldn't just list/access "video" sources, but depth sensor sources, too (interpreting MFVideoFormat_D16 as grayscale or encoded in three colors?).

Thanks, I'll be sure to make these changes and update the PR

edit: I probably won't get around to doing depth sensors. Maybe in a future commit

@bnolan

bnolan commented Feb 13, 2022

Heya, can we contribute or sponsor development to get this PR merged into 3.5? We'd love to be able to use it.

@Calinou
Member

Calinou commented Feb 13, 2022

> Heya, can we contribute or sponsor development to get this PR merged into 3.5? We'd love to be able to use it.

It's a similar situation to #47967 (comment), except this PR targets 3.x instead of master. A master version of this pull request needs to be opened and merged first before this can be merged in 3.x.
If you want to help get this merged, you can test it locally on your own device and make sure it works as expected.

Still, nothing prevents you from creating custom editor and export template binaries with this PR included 🙂

As for sponsoring development, see "Commercial support" on the Contact page. This is intended to be used for larger features though, and not finishing up a PR that is already mostly complete.

@benjarmstrong
Contributor Author

@bnolan
I'd love to see this in 3.5, but this really should go into master first if/when possible.

> I understand PRs should generally be for the master branch and cherry-picked, but I checked quite recently and a lot of the CameraServer implementation in master was missing due to 4.0 refactoring. If the CameraFeed methods stay the same in 4.0, this should be trivial to port to it.

The master branch has improved a lot since I wrote this. If/when it's ready, I'll port this to the master branch (along with the changes suggested by people here) and then update this 3.x PR.

> Still, nothing prevents you from creating custom editor and export template binaries with this PR included

If you need it right now then this is the way to go (it's what our team is doing)

If you really need it in a more up-to-date 3.X version and are willing to experiment you could copy-paste the modules/camera folder from this PR into a modern 3.x version and see if it compiles (I know for sure that hot-plugging won't work).

public:
CameraWindows();
~CameraWindows();

void update_feeds();
Contributor

This should be marked as override, I think.
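
For readers unfamiliar with the suggestion, here is a reduced sketch of what marking the method `override` buys. The classes are simplified stand-ins that reuse the names from this PR; `notify_hotplug` and the update counter are hypothetical and exist only to make the virtual call observable.

```cpp
#include <cassert>

// Simplified stand-in; the real CameraServer has many more members.
class CameraServer {
public:
	virtual ~CameraServer() {}
	// Default does nothing, so a build without a platform backend still works.
	virtual void update_feeds() {}
};

class CameraWindows : public CameraServer {
	int update_count = 0; // Hypothetical counter, for illustration only.

public:
	// `override` makes the compiler reject this declaration if it stops matching
	// a virtual method in the base class (e.g. after a signature change).
	void update_feeds() override { update_count++; }
	int get_update_count() const { return update_count; }
};

// Mimics the OS layer notifying the server through a base-class pointer.
void notify_hotplug(CameraServer *p_server) {
	assert(p_server != nullptr);
	p_server->update_feeds();
}
```

Without `override`, a mismatch between the two declarations would silently create a new, unrelated method, and hot-plug notifications sent through the base pointer would reach only the empty default.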

@TechnoPorg
Contributor

Are you still interested in working on this PR? If not, I'm willing to port it to master, as I need it for a project that I'm working on.

@RedMser
Contributor

RedMser commented May 2, 2022

@TechnoPorg I've got a working version for master on my fork. It probably is a better basis than this PR.

@TechnoPorg
Contributor

Great, thanks!

@akien-mga akien-mga modified the milestones: 3.5, 3.x Jul 2, 2022
@ghost

ghost commented Jul 4, 2022

Currently testing this with the latest 3.x changes.

> Tested on both virtual and real cameras

I'm not sure that virtual cameras are working anymore. I have tested with OBS and NVIDIA Broadcast; in both cases only the real camera is being detected.

Is there anything I need to do to get virtual cameras to show up?

@Calinou
Member

Calinou commented Jan 22, 2024

> @TechnoPorg I've got a working version for master on my fork. It probably is a better basis than this PR.

@RedMser I've rebased your branch against master 0bcc0e9 and got it to build, but it seems it doesn't work with https://github.com/pkowal1982/cameratest (which I've confirmed works on Linux with #53666).

When I select the webcam in the list of webcams, I get this script error:

[image: screenshot of the script error]

The binary was built using MSVC 2022 on Windows 11.

Edit: This is because this PR doesn't extend the interface like #53666 did. I don't know how I'm supposed to use the webcam stream then... I've tried using a TextureRect + CameraTexture setup, but changing the camera feed ID to 1 (to match my webcam [1]) and trying various Which Feed values only led me to a pink or white texture with this error spammed in the console:

servers\rendering\renderer_rd\storage_rd\texture_storage.cpp:1112 - Condition "p_image.is_null() || p_image->is_empty()" is true.

Footnotes

  1. ID 1 is the only one where I can tick Active and not have the checkbox instantly revert itself to being unchecked.

@RedMser
Contributor

RedMser commented Jan 22, 2024

> I've tried using a TextureRect + CameraTexture setup, but changing the feed ID does nothing and ticking the Active checkbox in CameraTexture will immediately revert it to be inactive.

@Calinou I don't think the changes in my fork are complete (it's been a while, so I forgot how much of it is just a port of this PR).

My test project did something like this (note I don't have it set up right now, so I can't guarantee this to work as-is):

A scene with multiple TextureRects, plus the following code files:

# root.gd
## run e.g. when changing a value on a SpinBox. use CameraServer.get_feed_count() to check if out of bounds
func load_feed(value: int):
	var feed := CameraServer.get_feed(value)
	if feed == null:
		$HBoxContainer/TextureRect.texture = null
		$HBoxContainer/TextureRect2.texture = null
		return
	for f in CameraServer.feeds():
		f.feed_is_active = false
	feed.feed_is_active = true
	var ctY := CameraTexture.new()
	ctY.camera_feed_id = feed.get_id()
	ctY.which_feed = CameraServer.FEED_Y_IMAGE
	$HBoxContainer/TextureRect.texture = ctY
	var ctB := CameraTexture.new()
	ctB.camera_feed_id = feed.get_id()
	ctB.which_feed = CameraServer.FEED_CBCR_IMAGE
	$HBoxContainer/TextureRect2.texture = ctB
	
	$HBoxContainer/TextureRect3.material.set_shader_param("tex_Y", ctY)
	$HBoxContainer/TextureRect3.material.set_shader_param("tex_CbCr", ctB)
// root.gdshader on TextureRect3
shader_type canvas_item;
uniform sampler2D tex_Y;
uniform sampler2D tex_CbCr;

void fragment() {
	COLOR.r = texture(tex_Y, UV).r;
	COLOR.gb = texture(tex_CbCr, UV).rg - vec2(0.5, 0.5);
	COLOR.a = 1.0;
	
	// BT.601 (the SDTV standard) conversion matrix, provided as a reference.
	COLOR.rgb = mat3(
			vec3(1.00000, 1.00000, 1.00000),
			vec3(0.00000, -0.34413, 1.77200),
			vec3(1.40200, -0.71414, 0.00000)) *
	COLOR.rgb;
}

So likely the interface needs to be extended, and some of this math should be done by the C++ code and not by the user in shaders.
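
As a way to sanity-check the shader math on the CPU, the same conversion can be written out per pixel. This is a sketch, not code from the PR or the fork: `RGB8`, `to_u8` and `ycbcr_to_rgb` are hypothetical names, and it assumes full-range 8-bit samples with chroma centred on 128 (the shader subtracts 0.5 in normalized units, which is 127.5, so results can differ by one step).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct RGB8 {
	uint8_t r, g, b;
};

// Rounds and clamps a floating-point channel value to the 0..255 range.
static uint8_t to_u8(float p_value) {
	return uint8_t(std::min(255.0f, std::max(0.0f, std::round(p_value))));
}

// Full-range BT.601 conversion using the same coefficients as the shader above.
RGB8 ycbcr_to_rgb(uint8_t p_y, uint8_t p_cb, uint8_t p_cr) {
	const float y = float(p_y);
	const float cb = float(p_cb) - 128.0f;
	const float cr = float(p_cr) - 128.0f;
	return RGB8{
		to_u8(y + 1.40200f * cr),
		to_u8(y - 0.34413f * cb - 0.71414f * cr),
		to_u8(y + 1.77200f * cb),
	};
}
```

Neutral chroma (Cb = Cr = 128) passes luma straight through, so greys stay grey, which is a quick check that the matrix rows are in the right order.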
