3DFX Voodoo emulation improvements #3115

Closed
2 tasks done
Kappa971 opened this issue Nov 13, 2023 · 25 comments · Fixed by #3935
Labels
enhancement (New feature or enhancement of existing features) · help wanted (Community help wanted) · video (Graphics and video related issues) · voodoo (Issues related to the 3dfx Voodoo 1)

Comments

@Kappa971
Contributor

Are you using the latest Dosbox-Staging Version?

  • I have checked releases and am using the latest release.

Different version than latest?

No response

What Operating System are you using?

Windows 11

If Other OS, please describe

No response

Is your feature request related to a problem? Please describe.

Hi, I would like to list here some possible improvements for 3DFX Voodoo emulation in DOSBox Staging (for future releases):

  1. Optimized emulation for better speed (perhaps using GPU).
  2. Support Truecolor 32-bit to remove dithering (like nGlide for Windows games).
  3. Image scaling, for example making a 3D game with 3DFX support render in 1080p (like nGlide for Windows games).
  4. Widescreen Hack. This idea came to me while playing with a PS1 emulator. It's still pretty unstable there (for example Silent Hill crashes in some specific places) and I have no idea if something like this could be implemented in DOSBox as well. I consider it a niche feature and certainly less important than the others.

If I have any other ideas, I'll add them.
As already said, these are just ideas. If a developer comes along and wants to implement them in the next few years, that would be great. Thanks

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Add any other context or screenshots about the feature request here.

No response

Code of Conduct & Contributing Guidelines

  • Yes, I agree.
@Kappa971 Kappa971 added the enhancement New feature or enhancement of existing features label Nov 13, 2023
@kcgen kcgen added help wanted Community help wanted voodoo Issues related to the 3dfx Voodoo 1 labels Nov 14, 2023
@Torinde
Contributor

Torinde commented Nov 14, 2023

For "1. Optimized emulation for better speed (perhaps using GPU)." - do you mean something like https://github.com/kjliew/qemu-3dfx or http://dosbox-x.com/wiki/Guide%3ASetting-up-3dfx-Voodoo-in-DOSBox%E2%80%90X#_high_level_emulation or something else?

@Kappa971
Contributor Author

* Voodoo 2 SLI, so that 1024x768 gets supported

* Voodoo 3/4/5 (mostly relevant for Win9x)

Based on information given to me by @kcgen and the other developers, DOSBox Staging currently emulates a Voodoo 1 (4 MB) and a Voodoo 1 on steroids (12 MB). It doesn't accurately emulate the performance of a real Voodoo 1; instead, performance depends on the speed of the host CPU (and that's a good thing).
Probably all 3dfx DOS games support Voodoo 1 (or only support Voodoo 1 when installing from the CDROM). Some like Carmageddon have a second patch for Voodoo 2, but I don't know if there are new graphical effects.
If Voodoo 2 support was added in these games only because of the better performance of Voodoo 2 compared to Voodoo 1 and not because they use specific features of Voodoo 2, perhaps it is useless to add Voodoo 2 (and higher) emulation in DOSBox Staging (I think).

For "1. Optimized emulation for better speed (perhaps using GPU)." - do you mean something like https://github.com/kjliew/qemu-3dfx or http://dosbox-x.com/wiki/Guide%3ASetting-up-3dfx-Voodoo-in-DOSBox%E2%80%90X#_high_level_emulation or something else?

Yes, something like that or whatever would improve performance.
I tried DOSBox-X SDL1 32-bit with nGlide, but the performance wasn't good. Someone else should try it too, to see if I did something wrong (I don't notice any performance difference between the i7 860 I had years ago and the AMD Ryzen I have now, which is weird).

@Torinde
Contributor

Torinde commented Dec 17, 2023

Related comment from @PoloniumRain:

Have you seen the Digital Foundry video on DOSBox-Pure? It was a nice surprise seeing this! They have DBP running on an Xbox Series X. There are some interesting benchmarks too, which replicate my experience on PC, where using 3dfx acceleration is actually slower than software rendering, the opposite of real hardware, of course. But it makes sense, since emulating the Voodoo 1 stresses the CPU more. With a Threadripper 3960X I can run Quake 2 at 1024x760 at 60fps in software rendering mode, but only 640x480 at 45 - 60fps with acceleration.

If you're interested, I can post some benchmarks with AMD Zen 4/Ryzen 7000 or the Intel 13th gen Raptor Lake CPUs when these come out in a few months. Performance-wise, I'm expecting them to run 3D games from 2000 at very playable frame rates. The biggest problem will likely be the small 12MB VRAM of the Voodoo 1 at that point.

@johnnovak johnnovak added the video Graphics and video related issues label Mar 29, 2024
@Torinde
Contributor

Torinde commented Apr 11, 2024

From #1671:

Carmageddon (GOG and Steam releases) - Glide patch and additional changes 00e70c1

Project card

3dfx (voodoo1)
With the current patches, voodoo=software allows scaling but voodoo=opengl doesn't, resulting in a tiny, but accelerated window.
Added by kcgen

@Grounded0
Collaborator

Grounded0 commented Jun 6, 2024

@Torinde

Carmageddon now works perfectly with Voodoo. Can be disregarded.

@Torinde
Contributor

Torinde commented Jun 6, 2024

Carmageddon now works perfectly with Voodoo. Can be disregarded.

OK, and what's the status/decision on voodoo=opengl? I assume that's described here, but still I'm not sure which is it:
a) Glide-to-Host OpenGL translation using substituted OVL/DLL and external host-side wrapper (I think it was decided not to implement that)
b) another rendering mode for the software emulation currently implemented (confusingly the other rendering mode reusing the label 'software'?)
c) something else

@johnnovak
Member

johnnovak commented Jun 6, 2024

There's nothing confusing about it @Torinde. There is no OpenGL passthrough, no Glide, nothing like that. The Voodoo is emulated "in software" in Staging, just like the GUS is, the MT-32, the OPL, and so on. So the entire Voodoo hardware is emulated accurately at low-level, which then produces the frames entirely by using the host CPU, without any host GPU involvement.

Hope that makes it crystal clear. Also, forget about DOSBox-X, they do different things (e.g. support Glide). We do not support Glide wrappers, just authentic low-level emulation of the Voodoo hardware done 100% on the host CPU.

That's it.

@Torinde
Contributor

Torinde commented Jun 7, 2024

OK, then this card should be moved to 'Rejected'?

@johnnovak
Member

OK, then this card should be moved to 'Rejected'?

I've deleted it, we don't need it.

@Grounded0
Collaborator

Grounded0 commented Aug 14, 2024

A relatively easy way to improve performance is to let the Voodoo emulation run with multiple worker threads, like DOSBox Pure does. It's mostly useful for people running Win32 games in an unofficial capacity, but it basically increases speed in a linear fashion with each additional worker thread.

Voodoo 5 6000 Prototype already did this in 2000:

[Image: frame from the video "How NVIDIA won the 3D race 20 years ago" (极客湾 Geekerwan)]

@github-project-automation github-project-automation bot moved this to Suggested new features in Backlog Aug 29, 2024
@Torinde
Contributor

Torinde commented Sep 15, 2024

way to improve performance is to enable Voodoo emulation to run with multiple worker threads

Here is the performance data for 4 to 23 threads

Effect depends both on host hardware and game used (~ 5-50%) and Pure decided to go with 7 threads.

#3115 (comment)

There's nothing confusing about it

The following is what confused me: #3040 (comment)

Please don't confuse the removed OpenGL backend in the Voodoo emulation patch with a pass-through to Glide wrappers, which is the request here.

@Grounded0
Collaborator

Grounded0 commented Sep 15, 2024

We're not going to do any of that Glide wrapper stuff since it's Windows-only and buggy. DOSBox Pure's multiple-worker-thread model is the best way to squeeze out more performance in a cross-platform friendly way.

@interloper98
Collaborator

interloper98 commented Sep 15, 2024

That's good data with actual FPS boosts across a bunch of games. Thanks for passing that on, @Torinde.

We do have the "voodoo_multithreading" setting; currently it's a boolean.

Are we beholden to keep using 3 threads? (Maybe for the pi?)

We could make = yes be a bit smarter and use something like clamp(num-host-real-cores - 1, 1, 7)

I think SDL gives us some cross-platform CPU APIs to get this value.

Edit: https://wiki.libsdl.org/SDL2/SDL_GetCPUCount gives us logical cores, and we also have std::thread::hardware_concurrency, https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency
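
A minimal sketch of the `clamp(num-host-real-cores - 1, 1, 7)` idea, using only the standard library. The function names and the `max_threads` parameter are hypothetical, not taken from the Staging code base. Note that `std::thread::hardware_concurrency` reports logical rather than physical cores and may return 0 when the value can't be determined, which the clamp turns into a single worker thread.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: derive a worker-thread count from a core count,
// leaving one core free for the main emulation thread. The default cap
// of 7 is the value DOSBox Pure settled on, per the data linked above.
int pick_voodoo_thread_count(const int host_cores, const int max_threads = 7)
{
    // std::clamp requires lo <= hi, so guard the upper bound
    assert(max_threads >= 1);
    return std::clamp(host_cores - 1, 1, max_threads);
}

// Hypothetical wrapper that queries the host via the C++ standard library.
int detect_voodoo_thread_count()
{
    // Reports logical cores; 0 means "unknown" and ends up clamped to 1
    const auto logical_cores = static_cast<int>(
            std::thread::hardware_concurrency());
    return pick_voodoo_thread_count(logical_cores);
}
```

With this sketch, a 4-core host would get 3 worker threads, and anything with 8 or more cores would get 7.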

@johnnovak
Member

johnnovak commented Sep 16, 2024

Are we beholden to keep using 3 threads? (Maybe for the pi?)

Yeah we hardcoded it to 3 for the Pi, and we thought there is no benefit to using more threads (but did not measure it AFAIK).

We could make = yes be a bit smarter and use something like clamp(num-host-real-cores - 1, 1, 7)

That's the perfect solution combined with SDL_GetCPUCount, go for it @interloper98 🚀
I'd say we should get rid of the voodoo_multithreading setting altogether as I see no reason for not enabling multithreading when available. The fewer levers and the more intelligent the auto-config behaviour, the better! 👍🏻

Great find @Torinde, btw 🎖️

We should re-test the Voodoo emulation after the threading change as we've cleaned up the threading code significantly compared to DOSBox Pure, but we should in theory get similar performance improvements. So yeah, we need to conduct our own measurements to be sure.

@FeralChild64
Collaborator

FeralChild64 commented Sep 16, 2024

How about using this algorithm as a default (auto setting), but allowing the user to specify the thread count manually (preferably changeable at runtime)? Rationale:

  • different games behave differently
  • we only have test results from the AMD Threadripper; these CPUs are strange little beasts, with multiple NUMA domains (depending on the model, some of the domains can consist of CPU cores only, without the RAM), and they are the most effective if all the threads processing the given set of data are executed on cores belonging to the domain which keeps the data; once some threads are executed inside another domain, the performance stops scaling that well
  • the current most common Intel Core CPUs (and ARM CPUs used in certain Raspberry Pi-like computers) are strange little beasts with Performance and Efficient CPU cores, with the Performance cores being roughly twice as fast; the Core i5 has 6 Performance cores - I can fully imagine a performance loss caused by the Performance cores needing to wait for the last two threads which landed on one of the Efficient cores
  • the AMD X3D CPUs, popular among gamers, are kind-of-hybrid too, as some cores have much more cache at the expense of their max frequency
  • thanks to all of the above, writing an efficient scheduler is, well, like rocket science; different OSes behave differently, different Linux distros behave differently (depending on what they are optimized for), Linux can run a GameMode server which does various tweaks for the best gaming performance, Windows can run AMD drivers altering the scheduler behavior for maximizing Threadripper performance (I don’t know if they are still in use; a long time ago they were a must for the 4-chiplet Threadrippers), etc.

Let’s face it: we have no resources to determine the best possible thread count selection algorithm. IMHO we should provide a sane default (like the one proposed above), but allow power users to determine the value which works best for them.

@johnnovak
Member

@FeralChild64 IMO, you're overthinking it; it seems the overall best performance is at a thread count of 7. Surely performance may vary per level, or even per room in a game, or based on the number of enemies on the screen, etc. As long as we're in the best performance bracket within +/- 10-15%, that's good enough (I doubt many people care about playing a game at 50 vs 55 FPS...)

So dunno, I don't think CPU or OS variations matter much; overall parallelism of 7 seems best. Or 6. Or 8. Something like that, it's not an exact science 😏

I'm happy to nuke the setting, but I also won't spend time arguing if you want to keep it, setting it to auto by default and allowing explicit numbers from 1 to 16 or something. I'm just not convinced it's worth exposing this for marginal tweakability, and we have enough settings already.

@FeralChild64
Collaborator

It was just my opinion; I won’t be implementing the change, and I definitely won’t quarrel about it :)

@interloper98
Collaborator

interloper98 commented Sep 16, 2024

I agree @FeralChild64; the concept of a cpu core is starting to lose all meaning ...

Intel Atom 15305 (Formerly known as RicketyTrail)

  • Power intense (P) cores: 0 (180W TDP/core, we're saving you power, bro!)
  • Energy efficient (E) cores: 1 (65W TDP/core)
  • Phony (Y) cores: 15 (0W TDP/core, for maximum power savings!)

I'm going to try to get the number of physical cores first (and ignore the PhonY cores and threads), and then only fall back to C++'s concurrency numbers.

@Torinde
Contributor

Torinde commented Sep 16, 2024

Can somebody please share what "the removed OpenGL backend in the Voodoo emulation patch" is (since apparently it's not a pass-through to Glide wrappers) - was that a redundant render output path (Staging already uses OpenGL by default anyway)?

I think @interloper98 idea for auto/default is good.

I also agree with @FeralChild64 that changing the current user setting into a thread count is good. There are too many combinations (with more coming in the future), and while the tests so far suggest 7 is good in the majority of cases (and doesn't hurt even weak CPUs with fewer cores, etc.; see further comments at the link, it's not only about Threadripper), there are game/hardware combos where more than 7 helps.

(and leave the custom maximum # outrageously big: desktop chips have up to 32 threads, while exotic workstation/server models will go above 256... yes, half of those are "phony-hyper-tiny"... so I suggest a limit of 256, nice and round, and well above what the majority of users would be able to use in the mid term)

And what about an even more involved algorithm? TL;DR: adjust threads dynamically based on performance measurement

  1. measure FPS (over 1 second, or some other set period)
  2. increase the thread count by 1 and measure FPS again
  3. if FPS improved, increase the thread count by 1 and measure FPS again
  4. repeat the last step until FPS no longer improves, then go back to 1 thread fewer
  5. continue measuring FPS and, if it drops, increase by 1 thread
  6. if FPS isn't improved, decrease by 1 thread, else increase by 1 thread
  7. if FPS isn't improved go to step 5, else decrease by 1 thread
  8. ... you get the point

Feel free to shoot this down - too complex, measuring FPS is cumbersome, too much work needed, etc.
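
For illustration only, the steps above can be condensed into a simple hill-climbing update: keep moving the thread count in the same direction while the measured FPS improves, and reverse direction when it doesn't. This is a simplified variant rather than a literal transcription of the list, and it assumes the thread count can be changed while the emulator is running. The `TunerState` struct and `next_tuner_state` function are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical tuner state: the current thread count, the direction of
// the last change (+1 or -1), and the FPS measured before that change.
struct TunerState {
    int threads;
    int step;
    double last_fps;
};

// One hill-climbing update, called once per measurement period with the
// FPS observed at the current thread count.
TunerState next_tuner_state(TunerState state, const double fps,
                            const int min_threads, const int max_threads)
{
    assert(min_threads >= 1 && min_threads <= max_threads);

    // If the last change did not improve FPS, try the other direction
    if (fps <= state.last_fps) {
        state.step = -state.step;
    }
    state.threads = std::clamp(state.threads + state.step,
                               min_threads, max_threads);
    state.last_fps = fps;
    return state;
}
```

A tuner like this never settles: it keeps oscillating by one thread around the best value it finds, and noisy FPS measurements can send it in the wrong direction, which is part of the complexity cost of the approach.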

@Grounded0
Collaborator

Grounded0 commented Sep 16, 2024

Yeah I like the idea of auto OR number of threads on the same value field.

Auto can use SDL to figure out a good number and still leave us an option to dial it in manually so I don't have battery anxiety when I'm returning from the city on a train. Maybe I should upgrade to ARM64 finally... 😁
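
A rough sketch of how an "auto OR number of threads" value could be parsed from a single setting field. The function name, the use of 0 to mean auto, and the upper bound of 16 (the range floated earlier in the thread) are all assumptions for illustration, not the actual Staging implementation.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical convention: 0 stands for "auto", 1..16 is an explicit
// thread count, and std::nullopt signals an invalid setting value.
constexpr int auto_threads         = 0;
constexpr int max_explicit_threads = 16;

std::optional<int> parse_voodoo_threads(const std::string_view value)
{
    if (value == "auto") {
        return auto_threads;
    }
    int count = 0;
    const char* first = value.data();
    const char* last  = first + value.size();

    // from_chars rejects empty input, leading '+', and whitespace
    const auto [ptr, ec] = std::from_chars(first, last, count);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt; // not a number, or trailing garbage
    }
    if (count < 1 || count > max_explicit_threads) {
        return std::nullopt; // out of the accepted range
    }
    assert(count != auto_threads);
    return count;
}
```

Returning `std::nullopt` for bad input leaves it to the caller to decide whether to warn and fall back to auto.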

@johnnovak
Member

johnnovak commented Sep 16, 2024

Ok... lots of effort for that handful of Voodoo games. Do the auto thing, maybe allow 1 to 16 cores, job done, move on.

On the fly benchmarking is a bit ridiculous... Keep it simple, and the FPS improvements between 3 and more threads are rarely earth shattering. I don't even think you can vary the number of Voodoo threads after startup anyway.

When in doubt, always underengineer. I'd take the simple and good enough code any day (the 80-90% solution) vs the 100% solution that comes with an order of magnitude more complexity 😏 But I've been coding for 30+ years, I'm not after impressing anyone anymore, just get the job done with minimum fuss 😏

Keep the big picture in mind -- there are far more important things to do than improve 5 out of the 20 Voodoo games by 5 FPS... maybe.

@johnnovak
Member

johnnovak commented Sep 16, 2024

I agree @FeralChild64; the concept of a cpu core is starting to lose all meaning ...

Dunno man. Multithreading helps, and we just want generic good enough solutions, not maximalist solutions with 5x the effort.

I'm a sweet spot, bang for the buck guy. Minimum change for maximum profit, and move on 😆

@johnnovak
Member

Auto can use SDL to figure out a good number and still leave us an option to dial it in manually so I don't have battery anxiety when I'm returning from the city on a train. Maybe I should upgrade to ARM64 finally... 😁

Note that if you don't play a Voodoo game, the CPU cost is zero.

@Grounded0
Collaborator

Note that if you don't play a Voodoo game, the CPU cost is zero.

EF 2000 happens to be one of my top 5 games on DOS so that's not a luxury I have.

@Grounded0 Grounded0 linked a pull request Sep 17, 2024 that will close this issue
11 tasks
@github-project-automation github-project-automation bot moved this from Suggested new features to Done in Backlog Sep 19, 2024
@Torinde
Contributor

Torinde commented Sep 21, 2024

I don't even think you can vary the number of Voodoo threads after startup

Correct, I get this message:
[image]

Projects
Status: Done
7 participants