Add Video Super Resolution for Windows (AMD, Intel and NVIDIA) and MacOS #1180

Open
wants to merge 73 commits into base: master


@linckosz linckosz commented Feb 8, 2024

Hi,

Context:
Video Super Resolution (VSR) is to video what DLSS is to 3D rendering.
Why not let Moonlight be one of the first game streaming solutions to leverage this technology?
AI upscaling means significantly less bandwidth usage without compromising video quality!
NVIDIA, Intel (link in French), and more recently AMD have started to advertise their respective technologies for enhancing video quality with AI.

Solution:
Implementing VSR was not straightforward. I needed to add the Video Processor component to D3D11VA in order to offload frame processing from the CPU to the GPU and to leverage the GPU's additional capabilities.
I added a UI checkbox in SettingsView.qml, but the main processing logic is in d3d11va.cpp.
NVIDIA provides VSR and HDR enhancement. I could implement VSR perfectly on SDR content, but not yet on HDR content (more details below).
Intel provides VSR. It has been implemented but is yet to be tested on an Arc GPU (I don't have one).
AMD just released AMF Video Upscaling. I prepared the code, but I need an RX 7000 series GPU (I don't have one), and the implementation approach may be quite different.

Testing:
The solution runs stably on my rig. I tried different configurations (size, bandwidth, V-Sync, HDR, AV1, etc.) over a few days.
AMD Ryzen 5600x
32GB DDR4 3200
RTX 4070 Ti
Moonlight v5.0.1
Sunshine v0.21.0

(Update May 6th, 2024)
A complete report is available at the comment below.
I could test it with a wider range of GPUs:

  • Nvidia RTX 4070 Ti (Windows)
  • AMD RX 7600 (Windows)
  • Intel Arc A380 (Windows)
  • Intel UHD Graphics (16EU), an iGPU from N95 CPU (Windows)
  • M1 Pro (MacOS)

(Update November 16th, 2024)
Other development (iOS)
For those who are also interested in VSR for iOS, I developed moonlight-ios-MFX too, but it is a WIP.
On an iPhone 13 Pro, the upscaler works quite well but is too power hungry due to the use of the Metal renderer (not the upscaler itself), so it isn't worth it yet. It may fare better on newer iPhones, but I haven't tried.
I don't have an Apple TV, but it will probably work there too, modulo a few tests and code fixes. I won't continue to maintain the iOS version, so feel free to improve it.

Improvements:

  • (Minor bug) On an RTX GPU, when HDR is activated, I needed to force the SwapChain format to DXGI_FORMAT_R8G8B8A8_UNORM instead of DXGI_FORMAT_R10G10B10A2_UNORM. Otherwise, with DXGI_FORMAT_R10G10B10A2_UNORM and VSR activated, the screen becomes a lot darker. I tried many ColorSpace combinations.
  • (Minor bug) On an RTX GPU, when HDR is activated, VSR adds a kind of extra white border to many elements, like an over-sharpened picture. In comparison, SDR is fine.
  • (Medium bug) On an RTX GPU in windowed mode, I can manually scale the window down freely with no crash, but scaling up while keeping the aspect ratio makes the screen go black and often crashes Moonlight. To avoid the crashes, I just allow the picture to be stretched and to be bigger than the initial window size.
  • (Minor bug) On an RTX GPU, I have also coded NVIDIA HDR enhancement. I could activate it with some color space settings (see the details in the comment of the method enableNvidiaHDR), but in that configuration the screen is always darker. I probably need to work on the color space and format, and maybe on the Sunshine side, to understand the behavior. So the feature is there, but the user cannot use it with the current configuration.
  • (Test) I don't have an Intel GPU. I could only try with an Intel N95, which has an iGPU, and Intel supports iGPUs from 10th Gen CPUs (Comet Lake) onward. The code works, but I could barely see an improvement; apparently the best results are on the Arc GPU series. I need someone to test it (comparison pictures like below).
  • (Improvement) AMD VSR is yet to be implemented. The documentation (Upscaling and Denoising) is very clear, and it looks achievable at first sight. But it requires an RX 7000 series GPU, which I don't have...

Results (comparison):
Resolution Test
Banding Test


Commits description

USER INTERFACE
Add a new UI feature called "Video AI-Enhancement" (VAE).

Changes made:

  1. Create a new class VideoEnhancement which checks eligibility for the feature.
  2. Add the checkbox "Video AI-Enhancement" in the "Basic Settings" groupbox.
  3. Disable VAE when fullscreen is selected.
  4. Add a registry record.
  5. Add the mention "AI-Enhanced" to the Overlay when activated.
  6. Add a command line option for the class VideoEnhancement.

BACKEND PROCESSING
Add VideoProcessor to D3D11VA to offload video processing from the CPU to the GPU and to leverage additional GPU capabilities such as AI enhancement for upscaling and some filtering.

Changes made:

  1. VideoProcessor is used to render the frame only when "Video AI-Enhancement" is enabled; when disabled, the whole process is unchanged.
  2. Add methods to enable Video Super Resolution for NVIDIA and Intel. The AMD method is currently empty; a proof of concept based on the AMF documentation is still needed.
  3. Add methods to enable SDR to HDR. Currently only NVIDIA has such a feature, but the code is prepared in case Intel and AMD add one too.
  4. Some existing variables local to a method (like BackBufferResource) were changed to global scope so that VideoProcessor methods can also use them.
  5. In ::initialize(), the application checks whether the system is capable of leveraging GPU AI enhancement; if so, it informs the UI to display the feature.
  6. ColorSpace setups (Source/Stream) for HDR are not optimal; further improvement might be possible. Issues observed are commented in the code at the relevant places.
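
As a rough illustration of item 2 in the backend list, vendor-specific dispatch could look like the sketch below. It is not the code from d3d11va.cpp: the GpuVendor enum and both function names are hypothetical, and the only outside facts used are the well-known PCI vendor IDs (0x10DE for NVIDIA, 0x8086 for Intel, 0x1002 for AMD). The AMD branch returns false to mirror the empty AMD method described in this list.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical vendor classification; not a type from the Moonlight sources.
enum class GpuVendor { Nvidia, Intel, Amd, Unknown };

// Maps a PCI vendor ID (as reported in an adapter description) to a vendor.
GpuVendor vendorFromPciId(std::uint32_t vendorId) {
    assert(vendorId <= 0xFFFF);  // PCI vendor IDs are 16-bit values
    switch (vendorId) {
        case 0x10DE: return GpuVendor::Nvidia;
        case 0x8086: return GpuVendor::Intel;
        case 0x1002: return GpuVendor::Amd;
        default:     return GpuVendor::Unknown;
    }
}

// Reflects the state described in this list: NVIDIA and Intel have an
// enable method, while the AMD method is still an empty stub.
bool canEnableSuperResolution(GpuVendor vendor) {
    switch (vendor) {
        case GpuVendor::Nvidia: return true;
        case GpuVendor::Intel:  return true;
        case GpuVendor::Amd:    return false;  // stub until the AMF proof of concept exists
        default:                return false;
    }
}
```

A real implementation would additionally have to query the driver for the feature, since a matching vendor ID alone does not guarantee that the GPU generation supports it.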

@linckosz linckosz changed the title Vsr Add Video Super Resulotion using NVIDIA and Intel GPUs Feb 8, 2024
@linckosz linckosz changed the title Add Video Super Resulotion using NVIDIA and Intel GPUs Add Video Super Resolution using NVIDIA and Intel GPUs Feb 8, 2024
@cgutman
Member

cgutman commented Feb 25, 2024

I'm going to introduce usage of ID3D11VideoProcessor for color conversion, so you can wait to make further changes until that new code is in to minimize conflicts or duplicated work.

m_IsHDRenabled was a duplication of the existing condition "m_DecoderParams.videoFormat & VIDEO_FORMAT_MASK_10BIT".

Replace the variable m_IsHDRenabled (2)

m_IsHDRenabled was a duplication of the existing condition "m_DecoderParams.videoFormat & VIDEO_FORMAT_MASK_10BIT".
Remove the VideoEnhancement::getVideoDriverInfo() method (which was based on the Windows Registry)
and use the existing method CheckInterfaceSupport() from the Adapter.
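
The idea behind the m_IsHDRenabled commits, deriving the 10-bit state from the negotiated video format rather than caching it in a second flag, can be sketched as follows. This is a self-contained illustration only: DecoderParams here is a stand-in struct, and the k-prefixed constants are placeholder bit values, not the real VIDEO_FORMAT_* definitions.

```cpp
#include <cassert>

// Placeholder bit values for illustration; the real constants live in
// Moonlight's headers and may differ.
constexpr int kFormatH264       = 0x0001;
constexpr int kFormatHevcMain10 = 0x0200;
constexpr int kFormatAv1Main10  = 0x2000;
constexpr int kFormatMask10Bit  = kFormatHevcMain10 | kFormatAv1Main10;

// Stand-in for the decoder parameters struct mentioned in the commit message.
struct DecoderParams {
    int videoFormat;
};

// Single source of truth: the answer is computed from videoFormat each time,
// so there is no separate boolean that could drift out of sync.
bool is10BitStream(const DecoderParams& params) {
    assert(params.videoFormat != 0);  // a format must have been negotiated
    return (params.videoFormat & kFormatMask10Bit) != 0;
}
```

Because the result is recomputed from the format on every call, nothing needs updating if the stream is renegotiated.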
@linckosz
Author

> I'm going to introduce usage of ID3D11VideoProcessor for color conversion, so you can wait to make further changes until that new code is in to minimize conflicts or duplicated work.

No worries, I'll keep going with the current ID3D11VideoProcessor implementation and make the change once your code is available.

- MaxCLL and MaxFALL at 0 as the source content is unknown in advance.
- Output ColorSpace matched SwapChain
…oProcessor"

"reset" was not used in latest code
 - NVIDIA: After updating the NVIDIA driver to 551.61, VSR works in Exclusive Fullscreen (tested on an RTX 4070 Ti)
 - Intel: VSR works in Exclusive Fullscreen (tested on an Arc A380)
 - AMD: VSR is WIP
@linckosz
Author

linckosz commented Feb 27, 2024

I ran many tests on color spaces with HDR on, and along the way I found something interesting that I wanted to share.

I have 2 graphics cards in the same PC, an RTX 4070 Ti and an Arc A380.
I streamed from another PC running on an Intel N95 CPU.
Attached are 2 HDR pictures (GPU Output.zip) coming from the same Moonlight session (H.265 decoding via the Arc) on the same display (Gigabyte M27Q) with the same DP cable, yet they are still quite different. The reason is that one is displayed via the RTX (DP output), and the other via the Arc (DP output).
Arc A380: The picture is clear and the colors are close to accurately rendered; it is perfectly usable. There are just minor artifacts around high-contrast areas like text.
RTX 4070 Ti: The colors are washed out, and there is significant banding. Unpleasant to use.

Conclusion:
HDR output is handled quite differently from one GPU to another, and can lead to a poor-quality picture.

…nt is On

- Simplify the class VideoEnhancement, as all properties will be set at D3D11VA initialization
- Since it never changes during a session, scan all GPUs only once at application launch and keep track of the most suitable adapter index with VideoEnhancement->m_AdapterIndex.
- Adapt setHDRoutput, as the adapter might be different (not the one linked to the display).
- When Video Enhancement is Off, keep the previous behavior (= using the adapter linked to the display).
- Update setHDRoutput for multiple displays, to make sure we get the HDR information of the display where Moonlight is shown
During the scan, there is no need to enable enhancement capabilities for every GPU, as it will be done right afterwards for the selected GPU only.
- [Minor] No need to set HDR Stream and Output if HDR is disabled in the Moonlight UI
- [Minor] During the selection of the most suitable GPU for video enhancement, the best result was not saved, which resulted in selecting the last GPU scanned.
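
The last fix in this list describes a classic "track the best so far" loop. A minimal sketch of that pattern is shown below, assuming each scanned GPU gets a numeric score; the AdapterInfo struct, the scoring scheme, and pickBestAdapter are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of one scanned GPU; not Moonlight's real type.
struct AdapterInfo {
    int index;  // position in the enumeration order
    int score;  // higher means better enhancement support (assumed metric)
};

// Returns the index of the highest-scoring adapter, or -1 if no adapter
// scores above zero. On ties, the adapter scanned first is kept.
int pickBestAdapter(const std::vector<AdapterInfo>& adapters) {
    int bestIndex = -1;
    int bestScore = 0;  // a score of 0 is treated as "no enhancement support"
    for (const AdapterInfo& adapter : adapters) {
        assert(adapter.score >= 0);
        if (adapter.score > bestScore) {
            // Saving the best score is the step that matters: if it were
            // left at 0, every later adapter with a positive score would
            // overwrite the choice and the last one scanned would win.
            bestScore = adapter.score;
            bestIndex = adapter.index;
        }
    }
    return bestIndex;
}
```

With scores of 1, 3 and 2 for adapters 0, 1 and 2, this returns 1 rather than the last adapter scanned.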
@linckosz
Author

Hi @rygwdn ,
I fixed the memory leak, you can give it a try.

@rygwdn

rygwdn commented Oct 3, 2024

It works perfectly! 👍 awesome work

@ody

ody commented Nov 12, 2024

Got this built successfully, and it looks to add ~1ms of decode latency on my Mac Studio M2 Max when using it just to clean up a 1440p stream displayed on a 1440p monitor from a 1440p source. I've tried a couple of other source and stream resolution combinations and the decode latency remains reasonable, all within 1ms of each other. I tried a 4k source with a 1440p stream and a 4k source with a 4k stream; all three configurations were displayed on the same 1440p monitor.

Can I please request one additional feature? Could you add corresponding --video-ai-enhance and --no-video-ai-enhance command line options? I create shortcuts in Heroic Game Launcher for the games I run via Moonlight and have different options set for different games. For example, my default configuration is vsync enabled and 60 fps, but when I stream Fortnite I have Heroic launch it with --no-vsync --fps 90.

@peerobo

peerobo commented Nov 12, 2024

Do you think you can reduce the rendering time on a MacBook Air M1?

test_upscale

If I understand correctly:
{total_latency} = {host_latency_avg} + {network_latency} + {decoding_time} + {rendering_time_avg}

According to that:

  • Native latency: ~ 22ms
  • Upscale latency: ~26ms

If the rendering time can be reduced, it will be a total game changer!
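
To make the arithmetic in this comment concrete, here is a minimal sketch of that sum. It assumes the formula as stated is right; the struct and function names are hypothetical, and any per-stage numbers fed into it are illustrative, because only the ~22ms and ~26ms totals were reported.

```cpp
#include <cassert>

// Per-stage latencies in milliseconds, named after the terms of the formula.
struct LatencyBreakdown {
    double hostLatencyAvg;
    double networkLatency;
    double decodingTime;
    double renderingTimeAvg;
};

// total_latency = host_latency_avg + network_latency
//               + decoding_time + rendering_time_avg
double totalLatencyMs(const LatencyBreakdown& b) {
    assert(b.hostLatencyAvg >= 0 && b.networkLatency >= 0);
    assert(b.decodingTime >= 0 && b.renderingTimeAvg >= 0);
    return b.hostLatencyAvg + b.networkLatency + b.decodingTime + b.renderingTimeAvg;
}

// Extra end-to-end latency of the upscaled session over the native one.
double upscaleOverheadMs(const LatencyBreakdown& native,
                         const LatencyBreakdown& upscaled) {
    return totalLatencyMs(upscaled) - totalLatencyMs(native);
}
```

If only the rendering term differs between the two sessions, the overhead equals the difference in rendering time, which is why reducing the render time is the lever discussed here.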

@ody

ody commented Nov 12, 2024

@peerobo I neglected to look at the render time statistic last night when I posted my previous comment. I do see an increase in render time, but for me it was another ~1ms compared to my non-super-resolution build, not the ~10ms increase you experienced. What are you doing while you test? The fps values in your examples do not match but are floating around 30fps; is that what you normally run games at? I also see that you have YUV 4:4:4 enabled. I've never used that setting, so I enabled it, wondering if that was the issue. After enabling it I did see a momentary spike close to the ~13ms mark while upscaling 1080p to 1440p; it cleared after a couple of seconds and I didn't see it again during the few minutes I was testing.

@ody

ody commented Nov 12, 2024

This morning I am no longer confident in my previous report of a ~1ms increase in decode latency; it looks to be equal in either case. It is honestly hard to eyeball, making me wish I could dump some time series stats. I suppose this makes sense: there should not be any additional decode latency, because the stream must be entirely decoded before it can be enhanced.

The ~1ms increase on render time is obvious and stable though.

@linckosz
Author

> This morning I am no longer confident in my previous report of a ~1ms increase in decode latency; it looks to be equal in either case. It is honestly hard to eyeball, making me wish I could dump some time series stats. I suppose this makes sense: there should not be any additional decode latency, because the stream must be entirely decoded before it can be enhanced.
>
> The ~1ms increase on render time is obvious and stable though.

@ody
Did you run your test using Software or Hardware encoding?
I just figured out that enhancement is not applied in Software mode, which makes sense as it uses a GPU feature. I will update the settings menu to disable the feature when Software is selected.

@linckosz
Author

> Do you think you can reduce the rendering time on a MacBook Air M1?
>
> test_upscale
>
> If I understand correctly: {total_latency} = {host_latency_avg} + {network_latency} + {decoding_time} + {rendering_time_avg}
>
> According to that:
>
>   • Native latency: ~22ms
>   • Upscale latency: ~26ms
>
> If the rendering time can be reduced, it will be a total game changer!

@peerobo
I do agree.
I've seen that you are using 4:4:4. It may require more compute as it doubles the image size, so you can test without it.
On macOS, Metal adds a bit of latency. I tested on an M1 Pro and an M3 Pro, and both add about 6ms for a 1080p picture.
In comparison, dedicated GPUs like Arc, RTX and RX add less than 1ms, which is quite an interesting result, but that is on Windows.
For Mac, there is probably some code optimization to do: the loops, conditions, and memory allocations could be checked, since they can have a significant impact depending on variable life cycles. I reviewed the code multiple times and don't see what more I can do, so feel free to give it a try.
Everything happens in the method "renderFrame", and the enhancement code is inside the condition "m_VideoEnhancement->isVideoEnhancementEnabled()".
https://github.com/linckosz/moonlight-qt/blob/vsr/app/streaming/video/ffmpeg-renderers/vt_metal.mm#L516
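
The "doubles the image size" remark about 4:4:4 can be checked with a little arithmetic. The sketch below assumes the alternative is 4:2:0 subsampling (the usual default for video streaming) and counts raw, uncompressed samples; the enum and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

enum class ChromaSubsampling { Yuv420, Yuv444 };

// Size in bytes of one uncompressed frame: one full-resolution luma plane
// plus two chroma planes, which are quarter-size in 4:2:0 and full-size
// in 4:4:4.
std::uint64_t frameBytes(std::uint32_t width, std::uint32_t height,
                         ChromaSubsampling subsampling,
                         std::uint32_t bytesPerSample = 1) {
    assert(width % 2 == 0 && height % 2 == 0);  // keeps the 4:2:0 math exact
    const std::uint64_t luma = std::uint64_t{width} * height;
    const std::uint64_t chromaPerPlane =
        (subsampling == ChromaSubsampling::Yuv444) ? luma : luma / 4;
    return (luma + 2 * chromaPerPlane) * bytesPerSample;
}
```

For a 1920x1080 frame with one byte per sample this gives 3,110,400 bytes in 4:2:0 and 6,220,800 bytes in 4:4:4, exactly twice as much data for the renderer to move.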

@linckosz
Author

> Got this built successfully, and it looks to add ~1ms of decode latency on my Mac Studio M2 Max when using it just to clean up a 1440p stream displayed on a 1440p monitor from a 1440p source. I've tried a couple of other source and stream resolution combinations and the decode latency remains reasonable, all within 1ms of each other. I tried a 4k source with a 1440p stream and a 4k source with a 4k stream; all three configurations were displayed on the same 1440p monitor.
>
> Can I please request one additional feature? Could you add corresponding --video-ai-enhance and --no-video-ai-enhance command line options? I create shortcuts in Heroic Game Launcher for the games I run via Moonlight and have different options set for different games. For example, my default configuration is vsync enabled and 60 fps, but when I stream Fortnite I have Heroic launch it with --no-vsync --fps 90.

@ody
I already implemented "--video-enhancement", but didn't know how to test it. Can you help me try it?
image

As only GPU (Hardware) acceleration leverages the video enhancement feature, we disable the enhancement when Software decoding is selected
@ody

ody commented Nov 13, 2024

>> Got this built successfully, and it looks to add ~1ms of decode latency on my Mac Studio M2 Max when using it just to clean up a 1440p stream displayed on a 1440p monitor from a 1440p source. I've tried a couple of other source and stream resolution combinations and the decode latency remains reasonable, all within 1ms of each other. I tried a 4k source with a 1440p stream and a 4k source with a 4k stream; all three configurations were displayed on the same 1440p monitor.
>> Can I please request one additional feature? Could you add corresponding --video-ai-enhance and --no-video-ai-enhance command line options? I create shortcuts in Heroic Game Launcher for the games I run via Moonlight and have different options set for different games. For example, my default configuration is vsync enabled and 60 fps, but when I stream Fortnite I have Heroic launch it with --no-vsync --fps 90.
>
> @ody I already implemented "--video-enhancement", but didn't know how to test it. Can you help me try it? image

I think you just missed an additional spot where the option needs to be added. Add under line 369...

 parser.addToggleOption("video-enhancement", "video enhancement");
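
For readers unfamiliar with toggle options, the behaviour being requested (a flag that can be switched on with --name and off with --no-name, as with --no-vsync earlier in this thread) can be sketched in plain C++. This is not Moonlight's addToggleOption implementation; resolveToggle is a hypothetical helper, and the assumption that a "video-enhancement" toggle also accepts a --no-video-enhancement form is inferred from the --no-vsync example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Resolves a boolean toggle from command-line arguments.
// "--<name>" turns it on, "--no-<name>" turns it off, and the stored
// setting (defaultValue) is kept when neither form is present.
// If both forms appear, the last one on the command line wins.
bool resolveToggle(const std::vector<std::string>& args,
                   const std::string& name,
                   bool defaultValue) {
    assert(!name.empty());
    const std::string enable = "--" + name;
    const std::string disable = "--no-" + name;
    bool value = defaultValue;
    for (const std::string& arg : args) {
        if (arg == enable) {
            value = true;
        } else if (arg == disable) {
            value = false;
        }
    }
    return value;
}
```

With the arguments --no-vsync --fps 90 and a stored default of true, resolveToggle(args, "vsync", true) returns false, which matches the per-game shortcut use case described above.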

@ody

ody commented Nov 13, 2024

>> This morning I am no longer confident in my previous report of a ~1ms increase in decode latency; it looks to be equal in either case. It is honestly hard to eyeball, making me wish I could dump some time series stats. I suppose this makes sense: there should not be any additional decode latency, because the stream must be entirely decoded before it can be enhanced.
>> The ~1ms increase on render time is obvious and stable though.
>
> @ody Did you run your test using Software or Hardware encoding? I just figured out that enhancement is not applied in Software mode, which makes sense as it uses a GPU feature. I will update the settings menu to disable the feature when Software is selected.

Hardware HEVC encoding and decoding. The source is an NVIDIA RTX 4070 SUPER, and the client is a Mac Studio M2 Max.

@linckosz
Author

> I think you just missed an additional spot where the option needs to be added. Add under line 369...
>
>  parser.addToggleOption("video-enhancement", "video enhancement");

@ody, thanks for pointing out the line. I committed the update and rebuilt. Can you help me test the command line again?

@ody

ody commented Nov 15, 2024

@linckosz Rebuilt and toggling the feature works now. Thank you.

@ody

ody commented Nov 16, 2024

I have both a Mac Studio M2 Max and a MacBook Pro with an Intel i5-1038NG7, so I gave the PR a try on the Intel MacBook. It doesn't throw any errors and does do something to the image. It is definitely sharper; whether it is "enhanced" is subjective. The most obvious change to the image was when upscaling 1280x800 to the MacBook's native 2560x1600. Upscaling that much puts a heavy load on the computer: the fans really start spinning fast and it gets really warm. The file sizes of the screenshots are interesting: the enhanced one is 8MB while the standard one is 5.4MB.

I didn't play long though, no reason to. I primarily stream to the M2 Max or to my Steam Deck.

I did notice a lot of warnings during compilation about our target version being macOS 11 while the MetalFX API is only available as of macOS 13. This line here is how I assume that version mismatch issue is being avoided.

Standard

screenshot-standard

Enhanced

screenshot-enhanced

@linckosz
Author

@ody,
I tried with macOS 11 (via VirtualBox), but I get this message and no enhancement on my stream.
image

Can you check whether it reaches these 2 lines?

  1. It enables the feature: vt_metal.mm:846
  2. It upscales the picture with MetalFX: vt_metal.mm:765

@ody

ody commented Nov 16, 2024

> @ody, I tried with macOS 11 (via VirtualBox), but I get this message and no enhancement on my stream. image
>
> Can you check whether it reaches these 2 lines?
>
>   1. It enables the feature: vt_metal.mm:846
>   2. It upscales the picture with MetalFX: vt_metal.mm:765

The Intel test I did yesterday was on macOS Sequoia 15.1. Our Intel MacBook Pro was the very last Intel hardware refresh before they switched to their own M chips. I think it is working as Apple intended. It is cool it "works" on late generation Intel Macs. I do not have virtualization software installed on my Intel MacBook or hardware restricted to that version and do not see a good reason to test further since you've validated what I assumed was true by reading through the code.

@buldezir

Hi, any chance you can merge or create a merged version with M3, M4 AV1 hardware decoding #1125?

@ody

ody commented Nov 17, 2024

> Hi, any chance you can merge or create a merged version with M3, M4 AV1 hardware decoding #1125?

@buldezir

Shouldn’t need to, according to #1125. They merged in the master branch today, and the master branch includes AV1. If it isn’t working with AV1, then that is something different.

This commit does nothing, it is just to restart a failed build operation.
@andre-ss6

andre-ss6 commented Dec 21, 2024

Just tested this on a M3 Pro MBP client streaming from a Windows host, 1080p -> 4K 60fps HDR 10 AV1 with HW Decoding working great!

The only downside I found is that text becomes aliased, as if ClearType were disabled, when using Super Resolution.

Avg performance stats:

image
image
