Fast Render Target Rendering in Unreal Engine 4

June 29, 2020

Benchmark and Profiling
How Canvas Works
Unfortunately Canvas Doesn't Work
Rendering to Texture with Slate
Conclusion
Bonus

I have spent quite a bit of time working with Render Targets in Unreal Engine 4, mostly to create effects for my UI. One of the main examples is the use of a Jump Flood algorithm to render a Distance Field.

For convenience, and because this is often what Epic Games recommends in its tutorials, I used Canvas-based render targets to render materials (shaders) into textures, and I do a lot of renders within one frame. The Jump Flood, for example, ends up rendering 10 times (via two textures). Add Bloom passes and other effects/filtering on top of that and you can quickly see that good performance is critical.

The shaders I render into those textures are relatively simple (even the Jump Flood). What ended up being costly, however, was the switch between textures. Instead of rendering many things once, you render one thing many times. When you do that using Canvas, problems start to appear.

This article was written based on Unreal Engine version 4.23/4.25.

Benchmark and Profiling

Before going into further detail, let's first take a look at the performance itself and how each method compares. The test scenario is relatively simple: I have a material that displays a texture. The goal is to render this material into a render target. To make the problem more obvious, I created 50 render targets and rendered them all within one frame.

Rendering configuration:

Nvidia GTX 980 Ti (drivers 446.14)
Empty scene with only a floor and fog plus one directional light
No UI (Canvas or Slate)
Resolution of 1920x1080
9999 FPS limit

Render target configuration:

Resolution of 256x256
RGBA with 16f as bit depth

Here are the results (averages):

	Base Rendering	Base + Canvas	Base + Slate
Frame	4.2ms	16.2ms	5.5ms
Game	1.6ms	1.95ms	1.8ms
Draw	3.3ms	15.1ms	5.1ms
GPU	4.2ms	16.2ms	5.5ms

In essence:

Rendering with Slate takes around 1.3 ms.
Rendering with Canvas takes around 12 ms.

The performance difference between the two is gigantic! Canvas takes roughly 9 times longer than Slate to do the same work. More than 10 ms of your rendering budget is colossal, especially on an empty scene. As a reminder: 60 frames per second is a budget of about 16.7 ms.

To go further I fired up Nvidia Nsight and RenderDoc (two tools that can be used for GPU profiling). They allowed me to see when and how the render targets were updated by the engine.

In this case, the render target update happens at the beginning of the frame during the world tick. This is what we can see in RenderDoc for example with Canvas based render targets:

DrawIndexed() here renders the quad with the material into the render target. If we look into the detailed callstack we can see the following:

What does all of that mean? This is where I'm not perfectly sure of myself, but it seems to revolve around a few key points:

Pop/Push Debug Region: Those are events that help group things when using a GPU profiler like RenderDoc.
Set Viewport/Scissors: These function calls set up the rendering area of the Render Target (in this case 256x256).
Update Subresource: A copy performed by the CPU so that the render target data is accessible in memory by the GPU.
Map/Unmap: These function calls create sync points between the CPU and the GPU.
Set Buffers: These functions set up the buffers (vertex positions, UVs, variable values, etc.) for the geometry used to draw our image into the texture.
Set Shader Resources: Connects the various resources (like the source image) to make them accessible to the shader for drawing.
Draw Indexed: Draw the given geometry, non-instanced.

From the discussions I had, these function calls are not especially heavy or slow in themselves. Maybe the multiple Map/Unmap calls are, because they make the CPU and GPU wait for each other, leading to stalls and therefore long rendering times (even if the actual drawing is fast). However, if we compare with what happens when using Slate to render into a texture, we get a different story:

Once again, we look at the API calls:

There is a clear difference here. Slate performs far fewer API calls, and most importantly we don't see the shader setup and all the rest happening every frame. The only similar part is the viewport and scissor setup, which once again just says which part of the render texture is going to be drawn.

So why does Canvas do all of this while Slate doesn't?

How Canvas Works

The issue above quickly raised a suspicion: it seems that Canvas doesn't cache anything. This means that the setup used to draw and update the render targets is recreated each frame, which is a waste of time and resources. I'm not even talking about sharing resources across Render Targets, just that one texture should reuse the same geometry/shader setup when rendering the next frame.

I decided to look into how UCanvas rendering worked to investigate the issue. Here is a little schematic to summarize the process:

UCanvas is a UObject which can be easily created and managed by any game object. The class contains various properties, mostly related to the viewport/area we want to draw something into. It also contains a few helper functions to draw things, like DrawMaterial(). FCanvas is the actual drawing implementation and UCanvas contains a reference to it.

So what does UCanvas actually do? When a function such as DrawMaterial() is called, it creates something called a CanvasItem. This item contains the references and properties of the element we want to draw (size, position, material, texture, etc.). The function then calls the draw function of FCanvas.

When FCanvas receives the call to draw the item, it calls the item's internal Draw() function with itself as a reference. The reason is that CanvasItems implement the setup used to render the item themselves, depending on the resource type: textures, shaders, but also more advanced patterns like tiled boxes and so on. This is done by calling specific geometry building functions.

The building functions are relatively normal: they create vertices and triangles/polygons to render the resources. What is curious, however, is that once the building is done, they ask FCanvas for a reference to something called an FBatchedElements and then store the geometry in it.

Then when the flush function is called, either via the canvas item when drawing in immediate mode or by the parent object, the geometry is drawn into the texture by the GPU.

That is, broadly, how Canvas renders into textures.
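To make the flow easier to follow, here is a minimal standalone model of it. Every type below is a hypothetical stand-in (prefixed with Mock) written for this illustration; none of it is engine code, and the real classes do far more. It only shows who calls whom: the UCanvas front end creates an item, the FCanvas hands itself to the item, the item stores geometry in the batch, and the flush submits and discards it.

```cpp
#include <vector>

struct MockVertex { float X, Y; };

// Role of FBatchedElements: geometry waiting to be drawn.
struct MockBatchedElements
{
    std::vector<MockVertex> Vertices;
};

class MockFCanvas;

// Role of a CanvasItem: knows how to build its own geometry.
struct MockTileItem
{
    float X, Y, Width, Height;
    void Draw(MockFCanvas& Canvas) const;
};

// Role of FCanvas: owns the batches, draws items and flushes them.
class MockFCanvas
{
public:
    // The canvas passes itself to the item, which builds the geometry.
    void DrawItem(const MockTileItem& Item) { Item.Draw(*this); }

    // Hands out the batch that the building functions append to.
    MockBatchedElements& GetBatchedElements()
    {
        if (Batches.empty()) { Batches.emplace_back(); }
        return Batches.back();
    }

    // "Draws" every batch, then discards it. Returns the vertex count submitted.
    int Flush()
    {
        int Submitted = 0;
        for (const MockBatchedElements& Batch : Batches)
        {
            Submitted += static_cast<int>(Batch.Vertices.size());
        }
        Batches.clear();
        return Submitted;
    }

    int PendingBatches() const { return static_cast<int>(Batches.size()); }

private:
    std::vector<MockBatchedElements> Batches;
};

inline void MockTileItem::Draw(MockFCanvas& Canvas) const
{
    const float R = X + Width;
    const float B = Y + Height;
    // Two triangles covering the quad.
    const MockVertex Quad[6] = { {X, Y}, {R, Y}, {X, B}, {R, Y}, {R, B}, {X, B} };
    MockBatchedElements& Batch = Canvas.GetBatchedElements();
    Batch.Vertices.insert(Batch.Vertices.end(), Quad, Quad + 6);
}

// Role of UCanvas: a thin front end holding a reference to the FCanvas.
class MockUCanvas
{
public:
    explicit MockUCanvas(MockFCanvas& InCanvas) : Canvas(InCanvas) {}

    void DrawMaterial(float X, float Y, float W, float H)
    {
        const MockTileItem Item{X, Y, W, H};
        Canvas.DrawItem(Item);
    }

private:
    MockFCanvas& Canvas;
};
```

In this model a single DrawMaterial() call leaves one pending batch of six vertices, and Flush() empties it again, which is the behaviour the next section looks at.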

Unfortunately Canvas Doesn't Work

Wait a minute: FCanvas is doing batching? The geometry is cached? So why does it look like it is not working? This is where it starts to look strange: it seems that, by design, nothing is preserved.

The first detail that confirms this is how the Flush_GameThread() function works. When it iterates over the batched elements and draws them, it actually deletes them when it is done:

// iterate over the FCanvasSortElements in sorted order and render all the batched items for each entry
for( int32 Idx=0; Idx < SortedElements.Num(); Idx++ )
{
    FCanvasSortElement& SortElement = SortedElements[Idx];
    for( int32 BatchIdx=0; BatchIdx < SortElement.RenderBatchArray.Num(); BatchIdx++ )
    {
        FCanvasBaseRenderItem* RenderItem = SortElement.RenderBatchArray[BatchIdx];
        if( RenderItem )
        {
            // mark current render target as dirty since we are drawing to it
            bRenderTargetDirty |= RenderItem->Render_GameThread(this, RenderThreadScope);
            if( AllowedModes & Allow_DeleteOnRender )
            {
                delete RenderItem;
            }
        }
    }
    if( AllowedModes & Allow_DeleteOnRender )
    {
        SortElement.RenderBatchArray.Empty();
    }
}

As you can see, the check to know if the RenderItem can be deleted is mainly based on Allow_DeleteOnRender, which is an enum value:

/**
 * Enum for canvas features that are allowed
 **/
enum ECanvasAllowModes
{
    // flushing and rendering
    Allow_Flush = 1 << 0,
    // delete the render batches when rendering
    Allow_DeleteOnRender = 1 << 1
};

This enum value is set and stored into a variable named AllowedModes which is initialized during the FCanvas constructor to 0xFFFFFFFF. This means that when the FCanvas is created the "delete on render" mode is enabled by default.

So FCanvas does cache, but then cleans up everything after drawing.

Fortunately, there is a function to change the "allowed mode", which means it should be possible to preserve the item cache. Unfortunately, nobody in the engine is calling it.
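The flag logic is plain bit masking, so it can be checked in isolation. The sketch below copies the two enum values from the excerpt above into a standalone enum (renamed MockCanvasAllowModes to make clear it is not the engine type) and uses two hypothetical helpers to show why 0xFFFFFFFF enables deletion and what a mask that preserves the batches would look like.

```cpp
#include <cstdint>

// Same bit values as ECanvasAllowModes in the excerpt above.
enum MockCanvasAllowModes : std::uint32_t
{
    Allow_Flush          = 1u << 0,
    Allow_DeleteOnRender = 1u << 1
};

// Value the constructor is said to use: every bit set.
constexpr std::uint32_t DefaultAllowedModes = 0xFFFFFFFFu;

// True when the given mode bit is set in the mask.
constexpr bool IsModeAllowed(std::uint32_t AllowedModes, std::uint32_t Mode)
{
    return (AllowedModes & Mode) != 0;
}

// Returns the mask with the given mode bit cleared.
constexpr std::uint32_t DisallowMode(std::uint32_t AllowedModes, std::uint32_t Mode)
{
    return AllowedModes & ~Mode;
}

// With the default mask, batches are deleted after rendering.
static_assert(IsModeAllowed(DefaultAllowedModes, Allow_DeleteOnRender),
              "delete-on-render is enabled by default");

// Clearing only that bit keeps flushing enabled but preserves the batches.
constexpr std::uint32_t KeepBatchesModes =
    DisallowMode(DefaultAllowedModes, Allow_DeleteOnRender);
static_assert(!IsModeAllowed(KeepBatchesModes, Allow_DeleteOnRender),
              "delete-on-render is now disabled");
static_assert(IsModeAllowed(KeepBatchesModes, Allow_Flush),
              "flushing is still allowed");
```

A mask like KeepBatchesModes is what would have to be passed to the engine function mentioned above; whether the rest of FCanvas behaves well with retained batches is not something this sketch can tell.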

But things are even worse than that: FCanvas is recreated all the time. For example if we take a look at how the DrawMaterialToRenderTarget() blueprint function works, we can see this (simplified):

void UKismetRenderingLibrary::DrawMaterialToRenderTarget(
    UObject* WorldContextObject, 
    UTextureRenderTarget2D* TextureRenderTarget, 
    UMaterialInterface* Material )
{
    UWorld* World = GEngine->GetWorldFromContextObject(WorldContextObject, EGetWorldErrorMode::LogAndReturnNull);

    if (!World)
    {
        // Emits a warning
    }
    else // Draw Render Target
    {
        // Reference to the Render Target resource
        FTextureRenderTargetResource* RenderTargetResource = TextureRenderTarget->GameThread_GetRenderTargetResource();

        // Retrieve a UCanvas from the world to avoid creating a new one each time
        UCanvas* Canvas = World->GetCanvasForDrawMaterialToRenderTarget();

        // Creates a new FCanvas for rendering
        FCanvas RenderCanvas(
            RenderTargetResource,
            nullptr, 
            World,
            World->FeatureLevel);

        // Setup the canvas with the FCanvas reference
        Canvas->Init(TextureRenderTarget->SizeX, TextureRenderTarget->SizeY, nullptr, &RenderCanvas);
        Canvas->Update();

        // Create the CanvasItem with the material to render
        Canvas->K2_DrawMaterial(Material, FVector2D(0, 0), FVector2D(TextureRenderTarget->SizeX, TextureRenderTarget->SizeY), FVector2D(0, 0));

        // Perform the drawing
        RenderCanvas.Flush_GameThread();

        // Cleanup the FCanvas reference, to delete it
        Canvas->Canvas = NULL;
    }
}

So each time you call that function, it creates a new FCanvas. You can therefore imagine how well this performs. To be fair, however, Epic Games mentions in its documentation that Begin/EndDrawCanvasToRenderTarget() should be used instead when doing multiple operations that draw into the texture, likely because the FCanvas is created in Begin() and only destroyed when End() is called. This wouldn't help in our situation, however, since we draw only once per frame into each texture.
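The difference between the two call patterns can be shown with a small counting model. MockRenderCanvas below is a hypothetical stand-in that only counts how often it is constructed; the two functions imitate the shape of the calls and are not the engine implementations.

```cpp
// Hypothetical stand-in for FCanvas that only counts constructions.
struct MockRenderCanvas
{
    static int ConstructionCount;

    MockRenderCanvas() { ++ConstructionCount; }
    void Draw() {}
    void Flush() {}
};

int MockRenderCanvas::ConstructionCount = 0;

// Shape of DrawMaterialToRenderTarget(): one canvas per draw call.
inline void DrawOncePerCall(int DrawCount)
{
    for (int Index = 0; Index < DrawCount; ++Index)
    {
        MockRenderCanvas Canvas;
        Canvas.Draw();
        Canvas.Flush();
    }
}

// Shape of Begin/EndDrawCanvasToRenderTarget(): one canvas, many draws.
inline void DrawBetweenBeginAndEnd(int DrawCount)
{
    MockRenderCanvas Canvas;            // "Begin"
    for (int Index = 0; Index < DrawCount; ++Index)
    {
        Canvas.Draw();
    }
    Canvas.Flush();                     // "End"
}
```

With five draws into the same texture the first pattern builds five canvases and the second builds one. With a single draw per texture both build exactly one, which is why Begin/End brings nothing to the 50 render target test.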

What if, instead of using the Blueprint functions, we use a UCanvasRenderTarget2D directly?

UCanvasRenderTarget2D itself contains a Canvas, so it's promising. It has an update function named ReceiveUpdate() that can be overridden in Blueprint/C++, which makes it easy to draw custom stuff. This function is called by another internal function named RepaintCanvas(), which goes like this:

void UCanvasRenderTarget2D::RepaintCanvas()
{
    // Create or find the canvas object to use to render onto the texture.  Multiple canvas render target textures can share the same canvas.
    static const FName CanvasName(TEXT("CanvasRenderTarget2DCanvas"));
    UCanvas* Canvas = (UCanvas*)StaticFindObjectFast(UCanvas::StaticClass(), GetTransientPackage(), CanvasName);
    if (Canvas == nullptr)
    {
        Canvas = NewObject<UCanvas>(GetTransientPackage(), CanvasName);
        Canvas->AddToRoot();
    }

    // Create the FCanvas which does the actual rendering.
    const UWorld* WorldPtr = World.Get();
    const ERHIFeatureLevel::Type FeatureLevel = WorldPtr != nullptr ? World->FeatureLevel.GetValue() : GMaxRHIFeatureLevel;

    // NOTE: This texture may be null when this is invoked through blueprint from a cmdlet or server.
    FTextureRenderTarget2DResource* TextureRenderTarget = (FTextureRenderTarget2DResource*) GameThread_GetRenderTargetResource();

    FCanvas RenderCanvas(TextureRenderTarget, nullptr, FApp::GetCurrentTime() - GStartTime, FApp::GetDeltaTime(), FApp::GetCurrentTime() - GStartTime, FeatureLevel);
    Canvas->Init(GetSurfaceWidth(), GetSurfaceHeight(), nullptr, &RenderCanvas);

    [...]
}

The RepaintCanvas() function also retrieves an existing UCanvas to avoid creating a new one, and then... creates an FCanvas from scratch.

Sadly, UCanvasRenderTarget2D doesn't do any caching either. An alternative solution could be to make a child class and reimplement the repaint function to do that caching.

What about the HUD class? Used via the game framework, the HUD class uses a Canvas to draw anything on screen. Does it trash the FCanvas as well when it is done? The answer is yes. The setup is a bit more convoluted, but basically the HUD is updated via the GameViewportClient, which contains an FViewport. It creates an FCanvas each time the draw function is called (both for normal and debug drawing). This means that debug printing a lot of information on screen leads to bad performance as well.

Rendering to Texture with Slate

If like me you prefer to avoid hacking the engine to migrate more easily to newer engine versions, you are likely looking for an alternative solution. It turns out that Slate can be used to render into textures natively.

Unreal Engine 4 supports what is called a WidgetComponent. This is a type of component used to draw Widgets/UI inside the game world and not on screen. They work by rendering the Slate widget into a texture and then displaying it on a mesh.

I took inspiration from it and wrote my own Render Target class. It's basically a component that can be spawned and attached to an actor (to make it easy to manage) and that takes care of creating the Slate context (a virtual window) and a render target texture. It just needs to be fed a UMG/Slate Widget. The Slate virtual window keeps the widget alive in case it needs to be updated (like regular UI) and helps with caching it.

Here is a simplified version of my class, which covers the most important points:

ScriptedTexture.h

UCLASS()
class EXEDRE_API UExedreScriptedTexture : public USceneComponent
{
    GENERATED_UCLASS_BODY()

    public:
        virtual void Init();

        void Render( float DeltaTime = 0.0f );

        void Resize( FIntPoint& NewSize );

        virtual void BeginPlay() override;

    protected:
        virtual void OnUnregister() override;

    private:
        // The cached window containing the rendering widget
        TSharedPtr<SVirtualWindow>  SlateWindow;
        TSharedPtr<FHittestGrid>    SlateGrid;
        FGeometry SlateGeometry;

        void UpdateSlateWindow();

        UPROPERTY(transient)
        UTextureRenderTarget2D* ScriptedTexture;

        UPROPERTY(transient)
        UUserWidget* RenderingWidget;

        FWidgetRenderer* Renderer;
};

ScriptedTexture.cpp

// Constructor
UExedreScriptedTexture::UExedreScriptedTexture(const FObjectInitializer& ObjectInitializer)
    : Super(ObjectInitializer)
{
    ScriptedTexture = nullptr;
    RenderingWidget = nullptr;
    Renderer        = nullptr;
}

// Begin play, setup of the Slate virtual window
void UExedreScriptedTexture::BeginPlay()
{
    Super::BeginPlay();

    if( FSlateApplication::IsInitialized() )
    {
        SlateWindow = SNew(SVirtualWindow).Size( FVector2D(256.0,256.0) );
        SlateGrid   = MakeShareable( new FHittestGrid() );
    }

    check( SlateWindow.IsValid() );
}


// Cleanup any Slate references when the component is being destroyed
void UExedreScriptedTexture::OnUnregister()
{
    Super::OnUnregister();

    if( SlateGrid.IsValid() )
    {
        SlateGrid.Reset();
    }

    if ( SlateWindow.IsValid() )
    {
        if( FSlateApplication::IsInitialized() )
        {
            FSlateApplication::Get().UnregisterVirtualWindow( SlateWindow.ToSharedRef() );
        }

        SlateWindow.Reset();
    }

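    // FWidgetRenderer is a deferred cleanup object: hand it over
    // for deletion instead of leaking it or deleting it directly
    if( Renderer != nullptr )
    {
        BeginCleanup( Renderer );
        Renderer = nullptr;
    }

    ScriptedTexture = nullptr;
    RenderingWidget = nullptr;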
}


// Create the Render Target resource and the User Widget for rendering
void UExedreScriptedTexture::Init()
{
    // Create widget to render into RTT
    // Load a class from a blueprint object,
    // Don't forget to add "_C" at the end to get the class
    FString Path = "WidgetBlueprint'/Game/UI/UMG_RenderMaterial.UMG_RenderMaterial_C'";
    TSubclassOf<UUserWidget> ClassWidget = LoadClass<UUserWidget>(nullptr, *Path);

    RenderingWidget = CreateWidget<UUserWidget>( GetWorld(), ClassWidget );

    // Create render target resource
    FString Name = GetName() + "_ScriptTxt";
    ScriptedTexture = NewObject<UTextureRenderTarget2D>(this, UTextureRenderTarget2D::StaticClass(), *Name);
    check( ScriptedTexture );

    ScriptedTexture->RenderTargetFormat = ETextureRenderTargetFormat::RTF_RGBA8;
    ScriptedTexture->SizeX      = 256;
    ScriptedTexture->SizeY      = 256;
    ScriptedTexture->ClearColor = FLinearColor::Transparent;

    ScriptedTexture->UpdateResource();

    // Slate setup
    Renderer = new FWidgetRenderer(false, true); //bool bUseGammaCorrection, bool bInClearTarget

    if( FSlateApplication::IsInitialized() )
    {
        FSlateApplication::Get().RegisterVirtualWindow( SlateWindow.ToSharedRef() );
    }

    UpdateSlateWindow();
}


// Setup the Slate window with the widget
void UExedreScriptedTexture::UpdateSlateWindow()
{
    SlateWindow->SetContent( RenderingWidget->TakeWidget() );
    SlateWindow->Resize( 256, 256 );
    SlateGeometry = FGeometry::MakeRoot( FVector2D( 256, 256 ), FSlateLayoutTransform(1.0f));
}


// Render/Draw the texture
void UExedreScriptedTexture::Render( float DeltaTime )
{
    // Use the FWidgetRenderer to Draw the Slate 
    // window and its widget into the texture.
    // Replace:
    //    SlateGrid.ToSharedRef() 
    // by:
    //    *SlateGrid.Get()
    // if you compile with UE4 4.25
    Renderer->DrawWindow(
        ScriptedTexture->GameThread_GetRenderTargetResource(),  // FRenderTarget* RenderTarget
        SlateGrid.ToSharedRef(),                                // TSharedRef<FHittestGrid> HitTestGrid
        SlateWindow.ToSharedRef(),                              // TSharedRef<SWindow> Window
        SlateGeometry,                                          // FGeometry WindowGeometry
        SlateGeometry.GetLayoutBoundingRect(),                  // FSlateRect WindowClipRect
        DeltaTime,                                              // float DeltaTime
        false                                                   // bool bDeferRenderTargetUpdate
    );

    // Generate the MipMaps if needed
    // ScriptedTexture->UpdateResourceImmediate( false );
}


// Resize the render target and update the Slate window
// Note: UpdateSlateWindow() uses a hardcoded size
// so be sure to adjust the code to pass the right size
// to the window as well.
void UExedreScriptedTexture::Resize( FIntPoint& NewSize )
{
    if( ScriptedTexture != nullptr )
    {
        // Resizes the render target without recreating 
        // the FTextureResource. It might crash if you are 
        // using MipMaps because of an engine bug, in that 
        // case use UpdateResource() instead.
        // This issue should be fixed with UE4 4.26.
        ScriptedTexture->ResizeTarget( NewSize.X, NewSize.Y );

        // Recreate the Slate window used for rendering (since the size changed)
        UpdateSlateWindow();
    }
}

Now to use the class, the code below should be relatively simple to understand:

// Create the texture from within an actor (BeginPlay, Tick, etc.)
UExedreScriptedTexture* Texture =
    NewObject<UExedreScriptedTexture>(this, UExedreScriptedTexture::StaticClass());

Texture->AttachToComponent( 
    GetRootComponent(), 
    FAttachmentTransformRules::SnapToTargetIncludingScale 
);
Texture->RegisterComponent();
Texture->Init();

// Render the Texture
// You can also provide a Deltatime in case your widget needs to Tick
// Should be called each time you want to draw the render target
Texture->Render();

One important thing to note: an FWidgetRenderer can be used only once per frame to render a Widget. If you need to update a render target multiple times per frame (like a Jump Flood, which ping-pongs between two textures), you will need multiple widget renderers. In my case I used a pool managed by my game instance, and each render target requests a renderer at render time. This way they are created on the fly when needed and reused in the following frames.
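The pool itself is not shown in this article, so the following is only a guess at its general shape, written as standalone C++. MockWidgetRenderer and MockRendererPool are hypothetical names; in a real project the pooled object would be an FWidgetRenderer and the pool would live in the game instance.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for FWidgetRenderer.
struct MockWidgetRenderer
{
    int DrawCount = 0;
    void DrawWindow() { ++DrawCount; }
};

// Hands out one renderer per request within a frame and
// reuses the same renderers on the following frames.
class MockRendererPool
{
public:
    // Call once at the start of each frame: every renderer is available again.
    void BeginFrame() { NextFree = 0; }

    // Returns a renderer not yet handed out this frame, creating one if needed.
    MockWidgetRenderer& Acquire()
    {
        if (NextFree == Renderers.size())
        {
            Renderers.push_back(std::make_unique<MockWidgetRenderer>());
        }
        return *Renderers[NextFree++];
    }

    // Number of renderers created so far.
    std::size_t Size() const { return Renderers.size(); }

private:
    std::vector<std::unique_ptr<MockWidgetRenderer>> Renderers;
    std::size_t NextFree = 0;
};
```

The pool only grows to the largest number of renders requested within a single frame. Storing the renderers behind std::unique_ptr keeps the returned references valid when the vector reallocates.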

Nick Darnell on Twitter

Conclusion

I hope all of this provides a clearer picture of how UCanvas works and why it should be avoided (or fixed?). In my case Slate is a very good alternative because it lets me render both my UI and other effects with the same render target system.

If you want to take a look at the engine code, here are the files with all the information:

UCanvas: Engine\Source\Runtime\Engine\Classes\Engine\Canvas.h
FCanvas: Engine\Source\Runtime\Engine\Public\CanvasTypes.h
CanvasItem: Engine\Source\Runtime\Engine\Public\CanvasItem.h

Many thanks to Newin, Nick Darnell, Chris Murphy and some of my colleagues for the help on this subject.

Bonus

If, like me, you prefer to avoid Blueprints when possible, you might be wondering whether it is possible to build a widget in C++ without using UMG, for example to achieve the same thing as DrawMaterialToRenderTarget() does.

Well, it is possible! I built a new class inherited from UserWidget that does just that.

Note: in order to render materials via a custom Slate widget, make sure the material domain is set to User Interface and not something else. Otherwise the Widget might not draw anything.

ExedreWidgetRenderTarget.h

#pragma once

#include "CoreMinimal.h"
#include "Blueprint/UserWidget.h"
#include "ExedreWidgetRenderTarget.generated.h"

UCLASS()
class EXEDRE_API UExedreWidgetRenderTarget : public UUserWidget
{
    GENERATED_UCLASS_BODY()

    public:
        void SetRenderMaterial( UMaterialInterface* Material );

        virtual void ReleaseSlateResources(bool bReleaseChildren) override;

    protected:
        virtual TSharedRef<SWidget> RebuildWidget() override;

    private:
        TSharedPtr<SWidget> WidgetParent;

        UPROPERTY(transient)
        UMaterialInterface* RenderingMaterial;

        UPROPERTY(transient)
        FSlateBrush ImageBrush;

        UPROPERTY(transient)
        UTexture2D* DefaultTexture;
};

ExedreWidgetRenderTarget.cpp

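#include "ExedreWidgetRenderTarget.h"

#include "Engine/Texture2D.h"
#include "Materials/MaterialInterface.h"
#include "UObject/ConstructorHelpers.h"
#include "Widgets/Images/SImage.h"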


UExedreWidgetRenderTarget::UExedreWidgetRenderTarget(
    const FObjectInitializer& ObjectInitializer)
    : Super(ObjectInitializer)
{
    FString Path = "Texture2D'/Game/UI/txt_LogoUE4.txt_LogoUE4'";
    static ConstructorHelpers::FObjectFinder<UTexture2D> Texture(*Path);
    DefaultTexture = Texture.Object;

    ImageBrush = FSlateBrush();
    ImageBrush.SetResourceObject(DefaultTexture);

    RenderingMaterial = nullptr;
}


void UExedreWidgetRenderTarget::SetRenderMaterial( UMaterialInterface* Material )
{
    if( Material != nullptr && RenderingMaterial != Material )
    {
        // Store new reference
        RenderingMaterial = Material;

        // Updating internal rendering brush
        ImageBrush.SetResourceObject(Material);
    }
}


TSharedRef<SWidget> UExedreWidgetRenderTarget::RebuildWidget()
{
    if( !WidgetParent.IsValid() )
    {
        // Use an SInvalidationPanel if you want to cache
        // the image and its brush, but then the material cannot
        // be updated later (unless explicitly invalidated)
        /*
        WidgetParent =
            SNew(SInvalidationPanel).CacheRelativeTransforms(false)
            [
                SNew(SImage).Image( &ImageBrush )
            ];
        */

        WidgetParent = SNew(SImage).Image( &ImageBrush );
    }

    return WidgetParent.ToSharedRef();
}


void UExedreWidgetRenderTarget::ReleaseSlateResources(bool bReleaseChildren)
{
    Super::ReleaseSlateResources(bReleaseChildren);

    WidgetParent.Reset();
}