Shaders - Performance & Libraries

The simulator requires OpenGL version 4.1 or higher. Currently it’s compiled to run on Microsoft Windows. OpenGL features that are not part of the 4.1 core profile may eventually be incorporated via extensions. For example, the anisotropic filtering extension is already in use.
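Because the simulator needs OpenGL 4.1 and uses the anisotropic filtering extension, a program like this would typically confirm both at startup, once a context exists. The sketch below shows only the string-handling side. The helper names are hypothetical, while glGetString(GL_VERSION), glGetIntegerv(GL_NUM_EXTENSIONS, ...), glGetStringi(GL_EXTENSIONS, i) and the extension name GL_EXT_texture_filter_anisotropic are real OpenGL identifiers.

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helpers, not the simulator's code.
// Parses the leading "major.minor" of a GL_VERSION string such as
// "4.6.0 NVIDIA 457.49"; sscanf stops at the first non-matching character,
// so any vendor-specific suffix is ignored.
bool meetsMinimumGlVersion(const std::string& version, int reqMajor, int reqMinor) {
    int major = 0, minor = 0;
    if (std::sscanf(version.c_str(), "%d.%d", &major, &minor) != 2) {
        return false;  // unrecognised format: treat as unsupported
    }
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// Checks an extension list, e.g. one collected in a core profile by looping
// glGetStringi(GL_EXTENSIONS, i) for i < GL_NUM_EXTENSIONS.
bool hasExtension(const std::vector<std::string>& extensions, const std::string& name) {
    return std::find(extensions.begin(), extensions.end(), name) != extensions.end();
}
```

Usage: `meetsMinimumGlVersion(versionString, 4, 1)` would gate startup, and `hasExtension(list, "GL_EXT_texture_filter_anisotropic")` would decide whether the anisotropic filtering setting is offered.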

Vertex Shader & Fragment Shader

Shaders are programs written by developers that run on the graphics card (GPU), working in conjunction with code that runs on the CPU. OpenGL shaders are written in GLSL, which supports common programming elements such as variables, if statements, loops and functions, but not object-oriented programming (OOP).

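Shader stages execute in sequence: the vertex shader stage runs before the fragment shader stage. The other rendering stages, Tessellation Control, Tessellation Evaluation and Geometry, are optional and run between these two. Compute shaders are not part of the rendering pipeline; they are dispatched separately, and only became core in OpenGL 4.3.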

Each execution of a shader stage on a single item of input (a vertex, in the case of the vertex shader) is called an invocation. Because “shader stage invocations cannot interact with one another”, the result of a calculation made in one invocation cannot be used by other invocations of the same stage.

Each shader stage can, however, pass values on to later stages via “shader stage inputs and outputs”, and those values can be the results of calculations made during an invocation. For example, Simanic calculates the rotational positional displacement of vertices for blade feathering in the vertex shader, and passes those values to the fragment shader, which uses them to calculate transparency.
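The vertex-to-fragment hand-off can be emulated on the CPU to show what the fragment shader actually receives: each vertex produces an output value, and the rasteriser blends a triangle's three values according to where the fragment lies inside it. This is a minimal sketch; the linear feathering angle and the displacement-to-alpha mapping are hypothetical stand-ins, not Simanic's actual formulas, and perspective-correct interpolation is omitted.

```cpp
#include <algorithm>
#include <array>

// "Vertex shader" (hypothetical): per-vertex feathering displacement that
// grows linearly with distance from the rotor hub.
float vertexDisplacement(float distanceFromHub, float featherPerUnit) {
    return distanceFromHub * featherPerUnit;
}

// Rasteriser: blend a triangle's three vertex outputs using the fragment's
// barycentric weights (which sum to 1).
float interpolate(const std::array<float, 3>& perVertex, const std::array<float, 3>& bary) {
    return perVertex[0] * bary[0] + perVertex[1] * bary[1] + perVertex[2] * bary[2];
}

// "Fragment shader" (hypothetical mapping): more displacement means more
// transparency, from alpha 1.0 (opaque) down to 0.5.
float fragmentAlpha(float displacement, float maxDisplacement) {
    float t = std::clamp(displacement / maxDisplacement, 0.0f, 1.0f);
    return 1.0f - 0.5f * t;
}
```

A fragment at the triangle's centre (weights of 1/3 each) receives the average of the three vertex outputs, which is why per-vertex results can drive a smooth per-pixel effect.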

Information can be transferred from the CPU to the GPU on each iteration of the game (application) loop via input attributes and/or uniforms. Typically, vertex data is fed to the vertex shader via input attributes. Uniforms can be textures (accessed through sampler uniforms), plain variables, and more.

“Uniforms are so named because they do not change from one shader invocation to the next within a particular rendering call”. In other words, a uniform's value is available to every invocation of every stage that declares it, but it remains constant for the duration of a draw call.
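The difference between attributes and uniforms can be modelled on the CPU as a loop over vertices in which the per-vertex data changes every iteration, while one uniform struct is shared, read-only, by all of them. This is an illustrative sketch only: the struct and function names are hypothetical, and the translation-plus-scale stands in for a real model matrix.

```cpp
#include <vector>

struct Vec3 { float x, y, z; };

struct Uniforms {        // set once by the CPU before the draw call
    Vec3 modelOffset;    // stands in for a model matrix
    float scale;
};

struct VertexIn {        // per-vertex input attributes
    Vec3 position;
};

// "Vertex shader": one invocation per vertex. It reads the uniforms but
// cannot modify them, and it never sees other invocations' results.
Vec3 vertexShader(const VertexIn& in, const Uniforms& u) {
    return {in.position.x * u.scale + u.modelOffset.x,
            in.position.y * u.scale + u.modelOffset.y,
            in.position.z * u.scale + u.modelOffset.z};
}

// "Draw call": every vertex is processed with the same uniform values.
std::vector<Vec3> drawCall(const std::vector<VertexIn>& vertices, const Uniforms& u) {
    std::vector<Vec3> out;
    out.reserve(vertices.size());
    for (const VertexIn& v : vertices) {
        out.push_back(vertexShader(v, u));
    }
    return out;
}
```

Changing a uniform between two draw calls (for example, a different offset per helicopter) is the CPU's job; within one call the value is fixed.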

Test subject “The Beast”: 270,000 vertices, reduced to approximately 157,000 for testing (before triangulation)

The model was created in Blender and consists of numerous meshes and moving parts. Between 1 and 5 copies were rendered at once. Various settings were changed during testing, using a 60Hz Quad HD monitor and a 240Hz Full HD monitor.

Note: the actual number of vertices processed (drawn) in the shaders was a little over 527,000 per helicopter, drawn via 827,000 indices. This is partly due to the initial model importing process, which optimises mesh data for rendering (e.g. triangulation, which can alternatively be done in the modelling software), and partly due to drawing inefficiencies within some of the model’s meshes.
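One common reason an imported mesh holds more vertices than the modelling software reports is that a GPU vertex is a complete combination of attributes. Wherever a position is shared by corners with different normals (hard edges) or different UVs (texture seams), the importer must duplicate it. Whether this is the main cause for “The Beast” isn't stated, so treat the sketch below as an illustration of the mechanism; the types and function are hypothetical. For scale, 827,000 indices at 3 per triangle correspond to roughly 275,700 triangles.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Each triangle corner references separate position, normal and UV arrays.
struct Corner {
    uint32_t position, normal, uv;
};

struct IndexedMesh {
    std::vector<Corner> vertices;   // unique attribute combinations
    std::vector<uint32_t> indices;  // 3 per triangle
};

// Builds an indexed mesh: identical (position, normal, uv) combinations are
// shared, while any difference in normal or UV forces a duplicate vertex.
IndexedMesh buildIndexedMesh(const std::vector<Corner>& triangleCorners) {
    IndexedMesh mesh;
    std::map<std::tuple<uint32_t, uint32_t, uint32_t>, uint32_t> seen;
    for (const Corner& c : triangleCorners) {
        auto key = std::make_tuple(c.position, c.normal, c.uv);
        auto it = seen.find(key);
        if (it == seen.end()) {
            uint32_t index = static_cast<uint32_t>(mesh.vertices.size());
            mesh.vertices.push_back(c);      // new combination: new vertex
            seen.emplace(key, index);
            mesh.indices.push_back(index);
        } else {
            mesh.indices.push_back(it->second);  // reuse existing vertex
        }
    }
    return mesh;
}
```

Two triangles sharing an edge need 4 vertices when smooth-shaded, but 6 when each triangle has its own normal.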

A realistic model: a model of 75,000 to 100,000 vertices (or even fewer) in 3D modelling software (prior to triangulation) can still be highly detailed. “The Beast” is a high-polygon-count test model. Its wiring, the part with the highest vertex count, consists of about 70,000 vertices, which could be reduced drastically or excluded completely. Other parts could also be optimised by removing vertices that add minimal detail.

Benchmarking PC: a relatively high-end desktop PC with a GTX 1070 graphics card and an Intel i7 processor. A skybox photo world was used as the scene for all of the following tests, in windowed mode with the window maximised.


Performance Benchmarking

In the results below, “Max” means the frame rate was uncapped, and a value in Hz is the frame rate the test ran at. CPU and GPU figures are usage percentages.

Quad HD Monitor:

Test 1: 1 helicopter -- shadows off -- Max -- 527,000 vertices -- CPU 13% -- GPU 94% -- 450 to 600 FPS

Test 2: 1 helicopter -- shadows on -- 60Hz -- 527,000 vertices -- CPU 1.2% -- GPU 24%

Full HD Monitor:

Test 3: 1 helicopter -- shadows off -- Max -- 527,000 vertices -- CPU 12% -- GPU 93% -- 575 to 700 FPS

Test 4: 1 helicopter -- shadows off -- 60Hz -- 527,000 vertices -- CPU 0.7% -- GPU 12%

Test 5: 1 helicopter -- shadows on -- 60Hz -- 527,000 vertices -- CPU 0.9% -- GPU 22%

Test 6: 1 helicopter -- shadows on -- 120Hz -- 527,000 vertices -- CPU 1.7% -- GPU 36%

Test 7: 1 helicopter -- shadows on -- 240Hz -- 527,000 vertices -- CPU 3.3% -- GPU 52%

Test 8: 5 helicopters -- shadows on -- 60Hz -- 2.64 million vertices -- CPU 1.3% -- GPU 47%

Test 9: 5 helicopters -- shadows off -- 120Hz -- 2.64 million vertices -- CPU 1.9% -- GPU 46%

Test 10: 5 helicopters -- shadows off -- 240Hz -- 2.64 million vertices -- CPU 9.5% -- GPU 92%
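Two numbers help put these results in context: the time available to render each frame (1000 / frame rate, in milliseconds) and the vertex throughput (vertices per frame × frames per second). The helper names below are illustrative. For Test 10, 5 × 527,000 ≈ 2.64 million vertices per frame at 240Hz gives about 634 million vertices per second, within a budget of about 4.17 ms per frame; shadow passes, which draw the geometry again, are not counted.

```cpp
#include <cstdint>

// Milliseconds available to produce one frame at a given frame rate.
double frameBudgetMs(double framesPerSecond) {
    return 1000.0 / framesPerSecond;
}

// Vertices processed per second when every frame draws the same count
// (main pass only; shadow passes would add to this).
uint64_t verticesPerSecond(uint64_t verticesPerFrame, uint64_t framesPerSecond) {
    return verticesPerFrame * framesPerSecond;
}
```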

Variations in performance readings:

Performance varies with how much of the screen a model takes up. The closer a model is to the camera, the larger it appears and the more fragments the GPU must shade. The FPS, CPU and GPU readings are therefore approximate averages taken during general flying around.
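The effect of distance can be estimated by treating a model as a bounding sphere and projecting it with a perspective camera: the on-screen radius scales with 1/distance, so the number of covered pixels (and roughly the fragment shader work) scales with 1/distance². This is a rough sketch under those assumptions; it ignores clipping at the screen edges and is only accurate when the camera is well outside the model's radius.

```cpp
#include <cmath>

// Projected radius in pixels of a sphere of radius r at distance d, for a
// camera with vertical field of view fovY and a screen height in pixels.
// Half the screen height spans d * tan(fovY / 2) world units at distance d.
double projectedRadiusPixels(double r, double d, double fovYRadians, int screenHeightPx) {
    return (r / (d * std::tan(fovYRadians / 2.0))) * (screenHeightPx / 2.0);
}

// Approximate number of pixels (fragments) the sphere covers on screen.
double coveredPixels(double r, double d, double fovYRadians, int screenHeightPx) {
    const double pi = 3.14159265358979323846;
    double rp = projectedRadiusPixels(r, d, fovYRadians, screenHeightPx);
    return pi * rp * rp;
}
```

Halving the distance to a model roughly quadruples the pixels it covers, which is why readings taken close up differ from those taken at a distance.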

Even when gameplay is effectively paused (i.e. the model position and camera view are unchanging), CPU and GPU usage readings still fluctuate every second or so. These fluctuations, and other variations noticed, were also averaged out when taking the readings.
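For in-app readings, one way to smooth out second-to-second fluctuations is a rolling average over recent frame times, converting to FPS only at the end (averaging per-frame FPS values instead would over-weight fast frames). The class below is a hypothetical sketch, not how the readings above were taken.

```cpp
#include <cstddef>
#include <deque>

// Rolling average of the last `window` frame times.
class FrameTimeAverager {
public:
    explicit FrameTimeAverager(std::size_t window) : window_(window) {}

    void addFrame(double frameTimeMs) {
        samples_.push_back(frameTimeMs);
        sum_ += frameTimeMs;
        if (samples_.size() > window_) {  // drop the oldest sample
            sum_ -= samples_.front();
            samples_.pop_front();
        }
    }

    double averageFrameTimeMs() const {
        return samples_.empty() ? 0.0 : sum_ / static_cast<double>(samples_.size());
    }

    // FPS derived from the mean frame time, not a mean of per-frame FPS.
    double averageFps() const {
        double ms = averageFrameTimeMs();
        return ms > 0.0 ? 1000.0 / ms : 0.0;
    }

private:
    std::size_t window_;
    std::deque<double> samples_;
    double sum_ = 0.0;
};
```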

Other settings included:

PCF (percentage-closer filtering) shadow setting (smooths shadow edges) -- Set to factor 2.

Multisampling (an anti-aliasing technique) -- Set to 2 samples.

Anisotropic filtering (keeps textures on mesh faces sharp when viewed at steep angles) -- Turned off.

Depth map enhanced detail feature -- Turned off.
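Percentage-closer filtering softens shadow edges by comparing the fragment's depth against several neighbouring shadow map texels and averaging the pass/fail results, rather than making one all-or-nothing comparison. The CPU-side sketch below assumes the PCF “factor” is a kernel radius giving (2 × radius + 1)² samples; how the simulator's factor 2 maps to its kernel isn't stated, and the bias value is illustrative.

```cpp
#include <vector>

struct ShadowMap {
    int width, height;
    std::vector<float> depth;  // row-major depths seen from the light

    float at(int x, int y) const {
        // Clamp coordinates to the edge, like GL_CLAMP_TO_EDGE.
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return depth[y * width + x];
    }
};

// Returns the lit fraction in [0, 1]: 1 = fully lit, 0 = fully shadowed.
float pcfLitFraction(const ShadowMap& map, int x, int y, float fragmentDepth,
                     int radius, float bias = 0.005f) {
    int lit = 0, total = 0;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            // Lit if the fragment is not behind the nearest occluder here.
            if (fragmentDepth - bias <= map.at(x + dx, y + dy)) ++lit;
            ++total;
        }
    }
    return static_cast<float>(lit) / static_cast<float>(total);
}
```

A larger radius gives softer edges but costs (2 × radius + 1)² depth comparisons per fragment, which is why PCF is a natural candidate for a graphics quality setting.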

Technical note:

The above testing involved 1 OpenGL draw call (plus shadow draw calls) per helicopter model. The model's numerous meshes are combined into 1 buffer object and are handled independently in the vertex and fragment shaders via a mesh index number (fed into the vertex shader as an attribute) that identifies each mesh (helicopter part).
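Packing a model's parts into one buffer with a per-vertex mesh index can be sketched as follows. The emulated vertex shader looks up a per-part translation by that index, standing in for the per-part transforms a real shader would read (for example from a uniform array). All names here are hypothetical, and the simulator's actual data layout isn't described.

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct PackedVertex {
    Vec3 position;
    uint32_t meshIndex;  // which model part this vertex belongs to
};

// Concatenates all meshes into one buffer, tagging each vertex with its part.
std::vector<PackedVertex> packMeshes(const std::vector<std::vector<Vec3>>& meshes) {
    std::vector<PackedVertex> buffer;
    for (uint32_t i = 0; i < meshes.size(); ++i) {
        for (const Vec3& p : meshes[i]) {
            buffer.push_back({p, i});
        }
    }
    return buffer;
}

// Emulated vertex shader: the mesh index selects that part's offset, so
// parts such as the rotor and fuselage can move independently in one call.
Vec3 transformVertex(const PackedVertex& v, const std::vector<Vec3>& partOffsets) {
    const Vec3& o = partOffsets[v.meshIndex];
    return {v.position.x + o.x, v.position.y + o.y, v.position.z + o.z};
}
```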

There are 3 calls to “glBindVertexArray”: 1 for the helicopters, 1 for the scene's fake floor, and 1 for the skybox itself.

There are 2 shader programs: 1 for drawing all the 3D models and their shadows, and 1 for drawing the text (the simulator parameters at the top of the screen).

Instancing could be used in cases like this, where the rendered models are identical; in the 5-helicopter tests it would reduce the helicopter draw calls from 5 to 1.
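Instanced drawing can be emulated the same way: the shared geometry is processed once per instance, and each invocation uses its instance number (gl_InstanceID in GLSL) to fetch per-instance data. In OpenGL 4.1 this corresponds to one glDrawElementsInstanced call, with per-instance attributes advanced via glVertexAttribDivisor. The function below is a hypothetical CPU-side sketch, with a world position standing in for each helicopter's full transform.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One "draw call" that processes the same model once per instance.
std::vector<Vec3> drawInstanced(const std::vector<Vec3>& modelVertices,
                                const std::vector<Vec3>& instancePositions) {
    std::vector<Vec3> out;
    out.reserve(modelVertices.size() * instancePositions.size());
    for (std::size_t instanceId = 0; instanceId < instancePositions.size(); ++instanceId) {
        const Vec3& offset = instancePositions[instanceId];  // per-instance data
        for (const Vec3& v : modelVertices) {                // shared geometry
            out.push_back({v.x + offset.x, v.y + offset.y, v.z + offset.z});
        }
    }
    return out;
}
```

The geometry is uploaded once; only the small per-instance array grows with the number of helicopters.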

Objective:

Controls to adjust the graphics settings will be added, with the goal of making it possible to run the simulator on relatively low-spec PCs.

However, there’s no getting away from the fact that modern high-performance graphics cards can handle much more than relatively low-spec cards. VR, for example, needs a capable card to run at all.

Models with significantly fewer vertices will also be added, to target PCs with more limited graphics processing capabilities.

You can also create your own models and import them.

Additional note:

It’s impressive to see rotary animations (such as the main blades) at higher refresh rates. The 240Hz Full HD monitor shows this very well, even at 120 FPS, and its 1ms response time complements this further.

Further test results and some video demonstrations of speed performance will be added.


GLFW Open Source Multi-Platform Library

The OpenGL context, graphics window, screen modes, keyboard and joystick input events, and more, are all handled via the GLFW library.

Windowed and full screen modes are available, and can be selected in the simulator control panel.

GLFW also provides support for Vulkan. Migrating from OpenGL to Vulkan is something already being considered for the future.

Currently the simulator is only compiled to run on Microsoft Windows. GLFW is multi-platform, which means compiling to run on different operating systems is an option.


OpenAL Based Sounds

Sounds are processed via a sound module which utilises OpenAL.

The volume changes according to how far away models are from the camera.
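OpenAL can apply distance attenuation itself. Its default distance model in OpenAL 1.1, AL_INVERSE_DISTANCE_CLAMPED, computes the gain from the source's AL_REFERENCE_DISTANCE, AL_ROLLOFF_FACTOR and AL_MAX_DISTANCE properties; the function below writes that formula out for illustration. Which distance model and values the sound module uses isn't stated.

```cpp
#include <algorithm>

// Gain for OpenAL's inverse-distance-clamped model: distances are clamped to
// [referenceDistance, maxDistance], then attenuated inversely.
float inverseDistanceClampedGain(float distance, float referenceDistance,
                                 float rolloffFactor, float maxDistance) {
    distance = std::max(distance, referenceDistance);
    distance = std::min(distance, maxDistance);
    return referenceDistance /
           (referenceDistance + rolloffFactor * (distance - referenceDistance));
}
```

With a reference distance of 1 and a rolloff factor of 1, a model 2 units from the camera plays at half gain, and nothing closer than 1 unit gets louder than full gain.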

The pitch of a model’s motor sound rises and falls with the motor's RPM.
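A simple way to tie motor sound to RPM is to scale a source's AL_PITCH by the ratio of the current RPM to the RPM at which the sample was recorded, applied each frame with alSourcef(source, AL_PITCH, pitch). The sketch assumes such a recorded reference RPM exists; the function name and clamp limits are hypothetical.

```cpp
#include <algorithm>

// AL_PITCH scales playback speed: 1.0 plays the sample as recorded,
// 2.0 plays it an octave higher. Clamp limits are illustrative.
float motorPitch(float rpm, float recordedRpm, float minPitch = 0.5f, float maxPitch = 2.0f) {
    if (recordedRpm <= 0.0f) return 1.0f;  // guard against a bad reference value
    return std::clamp(rpm / recordedRpm, minPitch, maxPitch);
}
```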

The volume and pitch of servos both adjust according to how fast the servo arms move.
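Servo sounds can be driven the same way from the arm's angular speed, measured as the change in angle over the frame time. In the hypothetical sketch below, that speed is normalised against an assumed maximum and mapped to both gain (silent when the arm is still) and pitch; the constants are illustrative, not the simulator's.

```cpp
#include <algorithm>
#include <cmath>

struct ServoSound {
    float gain;   // would be applied via AL_GAIN
    float pitch;  // would be applied via AL_PITCH
};

ServoSound servoSound(float previousAngleDeg, float currentAngleDeg, float dtSeconds,
                      float maxSpeedDegPerSec) {
    // Angular speed of the servo arm over this frame.
    float speed = dtSeconds > 0.0f
                      ? std::fabs(currentAngleDeg - previousAngleDeg) / dtSeconds
                      : 0.0f;
    float t = std::clamp(speed / maxSpeedDegPerSec, 0.0f, 1.0f);  // 0 = still, 1 = full speed
    return {t,                  // louder as the arm moves faster
            0.8f + 0.4f * t};   // pitch rises from 0.8 to 1.2 with speed
}
```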


Transitioning to Vulkan

Whether to adapt the existing OpenGL code to the relatively new, lower-level Vulkan API depends on where the bottlenecks are in the overall execution of the code, i.e. on the CPU side, on the graphics card, or in the communication between them.
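Locating the bottleneck starts with comparing how long the CPU spends preparing and submitting a frame with how long the GPU spends rendering it; GPU time can be measured with OpenGL timer queries (GL_TIME_ELAPSED). Vulkan's gains are usually described as reduced CPU-side driver overhead, so a CPU-bound result is the case where migrating is more likely to pay off. The classifier below is a hypothetical rule of thumb with an illustrative 10% tolerance. In Test 1, for example, GPU usage of 94% against CPU usage of 13% suggests the GPU was the limiting factor.

```cpp
#include <string>

// Whichever side takes noticeably longer per frame limits the frame rate.
std::string classifyBottleneck(double cpuMs, double gpuMs, double tolerance = 0.1) {
    if (cpuMs > gpuMs * (1.0 + tolerance)) return "CPU-bound";
    if (gpuMs > cpuMs * (1.0 + tolerance)) return "GPU-bound";
    return "balanced";
}
```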

There’s certainly no urgency at the moment, but in the coming years as the simulator’s codebase progresses, migrating could become increasingly relevant.

Both articles linked below give example scenarios showing when transitioning to Vulkan might or might not be worthwhile.


“Transitioning from OpenGL to Vulkan” (February 2016)

“Migrating from OpenGL to Vulkan” (January 2016)