DivX, FFDShow, and decoder speed testing
This is the first part of a two-part series of articles about the performance of the DivX decoder and of MPEG-4 decoding in general.
In this article, I will discuss measuring decoder performance and give some general results. Part two (coming soon) will look at variations of decoder performance on different systems (CPU types and clocks, memory, video cards).
First let me describe our approach to decoder speed testing. Those of you less technically minded can skip it and go straight to the next section.
How do you measure speed of a decoder?
The easiest way is simply to play some video clip in a media player and look at CPU usage. This method is quite imprecise, so, at best, we can use it to double-check our results. We need a more scientific approach.
A much better way is to play the video as fast as possible and record how many frames per second the decoder achieves on a given PC. This can be done manually, using a standard DirectShow tool called “GraphEdit”.
There are a few problems with this approach. First, it is manual, and testing multiple clips or combinations of settings quickly becomes time-consuming. Second, if we want good precision, we have to repeat each measurement many times and average the results by hand.
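The measurement itself can be sketched in a few lines of Python. This is an illustrative harness only, not the actual tool: `decode_frame` is a hypothetical stand-in for pushing a compressed frame through a real decoder.

```python
import statistics
import time

def decode_frame(frame_index):
    # Hypothetical stand-in for one decoder call; a real harness would
    # push a compressed frame through the DirectShow filter instead.
    _ = sum(i * i for i in range(1000))

def measure_fps(num_frames=300):
    """Decode num_frames as fast as possible; return frames per second."""
    start = time.perf_counter()
    for i in range(num_frames):
        decode_frame(i)
    elapsed = time.perf_counter() - start
    return num_frames / elapsed

# Repeat the measurement several times and average to reduce noise.
runs = [measure_fps() for _ in range(5)]
print(f"mean: {statistics.mean(runs):.1f} fps, "
      f"stdev: {statistics.stdev(runs):.1f} fps")
```

Averaging several runs is exactly the tedious part that an automated tool takes off our hands.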
The biggest disadvantage of the GraphEdit approach lies in the Video Renderer filter. If you have ever benchmarked a 3D video game, you may be familiar with the problem: the Video Renderer may choose to synchronize the decoding process to the refresh rate of the monitor – so-called “vsync”. If it does, even the best decoder in the world cannot surpass the monitor refresh rate (typically 60 fps). What’s worse, it appears to be impossible to tell the Video Renderer not to wait for vsync.
The best solution is to implement a tool that does everything for us automatically. The tool and its source can be downloaded here:
It is a command-line tool that accepts the name of a DirectShow filter and any number of AVI files. Using this tool, we can write a batch file that tests several clips and filters at once.
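A small Python driver in the same spirit could loop over every clip/filter combination and collect the tool’s output. Note that the tool name `decspeed.exe`, its command-line syntax, and the `… fps` output format below are assumptions for illustration, not the actual interface:

```python
import re

FILTERS = ["DivX Decoder Filter", "ffdshow Video Decoder"]  # example names
CLIPS = ["ed_480p.avi", "ed_1080p.avi"]                     # example clips

def build_command(filter_name, clip):
    # Command-line syntax of the benchmarking tool is assumed here.
    return ["decspeed.exe", filter_name, clip]

def parse_fps(output):
    """Extract the fps figure from the tool's output (format assumed)."""
    match = re.search(r"([\d.]+)\s*fps", output)
    return float(match.group(1)) if match else None

commands = [build_command(f, c) for f in FILTERS for c in CLIPS]
print(len(commands), "test runs queued")           # 2 filters x 2 clips = 4
print(parse_fps("decoded 900 frames, 123.4 fps"))  # -> 123.4
```

The same loop structure is what a plain batch file gives us; a script only makes the averaging and result collection easier.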
My test system represents a mid-range PC: not too archaic, but one that does not cost an arm and a leg.
- Pentium D 805 Smithfield, 2.667 GHz
- 1 GB of PC2-3200 DDR2 memory
- Windows Vista 64-bit
Although this system is based on a dual-core processor, none of the decoders being tested employ multiprocessor optimizations for MPEG-4 decoding, so only one thread is used in all tests.
Four DirectShow decoders are tested. The source material is taken from the open movie “Elephant’s Dream”. I have encoded two fragments of this sequence, one resized to DVD resolution (720x480), the other in full HD (1920x1080). Test clips were encoded with the DivX 6.6 encoder in constant-quantizer mode, with adaptive B-frames, without Q-pel or GMC. (Unfortunately, bandwidth constraints don't allow us to publish the encoded fragments.)
Decoding Performance - Introduction
My first test is to evaluate “typical” decoding speed at commonly used bit rates.
Test sequence 1: 720x480 @ 30 fps, Q=4 (bitrate 1.1 Mbps)
This system is, of course, far more than sufficient to decode MPEG-4 at home theater resolution. Plenty of CPU time is available for other tasks (e.g. postprocessing).
DivX 5.2 is the slowest. There’s almost no difference between the two versions of FFDShow, even though they are separated by three years.
Test sequence 2: 1920x1080 @ 30 fps, Q=4 (bitrate approx. 4 Mbps)
At 1080p resolution we’re much closer to fully loading the system. Note that decoding capacity well above 30 fps is needed for smooth playback: your system must read from the disk, decode audio, push data through to the video card, and still maintain a minimum decoding speed of 30 fps, or it will drop frames.
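To relate the benchmark numbers back to real playback, a maximum decode rate can be converted into an estimated share of one CPU core consumed at 30 fps. This is a back-of-the-envelope sketch, and the fps figures in it are illustrative, not measured results:

```python
def cpu_share(max_decode_fps, playback_fps=30.0):
    """Fraction of one CPU core spent decoding during normal playback."""
    return playback_fps / max_decode_fps

# If a benchmark showed the decoder topping out at 120 fps, decoding a
# 30 fps clip would occupy roughly a quarter of one core...
print(f"{cpu_share(120):.0%}")  # -> 25%
# ...leaving headroom for disk I/O, audio decoding, and the video card.
# Closer to the limit, frame drops become likely:
print(f"{cpu_share(45):.0%}")   # -> 67%
```

The lower the share, the more CPU time remains for everything else the player has to do at the same time.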
For comparison, we’ve included the same clip encoded in H.264 at a similar bit rate (CABAC, B-frames, in-loop deblocking filter). The advanced features of H.264 clearly come at a price: assuming ffdshow’s H.264 decoder is as well optimized as its MPEG-4 decoder, full-featured H.264 is roughly three times slower to decode than MPEG-4 at this bit rate.
Does bit rate affect decoding performance?
Bit rate and decoding speed are definitely correlated, but the correlation is not very strong. On this PC, 1080p@60 MPEG-4 should be decodable at any reasonable bit rate.
Finally, let’s take a look at the relative impact of different SIMD instruction sets (MMX, SSE …) on decoder performance.
In ffdshow, it is theoretically possible to turn off individual instruction sets from the configuration dialog. This feature appears to be broken at present: turning off all optimizations has a negligible effect on decoding frame rate. I had to build a custom version of the DivX decoder to perform this test.
Most of the speed increase comes from MMX. Overall, MMX-SSE2 optimizations speed up decoding by more than 3x.
MPEG-4 decoding performance, part 2:
- different CPU types
- different CPU clock frequencies
- impact of system memory speed
- video cards and video memory