From fd4b5b24cedac1f7bae6792cbe4216f3e30a2cae Mon Sep 17 00:00:00 2001
From: Niklas Haas
Date: Wed, 3 Sep 2025 14:38:38 +0200
Subject: [PATCH] fftools/ffmpeg_sched: lower default frame queue size

I tested this extensively under different conditions and could not come up
with any scenario where using a larger queue size was actually beneficial.

Moreover, having such a large default queue is very wasteful, especially for
larger frame sizes, and can in the worst case lead to an extra ~50% memory
footprint per input (with the default 16 threads), regardless of whether that
input is currently in use or not.

My methodology was to add logging in the event of a queue underrun/overrun,
and then observe the frequency of such events in practice, as well as the
impact on performance. I came up with an example filter graph involving
decoding, filtering and encoding with several input files and various changes
to move the bottleneck around.

I found that, in all configurations I tested, with all thread counts and
bottlenecks, using a queue size of 2 frames yielded practically identical
performance to a queue size of 8 frames. I was only able to consistently
measure a slowdown when restricting the queue to a single frame, where the
underruns ended up making up almost 1.1% of frame events in the worst case.

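For illustration, the kind of instrumentation described above can be sketched
as follows. This is a minimal, hypothetical model and not the logging code
used for these tests: the CountedQueue type and the cq_* functions are invented
names, and it assumes that an "underrun" means a receive attempt on an empty
queue and an "overrun" means a send attempt on a full queue. The real scheduler
would block the thread in those cases; the sketch only counts them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded queue model; only the fill level is tracked. */
typedef struct CountedQueue {
    size_t        capacity;  /* maximum number of queued frames */
    size_t        nb_items;  /* frames currently queued */
    unsigned long underruns; /* receive attempts that found the queue empty */
    unsigned long overruns;  /* send attempts that found the queue full */
} CountedQueue;

static void cq_init(CountedQueue *q, size_t capacity)
{
    assert(capacity > 0);
    q->capacity  = capacity;
    q->nb_items  = 0;
    q->underruns = 0;
    q->overruns  = 0;
}

/* Returns 1 if a frame was queued, 0 if the queue was full (overrun). */
static int cq_send(CountedQueue *q)
{
    if (q->nb_items == q->capacity) {
        q->overruns++;
        return 0;
    }
    q->nb_items++;
    return 1;
}

/* Returns 1 if a frame was dequeued, 0 if the queue was empty (underrun). */
static int cq_receive(CountedQueue *q)
{
    if (q->nb_items == 0) {
        q->underruns++;
        return 0;
    }
    q->nb_items--;
    return 1;
}
```

Under these assumptions, a bottleneck upstream of the queue shows up mostly as
underruns, and a bottleneck downstream shows up mostly as overruns.
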
A summary of my test log follows:

= Bottleneck in decoder =

ffmpeg -i A -i B -i C -filter_complex "concat=n=3" -f null -

== 16 threads ==

=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 91355 underruns, 1 overrun
- 4 frames = 91381 underruns, 2 overruns
- 2 frames = 91326 underruns, 21 overruns
- 1 frame  = 91284 underruns, 102 overruns

=== Time elapsed ===
- 8 frames = 14.37s
- 4 frames = 14.28s
- 2 frames = 14.27s
- 1 frame  = 14.35s

== 1 thread ==

=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 91801 underruns, 0 overruns
- 4 frames = 91929 underruns, 1 overrun
- 2 frames = 91854 underruns, 7 overruns
- 1 frame  = 91745 underruns, 83 overruns

=== Time elapsed ===
- 8 frames = 39.51s
- 4 frames = 39.94s
- 2 frames = 39.91s
- 1 frame  = 41.69s

= Bottleneck in filter graph =

ffmpeg -i A -i B -i C -filter_complex "concat=n=3,scale=3840x2160" -f null -

== 16 threads ==

=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 277 underruns, 84673 overruns
- 4 frames = 640 underruns, 86523 overruns
- 2 frames = 850 underruns, 88751 overruns
- 1 frame  = 1028 underruns, 89957 overruns

=== Time elapsed ===
- 8 frames = 26.35s
- 4 frames = 26.31s
- 2 frames = 26.38s
- 1 frame  = 26.55s

== 1 thread ==

=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 29746 underruns, 57033 overruns
- 4 frames = 29940 underruns, 58948 overruns
- 2 frames = 30160 underruns, 60185 overruns
- 1 frame  = 30259 underruns, 61126 overruns

=== Time elapsed ===
- 8 frames = 52.08s
- 4 frames = 52.49s
- 2 frames = 52.25s
- 1 frame  = 52.69s

= Bottleneck in encoder =

ffmpeg -i A -i B -i C -filter_complex "concat=n=3" -c:v libx264 -preset veryfast -f null -

== 1 thread ==

=== Queue statistics (filtergraph -> enc) ===
- 8 frames = 26763 underruns, 63535 overruns
- 4 frames = 26863 underruns, 63810 overruns
- 2 frames = 27243 underruns, 63839 overruns
- 1 frame  = 27670 underruns, 63953 overruns

=== Time elapsed ===
- 8 frames = 89.45s
- 4 frames = 89.04s
- 2 frames = 89.24s
- 1 frame  = 90.26s
---
 fftools/ffmpeg_sched.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fftools/ffmpeg_sched.h b/fftools/ffmpeg_sched.h
index fb7a77ddfc..24ad37b778 100644
--- a/fftools/ffmpeg_sched.h
+++ b/fftools/ffmpeg_sched.h
@@ -257,7 +257,7 @@ int sch_add_mux(Scheduler *sch, SchThreadFunc func, int (*init)(void *),
 /**
  * Default size of a frame thread queue.
  */
-#define DEFAULT_FRAME_THREAD_QUEUE_SIZE 8
+#define DEFAULT_FRAME_THREAD_QUEUE_SIZE 2
 
 /**
  * Add a muxed stream for a previously added muxer.