The HEVC decoder has both HEVCContext and HEVCLocalContext
structures. The latter is supposed to be the structure
containing the per-slicethread state.
Yet that is not how it is handled in practice: Each HEVCLocalContext
has a unique HEVCContext allocated for it and each of these
coincides with the main HEVCContext except in exactly one field:
The corresponding HEVCLocalContext.
This makes it possible to pass the HEVCContext everywhere where
logically a HEVCLocalContext should be used.
This commit stops doing this for lavc/hevc_filter.c; it also constifies
everything that is possible in order to ensure that no slice thread
accidentally modifies the main HEVCContext state.
There are places where this was not possible, namely with the SAOParams
in sao_filter_CTB() or with sao_pixels_buffer_h in copy_CTB_to_hv().
Both of these instances lead to data races, see
https://fate.ffmpeg.org/report.cgi?time=20220629145651&slot=x86_64-archlinux-gcc-tsan-slices
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: out of array access
Fixes: 49271/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-5424984922652672
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is in preparation for further commits that will stop
using ThreadFrame for frame-threaded codecs that don't use
ff_thread_(await|report)_progress(); the API for those codecs
having inter-frame depdendencies will live in threadframe.h.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: runtime error: left shift of negative value -1
Fixes: 2299/clusterfuzz-testcase-minimized-4843509351710720
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit 'ba479f3daafc7e4359ec1212164569ebe59f0bb7':
hevc: Change type of array stride parameters to ptrdiff_t
Merged-by: James Almer <jamrial@gmail.com>
hevc seems to be the only place where the C implementation
of the av_clip function is explicitly selected, precluding
platform-specific optimizations
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
Refactoring and optimizations by James Almer.
Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U
Width 32
158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
Width 64
705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips
Signed-off-by: James Almer <jamrial@gmail.com>
As with sao_band_filter, pass instead the two variables from the struct needed in the function.
This simplifies writing asm optimized versions.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: James Almer <jamrial@gmail.com>
Use edge emu buffers
And enable the code unconditionally
Speed difference without USE_SAO_SMALL_BUFFER and with the new code:
Decicycles: 26772->26220 (BO32), 83803->80942 (BO64)
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
cherry picked from commit 5d9f79edef2c11b915bdac3a025b59a32082f409
SAO edge filter uses pre-SAO pixel data on the left and top of the ctb, so
this data must be kept available. This was done previously by having 2
copies of the frame, one before and one after SAO.
This commit reduces the storage to just that, instead of the previous whole
frame.
Commit message taken from patch by Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
For band filter, source and destination are aligned (except for 16x16 ctbs),
and otherwise, they are most often aligned. Overall, the total width is also
too small for amortizing memcpy.
Timings (using an intrinsic version of edge filters):
B/32 B/64 E/32 E/64
Before: 32045 93952 38925 126896
After: 26772 83803 33942 117182
Pass instead the two variables from the struct needed in the function.
This simplifies writing asm optimized versions of the function
Signed-off-by: James Almer <jamrial@gmail.com>
* commit 'a7a17e3f1915ce69b787dc58c5d8dba0910fc0a4':
hevc_filter: move some conditions out of loops
Conflicts:
libavcodec/hevc_filter.c
This is possibly less readable than the variant used before.
Thus please take a look and if people agree its worse, dont
hesitate to revert.
See: 83976e40e8
Merged-by: Michael Niedermayer <michaelni@gmx.at>
1) each of the loops run within a single CTB, so the relevant reference
list is constant
2) when that CTB is, or lies on the same slice as, the current one, we
can use a simple access instead of a relatively expensive call to
ff_hevc_get_ref_list()
The x86 asm expects int32_t so use that type.
Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This should help cache locality. On win64:
Before: 1397x cycles, 16216 bytes
After: 1369x cycles, 16040 bytes
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '73bb8f61d48dbf7237df2e9cacd037f12b84b00a':
hevcdsp: remove an unneeded variable in the loop filter
Conflicts:
libavcodec/hevc_filter.c
See: d7e162d46b
Merged-by: Michael Niedermayer <michaelni@gmx.at>
beta0 and beta1 will always be the same within a CU
Signed-off-by: Mickaël Raulet <mraulet@insa-rennes.fr>
cherry picked from commit 4a23d824741a289c7d2d2f2871d1e2621b63fa1b
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>