1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-13 21:28:01 +02:00
Commit Graph

116879 Commits

Author SHA1 Message Date
Ramiro Polla
7e4784e40c avcodec/mpegvideoencdsp: speed up draw_edges_8_c by inlining it for all used edge widths
This commit also restricts w to 4, 8, or 16.

Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
                                    before    after
draw_edges_8_1724_4_c:             46796.5   7141.7  ( 6.55x)
draw_edges_8_1724_8_c:             43584.5   7216.5  ( 6.04x)
draw_edges_8_1724_16_c:            47007.2  10080.5  ( 4.66x)
draw_edges_128_407_4_c:            11199.0   4185.0  ( 2.68x)
draw_edges_128_407_8_c:            10660.2   4418.0  ( 2.41x)
draw_edges_128_407_16_c:           11800.2   4634.5  ( 2.55x)
draw_edges_1080_31_4_c:             1356.5    634.7  ( 2.14x)
draw_edges_1080_31_8_c:             1972.0   1430.2  ( 1.38x)
draw_edges_1080_31_16_c:            4621.0   4009.7  ( 1.15x)
draw_edges_1920_4_4_c:               834.5    795.2  ( 1.05x)
draw_edges_1920_4_4_negstride_c:     821.7    802.0  ( 1.02x)
draw_edges_1920_4_8_c:              2782.2   2650.7  ( 1.05x)
draw_edges_1920_4_8_negstride_c:    2724.7   2670.0  ( 1.02x)
draw_edges_1920_4_16_c:             6437.5   6327.7  ( 1.02x)
draw_edges_1920_4_16_negstride_c:   6395.2   6349.5  ( 1.01x)

A55:
                                    before    after
draw_edges_8_1724_4_c:             52540.4  19739.2  ( 2.66x)
draw_edges_8_1724_8_c:             45386.9  19847.4  ( 2.29x)
draw_edges_8_1724_16_c:            51995.4  23284.7  ( 2.23x)
draw_edges_128_407_4_c:            13401.1   6988.2  ( 1.92x)
draw_edges_128_407_8_c:            12218.4   7527.9  ( 1.62x)
draw_edges_128_407_16_c:           13695.9   8207.2  ( 1.67x)
draw_edges_1080_31_4_c:             3702.9   3110.4  ( 1.19x)
draw_edges_1080_31_8_c:             6015.6   5643.2  ( 1.07x)
draw_edges_1080_31_16_c:           12281.9  11901.4  ( 1.03x)
draw_edges_1920_4_4_c:              3957.9   3970.2  ( 1.00x)
draw_edges_1920_4_4_negstride_c:    3964.1   3825.2  ( 1.04x)
draw_edges_1920_4_8_c:              7757.9   7676.4  ( 1.01x)
draw_edges_1920_4_8_negstride_c:    7923.6   7812.4  ( 1.01x)
draw_edges_1920_4_16_c:            14791.6  15143.9  ( 0.98x)
draw_edges_1920_4_16_negstride_c:  14788.6  15163.4  ( 0.98x)

A76:
                                    before   after
draw_edges_8_1724_4_c:             39786.0  4968.5  ( 8.01x)
draw_edges_8_1724_8_c:             32971.5  5069.5  ( 6.50x)
draw_edges_8_1724_16_c:            40056.0  6017.2  ( 6.66x)
draw_edges_128_407_4_c:             9517.2  1210.5  ( 7.86x)
draw_edges_128_407_8_c:             8035.7  1346.2  ( 5.97x)
draw_edges_128_407_16_c:            9946.5  1648.2  ( 6.03x)
draw_edges_1080_31_4_c:             1308.0   660.7  ( 1.98x)
draw_edges_1080_31_8_c:             1785.5  1270.7  ( 1.41x)
draw_edges_1080_31_16_c:            3266.7  2591.5  ( 1.26x)
draw_edges_1920_4_4_c:              1151.0  1090.7  ( 1.06x)
draw_edges_1920_4_4_negstride_c:    1153.7  1096.5  ( 1.05x)
draw_edges_1920_4_8_c:              2220.7  2186.5  ( 1.02x)
draw_edges_1920_4_8_negstride_c:    2218.5  2193.5  ( 1.01x)
draw_edges_1920_4_16_c:             4324.2  4230.0  ( 1.02x)
draw_edges_1920_4_16_negstride_c:   4310.7  4233.0  ( 1.02x)
2024-08-26 12:50:26 +02:00
Ramiro Polla
3bfce2a104 avcodec/x86/mpegvideoencdsp: speed up draw_edges_mmx by using memcpy()
The mmx memory copy code is not nearly as efficient as memcpy(), which
would make draw_edges_mmx much slower than draw_edges_8_c.

Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
                                      before    after
draw_edges_8_1724_4_mmx:              8700.5   8751.8  ( 0.99x)
draw_edges_8_1724_8_mmx:             10441.7  10558.0  ( 0.99x)
draw_edges_8_1724_16_mmx:            10660.7  10799.5  ( 0.99x)
draw_edges_128_407_4_mmx:             4202.2   4099.3  ( 1.03x)
draw_edges_128_407_8_mmx:             4579.0   4511.3  ( 1.02x)
draw_edges_128_407_16_mmx:            5479.7   4729.5  ( 1.16x)
draw_edges_1080_31_4_mmx:             1546.7    658.0  ( 2.35x)
draw_edges_1080_31_8_mmx:             2745.5   1442.5  ( 1.90x)
draw_edges_1080_31_16_mmx:           12511.5   4901.0  ( 2.55x)
draw_edges_1920_4_4_mmx:              2659.0    705.0  ( 3.77x)
draw_edges_1920_4_4_negstride_mmx:    2643.0    729.0  ( 3.63x)
draw_edges_1920_4_8_mmx:              7845.0   2819.0  ( 2.78x)
draw_edges_1920_4_8_negstride_mmx:    7777.0   2747.3  ( 2.83x)
draw_edges_1920_4_16_mmx:            24583.7   6358.3  ( 3.87x)
draw_edges_1920_4_16_negstride_mmx:  24589.0   6367.0  ( 3.86x)
2024-08-26 12:50:21 +02:00
Ramiro Polla
9cdcbb639a avcodec/x86/mpegvideoencdsp: fix comment for draw_edges_mmx
Not only w == 8 and w == 16 are supported, but also w == 4.
2024-08-26 12:49:24 +02:00
Ramiro Polla
8c203ea7c7 avcodec/aarch64/mpegvideoencdsp: add dotprod implementation for pix_norm1
A55             A76
pix_norm1_c:        484.3           235.2
pix_norm1_neon:     193.8 ( 2.50x)   44.7 ( 5.26x)
pix_norm1_dotprod:   91.8 ( 5.28x)   21.2 (11.09x)
2024-08-26 12:49:04 +02:00
Ramiro Polla
9f68a3712e avcodec/aarch64/mpegvideoencdsp: add neon implementations for pix_sum and pix_norm1
A55             A76
pix_norm1_c:     478.2           234.2
pix_norm1_neon:  188.2 ( 2.54x)   41.2 ( 5.68x)
pix_sum_c:       304.2           244.0
pix_sum_neon:     77.2 ( 3.94x)   21.5 (11.35x)
2024-08-26 12:48:31 +02:00
Ramiro Polla
834964ce1a checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges 2024-08-26 12:48:09 +02:00
Ramiro Polla
f9074427db avcodec/x86/mpegvideoencdsp: support negative strides in draw_edges_mmx() 2024-08-26 12:44:02 +02:00
Ramiro Polla
98610fe95f fate/checkasm: run the sw_yuv2yuv test 2024-08-26 12:16:40 +02:00
Zhao Zhili
12cdb30e37 avcodec/videotoolboxenc: Fix leaking of supported_props
There are two VTCompressionSessionRef been created, one for generating
extradata, and another for normal encoding. supported_props was been
overwritten without release.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 17:09:46 +08:00
Ramiro Polla
420d443600 swscale/aarch64: cosmetics fix (spaces inside curly braces) 2024-08-26 11:07:49 +02:00
Ramiro Polla
52887683e9 swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
A55               A76
nv24_yuv420p_128_c:       4956.1            1267.0
nv24_yuv420p_128_neon:    3109.1 ( 1.59x)    640.0 ( 1.98x)
nv24_yuv420p_1920_c:     35728.4           11736.2
nv24_yuv420p_1920_neon:   8011.1 ( 4.46x)   2436.0 ( 4.82x)
nv42_yuv420p_128_c:       4956.4            1270.5
nv42_yuv420p_128_neon:    3074.6 ( 1.61x)    639.5 ( 1.99x)
nv42_yuv420p_1920_c:     35685.9           11732.5
nv42_yuv420p_1920_neon:   7995.1 ( 4.46x)   2437.2 ( 4.81x)
2024-08-26 11:04:46 +02:00
Ramiro Polla
88a563ad18 swscale: export ff_copyPlane so it may be used by simd code 2024-08-26 11:04:46 +02:00
Ramiro Polla
a2e01cade8 checkasm/yuv2yuv: add tests for semiplanar unscaled converters 2024-08-26 11:04:46 +02:00
Ramiro Polla
4eb5594295 swscale: add nv24/nv42 to yuv420p unscaled converter 2024-08-26 11:04:46 +02:00
Zhao Zhili
aa14f9fe63 avcodec/mediacodecdec: Skip dequeue buffer in draining state
There is no more packet to queue in draining state.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 16:59:07 +08:00
Zhao Zhili
2e370805da avfilter/unsharp: Merge header into .c
It was shared with opencl implementation.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-26 16:58:25 +08:00
Stefan Oltmanns
d42cd5b75b avformat/vapoursynth: load library at runtime
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
2024-08-26 10:30:52 +02:00
Stefan Oltmanns
eac611f1a4 avformat/vapoursynth: Update to API version 4
Signed-off-by: Stefan Oltmanns <stefan-oltmanns@gmx.net>
2024-08-26 10:30:50 +02:00
Ramiro Polla
abb4e13a0a avutil/aarch64: add AV_COPY128 and AV_ZERO128 macros 2024-08-26 10:26:59 +02:00
Zhao Zhili
40dda881d6 avcodec/filter_units: Fix extradata and packets can have different bitstream format
Filter init can change extradata from avcc/hvcc to annexb format.
With different passthrough logic, packets can still in avcc/hvcc
format. Use same passthrough logic for init and filter.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:27:15 +08:00
Zhao Zhili
523189c744 fftools/ffplay: handle flip in display matrix
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:26:59 +08:00
Gnattu OC
30f090b4f8 avfilter: inherit input color range for videotoolbox filters
The color range should be set to match the input when creating
the VideoToolbox context. Otherwise, the new context will default
to limited range, creates inconsistencies with full range inputs.

Signed-off-by: Gnattu OC <gnattuoc@me.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-24 00:24:06 +08:00
Martin Storsjö
cfe0a36352 libswscale: aarch64: Fix the indentation of some macro invocations
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-22 14:40:30 +03:00
James Almer
9d15fe77e3 avcodec/container_fifo: add missing stddef.h include
Fixes make checkheaders

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-21 15:12:46 -03:00
James Almer
a754ee0844 avcodec/h2645_parse: replace three bool arguments in ff_h2645_packet_split with a single flags one
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-19 20:23:20 -03:00
James Almer
8060644237 avcodec/shorten: Fix discard of ‘const’ qualifier
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-19 17:40:00 -03:00
Martin Storsjö
507c2a5774 libswscale: arm: Don't assume aligned output in yuv2rgb functions
This fixes failures in recently added checkasm tests.

While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-19 23:04:52 +03:00
Anton Khirnov
52471b56ba lavfi: make FFFilterContext private to generic code
Nothing in it needs to be visible to filters.
2024-08-19 21:48:11 +02:00
Anton Khirnov
f19c988911 lavfi/filters: move functions only used by generic code to avfilter_internal.h 2024-08-19 21:48:11 +02:00
Anton Khirnov
6d75d44d90 lavfi: drop internal.h
All that remains in it are things that belong in avfilter_internal.h.

Move them there and remove internal.h
2024-08-19 21:48:04 +02:00
Anton Khirnov
90e4af65e1 lavfi/f_streamselect: remove a no-op ff_filter_config_links() call
It does not do anything when the links are already configured.
2024-08-19 21:45:25 +02:00
Anton Khirnov
a2314308f2 lavfi/inernal: move ff_fmt_is_regular_yuv() declaration to video.h 2024-08-19 21:45:25 +02:00
Anton Khirnov
a83a30e899 lavfi: move ff_parse_{sample_rate,channel_layout}() to audio.[ch]
That is a more appropriate place for those functions.
2024-08-19 21:45:25 +02:00
Anton Khirnov
f4bfdf7893 lavfi: move ff_parse_pixel_format() to vf_format, its only caller
The only thing this function does beyond calling av_get_pix_fmt() is
falling back onto parsing the argument as a number. No other filters
should need to do this.
2024-08-19 21:45:25 +02:00
Anton Khirnov
1afe42852b lavfi/internal: move functions used by filters to filters.h
internal.h currently mixes interfaces intended to be used by filters
with those that should be limited to generic filter- or graph-level
code.
2024-08-19 21:45:25 +02:00
Rémi Denis-Courmont
d8fb44c0aa lavc/mpegvideoencdsp: R-V V add_8x8basis
T-Head C908:
add_8x8basis_c:      440.6
add_8x8basis_rvv_i32: 70.3

SpacemiT X60:
add_8x8basis_c:      436.3
add_8x8basis_rvv_i32: 40.5
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
1907dd7f23 lavc/mpegvideoencdsp: R-V V try_8x8basis
T-Head C908:
try_8x8basis_c:       922.5
try_8x8basis_rvv_i32: 135.3

SpacemiT X60:
try_8x8basis_c:       926.1
try_8x8basis_rvv_i32: 103.1
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
0fd37c00d7 lavc/mpegvideoencdsp: R-V V pix_norm1
T-Head C908:
pix_norm1_c:       480.2
pix_norm1_rvv_i64: 146.9

SpacemiT X60:
pix_norm1_c:       478.2
pix_norm1_rvv_i64:  92.7
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
63d016aea5 lavc/mpegvideoencdsp: R-V V pix_sum
T-Head C908:
pix_sum_c:      332.2
pix_sum_rvv_i64: 91.2

SpacemiT X60:
pix_sum_c:      321.2
pix_sum_rvv_i64: 60.9
2024-08-19 22:41:13 +03:00
Anton Khirnov
631a725670 lavc/hevcdec: call ff_thread_finish_setup() even if hwaccel is in use
Serializing frame threading for non-threadsafe hwaccels is handled at the
generic level, the decoder does not need to care about it.
2024-08-19 21:37:22 +02:00
Anton Khirnov
4b9adb35b6 lavc/hevcdec: simplify output logic
Current code is written around the "simple" decode API's limitation that
a single input packet (AU/coded frame) triggers the output of at most
one output frame. However the spec contains two cases where a coded
frame may cause multiple frames to be output (cf. C.5.2.2.2):
* start of a new sequence
* overflowing sps_max_dec_pic_buffering

The decoder currently contains rather convoluted logic to handle these
cases:
* decode/output/per-frame sequence counters,
* HEVC_FRAME_FLAG_BUMPING
* ff_hevc_bump_frame()
* special clauses in ff_hevc_output_frame()

However, with the receive_frame() API none of that is necessary, as we
can just output multiple frames at once. Previously added ContainerFifo
allows that to be done in a straightforward and efficient manner.
2024-08-19 21:37:22 +02:00
Anton Khirnov
79afc45c03 lavc/hevcdec: use a ContainerFifo to hold frames scheduled for output
Instead of a single AVFrame.

Will be useful in future commits, where we will want to produce multiple
output frames for a single coded frame.
2024-08-19 21:37:22 +02:00
Anton Khirnov
4bda7f288c lavc/videotoolbox: drop HEVC cropping from start_frame rather than end_frame
HEVCContext.output_frame will be removed in following commits.

Reported-By: Max Bykov
2024-08-19 21:37:22 +02:00
Anton Khirnov
6174818252 lavc: add private container FIFO API
It provides a FIFO for "container" objects like AVFrame/AVPacket and
features an integrated FFRefStructPool-based pool to avoid allocating an
freeing them repeatedly.
2024-08-19 21:37:22 +02:00
Anton Khirnov
2fdecbb239 lavc/hevcdec: switch to receive_frame()
Required by following commits, where we will want to output multiple
frames per packet.
2024-08-19 21:37:22 +02:00
sunyuechi
4e7b5ac48f lavc/vp9dsp: R-V V mc bilin hv
C908   X60
vp9_avg_bilin_4hv_8bpp_c                           :   10.7    9.5
vp9_avg_bilin_4hv_8bpp_rvv_i32                     :    4.0    3.5
vp9_avg_bilin_8hv_8bpp_c                           :   38.5   34.2
vp9_avg_bilin_8hv_8bpp_rvv_i32                     :    7.2    6.5
vp9_avg_bilin_16hv_8bpp_c                          :  147.2  130.5
vp9_avg_bilin_16hv_8bpp_rvv_i32                    :   14.5   12.7
vp9_avg_bilin_32hv_8bpp_c                          :  574.2  509.7
vp9_avg_bilin_32hv_8bpp_rvv_i32                    :   42.5   38.0
vp9_avg_bilin_64hv_8bpp_c                          : 2321.2 2017.7
vp9_avg_bilin_64hv_8bpp_rvv_i32                    :  163.5  131.0
vp9_put_bilin_4hv_8bpp_c                           :   10.0    8.7
vp9_put_bilin_4hv_8bpp_rvv_i32                     :    3.5    3.0
vp9_put_bilin_8hv_8bpp_c                           :   35.2   31.2
vp9_put_bilin_8hv_8bpp_rvv_i32                     :    6.5    5.7
vp9_put_bilin_16hv_8bpp_c                          :  134.0  119.0
vp9_put_bilin_16hv_8bpp_rvv_i32                    :   12.7   11.5
vp9_put_bilin_32hv_8bpp_c                          :  538.5  464.2
vp9_put_bilin_32hv_8bpp_rvv_i32                    :   39.7   35.2
vp9_put_bilin_64hv_8bpp_c                          : 2111.7 1833.2
vp9_put_bilin_64hv_8bpp_rvv_i32                    :  138.5  122.5

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00
sunyuechi
9edd2e723b lavc/vp9dsp: R-V V mc bilin h v
C908   X60
vp9_avg_bilin_4h_8bpp_c                            :    5.5    4.7
vp9_avg_bilin_4h_8bpp_rvv_i32                      :    1.7    1.5
vp9_avg_bilin_4v_8bpp_c                            :    5.5    4.7
vp9_avg_bilin_4v_8bpp_rvv_i32                      :    1.5    1.2
vp9_avg_bilin_8h_8bpp_c                            :   20.0   17.7
vp9_avg_bilin_8h_8bpp_rvv_i32                      :    3.0    2.7
vp9_avg_bilin_8v_8bpp_c                            :   20.7   18.7
vp9_avg_bilin_8v_8bpp_rvv_i32                      :    3.0    2.7
vp9_avg_bilin_16h_8bpp_c                           :   78.2   69.7
vp9_avg_bilin_16h_8bpp_rvv_i32                     :    7.0    6.2
vp9_avg_bilin_16v_8bpp_c                           :   98.5   73.2
vp9_avg_bilin_16v_8bpp_rvv_i32                     :    7.0    6.0
vp9_avg_bilin_32h_8bpp_c                           :  325.5  275.5
vp9_avg_bilin_32h_8bpp_rvv_i32                     :   23.0   20.5
vp9_avg_bilin_32v_8bpp_c                           :  342.2  290.0
vp9_avg_bilin_32v_8bpp_rvv_i32                     :   21.7   19.5
vp9_avg_bilin_64h_8bpp_c                           : 1263.7 1095.7
vp9_avg_bilin_64h_8bpp_rvv_i32                     :   91.2   81.2
vp9_avg_bilin_64v_8bpp_c                           : 1331.7 1155.2
vp9_avg_bilin_64v_8bpp_rvv_i32                     :   91.2   81.0
vp9_put_bilin_4h_8bpp_c                            :    4.5    4.0
vp9_put_bilin_4h_8bpp_rvv_i32                      :    1.0    1.0
vp9_put_bilin_4v_8bpp_c                            :    4.7    4.2
vp9_put_bilin_4v_8bpp_rvv_i32                      :    1.0    1.0
vp9_put_bilin_8h_8bpp_c                            :   16.7   15.0
vp9_put_bilin_8h_8bpp_rvv_i32                      :    2.2    2.0
vp9_put_bilin_8v_8bpp_c                            :   17.5   15.7
vp9_put_bilin_8v_8bpp_rvv_i32                      :    2.2    2.0
vp9_put_bilin_16h_8bpp_c                           :   65.2   58.0
vp9_put_bilin_16h_8bpp_rvv_i32                     :    6.0    5.5
vp9_put_bilin_16v_8bpp_c                           :   69.2   61.7
vp9_put_bilin_16v_8bpp_rvv_i32                     :    5.7    5.2
vp9_put_bilin_32h_8bpp_c                           :  273.2  229.0
vp9_put_bilin_32h_8bpp_rvv_i32                     :   19.7   17.7
vp9_put_bilin_32v_8bpp_c                           :  290.5  243.7
vp9_put_bilin_32v_8bpp_rvv_i32                     :   18.7   16.7
vp9_put_bilin_64h_8bpp_c                           : 1040.5  910.5
vp9_put_bilin_64h_8bpp_rvv_i32                     :   82.5   73.0
vp9_put_bilin_64v_8bpp_c                           : 1108.5  971.0
vp9_put_bilin_64v_8bpp_rvv_i32                     :   82.2   73.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00
Marvin Scholz
8f36c6f2e7
MAINTAINERS: add CC preference for myself
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-19 17:46:44 +02:00
Michael Niedermayer
7e5410eadb
avformat/iamf_parse: clear padding
Fixes: use of uninitialized value
Fixes: 70929/clusterfuzz-testcase-minimized-ffmpeg_dem_IAMF_fuzzer-5931276639469568

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-18 23:05:40 +02:00
Michael Niedermayer
67947f2a1c
avcodec/hevc/ps: use unsigned shift
Fixes: left shift of 1 by 31 places cannot be represented in type 'int'
Fixes: 70726/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-6149928703819776

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-08-18 23:05:39 +02:00