mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-13 21:28:01 +02:00
Commit Graph

2274 Commits

Author SHA1 Message Date
Lauri Kasanen
9456adc223 swscale/ppc: VSX-optimize hScale8To19
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
    -s 2400x720 -f rawvideo -y -vframes 5 -pix_fmt yuv420p16le -nostats test.raw

2.26 speedup (x86 SSE2 is 2.32):
  23772 UNITS in hscale,    4096 runs,      0 skips
  53862 UNITS in hscale,    4096 runs,      0 skips
2019-05-07 10:08:16 +03:00
Lauri Kasanen
d0e4d0429e swscale/ppc: VSX-optimize hscale_fast
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw

4.27 speedup for hyscale_fast:
  24796 UNITS in hyscale_fast,    4096 runs,      0 skips
   5797 UNITS in hyscale_fast,    4096 runs,      0 skips

4.48 speedup for hcscale_fast:
  19911 UNITS in hcscale_fast,    4095 runs,      1 skips
   4437 UNITS in hcscale_fast,    4096 runs,      0 skips
2019-04-30 14:41:28 +03:00
Lauri Kasanen
ce92ee4b4f swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

~2x speedup:

rgb24
  24431 UNITS in yuv2packed2,   16384 runs,      0 skips
  13783 UNITS in yuv2packed2,   16383 runs,      1 skips
bgr24
  24396 UNITS in yuv2packed2,   16384 runs,      0 skips
  14059 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  26815 UNITS in yuv2packed2,   16383 runs,      1 skips
  12797 UNITS in yuv2packed2,   16383 runs,      1 skips
bgra
  27060 UNITS in yuv2packed2,   16384 runs,      0 skips
  13138 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  26998 UNITS in yuv2packed2,   16384 runs,      0 skips
  12728 UNITS in yuv2packed2,   16381 runs,      3 skips
bgra
  26651 UNITS in yuv2packed2,   16384 runs,      0 skips
  13124 UNITS in yuv2packed2,   16384 runs,      0 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
2019-04-11 09:08:51 +03:00
Lauri Kasanen
8607e29fa3 swscale/ppc: VSX-optimize yuv2rgb_full_X
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

32-bit mul, power8 only.

~6.4x speedup:

rgb24
 214278 UNITS in yuv2packedX,   16384 runs,      0 skips
  33249 UNITS in yuv2packedX,   16384 runs,      0 skips
bgr24
 214616 UNITS in yuv2packedX,   16384 runs,      0 skips
  33233 UNITS in yuv2packedX,   16384 runs,      0 skips
rgba
 214517 UNITS in yuv2packedX,   16384 runs,      0 skips
  33271 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214973 UNITS in yuv2packedX,   16384 runs,      0 skips
  33397 UNITS in yuv2packedX,   16384 runs,      0 skips
argb
 214613 UNITS in yuv2packedX,   16384 runs,      0 skips
  33310 UNITS in yuv2packedX,   16384 runs,      0 skips
bgra
 214637 UNITS in yuv2packedX,   16384 runs,      0 skips
  33330 UNITS in yuv2packedX,   16384 runs,      0 skips
2019-04-07 09:20:34 +03:00
Lauri Kasanen
3256e949be swscale/ppc: VSX-optimize yuv2rgb_full_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
            -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

32-bit mul, power8 only.

~4x speedup:

rgb24
  52763 UNITS in yuv2packed2,   16384 runs,      0 skips
  13453 UNITS in yuv2packed2,   16384 runs,      0 skips
bgr24
  53144 UNITS in yuv2packed2,   16384 runs,      0 skips
  13616 UNITS in yuv2packed2,   16384 runs,      0 skips
rgba
  52796 UNITS in yuv2packed2,   16384 runs,      0 skips
  12904 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52732 UNITS in yuv2packed2,   16384 runs,      0 skips
  13262 UNITS in yuv2packed2,   16384 runs,      0 skips
argb
  52661 UNITS in yuv2packed2,   16384 runs,      0 skips
  12879 UNITS in yuv2packed2,   16384 runs,      0 skips
bgra
  52662 UNITS in yuv2packed2,   16384 runs,      0 skips
  12932 UNITS in yuv2packed2,   16384 runs,      0 skips
2019-04-07 09:20:33 +03:00
Lauri Kasanen
50e672bc54 swscale/ppc: VSX-optimize non-full-chroma yuv2rgb_1
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

32-bit mul, power8 only.

1.8-2.3x speedup:

rgb24
  18192 UNITS in yuv2packed1,   32767 runs,      1 skips
   9983 UNITS in yuv2packed1,   32760 runs,      8 skips
bgr24
  18665 UNITS in yuv2packed1,   32766 runs,      2 skips
   9925 UNITS in yuv2packed1,   32763 runs,      5 skips
rgba
  20239 UNITS in yuv2packed1,   32767 runs,      1 skips
   8794 UNITS in yuv2packed1,   32759 runs,      9 skips
bgra
  20354 UNITS in yuv2packed1,   32768 runs,      0 skips
   8770 UNITS in yuv2packed1,   32761 runs,      7 skips
argb
  20185 UNITS in yuv2packed1,   32768 runs,      0 skips
   8761 UNITS in yuv2packed1,   32761 runs,      7 skips
bgra
  20360 UNITS in yuv2packed1,   32766 runs,      2 skips
   8759 UNITS in yuv2packed1,   32764 runs,      4 skips

This is a low speedup, but the x86 mmx version also gets only ~2x. The mmx version
is also heavily inaccurate, while the vsx version has high accuracy.
2019-04-07 09:20:31 +03:00
Lauri Kasanen
7adce3e64c swscale/ppc: VSX-optimize yuv2422_X
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
          -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
          -cpuflags 0 -v error -

7.2x speedup:

yuyv422
 126354 UNITS in yuv2packedX,   16384 runs,      0 skips
  16383 UNITS in yuv2packedX,   16382 runs,      2 skips
yvyu422
 117669 UNITS in yuv2packedX,   16384 runs,      0 skips
  16271 UNITS in yuv2packedX,   16379 runs,      5 skips
uyvy422
 117310 UNITS in yuv2packedX,   16384 runs,      0 skips
  16226 UNITS in yuv2packedX,   16382 runs,      2 skips
2019-03-31 12:41:34 +03:00
Lauri Kasanen
9a2db4dc61 swscale/ppc: VSX-optimize yuv2422_2
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags area \
                -s 1200x720 -f null -vframes 100 -pix_fmt $i -nostats \
                -cpuflags 0 -v error -

5.1x speedup:

yuyv422
  19339 UNITS in yuv2packed2,   16384 runs,      0 skips
   3718 UNITS in yuv2packed2,   16383 runs,      1 skips
yvyu422
  19438 UNITS in yuv2packed2,   16384 runs,      0 skips
   3800 UNITS in yuv2packed2,   16380 runs,      4 skips
uyvy422
  19128 UNITS in yuv2packed2,   16384 runs,      0 skips
   3721 UNITS in yuv2packed2,   16380 runs,      4 skips
2019-03-31 12:41:33 +03:00
Lauri Kasanen
a6a31ca3d9 swscale/ppc: VSX-optimize yuv2422_1
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
            -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
            -cpuflags 0 -v error -

15.3x speedup:

yuyv422
  14513 UNITS in yuv2packed1,   32768 runs,      0 skips
    949 UNITS in yuv2packed1,   32767 runs,      1 skips
yvyu422
  14516 UNITS in yuv2packed1,   32767 runs,      1 skips
    943 UNITS in yuv2packed1,   32767 runs,      1 skips
uyvy422
  14530 UNITS in yuv2packed1,   32767 runs,      1 skips
    941 UNITS in yuv2packed1,   32766 runs,      2 skips
2019-03-31 12:41:32 +03:00
Michael Niedermayer
8865ae959b swscale/swscale_unscaled: Fix chroma slice height
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-03-28 22:47:32 +01:00
Dong, Jerry
c47fada298 swscale/swscale_unscaled: fix incomplete conversion of nv12 to u/v planes when width/height is not a multiple of 2
Signed-off-by: Dong, Jerry <jerry.dong@intel.com>
Signed-off-by: Decai Lin <decai.lin@intel.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-03-28 20:28:43 +01:00
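Both fixes above deal with chroma plane dimensions for frames whose width or height is odd. The sketch below uses a hypothetical helper, `nv12_uv_deinterleave` (not FFmpeg's actual code), to show the usual rule: in 4:2:0 formats chroma dimensions are rounded up with `(x + 1) >> 1`, so the last chroma column and row of an odd-sized frame still get converted.

```c
#include <stdint.h>
#include <stddef.h>

/* Round-up halving, similar in spirit to FFmpeg's AV_CEIL_RSHIFT(x, 1). */
static int chroma_dim(int luma_dim)
{
    return (luma_dim + 1) >> 1;
}

/* Hypothetical helper: split the interleaved UV plane of an NV12 frame
 * (U0 V0 U1 V1 ...) into separate U and V planes. Using chroma_dim() for
 * both width and height makes odd-sized frames convert completely. */
static void nv12_uv_deinterleave(const uint8_t *uv, ptrdiff_t uv_stride,
                                 uint8_t *u, ptrdiff_t u_stride,
                                 uint8_t *v, ptrdiff_t v_stride,
                                 int width, int height)
{
    int cw = chroma_dim(width);
    int ch = chroma_dim(height);
    for (int y = 0; y < ch; y++) {
        for (int x = 0; x < cw; x++) {
            u[y * u_stride + x] = uv[y * uv_stride + 2 * x];
            v[y * v_stride + x] = uv[y * uv_stride + 2 * x + 1];
        }
    }
}
```

Truncating with `width >> 1` instead would leave the rightmost chroma column (and, for odd heights, the bottom chroma row) unwritten.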
Lauri Kasanen
681957b88d swscale/ppc: VSX-optimize yuv2rgb_full
./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 \
        -s 1200x1440 -f null -vframes 100 -pix_fmt $i -nostats \
        -cpuflags 0 -v error -

This uses 32-bit mul, so POWER8 only.

The following output formats get about 4.5x speedup:

rgb24
  39980 UNITS in yuv2packed1,   32768 runs,      0 skips
   8774 UNITS in yuv2packed1,   32768 runs,      0 skips
bgr24
  40069 UNITS in yuv2packed1,   32768 runs,      0 skips
   8772 UNITS in yuv2packed1,   32766 runs,      2 skips
rgba
  39759 UNITS in yuv2packed1,   32768 runs,      0 skips
   8681 UNITS in yuv2packed1,   32767 runs,      1 skips
bgra
  39729 UNITS in yuv2packed1,   32768 runs,      0 skips
   8696 UNITS in yuv2packed1,   32766 runs,      2 skips
argb
  39766 UNITS in yuv2packed1,   32768 runs,      0 skips
   8672 UNITS in yuv2packed1,   32766 runs,      2 skips
bgra
  39784 UNITS in yuv2packed1,   32768 runs,      0 skips
   8659 UNITS in yuv2packed1,   32767 runs,      1 skips
2019-03-27 09:05:08 +02:00
Lauri Kasanen
81a4719d8e swscale: Remove duplicated code
In this function, the exact same clamping happens both in the if and unconditionally.
2019-03-27 09:00:06 +02:00
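The commit above does not name the function here, so the following is only a hypothetical illustration of the pattern it describes: a clamp performed inside a conditional and then again unconditionally. Because clamping is idempotent, the conditional branch can be dropped without changing any output.

```c
#include <stdint.h>

/* Clip a value to the 0..255 range of an 8-bit sample. */
static inline int clip_uint8(int x)
{
    return x < 0 ? 0 : x > 255 ? 255 : x;
}

/* Before: the branch clamps, then the same clamp runs again
 * unconditionally, so the branch is pure duplication. */
static uint8_t store_before(int val)
{
    if (val & ~0xFF)
        val = clip_uint8(val);
    return (uint8_t)clip_uint8(val);
}

/* After: a single unconditional clamp gives identical results. */
static uint8_t store_after(int val)
{
    return (uint8_t)clip_uint8(val);
}
```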
Lauri Kasanen
6b5ea90eac swscale/ppc: Add av_unused to template vars only used in one includer 2019-03-20 10:21:55 +02:00
Lauri Kasanen
ac3062f1a4 swscale/ppc: Clean up some mixed decl warnings 2019-03-20 10:21:53 +02:00
Lauri Kasanen
8522d219ce libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.

With TIMER_REPORT skips disabled:
yuv420p9le
  12412 UNITS in planarX,  131072 runs,      0 skips
  73136 UNITS in planarX,  131072 runs,      0 skips
yuv420p9be
  12481 UNITS in planarX,  131072 runs,      0 skips
  73410 UNITS in planarX,  131072 runs,      0 skips
yuv420p10le
  12322 UNITS in planarX,  131072 runs,      0 skips
  72546 UNITS in planarX,  131072 runs,      0 skips
yuv420p10be
  12291 UNITS in planarX,  131072 runs,      0 skips
  72935 UNITS in planarX,  131072 runs,      0 skips
yuv420p12le
  12316 UNITS in planarX,  131072 runs,      0 skips
  72708 UNITS in planarX,  131072 runs,      0 skips
yuv420p12be
  12319 UNITS in planarX,  131072 runs,      0 skips
  72577 UNITS in planarX,  131072 runs,      0 skips
yuv420p14le
  12259 UNITS in planarX,  131072 runs,      0 skips
  72516 UNITS in planarX,  131072 runs,      0 skips
yuv420p14be
  12440 UNITS in planarX,  131072 runs,      0 skips
  72962 UNITS in planarX,  131072 runs,      0 skips
yuv420p16le
  10548 UNITS in planarX,  131072 runs,      0 skips
  73429 UNITS in planarX,  131072 runs,      0 skips
yuv420p16be
  10634 UNITS in planarX,  131072 runs,      0 skips
 150959 UNITS in planarX,  131072 runs,      0 skips

Signed-off-by: Lauri Kasanen <cand@gmx.com>
2019-02-05 09:34:53 +02:00
Michael Niedermayer
fe17f9b956 swscale/yuv2rgb: Return a more specific error code from ff_yuv2rgb_c_init_tables()
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-01 21:11:47 +01:00
Lauri Kasanen
8dd9df9ecd swscale/output: Altivec-optimize float yuv2plane1
This function wouldn't benefit from VSX instructions, so I put it
under altivec.

./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt grayf32le \
-f null -vframes 100 -v error -nostats -

grayf32le

3743 UNITS in planar1,   65495 runs,     41 skips

-cpuflags 0

23511 UNITS in planar1,   65530 runs,      6 skips

grayf32be

4647 UNITS in planar1,   65449 runs,     87 skips

-cpuflags 0

28608 UNITS in planar1,   65530 runs,      6 skips

The native speedup is 6.28133, and the bswapping one 6.15623.
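The speedup figures quoted throughout these commits are ratios of two TIMER_REPORT readings: the plain C path (run with `-cpuflags 0`) divided by the SIMD path. Both readings are in the same platform-specific timer units, so only the ratio is meaningful. A minimal sketch of that arithmetic, using the grayf32 numbers above:

```c
/* Speedup = C-path timer reading / SIMD-path timer reading.
 * The units cancel, so the raw TIMER_REPORT numbers can be used directly. */
static double speedup(double c_units, double simd_units)
{
    return c_units / simd_units;
}
```

For example, `speedup(23511, 3743)` is about 6.28133 (grayf32le) and `speedup(28608, 4647)` about 6.15623 (grayf32be), matching the figures in the commit message.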
Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-26 20:28:58 +01:00
Lauri Kasanen
b4c8c03b00 swscale/output: VSX-optimize 16-bit yuv2plane1
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16le \
-f null -vframes 100 -v error -nostats -

2120 UNITS in planar1,   65393 runs,    143 skips

-cpuflags 0

19157 UNITS in planar1,   65512 runs,     24 skips

9.03632 speedup, 16be similarly.

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-14 19:09:11 +01:00
Lauri Kasanen
1046cba24b swscale/output: VSX-optimize nbps yuv2plane1
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p9le \
-f null -vframes 100 -v error -nostats -

Speedups:
yuv2plane1_9BE_vsx	11.2042
yuv2plane1_9LE_vsx	11.156
yuv2plane1_10BE_vsx	9.89428
yuv2plane1_10LE_vsx	10.3637
yuv2plane1_12BE_vsx	9.71923
yuv2plane1_12LE_vsx	11.0404
yuv2plane1_14BE_vsx	10.1763
yuv2plane1_14LE_vsx	11.2728

Fate passes, each format tested with an image to video conversion.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-12 01:56:57 +01:00
Lauri Kasanen
78c7ff7d25 swscale/ppc: Move VSX-using code to its own file
Passes fate on LE (with "lavc/jrevdct: Avoid an aliasing violation" applied).

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Tested-by: Michael Kostylev on BE
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-04 02:59:07 +01:00
Lauri Kasanen
46c5693ea3 swscale/output: Altivec-optimize yuv2plane1_8
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p \
-f null -vframes 100 -v error -nostats -

1158 UNITS in planar1,   65528 runs,      8 skips

-cpuflags 0

19082 UNITS in planar1,   65533 runs,      3 skips

16.48 speedup ratio. On x86, SSE2 is ~7. Curiously, the Power C version
takes as many cycles as the x86 SSE2 version, yikes it's fast.

Note that this function uses VSX instructions, but is not marked so.
This is because several existing functions also make that mistake.
I'll submit a patch moving them once this is reviewed.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-26 02:56:25 +01:00
Martin Vignali
86e6f0dbc7 swscale : add support for YUVA444P12 and YUVA422P12 2018-11-24 16:24:47 +01:00
Michael Niedermayer
517573a670 Bump minor version for master after 4.1 branchpoint
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-02 00:53:07 +01:00
Michael Niedermayer
780d5e30a0 Bump minor versions for branching 4.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-11-02 00:15:32 +01:00
Martin Vignali
156120fcf8 swscale/swscale_unscaled : rename packed_16bpc_bswap
It is used for both packed and planar formats.
2018-10-24 21:21:20 +02:00
Martin Vignali
26bf4a4050 swscale/unscaled : add grayf32 le to be 2018-10-24 21:21:14 +02:00
Martin Vignali
3db33b446f swscale/utils : simplify unscaled initial test for float pixfmt 2018-10-24 21:21:10 +02:00
Martin Vignali
db4771af81 swscale : add YA16 LE/BE output 2018-10-18 21:43:24 +02:00
Martin Vignali
658bbc0060 swscale/x86/rgb2rgb.asm : add Ivo Van Poorten name to the top of the file
suggested by Carl Eugen Hoyos
2018-10-18 21:43:19 +02:00
Martin Vignali
296609f859 swscale/x86/rgb2rgb : port shuffle 2103 mmxext to external asm and remove inline asm version 2018-10-13 14:12:41 +02:00
Martin Vignali
04afdbb560 swscale/x86/rgb2rgb : remove mmx version for shuffle2103 2018-10-13 14:12:36 +02:00
Paul B Mahol
931e7c050e swscale/swscale_unscaled: add gbrap -> packed rgb path 2018-09-09 22:58:26 +02:00
Martin Vignali
bdd6754648 swscale/swscale : small cosmetic 2018-08-22 11:36:15 +02:00
Martin Vignali
3af1c4ea7d swscale : treat float input data as uint 16bpc
Currently, float input is converted to 16-bit uint in the input stage,
but hScale16To19 and hScale16To15 use the source depth (32 bits),
which produces an invalid shift for the data.

So, for float input, shift the value the same way as for
16 bpc uint input.
2018-08-22 11:36:09 +02:00
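A rough numeric model of why the 32-bit depth gives an invalid shift. It assumes, as a simplification of the horizontal scaler, that the shift is derived as `depth - 1 - 4` and that filter coefficients have a unity gain of `1 << 14`. The function names are hypothetical and this is not FFmpeg's code.

```c
#include <stdint.h>

/* Assumed shift rule: (depth - 1) - 4. For 16-bit input a full-scale sample
 * times the 14-bit unity coefficient spans 30 bits; shifting by 11 leaves 19. */
static int hscale19_shift(int src_depth)
{
    return src_depth - 1 - 4;
}

/* Filter one sample with one coefficient and clamp to the 19-bit range. */
static int32_t hscale19_sample(uint16_t sample, int filter_coeff, int shift)
{
    int64_t v = (int64_t)sample * filter_coeff >> shift;
    return v > (1 << 19) - 1 ? (1 << 19) - 1 : (int32_t)v;
}

/* Float input has already been converted to 16-bit uint, so the shift must
 * come from 16 bits rather than from the 32-bit depth of the float format. */
static int float_input_shift(void)
{
    return hscale19_shift(16);
}
```

With the 32-bit depth the shift becomes 27, and a full-scale sample collapses to 7 instead of roughly 2^19.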
Sergey Lavrushkin
582bc5a348 libswscale: Adds conversions from/to float gray format.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-08-14 18:22:39 +02:00
Carl Eugen Hoyos
3a56ade1f3 lsws/rgb2rgb_template: Do not compile unneeded shuffle functions on big-endian.
Fixes the following warnings:
In file included from libswscale/rgb2rgb.c:128:0:
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3210_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_3012_c' defined but not used
libswscale/rgb2rgb_template.c:346:13: warning: 'shuffle_bytes_1230_c' defined but not used
2018-06-10 03:22:59 +02:00
Paul B Mahol
b9dd058f7a swscale: add gray14 support
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2018-05-05 21:35:31 +02:00
Martin Vignali
07a566e7d6 swscale/swscale_unscaled : add X86_64 (SSE2 and AVX) for uyvyto422
and checkasm test
2018-04-22 19:15:32 +02:00
Michael Niedermayer
3c1ecb057d Bump minor versions after release/4.0 branching
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-04-16 12:35:12 +02:00
Michael Niedermayer
7e3a070d9a Bump minor versions for branching release/4.0
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-04-16 12:35:12 +02:00
wm4
d6fc031caf avutil/pixdesc: deprecate AV_PIX_FMT_FLAG_PSEUDOPAL
PSEUDOPAL pixel formats are not paletted, but carried a palette with the
intention of allowing code to treat unpaletted formats as paletted. The
palette simply mapped the byte values to the resulting RGB values,
making it some sort of LUT for RGB conversion.

It was used for 1 byte formats only: RGB4_BYTE, BGR4_BYTE, RGB8, BGR8,
GRAY8. The first 4 are awfully obscure, used only by some ancient bitmap
formats. The last one, GRAY8, is more common, but its treatment is
grossly incorrect. It considers full range GRAY8 only, so GRAY8 coming
from typical Y video planes was not mapped to the correct RGB values.
This cannot be fixed, because AVFrame.color_range can be freely changed
at runtime, and there is nothing to ensure the pseudo palette is
updated.

Also, nothing actually used the PSEUDOPAL palette data, except xwdenc
(trivially changed in the previous commit). All other code had to treat
it as a special case, just to ignore or to propagate palette data.

In conclusion, this was just a very strange old mechanism that has no
real justification to exist anymore (although it may have been nice and
useful in the past). Now it's an artifact that makes the API harder to
use: API users who allocate their own pixel data have to be aware that
they need to allocate the palette, or FFmpeg will crash on them in
_some_ situations. On top of this, there was no API to allocate the
pseudo palette outside of av_frame_get_buffer().

This patch not only deprecates AV_PIX_FMT_FLAG_PSEUDOPAL, but also makes
the pseudo palette optional. Nothing accesses it anymore, though if it's
set, it's propagated. It's still allocated and initialized for
compatibility with API users that rely on this feature. But new API
users do not need to allocate it. This was an explicit goal of this
patch.

Most changes replace AV_PIX_FMT_FLAG_PSEUDOPAL with FF_PSEUDOPAL. I
first tried #ifdefing all code, but it was a mess. The FF_PSEUDOPAL
macro reduces the mess, and still allows defining FF_API_PSEUDOPAL to 0.

Passes FATE with FF_API_PSEUDOPAL enabled and disabled. In addition,
FATE passes with FF_API_PSEUDOPAL set to 1, but with allocation
functions manually changed to not allocating a palette.
2018-04-03 17:53:00 +02:00
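To make the GRAY8 problem above concrete, here is a hypothetical sketch (not FFmpeg's code) of a pseudo palette that maps each byte value to an opaque gray 0xAARRGGBB entry. That mapping is only right for full-range input. Limited-range luma (16..235) needs a stretch to 0..255, and a palette built once cannot follow a `color_range` that changes at runtime.

```c
#include <stdint.h>

/* Full-range GRAY8 pseudo palette: byte value i -> opaque gray (i, i, i). */
static void build_gray8_pseudopal(uint32_t pal[256])
{
    for (uint32_t i = 0; i < 256; i++)
        pal[i] = 0xFF000000u | (i << 16) | (i << 8) | i;
}

/* What limited-range luma actually needs: map 16..235 to 0..255 with
 * rounding, clamping values outside the nominal range. */
static uint8_t limited_to_full(uint8_t y)
{
    if (y <= 16)
        return 0;
    if (y >= 235)
        return 255;
    return (uint8_t)(((y - 16) * 255 + 109) / 219);
}
```

For a limited-range frame, value 16 should be black, but the palette entry `pal[16]` is 0xFF101010, a dark gray.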
Martin Storsjö
f33f728470 arm: swscale: Only compile the rgb2yuv asm if .dn aliases are supported
Vanilla clang has supported altmacro since clang 5.0, and thus no longer
requires gas-preprocessor for building the arm assembly.

However, the built-in assembler doesn't support .dn directives.

This re-adds checks that were removed in d7320ca3ed, when
the last uses of .dn directives within libav were removed.

Alternatively, the assembly could be rewritten to not use the
.dn directive, making it available to clang users.

Signed-off-by: Martin Storsjö <martin@martin.st>
2018-03-31 21:54:56 +03:00
Martin Vignali
5f6126ea7f swscale/rgb2rgb : cosmetic, move shuffle_bytes func declaration
Move shuffle_bytes_1230, 3012 and 3210 next to the other shuffle_bytes
declarations.
2018-03-24 20:22:17 +01:00
Martin Vignali
1ba5ca2d72 swscale/rgb : add X86 SIMD (SSSE3), for shuffle_bytes_1230, shuffle_bytes_3012, shuffle_bytes_3210 2018-03-24 20:22:08 +01:00
Martin Vignali
d4f6640855 swscale/rgb : move shuffle func shuffle_bytes_1230, shuffle_bytes_3012, shuffle_bytes_3210 in order to add SIMD 2018-03-24 20:22:02 +01:00
Martin Vignali
923a324174 swscale/rgb : add X86 SIMD (SSSE3) for shuffle_bytes_2103 and shuffle_bytes_0321 2018-03-24 20:21:58 +01:00
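The shuffle_bytes_ABCD functions in the commits above reorder the four bytes of each 32-bit pixel. The sketch below is a generic scalar form. It assumes the digits name, for each output byte, the input byte it comes from, so "2103" gives `dst = {src[2], src[1], src[0], src[3]}` (an RGBA/BGRA swap). The SSSE3 versions can perform the same permutation on 16 bytes at once with a single `pshufb` mask.

```c
#include <stdint.h>

/* Permute the bytes of each 4-byte pixel: output byte k of a pixel is taken
 * from input byte order[k] of the same pixel. */
static void shuffle_bytes(const uint8_t *src, uint8_t *dst, int src_size,
                          const int order[4])
{
    for (int i = 0; i + 3 < src_size; i += 4) {
        dst[i + 0] = src[i + order[0]];
        dst[i + 1] = src[i + order[1]];
        dst[i + 2] = src[i + order[2]];
        dst[i + 3] = src[i + order[3]];
    }
}
```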
Philip Langdale
dd3f1e3a11 swscale: Introduce a helper to identify semi-planar formats
This cleans up the ever-more-unreadable list of semi-planar
exclusions for selecting the planar copy wrapper.
2018-03-03 15:20:19 -08:00
Philip Langdale
9d5aff09a7 swscale: Add p016 output support and generalise yuv420p1x to p010
To make the best use of existing code, I generalised the wrapper
that currently does yuv420p10 to p010 to support any mixture of
input and output sizes between 10 and 16 bits. This had the side
effect of yielding a working code path for all yuv420p1x formats
to p01x.
2018-03-02 14:52:48 -08:00
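A per-sample sketch of the generalised yuv420p1x to p01x path described above. It assumes that yuv420p1x samples are LSB-aligned with `src_depth` significant bits, and that P01x samples are MSB-aligned in 16 bits with the unused low bits zero. The helper is hypothetical, ignores endianness, and truncates rather than rounds when reducing depth; the real wrapper may differ.

```c
#include <stdint.h>

/* Convert one LSB-aligned sample of src_depth bits into an MSB-aligned
 * P01x sample keeping dst_depth bits. Valid for depths 10..16. */
static uint16_t p1x_to_p01x(uint16_t sample, int src_depth, int dst_depth)
{
    uint16_t msb_aligned = (uint16_t)(sample << (16 - src_depth));
    uint16_t keep_mask = (uint16_t)(0xFFFFu << (16 - dst_depth));
    return msb_aligned & keep_mask;
}
```

For example, a full-scale 10-bit sample (1023) becomes 0xFFC0 in P010, and a full-scale 16-bit sample stays 0xFFFF in P016.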
Thomas Köppe
43171a2a73 Fix missing used attribute for inline assembly variables
Variables used in inline assembly need to be marked with attribute((used)).
Static constants already were, via the define of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are constants used only in inline assembly, and therefore
appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks
variables as used.

This change makes FFmpeg work with Clang's ThinLTO.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-13 03:58:34 +01:00
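A sketch of the idea behind DECLARE_ASM_ALIGNED, using GCC/Clang attribute syntax. The macro name `SKETCH_DECLARE_ASM_ALIGNED` is hypothetical and this is not FFmpeg's exact definition. The variable is marked both aligned and `used`, so the compiler and ThinLTO keep it even when its only reference is inside an inline-assembly string they cannot see into.

```c
#include <stdint.h>

/* Aligned + "used": the symbol survives even with no visible C reference. */
#define SKETCH_DECLARE_ASM_ALIGNED(n, t, v) \
    t __attribute__((used)) __attribute__((aligned(n))) v

/* A static constant that, without "used", could look dead to the optimizer
 * if its only user were an inline asm block. */
static SKETCH_DECLARE_ASM_ALIGNED(16, const uint64_t, sketch_mask_ff)[2] = {
    0x00FF00FF00FF00FFULL, 0x00FF00FF00FF00FFULL
};
```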
James Almer
869401cefc Merge commit '29ccc641b17afad058a5c24071ea827865a8b3a9'
* commit '29ccc641b17afad058a5c24071ea827865a8b3a9':
  build: Drop check for sys/mman.h in favor of mmap() check

Merged-by: James Almer <jamrial@gmail.com>
2017-11-11 16:09:09 -03:00
James Almer
087e9ab1b3 Merge commit '0fd0d4fd0a518e30ff23972828ad7cf7f35cfb9d'
* commit '0fd0d4fd0a518e30ff23972828ad7cf7f35cfb9d':
  swscale-test: const correctness

Merged-by: James Almer <jamrial@gmail.com>
2017-10-30 12:34:40 -03:00
Carl Eugen Hoyos
9b0510a8e3 lsws/yuv2rgb: Fix yuva2rgb32 on big endian hardware. 2017-10-29 14:53:57 +01:00
Mateusz
50ce296026 swscale: use dithering in DITHER_COPY only if not set -sws_dither none
This patch uses dithering in the DITHER_COPY macro only if
the option '-sws_dither none' was not given.
With '-sws_dither none' it uses a plain downshift.

Dithering is fine for the human eye, but not necessarily for video codecs.
If the user doesn't want dithering, we should respect that.

Signed-off-by: Mateusz Brzostek <mateuszb@poczta.onet.pl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-25 21:50:37 +02:00
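The difference between the two modes can be sketched for a 10-bit to 8-bit reduction. With dithering disabled, the output is a deterministic downshift. With dithering, a small position-dependent offset is added first, so the same input can round to different codes at different positions. The dither table below is a hypothetical example, not FFmpeg's actual matrix, and the DITHER_COPY range handling is omitted.

```c
#include <stdint.h>

/* Hypothetical ordered-dither offsets, all below one 8-bit output step (4). */
static const uint8_t sketch_dither[8] = { 0, 2, 1, 3, 0, 2, 1, 3 };

/* Reduce a 10-bit sample at horizontal position x to 8 bits. */
static uint8_t reduce_10_to_8(uint16_t src, int x, int dither)
{
    int v = src;
    if (dither)
        v += sketch_dither[x & 7];
    v >>= 2;
    return (uint8_t)(v > 255 ? 255 : v);
}
```

With dithering, the 10-bit value 514 comes out as 128 or 129 depending on the position. With a plain downshift it is always 128, which is friendlier to encoders.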
Mateusz
f192f2f061 swscale: more accurate DITHER_COPY macro for full and limited range
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-23 23:00:05 +02:00
James Almer
69b5ce64d2 Merge commit '07a2b155949eb267cdfc7805f42c7b3375f9c7c5'
* commit '07a2b155949eb267cdfc7805f42c7b3375f9c7c5':
  Bump major versions of all libraries

A few APIs deprecated ~2 years ago or more are also postponed here for
varying reasons.

FF_API_LOWRES:
Since this functionality depends on AVStream->codec, I figure the two can
be removed at the same time in the next bump or so.

FF_API_AVCTX_TIMEBASE:
Couldn't get this one to work. Not just libavcodec but apparently also
libavformat and ffmpeg.c expect AVCodecContext->time_base to be set for
decoding. Upon removal some tests report a different generic stream time
base (like 1/25), and others lose packet duration values. I guess it's
somehow tied to the AVStream->codec clusterfuck.
It can be dealt with alongside FF_API_LAVF_AVCTX in the next bump.

FF_API_OLD_FILTER_OPTS_ERROR:
This one is meant to remain after FF_API_OLD_FILTER_OPTS is removed.
Its purpose is displaying the corrected command line using the new syntax
as a suggestion as part of the error message.

Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 14:57:53 -03:00
James Almer
2904db9045 Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2'
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
  x86util: Port all macros to cpuflags

See d5f8a642f6

Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:15:57 -03:00
Michael Niedermayer
80154b1b3a Bump version for master after 3.4 branchpoint
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-11 02:45:37 +02:00
Michael Niedermayer
e1de9eab3a Bump minor versions for branching 3.4
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-11 01:23:47 +02:00
Diego Biurrun
29ccc641b1 build: Drop check for sys/mman.h in favor of mmap() check
We already rely on just mmap() in other places.
2017-10-10 23:20:16 +02:00
Lou Logan
183fd30e0f Fix several typos
"apix_fmts" found by Marc Péchaud.
"speedloss" found by Mikhail V.

Signed-off-by: Lou Logan <lou@lrcd.com>
2017-09-21 16:17:02 -08:00
Derek Buitenhuis
5e3f6dc701 swscale: Do not expand a macro with 'defined' in it
Fixes:

    libswscale/utils.c:1632:5: warning: macro expansion producing 'defined' has undefined behavior [-Wexpansion-to-defined]
    #if USE_MMAP
        ^
    libswscale/utils.c:1577:49: note: expanded from macro 'USE_MMAP'
    #define USE_MMAP (HAVE_MMAP && HAVE_MPROTECT && defined MAP_ANONYMOUS)
                                                    ^
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-08-28 10:26:14 +02:00
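The warning above comes from a macro whose expansion contains `defined`, which is undefined behavior when `#if` expands it. The usual fix, sketched here with assumed stand-in values for the configure results, is to evaluate `defined` directly in an `#if` and define the macro to a plain 0 or 1.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose MAP_ANONYMOUS on glibc */
#endif
#include <sys/mman.h>

/* Stand-ins for configure results (assumed values for this sketch). */
#define HAVE_MMAP 1
#define HAVE_MPROTECT 1

/* 'defined' is evaluated here, inside #if, so USE_MMAP expands to a
 * plain integer and no longer triggers -Wexpansion-to-defined. */
#if HAVE_MMAP && HAVE_MPROTECT && defined(MAP_ANONYMOUS)
#define USE_MMAP 1
#else
#define USE_MMAP 0
#endif
```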
Derek Buitenhuis
add7b3bc3f utils: Do not expand a macro with 'defined' in it
Fixes:

    libswscale/utils.c:1632:5: warning: macro expansion producing 'defined' has undefined behavior [-Wexpansion-to-defined]
    #if USE_MMAP
        ^
    libswscale/utils.c:1577:49: note: expanded from macro 'USE_MMAP'
    #define USE_MMAP (HAVE_MMAP && HAVE_MPROTECT && defined MAP_ANONYMOUS)
                                                    ^
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2017-08-25 13:44:57 +01:00
Carl Eugen Hoyos
cb1a3eecac lsws/rgb2rgb: Add unscaled 48bit to 64bit rgb conversion.
Based on b4befca2 and 6b7849e6 by Paul B Mahol.

Fixes ticket #6608.
2017-08-24 12:50:06 +02:00
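A scalar sketch of what an unscaled 48-bit to 64-bit RGB conversion does. The function name is hypothetical and endianness handling is omitted. It copies the three 16-bit color components and fills the new alpha component with fully opaque 0xFFFF.

```c
#include <stdint.h>

/* Expand packed RGB48 (3 x 16-bit) to packed RGBA64 (4 x 16-bit). */
static void rgb48_to_rgba64(const uint16_t *src, uint16_t *dst, int pixels)
{
    for (int i = 0; i < pixels; i++) {
        dst[4 * i + 0] = src[3 * i + 0];
        dst[4 * i + 1] = src[3 * i + 1];
        dst[4 * i + 2] = src[3 * i + 2];
        dst[4 * i + 3] = 0xFFFF;   /* opaque alpha */
    }
}
```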
Paul B Mahol
de48710c11 libswscale: add gray9 support 2017-08-07 13:09:41 +02:00
James Cowgill
013ec23cbe swscale: fix gbrap16 alpha channel issues
Fixes filter-pixfmts-scale test failing on big-endian systems due to
alpSrc not being cast to (const int32_t**).

Also fixes distortions in the output alpha channel values by copying the
alpha channel code from the rgba64 case found elsewhere in output.c.

Fixes ticket 6555.

Signed-off-by: James Cowgill <James.Cowgill@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-08-05 23:24:02 +02:00
Clément Bœsch
797c232ef8 sws/tests/pixdesc_query: fix use of free() instead of av_free()
Fix CID 1415949
2017-07-30 20:48:57 +02:00
Clément Bœsch
4158fba3cd sws/tests/pixdesc_query: replace rgb based pix fmts with endianness agnostic names
Fixes ticket #6554
2017-07-30 16:05:32 +02:00
Clément Bœsch
d2c70fc887 sws/tests/pixdesc_query: sort pixel formats 2017-07-30 16:04:36 +02:00
Clément Bœsch
ca23d3491d sws/tests/pixdesc_query: save every pix fmts in a list
This will be required for the next commit.
2017-07-30 16:04:36 +02:00
Diego Biurrun
825e463a17 build: Add feature test macros for glibc 2.19+
glibc introduced _DEFAULT_SOURCE in version 2.19 to replace _BSD_SOURCE and
_SVID_SOURCE, which were deprecated in version 2.20. Add _DEFAULT_SOURCE
where the latter two are used to be forwards-compatible and avoid warnings
about the use of deprecated definitions.
2017-07-10 10:22:56 +02:00
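A minimal sketch of the forward-compatible pattern described above. Both the new and the old feature-test macros are defined before any system header. glibc 2.19+ understands `_DEFAULT_SOURCE`, older glibc still needs `_BSD_SOURCE`, and glibc 2.20+ only warns about `_BSD_SOURCE` when `_DEFAULT_SOURCE` is absent. `strsep()` serves here as an example of a BSD extension these macros expose.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc >= 2.19 */
#endif
#ifndef _BSD_SOURCE
#define _BSD_SOURCE 1       /* older glibc; no warning when paired as above */
#endif
#include <string.h>

/* Count delimiter-separated fields using the BSD strsep() extension. */
static int count_fields(char *line, const char *delims)
{
    int n = 0;
    while (strsep(&line, delims) != NULL)
        n++;
    return n;
}
```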
Diego Biurrun
fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
James Almer
6fdd35a312 Merge commit '92db5083077a8b0f8e1050507671b456fd155125'
* commit '92db5083077a8b0f8e1050507671b456fd155125':
  build: Generate pkg-config files from Make and not from configure
  build: Store library version numbers in .version files

Includes cherry-picked commits 8a34f36593 and
ee164727dd to fix issues.

Changes were also made to retain support for raise_major and build_suffix.

Reviewed-by: ubitux
Merged-by: James Almer <jamrial@gmail.com>
2017-05-04 19:59:30 -03:00
Clément Bœsch
3f17751eeb Merge commit '11a9320de54759340531177c9f2b1e31e6112cc2'
* commit '11a9320de54759340531177c9f2b1e31e6112cc2':
  build: Move build-system-related helper files to a separate subdirectory

"ffbuild" directory name is used instead of "avbuild".

Merged-by: Clément Bœsch <u@pkh.me>
2017-05-03 16:49:12 +02:00
Michael Niedermayer
7796f29065 libswscale/tests/swscale: Fix uninitialized variables
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-04-30 14:23:27 +02:00
Clément Bœsch
172b0e2e88 Merge commit 'ea7ee4b4e381e0fa731458de0cbf740430eeb013'
* commit 'ea7ee4b4e381e0fa731458de0cbf740430eeb013':
  ppc: Centralize compiler-specific altivec.h #include handling in one place

Merged-by: Clément Bœsch <u@pkh.me>
2017-04-26 16:23:28 +02:00
Diego Biurrun
0fd0d4fd0a swscale-test: const correctness 2017-04-24 16:10:05 +02:00
Luca Barbato
37f573543c swscale: Convert the check_image_pointers helper to a macro
Avoid warnings about types mismatch and make the code a little simpler.
2017-04-15 15:37:18 +02:00
Luca Barbato
f56fa95cd1 swscale: Do not shift negative values directly
It is undefined in C as reported:
    warning: shifting a negative signed value is undefined
2017-04-15 15:37:18 +02:00
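Left-shifting a negative signed value is undefined behavior in C. The commit text does not show which rewrite was used, so the following sketch simply shows two common well-defined alternatives that give the intended result on two's-complement targets.

```c
#include <stdint.h>

/* Multiply by a power of two instead of shifting. */
static int32_t shl_mul(int32_t x, unsigned n)
{
    return x * (1 << n);
}

/* Shift in unsigned arithmetic. Converting an out-of-range value back to
 * int32_t is implementation-defined (not undefined); GCC and Clang wrap. */
static int32_t shl_unsigned(int32_t x, unsigned n)
{
    return (int32_t)((uint32_t)x << n);
}
```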
Michael Niedermayer
ac29b82ec5 swscale: Add gbrap10 output
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-04-15 15:37:10 +02:00
Paul B Mahol
f6a9c20a52 swscale: Add input support for gbrap10 pixel format
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-04-13 16:11:24 +02:00
Carl Eugen Hoyos
c1616b454d lsws/utils: Make gray10 and gray12 full-scale like gray8 and gray16. 2017-04-12 23:00:04 +02:00
Michael Niedermayer
22b0daa1b3 Bump versions for master after 3.3
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-04-02 19:54:12 +02:00
Michael Niedermayer
e1cc7f83df Bump minor for 3.3
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-04-02 19:49:45 +02:00
Michael Niedermayer
58b867a7cf Bump minor versions for master after release/3.3 branchpoint
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-31 13:21:06 +02:00
Michael Niedermayer
fc332f3e29 Bump minor versions for starting release/3.3 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-03-31 13:21:06 +02:00
Clément Bœsch
46f4f8ad86 Merge commit '1263b2039eb5aaf1522e9de9f07c787ab30a5f50'
* commit '1263b2039eb5aaf1522e9de9f07c787ab30a5f50':
  Adjust printf conversion specifiers to match variable signedness

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-24 13:34:39 +01:00
Clément Bœsch
99dd6fe62c sws/tests/pixdesc_query: remove func wrappers 2017-03-24 00:06:35 +01:00
Clément Bœsch
bc7308aae8 sws: make is{RGB,BGR}inInt functions 2017-03-24 00:06:35 +01:00
Vittorio Giovara
07a2b15594 Bump major versions of all libraries
This disables everything that was deprecated at least 18 months ago.

Readjust the minimum API version as needed, postponing any
API-incompatible changes until the next bump.

Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2017-03-23 09:56:20 +01:00
Clément Bœsch
fa8db3f597 Merge commit 'de8e096c7eda2bce76efd0a1c1c89d37348c2414'
* commit 'de8e096c7eda2bce76efd0a1c1c89d37348c2414':
  swscale: Consistently order input YUV pixel formats

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-21 08:24:39 +01:00
Clément Bœsch
e811f84a2e swscale: cosmetics in is{RGB,BGR}inInt
Reduce diff with Libav.
2017-03-20 08:02:30 +01:00
Clément Bœsch
d6635daded swscale: remove unused is{RGB,BGR}inBytes 2017-03-20 08:02:30 +01:00
Clément Bœsch
ff6bc16c5a swscale: use a (more correct) function for isPacked 2017-03-20 08:02:30 +01:00
Clément Bœsch
2b9a52bcca swscale: use a function for isAnyRGB 2017-03-20 08:02:30 +01:00
Clément Bœsch
c30875e8b2 swscale: use a function for isBayer 2017-03-20 08:02:30 +01:00
Clément Bœsch
f052b1b40f swscale: use a function for isGray 2017-03-20 08:02:30 +01:00
Clément Bœsch
08e1376d81 fate: add fate-sws-pixdesc-query
Test the pixel format querying within libswscale.
2017-03-20 08:02:30 +01:00
Clément Bœsch
8e950c9b42 Merge commit 'aa37d2bf4505afc106e2a23c44afc722bb204a8e'
* commit 'aa37d2bf4505afc106e2a23c44afc722bb204a8e':
  swscale: Kill non-compiling disabled cruft

The isGray() chunk is not merged as an alternative patch actually fixing
the dead code is currently under review on the mailing-list.

The SWS_X chunk is merged, with an additional cosmetic.

Merged-by: Clément Bœsch <u@pkh.me>
2017-03-19 15:40:03 +01:00
Diego Biurrun
994c4bc107 x86util: Port all macros to cpuflags
Also do some small cosmetic changes: drop the pointless _MMX suffix from the ABSD2
macro name, drop a pointless check for MMX support (we always assume MMX is
available in our SIMD code), and fix spelling.
2017-03-14 17:23:32 +01:00