Previously, the code allowed overwriting on 16-aligned blocks, which was suitable when there were
no picture's virtual boundaries because both CTU sizes and strides were 16-aligned. However, with
picture's virtual boundaries, each CTU is divided into four ALF blocks, leading to potential issues
with overwriting later CTUs.
In cases involving picture virtual boundaries, each ALF block is 8-pixel aligned.
For luma, we consistently ensure an 8-aligned width. For chroma in 4:2:0 format,
we need to account for a 4-aligned width.