\label{motioncompensate}

This section defines the operation of the process
$motion\_compensate(ref1, ref2, pic, c)$ for motion-compensating a
picture component array $pic$ of type $c=Y, U$ or $V$ from reference
component arrays $ref1$ and $ref2$ of the same type.

This process shall be invoked for each component in a picture, subsequent to the
decoding of coefficient data, specified in Section \ref{transformdec}, and the Inverse Wavelet Transform (IWT), specified in Section \ref{idwt}.

Motion compensation shall use the motion block data $\BlockData$ and may optionally use the
global motion parameters $\GlobalParams$.

\begin{informative*}
\subsubsection{Overlapped Block Motion Compensation (OBMC) (Informative)}

Motion-compensated prediction determines a prediction
for each pixel in the current picture by using motion vectors to
define offsets from that pixel to pixels in previously decoded
pictures. Motion compensation techniques vary in how pixels are grouped
together, and in how a prediction is formed for the pixels in a given group. In
conventional block motion compensation, as used in MPEG-2, H.264 and many other
codecs, the picture is divided into {\em disjoint} rectangular blocks, and the
motion vector or vectors associated with each block define the offset(s) into
the reference pictures.

In OBMC, by contrast, the predicted picture is divided into a regular array of overlapping
blocks of dimensions $xblen$ by $yblen$ that cover at least the entire picture
area, as shown in Figure \ref{fig:blockcoverage}. Overlap is ensured by starting
each block at a horizontal separation $xbsep$ and a vertical separation $ybsep$
from its neighbours, where these values are less than the corresponding block dimensions.
\end{informative*}

\begin{figure}[!ht]
\centering
\includegraphics[width=0.7\textwidth]{figs/block-coverage}
\caption{Block coverage of the predicted picture}
\label{fig:blockcoverage}
\end{figure}

\begin{informative*}
The overlap between blocks is $xoffset=(xblen - xbsep)/2$ horizontally, both on the left
and on the right, and $yoffset=(yblen - ybsep)/2$ vertically, both on the top and the
bottom. As a result, pixels in the overlapping areas lie in more than
one block, and so more than one motion vector set (and set of associated predictions)
applies to them. Indeed, a pixel may have up to eight predictions, as it may belong to
up to four blocks, each of which may have up to two motion vectors. These are combined
into a single prediction by using weights, which are constructed so as to sum to 1. In
the Dirac integer implementation, fractional weights are achieved by requiring that the weights
sum to a power of 2, which is shifted out once all contributions have been summed.

In Dirac, blocks are positioned so that they overspill the left and top edges by
$xoffset$ and $yoffset$ pixels. The number of blocks has been
determined (Section \ref{motiondatadimensions}) so that the picture area is wholly
covered, and the overspill
on the right-hand and bottom edges is at least as large as that on the left and top edges.
Indeed, the number of blocks has been set so that the blocks divide into whole superblocks
(sets of $4\times 4$ blocks), which means that some blocks may fall entirely outside the picture area.
Any predictions for pixels outside the actual picture area are discarded.
\end{informative*}

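This geometry can be illustrated with a short Python sketch (informative). The picture and block dimensions below are illustrative assumptions rather than defaults from this specification, and the block counts are simply rounded up rather than being superblock-aligned; the sketch checks that every pixel is covered by at least one and at most four blocks.

\begin{verbatim}
import math

# Illustrative block parameters (assumed, not normative defaults)
xblen, xbsep, yblen, ybsep = 12, 8, 12, 8
xoffset = (xblen - xbsep) // 2          # overlap on left and right
yoffset = (yblen - ybsep) // 2          # overlap on top and bottom
width, height = 64, 48                  # illustrative picture size
blocks_x = math.ceil(width / xbsep)     # enough blocks to cover the picture
blocks_y = math.ceil(height / ybsep)

covered = [[0] * width for _ in range(height)]
for j in range(blocks_y):
    for i in range(blocks_x):
        # Block (i, j) spans [i*xbsep - xoffset, (i+1)*xbsep + xoffset)
        for y in range(max(j * ybsep - yoffset, 0),
                       min((j + 1) * ybsep + yoffset, height)):
            for x in range(max(i * xbsep - xoffset, 0),
                           min((i + 1) * xbsep + xoffset, width)):
                covered[y][x] += 1

counts = [c for row in covered for c in row]
assert min(counts) >= 1 and max(counts) <= 4   # full coverage, <= 4 blocks
\end{verbatim}
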
\subsubsection{Overall motion compensation process}
\label{mcprocess}

The motion compensation process shall form an integer prediction for each pixel in
the predicted picture component $pic$; this prediction shall be added to the pixel value, and
the result shall then be clipped to keep it in range.

The $motion\_compensate()$ process is defined by means of a temporary data
array $mc\_tmp$ for storing the motion-compensated prediction for the
current picture.

The $motion\_compensate()$ process shall be defined as follows:

\begin{pseudo}{motion\_compensate}{ref1, ref2, pic, c}
\bsIF{c==Y}
\bsCODE{bit\_depth=\LumaDepth}
\bsELSE
\bsCODE{bit\_depth=\ChromaDepth}
\bsEND
\bsCODE{init\_dimensions(c)}{\ref{mcdimensions}}
\bsCODE{mc\_tmp=init\_temp\_array()}{\ref{mctemparray}}
\bsFOR{j=0}{\BlocksY-1}
\bsFOR{i=0}{\BlocksX-1}
\bsCODE{block\_mc(mc\_tmp,i,j,ref1,ref2,c)}{\ref{blockmc}}
\bsEND
\bsEND
\bsFOR{y=0}{\LenY-1}
\bsFOR{x=0}{\LenX-1}
\bsCODE{pic[y][x] += (mc\_tmp[y][x]+32)\gg 6}
\bsCODE{pic[y][x] = \clip(pic[y][x], -2^{bit\_depth-1}, 2^{bit\_depth-1}-1)}
\bsEND
\bsEND
\end{pseudo}

\begin{informative}
Six bits are used for the overlapped-block weighting matrix. This ensures that 10-bit
data may normally be motion compensated using 16-bit words as per Section \ref{blockmc}.
\end{informative}

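In Python terms, the final accumulate-and-clip step of $motion\_compensate()$ may be sketched as follows (informative; $pic$, $mc\_tmp$ and $bit\_depth$ are as in the pseudocode above, and Python's right shift rounds towards $-\infty$ as required):

\begin{verbatim}
def clip(x, a, b):
    return max(a, min(x, b))

def add_prediction(pic, mc_tmp, bit_depth):
    # Add the prediction, removing the 6 bits of spatial weighting with a
    # rounding shift, then clip to the signed range of the picture data.
    for y in range(len(pic)):
        for x in range(len(pic[0])):
            pic[y][x] += (mc_tmp[y][x] + 32) >> 6
            pic[y][x] = clip(pic[y][x],
                             -(1 << (bit_depth - 1)),
                             (1 << (bit_depth - 1)) - 1)
\end{verbatim}
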
\subsubsection{Dimensions}
\label{mcdimensions}
Since motion compensation shall apply to both luma and (potentially subsampled)
chroma data, for simplicity a number of variables are defined by the
$init\_dimensions()$ function, which is as follows:

\begin{pseudo}{init\_dimensions}{c}
\bsIF{c==Y}
\bsCODE{\LenX=\LumaWidth}
\bsCODE{\LenY=\LumaHeight}
\bsCODE{\XBlen=\LumaXBlen}
\bsCODE{\YBlen=\LumaYBlen}
\bsCODE{\XBsep=\LumaXBsep}
\bsCODE{\YBsep=\LumaYBsep}
\bsELSE
\bsCODE{\LenX=\ChromaWidth}
\bsCODE{\LenY=\ChromaHeight}
\bsCODE{\XBlen=\ChromaXBlen}
\bsCODE{\YBlen=\ChromaYBlen}
\bsCODE{\XBsep=\ChromaXBsep}
\bsCODE{\YBsep=\ChromaYBsep}
\bsEND
\bsCODE{\XOffset = (\XBlen-\XBsep)//2}
\bsCODE{\YOffset = (\YBlen-\YBsep)//2}
\end{pseudo}

\begin{informative}
The subband data that makes up the IWT coefficients is padded in order that the IWT
may function correctly. For simplicity, in this specification, padding data is removed
after the IWT has been performed, so that the picture data and reference data arrays have
the same dimensions for motion compensation. However, it may be more efficient to
perform all operations prior to the output of pictures using padded data, i.e. to discard
padding values subsequent to motion compensation. Such a course of action is equivalent,
so long as it is realised that blocks must be regarded as edge blocks if they overlap the
edges of the actual picture area, not those of the larger area produced by padding.
\end{informative}

\subsubsection{Initialising the motion compensated data array}
\label{mctemparray}

The $init\_temp\_array()$ function shall return a two-dimensional data array with
horizontal size $\LenX$ and vertical size $\LenY$, in which each element shall be set to zero.

\subsubsection{Motion compensation of a block}
\label{blockmc}

This section defines the $block\_mc()$ process for motion-compensating a single
block.

Each block shall be motion-compensated by applying a weighting matrix to a block prediction and adding the weighted prediction into the motion-compensated
prediction array.

The $block\_mc()$ process shall be defined as follows:

\begin{pseudo}{block\_mc}{mc\_pred,i,j,ref1,ref2,c}
\bsCODE{xstart = i*\XBsep-\XOffset}
\bsCODE{ystart = j*\YBsep-\YOffset}
\bsCODE{xstop = (i+1)*\XBsep+\XOffset}
\bsCODE{ystop = (j+1)*\YBsep+\YOffset}
\bsCODE{mode=\BlockData[j][i][\RMode]}
\bsCODE{W=spatial\_wt(i,j)}{\ref{mcspatialweights}}
\bsFOR{y=\max(ystart,0)}{\min(ystop,\LenY)-1}
\bsFOR{x=\max(xstart,0)}{\min(xstop,\LenX)-1}
\bsCODE{p=x-xstart}
\bsCODE{q=y-ystart}
\bsIF{mode==\Intra}
\bsCODE{val=\BlockData[j][i][dc][c]}
\bsELSEIF{mode==\RefOneOnly}
\bsCODE{val=pixel\_pred(ref1, 1, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val*=\RefOneWeight+\RefTwoWeight}
\bsCODE{val=(val+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsELSEIF{mode==\RefTwoOnly}
\bsCODE{val=pixel\_pred(ref2, 2, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val*=\RefOneWeight+\RefTwoWeight}
\bsCODE{val=(val+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsELSEIF{mode==\RefOneAndTwo}
\bsCODE{val1=pixel\_pred(ref1, 1, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val1*=\RefOneWeight}
\bsCODE{val2=pixel\_pred(ref2, 2, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val2*=\RefTwoWeight}
\bsCODE{val=(val1+val2+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsEND
\bsCODE{val *= W[q][p]}
\bsCODE{mc\_pred[y][x]+=val}
\bsEND
\bsEND
\end{pseudo}

\begin{informative}
Note that if the two reference weights are 1 and $\RefsWeightPrecision$ is 1, then
reference weighting is transparent and

\begin{pseudo*}
\bsCODE{val=pixel\_pred(ref1, 1, i, j, x, y, c)}
\bsCODE{val*=\RefOneWeight+\RefTwoWeight}
\bsCODE{val=(val+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsCODE{\ldots}
\end{pseudo*}

reduces to

\begin{pseudo*}
\bsCODE{val=pixel\_pred(ref1, 1, i, j, x, y, c)}
\bsCODE{\ldots}
\end{pseudo*}

In this case, therefore, the normal reference weighting produces no additional dynamic
range for internal processing, and 10-bit video can be motion compensated with 16-bit
internal values.

In general, however, the worst-case internal bit width is the video bit depth plus the maximum of 6 (the spatial matrix bit width) and the value of $\RefsWeightPrecision$. 6 bits
should be sufficient for most fading compensation applications, and so 16-bit internals will
suffice for all practical motion compensation scenarios for 8- and 10-bit video.
\end{informative}

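The dynamic-range claim can be spot-checked numerically (informative sketch, assuming reference weights of 1 and $\RefsWeightPrecision=1$):

\begin{verbatim}
bit_depth = 10
sample_max = (1 << (bit_depth - 1)) - 1      # 511, largest sample magnitude
val = sample_max * (1 + 1)                   # ref1_weight + ref2_weight
val = (val + 1) >> 1                         # RefsWeightPrecision == 1
val *= 8 * 8                                 # largest spatial weight, 6 bits
# Spatial weights of overlapping blocks sum to 64 at each pixel, so the
# accumulated mc_tmp value obeys the same bound.
assert val < (1 << 15)                       # fits in a 16-bit word
\end{verbatim}
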
\subsubsection{Spatial weighting matrix}
\label{mcspatialweights}

This section specifies the function $spatial\_wt(i,j)$ for deriving the 6-bit spatial weighting
matrix that shall be applied to the block with coordinates $(i,j)$.

Note that further weights shall be applied to the prediction as a result of the
weights applied to each reference.

The same weighting matrix shall be returned for all blocks within the interior
of the picture component array. Suitably modified weighting matrices shall
be returned for blocks at the edges of the picture component data array.

The function shall return a two-dimensional spatial weighting matrix. This
shall apply a linear roll-off in both horizontal and vertical directions.

The spatial matrix returned shall be the product of a horizontal and a vertical
weighting matrix. It shall be defined as follows:

\begin{pseudo}{spatial\_wt}{i,j}
\bsFOR{y=0}{\YBlen-1}
\bsFOR{x=0}{\XBlen-1}
\bsCODE{W[y][x]=h\_wt(i)[x]*v\_wt(j)[y]}
\bsEND
\bsEND
\bsRET{W}
\end{pseudo}

The horizontal weighting function shall be defined as follows:

\begin{pseudo}{h\_wt}{i}
\bsIF{\XOffset!=1}
\bsFOR{x=0}{2*\XOffset-1}
\bsCODE{hwt[x]=1+(6*x+\XOffset-1)//(2*\XOffset-1)}
\bsCODE{hwt[x+\XBsep]=8-hwt[x]}
\bsEND
\bsELSE
\bsCODE{hwt[0]=3}
\bsCODE{hwt[1]=5}
\bsCODE{hwt[\XBsep]=5}
\bsCODE{hwt[\XBsep+1]=3}
\bsEND
\bsFOR{x=2*\XOffset}{\XBsep-1}
\bsCODE{hwt[x]=8}
\bsEND
\bsIF{i==0}
\bsFOR{x=0}{2*\XOffset-1}
\bsCODE{hwt[x]=8}
\bsEND
\bsELSEIF{i==\BlocksX-1}
\bsFOR{x=0}{2*\XOffset-1}
\bsCODE{hwt[x+\XBsep]=8}
\bsEND
\bsEND
\bsRET{hwt}
\end{pseudo}

The vertical weighting function shall be defined as follows:

\begin{pseudo}{v\_wt}{j}
\bsIF{\YOffset!=1}
\bsFOR{y=0}{2*\YOffset-1}
\bsCODE{vwt[y]=1+(6*y+\YOffset-1)//(2*\YOffset-1)}
\bsCODE{vwt[y+\YBsep]=8-vwt[y]}
\bsEND
\bsELSE
\bsCODE{vwt[0]=3}
\bsCODE{vwt[1]=5}
\bsCODE{vwt[\YBsep]=5}
\bsCODE{vwt[\YBsep+1]=3}
\bsEND
\bsFOR{y=2*\YOffset}{\YBsep-1}
\bsCODE{vwt[y]=8}
\bsEND
\bsIF{j==0}
\bsFOR{y=0}{2*\YOffset-1}
\bsCODE{vwt[y]=8}
\bsEND
\bsELSEIF{j==\BlocksY-1}
\bsFOR{y=0}{2*\YOffset-1}
\bsCODE{vwt[y+\YBsep]=8}
\bsEND
\bsEND
\bsRET{vwt}
\end{pseudo}

\begin{informative}
The horizontal and vertical weighting arrays satisfy the perfect reconstruction property across block overlaps by construction:
\begin{eqnarray*}
hwt[x+\XBsep] & = & 8 - hwt[x] \\
vwt[y+\YBsep] & = & 8 - vwt[y]
\end{eqnarray*}

In addition, it can be shown that they are always symmetric (except at picture edges), or,
equivalently, that the leading edges have skew-symmetry about the half-way point:
\begin{eqnarray*}
hwt[\XBlen-1-x] & = & hwt[x] \\
vwt[\YBlen-1-y] & = & vwt[y]
\end{eqnarray*}

The horizontal and vertical weighting matrix components for various block
overlaps are shown in Table \ref{table:leadingedges}.
These encompass all the default values listed
in Table \ref{blockparamsvalues} for both luma and chroma.
\end{informative}
\begin{table}[!ht]
\centering
\begin{tabular}{|c|c|c|}
\hline
\rowcolor[gray]{0.75}\bf{Overlap} & \bf{Offset} & \bf{Leading edge} \\
\rowcolor[gray]{0.75}\bf{(length-separation)} & & \\
\hline
2 & 1 & 3,5\\
\hline
4 & 2 & 1,3,5,7\\
\hline
8 & 4 & 1,2,3,4,4,5,6,7\\
\hline
16 & 8 & 1,1,2,2,3,3,3,4,4,5,5,5,6,6,7,7 \\
\hline
\end{tabular}
\caption{Leading and trailing edge values for different block overlaps}
\label{table:leadingedges}
\end{table}

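The leading-edge values in Table \ref{table:leadingedges} follow directly from the ramp used in $h\_wt()$ and $v\_wt()$, as the following Python sketch reproduces (informative):

\begin{verbatim}
def leading_edge(offset):
    # The linear roll-off over the 2*offset overlap samples; the special
    # case offset == 1 is the [3, 5] edge from the pseudocode.
    if offset == 1:
        return [3, 5]
    return [1 + (6 * x + offset - 1) // (2 * offset - 1)
            for x in range(2 * offset)]

for offset in (1, 2, 4, 8):
    print(2 * offset, leading_edge(offset))
# 2  [3, 5]
# 4  [1, 3, 5, 7]
# 8  [1, 2, 3, 4, 4, 5, 6, 7]
# 16 [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7]
\end{verbatim}

The trailing edge values are obtained by subtracting the leading edge values from 8, giving the perfect reconstruction property noted above.
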
\begin{comment}
The profile of the matrix
for interior blocks is illustrated in Figure \ref{fig:weightprofile}.

\begin{figure}[!ht]
\centering
\includegraphics[width=0.7\textwidth]{figs/obmc-profile}
\caption{Profile of overlapped-block motion compensation matrix}
\label{fig:weightprofile}
\end{figure}

\begin{informative*}
\subsubsection{Reference weights and fade prediction (Informative)}

The reference prediction weights used for each prediction mode for
block prediction (Section \ref{blockmc}) may appear
confusing. It is helpful
to think of two cases for using reference picture weighting. The first is interpolative
prediction, where the picture being predicted is, for example, a cross-fade and is
closely approximated by some mixture of the reference pictures:
$P\backsimeq\delta R_1+(1-\delta)R_2$. Here the weights we would like to
use for each frame prediction add up to 1 (or $2^\RefsWeightPrecision$
for integer weights).
The second case is scaling prediction, where
the weights we would like to use for the frame predictions do not add up to 1: for example,
a fade to or from black,
$P\backsimeq\delta_1 R_1$ and $P\backsimeq\delta_2 R_2$. It is not possible to choose
weights for each prediction mode which will be optimal in both cases. The weighting
factors chosen will work with interpolative prediction (which is more common)
but are not perfect for scaling prediction. It would have been possible to create a variety of
prediction modes to cover all cases; however, the potential savings do not justify the
additional complexity.

For interpolative prediction, all data in the current picture will be of commensurate scale to
that of the references. In forming the bi-directional prediction, a value
$W_1 p_1 + W_2 p_2$ is
formed, so the prediction has ``scale'' $W_1+W_2$. $W_1+W_2$ is
therefore the weighting value used to scale unidirectional prediction, in order to provide
predictions of commensurate order. The unity weighting value
$2^\RefsWeightPrecision$ is used
for DC blocks as this gives the best prediction, and in the interpolative case
this equals $W_1+W_2$,
so all predictions are of the same order.

The weighting factors we would like to use for unidirectionally
predicted blocks in the scaling case
are $2W_1$ and $2W_2$: the factor 2 takes into account that
we are only adding in one prediction
value as against two for bidirectional prediction. These factors differ
from $W_1+W_2$, and hence
unidirectional prediction is incorrect when there are two references.
Note, however, that we can
still perform prediction with the correct scaling values when we
only have a single reference. Note
also that the value of $W_1+W_2$ was selected instead of
$2^\RefsWeightPrecision$, which
would be equivalent in the interpolative case, as it gives a
better approximation when the
weights do not sum to $2^\RefsWeightPrecision$.
\end{informative*}
\end{comment}

\subsubsection{Pixel prediction}
\label{pixelprediction}

This section defines the operation of the $pixel\_pred(ref, ref\_num, i, j, x, y, c)$
process, which shall be used for forming the prediction for a pixel
with coordinates $(x,y)$ in component $c$, belonging to the block with coordinates $(i,j)$.

The pixel prediction process shall consist of two stages. In the first stage, a motion vector
to be applied to pixel $(x,y)$ shall be derived. For block motion, this shall be a block
motion vector that shall apply to all pixels in a block. For global motion, the motion
vector shall be computed from the global motion parameters and may vary pixel-by-pixel.

In the second stage, the motion vector shall be used to derive coordinates in a reference picture. The process shall be defined as follows:

\begin{pseudo}{pixel\_pred}{ref,ref\_num,i,j,x,y,c}
\bsIF{\BlockData[j][i][\GMode]==\false}
\bsCODE{mv = \BlockData[j][i][\Vect][ref\_num]}
\bsELSE
\bsCODE{mv=global\_mv(ref\_num, x, y)}{\ref{globalmv}}
\bsEND
\bsIF{c!=Y}
\bsCODE{mv = chroma\_mv\_scale(mv)}{\ref{chromamvscale}}
\bsEND
\bsCODE{px = (x\ll \MotionVectorPrecision)+mv[0]}
\bsCODE{py = (y\ll \MotionVectorPrecision)+mv[1]}
\bsIF{\MotionVectorPrecision>0}
\bsRET{subpel\_predict(ref, c, px, py)}{\ref{upconvert}}
\bsELSE
\bsRET{ref[\clip(py,0,\height(ref)-1)][\clip(px,0,\width(ref)-1)]}
\bsEND
\end{pseudo}

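As an informative sketch, the coordinate arithmetic of $pixel\_pred()$ for the block-motion, luma-only path may be written in Python as follows; a $subpel\_predict()$ implementation following Section \ref{upconvert} is assumed to be available:

\begin{verbatim}
def clip(x, a, b):
    return max(a, min(x, b))

def pixel_pred(ref, block_mv, x, y, mv_precision):
    # Scale the pixel position into sub-pixel units, offset by the vector.
    px = (x << mv_precision) + block_mv[0]
    py = (y << mv_precision) + block_mv[1]
    if mv_precision > 0:
        return subpel_predict(ref, px, py)   # sub-pixel accurate path
    # Pixel-accurate vectors: edge extension by clipping the coordinates.
    return ref[clip(py, 0, len(ref) - 1)][clip(px, 0, len(ref[0]) - 1)]
\end{verbatim}
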
\subsubsection{Global motion vector field generation}
\label{globalmv}

This section specifies the operation of the $global\_mv(ref\_num, x,y)$ process
for deriving a global motion vector for a pixel at location $(x,y)$.

The function shall be defined as follows:

\begin{pseudo}{global\_mv}{ref\_num, x,y}
\bsCODE{ez = \GlobalParams[ref\_num][\ZRSexponent]}
\bsCODE{ep = \GlobalParams[ref\_num][\PerspectiveExponent]}
\bsCODE{b=\GlobalParams[ref\_num][\PanTilt]}
\bsCODE{A=\GlobalParams[ref\_num][\ZRS]}
\bsCODE{c=\GlobalParams[ref\_num][\Perspective]}
\bsCODE{m=2^{ep}-(c[0]*x+c[1]*y)}
\bsCODE{v[0]=m*((A[0][0]*x+A[0][1]*y)+2^{ez}*b[0])}
\bsCODE{v[1]=m*((A[1][0]*x+A[1][1]*y)+2^{ez}*b[1])}
\bsCODE{v[0] = (v[0]+(1\ll(ez+ep)))\gg (ez+ep)}
\bsCODE{v[1] = (v[1]+(1\ll(ez+ep)))\gg (ez+ep)}
\bsRET{v}
\end{pseudo}

\begin{informative}
Write ${\bf x}=\left( \begin{array}{c} x\\y \end{array}\right)$.
Mathematically, we wish the global motion vector ${\bf v}$ to be defined by:
\[{\bf v}=\dfrac{{\bf Ax}+{\bf b}}{1+{\bf c}^T{\bf x}}\]
where: ${\bf A}$ is a matrix describing the degree of zoom, rotation or shear; ${\bf b}$
is a translation vector; and ${\bf c}$ is a perspective vector which expresses the
degree to which the global motion is not orthogonal to the axis of view.

In Dirac, this formula is adjusted in two ways in order to get an implementable result.
Firstly, the perspective element is adjusted to remove a division, changing the
formula to:
\[{\bf v}=(1-{\bf c}^T{\bf x})({\bf Ax}+{\bf b})\]
which is valid for small ${\bf c}$. Secondly, the formula is re-cast in terms of integer
arithmetic by giving the matrix element an accuracy factor $\alpha$ and the perspective
element an accuracy factor $\beta$:
\[{\bf v}=(1-2^{-\beta}{\bf c}^T{\bf x})(2^{-\alpha}{\bf Ax}+{\bf b})\]
where the parameters ${\bf A}, {\bf b},{\bf c}$ are now integral. (No accuracy bits are required for the translation, since it must be an integral number of sub-pixels.)

This reduces to
\[2^{\alpha+\beta}{\bf v}=(2^\beta-{\bf c}^T{\bf x})({\bf Ax}+2^\alpha{\bf b})\]
and this formula is used for the computation of values.
\end{informative}

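This last identity holds exactly in integer arithmetic before the final rounding shift, as the following Python check illustrates (informative; the parameter values are illustrative only, and only the first vector component is shown):

\begin{verbatim}
from fractions import Fraction

alpha, beta = 3, 16                 # illustrative accuracy factors
A = [[1, 0], [0, 1]]                # integer A (scaled by 2**alpha)
b = [4, -2]                         # translation, in sub-pixel units
c = [1, 0]                          # integer c (scaled by 2**beta)
x, y = 100, 50

# Integer form: 2**(alpha+beta) * v = (2**beta - c.x) * (A x + 2**alpha b)
m = (1 << beta) - (c[0] * x + c[1] * y)
vx = m * ((A[0][0] * x + A[0][1] * y) + (1 << alpha) * b[0])

# Exact value of (1 - 2**-beta c.x) * (2**-alpha A x + b)
exact = ((1 - Fraction(c[0] * x + c[1] * y, 1 << beta))
         * (Fraction(A[0][0] * x + A[0][1] * y, 1 << alpha) + b[0]))
assert Fraction(vx, 1 << (alpha + beta)) == exact
\end{verbatim}
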
\subsubsection{Chroma subsampling}
\label{chromamvscale}

When motion compensating chroma components, motion vectors shall be scaled by the
$chroma\_mv\_scale()$ function. This produces chroma vectors in units of
$\MotionVectorPrecision$ with respect to the chroma samples, as follows:

\begin{pseudo}{chroma\_mv\_scale}{v}
\bsCODE{sv[0] = v[0]//chroma\_h\_ratio()}{\ref{picturedimensions}}
\bsCODE{sv[1] = v[1]//chroma\_v\_ratio()}{\ref{picturedimensions}}
\bsRET{sv}
\end{pseudo}

\begin{informative}
Recall that division in this specification rounds towards $-\infty$. This division can therefore be achieved by an arithmetic bit-shift in C/C++, as chroma dimension ratios are 1 or 2.
\end{informative}

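For example (informative), for a chroma ratio of 2, floor division agrees with an arithmetic right shift on negative vector components:

\begin{verbatim}
assert -3 // 2 == -2       # rounds towards -infinity, as specified
assert -3 >> 1 == -2       # an arithmetic shift gives the same result
# C-style truncating division would give -1 instead.
\end{verbatim}
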
\subsubsection{Sub-pixel prediction}
\label{upconvert}

This section defines the operation of the $subpel\_predict(ref, c, u, v)$ function
for producing a sub-pixel accurate value at location $(u,v)$ from an upconverted picture reference component of type $c$ (Y, C1 or C2).

Upconversion shall be defined by means of a half-pixel interpolated reference array
$upref$. $upref$ shall have dimensions $(2W-1)\times(2H-1)$, where the original reference
picture component array has dimensions $W\times H$, as per Section \ref{halfpel}.

Motion vectors shall be permitted to extend beyond the edges of the reference picture data,
where values lying outside shall be determined by edge extension.

If $\MotionVectorPrecision==1$, upconverted values shall be derived directly from the
half-pixel interpolated array $upref$, which shall be calculated as per Section \ref{halfpel}.

If $\MotionVectorPrecision==2$ or $\MotionVectorPrecision==3$, upconverted values shall be
derived by linear interpolation from the half-pixel interpolated array.

The sub-pixel prediction process shall be defined as follows:

\begin{pseudo}{subpel\_predict}{ref,c,u,v}
\bsCODE{upref=interp2by2(ref,c)}{\ref{halfpel}}
\bsCODE{hu = u \gg (\MotionVectorPrecision-1)}
\bsCODE{hv = v \gg (\MotionVectorPrecision-1)}
\bsCODE{ru = u-(hu\ll (\MotionVectorPrecision-1))}
\bsCODE{rv = v-(hv\ll (\MotionVectorPrecision-1))}
\bsCODE{w00 = (2^{\MotionVectorPrecision-1}-rv)*(2^{\MotionVectorPrecision-1}-ru)}
\bsCODE{w01 = (2^{\MotionVectorPrecision-1}-rv)*ru}
\bsCODE{w10 = rv*(2^{\MotionVectorPrecision-1}-ru)}
\bsCODE{w11 = rv*ru}
\bsCODE{xpos = \clip(hu, 0, \width(upref)-1)}
\bsCODE{xpos1 = \clip(hu+1, 0, \width(upref)-1)}
\bsCODE{ypos = \clip(hv, 0, \height(upref)-1)}
\bsCODE{ypos1 = \clip(hv+1, 0, \height(upref)-1)}
\bsCODE{\begin{array}{ll} val = & w00*upref[ypos][xpos]+w01*upref[ypos][xpos1]+ \\
 & w10*upref[ypos1][xpos]+w11*upref[ypos1][xpos1]
\end{array}}
\bsIF{\MotionVectorPrecision>1}
\bsRET{(val+2^{2*\MotionVectorPrecision-3})\gg(2*\MotionVectorPrecision-2)}
\bsELSE
\bsRET{val}
\bsEND
\end{pseudo}

\begin{informative}
$hu$ and $hv$ represent the half-pixel part of the sub-pixel position $(u,v)$.

$ru$ and $rv$ represent the remaining sub-pixel component of the position.
$ru$ and $rv$ satisfy \[0\leq ru,rv <2^{\MotionVectorPrecision-1}\]

The four weights $w00,w01,w10$ and $w11$ sum to $2^{2*\MotionVectorPrecision-2}$, and
hence the upconverted value is returned to the initial pixel range in the pseudocode
above.

Note that the remainder values $ru$ and $rv$, and hence the four weight values,
depend only on the motion vectors. This is because
$u$ and $v$ have been computed by scaling the picture coordinates by
$2^{\MotionVectorPrecision}$ and adding the motion vector.

In particular, constant linear interpolation weights are applied throughout a
block when block motion is used. Likewise, the necessity of clipping the ranges of
$xpos$, $ypos$ etc.\ can be determined in advance for each block by checking whether any
corner of the reference block will fall outside the reference picture area. In most
cases it will not, and clipping will not be required.

For half-pixel motion vectors ($\MotionVectorPrecision$ is 1), the majority of the
pseudocode is redundant, and the return value $val$ will merely be the value at
position $(u,v)$, clipped to the range of the upconverted reference.
\end{informative}

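The weight construction can be isolated in a small Python sketch (informative), confirming that the four weights always sum to $2^{2*\MotionVectorPrecision-2}$:

\begin{verbatim}
def subpel_weights(u, v, mv_precision):
    # Split (u, v) into half-pixel coordinates and sub-half-pixel
    # remainders, then form the four bilinear weights used by
    # subpel_predict().
    half = mv_precision - 1
    hu, hv = u >> half, v >> half
    ru, rv = u - (hu << half), v - (hv << half)
    s = 1 << half
    w = [(s - rv) * (s - ru), (s - rv) * ru, rv * (s - ru), rv * ru]
    assert sum(w) == 1 << (2 * mv_precision - 2)
    return (hu, hv), w
\end{verbatim}
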
\subsubsection{Half-pixel interpolation}
\label{halfpel}

This section defines the $interp2by2(ref,c)$ process for generating
an upconverted reference array $upref$ representing a half-pixel interpolation of
the reference array $ref$ for component $c$ (Y, C1, or C2).

$upref$ shall be created in two stages. The first stage shall upconvert vertically. The second stage shall upconvert horizontally.

$upref$ shall have width $2*\width(ref)-1$ and height $2*\height(ref)-1$, so that all
edge values shall be copied from the original array and not interpolated.

The interpolation filter shall be the 8-tap symmetric filter with taps as defined in Figure \ref{upfilter}.

\begin{figure}[h!]
\begin{centering}
\begin{tabular}{l|cccc}
Tap & $t[0]$ & $t[1]$ & $t[2]$ & $t[3]$\\
\hline
Value & 21 & -7 & 3 & -1
\end{tabular}
\caption{Interpolation filter coefficients \label{upfilter}}
\end{centering}
\end{figure}

Where samples used in the filtering process fall outside the bounds of the
reference array, values shall be supplied by edge extension.

The overall process shall be defined as follows:

\begin{pseudo}{interp2by2}{ref,c}
\bsIF{c==Y}
\bsCODE{bit\_depth=\LumaDepth}
\bsELSE
\bsCODE{bit\_depth=\ChromaDepth}
\bsEND
\bsFOR{q=0}{2*\height(ref)-2}
\bsIF{q\%2==0}
\bsFOR{p=0}{\width(ref)-1}
\bsCODE{ref2[q][p]=ref[q//2][p]}
\bsEND
\bsELSE
\bsFOR{p=0}{\width(ref)-1}
\bsCODE{ref2[q][p]=16}
\bsFOR{i=0}{3}
\bsCODE{ypos=(q-1)//2-i}
\bsCODE{ref2[q][p]+=t[i]*ref[\clip(ypos,0,\height(ref)-1)][p]}
\bsCODE{ypos=(q+1)//2+i}
\bsCODE{ref2[q][p]+=t[i]*ref[\clip(ypos,0,\height(ref)-1)][p]}
\bsEND
\bsCODE{ref2[q][p] \gg=5}
\bsCODE{ref2[q][p] = \clip(ref2[q][p], -2^{bit\_depth-1}, 2^{bit\_depth-1}-1)}
\bsEND
\bsEND
\bsEND
\bsFOR{q=0}{2*\height(ref)-2}
\bsFOR{p=0}{2*\width(ref)-2}
\bsIF{p\%2==0}
\bsCODE{upref[q][p]=ref2[q][p//2]}
\bsELSE
\bsCODE{upref[q][p]=16}
\bsFOR{i=0}{3}
\bsCODE{xpos=(p-1)//2-i}
\bsCODE{upref[q][p]+=t[i]*ref2[q][\clip(xpos,0,\width(ref)-1)]}
\bsCODE{xpos=(p+1)//2+i}
\bsCODE{upref[q][p]+=t[i]*ref2[q][\clip(xpos,0,\width(ref)-1)]}
\bsEND
\bsCODE{upref[q][p] \gg=5}
\bsCODE{upref[q][p] = \clip(upref[q][p], -2^{bit\_depth-1}, 2^{bit\_depth-1}-1)}
\bsEND
\bsEND
\bsEND
\end{pseudo}

\begin{informative}
While this filter may appear to be separable, the integer rounding and
clipping operations prevent this from being so. Note also that the clipping process for
filtering terms implies that the upconversion uses edge extension at the array
edges, consistent with the edge extension used in motion compensation itself.
\end{informative}

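A single 1-D pass of this interpolation may be sketched in Python as follows (informative; the per-sample bit-depth clipping of the pseudocode is noted in a comment but omitted from the sketch). Applying this pass first down each column and then along each row of the result corresponds to the two stages of $interp2by2()$, except that the intermediate rounding and clipping prevent exact separability, as noted above.

\begin{verbatim}
TAPS = (21, -7, 3, -1)   # half of the 8-tap symmetric filter; gain 32

def interp_half(row):
    # One 1-D half-pel pass: copy existing samples to even positions and
    # filter the odd (half-pel) positions, with edge extension and a
    # rounded 5-bit normalising shift. The specification additionally
    # clips each filtered value to the signed sample range.
    n = len(row)
    at = lambda i: row[max(0, min(i, n - 1))]    # edge extension
    out = []
    for p in range(n - 1):
        out.append(row[p])
        acc = 16                                 # rounding offset
        for i, t in enumerate(TAPS):
            acc += t * (at(p - i) + at(p + 1 + i))
        out.append(acc >> 5)
    out.append(row[-1])
    return out                                   # length 2*n - 1
\end{verbatim}
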