\label{motioncompensate}

This section defines the operation of the process
$motion\_compensate(ref1, ref2, pic, c)$ for motion-compensating a
picture component array $pic$ of type $c=Y, U$ or $V$ from reference
component arrays $ref1$ and $ref2$ of the same type.

This process shall be invoked for each component in a picture, subsequent to the
decoding of coefficient data, specified in Section \ref{transformdec}, and the Inverse Wavelet Transform (IWT), specified in Section \ref{idwt}.

Motion compensation shall use the motion block data $\BlockData$ and may optionally use the
global motion parameters $\GlobalParams$.

\begin{informative*}
\subsubsection{Overlapped Block Motion Compensation (OBMC) (Informative)}

Motion-compensated prediction determines a prediction
for each pixel in the current picture by using motion vectors to
define offsets from that pixel to pixels in previously decoded
pictures. Motion compensation techniques vary in how pixels are grouped
together, and in how a prediction is formed for the pixels in a given group. In
conventional block motion compensation, as used in MPEG-2, H.264 and many other
codecs, the picture is divided into {\em disjoint} rectangular blocks, and the
motion vector or vectors associated with each block define the offset(s) into
the reference pictures.

In OBMC, by contrast, the predicted picture is divided into a regular array of overlapping
blocks of dimensions $xblen$ by $yblen$ that cover at least the entire picture
area, as shown in Figure \ref{fig:blockcoverage}. Overlap is ensured by starting
each block at a horizontal separation $xbsep$ and a vertical separation $ybsep$
from its neighbours, where these values are less than the corresponding block dimensions.
\end{informative*}

\begin{figure}[!ht]
\centering
\includegraphics[width=0.7\textwidth]{figs/block-coverage}
\caption{Block coverage of the predicted picture}
\label{fig:blockcoverage}
\end{figure}

\begin{informative*}
The overlap between blocks is $xoffset=(xblen - xbsep)/2$ horizontally, both on the left
and on the right, and $yoffset=(yblen - ybsep)/2$ vertically, both on the top and the
bottom. As a result, pixels in the overlapping areas lie in more than
one block, and so more than one motion vector set (and set of associated predictions)
applies to them. Indeed, a pixel may have up to eight predictions, as it may belong to
up to four blocks, each of which may have up to two motion vectors. These are combined
into a single prediction by using weights, which are constructed so as to sum to 1. In
the Dirac integer implementation, fractional weights are achieved by requiring that the weights
sum to a power of 2, which is shifted out once all contributions have been summed.

In Dirac, blocks are positioned so that they overspill the left and top edges by
$xoffset$ and $yoffset$ pixels. The number of blocks has been
determined (Section \ref{motiondatadimensions}) so that the picture area is wholly
covered, and the overspill
on the right-hand and bottom edges is at least as large as that on the left and top edges.
Indeed, the number of blocks has been set so that the blocks divide into whole superblocks
(sets of $4\times 4$ blocks), which means that some blocks may fall entirely outside the picture area.
Any predictions for pixels outside the actual picture area are discarded.
\end{informative*}

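This geometry can be illustrated with a short Python sketch (informative). The picture and block dimensions below are illustrative assumptions rather than defaults from this specification, and the block counts are simply rounded up rather than being superblock-aligned; the sketch checks that every pixel is covered by at least one and at most four blocks.

\begin{verbatim}
import math

# Illustrative block parameters (assumed, not normative defaults)
xblen, xbsep, yblen, ybsep = 12, 8, 12, 8
xoffset = (xblen - xbsep) // 2          # overlap on left and right
yoffset = (yblen - ybsep) // 2          # overlap on top and bottom
width, height = 64, 48                  # illustrative picture size
blocks_x = math.ceil(width / xbsep)     # enough blocks to cover the picture
blocks_y = math.ceil(height / ybsep)

covered = [[0] * width for _ in range(height)]
for j in range(blocks_y):
    for i in range(blocks_x):
        # Block (i, j) spans [i*xbsep - xoffset, (i+1)*xbsep + xoffset)
        for y in range(max(j * ybsep - yoffset, 0),
                       min((j + 1) * ybsep + yoffset, height)):
            for x in range(max(i * xbsep - xoffset, 0),
                           min((i + 1) * xbsep + xoffset, width)):
                covered[y][x] += 1

counts = [c for row in covered for c in row]
assert min(counts) >= 1 and max(counts) <= 4   # full coverage, <= 4 blocks
\end{verbatim}
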
\subsubsection{Overall motion compensation process}
\label{mcprocess}

The motion compensation process shall form an integer prediction for each pixel in
the predicted picture component $pic$; this prediction shall be added to the pixel value, and
the result shall then be clipped to keep it in range.

The $motion\_compensate()$ process is defined by means of a temporary data
array $mc\_tmp$ for storing the motion-compensated prediction for the
current picture.

The $motion\_compensate()$ process shall be defined as follows:

\begin{pseudo}{motion\_compensate}{ref1, ref2, pic, c}
\bsIF{c==Y}
\bsCODE{bit\_depth=\LumaDepth}
\bsELSE
\bsCODE{bit\_depth=\ChromaDepth}
\bsEND
\bsCODE{init\_dimensions(c)}{\ref{mcdimensions}}
\bsCODE{mc\_tmp=init\_temp\_array()}{\ref{mctemparray}}
\bsFOR{j=0}{\BlocksY-1}
\bsFOR{i=0}{\BlocksX-1}
\bsCODE{block\_mc(mc\_tmp,i,j,ref1,ref2,c)}{\ref{blockmc}}
\bsEND
\bsEND
\bsFOR{y=0}{\LenY-1}
\bsFOR{x=0}{\LenX-1}
\bsCODE{pic[y][x] += (mc\_tmp[y][x]+32)\gg 6}
\bsCODE{pic[y][x] = \clip(pic[y][x], -2^{bit\_depth-1}, 2^{bit\_depth-1}-1)}
\bsEND
\bsEND
\end{pseudo}

\begin{informative}
Six bits are used for the overlapped-block weighting matrix. This ensures that 10-bit
data may normally be motion compensated using 16-bit words as per Section \ref{blockmc}.
\end{informative}

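In Python terms, the final accumulate-and-clip step of $motion\_compensate()$ may be sketched as follows (informative; $pic$, $mc\_tmp$ and $bit\_depth$ are as in the pseudocode above, and Python's right shift rounds towards $-\infty$ as required):

\begin{verbatim}
def clip(x, a, b):
    return max(a, min(x, b))

def add_prediction(pic, mc_tmp, bit_depth):
    # Add the prediction, removing the 6 bits of spatial weighting with a
    # rounding shift, then clip to the signed range of the picture data.
    for y in range(len(pic)):
        for x in range(len(pic[0])):
            pic[y][x] += (mc_tmp[y][x] + 32) >> 6
            pic[y][x] = clip(pic[y][x],
                             -(1 << (bit_depth - 1)),
                             (1 << (bit_depth - 1)) - 1)
\end{verbatim}
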
\subsubsection{Dimensions}
\label{mcdimensions}
Since motion compensation shall apply to both luma and (potentially subsampled)
chroma data, for simplicity a number of variables are defined by the
$init\_dimensions()$ function, which is as follows:

\begin{pseudo}{init\_dimensions}{c}
\bsIF{c==Y}
\bsCODE{\LenX=\LumaWidth}
\bsCODE{\LenY=\LumaHeight}
\bsCODE{\XBlen=\LumaXBlen}
\bsCODE{\YBlen=\LumaYBlen}
\bsCODE{\XBsep=\LumaXBsep}
\bsCODE{\YBsep=\LumaYBsep}
\bsELSE
\bsCODE{\LenX=\ChromaWidth}
\bsCODE{\LenY=\ChromaHeight}
\bsCODE{\XBlen=\ChromaXBlen}
\bsCODE{\YBlen=\ChromaYBlen}
\bsCODE{\XBsep=\ChromaXBsep}
\bsCODE{\YBsep=\ChromaYBsep}
\bsEND
\bsCODE{\XOffset = (\XBlen-\XBsep)//2}
\bsCODE{\YOffset = (\YBlen-\YBsep)//2}
\end{pseudo}

\begin{informative}
The subband data that makes up the IWT coefficients is padded in order that the IWT
may function correctly. For simplicity, in this specification, padding data is removed
after the IWT has been performed, so that the picture data and reference data arrays have
the same dimensions for motion compensation. However, it may be more efficient to
perform all operations prior to the output of pictures using padded data, i.e. to discard
padding values subsequent to motion compensation. Such a course of action is equivalent,
so long as it is realised that blocks must be regarded as edge blocks if they overlap the
edges of the actual picture area, not those of the larger area produced by padding.
\end{informative}

\subsubsection{Initialising the motion compensated data array}
\label{mctemparray}

The $init\_temp\_array()$ function shall return a two-dimensional data array with
horizontal size $\LenX$ and vertical size $\LenY$, in which each element shall be set to zero.

\subsubsection{Motion compensation of a block}
\label{blockmc}

This section defines the $block\_mc()$ process for motion-compensating a single
block.

Each block shall be motion-compensated by applying a weighting matrix to a block prediction and adding the weighted prediction into the motion-compensated
prediction array.

The $block\_mc()$ process shall be defined as follows:

\begin{pseudo}{block\_mc}{mc\_pred,i,j,ref1,ref2,c}
\bsCODE{xstart = i*\XBsep-\XOffset}
\bsCODE{ystart = j*\YBsep-\YOffset}
\bsCODE{xstop = (i+1)*\XBsep+\XOffset}
\bsCODE{ystop = (j+1)*\YBsep+\YOffset}
\bsCODE{mode=\BlockData[j][i][\RMode]}
\bsCODE{W=spatial\_wt(i,j)}{\ref{mcspatialweights}}
\bsFOR{y=\max(ystart,0)}{\min(ystop,\LenY)-1}
\bsFOR{x=\max(xstart,0)}{\min(xstop,\LenX)-1}
\bsCODE{p=x-xstart}
\bsCODE{q=y-ystart}
\bsIF{mode==\Intra}
\bsCODE{val=\BlockData[j][i][dc][c]}
\bsELSEIF{mode==\RefOneOnly}
\bsCODE{val=pixel\_pred(ref1, 1, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val*=\RefOneWeight+\RefTwoWeight}
\bsCODE{val=(val+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsELSEIF{mode==\RefTwoOnly}
\bsCODE{val=pixel\_pred(ref2, 2, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val*=\RefOneWeight+\RefTwoWeight}
\bsCODE{val=(val+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsELSEIF{mode==\RefOneAndTwo}
\bsCODE{val1=pixel\_pred(ref1, 1, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val1*=\RefOneWeight}
\bsCODE{val2=pixel\_pred(ref2, 2, i, j, x, y, c)}{\ref{pixelprediction}}
\bsCODE{val2*=\RefTwoWeight}
\bsCODE{val=(val1+val2+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsEND
\bsCODE{val *= W[q][p]}
\bsCODE{mc\_pred[y][x]+=val}
\bsEND
\bsEND
\end{pseudo}

\begin{informative}
Note that if the two reference weights are 1 and $\RefsWeightPrecision$ is 1, then
reference weighting is transparent and

\begin{pseudo*}
\bsCODE{val=pixel\_pred(ref1, 1, i, j, x, y, c)}
\bsCODE{val*=\RefOneWeight+\RefTwoWeight}
\bsCODE{val=(val+2^{\RefsWeightPrecision-1})\gg\RefsWeightPrecision}
\bsCODE{\ldots}
\end{pseudo*}

reduces to

\begin{pseudo*}
\bsCODE{val=pixel\_pred(ref1, 1, i, j, x, y, c)}
\bsCODE{\ldots}
\end{pseudo*}

In this case, therefore, the normal reference weighting produces no additional dynamic
range for internal processing, and 10-bit video can be motion compensated with 16-bit
internal values.

In general, however, the worst-case internal bit width is the video bit depth plus the maximum of 6 (the spatial matrix bit width) and the value of $\RefsWeightPrecision$. 6 bits
should be sufficient for most fading compensation applications, and so 16-bit internals will
suffice for all practical motion compensation scenarios for 8- and 10-bit video.
\end{informative}

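The dynamic-range claim can be spot-checked numerically (informative sketch, assuming reference weights of 1 and $\RefsWeightPrecision=1$):

\begin{verbatim}
bit_depth = 10
sample_max = (1 << (bit_depth - 1)) - 1      # 511, largest sample magnitude
val = sample_max * (1 + 1)                   # ref1_weight + ref2_weight
val = (val + 1) >> 1                         # RefsWeightPrecision == 1
val *= 8 * 8                                 # largest spatial weight, 6 bits
# Spatial weights of overlapping blocks sum to 64 at each pixel, so the
# accumulated mc_tmp value obeys the same bound.
assert val < (1 << 15)                       # fits in a 16-bit word
\end{verbatim}
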
\subsubsection{Spatial weighting matrix}
\label{mcspatialweights}

This section specifies the function $spatial\_wt(i,j)$ for deriving the 6-bit spatial weighting
matrix that shall be applied to the block with coordinates $(i,j)$.

Note that further weights shall be applied to the prediction as a result of the
weights applied to each reference.

The same weighting matrix shall be returned for all blocks within the interior
of the picture component array. Suitably modified weighting matrices shall
be returned for blocks at the edges of the picture component data array.

The function shall return a two-dimensional spatial weighting matrix. This
shall apply a linear roll-off in both horizontal and vertical directions.

The spatial matrix returned shall be the product of a horizontal and a vertical
weighting matrix. It shall be defined as follows:

\begin{pseudo}{spatial\_wt}{i,j}
\bsFOR{y=0}{\YBlen-1}
\bsFOR{x=0}{\XBlen-1}
\bsCODE{W[y][x]=h\_wt(i)[x]*v\_wt(j)[y]}
\bsEND
\bsEND
\bsRET{W}
\end{pseudo}

The horizontal weighting function shall be defined as follows:

\begin{pseudo}{h\_wt}{i}
\bsIF{\XOffset!=1}
\bsFOR{x=0}{2*\XOffset-1}
\bsCODE{hwt[x]=1+(6*x+\XOffset-1)//(2*\XOffset-1)}
\bsCODE{hwt[x+\XBsep]=8-hwt[x]}
\bsEND
\bsELSE
\bsCODE{hwt[0]=3}
\bsCODE{hwt[1]=5}
\bsCODE{hwt[\XBsep]=5}
\bsCODE{hwt[\XBsep+1]=3}
\bsEND
\bsFOR{x=2*\XOffset}{\XBsep-1}
\bsCODE{hwt[x]=8}
\bsEND
\bsIF{i==0}
\bsFOR{x=0}{2*\XOffset-1}
\bsCODE{hwt[x]=8}
\bsEND
\bsELSEIF{i==\BlocksX-1}
\bsFOR{x=0}{2*\XOffset-1}
\bsCODE{hwt[x+\XBsep]=8}
\bsEND
\bsEND
\bsRET{hwt}
\end{pseudo}

The vertical weighting function shall be defined as follows:

\begin{pseudo}{v\_wt}{j}
\bsIF{\YOffset!=1}
\bsFOR{y=0}{2*\YOffset-1}
\bsCODE{vwt[y]=1+(6*y+\YOffset-1)//(2*\YOffset-1)}
\bsCODE{vwt[y+\YBsep]=8-vwt[y]}
\bsEND
\bsELSE
\bsCODE{vwt[0]=3}
\bsCODE{vwt[1]=5}
\bsCODE{vwt[\YBsep]=5}
\bsCODE{vwt[\YBsep+1]=3}
\bsEND
\bsFOR{y=2*\YOffset}{\YBsep-1}
\bsCODE{vwt[y]=8}
\bsEND
\bsIF{j==0}
\bsFOR{y=0}{2*\YOffset-1}
\bsCODE{vwt[y]=8}
\bsEND
\bsELSEIF{j==\BlocksY-1}
\bsFOR{y=0}{2*\YOffset-1}
\bsCODE{vwt[y+\YBsep]=8}
\bsEND
\bsEND
\bsRET{vwt}
\end{pseudo}

\begin{informative}
The horizontal and vertical weighting arrays satisfy the perfect reconstruction property across block overlaps by construction:
\begin{eqnarray*}
hwt[x+\XBsep] & = & 8 - hwt[x] \\
vwt[y+\YBsep] & = & 8 - vwt[y]
\end{eqnarray*}

In addition, it can be shown that they are always symmetric (except at picture edges), or,
equivalently, that the leading edges have skew-symmetry about the half-way point:
\begin{eqnarray*}
hwt[\XBlen-1-x] & = & hwt[x] \\
vwt[\YBlen-1-y] & = & vwt[y]
\end{eqnarray*}

The horizontal and vertical weighting matrix components for various block
overlaps are shown in Table \ref{table:leadingedges}.
These encompass all the default values listed
in Table \ref{blockparamsvalues} for both luma and chroma.
\end{informative}
\begin{table}[!ht]
\centering
\begin{tabular}{|c|c|c|}
\hline
\rowcolor[gray]{0.75}\bf{Overlap} & \bf{Offset} & \bf{Leading edge} \\
\rowcolor[gray]{0.75}\bf{(length-separation)} & & \\
\hline
2 & 1 & 3,5\\
\hline
4 & 2 & 1,3,5,7\\
\hline
8 & 4 & 1,2,3,4,4,5,6,7\\
\hline
16 & 8 & 1,1,2,2,3,3,3,4,4,5,5,5,6,6,7,7 \\
\hline
\end{tabular}
\caption{Leading and trailing edge values for different block overlaps}
\label{table:leadingedges}
\end{table}

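The leading-edge values in Table \ref{table:leadingedges} follow directly from the ramp used in $h\_wt()$ and $v\_wt()$, as the following Python sketch reproduces (informative):

\begin{verbatim}
def leading_edge(offset):
    # The linear roll-off over the 2*offset overlap samples; the special
    # case offset == 1 is the [3, 5] edge from the pseudocode.
    if offset == 1:
        return [3, 5]
    return [1 + (6 * x + offset - 1) // (2 * offset - 1)
            for x in range(2 * offset)]

for offset in (1, 2, 4, 8):
    print(2 * offset, leading_edge(offset))
# 2  [3, 5]
# 4  [1, 3, 5, 7]
# 8  [1, 2, 3, 4, 4, 5, 6, 7]
# 16 [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7]
\end{verbatim}

The trailing edge values are obtained by subtracting the leading edge values from 8, giving the perfect reconstruction property noted above.
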
\begin{comment}
The profile of the matrix
for interior blocks is illustrated in Figure \ref{fig:weightprofile}.

\begin{figure}[!ht]
\centering
\includegraphics[width=0.7\textwidth]{figs/obmc-profile}
\caption{Profile of overlapped-block motion compensation matrix}
\label{fig:weightprofile}
\end{figure}

\begin{informative*}
\subsubsection{Reference weights and fade prediction (Informative)}

The reference prediction weights used for each prediction mode for
block prediction (Section \ref{blockmc}) may appear
confusing. It is helpful
to think of two cases for using reference picture weighting. The first is interpolative
prediction, where the picture being predicted is, for example, a cross-fade and is
closely approximated by some mixture of the reference pictures:
$P\backsimeq\delta R_1+(1-\delta)R_2$. Here the weights we would like to
use for each frame prediction add up to 1 (or $2^\RefsWeightPrecision$
for integer weights).
The second case is scaling prediction, where
the weights we would like to use for the frame predictions do not add up to 1: for example,
a fade to or from black,
$P\backsimeq\delta_1 R_1$ and $P\backsimeq\delta_2 R_2$. It is not possible to choose
weights for each prediction mode which will be optimal in both cases. The weighting
factors chosen will work with interpolative prediction (which is more common)
but are not perfect for scaling prediction. It would have been possible to create a variety of
prediction modes to cover all cases; however, the potential savings do not justify the
additional complexity.

For interpolative prediction, all data in the current picture will be of commensurate scale to
that of the references. In forming the bi-directional prediction, a value
$W_1 p_1 + W_2 p_2$ is
formed, so the prediction has ``scale'' $W_1+W_2$. $W_1+W_2$ is
therefore the weighting value used to scale unidirectional prediction, in order to provide
predictions of commensurate order. The unity weighting value
$2^\RefsWeightPrecision$ is used
for DC blocks as this gives the best prediction, and in the interpolative case
this equals $W_1+W_2$,
so all predictions are of the same order.

The weighting factors we would like to use for unidirectionally
predicted blocks in the scaling case
are $2W_1$ and $2W_2$: the factor 2 takes into account that
we are only adding in one prediction
value as against two for bidirectional prediction. These factors differ
from $W_1+W_2$, and hence
unidirectional prediction is incorrect when there are two references.
Note, however, that we can
still perform prediction with the correct scaling values when we
only have a single reference. Note
also that the value of $W_1+W_2$ was selected instead of
$2^\RefsWeightPrecision$, which
would be equivalent in the interpolative case, as it gives a
better approximation when the
weights do not sum to $2^\RefsWeightPrecision$.
\end{informative*}
\end{comment}

\subsubsection{Pixel prediction}
\label{pixelprediction}

This section defines the operation of the $pixel\_pred(ref, ref\_num, i, j, x, y, c)$
process, which shall be used for forming the prediction for a pixel
with coordinates $(x,y)$ in component $c$, belonging to the block with coordinates $(i,j)$.

The pixel prediction process shall consist of two stages. In the first stage, a motion vector
to be applied to pixel $(x,y)$ shall be derived. For block motion, this shall be a block
motion vector that shall apply to all pixels in a block. For global motion, the motion
vector shall be computed from the global motion parameters and may vary pixel-by-pixel.

In the second stage, the motion vector shall be used to derive coordinates in a reference picture. The process shall be defined as follows:

\begin{pseudo}{pixel\_pred}{ref,ref\_num,i,j,x,y,c}
\bsIF{\BlockData[j][i][\GMode]==\false}
\bsCODE{mv = \BlockData[j][i][\Vect][ref\_num]}
\bsELSE
\bsCODE{mv=global\_mv(ref\_num, x, y)}{\ref{globalmv}}
\bsEND
\bsIF{c!=Y}
\bsCODE{mv = chroma\_mv\_scale(mv)}{\ref{chromamvscale}}
\bsEND
\bsCODE{px = (x\ll \MotionVectorPrecision)+mv[0]}
\bsCODE{py = (y\ll \MotionVectorPrecision)+mv[1]}
\bsIF{\MotionVectorPrecision>0}
\bsRET{subpel\_predict(ref, c, px, py)}{\ref{upconvert}}
\bsELSE
\bsRET{ref[\clip(py,0,\height(ref)-1)][\clip(px,0,\width(ref)-1)]}
\bsEND
\end{pseudo}

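As an informative sketch, the coordinate arithmetic of $pixel\_pred()$ for the block-motion, luma-only path may be written in Python as follows; a $subpel\_predict()$ implementation following Section \ref{upconvert} is assumed to be available:

\begin{verbatim}
def clip(x, a, b):
    return max(a, min(x, b))

def pixel_pred(ref, block_mv, x, y, mv_precision):
    # Scale the pixel position into sub-pixel units, offset by the vector.
    px = (x << mv_precision) + block_mv[0]
    py = (y << mv_precision) + block_mv[1]
    if mv_precision > 0:
        return subpel_predict(ref, px, py)   # sub-pixel accurate path
    # Pixel-accurate vectors: edge extension by clipping the coordinates.
    return ref[clip(py, 0, len(ref) - 1)][clip(px, 0, len(ref[0]) - 1)]
\end{verbatim}
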
\subsubsection{Global motion vector field generation}
\label{globalmv}

This section specifies the operation of the $global\_mv(ref\_num, x,y)$ process
for deriving a global motion vector for a pixel at location $(x,y)$.

The function shall be defined as follows:

\begin{pseudo}{global\_mv}{ref\_num, x,y}
\bsCODE{ez = \GlobalParams[ref\_num][\ZRSexponent]}
\bsCODE{ep = \GlobalParams[ref\_num][\PerspectiveExponent]}
\bsCODE{b=\GlobalParams[ref\_num][\PanTilt]}
\bsCODE{A=\GlobalParams[ref\_num][\ZRS]}
\bsCODE{c=\GlobalParams[ref\_num][\Perspective]}
\bsCODE{m=2^{ep}-(c[0]*x+c[1]*y)}
\bsCODE{v[0]=m*((A[0][0]*x+A[0][1]*y)+2^{ez}*b[0])}
\bsCODE{v[1]=m*((A[1][0]*x+A[1][1]*y)+2^{ez}*b[1])}
\bsCODE{v[0] = (v[0]+(1\ll(ez+ep)))\gg (ez+ep)}
\bsCODE{v[1] = (v[1]+(1\ll(ez+ep)))\gg (ez+ep)}
\bsRET{v}
\end{pseudo}

\begin{informative}
Write ${\bf x}=\left( \begin{array}{c} x\\y \end{array}\right)$.
Mathematically, we wish the global motion vector ${\bf v}$ to be defined by:
\[{\bf v}=\dfrac{{\bf Ax}+{\bf b}}{1+{\bf c}^T{\bf x}}\]
where: ${\bf A}$ is a matrix describing the degree of zoom, rotation or shear; ${\bf b}$
is a translation vector; and ${\bf c}$ is a perspective vector which expresses the
degree to which the global motion is not orthogonal to the axis of view.

In Dirac, this formula is adjusted in two ways in order to get an implementable result.
Firstly, the perspective element is adjusted to remove a division, changing the
formula to:
\[{\bf v}=(1-{\bf c}^T{\bf x})({\bf Ax}+{\bf b})\]
which is valid for small ${\bf c}$. Secondly, the formula is re-cast in terms of integer
arithmetic by giving the matrix element an accuracy factor $\alpha$ and the perspective
element an accuracy factor $\beta$:
\[{\bf v}=(1-2^{-\beta}{\bf c}^T{\bf x})(2^{-\alpha}{\bf Ax}+{\bf b})\]
where the parameters ${\bf A}, {\bf b},{\bf c}$ are now integral. (No accuracy bits are required for the translation, since it must be an integral number of sub-pixels.)

This reduces to
\[2^{\alpha+\beta}{\bf v}=(2^\beta-{\bf c}^T{\bf x})({\bf Ax}+2^\alpha{\bf b})\]
and this formula is used for the computation of values.
\end{informative}

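This last identity holds exactly in integer arithmetic before the final rounding shift, as the following Python check illustrates (informative; the parameter values are illustrative only, and only the first vector component is shown):

\begin{verbatim}
from fractions import Fraction

alpha, beta = 3, 16                 # illustrative accuracy factors
A = [[1, 0], [0, 1]]                # integer A (scaled by 2**alpha)
b = [4, -2]                         # translation, in sub-pixel units
c = [1, 0]                          # integer c (scaled by 2**beta)
x, y = 100, 50

# Integer form: 2**(alpha+beta) * v = (2**beta - c.x) * (A x + 2**alpha b)
m = (1 << beta) - (c[0] * x + c[1] * y)
vx = m * ((A[0][0] * x + A[0][1] * y) + (1 << alpha) * b[0])

# Exact value of (1 - 2**-beta c.x) * (2**-alpha A x + b)
exact = ((1 - Fraction(c[0] * x + c[1] * y, 1 << beta))
         * (Fraction(A[0][0] * x + A[0][1] * y, 1 << alpha) + b[0]))
assert Fraction(vx, 1 << (alpha + beta)) == exact
\end{verbatim}
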
\subsubsection{Chroma subsampling}
\label{chromamvscale}

When motion compensating chroma components, motion vectors shall be scaled by the
$chroma\_mv\_scale()$ function. This produces chroma vectors in units of
$\MotionVectorPrecision$ with respect to the chroma samples, as follows:

\begin{pseudo}{chroma\_mv\_scale}{v}
\bsCODE{sv[0] = v[0]//chroma\_h\_ratio()}{\ref{picturedimensions}}
\bsCODE{sv[1] = v[1]//chroma\_v\_ratio()}{\ref{picturedimensions}}
\bsRET{sv}
\end{pseudo}

\begin{informative}
Recall that division in this specification rounds towards $-\infty$. This division can therefore be achieved by an arithmetic bit-shift in C/C++, as chroma dimension ratios are 1 or 2.
\end{informative}

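For example (informative), for a chroma ratio of 2, floor division agrees with an arithmetic right shift on negative vector components:

\begin{verbatim}
assert -3 // 2 == -2       # rounds towards -infinity, as specified
assert -3 >> 1 == -2       # an arithmetic shift gives the same result
# C-style truncating division would give -1 instead.
\end{verbatim}
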
\subsubsection{Sub-pixel prediction}
\label{upconvert}

This section defines the operation of the $subpel\_predict(ref, c, u, v)$ function
for producing a sub-pixel accurate value at location $(u,v)$ from an upconverted picture reference component of type $c$ (Y, C1 or C2).

Upconversion shall be defined by means of a half-pixel interpolated reference array
$upref$. $upref$ shall have dimensions $(2W-1)\times(2H-1)$, where the original reference
picture component array has dimensions $W\times H$, as per Section \ref{halfpel}.

Motion vectors shall be permitted to extend beyond the edges of the reference picture data,
where values lying outside shall be determined by edge extension.

If $\MotionVectorPrecision==1$, upconverted values shall be derived directly from the
half-pixel interpolated array $upref$, which shall be calculated as per Section \ref{halfpel}.

If $\MotionVectorPrecision==2$ or $\MotionVectorPrecision==3$, upconverted values shall be
derived by linear interpolation from the half-pixel interpolated array.

The sub-pixel prediction process shall be defined as follows:

\begin{pseudo}{subpel\_predict}{ref,c,u,v}
\bsCODE{upref=interp2by2(ref,c)}{\ref{halfpel}}
\bsCODE{hu = u \gg (\MotionVectorPrecision-1)}
\bsCODE{hv = v \gg (\MotionVectorPrecision-1)}
\bsCODE{ru = u-(hu\ll (\MotionVectorPrecision-1))}
\bsCODE{rv = v-(hv\ll (\MotionVectorPrecision-1))}
\bsCODE{w00 = (2^{\MotionVectorPrecision-1}-rv)*(2^{\MotionVectorPrecision-1}-ru)}
\bsCODE{w01 = (2^{\MotionVectorPrecision-1}-rv)*ru}
\bsCODE{w10 = rv*(2^{\MotionVectorPrecision-1}-ru)}
\bsCODE{w11 = rv*ru}
\bsCODE{xpos = \clip(hu, 0, \width(upref)-1)}
\bsCODE{xpos1 = \clip(hu+1, 0, \width(upref)-1)}
\bsCODE{ypos = \clip(hv, 0, \height(upref)-1)}
\bsCODE{ypos1 = \clip(hv+1, 0, \height(upref)-1)}
\bsCODE{\begin{array}{ll} val = & w00*upref[ypos][xpos]+w01*upref[ypos][xpos1]+ \\
 & w10*upref[ypos1][xpos]+w11*upref[ypos1][xpos1]
\end{array}}
\bsIF{\MotionVectorPrecision>1}
\bsRET{(val+2^{2*\MotionVectorPrecision-3})\gg(2*\MotionVectorPrecision-2)}
\bsELSE
\bsRET{val}
\bsEND
\end{pseudo}

\begin{informative}
$hu$ and $hv$ represent the half-pixel part of the sub-pixel position $(u,v)$.

$ru$ and $rv$ represent the remaining sub-pixel component of the position.
$ru$ and $rv$ satisfy \[0\leq ru,rv <2^{\MotionVectorPrecision-1}\]

The four weights $w00,w01,w10$ and $w11$ sum to $2^{2*\MotionVectorPrecision-2}$, and
hence the upconverted value is returned to the initial pixel range in the pseudocode
above.

Note that the remainder values $ru$ and $rv$, and hence the four weight values,
depend only on the motion vectors. This is because
$u$ and $v$ have been computed by scaling the picture coordinates by
$2^{\MotionVectorPrecision}$ and adding the motion vector.

In particular, constant linear interpolation weights are applied throughout a
block when block motion is used. Likewise, the necessity of clipping the ranges of
$xpos$, $ypos$ etc.\ can be determined in advance for each block by checking whether any
corner of the reference block will fall outside the reference picture area. In most
cases it will not, and clipping will not be required.

For half-pixel motion vectors ($\MotionVectorPrecision$ is 1), the majority of the
pseudocode is redundant, and the return value $val$ will merely be the value at
position $(u,v)$, clipped to the range of the upconverted reference.
\end{informative}

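The weight construction can be isolated in a small Python sketch (informative), confirming that the four weights always sum to $2^{2*\MotionVectorPrecision-2}$:

\begin{verbatim}
def subpel_weights(u, v, mv_precision):
    # Split (u, v) into half-pixel coordinates and sub-half-pixel
    # remainders, then form the four bilinear weights used by
    # subpel_predict().
    half = mv_precision - 1
    hu, hv = u >> half, v >> half
    ru, rv = u - (hu << half), v - (hv << half)
    s = 1 << half
    w = [(s - rv) * (s - ru), (s - rv) * ru, rv * (s - ru), rv * ru]
    assert sum(w) == 1 << (2 * mv_precision - 2)
    return (hu, hv), w
\end{verbatim}
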
\subsubsection{Half-pixel interpolation}
\label{halfpel}

This section defines the $interp2by2(ref,c)$ process for generating
an upconverted reference array $upref$ representing a half-pixel interpolation of
the reference array $ref$ for component $c$ (Y, C1, or C2).

$upref$ shall be created in two stages. The first stage shall upconvert vertically. The second stage shall upconvert horizontally.

$upref$ shall have width $2*\width(ref)-1$ and height $2*\height(ref)-1$, so that all
edge values shall be copied from the original array and not interpolated.

The interpolation filter shall be the 8-tap symmetric filter with taps as defined in Figure \ref{upfilter}.

\begin{figure}[h!]
\begin{centering}
\begin{tabular}{l|cccc}
Tap & $t[0]$ & $t[1]$ & $t[2]$ & $t[3]$\\
\hline
Value & 21 & -7 & 3 & -1
\end{tabular}
\caption{Interpolation filter coefficients \label{upfilter}}
\end{centering}
\end{figure}

Where samples used in the filtering process fall outside the bounds of the
reference array, values shall be supplied by edge extension.

The overall process shall be defined as follows:

\begin{pseudo}{interp2by2}{ref,c}
\bsIF{c==Y}
\bsCODE{bit\_depth=\LumaDepth}
\bsELSE
\bsCODE{bit\_depth=\ChromaDepth}
\bsEND
\bsFOR{q=0}{2*\height(ref)-2}
\bsIF{q\%2==0}
\bsFOR{p=0}{\width(ref)-1}
\bsCODE{ref2[q][p]=ref[q//2][p]}
\bsEND
\bsELSE
\bsFOR{p=0}{\width(ref)-1}
\bsCODE{ref2[q][p]=16}
\bsFOR{i=0}{3}
\bsCODE{ypos=(q-1)//2-i}
\bsCODE{ref2[q][p]+=t[i]*ref[\clip(ypos,0,\height(ref)-1)][p]}
\bsCODE{ypos=(q+1)//2+i}
\bsCODE{ref2[q][p]+=t[i]*ref[\clip(ypos,0,\height(ref)-1)][p]}
\bsEND
\bsCODE{ref2[q][p] \gg=5}
\bsCODE{ref2[q][p] = \clip(ref2[q][p], -2^{bit\_depth-1}, 2^{bit\_depth-1}-1)}
\bsEND
\bsEND
\bsEND
\bsFOR{q=0}{2*\height(ref)-2}
\bsFOR{p=0}{2*\width(ref)-2}
\bsIF{p\%2==0}
\bsCODE{upref[q][p]=ref2[q][p//2]}
\bsELSE
\bsCODE{upref[q][p]=16}
\bsFOR{i=0}{3}
\bsCODE{xpos=(p-1)//2-i}
\bsCODE{upref[q][p]+=t[i]*ref2[q][\clip(xpos,0,\width(ref)-1)]}
\bsCODE{xpos=(p+1)//2+i}
\bsCODE{upref[q][p]+=t[i]*ref2[q][\clip(xpos,0,\width(ref)-1)]}
\bsEND
\bsCODE{upref[q][p] \gg=5}
\bsCODE{upref[q][p] = \clip(upref[q][p], -2^{bit\_depth-1}, 2^{bit\_depth-1}-1)}
\bsEND
\bsEND
\bsEND
\end{pseudo}

\begin{informative}
While this filter may appear to be separable, the integer rounding and
clipping operations prevent this from being so. Note also that the clipping process for
filtering terms implies that the upconversion uses edge extension at the array
edges, consistent with the edge extension used in motion compensation itself.
\end{informative}

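A single 1-D pass of this interpolation may be sketched in Python as follows (informative; the per-sample bit-depth clipping of the pseudocode is noted in a comment but omitted from the sketch). Applying this pass first down each column and then along each row of the result corresponds to the two stages of $interp2by2()$, except that the intermediate rounding and clipping prevent exact separability, as noted above.

\begin{verbatim}
TAPS = (21, -7, 3, -1)   # half of the 8-tap symmetric filter; gain 32

def interp_half(row):
    # One 1-D half-pel pass: copy existing samples to even positions and
    # filter the odd (half-pel) positions, with edge extension and a
    # rounded 5-bit normalising shift. The specification additionally
    # clips each filtered value to the signed sample range.
    n = len(row)
    at = lambda i: row[max(0, min(i, n - 1))]    # edge extension
    out = []
    for p in range(n - 1):
        out.append(row[p])
        acc = 16                                 # rounding offset
        for i, t in enumerate(TAPS):
            acc += t * (at(p - i) + at(p + 1 + i))
        out.append(acc >> 5)
    out.append(row[-1])
    return out                                   # length 2*n - 1
\end{verbatim}
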