vidsys.tex

\label{vidsys}

\begin{informative*}
\subsection{Colour models}
All current video systems use a $Y, C1, C2$ form of coding for RGB source
values. Although $Y, C_B, C_R$ is widely used, Dirac can support other colour
systems such as $Y, C_O, C_G$ as defined by ITU-T H.264 AVC annex E. For this
reason the non-luma components are generalized to the terms $C1$ and $C2$.

The R, G and B values are tristimulus values (e.g. candelas/m$^2$). Their
relationship to CIE XYZ tristimulus values can be derived from the set
of primaries and white point defined in the colour primaries part of the
colour specification below using the method described in SMPTE RP
177-1993. In this document the RGB values are normalised to the range
$[0,1]$, so that RGB=$[1,1,1]$ represents the peak white of the display device
and RGB=$[0,0,0]$ represents black.

The $E_R$, $E_G$ and $E_B$ values are related to the linear RGB
values by non-linear transfer functions.
Normally, $E_R$, $E_G$ and $E_B$ also fall in the range $[0,1]$, but in the
case of extended gamut systems (such as ITU-R BT.1361), negative values can also
occur. The non-linear transfer function is typically applied in the camera and
is specified in the transfer characteristic part of the appropriate colour
specification. For aesthetic and psychovisual reasons
the encoding transfer function is not always the inverse of
the decoding transfer function. In fact the combined effect of the
encoding and decoding transfer functions is such that the rendering intent or
end-to-end gamma of the system can vary between about 1.1 and 1.6 depending on
viewing conditions. The rationale for this is given in ``Digital Video and
HDTV'' by Charles Poynton (2003, Morgan Kaufmann Publishers, ISBN 1-55860-792-7).

The non-linear $E_R$, $E_G$ and $E_B$ values are subject to a matrix operation
(known as `non-constant luminance coding'), which transforms
them into luma ($E_Y$) and colour difference (normally $E_{Cb}$ and $E_{Cr}$) values.
$E_Y$ is normally limited to the range $[0,1]$ and the colour difference
values to the range $[-0.5, 0.5]$. In this specification, the colour difference
components are referred to as `chroma' components and are not to be confused
with the chroma signals used by composite television systems, where the colour
difference signals are significantly reduced in both resolution and signal
amplitude. The chroma components used in this specification can be sub-sampled
horizontally, vertically, or both.

\subsubsection{$YC_BC_R$ coding}

The $E_Y$, $E_{Cb}$ and $E_{Cr}$ values are
mapped to a range of integers denoted $Y$, $C_B$ and $C_R$, typically $[0,255]$.
In order to display video, the inverse of the above
operations must be performed to convert this data to $E_Y$, $E_{Cb}$, $E_{Cr}$,
then to $E_R$, $E_G$, $E_B$ and thence to R, G and B.

\subsubsection{$YC_OC_G$ coding}

In the case of $YC_OC_G$ coding, the $E_R$, $E_G$ and $E_B$ values are
linearly scaled directly to integer ranges before a lossless
integer transform is applied to convert this data to $Y$, $C_O$ and
$C_G$ data.

\subsubsection{Signal range}
\label{signalranges}

The output of the Dirac decoder consists of unsigned integer values. For
$YC_BC_R$ coding, the offset and excursion values are used to linearly scale these
values into intermediate values $E_Y$, $E_{Cb}$, and $E_{Cr}$.
$E_Y$ is normally clipped to the range $[0,1]$ and $E_{Cb}$, $E_{Cr}$
to the range $[-0.5,0.5]$. The effect is to clip integer $Y$ values output by
the decoder to the interval
\[ [\SLumaOffset, \SLumaOffset+\SLumaExcursion] \]
and $C1$, $C2$ values to
\[ [\SChromaOffset-\SChromaExcursion/2,\SChromaOffset+\SChromaExcursion/2] \]

However, maintaining an extended RGB gamut can mean that either such
clipping is not done, or non-standard offset and excursion values are
used to extract the extended gamut from the non-negative $Y$, $C1$,
and $C2$ values.

In the case of $YC_OC_G$ coding, $E_Y$, $E_{Co}$, and $E_{Cg}$ should not be
calculated. Instead, direct integer conversion to RGB should be done
(note that excursion values are ignored in this integer conversion).
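
As an illustration of the $YC_BC_R$ scaling described above, the mapping can be
written out explicitly. The formulas below are inferred from the clipping
intervals given in this section rather than quoted from the normative text,
and the numerical values used afterwards (luma offset 16 and excursion 219,
chroma offset 128 and excursion 224) are assumed 8-bit values chosen only for
the example.
\begin{eqnarray*}
E_Y & = & \dfrac{Y-\SLumaOffset}{\SLumaExcursion} \\
E_{Cb} & = & \dfrac{C1-\SChromaOffset}{\SChromaExcursion} \\
E_{Cr} & = & \dfrac{C2-\SChromaOffset}{\SChromaExcursion}
\end{eqnarray*}
With the assumed values, $Y=16$ gives $E_Y=0$, $Y=235$ gives
$E_Y=(235-16)/219=1$, and $C1=240$ gives $E_{Cb}=(240-128)/224=0.5$, so the
clipping intervals are $[16,235]$ for $Y$ and $[16,240]$ for $C1$ and $C2$.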

\subsubsection{Primaries}
\label{primaries}
The colour primaries allow device-dependent linear RGB colour
co-ordinates to be mapped to device-independent linear CIE XYZ space.
The primaries specified are the CIE (1931) XYZ chromaticity
co-ordinates of the primaries and the white point of the device.

The colour primary specification therefore allows exact colour reproduction of
decoded RGB values on different displays
with different display primaries.

\subsubsection{Colour matrix}
\label{matrix}
\paragraph{$YC_BC_R$ coding}
$\ $\newline
Unit-scale luma and chroma values $E_Y$, $E_{Cb}$ and $E_{Cr}$ should be
derived from decoded $Y$, $C1$ and $C2$ values using the signal range parameters
as per Section \ref{signalranges}. Given these values, $E_R$, $E_G$ and $E_B$ are
determined as follows:
\begin{eqnarray*}
E_R & = & E_Y + 2*(1-K_R)*E_{Cr} \\
E_G & = & E_Y - \dfrac{2*K_R*(1-K_R)*E_{Cr}}{K_G}-\dfrac{2*K_B*(1-K_B)*E_{Cb}}{K_G} \\
E_B & = & E_Y + 2*(1-K_B)*E_{Cb}
\end{eqnarray*}
where $K_G=1-K_R-K_B$.
This follows by inverting the equations
\begin{eqnarray*}
K_R+K_G+K_B & = & 1 \\
E_Y & = & K_R*E_R+K_G*E_G+K_B*E_B \\
E_{Cb} & = & \dfrac{E_B - E_Y}{2*(1-K_B)} \\
E_{Cr} & = & \dfrac{E_R - E_Y}{2*(1-K_R)}
\end{eqnarray*}
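
As a worked illustration, the definitions of $E_Y$, $E_{Cb}$ and $E_{Cr}$
above can be inverted for one particular set of coefficients. The values
$K_R=0.2126$ and $K_B=0.0722$ (those of ITU-R BT.709, giving $K_G=0.7152$) are
assumed here purely for the example; the applicable coefficients are those
signalled for the sequence. Rounded to four decimal places:
\begin{eqnarray*}
E_R & = & E_Y + 1.5748*E_{Cr} \\
E_G & = & E_Y - 0.4681*E_{Cr} - 0.1873*E_{Cb} \\
E_B & = & E_Y + 1.8556*E_{Cb}
\end{eqnarray*}
For instance, $E_Y=0.5$, $E_{Cb}=0$ and $E_{Cr}=0.25$ give
$E_R \approx 0.8937$, $E_G \approx 0.3830$ and $E_B=0.5$.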

\paragraph{$YC_OC_G$ coding}
$\ $\newline
In the case of $YC_OC_G$ coding, integer $I_R$, $I_G$, $I_B$ values should be computed directly from
the decoded $Y$, $C1$ ($C_O$) and $C2$ ($C_G$) values by
\begin{eqnarray*}
Y & -= & \SLumaOffset \\
Co=C1 & -= & \SChromaOffset \\
Cg=C2 & -= & \SChromaOffset \\
t & = & Y-(Cg\gg1) \\
I_G & = & t+Cg \\
I_B & = & t-(Co\gg1) \\
I_R & = & I_B+Co
\end{eqnarray*}
The integer values are converted to unit-scale $E_R$, $E_G$, $E_B$ by dividing by
$2^\LumaDepth$ and clipping to $[0,1]$.
If the forward transform (the inverse of the operations above) has been correctly
applied prior to coding and lossless coding employed, then clipping will
be unnecessary, and reversing the above operations will reproduce $Y$,
$C_O$ and $C_G$ losslessly from $I_R$, $I_G$ and $I_B$, yielding a transparent
RGB to RGB coding system:
\begin{eqnarray*}
Co & = & I_R-I_B \\
t & = & I_B+((I_R-I_B)\gg1) \approx (I_R+I_B)/2\\
Cg & = & I_G-t \approx I_G-(I_R+I_B)/2\\
Y & = & t+(Cg\gg1) \approx I_G/2-(I_R+I_B)/4+(I_R+I_B)/2=I_R/4+I_G/2+I_B/4
\end{eqnarray*}

Note that these matrix operations imply that the chroma data requires an
additional bit, due to the subtractions used to create the chroma components.
So for 8-bit RGB ($I_R$, $I_G$, $I_B$) values, $Y$ will be 8 bits and $C_O$ and
$C_G$ will be 9 bits.
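
The reversibility of the integer transform can be checked with a small
example. The sample values are arbitrary, the offsets are taken as already
removed, and $\gg$ is assumed to be an arithmetic right shift (rounding
towards minus infinity); all three are assumptions of this sketch. Starting
from 8-bit values $(I_R, I_G, I_B)=(200,100,50)$, the forward transform gives
\begin{eqnarray*}
Co & = & 200-50 = 150 \\
t & = & 50+(150\gg1) = 50+75 = 125 \\
Cg & = & 100-125 = -25 \\
Y & = & 125+(-25\gg1) = 125-13 = 112
\end{eqnarray*}
and the decoder-side operations recover the original values:
\begin{eqnarray*}
t & = & 112-(-25\gg1) = 112+13 = 125 \\
I_G & = & 125+(-25) = 100 \\
I_B & = & 125-(150\gg1) = 125-75 = 50 \\
I_R & = & 50+150 = 200
\end{eqnarray*}
Here $Y=112$ fits in 8 bits, whereas $Co=150$ lies outside the signed 8-bit
range $[-128,127]$, which illustrates why the chroma components need a ninth bit.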


\subsection{Transfer characteristics}
\subsubsection{TV transfer characteristic}

ITU-R BT.601-6 defines the 625-line and 525-line standard definition systems
with an assumed receiver display gamma value of 2.8. SMPTE 170M defines the NTSC
SDTV system with an assumed receiver display gamma value of 2.2.

High definition systems, both 50 Hz and 60 Hz based, use an encoding
gamma value of 0.45 with a linear portion at the low end of the scale to avoid
the need for infinite gain at the receiver. This gamma value is defined by
ITU-R BT.709.
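
For reference, the encoding characteristic usually quoted from ITU-R BT.709
has the form sketched below, where $L$ is the linear value and $V$ the
non-linear signal, both in $[0,1]$. The symbols $L$ and $V$ are introduced
here only for this illustration, and the constants are reproduced from common
statements of the standard rather than from this specification, so the
standard itself should be consulted for the normative definition.
\begin{eqnarray*}
V & = & 4.5*L \qquad \mbox{for } 0 \le L < 0.018 \\
V & = & 1.099*L^{0.45}-0.099 \qquad \mbox{for } 0.018 \le L \le 1
\end{eqnarray*}
The two segments meet at $L=0.018$, where $V \approx 0.081$. The linear
segment limits the slope near black to 4.5, whereas a pure power law with
exponent 0.45 would have unbounded slope at $L=0$.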

\subsubsection{Extended Colour Gamut}

ITU-R BT.1361 (Worldwide Unified Colorimetry of Future TV Systems) defines a
colour system with an extended colour gamut. Refer to ITU-R BT.1361 (1998)
for details.

IEC 61966-2-2 (Extended RGB Color Space) defines another colour system with
an extended colour gamut. Refer to IEC 61966-2-2:2003 for details.

In both cases, it should be noted that use of the full range of $Y, C1, C2$
values can create negative R, G or B values. The original colour gamut equations
were designed around the CRT (cathode ray tube) device. Some flat panel
displays are capable of displaying a wider colour gamut, resulting in the desire
to extend the colour gamut to maximize the impact of these displays.

\paragraph{Linear}
$\ $\newline
A linear transfer characteristic has $f(x)=x$, i.e.\ $E_X=X$.

\subsection{Frame rate}
The ratio of the frame rate values $\SFrameRateNumer$ and $\SFrameRateDenom$
encodes the intended rate at which frames should be
displayed subsequent to decoding. If $\SSourceSampling$ is 1 (interlaced
sampling), then fields are displayed at double the frame rate, in the order
specified by the $\STopFieldFirst$ flag.
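
As an illustration, assume the values $\SFrameRateNumer=30000$ and
$\SFrameRateDenom=1001$ (chosen for the example, not mandated by this
section). The resulting display rates are:
\begin{eqnarray*}
\mbox{frame rate} & = & 30000/1001 \approx 29.97 \mbox{ frames per second} \\
\mbox{field rate} & = & 2*30000/1001 \approx 59.94 \mbox{ fields per second}
\end{eqnarray*}
The field rate applies only when $\SSourceSampling$ is 1.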

\subsection{Aspect ratios and clean area}

\subsubsection{Pixel aspect ratio}

The pixel aspect ratio of an image is the ratio of the intended spacing of
horizontal samples (pixels) to the spacing of vertical samples (picture lines)
on the display device. Pixel aspect ratios are fundamental properties of
sampled images because they determine the displayed shape of objects in the
image. Failure to use the correct pixel aspect ratio will result
in distorted images in which, for example, circles are displayed as ellipses.

Most HDTV standards and computer image formats are defined to have pixel aspect
ratios that are exactly 1:1.

For a number NH of pixels per unit width and NV pixels per unit height, this
ratio is 1/NH : 1/NV or NV : NH. For a video standard of $W \times H$ pixels displayed
at 4:3 picture aspect ratio, NH=W/4 and NV=H/3.

\paragraph{Using non-square pixel aspect ratios}
$\ $\newline
The defined pixel aspect ratios are designed to give the correct image aspect ratios for
standard definition television operating with a standard 4:3 picture aspect
ratio.

For 525-line video, defining a $704 \times 480$ picture with a 4:3 aspect ratio results
in a H:V pixel aspect ratio of 10:11 (i.e. 480/3 : 704/4).

For 625-line video, defining a $704 \times 576$ picture with a 4:3 aspect ratio results
in a H:V pixel aspect ratio of 12:11 (i.e. 576/3 : 704/4).

If the intended image aspect ratio is 16:9, then the H:V pixel aspect ratios
change accordingly to 40:33 for 525-line video and 16:11 for 625-line video.

The values specified above are widely, but not universally, agreed to be the
correct values. Differences of viewpoint arise from how much of the available
horizontal picture size of 720 luma ($Y$) pixels is intended for display.

You are strongly advised to use one of the default pixel aspect ratios. However,
if the default values are unsuitable, the codec
allows you to define your own ratio. Be aware that many display
devices could ignore a custom ratio and default to different,
unsuitable values.
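
The ratios quoted in this section follow from the relation NV : NH given
earlier. A short worked check, assuming 704 displayed pixels per line in
every case and reducing each ratio to lowest terms:
\begin{eqnarray*}
\mbox{525-line, 4:3} & \quad & 480/3 : 704/4 = 160 : 176 = 10 : 11 \\
\mbox{625-line, 4:3} & \quad & 576/3 : 704/4 = 192 : 176 = 12 : 11 \\
\mbox{525-line, 16:9} & \quad & 480/9 : 704/16 = 480 : 396 = 40 : 33 \\
\mbox{625-line, 16:9} & \quad & 576/9 : 704/16 = 64 : 44 = 16 : 11
\end{eqnarray*}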

\subsubsection{Clean area}

The clean area is intended to define an area within which picture information is
subjectively uncontaminated by edge distortions and by unintended
picture content such as microphones appearing at the top of the picture. It
could be appropriate to display the clean area rather than the whole picture,
which can contain edge distortions or unintended content.

The top-left corner of the clean area has coordinates
\[(\SLeftOffset,\STopOffset)\]
counting from the top-left corner of the picture data, and
dimensions $\SCleanWidth$ by $\SCleanHeight$.

Note that these dimensions refer to pixels within a picture, not a frame,
so a change from interlaced to progressive picture coding will
necessitate a change of clean area if a custom clean area is used.

The clean area and the pixel aspect ratio together determine the
aspect ratio of the displayed image, which is the ratio of the width of the
intended display area to the height of the intended display area:
\[\dfrac{\SCleanWidth*\SAspectRatioNumer}{\SCleanHeight*\SAspectRatioDenom}\]

Given two separate sequences with identical image aspect ratio, if the
top-left and bottom-right corners of their clean areas are
coincident when displayed, then the images as a whole should be exactly
coincident. This holds regardless of the actual pixel dimensions of the
images or their clean areas, and allows sequences to be combined
appropriately provided they are suitably scaled.
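
As an illustration of the displayed aspect ratio formula in this section,
assume frame pictures with a clean area of $704 \times 576$ pixels and a pixel
aspect ratio of 12:11 (values chosen only for the example):
\[\dfrac{704*12}{576*11}=\dfrac{8448}{6336}=\dfrac{4}{3}\]
so the clean area is displayed as a 4:3 image. The same result is obtained
for an assumed $704 \times 480$ clean area with a pixel aspect ratio of 10:11:
\[\dfrac{704*10}{480*11}=\dfrac{7040}{5280}=\dfrac{4}{3}\]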

\end{informative*}