\label{vidsys}

\begin{informative*}
\subsection{Colour models}
All current video systems use a $Y, C1, C2$ form of coding for RGB source
values. Although $Y, C_B, C_R$ is widely used, Dirac can support other colour
systems such as $Y, C_O, C_G$ as defined by ITU-T H.264 AVC annex E. For this
reason the non-luma components are generalized to the terms $C1$ and $C2$.

The R, G and B are tristimulus values (e.g. candelas/$m^2$). Their
relationship to CIE XYZ tristimulus values can be derived from the set
of primaries and white point defined in the colour primaries part of the
colour specification below using the method described in SMPTE RP
177-1993. In this document the RGB values are normalised to the range
[0,1], so that RGB=[1,1,1] represents the peak white of the display device
and RGB=[0,0,0] represents black.

The $E_R$, $E_G$ and $E_B$ values are related to the linear RGB
values by non-linear transfer functions.
Normally, $E_R$, $E_G$ and $E_B$ also fall in the range $[0,1]$, but in the
case of extended gamut systems (such as ITU-R BT.1361), negative values can also
occur. The non-linear transfer function is typically applied in the camera and
is specified in the transfer characteristic part of the appropriate colour
specification. For aesthetic and psychovisual reasons
the encoding transfer function is not always the inverse of
the decoding transfer function. In fact the combined effect of the
encoding and decoding transfer functions is such that the rendering intent or
end-to-end gamma of the system can vary between about 1.1 and 1.6 depending on
viewing conditions. The rationale for this is given in ``Digital Video and
HDTV'' by Charles Poynton (2003, Morgan Kaufmann Publishers, ISBN 1-55860-792-7).

The non-linear $E_R$, $E_G$ and $E_B$ values are subject to a matrix operation
(known as `non-constant luminance coding'), which transforms
them into luma ($E_Y$) and colour difference (normally $E_{Cb}$ and $E_{Cr}$) values.
$E_Y$ is normally limited to the range $[0,1]$ and the colour difference
values to the range $[-0.5, 0.5]$. In this specification, the colour difference
components are referred to as `chroma' components and are not to be confused
with the chroma signals used by composite television systems where the colour
difference signals are significantly reduced in both resolution and signal
amplitude. The chroma components used in this specification can be sub-sampled,
either horizontally, vertically or both horizontally and vertically.

\subsubsection{$YC_BC_R$ coding}

The $E_Y$, $E_{Cb}$ and $E_{Cr}$ values are
mapped to a range of integers denoted $Y$, $C_B$ and $C_R$, typically $[0,255]$.
In order to display video, the inverse of the above
operations must be performed to convert this data to $E_Y$, $E_{Cb}$, $E_{Cr}$,
then to $E_R$, $E_G$, $E_B$ and thence to R, G and B.

\subsubsection{$YC_OC_G$ coding}

In the case of $YC_OC_G$ coding, the $E_R$, $E_G$ and $E_B$ values are directly
linearly scaled to integer ranges before a lossless
direct integer transform is applied to convert this data to $Y$, $C_O$ and
$C_G$ data.

\subsubsection{Signal range}
\label{signalranges}

The output of the Dirac decoder consists of unsigned integer values. For
$YC_BC_R$ coding, the offset and excursion values are used to linearly scale these
values into intermediate values $E_Y$, $E_{Cb}$, and $E_{Cr}$.
$E_Y$ is normally clipped to the range $[0,1]$ and $E_{Cb}$, $E_{Cr}$
to the range $[-0.5,0.5]$. The effect is to clip integer $Y$ values output by
the decoder to the interval
\[ [\SLumaOffset, \SLumaOffset+\SLumaExcursion] \]
and $C1$, $C2$ values to
\[ [\SChromaOffset-\SChromaExcursion/2,\SChromaOffset+\SChromaExcursion/2] \]
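
As an illustration, the scaling implied by these intervals can be written out
explicitly. This is a sketch rather than normative text: the divisions below are
inferred from the clipping intervals above, and the 8-bit figures used afterwards
(luma offset 16, luma excursion 219, chroma offset 128, chroma excursion 224)
are the conventional video-range values, assumed here purely as an example.
\begin{eqnarray*}
E_Y & = & (Y-\SLumaOffset)/\SLumaExcursion \\
E_{Cb} & = & (C1-\SChromaOffset)/\SChromaExcursion \\
E_{Cr} & = & (C2-\SChromaOffset)/\SChromaExcursion
\end{eqnarray*}
With the assumed 8-bit values, $Y=16$ gives $E_Y=0$ and $Y=235$ gives $E_Y=1$,
so luma is clipped to $[16,235]$. Likewise $C1=16$ gives $E_{Cb}=-0.5$ and
$C1=240$ gives $E_{Cb}=0.5$, so chroma is clipped to $[16,240]$.
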

However, maintaining an extended RGB gamut can mean that either such
clipping is not done, or non-standard offset and excursion values are
used to extract the extended gamut from the non-negative $Y$, $C1$,
and $C2$ values.

In the case of $YC_OC_G$ coding, $E_Y$, $E_{CO}$, and $E_{CG}$ should not be
calculated. Instead, direct integer conversion to RGB should be done
(note: excursion values will be ignored in this integer conversion).

\subsubsection{Primaries}
\label{primaries}
The colour primaries allow device dependent linear RGB colour
co-ordinates to be mapped to device independent linear CIE XYZ space.
The primaries specified are the CIE (1931) XYZ chromaticity
co-ordinates of the primaries and the white point of the device.

The colour primary specification therefore allows exact colour reproduction of
decoded RGB values on different displays
with different display primaries.

\subsubsection{Colour matrix}
\label{matrix}
\paragraph{$YC_BC_R$ coding}
$\ $\newline
Unit-scale luma and chroma values $E_Y$, $E_{Cb}$ and $E_{Cr}$ should be
derived from decoded $Y$, $C1$ and $C2$ values using the signal range parameters
as per Section \ref{signalranges}. Given these values, $E_R$, $E_G$ and $E_B$ are
determined as follows:
\begin{eqnarray*}
E_R & = & E_Y + 2*(1-K_R)*E_{Cr} \\
E_G & = & E_Y - \dfrac{2*K_R*(1-K_R)*E_{Cr}}{K_G}-\dfrac{2*K_B*(1-K_B)*E_{Cb}}{K_G} \\
E_B & = & E_Y + 2*(1-K_B)*E_{Cb}
\end{eqnarray*}
where $K_G=1-K_R-K_B$.
This follows by inverting the equations
\begin{eqnarray*}
K_R+K_G+K_B & = & 1 \\
E_Y & = & K_R*E_R+K_G*E_G+K_B*E_B \\
E_{Cb} & = & \dfrac{E_B - E_Y}{2*(1-K_B)} \\
E_{Cr} & = & \dfrac{E_R - E_Y}{2*(1-K_R)}
\end{eqnarray*}
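
As a worked instance of the inverse matrix, assume the ITU-R BT.709 luma
coefficients $K_R=0.2126$ and $K_B=0.0722$, so that $K_G=0.7152$. These values are
an assumption made for illustration; the coefficients actually in force are
those of the colour matrix in use. Substituting into the definitions of $E_Y$,
$E_{Cb}$ and $E_{Cr}$ and solving for $E_R$, $E_G$ and $E_B$, with coefficients
rounded to four decimal places, gives
\begin{eqnarray*}
E_R & = & E_Y + 1.5748*E_{Cr} \\
E_G & = & E_Y - 0.4681*E_{Cr} - 0.1873*E_{Cb} \\
E_B & = & E_Y + 1.8556*E_{Cb}
\end{eqnarray*}
For example, $E_Y=1$ with $E_{Cb}=E_{Cr}=0$ gives $E_R=E_G=E_B=1$, i.e. peak white.
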

\paragraph{YCoCg coding}
$\ $\newline
In the case of YCoCg coding, integer $I_R$, $I_G$, $I_B$ should be directly computed from
the decoded $Y$, $C1$ ($C_O$) and $C2$ ($C_G$) values by
\begin{eqnarray*}
Y & -= & \SLumaOffset \\
Co=C1 & -= & \SChromaOffset \\
Cg=C2 & -= & \SChromaOffset \\
t & = & Y-(Cg\gg1) \\
I_G & = & t+Cg \\
I_B & = & t-(Co\gg1) \\
I_R & = & I_B+Co
\end{eqnarray*}
The integer values are converted to unit-scale $E_R$, $E_G$, $E_B$ by dividing by
$2^\LumaDepth$ and clipping to $[0,1]$.
If the inverse of the above transform (RGB to $YC_OC_G$) has been correctly
applied prior to coding and lossless coding employed, then clipping will
be unnecessary, and reversing the above operations will reproduce $Y$,
$C_O$ and $C_G$ losslessly from $I_R$, $I_G$ and $I_B$ yielding a transparent
RGB to RGB coding system:
\begin{eqnarray*}
Co & = & I_R-I_B \\
t & = & I_B+((I_R-I_B)\gg1) \approx (I_R+I_B)/2\\
Cg & = & I_G-t \approx I_G-(I_R+I_B)/2\\
Y & = & t+(Cg\gg1) \approx I_G/2-(I_R+I_B)/4+(I_R+I_B)/2=I_R/4+I_G/2+I_B/4
\end{eqnarray*}

Note that these matrix operations imply that the chroma data requires an
additional bit, due to the subtractions used to create chroma components.
So for 8-bit RGB ($I_R$, $I_G$, $I_B$) values, $Y$ will be 8 bits and $C_O$ and
$C_G$ will be 9 bits.
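
A small numerical check of the round trip may help. The sample values are
arbitrary, the offsets are taken as already removed, and $\gg$ is assumed to be
an arithmetic right shift (rounding towards minus infinity). Starting from
$I_R=200$, $I_G=100$, $I_B=50$, the RGB to YCoCg direction gives
\begin{eqnarray*}
Co & = & 200-50 = 150 \\
t & = & 50+(150\gg1) = 125 \\
Cg & = & 100-125 = -25 \\
Y & = & 125+(-25\gg1) = 125-13 = 112
\end{eqnarray*}
and the YCoCg to RGB direction recovers the original values exactly:
\begin{eqnarray*}
t & = & 112-(-25\gg1) = 112+13 = 125 \\
I_G & = & 125+(-25) = 100 \\
I_B & = & 125-(150\gg1) = 50 \\
I_R & = & 50+150 = 200
\end{eqnarray*}
The round trip is exact because each shifted term that is added in one direction
is subtracted, unchanged, in the other. Note also that $Cg=-25$ is negative,
which is why the chroma components need a sign bit in addition to the source bit depth.
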

\subsection{Transfer characteristics}
\subsubsection{TV transfer characteristic}

ITU-R BT.601-6 defines the 625-line and 525-line standard definition systems
with an assumed receiver display gamma value of 2.8. SMPTE 170M defines the NTSC
SDTV system with an assumed receiver display gamma value of 2.2.

High Definition systems, whether 50Hz or 60Hz based, use an encoding
gamma value of 0.45 with a linear portion at the low end of the scale to avoid
the need for infinite gain at the receiver. This gamma value is defined by
ITU-R BT.709.
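
For reference, the ITU-R BT.709 encoding characteristic has the following form,
where $L$ is a linear value in $[0,1]$ and $E$ is the corresponding non-linear
value. The symbols $L$ and $E$ are chosen here for illustration only; ITU-R BT.709
remains the normative definition.
\[
E = \left\{ \begin{array}{ll}
4.5*L & 0 \leq L < 0.018 \\
1.099*L^{0.45}-0.099 & 0.018 \leq L \leq 1
\end{array} \right.
\]
The linear segment keeps the slope finite near black, whereas a pure power law
with exponent 0.45 has unbounded slope as $L$ approaches 0.
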

\subsubsection{Extended Colour Gamut}

ITU-R BT.1361 (Worldwide Unified Colorimetry of Future TV Systems) defines a
colour system with an extended colour gamut. Refer to ITU-R BT.1361 (1998)
for details.

IEC 61966-2-2 (Extended RGB Color Space) defines another colour system with
an extended colour gamut. Refer to IEC 61966-2-2:2003 for details.

In both cases, it should be noted that use of the full range of $Y, C1, C2$
values can create negative R, G or B values. The original colour gamut equations
were designed around the CRT (cathode ray tube) device. Some flat panel
displays are capable of displaying a wider colour gamut, resulting in the desire
to extend the colour gamut to maximize the impact of these displays.

\paragraph{Linear}
$\ $\newline
A linear transfer characteristic has $f(x)=x$, i.e. $E_X=X$.

\subsection{Frame rate}
The ratio of the frame rate values $\SFrameRateNumer$ and $\SFrameRateDenom$
encodes the intended rate at which frames should be
displayed subsequent to decoding. If $\SSourceSampling$ is 1 (interlaced
sampling), then fields are displayed at double the frame rate, in the order
specified by the $\STopFieldFirst$ flag.
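
For example, taking $\SFrameRateNumer=30000$ and $\SFrameRateDenom=1001$ (values
assumed here for illustration, corresponding to the familiar NTSC-derived rate),
frames would be displayed at
\[ \dfrac{30000}{1001} \approx 29.97 \mbox{ frames per second} \]
and, with interlaced source sampling, fields would be displayed at
\[ \dfrac{2*30000}{1001} \approx 59.94 \mbox{ fields per second} \]
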

\subsection{Aspect ratios and clean area}

\subsubsection{Pixel aspect ratio}

The pixel aspect ratio value of an image is the ratio of the intended spacing of
horizontal samples (pixels) to the spacing of vertical samples (picture lines)
on the display device. Pixel aspect ratios are fundamental properties of
sampled images because they determine the displayed shape of objects in the
whole image. Failure to use the correct value of pixel aspect ratio will result
in distorted images where circles will be displayed as ellipses.

Most HDTV standards and computer image formats are defined to have pixel aspect
ratios that are exactly 1:1.

For a number NH of pixels per unit length and NV pixels per unit height, this
ratio is 1/NH : 1/NV or NV : NH. For a video standard of WxH pixels displayed
at 4:3 picture aspect ratio, NH=W/4 and NV=H/3.

\paragraph{Using non-square pixel aspect ratios}
$\ $\newline
The defined pixel aspect ratios are designed to give image aspect ratios for
standard definition television operating with a standard 4:3 picture aspect
ratio.

For 525-line video, defining a 704 x 480 picture with a 4:3 aspect ratio results
in a H:V pixel aspect ratio of 10:11 (i.e. 480/3 : 704/4).

For 625-line video, defining a 704 x 576 picture with a 4:3 aspect ratio results
in a H:V pixel aspect ratio of 12:11 (i.e. 576/3 : 704/4).

If the intended image aspect ratio is 16:9, then the H:V pixel aspect ratios
change accordingly to 40:33 for 525-line video and 16:11 for 625-line video.
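
The 16:9 figures follow from the same construction, with NH=W/16 and NV=H/9.
The arithmetic is shown here only as a check, using the NH and NV notation
introduced above:
\begin{eqnarray*}
\mbox{525-line:} \quad 480/9 : 704/16 & = & 480*16 : 704*9 = 7680 : 6336 = 40 : 33 \\
\mbox{625-line:} \quad 576/9 : 704/16 & = & 576*16 : 704*9 = 9216 : 6336 = 16 : 11
\end{eqnarray*}
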

The values specified above are widely, but not unanimously, agreed to be the
correct values. Differences of viewpoint arise from how much of the available
horizontal picture size of 720 luma ($Y$) pixels is intended for display.

You are strongly advised to use one of the default pixel aspect ratios. However,
if you know what you are doing and don't like the default values, the codec
allows you to define your own ratio. You should be aware that many display
devices could ignore your decision and may default to using different and
unsuitable values.

\subsubsection{Clean area}

The clean area is intended to define an area within which picture information is
subjectively uncontaminated by all edge distortions and possible unintended
picture content such as microphones appearing at the top of the picture. It
could be appropriate to display the clean area rather than the whole picture,
which can contain edge distortions or unintended content.

The top-left corner of the clean area has coordinates
\[(\SLeftOffset,\STopOffset)\]
counting from the top-left corner of the picture data, and
dimensions $\SCleanWidth$ by $\SCleanHeight$.

Note that these dimensions refer to pixels within a picture, not a frame,
so a change from interlaced to progressive picture coding will
necessitate a change of clean area if a custom clean area is used.

The clean area and the pixel aspect ratio together determine the
aspect ratio of the displayed image, which is the ratio of the width of the
intended display area to the height of the intended display area:
\[\dfrac{\SCleanWidth*\SAspectRatioNumer}{\SCleanHeight*\SAspectRatioDenom}\]
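
As a check on this formula, consider a clean area of 704 by 576 pixels with a
12:11 pixel aspect ratio. These are the 625-line figures used earlier, with the
clean area assumed, for illustration, to cover the whole 704 x 576 picture:
\[ \dfrac{704*12}{576*11} = \dfrac{8448}{6336} = \dfrac{4}{3} \]
which is the expected 4:3 image aspect ratio.
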

Given two separate sequences, with identical image aspect ratio, if the
top-left and bottom-right corners of their clean areas are
coincident when displayed, then the images as a whole should be exactly
coincident. This is regardless of the actual pixel dimensions of the
images or their clean areas. This allows sequences to be combined
together appropriately if they are appropriately scaled.

\end{informative*}