[xiph-commits] r17434 - websites/xiph.org/video

xiphmont at svn.xiph.org xiphmont at svn.xiph.org
Thu Sep 23 06:49:13 PDT 2010


Author: xiphmont
Date: 2010-09-23 06:49:13 -0700 (Thu, 23 Sep 2010)
New Revision: 17434

Added:
   websites/xiph.org/video/vid1-en.srt
   websites/xiph.org/video/vid1-fr.srt
Removed:
   websites/xiph.org/video/vid1-en_US.kate
   websites/xiph.org/video/vid1-en_US.srt
Modified:
   websites/xiph.org/video/vid1.shtml
Log:
Add french subs, move en_US to just en on suggestion of oggkoggk


Added: websites/xiph.org/video/vid1-en.srt
===================================================================
--- websites/xiph.org/video/vid1-en.srt	                        (rev 0)
+++ websites/xiph.org/video/vid1-en.srt	2010-09-23 13:49:13 UTC (rev 17434)
@@ -0,0 +1,1588 @@
+1
+00:00:08,124 --> 00:00:10,742
+Workstations and high end personal computers have been able to
+
+2
+00:00:10,742 --> 00:00:14,749
+manipulate digital audio pretty easily for about fifteen years now.
+
+3
+00:00:14,749 --> 00:00:17,470
+It's only been about five years that a decent workstation's been able
+
+4
+00:00:17,470 --> 00:00:21,643
+to handle raw video without a lot of expensive special purpose hardware.
+
+5
+00:00:21,643 --> 00:00:25,400
+But today even most cheap home PCs have the processor power and
+
+6
+00:00:25,400 --> 00:00:28,092
+storage necessary to really toss raw video around,
+
+7
+00:00:28,092 --> 00:00:30,479
+at least without too much of a struggle. 
+
+8
+00:00:30,479 --> 00:00:33,579
+So now that everyone has all of this cheap capable hardware, 
+
+9
+00:00:33,579 --> 00:00:36,651
+more people, not surprisingly, want to do interesting
+
+10
+00:00:36,651 --> 00:00:39,908
+things with digital media, especially streaming. 
+
+11
+00:00:39,908 --> 00:00:44,017
+YouTube was the first huge success, and now everybody wants in.
+
+12
+00:00:44,017 --> 00:00:47,413
+Well good!  Because this stuff is a lot of fun!
+
+13
+00:00:48,250 --> 00:00:51,179
+It's no problem finding consumers for digital media.  
+
+14
+00:00:51,179 --> 00:00:54,649
+But here, I'd like to address the engineers, the mathematicians, 
+
+15
+00:00:54,649 --> 00:00:57,869
+the hackers, the people who are interested in discovering 
+
+16
+00:00:57,869 --> 00:01:01,302
+and making things and building the technology itself. 
+
+17
+00:01:01,302 --> 00:01:03,282
+The people after my own heart.
+
+18
+00:01:04,250 --> 00:01:08,723
+Digital media, compression especially, is perceived to be super-elite,
+
+19
+00:01:08,723 --> 00:01:12,822
+somehow incredibly more difficult than anything else in computer science. 
+
+20
+00:01:12,822 --> 00:01:15,700
+The big industry players in the field don't mind this perception at all; 
+
+21
+00:01:15,700 --> 00:01:19,734
+it helps justify the staggering number of very basic patents they hold.  
+
+22
+00:01:19,734 --> 00:01:23,870
+They like the image that their media researchers are the best of the best, 
+
+23
+00:01:23,870 --> 00:01:27,738
+so much smarter than anyone else that their brilliant ideas can't 
+
+24
+00:01:27,738 --> 00:01:29,903
+even be understood by mere mortals. 
+
+25
+00:01:30,625 --> 00:01:33,716
+This is bunk.  
+
+26
+00:01:35,205 --> 00:01:38,900
+Digital audio and video and streaming and compression 
+
+27
+00:01:38,900 --> 00:01:42,738
+offer endless deep and stimulating mental challenges, 
+
+28
+00:01:42,738 --> 00:01:44,662
+just like any other discipline. 
+
+29
+00:01:44,662 --> 00:01:47,929
+It seems elite because so few people have been involved.
+
+30
+00:01:47,929 --> 00:01:51,223
+So few people have been involved perhaps because so few people 
+
+31
+00:01:51,223 --> 00:01:54,665
+could afford the expensive, special-purpose equipment it required. 
+
+32
+00:01:54,665 --> 00:01:58,792
+But today, just about anyone watching this video has a cheap, 
+
+33
+00:01:58,792 --> 00:02:03,317
+general-purpose computer powerful enough to play with the big boys. 
+
+34
+00:02:05,926 --> 00:02:11,108
+There are battles going on today around HTML5 and browsers 
+
+35
+00:02:11,108 --> 00:02:13,671
+and video and open vs. closed. 
+
+36
+00:02:13,671 --> 00:02:17,048
+So now is a pretty good time to get involved.  
+
+37
+00:02:17,048 --> 00:02:20,000
+The easiest place to start is probably understanding 
+
+38
+00:02:20,000 --> 00:02:22,619
+the technology we have right now.
+
+39
+00:02:23,500 --> 00:02:25,071
+This is an introduction. 
+
+40
+00:02:25,071 --> 00:02:28,180
+Since it's an introduction, it glosses over a ton of details 
+
+41
+00:02:28,180 --> 00:02:30,882
+so that the big picture's a little easier to see.
+
+42
+00:02:30,882 --> 00:02:33,908
+Quite a few people watching are going to be way past anything 
+
+43
+00:02:33,908 --> 00:02:36,378
+that I'm talking about, at least for now.  
+
+44
+00:02:36,378 --> 00:02:39,293
+On the other hand, I'm probably going to go too fast for folks 
+
+45
+00:02:39,293 --> 00:02:44,558
+who really are brand new to all of this, so if this is all new, relax.
+
+46
+00:02:44,558 --> 00:02:48,629
+The important thing is to pick out any ideas that really grab your imagination.
+
+47
+00:02:48,629 --> 00:02:52,497
+Especially pay attention to the terminology surrounding those ideas, 
+
+48
+00:02:52,497 --> 00:02:56,078
+because with those, and Google, and Wikipedia, you can dig 
+
+49
+00:02:56,078 --> 00:02:57,753
+as deep as interests you.
+
+50
+00:02:57,753 --> 00:03:00,094
+So, without any further ado, 
+
+51
+00:03:00,094 --> 00:03:03,351
+welcome to one hell of a new hobby.
+
+52
+00:03:10,291 --> 00:03:13,030
+Sound is the propagation of pressure waves through air, 
+
+53
+00:03:13,030 --> 00:03:16,981
+spreading out from a source like ripples spread from a stone tossed into a pond.
+
+54
+00:03:16,981 --> 00:03:19,489
+A microphone, or the human ear for that matter, 
+
+55
+00:03:19,489 --> 00:03:22,876
+transforms these passing ripples of pressure into an electric signal.  
+
+56
+00:03:22,876 --> 00:03:25,800
+Right, this is middle school science class, everyone remembers this.
+
+57
+00:03:25,800 --> 00:03:26,771
+Moving on.
+
+58
+00:03:27,465 --> 00:03:32,527
+That audio signal is a one-dimensional function, a single value varying over time.  
+
+59
+00:03:32,527 --> 00:03:34,248
+If we slow the 'scope down a bit... 
+
+60
+00:03:36,450 --> 00:03:38,190
+that should be a little easier to see. 
+
+61
+00:03:38,190 --> 00:03:40,688
+A few other aspects of the signal are important.  
+
+62
+00:03:40,688 --> 00:03:43,418
+It's continuous in both value and time;  
+
+63
+00:03:43,418 --> 00:03:46,813
+that is, at any given time it can have any real value, 
+
+64
+00:03:46,813 --> 00:03:50,228
+and there's a smoothly varying value at every point in time.
+
+65
+00:03:50,228 --> 00:03:52,439
+No matter how much we zoom in,
+
+66
+00:03:54,068 --> 00:03:58,510
+there are no discontinuities, no singularities, no instantaneous steps 
+
+67
+00:03:58,510 --> 00:04:01,285
+or points where the signal ceases to exist. 
+
+68
+00:04:03,247 --> 00:04:08,475
+It's defined everywhere. Classic continuous math works very well on these signals.
+
+69
+00:04:11,001 --> 00:04:15,378
+A digital signal on the other hand is discrete in both value and time.
+
+70
+00:04:15,378 --> 00:04:19,107
+In the simplest and most common system, called Pulse Code Modulation,
+
+71
+00:04:19,107 --> 00:04:24,058
+one of a fixed number of possible values directly represents the instantaneous signal amplitude 
+
+72
+00:04:24,058 --> 00:04:30,165
+at points in time spaced a fixed distance apart. The end result is a stream of digits.
+
+73
+00:04:30,674 --> 00:04:35,309
+Now this looks an awful lot like this.  
+
+74
+00:04:35,309 --> 00:04:38,964
+It seems intuitive that we should somehow be able to rigorously transform 
+
+75
+00:04:38,964 --> 00:04:44,683
+one into the other, and good news, the Sampling Theorem says we can and tells us how. 
+
+76
+00:04:44,683 --> 00:04:48,477
+Published in its most recognizable form by Claude Shannon in 1949
+
+77
+00:04:48,477 --> 00:04:52,409
+and built on the work of Nyquist, and Hartley, and tons of others, 
+
+78
+00:04:52,409 --> 00:04:56,138
+the sampling theorem states that not only can we go back and forth between 
+
+79
+00:04:56,138 --> 00:05:00,913
+analog and digital, but also lays down a set of conditions for which conversion 
+
+80
+00:05:00,913 --> 00:05:06,779
+is lossless and the two representations become equivalent and interchangeable.
+
+81
+00:05:06,779 --> 00:05:10,601
+When the lossless conditions aren't met, the sampling theorem tells us 
+
+82
+00:05:10,601 --> 00:05:14,247
+how and how much information is lost or corrupted.
+
+83
+00:05:14,900 --> 00:05:21,270
+Up until very recently, analog technology was the basis for practically everything done with audio, 
+
+84
+00:05:21,270 --> 00:05:25,267
+and that's not because most audio comes from an originally analog source.
+
+85
+00:05:25,267 --> 00:05:28,450
+You may also think that since computers are fairly recent, 
+
+86
+00:05:28,450 --> 00:05:31,643
+analog signal technology must have come first.  
+
+87
+00:05:31,643 --> 00:05:34,428
+Nope. Digital is actually older.  
+
+88
+00:05:34,428 --> 00:05:37,611
+The telegraph predates the telephone by half a century 
+
+89
+00:05:37,611 --> 00:05:41,951
+and was already fully mechanically automated by the 1860s, sending coded, 
+
+90
+00:05:41,951 --> 00:05:46,476
+multiplexed digital signals long distances. You know... Tickertape. 
+
+91
+00:05:46,476 --> 00:05:50,427
+Harry Nyquist of Bell Labs was researching telegraph pulse transmission 
+
+92
+00:05:50,427 --> 00:05:53,027
+when he published his description of what later became known 
+
+93
+00:05:53,027 --> 00:05:57,219
+as the Nyquist frequency, the core concept of the sampling theorem.  
+
+94
+00:05:57,219 --> 00:06:01,642
+Now, it's true the telegraph was transmitting symbolic information, text, 
+
+95
+00:06:01,642 --> 00:06:06,883
+not a digitized analog signal, but with the advent of the telephone and radio,
+
+96
+00:06:06,883 --> 00:06:12,000
+analog and digital signal technology progressed rapidly and side-by-side.
+
+97
+00:06:12,699 --> 00:06:18,732
+Audio had always been manipulated as an analog signal because, well, gee it's so much easier.  
+
+98
+00:06:18,732 --> 00:06:23,257
+A second-order lowpass filter, for example, requires two passive components.  
+
+99
+00:06:23,257 --> 00:06:26,505
+An all-analog short-time Fourier transform, a few hundred.  
+
+100
+00:06:26,505 --> 00:06:30,752
+Well, maybe a thousand if you want to build something really fancy.  
+
+101
+00:06:31,844 --> 00:06:35,989
+Processing signals digitally requires millions to billions of transistors 
+
+102
+00:06:35,989 --> 00:06:40,366
+running at microwave frequencies, support hardware at very least to digitize 
+
+103
+00:06:40,366 --> 00:06:43,836
+and reconstruct the analog signals, a complete software ecosystem 
+
+104
+00:06:43,836 --> 00:06:47,362
+for programming and controlling that billion-transistor juggernaut,
+
+105
+00:06:47,362 --> 00:06:51,091
+digital storage just in case you want to keep any of those bits for later...
+
+106
+00:06:51,091 --> 00:06:56,171
+So we come to the conclusion that analog is the only practical way to do much with audio...
+
+107
+00:06:56,171 --> 00:07:07,019
+well, unless you happen to have a billion transistors and all the other things just lying around. 
+
+108
+00:07:07,850 --> 00:07:12,660
+And since we do, digital signal processing becomes very attractive.
+
+109
+00:07:13,363 --> 00:07:18,906
+For one thing, analog componentry just doesn't have the flexibility of a general purpose computer.
+
+110
+00:07:18,906 --> 00:07:21,182
+Adding a new function to this beast... 
+
+111
+00:07:22,191 --> 00:07:24,578
+yeah, it's probably not going to happen.  
+
+112
+00:07:24,578 --> 00:07:26,567
+On a digital processor though...
+
+113
+00:07:28,668 --> 00:07:34,127
+...just write a new program. Software isn't trivial, but it is a lot easier.
+
+114
+00:07:34,127 --> 00:07:39,550
+Perhaps more importantly though every analog component is an approximation. 
+
+115
+00:07:39,550 --> 00:07:44,352
+There's no such thing as a perfect transistor, or a perfect inductor, or a perfect capacitor.  
+
+116
+00:07:44,352 --> 00:07:51,569
+In analog, every component adds noise and distortion, usually not very much, but it adds up. 
+
+117
+00:07:51,569 --> 00:07:55,669
+Just transmitting an analog signal, especially over long distances,
+
+118
+00:07:55,669 --> 00:08:00,434
+progressively, measurably, irretrievably corrupts it.  
+
+119
+00:08:00,434 --> 00:08:06,513
+Besides, all of those single-purpose analog components take up a lot of space.  
+
+120
+00:08:06,513 --> 00:08:09,946
+Two lines of code on the billion transistors back here 
+
+121
+00:08:09,946 --> 00:08:14,702
+can implement a filter that would require an inductor the size of a refrigerator.
+
+122
+00:08:14,702 --> 00:08:17,941
+Digital systems don't have these drawbacks.  
+
+123
+00:08:17,941 --> 00:08:24,335
+Digital signals can be stored, copied, manipulated and transmitted without adding any noise or distortion. 
+
+124
+00:08:24,335 --> 00:08:26,889
+We do use lossy algorithms from time to time, 
+
+125
+00:08:26,889 --> 00:08:31,284
+but the only unavoidably non-ideal steps are digitization and reconstruction,
+
+126
+00:08:31,284 --> 00:08:35,929
+where digital has to interface with all of that messy analog.  
+
+127
+00:08:35,929 --> 00:08:40,750
+Messy or not, modern conversion stages are very, very good.  
+
+128
+00:08:40,750 --> 00:08:45,849
+By the standards of our ears, we can consider them practically lossless as well.
+
+129
+00:08:45,849 --> 00:08:50,429
+With a little extra hardware, then, most of which is now small and inexpensive 
+
+130
+00:08:50,429 --> 00:08:55,379
+due to our modern industrial infrastructure, digital audio is the clear winner over analog.
+
+131
+00:08:55,379 --> 00:09:00,857
+So let us then go about storing it, copying it, manipulating it, and transmitting it.
+
+132
+00:09:04,956 --> 00:09:08,639
+Pulse Code Modulation is the most common representation for raw audio.  
+
+133
+00:09:08,639 --> 00:09:13,867
+Other practical representations do exist, for example the Sigma-Delta coding used by the SACD, 
+
+134
+00:09:13,867 --> 00:09:16,625
+which is a form of Pulse Density Modulation.  
+
+135
+00:09:16,625 --> 00:09:19,687
+That said, Pulse Code Modulation is far and away dominant, 
+
+136
+00:09:19,687 --> 00:09:22,158
+mainly because it's so mathematically convenient.  
+
+137
+00:09:22,158 --> 00:09:26,350
+An audio engineer can spend an entire career without running into anything else.
+
+138
+00:09:26,350 --> 00:09:29,135
+PCM encoding can be characterized in three parameters,
+
+139
+00:09:29,135 --> 00:09:34,187
+making it easy to account for every possible PCM variant with mercifully little hassle.
+
+140
+00:09:34,187 --> 00:09:36,426
+The first parameter is the sampling rate.  
+
+141
+00:09:36,426 --> 00:09:40,886
+The highest frequency an encoding can represent is called the Nyquist Frequency.  
+
+142
+00:09:40,886 --> 00:09:45,124
+The Nyquist frequency of PCM happens to be exactly half the sampling rate.
+
+143
+00:09:45,124 --> 00:09:51,389
+Therefore the sampling rate directly determines the highest possible frequency in the digitized signal.
+
+144
+00:09:51,389 --> 00:09:56,515
+Analog telephone systems traditionally band-limited voice channels to just under 4kHz, 
+
+145
+00:09:56,515 --> 00:10:02,224
+so digital telephony and most classic voice applications use an 8kHz sampling rate, 
+
+146
+00:10:02,224 --> 00:10:07,277
+the minimum sampling rate necessary to capture the entire bandwidth of a 4kHz channel.  
+
+147
+00:10:07,277 --> 00:10:14,263
+This is what an 8kHz sampling rate sounds like--- a bit muffled but perfectly intelligible for voice.  
+
+148
+00:10:14,263 --> 00:10:18,149
+This is the lowest sampling rate that's ever been used widely in practice.
+
+149
+00:10:18,149 --> 00:10:23,322
+From there, as power, and memory, and storage increased, consumer computer hardware
+
+150
+00:10:23,322 --> 00:10:29,642
+went to offering 11, and then 16, and then 22, and then 32kHz sampling.  
+
+151
+00:10:29,642 --> 00:10:33,491
+With each increase in the sampling rate and the Nyquist frequency, 
+
+152
+00:10:33,491 --> 00:10:38,302
+it's obvious that the high end becomes a little clearer and the sound more natural.
+
+153
+00:10:38,302 --> 00:10:44,576
+The Compact Disc uses a 44.1kHz sampling rate, which is again slightly better than 32kHz, 
+
+154
+00:10:44,576 --> 00:10:46,788
+but the gains are becoming less distinct.  
+
+155
+00:10:46,788 --> 00:10:52,053
+44.1kHz is a bit of an oddball choice, especially given that it hadn't been used for anything
+
+156
+00:10:52,053 --> 00:10:56,559
+prior to the compact disc, but the huge success of the CD has made it a common rate.
+
+157
+00:10:56,559 --> 00:11:01,195
+The most common hi-fidelity sampling rate aside from the CD is 48kHz.
+
+158
+00:11:05,710 --> 00:11:08,597
+There's virtually no audible difference between the two.  
+
+159
+00:11:08,597 --> 00:11:13,640
+This video, or at least the original version of it, was shot and produced with 48kHz audio, 
+
+160
+00:11:13,640 --> 00:11:18,545
+which happens to be the original standard for high-fidelity audio with video.
+
+161
+00:11:18,545 --> 00:11:25,100
+Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also appeared. 
+
+162
+00:11:25,100 --> 00:11:30,888
+The reason for the sampling rates beyond 48kHz isn't to extend the audible high frequencies further. 
+
+163
+00:11:30,888 --> 00:11:32,489
+It's for a different reason.
+
+164
+00:11:32,896 --> 00:11:37,319
+Stepping back for just a second, the French mathematician Jean Baptiste Joseph Fourier 
+
+165
+00:11:37,319 --> 00:11:42,353
+showed that we can also think of signals like audio as a set of component frequencies.  
+
+166
+00:11:42,353 --> 00:11:45,841
+This frequency domain representation is equivalent to the time representation; 
+
+167
+00:11:45,841 --> 00:11:49,719
+the signal is exactly the same, we're just looking at it a different way.  
+
+168
+00:11:49,719 --> 00:11:56,131
+Here we see the frequency domain representation of a hypothetical analog signal we intend to digitally sample.
+
+169
+00:11:56,131 --> 00:11:59,888
+The sampling theorem tells us two essential things about the sampling process. 
+
+170
+00:11:59,888 --> 00:12:04,727
+First, that a digital signal can't represent any frequencies above the Nyquist frequency. 
+
+171
+00:12:04,727 --> 00:12:10,640
+Second, and this is the new part, if we don't remove those frequencies with a lowpass filter before sampling, 
+
+172
+00:12:10,640 --> 00:12:16,414
+the sampling process will fold them down into the representable frequency range as aliasing distortion.
+
+173
+00:12:16,414 --> 00:12:20,069
+Aliasing, in a nutshell, sounds freakin' awful, 
+
+174
+00:12:20,069 --> 00:12:25,242
+so it's essential to remove any beyond-Nyquist frequencies before sampling and after reconstruction.
+
+175
+00:12:25,871 --> 00:12:31,265
+Human frequency perception is considered to extend to about 20kHz. 
+
+176
+00:12:31,265 --> 00:12:37,548
+In 44.1 or 48kHz sampling, the lowpass before the sampling stage has to be extremely sharp 
+
+177
+00:12:37,548 --> 00:12:42,101
+to avoid cutting any audible frequencies below 20kHz 
+
+178
+00:12:42,101 --> 00:12:49,439
+but still not allow frequencies above the Nyquist to leak forward into the sampling process.  
+
+179
+00:12:49,439 --> 00:12:55,342
+This is a difficult filter to build and no practical filter succeeds completely. 
+
+180
+00:12:55,342 --> 00:13:00,024
+If the sampling rate is 96kHz or 192kHz on the other hand, 
+
+181
+00:13:00,024 --> 00:13:07,223
+the lowpass has an extra octave or two for its transition band. This is a much easier filter to build.  
+
+182
+00:13:07,223 --> 00:13:14,348
+Sampling rates beyond 48kHz are actually one of those messy analog stage compromises.
+
+183
+00:13:15,014 --> 00:13:20,844
+The second fundamental PCM parameter is the sample format, that is, the format of each digital number.  
+
+184
+00:13:20,844 --> 00:13:26,285
+A number is a number, but a number can be represented in bits a number of different ways.
+
+185
+00:13:26,942 --> 00:13:30,902
+Early PCM was eight bit linear, encoded as an unsigned byte.  
+
+186
+00:13:30,902 --> 00:13:37,028
+The dynamic range is limited to about 50dB and the quantization noise, as you can hear, is pretty severe. 
+
+187
+00:13:37,028 --> 00:13:39,970
+Eight bit audio is vanishingly rare today.
+
+188
+00:13:41,007 --> 00:13:47,484
+Digital telephony typically uses one of two related non-linear eight bit encodings called A-law and mu-law.
+
+189
+00:13:47,484 --> 00:13:51,287
+These formats encode a roughly 14 bit dynamic range into eight bits 
+
+190
+00:13:51,287 --> 00:13:54,674
+by spacing the higher amplitude values farther apart. 
+
+191
+00:13:54,674 --> 00:13:59,226
+A-law and mu-law obviously improve quantization noise compared to linear 8-bit, 
+
+192
+00:13:59,226 --> 00:14:03,557
+and voice harmonics especially hide the remaining quantization noise well. 
+
+193
+00:14:03,557 --> 00:14:08,248
+All three eight bit encodings, linear, A-law, and mu-law, are typically paired 
+
+194
+00:14:08,248 --> 00:14:13,328
+with an 8kHz sampling rate, though I'm demonstrating them here at 48kHz.
+
+195
+00:14:13,328 --> 00:14:18,491
+Most modern PCM uses 16 or 24 bit two's-complement signed integers to encode 
+
+196
+00:14:18,491 --> 00:14:23,858
+the range from negative infinity to zero decibels in 16 or 24 bits of precision. 
+
+197
+00:14:23,858 --> 00:14:27,800
+The maximum absolute value corresponds to zero decibels. 
+
+198
+00:14:27,800 --> 00:14:31,584
+As with all the sample formats so far, signals beyond zero decibels 
+
+199
+00:14:31,584 --> 00:14:35,619
+and thus beyond the maximum representable range are clipped.
+
+200
+00:14:35,619 --> 00:14:41,199
+In mixing and mastering, it's not unusual to use floating point numbers for PCM instead of integers.  
+
+201
+00:14:41,199 --> 00:14:47,222
+A 32 bit IEEE754 float, that's the normal kind of floating point you see on current computers, 
+
+202
+00:14:47,222 --> 00:14:52,793
+has 24 bits of resolution, but an eight bit floating point exponent increases the representable range.
+
+203
+00:14:52,793 --> 00:14:57,040
+Floating point usually represents zero decibels as +/-1.0, 
+
+204
+00:14:57,040 --> 00:15:00,547
+and because floats can obviously represent considerably beyond that, 
+
+205
+00:15:00,547 --> 00:15:05,220
+temporarily exceeding zero decibels during the mixing process doesn't cause clipping.
+
+206
+00:15:05,220 --> 00:15:11,077
+Floating point PCM takes up more space, so it tends to be used only as an intermediate production format.
+
+207
+00:15:11,077 --> 00:15:15,796
+Lastly, most general purpose computers still read and write data in octet bytes, 
+
+208
+00:15:15,796 --> 00:15:18,489
+so it's important to remember that samples bigger than eight bits 
+
+209
+00:15:18,489 --> 00:15:22,838
+can be in big or little endian order, and both endiannesses are common.  
+
+210
+00:15:22,838 --> 00:15:28,751
+For example, Microsoft WAV files are little endian, and Apple AIFC files tend to be big-endian.  
+
+211
+00:15:28,751 --> 00:15:30,139
+Be aware of it.
+
+212
+00:15:30,870 --> 00:15:34,071
+The third PCM parameter is the number of channels.  
+
+213
+00:15:34,071 --> 00:15:38,485
+The convention in raw PCM is to encode multiple channels by interleaving the samples 
+
+214
+00:15:38,485 --> 00:15:43,398
+of each channel together into a single stream.  Straightforward and extensible.
+
+215
+00:15:43,398 --> 00:15:47,701
+And that's it!  That describes every PCM representation ever. 
+
+216
+00:15:47,701 --> 00:15:51,578
+Done. Digital audio is _so easy_!  
+
+217
+00:15:51,578 --> 00:15:56,436
+There's more to do of course, but at this point we've got a nice useful chunk of audio data, 
+
+218
+00:15:56,436 --> 00:15:58,092
+so let's get some video too.
+
+219
+00:16:02,571 --> 00:16:08,798
+One could think of video as being like audio but with two additional spatial dimensions, X and Y, 
+
+220
+00:16:08,798 --> 00:16:12,787
+in addition to the dimension of time. This is mathematically sound.  
+
+221
+00:16:12,787 --> 00:16:19,097
+The Sampling Theorem applies to all three video dimensions just as it does the single time dimension of audio.
+
+222
+00:16:19,097 --> 00:16:25,815
+Audio and video are obviously quite different in practice. For one, compared to audio, video is huge. 
+
+223
+00:16:25,815 --> 00:16:29,294
+Raw CD audio is about 1.4 megabits per second. 
+
+224
+00:16:29,294 --> 00:16:33,958
+Raw 1080i HD video is over 700 megabits per second. 
+
+225
+00:16:33,958 --> 00:16:40,056
+That's more than 500 times more data to capture, process and store per second.  
+
+226
+00:16:40,056 --> 00:16:43,711
+By Moore's law... that's... let's see... roughly eight doublings times two years, 
+
+227
+00:16:43,711 --> 00:16:47,838
+so yeah, computers requiring about an extra fifteen years to handle raw video 
+
+228
+00:16:47,838 --> 00:16:51,252
+after getting raw audio down pat was about right.
+
+229
+00:16:51,252 --> 00:16:55,425
+Basic raw video is also just more complex than basic raw audio. 
+
+230
+00:16:55,425 --> 00:16:58,599
+The sheer volume of data currently necessitates a representation 
+
+231
+00:16:58,599 --> 00:17:02,106
+more efficient than the linear PCM used for audio.  
+
+232
+00:17:02,106 --> 00:17:06,705
+In addition, electronic video comes almost entirely from broadcast television alone,
+
+233
+00:17:06,705 --> 00:17:13,423
+and the standards committees that govern broadcast video have always been very concerned with backward compatibility.
+
+234
+00:17:13,423 --> 00:17:17,559
+Up until just last year in the US, a sixty year old black and white television 
+
+235
+00:17:17,559 --> 00:17:21,038
+could still show a normal analog television broadcast.  
+
+236
+00:17:21,038 --> 00:17:23,879
+That's actually a really neat trick.
+
+237
+00:17:23,879 --> 00:17:28,718
+The downside to backward compatibility is that once a detail makes it into a standard, 
+
+238
+00:17:28,718 --> 00:17:30,985
+you can't ever really throw it out again. 
+
+239
+00:17:30,985 --> 00:17:37,305
+Electronic video has never started over from scratch the way audio has multiple times.  
+
+240
+00:17:37,305 --> 00:17:43,958
+Sixty years worth of clever but obsolete hacks necessitated by the passing technology of a given era 
+
+241
+00:17:43,958 --> 00:17:50,102
+have built up into quite a pile, and because digital standards also come from broadcast television, 
+
+242
+00:17:50,102 --> 00:17:54,664
+all these eldritch hacks have been brought forward into the digital standards as well.
+
+243
+00:17:54,664 --> 00:18:00,022
+In short, there are a whole lot more details involved in digital video than there were with audio. 
+
+244
+00:18:00,022 --> 00:18:05,592
+There's no hope of covering them all completely here, so we'll cover the broad fundamentals.
+
+245
+00:18:06,036 --> 00:18:10,857
+The most obvious raw video parameters are the width and height of the picture in pixels. 
+
+246
+00:18:10,857 --> 00:18:15,882
+As simple as that may sound, the pixel dimensions alone don't actually specify the absolute 
+
+247
+00:18:15,882 --> 00:18:22,016
+width and height of the picture, as most broadcast-derived video doesn't use square pixels.
+
+248
+00:18:22,016 --> 00:18:25,005
+The number of scanlines in a broadcast image was fixed, 
+
+249
+00:18:25,005 --> 00:18:29,021
+but the effective number of horizontal pixels was a function of channel bandwidth. 
+
+250
+00:18:29,021 --> 00:18:31,945
+Effective horizontal resolution could result in pixels that were either 
+
+251
+00:18:31,945 --> 00:18:35,489
+narrower or wider than the spacing between scanlines.
+
+252
+00:18:35,489 --> 00:18:38,395
+Standards have generally specified that digitally sampled video 
+
+253
+00:18:38,395 --> 00:18:41,902
+should reflect the real resolution of the original analog source, 
+
+254
+00:18:41,902 --> 00:18:45,566
+so a large amount of digital video also uses non-square pixels. 
+
+255
+00:18:45,566 --> 00:18:49,924
+For example, a normal 4:3 aspect NTSC DVD is typically encoded 
+
+256
+00:18:49,924 --> 00:18:55,374
+with a display resolution of 704 by 480, a ratio wider than 4:3.  
+
+257
+00:18:55,374 --> 00:18:59,640
+In this case, the pixels themselves are assigned an aspect ratio of 10:11, 
+
+258
+00:18:59,640 --> 00:19:04,553
+making them taller than they are wide and narrowing the image horizontally to the correct aspect.
+
+259
+00:19:04,553 --> 00:19:09,800
+Such an image has to be resampled to show properly on a digital display with square pixels.
+
+260
+00:19:10,253 -->  00:19:15,287
+The second obvious video parameter is the frame rate, the number of full frames per second.  
+
+261
+00:19:15,287 --> 00:19:19,655
+Several standard frame rates are in active use. Digital video, in one form or another, 
+
+262
+00:19:19,655 --> 00:19:23,689
+can use all of them.  Or, any other frame rate.  Or even variable rates 
+
+263
+00:19:23,689 --> 00:19:27,113
+where the frame rate changes adaptively over the course of the video. 
+
+264
+00:19:27,113 --> 00:19:32,998
+The higher the frame rate, the smoother the motion and that brings us, unfortunately, to interlacing.
+
+265
+00:19:32,998 --> 00:19:37,967
+In the very earliest days of broadcast video, engineers sought the fastest practical framerate 
+
+266
+00:19:37,967 --> 00:19:42,075
+to smooth motion and to minimize flicker on phosphor-based CRTs.  
+
+267
+00:19:42,075 --> 00:19:45,277
+They were also under pressure to use the least possible bandwidth 
+
+268
+00:19:45,277 --> 00:19:48,182
+for the highest resolution and fastest frame rate.  
+
+269
+00:19:48,182 --> 00:19:51,208
+Their solution was to interlace the video where the even lines 
+
+270
+00:19:51,208 --> 00:19:54,826
+are sent in one pass and the odd lines in the next.  
+
+271
+00:19:54,826 --> 00:19:59,961
+Each pass is called a field and two fields sort of produce one complete frame.
+
+272
+00:19:59,961 --> 00:20:05,319
+"Sort of", because the even and odd fields aren't actually from the same source frame.  
+
+273
+00:20:05,319 --> 00:20:10,797
+In a 60 field per second picture, the source frame rate is actually 60 full frames per second, 
+
+274
+00:20:10,797 --> 00:20:15,386
+and half of each frame, every other line, is simply discarded.  
+
+275
+00:20:15,386 --> 00:20:20,272
+This is why we can't deinterlace a video simply by combining two fields into one frame;
+
+276
+00:20:20,272 --> 00:20:23,039
+they're not actually from one frame to begin with.
+
+277
+00:20:24,047 --> 00:20:29,683
+The cathode ray tube was the only available display technology for most of the history of electronic video. 
+
+278
+00:20:29,683 --> 00:20:32,949
+A CRT's output brightness is nonlinear, approximately equal 
+
+279
+00:20:32,949 --> 00:20:36,585
+to the input controlling voltage raised to the 2.5th power. 
+
+280
+00:20:36,585 --> 00:20:43,821
+This exponent, 2.5, is designated gamma, and so it's often referred to as the gamma of a display.  
+
+281
+00:20:43,821 --> 00:20:50,493
+Cameras, though, are linear, and if you feed a CRT a linear input signal, it looks a bit like this.
+
+282
+00:20:51,270 --> 00:20:56,637
+As there were originally to be very few cameras, which were fantastically expensive anyway, 
+
+283
+00:20:56,637 --> 00:21:01,634
+and hopefully many, many television sets which best be as inexpensive as possible, 
+
+284
+00:21:01,634 --> 00:21:08,222
+engineers decided to add the necessary gamma correction circuitry to the cameras rather than the sets. 
+
+285
+00:21:08,222 --> 00:21:13,062
+Video transmitted over the airwaves would thus have a nonlinear intensity using the inverse 
+
+286
+00:21:13,062 --> 00:21:18,271
+of the set's gamma exponent, so that once a camera's signal was finally displayed on the CRT, 
+
+287
+00:21:18,271 --> 00:21:23,305
+the overall response of the system from camera to set was back to linear again.
+
+288
+00:21:23,777 --> 00:21:25,118
+Almost.
+
+289
+00:21:30,393 --> 00:21:33,113
+There were also two other tweaks. 
+
+290
+00:21:33,113 --> 00:21:40,442
+A television camera actually uses a gamma exponent that's the inverse of 2.2, not 2.5.  
+
+291
+00:21:40,442 --> 00:21:43,754
+That's just a correction for viewing in a dim environment. 
+
+292
+00:21:43,754 --> 00:21:48,279
+Also, the exponential curve transitions to a linear ramp near black.  
+
+293
+00:21:48,279 --> 00:21:52,360
+That's just an old hack for suppressing sensor noise in the camera.
+
+294
+00:21:54,941 --> 00:21:57,347
+Gamma correction also had a lucky benefit. 
+
+295
+00:21:57,347 --> 00:22:02,214
+It just so happens that the human eye has a perceptual gamma of about 3.  
+
+296
+00:22:02,214 --> 00:22:05,962
+This is relatively close to the CRT's gamma of 2.5. 
+
+297
+00:22:05,962 --> 00:22:10,607
+An image using gamma correction devotes more resolution to lower intensities, 
+
+298
+00:22:10,607 --> 00:22:14,336
+where the eye happens to have its finest intensity discrimination, 
+
+299
+00:22:14,336 --> 00:22:18,222
+and therefore uses the available scale resolution more efficiently.  
+
+300
+00:22:18,222 --> 00:22:22,784
+Although CRTs are currently vanishing, a standard sRGB computer display 
+
+301
+00:22:22,784 --> 00:22:28,419
+still uses a nonlinear intensity curve similar to television, with a linear ramp near black,
+
+302
+00:22:28,419 --> 00:22:32,491
+followed by an exponential curve with a gamma exponent of 2.4. 
+
+303
+00:22:32,491 --> 00:22:36,636
+This encodes a sixteen bit linear range down into eight bits.
+
+304 
+00:22:37,580 --> 00:22:41,790
+The human eye has three apparent color channels, red, green, and blue, 
+
+305
+00:22:41,790 --> 00:22:47,407
+and most displays use these three colors as additive primaries to produce a full range of color output.  
+
+306
+00:22:49,258 --> 00:22:54,190
+The primary pigments in printing are Cyan, Magenta, and Yellow for the same reason; 
+
+307
+00:22:54,190 --> 00:22:59,381
+pigments are subtractive, and each of these pigments subtracts one pure color from reflected light.  
+
+308
+00:22:59,381 --> 00:23:05,682
+Cyan subtracts red, magenta subtracts green, and yellow subtracts blue.
+
+309
+00:23:05,682 --> 00:23:10,919
+Video can be and sometimes is represented with red, green, and blue color channels, 
+
+310
+00:23:10,919 --> 00:23:17,211
+but RGB video is atypical. The human eye is far more sensitive to luminosity than it is to color, 
+
+311
+00:23:17,211 --> 00:23:21,329
+and RGB tends to spread the energy of an image across all three color channels.  
+
+312
+00:23:21,329 --> 00:23:25,326
+That is, the red plane looks like a red version of the original picture, 
+
+313
+00:23:25,326 --> 00:23:28,769
+the green plane looks like a green version of the original picture, 
+
+314
+00:23:28,769 --> 00:23:32,063
+and the blue plane looks like a blue version of the original picture.  
+
+315
+00:23:32,063 --> 00:23:35,705
+Black and white times three.  Not efficient.
+
+316
+00:23:35,706 --> 00:23:39,438
+For those reasons and because, oh hey, television just happened to start out 
+
+317
+00:23:39,438 --> 00:23:45,017
+as black and white anyway, video usually is represented as a high resolution luma channel, 
+
+318
+00:23:45,017 --> 00:23:51,041
+the black & white, along with additional, often lower resolution chroma channels, the color. 
+
+319
+00:23:51,041 --> 00:23:57,074
+The luma channel, Y, is produced by weighting and then adding the separate red, green and blue signals.  
+
+320
+00:23:57,074 --> 00:24:01,867
+The chroma channels U and V are then produced by subtracting the luma signal from blue 
+
+321
+00:24:01,867 --> 00:24:04,070
+and the luma signal from red.
+
+322
+00:24:04,070 --> 00:24:11,750
+When YUV is scaled, offset and quantized for digital video, it's usually more correctly called Y'CbCr, 
+
+323
+00:24:11,750 --> 00:24:15,238
+but the more generic term YUV is widely used to describe 
+
+324
+00:24:15,238 --> 00:24:18,301
+all the analog and digital variants of this color model.
+
+325
+00:24:18,912 --> 00:24:22,983
+The U and V chroma channels can have the same resolution as the Y channel, 
+
+326
+00:24:22,983 --> 00:24:28,674
+but because the human eye has far less spatial color resolution than spatial luminosity resolution, 
+
+327
+00:24:28,674 --> 00:24:34,346
+chroma resolution is usually halved or even quartered in the horizontal direction, the vertical direction, 
+
+328
+00:24:34,346 --> 00:24:39,528
+or both, usually without any significant impact on the apparent raw image quality. 
+
+329
+00:24:39,528 --> 00:24:43,942
+Practically every possible subsampling variant has been used at one time or another,
+
+330
+00:24:43,942 --> 00:24:46,875
+but the common choices today are 
+
+331
+00:24:46,875 --> 00:24:51,187
+4:4:4 video, which isn't actually subsampled at all, 
+
+332
+00:24:51,187 --> 00:24:56,711
+4:2:2 video in which the horizontal resolution of the U and V channels is halved, 
+
+333
+00:24:56,711 --> 00:25:02,587
+and most common of all, 4:2:0 video in which both the horizontal and vertical resolutions 
+
+334
+00:25:02,587 --> 00:25:08,897
+of the chroma channels are halved, resulting in U and V planes that are each one quarter the size of Y.
+
+335
+00:25:08,897 --> 00:25:17,096
+The terms 4:2:2, 4:2:0, 4:1:1 and so on and so forth, aren't complete descriptions of a chroma subsampling. 
+
+336
+00:25:17,096 --> 00:25:21,186
+There are multiple possible ways to position the chroma pixels relative to luma, 
+
+337
+00:25:21,186 --> 00:25:24,776
+and again, several variants are in active use for each subsampling.  
+
+338
+00:25:24,776 --> 00:25:32,502
+For example, motion JPEG, MPEG-1 video, MPEG-2 video, DV, Theora and WebM all use 
+
+339
+00:25:32,502 --> 00:25:38,137
+or can use 4:2:0 subsampling, but they site the chroma pixels three different ways.
+
+340
+00:25:38,498 --> 00:25:43,023
+Motion JPEG, MPEG-1 video, Theora and WebM all site chroma pixels 
+
+341
+00:25:43,023 --> 00:25:46,345
+between luma pixels both horizontally and vertically.
+
+342
+00:25:46,345 --> 00:25:51,989
+MPEG-2 sites chroma pixels between lines, but horizontally aligned with every other luma pixel. 
+
+343
+00:25:51,989 --> 00:25:57,106
+Interlaced modes complicate things somewhat, resulting in a siting arrangement that's a tad bizarre.
+
+344
+00:25:57,106 --> 00:26:00,909
+And finally PAL-DV, which is always interlaced, places the chroma pixels 
+
+345
+00:26:00,909 --> 00:26:04,398
+in the same position as every other luma pixel in the horizontal direction, 
+
+346
+00:26:04,398 --> 00:26:07,303
+and vertically alternates chroma channel on each line.
+
+347
+00:26:07,683 --> 00:26:12,282
+That's just 4:2:0 video. I'll leave the other subsamplings as homework for the viewer.  
+
+348
+00:26:12,282 --> 00:26:14,882
+You've got the basic idea, moving on.
+
+349
+00:26:15,511 --> 00:26:21,128
+In audio, we always represent multiple channels in a PCM stream by interleaving the samples 
+
+350
+00:26:21,128 --> 00:26:26,383
+from each channel in order. Video uses both packed formats that interleave the color channels, 
+
+351
+00:26:26,383 --> 00:26:30,584
+as well as planar formats that keep the pixels from each channel together in separate planes 
+
+352
+00:26:30,584 --> 00:26:35,415
+stacked in order in the frame. There are at least 50 different formats in these two broad categories 
+
+353
+00:26:35,415 --> 00:26:41,549
+with possibly ten or fifteen in common use. Each chroma subsampling and different bit-depth requires 
+
+354
+00:26:41,549 --> 00:26:46,574
+a different packing arrangement,  and so a different pixel format.  For a given unique subsampling, 
+
+355
+00:26:46,574 --> 00:26:50,858 
+there are usually also several equivalent formats that consist of trivial channel order 
+
+356
+00:26:50,858 --> 00:26:55,966
+rearrangements or repackings due either to convenience once-upon-a-time on some particular 
+
+357
+00:26:55,966 --> 00:27:00,352
+piece of hardware or sometimes just good old-fashioned spite.
+
+358
+00:27:00,352 --> 00:27:04,692
+Pixel formats are described by a unique name or fourcc code.  
+
+359
+00:27:04,692 --> 00:27:08,115
+There are quite a few of these and there's no sense going over each one now.
+
+360
+00:27:08,115 --> 00:27:13,704
+Google is your friend.  Be aware that fourcc codes for raw video specify the pixel arrangement 
+
+361
+00:27:13,704 --> 00:27:20,339
+and chroma subsampling, but generally don't imply anything certain about chroma siting or color space.  
+
+362
+00:27:20,339 --> 00:27:25,807
+YV12 video, to pick one, can use JPEG, MPEG-2 or DV chroma siting, 
+
+363
+00:27:25,807 --> 00:27:28,991
+and any one of several YUV colorspace definitions.
+
+364
+00:27:29,472 --> 00:27:33,913
+That wraps up our not so quick and yet very incomplete tour of raw video. 
+
+365
+00:27:33,913 --> 00:27:38,651
+The good news is we can already get quite a lot of real work done using that overview. 
+
+366
+00:27:38,651 --> 00:27:42,528
+In plenty of situations, a frame of video data is a frame of video data.  
+
+367
+00:27:42,528 --> 00:27:46,451
+The details matter, greatly, when it comes time to write software, 
+
+368
+00:27:46,452 --> 00:27:52,086
+but for now I am satisfied that the esteemed viewer is broadly aware of the relevant issues.
+
+369
+00:27:55,640 --> 00:27:59,230
+So. We have audio data. We have video data. 
+
+370
+00:27:59,230 --> 00:28:03,246
+What remains is the more familiar non-signal data and straight up engineering 
+
+371
+00:28:03,246 --> 00:28:07,410
+that software developers are used to. And plenty of it!
+
+372
+00:28:07,928 --> 00:28:11,768 
+Chunks of raw audio and video data have no externally visible structure, 
+
+373
+00:28:11,768 --> 00:28:15,173
+but they're often uniformly sized.  We could just string them together 
+
+374
+00:28:15,173 --> 00:28:18,097
+in a rigid pre-determined ordering for streaming and storage, 
+
+375
+00:28:18,097 --> 00:28:21,040
+and some simple systems do approximately that. 
+
+376
+00:28:21,040 --> 00:28:24,195
+Compressed frames though aren't necessarily a predictable size, 
+
+377
+00:28:24,195 --> 00:28:29,405
+and we usually want some flexibility in using a range of different data types in streams.
+
+378
+00:28:29,405 --> 00:28:34,281
+If we string random formless data together, we lose the boundaries that separate frames 
+
+379
+00:28:34,281 --> 00:28:37,871
+and don't necessarily know what data belongs to which streams.  
+
+380
+00:28:37,871 --> 00:28:42,192
+A stream needs some generalized structure to be generally useful.
+
+381
+00:28:42,192 --> 00:28:46,606
+In addition to our signal data, we also have our PCM and video parameters.  
+
+382
+00:28:46,606 --> 00:28:49,752
+There's probably plenty of other metadata we also want to deal with, 
+
+383
+00:28:49,752 --> 00:28:55,415
+like audio tags and video chapters and subtitles, all essential components of rich media.  
+
+384
+00:28:55,415 --> 00:29:01,633
+It makes sense to place this metadata, that is,  data about the data, within the media itself.
+
+385
+00:29:01,633 --> 00:29:06,445
+Storing and structuring formless data and disparate metadata is the job of a container.  
+
+386
+00:29:06,445 --> 00:29:09,221
+Containers provide framing for the data blobs, 
+
+387
+00:29:09,221 --> 00:29:12,015
+interleave and identify multiple data streams, 
+
+388
+00:29:12,015 --> 00:29:15,337
+provide timing information, and store the metadata necessary 
+
+389
+00:29:15,337 --> 00:29:19,140
+to parse, navigate, manipulate and present the media.  
+
+390
+00:29:19,140 --> 00:29:22,222
+In general, any container can hold any kind of data.  
+
+391
+00:29:22,222 --> 00:29:24,970
+And data can be put into any container.
+
+392
+00:29:28,801 --> 00:29:32,391 
+In the past thirty minutes, we've covered digital audio, video, 
+
+393
+00:29:32,391 --> 00:29:35,435
+some history, some math and a little engineering. 
+
+394
+00:29:35,435 --> 00:29:39,377
+We've barely scratched the surface, but it's time for a well earned break.
+
+395
+00:29:41,107 --> 00:29:45,373
+There's so much more to talk about, so I hope you'll join me again in our next episode.  
+
+396
+00:29:45,373 --> 00:29:47,159
+Until then--- Cheers!
+
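As a rough illustration of the 4:4:4, 4:2:2 and 4:2:0 subsamplings described in cues 331 to 334 above, the following Python sketch computes plane dimensions and raw frame sizes for a planar frame. The helper names `plane_sizes` and `frame_bytes` are hypothetical, and rounding odd dimensions up is an assumption made for illustration; the transcript does not specify either, and real pixel formats differ in how they treat odd sizes.

```python
def plane_sizes(width, height, subsampling):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one planar frame.

    Hypothetical helper, not part of any library. Odd dimensions are
    rounded up, which is an assumption; real formats vary.
    """
    # Horizontal and vertical chroma divisors for each scheme.
    divisors = {
        "4:4:4": (1, 1),  # no subsampling
        "4:2:2": (2, 1),  # chroma halved horizontally
        "4:2:0": (2, 2),  # chroma halved horizontally and vertically
    }
    div_x, div_y = divisors[subsampling]
    chroma_w = (width + div_x - 1) // div_x
    chroma_h = (height + div_y - 1) // div_y
    return (width, height), (chroma_w, chroma_h)


def frame_bytes(width, height, subsampling, bytes_per_sample=1):
    """Total bytes for the Y plane plus the two chroma planes (U and V)."""
    (luma_w, luma_h), (chroma_w, chroma_h) = plane_sizes(width, height, subsampling)
    return (luma_w * luma_h + 2 * chroma_w * chroma_h) * bytes_per_sample


if __name__ == "__main__":
    for scheme in ("4:4:4", "4:2:2", "4:2:0"):
        print(scheme, plane_sizes(640, 480, scheme), frame_bytes(640, 480, scheme))
```

Under these assumptions, a 640x480 frame with 8-bit samples takes 921600 bytes as 4:4:4 and 460800 bytes as 4:2:0, since each 4:2:0 chroma plane is one quarter the size of the luma plane.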

Deleted: websites/xiph.org/video/vid1-en_US.kate
===================================================================
(Binary files differ)

Deleted: websites/xiph.org/video/vid1-en_US.srt
===================================================================
--- websites/xiph.org/video/vid1-en_US.srt	2010-09-23 12:54:20 UTC (rev 17433)
+++ websites/xiph.org/video/vid1-en_US.srt	2010-09-23 13:49:13 UTC (rev 17434)
@@ -1,1588 +0,0 @@
-1
-00:00:08,124 --> 00:00:10,742
-Workstations and high end personal computers have been able to
-
-2
-00:00:10,742 --> 00:00:14,749
-manipulate digital audio pretty easily for about fifteen years now.
-
-3
-00:00:14,749 --> 00:00:17,470
-It's only been about five years that a decent workstation's been able
-
-4
-00:00:17,470 --> 00:00:21,643
-to handle raw video without a lot of expensive special purpose hardware.
-
-5
-00:00:21,643 --> 00:00:25,400
-But today even most cheap home PCs have the processor power and
-
-6
-00:00:25,400 --> 00:00:28,092
-storage necessary to really toss raw video around,
-
-7
-00:00:28,092 --> 00:00:30,479
-at least without too much of a struggle. 
-
-8
-00:00:30,479 --> 00:00:33,579
-So now that everyone has all of this cheap capable hardware, 
-
-9
-00:00:33,579 --> 00:00:36,651
-more people, not surprisingly, want to do interesting
-
-10
-00:00:36,651 --> 00:00:39,908
-things with digital media, especially streaming. 
-
-11
-00:00:39,908 --> 00:00:44,017
-YouTube was the first huge success, and now everybody wants in.
-
-12
-00:00:44,017 --> 00:00:47,413
-Well good!  Because this stuff is a lot of fun!
-
-13
-00:00:48,250 --> 00:00:51,179
-It's no problem finding consumers for digital media.  
-
-14
-00:00:51,179 --> 00:00:54,649
-But here, I'd like to address the engineers, the mathematicians, 
-
-15
-00:00:54,649 --> 00:00:57,869
-the hackers, the people who are interested in discovering 
-
-16
-00:00:57,869 --> 00:01:01,302
-and making things and building the technology itself. 
-
-17
-00:01:01,302 --> 00:01:03,282
-The people after my own heart.
-
-18
-00:01:04,250 --> 00:01:08,723
-Digital media, compression especially, is perceived to be super-elite,
-
-19
-00:01:08,723 --> 00:01:12,822
-somehow incredibly more difficult than anything else in computer science. 
-
-20
-00:01:12,822 --> 00:01:15,700
-The big industry players in the field don't mind this perception at all; 
-
-21
-00:01:15,700 --> 00:01:19,734
-it helps justify the staggering number of very basic patents they hold.  
-
-22
-00:01:19,734 --> 00:01:23,870
-They like the image that their media researchers are the best of the best, 
-
-23
-00:01:23,870 --> 00:01:27,738
-so much smarter than anyone else that their brilliant ideas can't 
-
-24
-00:01:27,738 --> 00:01:29,903
-even be understood by mere mortals. 
-
-25
-00:01:30,625 --> 00:01:33,716
-This is bunk.  
-
-26
-00:01:35,205 --> 00:01:38,900
-Digital audio and video and streaming and compression 
-
-27
-00:01:38,900 --> 00:01:42,738
-offer endless deep and stimulating mental challenges, 
-
-28
-00:01:42,738 --> 00:01:44,662
-just like any other discipline. 
-
-29
-00:01:44,662 --> 00:01:47,929
-It seems elite because so few people have been involved.  
-
-30
-00:01:47,929 --> 00:01:51,223
-So few people have been involved perhaps because so few people 
-
-31
-00:01:51,223 --> 00:01:54,665
-could afford the expensive, special-purpose equipment it required. 
-
-32
-00:01:54,665 --> 00:01:58,792
-But today, just about anyone watching this video has a cheap, 
-
-33
-00:01:58,792 --> 00:02:03,317
-general-purpose computer powerful enough to play with the big boys. 
-
-34
-00:02:05,926 --> 00:02:11,108
-There are battles going on today around HTML5 and browsers 
-
-35
-00:02:11,108 --> 00:02:13,671
-and video and open vs. closed. 
-
-36
-00:02:13,671 --> 00:02:17,048
-So now is a pretty good time to get involved.  
-
-37
-00:02:17,048 --> 00:02:20,000
-The easiest place to start is probably understanding 
-
-38
-00:02:20,000 --> 00:02:22,619
-the technology we have right now.
-
-39
-00:02:23,500 --> 00:02:25,071
-This is an introduction. 
-
-40
-00:02:25,071 --> 00:02:28,180
-Since it's an introduction, it glosses over a ton of details 
-
-41
-00:02:28,180 --> 00:02:30,882
-so that the big picture's a little easier to see.
-
-42
-00:02:30,882 --> 00:02:33,908
-Quite a few people watching are going to be way past anything 
-
-43
-00:02:33,908 --> 00:02:36,378
-that I'm talking about, at least for now.  
-
-44
-00:02:36,378 --> 00:02:39,293
-On the other hand, I'm probably going to go too fast for folks 
-
-45
-00:02:39,293 --> 00:02:44,558
-who really are brand new to all of this, so if this is all new, relax. 
-
-46
-00:02:44,558 --> 00:02:48,629
-The important thing is to pick out any ideas  that really grab your imagination.
-
-47
-00:02:48,629 --> 00:02:52,497
-Especially pay attention to the terminology surrounding those ideas, 
-
-48
-00:02:52,479 --> 00:02:56,078
-because with those, and Google, and Wikipedia, you can dig 
-
-49
-00:02:56,078 --> 00:02:57,753
-as deep as interests you.
-
-50
-00:02:57,753 --> 00:03:00,094
-So, without any further ado, 
-
-51
-00:03:00,094 --> 00:03:03,351
-welcome to one hell of a new hobby.
-
-52
-00:03:10,291 --> 00:03:13,030
-Sound is the propagation of pressure waves through air, 
-
-53
-00:03:13,030 --> 00:03:16,981
-spreading out from a source like ripples spread from a stone tossed into a pond.
-
-54
-00:03:16,981 --> 00:03:19,489
-A microphone, or the human ear for that matter, 
-
-55
-00:03:19,489 --> 00:03:22,876
-transforms these passing ripples of pressure into an electric signal.  
-
-56
-00:03:22,876 --> 00:03:25,800
-Right, this is middle school science class, everyone remembers this.
-
-57
-00:03:25,800 --> 00:03:26,771
-Moving on.
-
-58
-00:03:27,465 --> 00:03:32,527
-That audio signal is a one-dimensional function, a single value varying over time.  
-
-59
-00:03:32,527 --> 00:03:34,248
-If we slow the 'scope down a bit... 
-
-60
-00:03:36,450 --> 00:03:38,190
-that should be a little easier to see. 
-
-61
-00:03:38,190 --> 00:03:40,688
-A few other aspects of the signal are important.  
-
-62
-00:03:40,688 --> 00:03:43,418
-It's continuous in both value and time;  
-
-63
-00:03:43,418 --> 00:03:46,813
-that is, at any given time it can have any real value, 
-
-64
-00:03:46,813 --> 00:03:50,228
-and there's a smoothly varying value at every point in time.  
-
-65
-00:03:50,228 --> 00:03:52,439
-No matter how much we zoom in,
-
-66
-00:03:54,068 --> 00:03:58,510 
-there are no discontinuities, no singularities, no instantaneous steps 
-
-67
-00:03:58,510 --> 00:04:01,285
-or points where the signal ceases to exist. 
-
-68
-00:04:03,247 --> 00:04:08,475
-It's defined everywhere. Classic continuous math works very well on these signals.
-
-69
-00:04:11,001 --> 00:04:15,378
-A digital signal on the other hand is discrete in both value and time.
-
-70
-00:04:15,378 --> 00:04:19,107
-In the simplest and most common system, called Pulse Code Modulation,
-
-71
-00:04:19,107 --> 00:04:24,058
-one of a fixed number of possible values directly represents the instantaneous signal amplitude 
-
-72
-00:04:24,058 --> 00:04:30,165
-at points in time spaced a fixed distance apart. The end result is a stream of digits.
-
-73
-00:04:30,674 --> 00:04:35,309
-Now this looks an awful lot like this.  
-
-74
-00:04:35,309 --> 00:04:38,964
-It seems intuitive that we should somehow be able to rigorously transform 
-
-75
-00:04:38,964 --> 00:04:44,683
-one into the other, and good news, the Sampling Theorem says we can and tells us how. 
-
-76
-00:04:44,683 --> 00:04:48,477
-Published in its most recognizable form by Claude Shannon in 1949
-
-77
-00:04:48,477 --> 00:04:52,409
-and built on the work of Nyquist, and Hartley, and tons of others, 
-
-78
-00:04:52,409 --> 00:04:56,138
-the sampling theorem states that not only can we go back and forth between 
-
-79
-00:04:56,138 --> 00:05:00,913
-analog and digital, but also lays down a set of conditions for which conversion 
-
-80
-00:05:00,913 --> 00:05:06,779
-is lossless and the two representations become equivalent and interchangeable.  
-
-81
-00:05:06,779 --> 00:05:10,601
-When the lossless conditions aren't met, the sampling theorem tells us 
-
-82
-00:05:10,601 --> 00:05:14,247
-how and how much information is lost or corrupted.
-
-83
-00:05:14,900 --> 00:05:21,270
-Up until very recently, analog technology was the basis for practically everything done with audio, 
-
-84
-00:05:21,270 --> 00:05:25,267
-and that's not because most audio comes from an originally analog source.
-
-85
-00:05:25,267 --> 00:05:28,450
-You may also think that since computers are fairly recent, 
-
-86
-00:05:28,450 --> 00:05:31,643
-analog signal technology must have come first.  
-
-87
-00:05:31,643 --> 00:05:34,428
-Nope. Digital is actually older.  
-
-88
-00:05:34,428 --> 00:05:37,611
-The telegraph predates the telephone by half a century 
-
-89
-00:05:37,611 --> 00:05:41,951
-and was already fully mechanically automated by the 1860s, sending coded, 
-
-90
-00:05:41,951 --> 00:05:46,476
-multiplexed digital signals long distances. You know... Tickertape. 
-
-91
-00:05:46,476 --> 00:05:50,427
-Harry Nyquist of Bell Labs was researching telegraph pulse transmission 
-
-92
-00:05:50,427 --> 00:05:53,027
-when he published his description of what later became known 
-
-93
-00:05:53,027 --> 00:05:57,219
-as the Nyquist frequency, the core concept of the sampling theorem.  
-
-94
-00:05:57,219 --> 00:06:01,642
-Now, it's true the telegraph was transmitting symbolic information, text, 
-
-95
-00:06:01,642 --> 00:06:06,883
-not a digitized analog signal, but with the advent of the telephone and radio,
-
-96
-00:06:06,883 --> 00:06:12,000
-analog and digital signal technology progressed rapidly and side-by-side.
-
-97
-00:06:12,699 --> 00:06:18,732
-Audio had always been manipulated as an analog signal because, well, gee it's so much easier.  
-
-98
-00:06:18,732 --> 00:06:23,257
-A second-order lowpass filter, for example, requires two passive components.  
-
-99
-00:06:23,257 --> 00:06:26,505
-An all-analog short-time Fourier transform, a few hundred.  
-
-100
-00:06:26,505 --> 00:06:30,752
-Well, maybe a thousand if you want to build something really fancy.  
-
-101
-00:06:31,844 --> 00:06:35,989
-Processing signals digitally requires millions to billions of transistors 
-
-102
-00:06:35,989 --> 00:06:40,366
-running at microwave frequencies, support hardware at very least to digitize 
-
-103
-00:06:40,366 --> 00:06:43,836
-and reconstruct the analog signals, a complete software ecosystem 
-
-104
-00:06:43,836 --> 00:06:47,362
-for programming and controlling that billion-transistor juggernaut,
-
-105
-00:06:47,362 --> 00:06:51,091
-digital storage just in case you want to keep any of those bits for later...
-
-106
-00:06:51,091 --> 00:06:56,171
-So we come to the conclusion that analog is the only practical way to do much with audio...
-
-107
-00:06:56,171 --> 00:07:07,019
-well, unless you happen to have a billion transistors and all the other things just lying around. 
-
-108
-00:07:07,850 --> 00:07:12,660
-And since we do, digital signal processing becomes very attractive.
-
-109
-00:07:13,363 --> 00:07:18,906
-For one thing, analog componentry just doesn't have the flexibility of a general purpose computer.
-
-110
-00:07:18,906 --> 00:07:21,182
-Adding a new function to this beast... 
-
-111
-00:07:22,191 --> 00:07:24,578
-yeah, it's probably not going to happen.  
-
-112
-00:07:24,578 --> 00:07:26,567
-On a digital processor though...
-
-113
-00:07:28,668 --> 00:07:34,127
-...just write a new program. Software isn't trivial, but it is a lot easier.
-
-114
-00:07:34,127 --> 00:07:39,550
-Perhaps more importantly though every analog component is an approximation. 
-
-115
-00:07:39,550 --> 00:07:44,352
-There's no such thing as a perfect transistor, or a perfect inductor, or a perfect capacitor.  
-
-116
-00:07:44,352 --> 00:07:51,569
-In analog, every component adds noise and distortion, usually not very much, but it adds up. 
-
-117
-00:07:51,569 --> 00:07:55,669
-Just transmitting an analog signal, especially over long distances,
-
-118
-00:07:55,669 --> 00:08:00,434
-progressively, measurably, irretrievably corrupts it.  
-
-119
-00:08:00,434 --> 00:08:06,513
-Besides, all of those single-purpose analog components take up a lot of space.  
-
-120
-00:08:06,513 --> 00:08:09,946
-Two lines of code on the billion transistors back here 
-
-121
-00:08:09,946 --> 00:08:14,702
-can implement a filter that would require an inductor the size of a refrigerator.
-
-122
-00:08:14,702 --> 00:08:17,941
-Digital systems don't have these drawbacks.  
-
-123
-00:08:17,941 --> 00:08:24,335
-Digital signals can be stored, copied, manipulated and transmitted without adding any noise or distortion. 
-
-124
-00:08:24,335 --> 00:08:26,889
-We do use lossy algorithms from time to time, 
-
-125
-00:08:26,889 --> 00:08:31,284
-but the only unavoidably non-ideal steps are digitization and reconstruction,
-
-126
-00:08:31,284 --> 00:08:35,929
-where digital has to interface with all of that messy analog.  
-
-127
-00:08:35,929 --> 00:08:40,750
-Messy or not, modern conversion stages are very, very good.  
-
-128
-00:08:40,750 --> 00:08:45,849
-By the standards of our ears, we can consider them practically lossless as well.
-
-129
-00:08:45,849 --> 00:08:50,429
-With a little extra hardware, then, most of which is now small and inexpensive 
-
-130
-00:08:50,429 --> 00:08:55,379
-due to our modern industrial infrastructure, digital audio is the clear winner over analog.
-
-131
-00:08:55,379 --> 00:09:00,857
-So let us then go about storing it, copying it, manipulating it, and transmitting it.
-
-132
-00:09:04,956 --> 00:09:08,639
-Pulse Code Modulation is the most common representation for raw audio.  
-
-133
-00:09:08,639 --> 00:09:13,867
-Other practical representations do exist, for example the Sigma-Delta coding used by the SACD, 
-
-134
-00:09:13,867 --> 00:09:16,625
-which is a form of Pulse Density Modulation.  
-
-135
-00:09:16,625 --> 00:09:19,687
-That said, Pulse Code Modulation is far and away dominant, 
-
-136
-00:09:19,687 --> 00:09:22,158
-mainly because it's so mathematically convenient.  
-
-137
-00:09:22,158 --> 00:09:26,350
-An audio engineer can spend an entire career without running into anything else.
-
-138
-00:09:26,350 --> 00:09:29,135
-PCM encoding can be characterized in three parameters,
-
-139
-00:09:29,135 --> 00:09:34,187
-making it easy to account for every possible PCM variant with mercifully little hassle.
-
-140
-00:09:34,187 --> 00:09:36,426
-The first parameter is the sampling rate.  
-
-141
-00:09:36,426 --> 00:09:40,886
-The highest frequency an encoding can represent is called the Nyquist Frequency.  
-
-142
-00:09:40,886 --> 00:09:45,124
-The Nyquist frequency of PCM happens to be exactly half the sampling rate.
-
-143
-00:09:45,124 --> 00:09:51,389
-Therefore the sampling rate directly determines the highest possible frequency in the digitized signal.
-
-144
-00:09:51,389 --> 00:09:56,515
-Analog telephone systems traditionally band-limited voice channels to just under 4kHz, 
-
-145
-00:09:56,515 --> 00:10:02,224
-so digital telephony and most classic voice applications use an 8kHz sampling rate, 
-
-146
-00:10:02,224 --> 00:10:07,277
-the minimum sampling rate necessary to capture the entire bandwidth of a 4kHz channel.  
-
-147
-00:10:07,227 --> 00:10:14,263
-This is what an 8kHz sampling rate sounds like--- a bit muffled but perfectly intelligible for voice.  
-
-148
-00:10:17,263 --> 00:10:18,149
-This is the lowest sampling rate that's ever been used widely in practice.
-
-149
-00:10:18,149 --> 00:10:23,322
-From there, as power, and memory, and storage increased, consumer computer hardware
-
-150
-00:10:23,322 --> 00:10:29,642
-went to offering 11, and then 16, and then 22, and then 32kHz sampling.  
-
-151
-00:10:29,642 --> 00:10:33,491
-With each increase in the sampling rate and the Nyquist frequency, 
-
-152
-00:10:33,491 --> 00:10:38,302
-it's obvious that the high end becomes a little clearer and the sound more natural.
-
-153
-00:10:38,301 --> 00:10:44,576
-The Compact Disc uses a 44.1kHz sampling rate, which is again slightly better than 32kHz, 
-
-154
-00:10:44,576 --> 00:10:46,788
-but the gains are becoming less distinct.  
-
-155
-00:10:46,788 --> 00:10:52,053
-44.1kHz is a bit of an oddball choice, especially given that it hadn't been used  for anything 
-
-156
-00:10:52,053 --> 00:10:56,559
-prior to the compact disc, but the huge success of the CD has made it a common rate.
-
-157
-00:10:56,559 --> 00:11:01,195
-The most common hi-fidelity sampling rate aside from the CD is 48kHz.
-
-158
-00:11:05,710 --> 00:11:08,597
-There's virtually no audible difference between the two.  
-
-159
-00:11:08,597 --> 00:11:13,640
-This video, or at least the original version of it, was shot and produced with 48kHz audio, 
-
-160
-00:11:13,640 --> 00:11:18,545
-which happens to be the original standard for high-fidelity audio with video.
-
-161
-00:11:18,545 --> 00:11:25,100
-Super-hi-fidelity sampling rates of 88, and 96, and 192kHz have also appeared. 
-
-162
-00:11:25,100 --> 00:11:30,888
-The reason for the sampling rates beyond 48kHz isn't to extend the audible high frequencies further. 
-
-163
-00:11:30,888 --> 00:11:32,489
-It's for a different reason.
-
-164
-00:11:32,896 --> 00:11:37,319
-Stepping back for just a second, the French mathematician Jean Baptiste Joseph Fourier 
-
-165
-00:11:37,319 --> 00:11:42,353
-showed that we can also think of signals like audio as a set of component frequencies.  
-
-166
-00:11:42,353 --> 00:11:45,841
-This frequency domain representation is equivalent to the time representation; 
-
-167
-00:11:45,841 --> 00:11:49,719
-the signal is exactly the same, we're just looking at it a different way.  
-
-168
-00:11:49,719 --> 00:11:56,131
-Here we see the frequency domain representation of a hypothetical analog signal we intend to digitally sample.
-
-169
-00:11:56,131 --> 00:11:59,888
-The sampling theorem tells us two essential things about the sampling process. 
-
-170
-00:11:59,888 --> 00:12:04,727
-First, that a digital signal can't represent any frequencies above the Nyquist frequency. 
-
-171
-00:12:04,727 --> 00:12:10,640
-Second, and this is the new part, if we don't remove those frequencies with a lowpass filter before sampling, 
-
-172
-00:12:10,640 --> 00:12:16,414
-the sampling process will fold them down into the representable frequency range as aliasing distortion.
-
-173
-00:12:16,414 --> 00:12:20,069
-Aliasing, in a nutshell, sounds freakin' awful, 
-
-174
-00:12:20,069 --> 00:12:25,242
-so it's essential to remove any beyond-Nyquist frequencies before sampling and after reconstruction.
-
-175
-00:12:25,871 --> 00:12:31,265
-Human frequency perception is considered to extend to about 20kHz. 
-
-176
-00:12:31,265 --> 00:12:37,548
-In 44.1 or 48kHz sampling, the lowpass before the sampling stage has to be extremely sharp 
-
-177
-00:12:37,548 --> 00:12:42,101
-to avoid cutting any audible frequencies below 20kHz 
-
-178
-00:12:42,101 --> 00:12:49,439
-but still not allow frequencies above the Nyquist to leak forward into the sampling process.  
-
-179
-00:12:49,439 --> 00:12:55,342
-This is a difficult filter to build and no practical filter succeeds completely. 
-
-180
-00:12:55,342 --> 00:13:00,024
-If the sampling rate is 96kHz or 192kHz on the other hand, 
-
-181
-00:13:00,024 --> 00:13:07,223
-the lowpass has an extra octave or two for its transition band. This is a much easier filter to build.  
-
-182
-00:13:07,223 --> 00:13:14,348
-Sampling rates beyond 48kHz are actually one of those messy analog stage compromises.
-
-183
-00:13:15,014 --> 00:13:20,844
-The second fundamental PCM parameter is the sample format, that is, the format of each digital number.  
-
-184
-00:13:20,844 --> 00:13:26,285
-A number is a number, but a number can be represented in bits a number of different ways.
-
-185
-00:13:26,942 --> 00:13:30,902
-Early PCM was eight bit linear, encoded as an unsigned byte.  
-
-186
-00:13:30,902 --> 00:13:37,028
-The dynamic range is limited to about 50dB and the quantization noise, as you can hear, is pretty severe. 
-
-187
-00:13:37,028 --> 00:13:39,970
-Eight bit audio is vanishingly rare today.
-
-188
-00:13:41,007 --> 00:13:47,484
-Digital telephony typically uses one of two related non-linear eight bit encodings called A-law and mu-law.
-
-189
-00:13:47,484 --> 00:13:51,287
-These formats encode a roughly 14 bit dynamic range into eight bits 
-
-190
-00:13:51,287 --> 00:13:54,674
-by spacing the higher amplitude values farther apart. 
-
-191
-00:13:54,674 --> 00:13:59,226
-A-law and mu-law obviously improve quantization noise compared to linear 8-bit, 
-
-192
-00:13:59,226 --> 00:14:03,557
-and voice harmonics especially hide the remaining quantization noise well. 
-
-193
-00:14:03,557 --> 00:14:08,248
-All three eight bit encodings, linear, A-law, and mu-law, are typically paired 
-
-194
-00:14:08,248 --> 00:14:13,328
-with an 8kHz sampling rate, though I'm demonstrating them here at 48kHz.
-
-195
-00:14:13,328 --> 00:14:18,491
-Most modern PCM uses 16 or 24 bit two's-complement signed integers to encode 
-
-196
-00:14:18,491 --> 00:14:23,858
-the range from negative infinity to zero decibels in 16 or 24 bits of precision. 
-
-197
-00:14:23,858 --> 00:14:27,800
-The maximum absolute value corresponds to zero decibels. 
-
-198
-00:14:27,800 --> 00:14:31,584
-As with all the sample formats so far, signals beyond zero decibels 
-
-199
-00:14:31,584 --> 00:14:35,619
-and thus beyond the maximum representable range are clipped.
-
-200
-00:14:35,619 --> 00:14:41,199
-In mixing and mastering, it's not unusual to use floating point numbers for PCM instead of integers.  
-
-201
-00:14:41,199 --> 00:14:47,222
-A 32 bit IEEE754 float, that's the normal kind of floating point you see on current computers, 
-
-202
-00:14:47,222 --> 00:14:52,793
-has 24 bits of resolution, but an eight bit floating point exponent increases the representable range.
-
-203
-00:14:52,793 --> 00:14:57,040
-Floating point usually represents zero decibels as +/-1.0, 
-
-204
-00:14:57,040 --> 00:15:00,547
-and because floats can obviously represent considerably beyond that, 
-
-205
-00:15:00,547 --> 00:15:05,220
-temporarily exceeding zero decibels during the mixing process doesn't cause clipping.
-
-206
-00:15:05,220 --> 00:15:11,077 
-Floating point PCM takes up more space, so it tends to be used only as an intermediate production format.
-
-207
-00:15:11,077 --> 00:15:15,796
-Lastly, most general purpose computers still read and write data in octet bytes, 
-
-208
-00:15:15,796 --> 00:15:18,489
-so it's important to remember that samples bigger than eight bits 
-
-209
-00:15:18,489 --> 00:15:22,838
-can be in big or little endian order, and both endiannesses are common.  
-
-210
-00:15:22,838 --> 00:15:28,751
-For example, Microsoft WAV files are little endian, and Apple AIFC files tend to be big-endian.  
-
-211
-00:15:28,751 --> 00:15:30,139
-Be aware of it.
-
-212
-00:15:30,870 --> 00:15:34,071
-The third PCM parameter is the number of channels.  
-
-213
-00:15:34,071 --> 00:15:38,485
-The convention in raw PCM is to encode multiple channels by interleaving the samples 
-
-214
-00:15:38,485 --> 00:15:43,398
-of each channel together into a single stream.  Straightforward and extensible.
-
-215
-00:15:43,398 --> 00:15:47,701
-And that's it!  That describes every PCM representation ever. 
-
-216
-00:15:47,701 --> 00:15:51,578
-Done. Digital audio is _so easy_!  
-
-217
-00:15:51,578 --> 00:15:56,436
-There's more to do of course, but at this point we've got a nice useful chunk of audio data, 
-
-218
-00:15:56,436 --> 00:15:58,092
-so let's get some video too.
-
-219
-00:16:02,571 --> 00:16:08,798
-One could think of video as being like audio but with two additional spatial dimensions, X and Y, 
-
-220
-00:16:08,798 --> 00:16:12,787
-in addition to the dimension of time. This is mathematically sound.  
-
-221
-00:16:12,787 --> 00:16:19,097
-The Sampling Theorem applies to all three video dimensions just as it does the single time dimension of audio.
-
-222
-00:16:19,097 --> 00:16:25,815
-Audio and video are obviously quite different in practice. For one, compared to audio, video is huge. 
-
-223
-00:16:25,815 --> 00:16:29,294
-Raw CD audio is about 1.4 megabits per second. 
-
-224
-00:16:29,294 --> 00:16:33,958
-Raw 1080i HD video is over 700 megabits per second. 
-
-225
-00:16:33,958 --> 00:16:40,056
-That's more than 500 times more data to capture, process and store per second.  
-
-226
-00:16:40,056 --> 00:16:43,711
-By Moore's law... that's... let's see... roughly eight doublings times two years, 
-
-227
-00:16:43,711 --> 00:16:47,838
-so yeah, computers requiring about an extra fifteen years to handle raw video 
-
-228
-00:16:47,838 --> 00:16:51,252
-after getting raw audio down pat was about right.
-
-229
-00:16:51,252 --> 00:16:55,425
-Basic raw video is also just more complex than basic raw audio. 
-
-230
-00:16:55,425 --> 00:16:58,599
-The sheer volume of data currently necessitates a representation 
-
-231
-00:16:58,599 --> 00:17:02,106 
-more efficient than the linear PCM used for audio.  
-
-232
-00:17:02,106 --> 00:17:06,705
-In addition, electronic video comes almost entirely from broadcast television alone,
-
-233
-00:17:06,705 --> 00:17:13,423
-and the standards committees that govern broadcast video have always been very concerned with backward compatibility.
-
-234
-00:17:13,423 --> 00:17:17,559  
-Up until just last year in the US, a sixty year old black and white television 
-
-235
-00:17:17,559 --> 00:17:21,038
-could still show a normal analog television broadcast.  
-
-236
-00:17:21,038 --> 00:17:23,879
-That's actually a really neat trick.
-
-237
-00:17:23,879 --> 00:17:28,718
-The downside to backward compatibility is that once a detail makes it into a standard, 
-
-238
-00:17:28,718 --> 00:17:30,985
-you can't ever really throw it out again. 
-
-239
-00:17:30,985 --> 00:17:37,305
-Electronic video has never started over from scratch the way audio has multiple times.  
-
-240
-00:17:37,305 --> 00:17:43,958
-Sixty years worth of clever but obsolete hacks necessitated by the passing technology of a given era 
-
-241
-00:17:43,958 --> 00:17:50,102
-have built up into quite a pile, and because digital standards also come from broadcast television, 
-
-242
-00:17:50,102 --> 00:17:54,664
-all these eldritch hacks have been brought forward into the digital standards as well.
-
-243
-00:17:54,664 --> 00:18:00,022
-In short, there are a whole lot more details involved in digital video than there were with audio. 
-
-244
-00:18:00,022 --> 00:18:05,592
-There's no hope of covering them all completely here, so we'll cover the broad fundamentals.
-
-245
-00:18:06,036 --> 00:18:10,857
-The most obvious raw video parameters are the width and height of the picture in pixels. 
-
-246
-00:18:10,857 --> 00:18:15,882
-As simple as that may sound, the pixel dimensions alone don't actually specify the absolute 
-
-247
-00:18:15,882 --> 00:18:22,016
-width and height of the picture, as most broadcast-derived video doesn't use square pixels.
-
-248
-00:18:22,016 --> 00:18:25,005
-The number of scanlines in a broadcast image was fixed, 
-
-249
-00:18:25,005 --> 00:18:29,021
-but the effective number of horizontal pixels was a function of channel bandwidth. 
-
-250
-00:18:29,021 --> 00:18:31,945
-Effective horizontal resolution could result in pixels that were either 
-
-251
-00:18:31,945 --> 00:18:35,489
-narrower or wider than the spacing between scanlines.
-
-252
-00:18:35,489 --> 00:18:38,395
-Standards have generally specified that digitally sampled video 
-
-253
-00:18:38,395 --> 00:18:41,902
-should reflect the real resolution of the original analog source, 
-
-254
-00:18:41,902 --> 00:18:45,566
-so a large amount of digital video also uses non-square pixels. 
-
-255
-00:18:45,566 --> 00:18:49,924
-For example, a normal 4:3 aspect NTSC DVD is typically encoded 
-
-256
-00:18:49,924 --> 00:18:55,374
-with a display resolution of 704 by 480, a ratio wider than 4:3.  
-
-257
-00:18:55,374 --> 00:18:59,640
-In this case, the pixels themselves are assigned an aspect ratio of 10:11, 
-
-258
-00:18:59,640 --> 00:19:04,553
-making them taller than they are wide and narrowing the image horizontally to the correct aspect.
-
-259
-00:19:04,553 --> 00:19:09,800
-Such an image has to be resampled to show properly on a digital display with square pixels.
-
-260
-00:19:10,253 --> 00:19:15,287
-The second obvious video parameter is the frame rate, the number of full frames per second.  
-
-261
-00:19:15,287 --> 00:19:19,655
-Several standard frame rates are in active use. Digital video, in one form or another, 
-
-262
-00:19:19,655 --> 00:19:23,689
-can use all of them.  Or, any other frame rate.  Or even variable rates 
-
-263
-00:19:23,689 --> 00:19:27,113
-where the frame rate changes adaptively over the course of the video. 
-
-264
-00:19:27,113 --> 00:19:32,998
-The higher the frame rate, the smoother the motion and that brings us, unfortunately, to interlacing.
-
-265
-00:19:32,998 --> 00:19:37,967
-In the very earliest days of broadcast video, engineers sought the fastest practical framerate 
-
-266
-00:19:37,967 --> 00:19:42,075
-to smooth motion and to minimize flicker on phosphor-based CRTs.  
-
-267
-00:19:42,075 --> 00:19:45,277
-They were also under pressure to use the least possible bandwidth 
-
-268
-00:19:45,277 --> 00:19:48,182
-for the highest resolution and fastest frame rate.  
-
-269
-00:19:48,182 --> 00:19:51,208
-Their solution was to interlace the video where the even lines 
-
-270
-00:19:51,208 --> 00:19:54,826
-are sent in one pass and the odd lines in the next.  
-
-271
-00:19:54,826 --> 00:19:59,961
-Each pass is called a field and two fields sort of produce one complete frame.
-
-272
-00:19:59,961 --> 00:20:05,319
-"Sort of", because the even and odd fields aren't actually from the same source frame.  
-
-273
-00:20:05,319 --> 00:20:10,797
-In a 60 field per second picture, the source frame rate is actually 60 full frames per second, 
-
-274
-00:20:10,797 --> 00:20:15,386
-and half of each frame, every other line, is simply discarded.  
-
-275
-00:20:15,386 --> 00:20:20,272
-This is why we can't deinterlace a video simply by combining two fields into one frame;
-
-276
-00:20:20,272 --> 00:20:23,039
-they're not actually from one frame to begin with.
-
-277
-00:20:24,047 --> 00:20:29,683
-The cathode ray tube was the only available display technology for most of the history of electronic video. 
-
-278
-00:20:29,683 --> 00:20:32,949
-A CRT's output brightness is nonlinear, approximately equal 
-
-279
-00:20:32,949 --> 00:20:36,585
-to the input controlling voltage raised to the 2.5th power. 
-
-280
-00:20:36,585 --> 00:20:43,821
-This exponent, 2.5, is designated gamma, and so it's often referred to as the gamma of a display.  
-
-281
-00:20:43,821 --> 00:20:50,493
-Cameras, though, are linear, and if you feed a CRT a linear input signal, it looks a bit like this.
-
-282
-00:20:51,270 --> 00:20:56,637
-As there were originally to be very few cameras, which were fantastically expensive anyway, 
-
-283
-00:20:56,637 --> 00:21:01,634
-and hopefully many, many television sets which best be as inexpensive as possible, 
-
-284
-00:21:01,634 --> 00:21:08,222
-engineers decided to add the necessary gamma correction circuitry to the cameras rather than the sets. 
-
-285
-00:21:08,222 --> 00:21:13,062
-Video transmitted over the airwaves would thus have a nonlinear intensity using the inverse 
-
-286
-00:21:13,062 --> 00:21:18,271
-of the set's gamma exponent, so that once a camera's signal was finally displayed on the CRT, 
-
-287
-00:21:18,271 --> 00:21:23,305
-the overall response of the system from camera to set was back to linear again.
-
-288
-00:21:23,777 --> 00:21:25,118
-Almost.
-
-289
-00:21:30,393 --> 00:21:33,113
-There were also two other tweaks. 
-
-290
-00:21:33,113 --> 00:21:40,442
-A television camera actually uses a gamma exponent that's the inverse of 2.2, not 2.5.  
-
-291
-00:21:40,442 --> 00:21:43,754
-That's just a correction for viewing in a dim environment. 
-
-292
-00:21:43,754 --> 00:21:48,279
-Also, the exponential curve transitions to a linear ramp near black.  
-
-293
-00:21:48,279 --> 00:21:52,360
-That's just an old hack for suppressing sensor noise in the camera.
-
-294
-00:21:54,941 --> 00:21:57,347
-Gamma correction also had a lucky benefit. 
-
-295
-00:21:57,347 --> 00:22:02,214
-It just so happens that the human eye has a perceptual gamma of about 3.  
-
-296
-00:22:02,214 --> 00:22:05,962
-This is relatively close to the CRT's gamma of 2.5. 
-
-297
-00:22:05,962 --> 00:22:10,607
-An image using gamma correction devotes more resolution to lower intensities, 
-
-298
-00:22:10,607 --> 00:22:14,336
-where the eye happens to have its finest intensity discrimination, 
-
-299
-00:22:14,336 --> 00:22:18,222
-and therefore uses the available scale resolution more efficiently.  
-
-300
-00:22:18,222 --> 00:22:22,784
-Although CRTs are currently vanishing, a standard sRGB computer display 
-
-301
-00:22:22,784 --> 00:22:28,419
-still uses a nonlinear intensity curve similar to television, with a linear ramp near black,
-
-302
-00:22:28,419 --> 00:22:32,491
-followed by an exponential curve with a gamma exponent of 2.4. 
-
-303
-00:22:32,491 --> 00:22:36,636
-This encodes a sixteen bit linear range down into eight bits.
-
-304 
-00:22:37,580 --> 00:22:41,790
-The human eye has three apparent color channels, red, green, and blue, 
-
-305
-00:22:41,790 --> 00:22:47,407
-and most displays use these three colors as additive primaries to produce a full range of color output.  
-
-306
-00:22:49,258 --> 00:22:54,190
-The primary pigments in printing are Cyan, Magenta, and Yellow for the same reason; 
-
-307
-00:22:54,190 --> 00:22:59,381
-pigments are subtractive, and each of these pigments subtracts one pure color from reflected light.  
-
-308
-00:22:59,381 --> 00:23:05,682
-Cyan subtracts red, magenta subtracts green, and yellow subtracts blue.
-
-309
-00:23:05,682 --> 00:23:10,919
-Video can be and sometimes is represented with red, green, and blue color channels, 
-
-310
-00:23:10,919 --> 00:23:17,211
-but RGB video is atypical. The human eye is far more sensitive to luminosity than it is to color, 
-
-311
-00:23:17,211 --> 00:23:21,329
-and RGB tends to spread the energy of an image across all three color channels.  
-
-312
-00:23:21,329 --> 00:23:25,326
-That is, the red plane looks like a red version of the original picture, 
-
-313
-00:23:25,326 --> 00:23:28,769
-the green plane looks like a green version of the original picture, 
-
-314
-00:23:28,769 --> 00:23:32,063
-and the blue plane looks like a blue version of the original picture.  
-
-315
-00:23:32,063 --> 00:23:35,705
-Black and white times three.  Not efficient.
-
-316
-00:23:35,706 --> 00:23:39,438
-For those reasons and because, oh hey, television just happened to start out 
-
-317
-00:23:39,438 --> 00:23:45,017
-as black and white anyway, video usually is represented as a high resolution luma channel, 
-
-318
-00:23:45,017 --> 00:23:51,041
-the black & white, along with additional, often lower resolution chroma channels, the color. 
-
-319
-00:23:51,041 --> 00:23:57,074
-The luma channel, Y, is produced by weighting and then adding the separate red, green and blue signals.  
-
-320
-00:23:57,074 --> 00:24:01,867
-The chroma channels U and V are then produced by subtracting the luma signal from blue 
-
-321
-00:24:01,867 --> 00:24:04,070
-and the luma signal from red.
-
-322
-00:24:04,070 --> 00:24:11,750
-When YUV is scaled, offset and quantized for digital video, it's usually more correctly called Y'CbCr, 
-
-323
-00:24:11,750 --> 00:24:15,238
-but the more generic term YUV is widely used to describe 
-
-324
-00:24:15,238 --> 00:24:18,301
-all the analog and digital variants of this color model.
-
-325
-00:24:18,912 --> 00:24:22,983
-The U and V chroma channels can have the same resolution as the Y channel, 
-
-326
-00:24:22,983 --> 00:24:28,674
-but because the human eye has far less spatial color resolution than spatial luminosity resolution, 
-
-327
-00:24:28,674 --> 00:24:34,346
-chroma resolution is usually halved or even quartered in the horizontal direction, the vertical direction, 
-
-328
-00:24:34,346 --> 00:24:39,528
-or both, usually without any significant impact on the apparent raw image quality. 
-
-329
-00:24:39,528 --> 00:24:43,942
-Practically every possible subsampling variant has been used at one time or another,
-
-330
-00:24:43,942 --> 00:24:46,875
-but the common choices today are 
-
-331
-00:24:46,875 --> 00:24:51,187
-4:4:4 video, which isn't actually subsampled at all, 
-
-332
-00:24:51,187 --> 00:24:56,711
-4:2:2 video in which the horizontal resolution of the U and V channels is halved, 
-
-333
-00:24:56,711 --> 00:25:02,587
-and most common of all, 4:2:0 video in which both the horizontal and vertical resolutions 
-
-334
-00:25:02,587 --> 00:25:08,897
-of the chroma channels are halved, resulting in U and V planes that are each one quarter the size of Y.
-
-335
-00:25:08,897 --> 00:25:17,096
-The terms 4:2:2, 4:2:0, 4:1:1 and so on and so forth, aren't complete descriptions of a chroma subsampling. 
-
-336
-00:25:17,096 --> 00:25:21,186
-There's multiple possible ways to position the chroma pixels relative to luma, 
-
-337
-00:25:21,096 --> 00:25:24,776 
-and again, several variants are in active use for each subsampling.  
-
-338
-00:25:24,776 --> 00:25:32,502
-For example, motion JPEG, MPEG-1 video, MPEG-2 video, DV, Theora and WebM all use 
-
-339
-00:25:32,502 --> 00:25:38,137
-or can use 4:2:0 subsampling, but they site the chroma pixels three different ways.
-
-340
-00:25:38,498 --> 00:25:43,023
-Motion JPEG, MPEG1 video, Theora and WebM all site chroma pixels 
-
-341
-00:25:43,023 --> 00:25:46,345
-between luma pixels both horizontally and vertically.
-
-342
-00:25:46,345 --> 00:25:51,989
-MPEG2 sites chroma pixels between lines, but horizontally aligned with every other luma pixel. 
-
-343
-00:25:51,989 --> 00:25:57,106
-Interlaced modes complicate things somewhat, resulting in a siting arrangement that's a tad bizarre.
-
-344
-00:25:57,106 --> 00:26:00,909
-And finally PAL-DV, which is always interlaced, places the chroma pixels 
-
-345
-00:26:00,909 --> 00:26:04,398
-in the same position as every other luma pixel in the horizontal direction, 
-
-346
-00:26:04,398 --> 00:26:07,303
-and vertically alternates chroma channel on each line.
-
-347
-00:26:07,683 --> 00:26:12,282
-That's just 4:2:0 video. I'll leave the other subsamplings as homework for the viewer.
-
-348
-00:26:12,282 --> 00:26:14,882
-You've got the basic idea, moving on.
-
-349
-00:26:15,511 --> 00:26:21,128
-In audio, we always represent multiple channels in a PCM stream by interleaving the samples 
-
-350
-00:26:21,128 --> 00:26:26,383
-from each channel in order. Video uses both packed formats that interleave the color channels, 
-
-351
-00:26:26,383 --> 00:26:30,584
-as well as planar formats that keep the pixels from each channel together in separate planes 
-
-352
-00:26:30,584 --> 00:26:35,415
-stacked in order in the frame. There are at least 50 different formats in these two broad categories 
-
-353
-00:26:35,415 --> 00:26:41,549
-with possibly ten or fifteen in common use. Each chroma subsampling and different bit-depth requires 
-
-354
-00:26:41,549 --> 00:26:46,574
-a different packing arrangement,  and so a different pixel format.  For a given unique subsampling, 
-
-355
-00:26:46,574 --> 00:26:50,858 
-there are usually also several equivalent formats that consist of trivial channel order 
-
-356
-00:26:50,858 --> 00:26:55,966
-rearrangements or repackings due either to convenience once-upon-a-time on some particular 
-
-357
-00:26:55,966 --> 00:27:00,352
-piece of hardware or sometimes just good old-fashioned spite.
-
-358
-00:27:00,352 --> 00:27:04,692
-Pixel formats are described by a unique name or fourcc code.  
-
-359
-00:27:04,692 --> 00:27:08,115
-There are quite a few of these and there's no sense going over each one now.
-
-360
-00:27:08,115 --> 00:27:13,704
-Google is your friend.  Be aware that fourcc codes for raw video specify the pixel arrangement 
-
-361
-00:27:13,704 --> 00:27:20,339
-and chroma subsampling, but generally don't imply anything certain about chroma siting or color space.  
-
-362
-00:27:20,339 --> 00:27:25,807
-YV12 video to pick one, can use JPEG, MPEG-2 or DV chroma siting, 
-
-363
-00:27:25,807 --> 00:27:28,991
-and any one of several YUV colorspace definitions.
-
-364
-00:27:29,472 --> 00:27:33,913
-That wraps up our not so quick and yet very incomplete tour of raw video. 
-
-365
-00:27:33,913 --> 00:27:38,651
-The good news is we can already get quite a lot of real work done using that overview. 
-
-366
-00:27:38,651 --> 00:27:42,528
-In plenty of situations, a frame of video data is a frame of video data.  
-
-367
-00:27:42,528 --> 00:27:46,451
-The details matter, greatly, when it comes time to write software, 
-
-368
-00:27:46,452 --> 00:27:52,086
-but for now I am satisfied that the esteemed viewer is broadly aware of the relevant issues.
-
-369
-00:27:55,640 --> 00:27:59,230
-So. We have audio data. We have video data. 
-
-370
-00:27:59,230 --> 00:28:03,246
-What remains is the more familiar non-signal data and straight up engineering 
-
-371
-00:28:03,246 --> 00:28:07,410
-that software developers are used to. And plenty of it!
-
-372
-00:28:07,928 --> 00:28:11,768 
-Chunks of raw audio and video data have no externally visible structure, 
-
-373
-00:28:11,768 --> 00:28:15,173
-but they're often uniformly sized.  We could just string them together 
-
-374
-00:28:15,173 --> 00:28:18,097
-in a rigid pre-determined ordering for streaming and storage, 
-
-375
-00:28:18,097 --> 00:28:21,040
-and some simple systems do approximately that. 
-
-376
-00:28:21,040 --> 00:28:24,195
-Compressed frames though aren't necessarily a predictable size, 
-
-377
-00:28:24,195 --> 00:28:29,405
-and we usually want some flexibility in using a range of different data types in streams.
-
-378
-00:28:29,405 --> 00:28:34,281
-If we string random formless data together, we lose the boundaries that separate frames 
-
-379
-00:28:34,281 --> 00:28:37,871
-and don't necessarily know what data belongs to which streams.  
-
-380
-00:28:37,871 --> 00:28:42,192
-A stream needs some generalized structure to be generally useful.
-
-381
-00:28:42,192 --> 00:28:46,606
-In addition to our signal data, we also have our PCM and video parameters.  
-
-382
-00:28:46,606 --> 00:28:49,752
-There's probably plenty of other metadata we also want to deal with, 
-
-383
-00:28:49,752 --> 00:28:55,415
-like audio tags and video chapters and subtitles, all essential components of rich media.  
-
-384
-00:28:55,415 --> 00:29:01,633
-It makes sense to place this metadata, that is,  data about the data, within the media itself.
-
-385
-00:29:01,633 --> 00:29:06,445
-Storing and structuring formless data and disparate metadata is the job of a container.  
-
-386
-00:29:06,445 --> 00:29:09,221
-Containers provide framing for the data blobs, 
-
-387
-00:29:09,221 --> 00:29:12,015
-interleave and identify multiple data streams, 
-
-388
-00:29:12,015 --> 00:29:15,337
-provide timing information, and store the metadata necessary 
-
-389
-00:29:15,337 --> 00:29:19,140
-to parse, navigate, manipulate and present the media.  
-
-390
-00:29:19,140 --> 00:29:22,222
-In general, any container can hold any kind of data.  
-
-391
-00:29:22,222 --> 00:29:24,970
-And data can be put into any container.
-
-392
-00:29:28,801 --> 00:29:32,391 
-In the past thirty minutes, we've covered digital audio, video, 
-
-393
-00:29:32,391 --> 00:29:35,435
-some history, some math and a little engineering. 
-
-394
-00:29:35,435 --> 00:29:39,377
-We've barely scratched the surface, but it's time for a well earned break.
-
-395
-00:29:41,107 --> 00:29:45,373
-There's so much more to talk about, so I hope you'll join me again in our next episode.  
-
-396
-00:29:45,373 --> 00:29:47,159
-Until then--- Cheers!
-
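
Note on cues 223-225 of the removed English subtitles (raw CD audio at about 1.4 megabits per second, raw 1080i at over 700): the figures can be checked with a few lines of arithmetic. This is only a rough sketch. The subtitles do not state the video parameters, so the 1920x1080 frame size, 30 full frames per second, and 12 bits per pixel (8-bit 4:2:0) used here are assumptions, and the function names are made up for illustration.

```python
import math


def raw_audio_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second of uncompressed interleaved PCM."""
    return sample_rate_hz * bits_per_sample * channels


def raw_video_bitrate(width, height, frames_per_second, bits_per_pixel):
    """Bits per second of uncompressed video at a fixed frame rate."""
    return width * height * frames_per_second * bits_per_pixel


# CD audio: 44.1kHz, 16 bit, stereo.
cd_bps = raw_audio_bitrate(44_100, 16, 2)

# ASSUMED 1080i parameters: 1920x1080, 60 fields/s = 30 frames/s,
# 8-bit 4:2:0 (8 bits of luma + 4 bits of chroma per pixel on average).
hd_bps = raw_video_bitrate(1920, 1080, 30, 12)

ratio = hd_bps / cd_bps
doublings = math.log2(ratio)  # number of capacity doublings separating the two

print(f"CD audio : {cd_bps / 1e6:.2f} Mbit/s")
print(f"1080i    : {hd_bps / 1e6:.2f} Mbit/s")
print(f"ratio    : {ratio:.0f}x, log2 = {doublings:.2f}")
```

Other plausible choices (4:2:2 chroma, 10-bit samples, 29.97 frames per second) move the video figure, but under any of them it stays in the hundreds of megabits per second.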

Added: websites/xiph.org/video/vid1-fr.srt
===================================================================
--- websites/xiph.org/video/vid1-fr.srt	                        (rev 0)
+++ websites/xiph.org/video/vid1-fr.srt	2010-09-23 13:49:13 UTC (rev 17434)
@@ -0,0 +1,1585 @@
+1
+00:00:08,124 --> 00:00:10,742
+Les stations de travail et ordinateurs haut de gamme sont capables
+
+2
+00:00:10,742 --> 00:00:14,749
+de manipuler le son numérique aisément depuis une quinzaine d'années.
+
+3
+00:00:14,749 --> 00:00:17,470
+C'est seulement depuis à peu près cinq ans qu'un ordinateur décent
+
+4
+00:00:17,470 --> 00:00:21,643
+peut manipuler de la vidéo sans du matériel dédié coûteux.
+
+5
+00:00:21,643 --> 00:00:25,400
+De nos jours, même les ordinateurs bas de gamme sont assez puissants
+
+6
+00:00:25,400 --> 00:00:28,092
+et ont la mémoire nécessaire pour manipuler de la vidéo,
+
+7
+00:00:28,092 --> 00:00:30,479
+sans trop de difficultés.
+
+8
+00:00:30,479 --> 00:00:33,579
+Donc, comme tout le monde a accès à ce bon matériel à bas prix,
+
+9
+00:00:33,579 --> 00:00:36,651
+de plus en plus de gens veulent, évidemment, faire des choses intéressantes
+
+10
+00:00:36,651 --> 00:00:39,908
+avec son et images numériques, en particulier la diffusion continue.
+
+11
+00:00:39,908 --> 00:00:44,017
+YouTube fut le premier gros succès, et tout le monde veut être le suivant.
+
+12
+00:00:44,017 --> 00:00:47,413
+C'est une bonne chose, car tout ceci est fascinant!
+
+13
+00:00:48,250 --> 00:00:51,179
+Il est facile de trouver des utilisateurs pour les média numériques.
+
+14
+00:00:51,179 --> 00:00:54,649
+Mais ici, je m'adresse aux ingénieurs, aux mathématiciens,
+
+15
+00:00:54,649 --> 00:00:57,869
+aux hackers, à ceux qui s'intéressent à la découverte de nouvelles choses
+
+16
+00:00:57,869 --> 00:01:01,302
+et veulent créer et faire avancer la technologie.
+
+17
+00:01:01,302 --> 00:01:03,282
+Les gens qui ont la même passion que moi.
+
+18
+00:01:04,250 --> 00:01:08,723
+Les média numériques, et la compression en particulier, sont vus comme un sujet
+
+19
+00:01:08,723 --> 00:01:12,822
+spécialisé, bien plus compliqué que le reste de l'informatique.
+
+20
+00:01:12,822 --> 00:01:15,700
+Les grandes compagnies dans ce domaine ne font rien pour diminuer cette perception,
+
+21
+00:01:15,700 --> 00:01:19,734
+qui les aide à justifier le nombre extrême de brevets triviaux qu'elles détiennent.
+
+22
+00:01:19,734 --> 00:01:23,870
+Ces compagnies aiment cette image de leurs chercheurs en tant que la crème de la crème,
+
+23
+00:01:23,870 --> 00:01:27,738
+tellement plus intelligents que les autres que leurs idées brillantes
+
+24
+00:01:27,738 --> 00:01:29,903
+ne peuvent être comprises par de simples mortels.
+
+25
+00:01:30,625 --> 00:01:33,716
+Et c'est du n'importe quoi.
+
+26
+00:01:35,205 --> 00:01:38,900
+Son numérique, images numériques, diffusion en continu, compression,
+
+27
+00:01:38,900 --> 00:01:42,738
+ceux-ci offrent des problèmes difficiles et intellectuellement stimulants,
+
+28
+00:01:42,738 --> 00:01:44,662
+comme toute autre discipline.
+
+29
+00:01:44,662 --> 00:01:47,929
+L'apparence de difficulté extrême est due au nombre restreint de personnes dans ce domaine.
+
+30
+00:01:47,929 --> 00:01:51,223
+Ce nombre restreint n'est en fait dû qu'à la rareté du matériel dédié
+
+31
+00:01:51,223 --> 00:01:54,665
+requis jusqu'à aujourd'hui.
+
+32
+00:01:54,665 --> 00:01:58,792
+Mais maintenant, la grande majorité des gens qui regardent cette vidéo
+
+33
+00:01:58,792 --> 00:02:03,317
+ont un ordinateur assez puissant pour jouer dans la cour des grands.
+
+34
+00:02:05,926 --> 00:02:11,108
+Il y a des batailles en cours à propos de HTML5, des navigateurs Web,
+
+35
+00:02:11,108 --> 00:02:13,671
+de la vidéo, et ouvert contre fermé.
+
+36
+00:02:13,671 --> 00:02:17,048
+Alors maintenant est un bon moment pour s'intéresser à tout cela.
+
+37
+00:02:17,048 --> 00:02:20,000
+Où commencer ? Le plus simple est probablement avec la technologie
+
+38
+00:02:20,000 --> 00:02:22,619
+que nous avons maintenant.
+
+39
+00:02:23,500 --> 00:02:25,071
+Ceci est une introduction.
+
+40
+00:02:25,071 --> 00:02:28,180
+Comme toute introduction, beaucoup de détails seront passés sous silence,
+
+41
+00:02:28,180 --> 00:02:30,882
+pour que nous puissions avoir une vue d'ensemble.
+
+42
+00:02:30,882 --> 00:02:33,908
+Beaucoup de personnes connaissent probablement déjà ce dont je vais parler,
+
+43
+00:02:33,908 --> 00:02:36,378
+au moins dans cet épisode.
+
+44
+00:02:36,378 --> 00:02:39,293
+D'autres, par contre, trouveront peut-être que je vais trop vite,
+
+45
+00:02:39,293 --> 00:02:44,558
+s'ils n'ont jamais abordé le sujet; si c'est votre cas, ne vous en faites pas.
+
+46
+00:02:44,558 --> 00:02:48,629
+Le plus important est de retenir les quelques idées qui vous marquent le plus.
+
+47
+00:02:48,629 --> 00:02:52,497
+Faites bien attention à la terminologie qui a rapport à toutes ces idées,
+
+48
+00:02:52,479 --> 00:02:56,078
+puisque avec elle, vous pouvez utiliser Google et Wikipedia pour approfondir
+
+49
+00:02:56,078 --> 00:02:57,753
+vos connaissances à volonté.
+
+50
+00:02:57,753 --> 00:03:00,094
+Donc, sans plus attendre,
+
+51
+00:03:00,094 --> 00:03:03,351
+bienvenue à un passe-temps pas comme les autres.
+
+52
+00:03:10,291 --> 00:03:13,030
+Le son est dû à la propagation d'ondes périodiques de pressions à travers l'air,
+
+53
+00:03:13,030 --> 00:03:16,981
+se répandant depuis la source comme les ondes autour d'une pierre lancée dans l'eau.
+
+54
+00:03:16,981 --> 00:03:19,489
+Un microphone, ou une oreille humaine,
+
+55
+00:03:19,489 --> 00:03:22,876
+transforme ces différences de pression en un signal électrique.
+
+56
+00:03:22,876 --> 00:03:25,800
+La plupart d'entre vous auront vu cela à l'école.
+
+57
+00:03:25,800 --> 00:03:26,771
+Passons à la suite.
+
+58
+00:03:27,465 --> 00:03:32,527
+Un signal audio est une fonction à une dimension, une valeur scalaire changeant avec le temps.
+
+59
+00:03:32,527 --> 00:03:34,248
+Si on ralentit un peu l'oscilloscope...
+
+60
+00:03:36,450 --> 00:03:38,190
+ça devrait être un peu plus facile à voir.
+
+61
+00:03:38,190 --> 00:03:40,688
+Certains autres aspects de ce signal sont importants.
+
+62
+00:03:40,688 --> 00:03:43,418
+Il est continu, en valeur comme en temps;
+
+63
+00:03:43,418 --> 00:03:46,813
+c'est-à-dire qu'à tout instant, il peut avoir une valeur réelle quelconque,
+
+64
+00:03:46,813 --> 00:03:50,228
+et sa valeur change graduellement avec le temps.
+
+65
+00:03:50,228 --> 00:03:52,439
+On peut zoomer autant que l'on veut,
+
+66
+00:03:54,068 --> 00:03:58,510 
+il n'y a ni discontinuités, ni singularités, ni sauts de la valeur,
+
+67
+00:03:58,510 --> 00:04:01,285
+ni de points où le signal disparaît. Il existe sur tout l'axe du temps.
+
+68
+00:04:03,247 --> 00:04:08,475
+Les mathématiques classiques des fonctions continues sont parfaites pour travailler sur ces signaux.
+
+69
+00:04:11,001 --> 00:04:15,378
+Un signal numérique, par contre, est discret, en valeur et en temps.
+
+70
+00:04:15,378 --> 00:04:19,107
+Dans le système le plus simple et le plus répandu, appelé modulation d'impulsion codée (PCM en anglais),
+
+71
+00:04:19,107 --> 00:04:24,058
+une valeur parmi un ensemble prédéfini représente l'amplitude du signal à une série
+
+72
+00:04:24,058 --> 00:04:30,165
+de points équidistants sur l'axe du temps. Le résultat est une série de valeurs.
+
+73
+00:04:30,674 --> 00:04:35,309
+En fait, cela ressemble beaucoup à ceci.
+
+74
+00:04:35,309 --> 00:04:38,964
+Intuitivement, il paraîtrait que l'on devrait pouvoir transformer rigoureusement l'un en l'autre,
+
+75
+00:04:38,964 --> 00:04:44,683
+et, par chance, le théorème de Shannon nous dit que c'est possible, et comment.
+
+76
+00:04:44,683 --> 00:04:48,477
+Publié dans sa forme la plus populaire par Claude Shannon en 1949
+
+77
+00:04:48,477 --> 00:04:52,409
+et s'appuyant sur les travaux de Nyquist, Hartley, et bien d'autres,
+
+78
+00:04:52,409 --> 00:04:56,138
+ce théorème dit que non seulement on peut passer d'analogique en numérique et vice versa,
+
+79
+00:04:56,138 --> 00:05:00,913
+mais donne une série de conditions sous lesquelles la conversion
+
+80
+00:05:00,913 --> 00:05:06,779
+est sans perte, et les deux représentations deviennent équivalentes et interchangeables.
+
+81
+00:05:06,779 --> 00:05:10,601
+Lorsque ces conditions ne sont pas observées, le théorème nous dit
+
+82
+00:05:10,601 --> 00:05:14,247
+combien d'information est perdue, ou corrompue.
+
+83
+00:05:14,900 --> 00:05:21,270
+Jusqu'à récemment, la technologie du son était quasiment toute basée sur l'analogique,
+
+84
+00:05:21,270 --> 00:05:25,267
+et pas seulement parce que la plupart du son provient de sources analogiques.
+
+85
+00:05:25,267 --> 00:05:28,450
+Vous pourriez aussi penser que puisque les ordinateurs sont une technologie récente,
+
+86
+00:05:28,450 --> 00:05:31,643
+la technologie analogique a dû apparaître la première.
+
+87
+00:05:31,643 --> 00:05:34,428
+C'est faux. Le numérique est en fait plus ancien.
+
+88
+00:05:34,428 --> 00:05:37,611
+Le télégraphe a précédé le téléphone d'un demi-siècle
+
+89
+00:05:37,611 --> 00:05:41,951
+et était déjà automatisé vers 1860, envoyant des signaux numériques,
+
+90
+00:05:41,951 --> 00:05:46,476
+multiplexés sur de longues distances. Vous savez... le téléimprimeur.
+
+91
+00:05:46,476 --> 00:05:50,427
+Harry Nyquist, de Bell Labs, faisait de la recherche sur la transmission de signaux
+
+92
+00:05:50,427 --> 00:05:53,027
+par télégraphe lorsqu'il a publié la description de ce qui serait plus tard connu
+
+93
+00:05:53,027 --> 00:05:57,219
+sous le nom de fréquence de Nyquist, le concept de base du théorème de Shannon.
+
+94
+00:05:57,219 --> 00:06:01,642
+Il est vrai que le télégraphe transmet des informations symboliques, du texte,
+
+95
+00:06:01,642 --> 00:06:06,883
+et non un signal analogique numérisé, mais avec l'apparition du téléphone et de la radio,
+
+96
+00:06:06,883 --> 00:06:12,000
+les technologies du signal analogique et numérique ont progressé rapidement en parallèle.
+
+97
+00:06:12,699 --> 00:06:18,732
+Le son a toujours été plus facile à manipuler en tant que signal analogique parce que, eh bien, c'est vraiment bien plus facile.
+
+98
+00:06:18,732 --> 00:06:23,257
+Un filtre passe-bas du deuxième ordre, par exemple, requiert deux composants passifs.
+
+99
+00:06:23,257 --> 00:06:26,505
+Une centaine pour une transformée de Fourier en analogique.
+
+100
+00:06:26,505 --> 00:06:30,752
+Bon, peut-être mille si vous voulez faire quelque chose de compliqué.
+
+101
+00:06:31,844 --> 00:06:35,989
+Manipuler des signaux numériques requiert des millions, voire des milliards de transistors
+
+102
+00:06:35,989 --> 00:06:40,366
+fonctionnant à très haute fréquence, du matériel supplémentaire pour au moins numériser
+
+103
+00:06:40,366 --> 00:06:43,836
+et reconstruire les signaux analogiques, un système logiciel complet
+
+104
+00:06:43,836 --> 00:06:47,362
+pour programmer et contrôler ce géant d'un milliard de transistors,
+
+105
+00:06:47,362 --> 00:06:51,091
+de la mémoire de masse pour stocker ces bits pour usage ultérieur...
+
+106
+00:06:51,091 --> 00:06:56,171
+On en vient donc à la conclusion que l'analogique est la seule manière faisable de travailler avec le son...
+
+107
+00:06:56,171 --> 00:07:07,019
+à moins que vous n'ayez un milliard de transistors et autres accessoires traînant dans le coin.
+
+108
+00:07:07,850 --> 00:07:12,660
+Et comme maintenant on les a, manipuler des signaux numériques devient beaucoup plus attractif.
+
+109
+00:07:13,363 --> 00:07:18,906
+Une raison parmi d'autres: les composants analogiques n'ont pas la flexibilité d'un ordinateur.
+
+110
+00:07:18,906 --> 00:07:21,182
+Ajouter une nouvelle fonctionnalité à ce monstre...
+
+111
+00:07:22,191 --> 00:07:24,578
+Impensable.
+
+112
+00:07:24,578 --> 00:07:26,567
+Sur un processeur numérique, par contre...
+
+113
+00:07:28,668 --> 00:07:34,127
+...on peut juste écrire un nouveau programme. Le logiciel n'est pas trivial, mais c'est quand même beaucoup plus facile.
+
+114
+00:07:34,127 --> 00:07:39,550
+Peut-être même plus important encore, chaque composant analogique crée une approximation du signal.
+
+115
+00:07:39,550 --> 00:07:44,352
+Le transistor parfait n'existe pas plus qu'une inductance parfaite, ou une capacité parfaite.
+
+116
+00:07:44,352 --> 00:07:51,569
+En analogique, chaque composant ajoute du bruit, de la distorsion, et même si c'est peu à chaque pas, cela s'accumule.
+
+117
+00:07:51,569 --> 00:07:55,669
+Le simple fait d'envoyer un signal analogique, surtout sur de grandes distances,
+
+118
+00:07:55,669 --> 00:08:00,434
+corrompt ce signal, progressivement, de manière irréversible.
+
+119
+00:08:00,434 --> 00:08:06,513
+De plus, tous ces composants analogiques à usage unique prennent de la place.
+
+120
+00:08:06,513 --> 00:08:09,946
+Deux lignes de code sur le monstre au milliard de transistors
+
+121
+00:08:09,946 --> 00:08:14,702
+peuvent implémenter un filtre qui aurait besoin d'une inductance de la taille d'un réfrigérateur.
+
+122
+00:08:14,702 --> 00:08:17,941
+Un système numérique n'a pas ces problèmes.
+
+123
+00:08:17,941 --> 00:08:24,335
+Un signal numérique peut être stocké, copié, manipulé et transmis sans ajouter de bruit ou de distorsion.
+
+124
+00:08:24,335 --> 00:08:26,889
+Certes, on utilise parfois des algorithmes à perte, qui dégradent les données,
+
+125
+00:08:26,889 --> 00:08:31,284
+mais les seules opérations qui ne peuvent éviter d'être à perte sont la numérisation et la reconversion vers l'analogique,
+
+126
+00:08:31,284 --> 00:08:35,929
+là où le signal numérique doit s'interfacer avec l'analogique.
+
+127
+00:08:35,929 --> 00:08:40,750
+Toutefois, les systèmes de conversion modernes sont très, très bons.
+
+128
+00:08:40,750 --> 00:08:45,849
+En ce qui concerne nos oreilles, ils sont pour ainsi dire parfaits.
+
+129
+00:08:45,849 --> 00:08:50,429
+Avec un peu de matériel supplémentaire, la plupart étant maintenant compact et peu cher
+
+130
+00:08:50,429 --> 00:08:55,379
+grâce à notre infrastructure industrielle moderne, le son numérique est le gagnant incontestable, comparé à l'analogique.
+
+131
+00:08:55,379 --> 00:09:00,857
+Examinons donc comment stocker, copier, manipuler, et transmettre ce signal.
+
+132
+00:09:04,956 --> 00:09:08,639
+La modulation d'impulsion codée (PCM) est la représentation la plus répandue pour le son.
+
+133
+00:09:08,639 --> 00:09:13,867
+D'autres représentations utiles existent, par exemple le code Sigma-Delta utilisé par le SACD,
+
+134
+00:09:13,867 --> 00:09:16,625
+qui est un type de modulation par densité d'impulsion.
+
+135
+00:09:16,625 --> 00:09:19,687
+Cela dit, PCM est de très loin la plus répandue,
+
+136
+00:09:19,687 --> 00:09:22,158
+pour la principale raison qu'elle est mathématiquement très pratique.
+
+137
+00:09:22,158 --> 00:09:26,350
+Un ingénieur audio peut très bien ne jamais rencontrer une autre représentation durant toute sa carrière.
+
+138
+00:09:26,350 --> 00:09:29,135
+La représentation PCM peut être définie par trois paramètres,
+
+139
+00:09:29,135 --> 00:09:34,187
+ce qui permet de décrire chaque variante possible avec un minimum de problèmes.
+
+140
+00:09:34,187 --> 00:09:36,426
+Le premier paramètre est la fréquence d'échantillonnage.
+
+141
+00:09:36,426 --> 00:09:40,886
+La plus haute fréquence qu'un code peut représenter est appelée la fréquence de Nyquist.
+
+142
+00:09:40,886 --> 00:09:45,124
+La fréquence de Nyquist de PCM n'est autre que la moitié de la fréquence d'échantillonnage.
+
+143
+00:09:45,124 --> 00:09:51,389
+La fréquence d'échantillonnage détermine donc la plus haute fréquence que le signal numérisé peut représenter.
+
+144
+00:09:51,389 --> 00:09:56,515
+Le téléphone analogique utilise traditionnellement des signaux limités en bande passante à presque 4 kHz,
+
+145
+00:09:56,515 --> 00:10:02,224
+menant le téléphone numérique et la plupart des applications manipulant la voix à utiliser une fréquence d'échantillonnage de 8 kHz,
+
+146
+00:10:02,224 --> 00:10:07,277
+qui est la plus petite fréquence d'échantillonnage pouvant représenter la totalité de la bande passante jusqu'à 4 kHz.
+
+147
+00:10:07,277 --> 00:10:14,263
+Une fréquence d'échantillonnage de 8 kHz ressemble à ceci; un peu distordu, mais tout à fait compréhensible pour la voix.
+
+148
+00:10:14,263 --> 00:10:18,149
+C'est la plus petite fréquence d'échantillonnage couramment utilisée en pratique.
+
+149
+00:10:18,149 --> 00:10:23,322
+Partant de là, au fur et à mesure que la puissance et la mémoire disponibles augmentèrent,
+
+150
+00:10:23,322 --> 00:10:29,642
+les ordinateurs sont passés à 11, puis 16, puis 22, et 32 kHz.
+
+151
+00:10:29,642 --> 00:10:33,491
+Avec chaque saut dans la fréquence d'échantillonnage et la fréquence de Nyquist,
+
+152
+00:10:33,491 --> 00:10:38,302
+il va de soi que les hautes fréquences deviennent de plus en plus claires et le son plus naturel.
+
+153
+00:10:38,302 --> 00:10:44,576
+Le Compact Disc utilise une fréquence d'échantillonnage de 44.1 kHz, encore plus élevée que 32 kHz,
+
+154
+00:10:44,576 --> 00:10:46,788
+mais les gains deviennent de moins en moins audibles.
+
+155
+00:10:46,788 --> 00:10:52,053
+44.1 kHz est un choix un peu étrange, surtout que nul ne l'avait utilisé
+
+156
+00:10:52,053 --> 00:10:56,559
+avant le Compact Disc, mais le succès du CD a fait de cette fréquence un choix commun.
+
+157
+00:10:56,559 --> 00:11:01,195
+L'autre fréquence haute fidélité la plus répandue, en dehors de celle du CD, est 48 kHz.
+
+158
+00:11:05,710 --> 00:11:08,597
+Il n'y a quasiment pas de différence audible entre les deux.
+
+159
+00:11:08,597 --> 00:11:13,640
+Cette vidéo, ou du moins la version originale de celle-ci, a été enregistrée et produite avec du son 48 kHz,
+
+160
+00:11:13,640 --> 00:11:18,545
+qui est le standard pour le son de haute fidélité accompagnant de la vidéo.
+
+161
+00:11:18,545 --> 00:11:25,100
+De très hautes fréquences d'échantillonnage de 88, 96, et 192 kHz ont aussi été utilisées.
+
+162
+00:11:25,100 --> 00:11:30,888
+La raison d'être de ces fréquences au-delà de 48 kHz n'est pas de permettre des fréquences audibles supérieures.
+
+163
+00:11:30,888 --> 00:11:32,489
+Il y a une autre raison.
+
+164
+00:11:32,896 --> 00:11:37,319
+Ouvrons une parenthèse un instant: le mathématicien français Jean-Baptiste Joseph Fourier
+
+165
+00:11:37,319 --> 00:11:42,353
+a montré que l'on peut représenter un signal tel que le son en une série de fréquences qui le composent.
+
+166
+00:11:42,353 --> 00:11:45,841
+Cette représentation dans le domaine fréquentiel est équivalente à la représentation dans le domaine temporel;
+
+167
+00:11:45,841 --> 00:11:49,719
+le signal est exactement le même, on le représente juste différemment.
+
+168
+00:11:49,719 --> 00:11:56,131
+Ici, on voit la représentation dans le domaine fréquentiel d'un signal analogique que l'on va numériser.
+
+169
+00:11:56,131 --> 00:11:59,888
+Le théorème de Shannon nous dit deux choses principales à ce propos:
+
+170
+00:11:59,888 --> 00:12:04,727
+Premièrement, un signal numérique ne peut représenter aucune fréquence au-dessus de la fréquence de Nyquist.
+
+171
+00:12:04,727 --> 00:12:10,640
+Deuxièmement, et c'est la nouveauté, si ces fréquences ne sont pas filtrées à l'aide d'un filtre passe-bas avant la numérisation,
+
+172
+00:12:10,640 --> 00:12:16,414
+elles seront repliées dans la gamme de fréquences représentable, produisant une distorsion de repliement (aliasing).
+
+173
+00:12:16,414 --> 00:12:20,069
+Un signal distordu, ça fait mal aux oreilles,
+
+174
+00:12:20,069 --> 00:12:25,242
+c'est pourquoi il est essentiel de filtrer les fréquences au-dessus de la fréquence de Nyquist avant numérisation, et après reconstruction.
+
+175
+00:12:25,871 --> 00:12:31,265
+L'oreille humaine peut percevoir jusqu'à 20 kHz.
+
+176
+00:12:31,265 --> 00:12:37,548
+Pour une numérisation à 44.1 ou 48 kHz, le filtre passe-bas d'avant numérisation doit être très raide
+
+177
+00:12:37,548 --> 00:12:42,101
+pour éviter de couper des fréquences audibles sous 20 kHz
+
+178
+00:12:42,101 --> 00:12:49,439
+sans laisser passer de fréquence au-dessus de la fréquence de Nyquist.
+
+179
+00:12:49,439 --> 00:12:55,342
+Ce type de filtre est difficile à construire, et aucun filtre commun n'y parvient complètement.
+
+180
+00:12:55,342 --> 00:13:00,024
+Pour une fréquence d'échantillonnage de 96 ou 192 kHz, par contre,
+
+181
+00:13:00,024 --> 00:13:07,223
+le filtre a une ou deux octaves de marge pour sa réponse, ce qui le rend beaucoup plus facile à construire.
+
+182
+00:13:07,223 --> 00:13:14,348
+Les fréquences d'échantillonnage de plus de 48 kHz sont en fait l'un des compromis dus aux problèmes de conversion analogique/numérique.
+
+183
+00:13:15,014 --> 00:13:20,844
+Le deuxième paramètre fondamental de PCM est le format d'un échantillon, c'est-à-dire le format de la valeur enregistrée.
+
+184
+00:13:20,844 --> 00:13:26,285
+Un nombre est un nombre, mais il peut être représenté de différentes manières sous forme de bits.
+
+185
+00:13:26,942 --> 00:13:30,902
+Les premiers formats PCM étaient linéaires sur huit bits, codés sur un octet non signé.
+
+186
+00:13:30,902 --> 00:13:37,028
+La gamme dynamique est limitée à approximativement 50 dB, et le bruit de numérisation, comme vous pouvez l'entendre, est considérable.
+
+187
+00:13:37,028 --> 00:13:39,970
+Le son huit bits est maintenant très rare.
+
+188
+00:13:41,007 --> 00:13:47,484
+Le téléphone numérique peut utiliser deux formats proches non linéaires codés sur huit bits, appelés A-law et mu-law.
+
+189
+00:13:47,484 --> 00:13:51,287
+Ces formats peuvent coder à peu près 14 bits de gamme dynamique sur huit bits
+
+190
+00:13:51,287 --> 00:13:54,674
+en plaçant les valeurs les plus hautes de plus en plus écartées.
+
+191
+00:13:54,674 --> 00:13:59,226
+A-law et mu-law permettent un bruit de numérisation plus faible, comparés au huit bits linéaire,
+
+192
+00:13:59,226 --> 00:14:03,557
+et les harmoniques de la voix cachent bien le bruit restant.
+
+193
+00:14:03,557 --> 00:14:08,248
+Ces trois formats, linéaire, A-law, et mu-law, sont généralement utilisés
+
+194
+00:14:08,248 --> 00:14:13,328
+avec une fréquence d'échantillonnage de 8 kHz, mais je les utilise ici à 48 kHz.
+
+195
+00:14:13,328 --> 00:14:18,491
+La plupart des formats PCM modernes utilisent des entiers en complément à deux sur 16 ou 24 bits signés
+
+196
+00:14:18,491 --> 00:14:23,858
+pour représenter une gamme de moins l'infini à zéro décibel.
+
+197
+00:14:23,858 --> 00:14:27,800
+La valeur la plus grande correspond à zéro décibel.
+
+198
+00:14:27,800 --> 00:14:31,584
+Comme dans tous les autres formats mentionnés jusqu'ici, un signal au-delà de zéro décibel,
+
+199
+00:14:31,584 --> 00:14:35,619
+et donc au-delà de la gamme représentable, sera saturé.
+
+200
+00:14:35,619 --> 00:14:41,199
+Pour mixer et finaliser, il n'est pas rare d'utiliser des nombres à virgule flottante, à la place d'entiers.
+
+201
+00:14:41,199 --> 00:14:47,222
+Le format à virgule flottante sur 32 bits IEEE 754 est un format typique sur les ordinateurs contemporains,
+
+202
+00:14:47,222 --> 00:14:52,793
+avec 24 bits de mantisse, et 8 bits d'exposant pour augmenter la gamme représentable.
+
+203
+00:14:52,793 --> 00:14:57,040
+Les nombres à virgule flottante représentent généralement zéro décibel avec +/-1.0,
+
+204
+00:14:57,040 --> 00:15:00,547
+et, comme ces nombres peuvent représenter des valeurs considérablement plus hautes,
+
+205
+00:15:00,547 --> 00:15:05,220
+il n'y a pas de distorsion si le signal passe temporairement au-dessus de zéro décibel lors d'une opération.
+
+206
+00:15:05,220 --> 00:15:11,077
+Les nombres à virgule flottante requièrent plus de mémoire, ils sont donc généralement utilisés uniquement en tant que format intermédiaire.
+
+207
+00:15:11,077 --> 00:15:15,796
+Enfin, la plupart des ordinateurs manipulent les données avec une granularité de huit bits,
+
+208
+00:15:15,796 --> 00:15:18,489
+il est donc important de se rappeler que les échantillons de plus de huit bits
+
+209
+00:15:18,489 --> 00:15:22,838
+peuvent être stockés avec l'octet de poids fort en premier, ou en dernier, et les deux méthodes sont communes.
+
+210
+00:15:22,838 --> 00:15:28,751
+Par exemple, le format WAV de Microsoft commence par l'octet de poids faible, et le format AIFC d'Apple commence généralement par l'octet de poids fort.
+
+211
+00:15:28,751 --> 00:15:30,139
+Il ne faut pas l'oublier.
+
+212
+00:15:30,870 --> 00:15:34,071
+Le troisième paramètre de PCM est le nombre de pistes.
+
+213
+00:15:34,071 --> 00:15:38,485
+La convention pour le son PCM est de multiplexer les échantillons des différentes pistes,
+
+214
+00:15:38,485 --> 00:15:43,398
+pour former une seule piste de valeurs. Simple et facile à étendre.
+
+215
+00:15:43,398 --> 00:15:47,701
+C'est tout. Tout format PCM peut être décrit par cette représentation.
+
+216
+00:15:47,701 --> 00:15:51,578
+Voilà, le son numérique est _si facile_!
+
+217
+00:15:51,578 --> 00:15:56,436
+Il y a d'autres choses à connaître, bien sûr, mais nous avons déjà un bloc de son numérique,
+
+218
+00:15:56,436 --> 00:15:58,092
+alors passons à la vidéo.
+
+219
+00:16:02,571 --> 00:16:08,798
+On peut penser à la vidéo comme du son, mais avec deux dimensions spatiales supplémentaires, X et Y,
+
+220
+00:16:08,798 --> 00:16:12,787
+en plus de la dimension du temps. C'est mathématiquement correct.
+
+221
+00:16:12,787 --> 00:16:19,097
+Le théorème de Shannon s'applique aux trois dimensions comme il s'applique à la dimension unique du temps pour le son.
+
+222
+00:16:19,097 --> 00:16:25,815
+Le son et l'image sont très différents en pratique. Par exemple, la vidéo prend beaucoup plus de place que le son.
+
+223
+00:16:25,815 --> 00:16:29,294
+Le son non compressé d'un CD prend en gros 1.4 mégabits par seconde.
+
+224
+00:16:29,294 --> 00:16:33,958
+La vidéo non compressée au format 1080i monte à plus de 700 mégabits par seconde,
+
+225
+00:16:33,958 --> 00:16:40,056
+soit plus de 500 fois plus de données à capturer, convertir, et stocker par seconde.
+
+226
+00:16:40,056 --> 00:16:43,711
+D'après la loi de Moore, ça fait... voyons... en gros huit doublements, à raison d'un tous les deux ans,
+
+227
+00:16:43,711 --> 00:16:47,838
+donc, les ordinateurs peuvent manipuler la vidéo à peu près une quinzaine d'années
+
+228
+00:16:47,838 --> 00:16:51,252
+après pouvoir manipuler le son, c'est à peu près ça.
+
+229
+00:16:51,252 --> 00:16:55,425
+Le format de la vidéo est aussi plus complexe que celui du son.
+
+230
+00:16:55,425 --> 00:16:58,599
+Le volume de données est tel que l'on doit utiliser une représentation
+
+231
+00:16:58,599 --> 00:17:02,106
+plus compacte que le PCM linéaire utilisé pour le son.
+
+232
+00:17:02,106 --> 00:17:06,705
+De plus, la vidéo numérique provient majoritairement de la diffusion de télévision,
+
+233
+00:17:06,705 --> 00:17:13,423
+et les comités de standards qui régissent la diffusion ont toujours été attentifs à la compatibilité.
+
+234
+00:17:13,423 --> 00:17:17,559
+Ne serait-ce que l'année dernière aux USA, une télévision noir et blanc vieille de soixante ans
+
+235
+00:17:17,559 --> 00:17:21,038
+pouvait encore recevoir et afficher la télévision hertzienne analogique.
+
+236
+00:17:21,038 --> 00:17:23,879
+C'est en fait pas mal du tout.
+
+237
+00:17:23,879 --> 00:17:28,718
+Le problème de cette compatibilité est que lorsqu'un détail est figé dans un standard,
+
+238
+00:17:28,718 --> 00:17:30,985
+on ne peut plus le changer.
+
+239
+00:17:30,985 --> 00:17:37,305
+La vidéo électronique n'a pas été réinventée plusieurs fois, comme le son l'a été.
+
+240
+00:17:37,305 --> 00:17:43,958
+Soixante ans de bagage se sont accumulés au fil du temps, avec l'obsolescence de technologies successives,
+
+241
+00:17:43,958 --> 00:17:50,102
+et comme les standards de la vidéo numérique viennent de la télédiffusion,
+
+242
+00:17:50,102 --> 00:17:54,664
+tous ces anachronismes bizarres se sont retrouvés ajoutés dans les standards numériques.
+
+243
+00:17:54,664 --> 00:18:00,022
+Il y a en fait énormément plus de détails à prendre en compte dans la vidéo numérique qu'il n'y en a dans le son.
+
+244
+00:18:00,022 --> 00:18:05,592
+Beaucoup trop pour les aborder tous ici, donc on ne verra que les principes fondamentaux.
+
+245
+00:18:06,036 --> 00:18:10,857
+Les paramètres les plus évidents de la vidéo sont la largeur et hauteur de l'image en pixels.
+
+246
+00:18:10,857 --> 00:18:15,882
+Cela paraît simple, mais cela ne suffit pas à spécifier la taille de l'image visible,
+
+247
+00:18:15,882 --> 00:18:22,016
+car la plupart de la vidéo provenant de la diffusion n'utilise pas des pixels carrés.
+
+248
+00:18:22,016 --> 00:18:25,005
+Le nombre de lignes dans une image était fixe,
+
+249
+00:18:25,005 --> 00:18:29,021
+mais le nombre de pixels dans une ligne était fonction de la bande passante.
+
+250
+00:18:29,021 --> 00:18:31,945
+La résolution réelle de ces images impliquait donc des pixels étant plus fins
+
+251
+00:18:31,945 --> 00:18:35,489
+ou plus épais que l'espace entre les lignes.
+
+252
+00:18:35,489 --> 00:18:38,395
+Les standards ont généralement spécifié que la vidéo numérique
+
+253
+00:18:38,395 --> 00:18:41,902
+doit refléter la résolution réelle de la source analogique originelle,
+
+254
+00:18:41,902 --> 00:18:45,566
+donc une grande partie de la vidéo numérique utilise aussi des pixels non carrés.
+
+255
+00:18:45,566 --> 00:18:49,924
+Par exemple, un DVD NTSC normal avec format d'image 4:3 est typiquement constitué
+
+256
+00:18:49,924 --> 00:18:55,374
+de 704 pixels sur 480, un format plus large que 4:3.
+
+257
+00:18:55,374 --> 00:18:59,640
+Dans ce cas particulier, les pixels ont un format de 10:11,
+
+258
+00:18:59,640 --> 00:19:04,553
+ce qui les rend plus hauts que larges, rendant l'image plus étroite, corrigeant le format.
+
+259
+00:19:04,553 --> 00:19:09,800
+Une telle image doit être re-numérisée pour s'afficher normalement sur un écran avec des pixels carrés.
+
+260
+00:19:10,253 --> 00:19:15,287
+Le deuxième paramètre de la vidéo est le nombre d'images par seconde.
+
+261
+00:19:15,287 --> 00:19:19,655
+Plusieurs standards existent de nos jours pour celui-ci. La vidéo numérique, sous l'une ou l'autre de ses formes,
+
+262
+00:19:19,655 --> 00:19:23,689
+peut utiliser n'importe lequel d'entre eux, ou n'importe quel nombre que l'on veut. Ou encore même un nombre variable,
+
+263
+00:19:23,689 --> 00:19:27,113
+où le nombre d'images par seconde change avec le temps.
+
+264
+00:19:27,113 --> 00:19:32,998
+Plus le nombre d'images par seconde est élevé, plus l'illusion du mouvement est bonne, et cela nous mène hélas à l'entrelacement.
+
+265
+00:19:32,998 --> 00:19:37,967
+Dans les premiers jours de la vidéo, les ingénieurs ont cherché à utiliser le plus d'images par seconde possible
+
+266
+00:19:37,967 --> 00:19:42,075
+pour une meilleure illusion de mouvement, et minimiser le scintillement produit par les écrans à tube cathodique.
+
+267
+00:19:42,075 --> 00:19:45,277
+Ils travaillèrent dans le but de réduire autant que possible la bande passante utilisée
+
+268
+00:19:45,277 --> 00:19:48,182
+pour augmenter la résolution et le nombre d'images par seconde.
+
+269
+00:19:48,182 --> 00:19:51,208
+Leur solution fut d'entrelacer la vidéo, c'est-à-dire d'envoyer les lignes paires
+
+270
+00:19:51,208 --> 00:19:54,826
+en une première passe, et les lignes impaires à la suivante.
+
+271
+00:19:54,826 --> 00:19:59,961
+Chaque passe est appelée trame, et deux trames composent plus ou moins une image entière.
+
+272
+00:19:59,961 --> 00:20:05,319
+"Plus ou moins", car les lignes paires et impaires ne proviennent pas de la même image source.
+
+273
+00:20:05,319 --> 00:20:10,797
+Pour une vidéo à 60 trames par seconde, la source a réellement 60 images par seconde,
+
+274
+00:20:10,797 --> 00:20:15,386
+et la moitié de chaque image, une ligne sur deux, est tout simplement ignorée.
+
+275
+00:20:15,386 --> 00:20:20,272
+C'est pourquoi on ne peut pas désentrelacer une vidéo en recombinant deux trames en une image;
+
+276
+00:20:20,272 --> 00:20:23,039
+ces trames ne proviennent pas de la même image à la source.
+
+277
+00:20:24,047 --> 00:20:29,683
+Le tube cathodique était la seule technologie d'affichage pendant la plupart de l'histoire de la vidéo électronique.
+
+278
+00:20:29,683 --> 00:20:32,949
+Un écran à tube cathodique émet une luminance non linéaire, à peu près égale
+
+279
+00:20:32,949 --> 00:20:36,585
+à la tension reçue en entrée élevée à la puissance 2.5.
+
+280
+00:20:36,585 --> 00:20:43,821
+Cet exposant, 2.5, est appelé gamma; on parle souvent du gamma d'un écran.
+
+281
+00:20:43,821 --> 00:20:50,493
+Les caméras, par contre, sont linéaires, et si on connecte le signal de sortie d'une caméra à un écran à tube cathodique, ça ressemble un peu à ça.
+
+282
+00:20:51,270 --> 00:20:56,637
+Comme les caméras étaient au début très rares, et extrêmement chères,
+
+283
+00:20:56,637 --> 00:21:01,634
+et qu'ils voulaient avoir un grand nombre de télévisions au plus bas prix possible,
+
+284
+00:21:01,634 --> 00:21:08,222
+les ingénieurs ont décidé d'ajouter le système de correction de gamma aux caméras, plutôt qu'aux télévisions.
+
+285
+00:21:08,222 --> 00:21:13,062
+La vidéo hertzienne a donc commencé à utiliser une intensité non linéaire, telle que
+
+286
+00:21:13,062 --> 00:21:18,271
+la télévision recevant le signal, étant non linéaire, redresserait ce signal
+
+287
+00:21:18,271 --> 00:21:23,305
+pour donner à l'affichage une luminance linéaire.
+
+288
+00:21:23,777 --> 00:21:25,118
+Presque.
+
+289
+00:21:30,393 --> 00:21:33,113
+Il y avait deux autres détails.
+
+290
+00:21:33,113 --> 00:21:40,442
+Une caméra de télévision utilise un gamma qui est l'inverse de 2.2, et non 2.5.
+
+291
+00:21:40,442 --> 00:21:43,754
+Cela est une correction pour regarder l'image dans un environnement sombre.
+
+292
+00:21:43,754 --> 00:21:48,279
+De plus, la courbe exponentielle devient graduellement linéaire près du noir.
+
+293
+00:21:48,279 --> 00:21:52,360
+C'était un vieux truc pour cacher les imperfections de la capture de l'image.
+
+294
+00:21:54,941 --> 00:21:57,347
+La correction de gamma apporte aussi un avantage inattendu.
+
+295
+00:21:57,347 --> 00:22:02,214
+L'oeil humain perçoit la luminance avec un gamma d'à peu près 3,
+
+296
+00:22:02,214 --> 00:22:05,962
+ce qui est relativement proche du 2.5 d'un écran à tube cathodique.
+
+297
+00:22:05,962 --> 00:22:10,607
+Une image utilisant la correction gamma a plus de résolution à faible luminance,
+
+298
+00:22:10,607 --> 00:22:14,336
+et c'est là que l'oeil est le plus sensible aux changements,
+
+299
+00:22:14,336 --> 00:22:18,222
+et donc bénéficie le plus d'une résolution meilleure.
+
+300
+00:22:18,222 --> 00:22:22,784
+Bien que les écrans à tube cathodique soient en train de disparaître, les écrans d'ordinateurs sRGB
+
+301
+00:22:22,784 --> 00:22:28,419
+continuent à utiliser une courbe de réponse non linéaire similaire à celle de la télévision, incluant la partie linéaire près du noir,
+
+302
+00:22:28,419 --> 00:22:32,491
+et une partie exponentielle avec un gamma de 2.4.
+
+303
+00:22:32,491 --> 00:22:36,636
+Cette courbe transforme une entrée linéaire de 16 bits en une sortie 8 bits.
+
+304
+00:22:37,580 --> 00:22:41,790
+L'oeil humain comprend des récepteurs pour trois couleurs: rouge, vert, et bleu,
+
+305
+00:22:41,790 --> 00:22:47,407
+et la plupart des écrans utilisent ces trois couleurs en synthèse additive pour représenter une grande palette de couleurs affichables.
+
+306
+00:22:49,258 --> 00:22:54,190
+Les couleurs primaires en impression sont cyan, magenta, et jaune, pour les mêmes raisons;
+
+307
+00:22:54,190 --> 00:22:59,381
+ces couleurs sont soustractives, et chacune absorbe certaines longueurs d'onde de la lumière incidente.
+
+308
+00:22:59,381 --> 00:23:05,682
+Cyan absorbe le rouge, magenta absorbe le vert, et jaune absorbe le bleu.
+
+309
+00:23:05,682 --> 00:23:10,919
+La vidéo peut être, et l'est parfois, représentée en trois composantes, rouge, vert, et bleu (RGB),
+
+310
+00:23:10,919 --> 00:23:17,211
+mais ce format est atypique. L'oeil humain est beaucoup plus sensible à la luminance qu'à la couleur,
+
+311
+00:23:17,211 --> 00:23:21,329
+et le format RGB a tendance à diffuser l'énergie de l'image sur ces trois composantes.
+
+312
+00:23:21,329 --> 00:23:25,326
+C'est-à-dire, le plan rouge ressemble à une version rouge de l'image originale,
+
+313
+00:23:25,326 --> 00:23:28,769
+le plan vert ressemble à une version verte de l'image originale,
+
+314
+00:23:28,769 --> 00:23:32,063
+et le plan bleu ressemble à une version bleue de l'image originale.
+
+315
+00:23:32,063 --> 00:23:35,705
+Trois versions en noir et blanc. Pas très efficace.
+
+316
+00:23:35,706 --> 00:23:39,438
+Pour ces raisons, et aussi parce que la télévision était à l'origine
+
+317
+00:23:39,438 --> 00:23:45,017
+en noir et blanc, la vidéo est normalement représentée par une composante de luminance à haute résolution,
+
+318
+00:23:45,017 --> 00:23:51,041
+correspondant à l'image en noir et blanc, et des composantes secondaires, souvent de moindre résolution, pour la couleur.
+
+319
+00:23:51,041 --> 00:23:57,074
+La composante de luminance, Y, est obtenue par un barycentre des signaux rouge, vert, et bleu.
+
+320
+00:23:57,074 --> 00:24:01,867
+Les composantes de chrominance U et V sont alors obtenues en soustrayant la luminance du bleu,
+
+321
+00:24:01,867 --> 00:24:04,070
+et la luminance du rouge.
+
+322
+00:24:04,070 --> 00:24:11,750
+Lorsque le signal YUV change d'échelle et est numérisé, on devrait techniquement parler de Y'CbCr,
+
+323
+00:24:11,750 --> 00:24:15,238
+mais le terme générique YUV est très souvent utilisé pour décrire
+
+324
+00:24:15,238 --> 00:24:18,301
+toutes les variations analogiques et numériques de cet espace colorimétrique.
+
+325
+00:24:18,912 --> 00:24:22,983
+Les composantes de chrominance U et V peuvent avoir la même résolution que la composante Y,
+
+326
+00:24:22,983 --> 00:24:28,674
+mais comme l'oeil humain est beaucoup moins sensible aux changements de couleur qu'aux changements de luminance sur de petits angles apparents,
+
+327
+00:24:28,674 --> 00:24:34,346
+les composantes de chrominance utilisent généralement une résolution d'un demi ou même d'un quart horizontalement, verticalement,
+
+328
+00:24:34,346 --> 00:24:39,528
+ou les deux, généralement sans changement significatif de la qualité perçue de l'image.
+
+329
+00:24:39,528 --> 00:24:43,942
+Quasiment toutes les variantes possibles de ce sous-échantillonnage ont été utilisées à un moment ou à un autre,
+
+330
+00:24:43,942 --> 00:24:46,875
+mais les plus répandues de nos jours sont
+
+331
+00:24:46,875 --> 00:24:51,187
+4:4:4, où le taux d'échantillonnage est en fait le même pour toutes les composantes,
+
+332
+00:24:51,187 --> 00:24:56,711
+4:2:2, où U et V ont une résolution moitié moindre horizontalement,
+
+333
+00:24:56,711 --> 00:25:02,587
+et, la plus commune, 4:2:0, où U et V ont une résolution moitié moindre horizontalement et verticalement.
+
+334
+00:25:02,587 --> 00:25:08,897
+Cette dernière résulte en des plans pour U et V qui sont un quart de la taille du plan Y.
+
+335
+00:25:08,897 --> 00:25:17,096
+Les termes 4:2:2, 4:2:0, 4:1:1, etc., ne suffisent pas pour une description complète d'un format de sous-échantillonnage particulier.
+
+336
+00:25:17,096 --> 00:25:21,186
+Les échantillons de chrominance peuvent être positionnés de plusieurs manières par rapport aux échantillons de luminance,
+
+337
+00:25:21,186 --> 00:25:24,776
+et, là encore, plusieurs variantes sont utilisées pour chaque format.
+
+338
+00:25:24,776 --> 00:25:32,502
+Par exemple, motion JPEG, MPEG-1, MPEG-2, DV, Theora et WebM utilisent tous
+
+339
+00:25:32,502 --> 00:25:38,137
+(ou peuvent utiliser) 4:2:0, mais ils placent les échantillons de trois manières différentes.
+
+340
+00:25:38,498 --> 00:25:43,023
+Motion JPEG, MPEG-1, Theora et WebM placent les échantillons de chrominance
+
+341
+00:25:43,023 --> 00:25:46,345
+entre ceux de luminance, que ce soit horizontalement ou verticalement.
+
+342
+00:25:46,345 --> 00:25:51,989
+MPEG-2 les place entre les lignes verticalement, mais alignés avec un pixel sur deux horizontalement.
+
+343
+00:25:51,989 --> 00:25:57,106
+L'entrelacement ajoute une complication supplémentaire, ce qui donne un système assez bizarre.
+
+344
+00:25:57,106 --> 00:26:00,909
+Finalement, PAL-DV, qui est toujours entrelacé, place les échantillons de chrominance
+
+345
+00:26:00,909 --> 00:26:04,398
+à la même position qu'un pixel de luminance sur deux horizontalement,
+
+346
+00:26:04,398 --> 00:26:07,303
+mais alterne les échantillons de U et V à chaque ligne.
+
+347
+00:26:07,683 --> 00:26:12,282
+C'est juste pour 4:2:0. Je vais laisser les autres formats comme exercice pour ceux qui veulent en savoir plus.
+
+348
+00:26:12,282 --> 00:26:14,882
+C'est l'idée de base. Passons à la suite.
+
+349
+00:26:15,511 --> 00:26:21,128
+Pour le son, les différentes pistes sont représentées en entrelaçant les échantillons
+
+350
+00:26:21,128 --> 00:26:26,383
+de chaque piste à leur tour, en ordre. La vidéo peut utiliser des formats entrelaçant les composantes,
+
+351
+00:26:26,383 --> 00:26:30,584
+mais aussi des formats qui gardent ces échantillons d'une même composante dans des plans séparés,
+
+352
+00:26:30,584 --> 00:26:35,415
+stockés les uns à la suite des autres pour chaque image. Il y a au moins une cinquantaine de formats dans ces deux catégories,
+
+353
+00:26:35,415 --> 00:26:41,549
+et peut-être dix ou quinze d'entre eux en usage commun. Chaque variante d'échantillonnage de la chrominance, et chaque résolution d'échantillon
+
+354
+00:26:41,549 --> 00:26:46,574
+nécessite un arrangement de bits différent, et donc un arrangement de pixels différent. Pour chaque variante,
+
+355
+00:26:46,574 --> 00:26:50,858
+on peut trouver plusieurs formats équivalents, qui diffèrent en de simples réarrangements de l'ordre des données,
+
+356
+00:26:50,858 --> 00:26:55,966
+généralement dus à une quelconque idiosyncrasie d'un matériel particulier, un choix arbitraire,
+
+357
+00:26:55,966 --> 00:27:00,352
+ou juste pour faire quelque chose de différent.
+
+358
+00:27:00,352 --> 00:27:04,692
+Ces formats sont décrits par un label unique, ou code fourcc.
+
+359
+00:27:04,692 --> 00:27:08,115
+Il y a un grand nombre de ceux-ci, et nous n'allons pas les énumérer.
+
+360
+00:27:08,115 --> 00:27:13,704
+Cherchez sur Internet pour plus d'information, mais gardez en mémoire qu'un code fourcc particulier définit l'arrangement des échantillons
+
+361
+00:27:13,704 --> 00:27:20,339
+et le taux d'échantillonnage des plans, mais généralement n'indique pas où les échantillons sont placés, ni l'espace colorimétrique utilisé.
+
+362
+00:27:20,339 --> 00:27:25,807
+Par exemple, le code YV12 peut être utilisé avec le placement d'échantillon de JPEG, MPEG-2, ou DV,
+
+363
+00:27:25,807 --> 00:27:28,991
+et un quelconque espace colorimétrique YUV parmi les nombreux existants.
+
+364
+00:27:29,472 --> 00:27:33,913
+Et ceci termine nos premiers pas incomplets dans le monde de la vidéo.
+
+365
+00:27:33,913 --> 00:27:38,651
+Une bonne chose: avec ce que l'on a vu, on peut déjà commencer à travailler sur le son et l'image.
+
+366
+00:27:38,651 --> 00:27:42,528
+Dans la plupart des cas, une image de vidéo est juste une image de vidéo.
+
+367
+00:27:42,528 --> 00:27:46,451
+Les détails sont très importants, quand on commence à écrire du code,
+
+368
+00:27:46,452 --> 00:27:52,086
+mais pour le moment il est suffisant que vous ayez dans l'esprit une vue globale des problèmes dans ce domaine.
+
+369
+00:27:55,640 --> 00:27:59,230
+Donc. Du son numérique d'un côté. De l'image numérique de l'autre.
+
+370
+00:27:59,230 --> 00:28:03,246
+Ce qui reste à faire n'est pas spécifique au traitement de signal, mais de la programmation
+
+371
+00:28:03,246 --> 00:28:07,410
+tout à fait normale. Et il y en a plein!
+
+372
+00:28:07,928 --> 00:28:11,768
+Des morceaux de son ou d'image sont généralement des blocs opaques,
+
+373
+00:28:11,768 --> 00:28:15,173
+mais ils ont souvent une taille constante. On peut les concaténer
+
+374
+00:28:15,173 --> 00:28:18,097
+dans un ordre prédéterminé pour les transmettre et les stocker,
+
+375
+00:28:18,097 --> 00:28:21,040
+et c'est en fait ce que font certains systèmes simples.
+
+376
+00:28:21,040 --> 00:28:24,195
+Les données compressées, par contre, n'ont pas toujours la même taille,
+
+377
+00:28:24,195 --> 00:28:29,405
+et l'on a souvent besoin de plus de flexibilité pour les stocker et les transmettre.
+
+378
+00:28:29,405 --> 00:28:34,281
+Si on concatène ces blocs opaques les uns à la suite des autres, on ne sait plus où couper pour les récupérer,
+
+379
+00:28:34,281 --> 00:28:37,871
+et on ne peut plus reconnaître quelle portion des données vient du son ou de l'image.
+
+380
+00:28:37,871 --> 00:28:42,192
+Un système de stockage doit avoir une structure générale pour être utile.
+
+381
+00:28:42,192 --> 00:28:46,606
+En plus de nos données son/image, nous avons aussi les paramètres qui les décrivent.
+
+382
+00:28:46,606 --> 00:28:49,752
+Nous avons peut-être aussi d'autres informations sur ces données que nous voulons conserver,
+
+383
+00:28:49,752 --> 00:28:55,415
+comme des labels, chapitres vidéo, sous-titres, et autres.
+
+384
+00:28:55,415 --> 00:29:01,633
+Il parait idéal de pouvoir placer toutes ces méta-informations, c'est-à-dire informations sur les informations elles-mêmes, avec ces données.
+
+385
+00:29:01,633 --> 00:29:06,445
+Le stockage structuré de ces données et ces méta-informations disparates est le travail du conteneur.
+
+386
+00:29:06,445 --> 00:29:09,221
+Les conteneurs offrent une structure pour stocker les blocs opaques,
+
+387
+00:29:09,221 --> 00:29:12,015
+entrelacent et marquent les données pour garder trace de leur source,
+
+388
+00:29:12,015 --> 00:29:15,337
+maintiennent leur synchronisation, et stockent les méta-informations requises
+
+389
+00:29:15,337 --> 00:29:19,140
+pour récupérer, chercher, manipuler, et présenter les médias.
+
+390
+00:29:19,140 --> 00:29:22,222
+En général, un conteneur quelconque peut stocker des données arbitraires.
+
+391
+00:29:22,222 --> 00:29:24,970
+Et des données arbitraires peuvent être stockées dans n'importe quel conteneur.
+
+392
+00:29:28,801 --> 00:29:32,391
+Dans cette demi-heure, nous avons parlé de son numérique, de vidéo numérique,
+
+393
+00:29:32,391 --> 00:29:35,435
+nous avons vu un peu d'histoire, de mathématiques, et aussi de technologie.
+
+394
+00:29:35,435 --> 00:29:39,377
+Ce n'est que la surface, mais il est temps pour une pause bien méritée.
+
+395
+00:29:41,107 --> 00:29:45,373
+Il y a tellement d'autres choses à voir, alors j'espère que vous vous joindrez à moi de nouveau pour notre prochain épisode.
+
+396
+00:29:45,373 --> 00:29:47,159
+D'ici là, au revoir!
+
+
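
Note on cues 331-334 above (the 4:4:4, 4:2:2 and 4:2:0 formats): the claim that 4:2:0 yields U and V planes one quarter the size of the Y plane can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of this commit. The names `SUBSAMPLING` and `chroma_plane_size` are hypothetical, and rounding odd dimensions up is an assumption, since formats differ on how they handle them.

```python
# Illustrative only: the table and function names are assumptions,
# not taken from any library or from the video's own material.

# (horizontal divisor, vertical divisor) applied to the chroma planes.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma at full resolution
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}


def chroma_plane_size(width, height, scheme):
    """Return (width, height) of one chroma plane (U or V) for a luma plane."""
    h_div, v_div = SUBSAMPLING[scheme]
    # Assumption: round up so odd luma dimensions are still fully covered.
    return ((width + h_div - 1) // h_div, (height + v_div - 1) // v_div)


if __name__ == "__main__":
    luma_w, luma_h = 640, 480
    for scheme in SUBSAMPLING:
        cw, ch = chroma_plane_size(luma_w, luma_h, scheme)
        ratio = (cw * ch) / (luma_w * luma_h)
        print(f"{scheme}: chroma plane {cw}x{ch}, {ratio:.2f} of the luma plane")
```

For a 640x480 luma plane, 4:2:0 gives 320x240 chroma planes, which is the one-quarter figure from cue 334. As cues 335-337 point out, this says nothing about where the chroma samples sit relative to the luma samples.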

Modified: websites/xiph.org/video/vid1.shtml
===================================================================
--- websites/xiph.org/video/vid1.shtml	2010-09-23 12:54:20 UTC (rev 17433)
+++ websites/xiph.org/video/vid1.shtml	2010-09-23 13:49:13 UTC (rev 17434)
@@ -123,8 +123,10 @@
 	<select class="srt-select">
 	  <option>
 	    Subtitles: Off</option>
-	  <option file="vid1-en_US.srt">
+	  <option file="vid1-en.srt">
 	    Subtitles: US English</option>
+	  <option file="vid1-fr.srt">
+	    Subtitles: Français</option>
 	</select>
 	
       </div>
@@ -236,12 +238,11 @@
     <h3>Download subtitles</h3>
     
     <ul>
-      <li><b> Ogg format (Kate): </b><br>
-	<a href="vid1-en_US.kate">
-	  US English</a> 
       <li><b> SRT format: </b><br>
-	<a href="vid1-en_US.srt">
+	<a href="vid1-en.srt"> | 
 	  US English</a> 
+	<a href="vid1-fr.srt">
+	  Français</a> 
     </ul>
     <p>We welcome good, technical translations from the community!
     Please submit translations in SRT or Ogg Kate format


