Article Summary: Audio Fingerprinting

“Instead of D.J., a Web Server Names That Tune” by Anne Eisenberg, the New York Times, January 23, 2003

    This article discusses audio fingerprinting, a technology that can identify a song by name as it plays on an Internet server. The article covers the system's creation, usage, advantages, and prospects for future development.

   People often wait in vain to learn the title of a song they hear on the radio. To resolve this inconvenience, some companies now offer a technology that can identify the title of a tune while it is playing. For instance, the system can display words like Take Five by the Dave Brubeck Quartet on an Internet radio or cellphone.
   The technology can recognize not only names and artists within a huge range of recorded music, but also “different versions of a piece done by the performers, even when the differences are slight.”
For example, at the recent Consumer Electronics Show in Las Vegas, Royal Philips Electronics showed a model of an Internet radio that can verify the name of the band when music is played, and that also “can distinguish a version of a tune that it played at a concert in Verona, Italy, from the same tune recorded in Milan.”
   Ton Kalker, a mathematician at Philips Research in Eindhoven, the Netherlands, and the leader of the team that created the rapid identification system, said that the technology is sensitive enough to make distinctions that humans cannot.
   This technology is called audio fingerprinting. It is based on the idea that every performance of a song has unique audio characteristics.
One example is “a certain relationship of neighboring high and low notes over a minuscule slice of time.” Audio fingerprinting expresses those relationships as numbers, and slight differences in the code indicate a separate version of a song.
   Dr. Richard Gooch, deputy director of technology at the International Federation of the Phonographic Industry (a trade organization based in London), explained more about this system: audio fingerprinting operates by creating a mathematical description of some of the unique characteristics of a song. The fingerprints are saved on a server, and when a tune playing on Internet radio needs to be verified, the technology matches a small piece of the tune, represented in code, against the entire coded version of the song stored on the server.
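The matching step described here can be sketched as a toy program. This is an illustrative guess at the general idea, not any company's actual algorithm: fingerprints are reduced to short lists of integers, and a query snippet is slid along each stored full-song code to find the closest window. All data below is hypothetical.

```python
# Toy sketch of fingerprint matching: a short query fingerprint is slid
# along each stored full-song fingerprint and scored by the number of
# mismatching values (a Hamming-style distance). Hypothetical data only.

def best_match(query, database):
    """Return the (song, offset, distance) with the fewest mismatches."""
    best = None
    for song, full_code in database.items():
        for offset in range(len(full_code) - len(query) + 1):
            window = full_code[offset:offset + len(query)]
            dist = sum(q != w for q, w in zip(query, window))
            if best is None or dist < best[2]:
                best = (song, offset, dist)
    return best

database = {
    "Take Five (studio)": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
    "Take Five (live)":   [3, 1, 4, 2, 5, 9, 3, 6, 5, 4],
}
print(best_match([5, 9, 2, 6], database))  # -> ('Take Five (studio)', 4, 0)
```

Note that the two stored codes differ only in a few values, mirroring the article's point that slight differences in code distinguish versions of the same tune.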
   According to Dr. Gooch, some companies use fingerprinting technology to identify not only streaming content but also traditional radio broadcasts and the contents of audio files.
   Dr. Gooch also said that this system might help the recording industry make a profit, by monitoring which songs are played and thus making it possible to collect royalties. His group and the Recording Industry Association of America are examining many audio fingerprinting systems for that purpose.
   Dr. Gooch emphasized that the technology can work even in very bad conditions such as “poor loudspeakers, highly compressed streaming files or broadcasters who speed up songs slightly to make room for commercials.” He mentioned that as long as people can hear a song (even in poor circumstances), “the systems can extract the description of unique characteristics in the song, quickly matching the description with the database to identify the track.”
   According to him, this technology has been improving little by little since the early ’80s and has been used in business. For instance, the music industry has used it to verify broadcast performances and then pay royalties to “right holders.”
   Dr. Gooch said “Audio fingerprinting is accurate, robust and runs in a sensible amount of time. It really works.”
   He also mentioned that consumer applications of the technology will probably be needed because of the popularity of digital music.
   According to the article, Shazam Entertainment, based in London, already offers consumers an audio fingerprinting service for cellphones. It works like this: users call the service and hold their cellphones up to the song on the car radio; the system captures the song through the cellphone and compares that audio with its database to find the title of the tune.
   This company has a database with the fingerprints of 1.6 million tunes. “It matches the incoming fingerprint with its database and within 30 seconds sends a text message back to the phone identifying the song.”
Philip Inghelbrecht, founder of the company and director of its content, said that if a CD is commercially available, they get it, so they have most of the popular songs that have been recorded.
   This technology is useful for ordinary people who love music. According to Vance Ikezoye, chief executive of Audible Magic, a company in Los Gatos, California, that offers its own patented audio fingerprinting technology, manufacturers of electronic devices may offer audio fingerprinting to people who want to organize the music collections on their computers.
   He said “It’s hard to manage music if you don’t have the correct information for every song in your collection.”
   This article also explained one method for creating the unique code of the fingerprint system (different companies use different techniques, the article notes). The method described was that used by Philips. According to Dr. Kalker, he and his colleagues created the code by breaking each tune into 10-millisecond pieces. Then, “they calculated the differences in the loudness of adjacent frequencies in the snippet and how those differences changed over time – they repeated the process every 10 milliseconds to extract code over the entire length of the song.”
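The procedure Dr. Kalker describes can be sketched roughly as follows. The frame length, band count, and plain DFT here are arbitrary choices for illustration; this does not reproduce the published system's actual parameters, only its stated idea of differencing band loudness across frequency and then across time.

```python
import math

def band_energies(frame, n_bands):
    """Energies of adjacent frequency bands of one short frame, via a
    naive DFT (slow but dependency-free; fine for a sketch)."""
    n = len(frame)
    bins_per_band = (n // 2) // n_bands
    energies = [0.0] * n_bands
    for k in range(bins_per_band * n_bands):
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        energies[k // bins_per_band] += re * re + im * im
    return energies

def sub_fingerprint(prev_frame, frame, n_bands=8):
    """One bit per adjacent band pair: the sign of the time difference of
    the frequency difference of band energies, as described above."""
    e_prev = band_energies(prev_frame, n_bands)
    e_now = band_energies(frame, n_bands)
    return [1 if (e_now[b] - e_now[b + 1]) - (e_prev[b] - e_prev[b + 1]) > 0
            else 0
            for b in range(n_bands - 1)]
```

Repeating sub_fingerprint over consecutive frames yields the stream of code that, per the article, is extracted over the entire length of the song.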
   Because of the precision of this database, Dr. Kalker said, the entire process, once the database is up and running, takes only about three seconds, so even a bad AM or FM signal does not keep the system from finding the name of the song.
   Dr. Gooch mentioned the possibility of using the fingerprinting system in car radios. He said, “when every car has its own digital audio player, people will want to know the name of the song they are listening to displayed on the dashboard.”
   Some people might worry about the rights of music publishers when they use this technology. However, according to Mr. Ikezoye of Audible Magic, people don’t have to be concerned about violating the rights of music publishers.
   He said that even though Audible Magic keeps a huge database of popular copyrighted music, these fingerprints are not the actual songs, just ‘summaries of factual measurements describing the sound.’
He also mentioned that the original sound cannot be rebuilt from the fingerprint, therefore saving and giving out fingerprints does not violate the copyright.
   The article concludes with a quotation from Cary Sherman, president and general counsel of the Recording Industry Association of America: “Radios often don’t bother to tell listeners what they have played.” If the technology reaches consumer applications, the article suggests, radio listeners may no longer face the problem Sherman mentioned.
   I think audio fingerprinting is notable, and I admire the people who developed this technology. However, if consumer applications prove expensive, it might not be as successful as its promoters expect. Unfortunately, the article does not mention any projected costs or who will pay for the service.

  


msp Programming and Digital Audio

• msp is a set of digital audio extensions for Max. These extensions are a port of work done by Miller Puckette at IRCAM and UCSD to the Opcode Macintosh Max environment and have been programmed and marketed by David Zicarelli, the programmer of Opcode’s version of Max. The name msp has a couple of connotations: one is Max Signal Processing. Another is that msp are the initials of Miller Smith Puckette.

• Figure 1 shows a simple msp patch. All msp objects end with a twiddle (~) after the name of the object. The twiddle, which looks vaguely like a sine wave, indicates that information coming into and/or out of the object happens at the audio rate. Information in Max is usually sent whenever the user does something — plays in some MIDI, clicks with the mouse, etc. The fastest rate at which events can be scheduled to occur in Max is 1000 times per second. In audio, however, samples must be produced at the sampling rate for the sound to continue, and all msp objects update their outputs at the sampling rate (typically 44100 samples per second).


Figure 1: 440-hz oscillator

• The dac~ object at the bottom of Figure 1 is the digital-to-analog conversion object of msp. The two inlets at the top correspond to the left and right outputs from the sound system attached to the computer. This can be simply the stereo output from the computer itself or the outputs of a sound card (such as Digidesign gear) installed on the machine. The patch as a whole takes the audio output of a cycle~ object (a simple table-lookup oscillator), reduces the amplitude by multiplication, and sends it to the dac. The startwindow and stop messages to the dac turn on audio (for this window’s patch only) and turn it off, respectively.

• A sine wave is an example of simple harmonic motion. The wave completes one cycle of a simple back-and-forth motion at a constant rate. Because each cycle is completed in a constant amount of time, the motion of the wave is periodic. The number of cycles completed per second is the frequency of the wave, and the inverse of the frequency is its period. A wave that completes its cycle 100 times per second, then, has a frequency of 100 cycles per second (cps), also known as hertz (hz), and a period of 1/100 second, or 10 milliseconds.
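The frequency-period relationship above is a simple reciprocal, which a couple of lines can confirm:

```python
def period_ms(frequency_hz):
    """Period of a periodic wave in milliseconds: the reciprocal of its
    frequency, scaled from seconds to ms."""
    return 1000.0 / frequency_hz

print(period_ms(100.0))  # a 100 hz wave -> 10.0 ms, as in the text
```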
• Sampling Theorem: To represent digitally a signal containing frequency components up to X Hz, it is necessary to use a sampling rate of at least 2X samples per second. If a signal has frequency components above one-half the sampling rate, these will be misrepresented in what is termed foldover, or aliasing. The frequency that is one-half the sampling rate is called the Nyquist frequency. Each frequency “has an alias equally far from the Nyquist frequency but on the other side of it. . . For this reason the Nyquist frequency is often called the folding frequency because we can think of frequencies above Nyquist as being folded down below Nyquist” [Steiglitz p. 47]. Simplifying this a bit, we can say that when an original frequency higher than one-half the sampling rate is sampled, it will produce a new frequency that is equal to the sampling frequency minus the original frequency.
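The folding rule at the end of this paragraph can be checked numerically: a tone above the Nyquist frequency, once sampled, produces exactly the same sample values as its alias at (sampling rate − original frequency) — with inverted phase, in the case of a sine. The rates below are toy values chosen for the demonstration.

```python
import math

def sample_sine(freq_hz, sr_hz, n):
    """n samples of a sine wave of freq_hz taken at sampling rate sr_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / sr_hz) for t in range(n)]

sr = 1000.0                         # toy rate; Nyquist frequency is 500 hz
above = sample_sine(700.0, sr, 32)  # 700 hz lies above Nyquist...
alias = sample_sine(300.0, sr, 32)  # ...and folds down to 1000 - 700 = 300 hz

# The sequences coincide sample for sample (the sine alias carries a sign
# flip), so 700 hz at this rate is indistinguishable from 300 hz.
worst = max(abs(a + b) for a, b in zip(above, alias))
print(worst < 1e-9)  # -> True
```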

• Samples in msp are interpreted as floating point values within the range –1.0 to +1.0. Samples conforming to that range occupy the full dynamic spectrum when sent to the dac~ object. Therefore, to attenuate the volume of a signal, the signal should be made to occupy a smaller range of values. In Figure 1 and Figure 2 below, the attenuation is performed by multiplying the output of cycle~ by a fractional value. Multiplication by a fraction is equivalent to division (another way to reduce the range of a signal) but is more economical to perform. Adding two signals together is equivalent to mixing them. Whenever more than one signal is mixed (added) together, take care that the combined output of the mixed sources does not exceed the range –1.0 to +1.0, because outside of that range the signal will clip.


Figure 2: Attenuation in msp
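A minimal numeric sketch of the amplitude rules described above — attenuation by multiplication, mixing by addition, and hard clipping outside ±1.0 (the sample values are arbitrary):

```python
def attenuate(signal, gain):
    """Scale every sample; a fractional gain attenuates the signal."""
    return [s * gain for s in signal]

def mix(a, b):
    """Mixing two signals is sample-wise addition."""
    return [x + y for x, y in zip(a, b)]

def clip(signal, lo=-1.0, hi=1.0):
    """Clamp samples to [-1.0, +1.0], the legal range for the dac."""
    return [min(hi, max(lo, s)) for s in signal]

a = [0.75, -0.5, 0.875]
b = [0.5, -0.75, 0.375]
print(mix(a, b))        # -> [1.25, -1.25, 1.25]: exceeds the legal range
print(clip(mix(a, b)))  # -> [1.0, -1.0, 1.0]: the waveform is clipped
print(mix(attenuate(a, 0.5), attenuate(b, 0.5)))  # attenuated first: safe
```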

• To operate with signals digitally, we must discretize the waveform in two dimensions: in time (sampling) and in amplitude (quantizing). There are three steps to the conversion of an analog signal into a digital signal, the process called analog-to-digital conversion (ADC):

1) FILTER: A low-pass filter removes any frequency components of the signal exceeding one-half of the sampling rate.
2) MEASURE: A measurement is taken of the instantaneous amplitude of the signal at equally spaced intervals of time.
3) QUANTIZE: A quantizer assigns a precise numeric value to the measurement made in the previous step.
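Steps 2 and 3 above can be sketched in software (step 1, the analog low-pass filter, is assumed to have happened already). The 4-bit depth, rate, and test tone are toy values; real converters are of course hardware:

```python
import math

def adc(signal_fn, sr_hz, n_samples, bits):
    """Sample signal_fn at sr_hz (step 2: measure at equal intervals) and
    quantize each measurement to a signed code of the given bit depth
    (step 3)."""
    max_code = 2 ** (bits - 1) - 1        # e.g. 7 for a 4-bit converter
    codes = []
    for t in range(n_samples):
        x = signal_fn(t / sr_hz)          # instantaneous amplitude
        x = max(-1.0, min(1.0, x))
        codes.append(round(x * max_code))  # nearest of the 2**bits levels
    return codes

def tone(t):
    """A 100 hz test tone, standing in for the filtered analog input."""
    return math.sin(2 * math.pi * 100.0 * t)

print(adc(tone, 1000, 10, 4))  # -> [0, 4, 7, 7, 4, 0, -4, -7, -7, -4]
```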

• The inverse process changes a digital representation to an analog one, and is called digital-to-analog conversion (DAC). In a DAC, voltage generators proportional to 2^k volts are switched on when the corresponding bit k of the incoming digital representation is on. The steps of the DAC process are as follows:

1) TO VOLTAGE: The digital signal is converted to a time-varying voltage proportional to the sequence of numbers at the input.
2) TRANSIENT REMOVAL: “Glitches” introduced by step one are eliminated by ignoring fast transients.
3) FILTER: A low-pass filter set to half the sampling rate smoothes out the resultant analog signal.
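The voltage-generator arrangement described above — a generator proportional to 2^k volts switched on for each incoming bit k that is on — is just binary weighting, which a short sketch makes concrete (volts_per_lsb is an illustrative scale factor):

```python
def dac_voltage(bits, volts_per_lsb=1.0):
    """Step 1 of the dac: sum a 2**k-weighted contribution for every bit k
    of the incoming code that is switched on (bits given lsb first)."""
    return sum((2 ** k) * volts_per_lsb for k, bit in enumerate(bits) if bit)

print(dac_voltage([1, 1, 0, 1]))  # code 0b1011 = 11 -> 11.0 'volts'
```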


Figure 3: DSP status window

• The DSP status window shows information about the configuration of digital audio on the Mac and the load a running msp program places on the central processing unit (CPU). Figure 3 shows the DSP status window during a typical execution of the simple oscillator patch of Figure 1. Notice that just running one oscillator uses almost 9% of the processing power of a Macintosh 8500/150 Power PC. The power of current CPUs to deliver digital audio directly is revolutionary, but actually using such applications quickly places a premium on processing speed.

• The cycle~ object is a table-lookup oscillator that uses a stored table of 512 samples. You can input your own sample tables or use the default sine wave. Cycle~ continuously outputs samples from the table at a frequency that corresponds to its argument (as in Figure 1) or to a value input to the left inlet.
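A table-lookup oscillator of this kind is easy to model in Python: store one cycle of a sine in a 512-entry table, then read through it with a per-sample phase increment of frequency × table size ÷ sampling rate, wrapping at the end. (Lookup is truncating here; whether cycle~ interpolates between table entries is not modeled.)

```python
import math

TABLE_SIZE = 512
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def oscillator(freq_hz, sr_hz, n_samples, table=SINE_TABLE):
    """Table-lookup oscillator: advance a phase through the stored table
    by freq * size / sr per sample, wrapping at the table's end."""
    phase = 0.0
    increment = freq_hz * len(table) / sr_hz
    out = []
    for _ in range(n_samples):
        out.append(table[int(phase) % len(table)])
        phase += increment
    return out

samples = oscillator(440.0, 44100.0, 100)  # the 440 hz tone of Figure 1
```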


Figure 4: cycle~ with variable frequency

• Figure 4 shows a patch with variable control over the oscillator frequency. The object line~ works like the object line, but at audio rates. Therefore the messages coming into line~ in Figure 4 will be changed into audio rate designations of frequency for the cycle~ object. Changing the value of the interpolation time into line (100 ms. in Figure 4) will change the speed with which the patch makes a portamento from one frequency to another.
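The portamento behavior described above can be modeled as a per-sample linear ramp toward a target value over the stated interpolation time — a sketch of the idea, not of line~'s actual implementation:

```python
def line_ramp(start, target, ramp_ms, sr_hz=44100.0):
    """Audio-rate linear ramp from start to target over ramp_ms, like the
    portamento line~ provides for cycle~'s frequency inlet."""
    n = max(1, int(sr_hz * ramp_ms / 1000.0))
    step = (target - start) / n
    return [start + step * (i + 1) for i in range(n)]

glide = line_ramp(220.0, 440.0, 100.0)  # glide up an octave in 100 ms
print(len(glide))  # -> 4410 frequency values, one per sample
```

Feeding each of these values to the oscillator's frequency in turn produces the glide; a shorter ramp_ms gives a faster portamento, as the paragraph above notes.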

• Note that the input to line~ is an ordinary Max message box. Some msp objects (such as line~) can take non-signal inputs and interpret them as controls over processes at the audio rate. The msp object sig~ explicitly upsamples a Max value to an audio signal; the object snapshot~ downsamples audio outputs to the Max range (a maximum of 1000 values per second).

• The average amplitude of a waveform is usually measured by the root-mean-square (rms) method. This works as it sounds: instantaneous measurements of amplitude are squared, summed, and averaged. The square root of the resulting number is the rms amplitude of the waveform.
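The rms recipe reads off directly into code — square, average, take the square root. As a check, the rms amplitude of a full-scale sine is 1/√2 ≈ 0.707:

```python
import math

def rms(samples):
    """Root-mean-square amplitude: square, average, then square-root."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

one_cycle = [math.sin(2 * math.pi * i / 512) for i in range(512)]
print(round(rms(one_cycle), 4))  # -> 0.7071, i.e. 1/sqrt(2)
```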

• Noise is any unwanted signal added to the desired representation. Noise generally has a constant value (think of “hum”) and can be thought of as a lower limit to the range of useful signals. A commonly used measure of the presence of noise in a system is the signal-to-noise ratio (SNR), “which is usually defined as the ratio between the amplitudes of the largest useful signal and the amplitude of the inherent noise in a system.” Both amplitudes are expressed as rms values, and the SNR in decibels.
SNR (in dB) = 20 · log10(A_signal / A_noise)

• msp is very useful for experimenting with digital audio processing because so many DSP algorithms can be implemented quite directly using the objects msp provides. Rather than simply memorizing a formula such as the one above, you can use msp patches to try it out in practice. The patch shown in Figure 5 changes an amplitude value (varying between 0.0 and 1.0) to a value in decibels, using the formula shown above.


Figure 5: Amplitude to dB conversion
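The conversion the Figure 5 patch performs can also be written directly: with full scale taken as amplitude 1.0, dB = 20 · log10(amplitude).

```python
import math

def amp_to_db(amplitude):
    """Convert a linear amplitude in (0.0, 1.0] to decibels relative to
    full scale: dB = 20 * log10(amplitude)."""
    return 20.0 * math.log10(amplitude)

print(amp_to_db(1.0))  # -> 0.0 (full scale)
print(amp_to_db(0.5))  # roughly -6 dB: halving amplitude loses about 6 dB
```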

• MSP Exercise: Bring a Max/MSP patch to class February 26. Use MSP to implement one of the following digital signal processing techniques:
1) a low-pass filter
2) a reverb unit
3) a mixer
4) a flanger
NB: Don’t just copy the tutorial examples: do something in your patch that differs from the manual.

Roads, C. (1996). The Computer Music Tutorial. Cambridge, MA: The MIT Press.

Steiglitz, K. (1996). A Digital Signal Processing Primer. New York: Addison-Wesley Publishing Company, Inc.