A central topic in spoken-language-systems research is what’s called speaker diarization, or computationally determining how many speakers feature in a recording and which of them speaks when. Speaker diarization would be an essential function of any program that automatically annotated audio or video recordings.
To date, the best diarization systems have used what’s called supervised machine learning: They’re trained on sample recordings that a human has indexed, indicating which speaker enters when. In the October issue of IEEE Transactions on Audio, Speech, and Language Processing, however, MIT researchers describe a new speaker-diarization system that achieves comparable results without supervision: No prior indexing is necessary.
Moreover, one of the MIT researchers’ innovations was a new, compact way to represent the differences between individual speakers’ voices, which could be of use in other spoken-language computational tasks.
“You can know something about the identity of a person from the sound of their voice, so this technology is keying in to that type of information,” says Jim Glass, a senior research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and head of its Spoken Language Systems Group. “In fact, this technology could work in any language. It’s insensitive to that.”
To create a sonic portrait of a single speaker, Glass explains, a computer system will generally have to analyze more than 2,000 different speech sounds; many of those may correspond to familiar consonants and vowels, but many may not. To characterize each of those sounds, the system might need about 60 variables, which describe properties such as the strength of the acoustic signal in different frequency bands.
E pluribus tres
The result is that for every second of a recording, a diarization system would have to search a space with 120,000 dimensions, which would be prohibitively time-consuming. In prior work, Najim Dehak, a research scientist in the Spoken Language Systems Group and one of the new paper’s co-authors, had demonstrated a technique, dubbed the i-vector, for reducing the number of variables required to describe the acoustic signature of a particular speaker.
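The arithmetic behind that 120,000 figure follows from the numbers above. A minimal sketch, using the article's approximate counts (not exact model parameters):

```python
# Approximate figures from the article, not exact model parameters.
n_speech_sounds = 2000   # distinct speech-sound classes the system models
n_features = 60          # variables per sound (e.g., signal strength per frequency band)

# Stacking one 60-number description per sound class yields the
# high-dimensional space a diarization system would have to search.
supervector_dim = n_speech_sounds * n_features
print(supervector_dim)  # 120000
```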
To get a sense of how the technique works, imagine a graph that plotted, say, hours worked by an hourly worker against money earned. The graph would be a diagonal line in a two-dimensional space. Now imagine rotating the axes of the graph so that the x-axis is parallel to the line. All of a sudden, the y-axis becomes irrelevant: All the variation in the graph is captured by the x-axis alone.
Similarly, i-vectors find new axes for describing the information that characterizes speech sounds in the 120,000-dimension space. The technique first finds the axis that captures most of the variation in the information, then the axis that captures the next-most variation, and so on. So the information added by each new axis steadily decreases.
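The hours-versus-earnings analogy above can be made concrete. This is only an illustrative sketch of the axis-rotation idea (the $15 hourly wage is a made-up figure), not the i-vector computation itself:

```python
import math

# Points for an hourly worker earning a hypothetical $15/hour:
# every (hours, earned) pair lies on the line earned = 15 * hours.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
earned = [15.0 * h for h in hours]

# Rotate the axes so the new x-axis runs parallel to that line.
theta = math.atan2(15.0, 1.0)  # angle of the line's slope
new_x = [h * math.cos(theta) + e * math.sin(theta) for h, e in zip(hours, earned)]
new_y = [-h * math.sin(theta) + e * math.cos(theta) for h, e in zip(hours, earned)]

# All the variation now lies along the new x-axis; the new y-axis is
# constant (zero here) and can be discarded. Repeating this idea --
# keep the axis with the most variation, then the next-most, and so
# on -- is what lets i-vectors shrink the 120,000-dimension space.
print(max(abs(y) for y in new_y))  # 0.0 (up to rounding)
```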
Stephen Shum, a graduate student in MIT’s Department of Electrical Engineering and Computer Science and lead author on the new paper, found that a 100-variable i-vector — a 100-dimension approximation of the 120,000-dimension space — was an adequate starting point for a diarization system. Since i-vectors are intended to describe every possible combination of sounds that a speaker might emit over any span of time, and since a diarization system needs to classify only the sounds on a single recording, Shum was able to use similar techniques to reduce the number of variables even further, to only three.
Birds of a feather
For every second of sound in a recording, Shum thus ends up with a single point in a three-dimensional space. The next step is to identify the bounds of the clusters of points that correspond to the individual speakers. For that, Shum used an iterative process. The system begins with an artificially high estimate of the number of speakers — say, 15 — and finds a cluster of points that corresponds to each one.
Clusters that are very close to each other then coalesce to form new clusters, until the distances between them grow too large to be plausibly bridged. The process then repeats, beginning each time with the same number of clusters that it ended with on the previous iteration. Finally, it reaches a point at which it begins and ends with the same number of clusters, and the system associates each cluster with a single speaker.
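The merge-until-stable loop described above can be sketched as follows. The distance measure, merge threshold, and toy centroids are illustrative assumptions, not the paper's exact method:

```python
def dist(a, b):
    """Euclidean distance between two points in the 3-D space."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def merge_pass(centroids, threshold):
    """One iteration: greedily merge any two clusters closer than `threshold`."""
    centroids = list(centroids)
    merged = True
    while merged:
        merged = False
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                if dist(centroids[i], centroids[j]) < threshold:
                    a, b = centroids[i], centroids[j]
                    centroids[i] = tuple((x + y) / 2 for x, y in zip(a, b))
                    del centroids[j]
                    merged = True
                    break
            if merged:
                break
    return centroids

# Start with a deliberately high cluster count -- here six toy cluster
# centers for what are really two speakers -- and repeat, each time
# starting from the count the previous iteration ended with, until the
# count no longer changes.
centroids = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
             (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0)]
while True:
    new = merge_pass(centroids, threshold=1.0)
    if len(new) == len(centroids):
        break
    centroids = new
print(len(centroids))  # 2 -- one cluster per speaker
```

The remaining gap between the two surviving clusters is too large to bridge, so the process begins and ends with the same count and stops there.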
According to Patrick Kenny, a principal research scientist at the Computer Research Institute of Montreal, i-vectors were originally devised to solve the problem of speaker recognition, or determining whether the same speaker features on multiple recordings. “Najim showed that this representation was very good for speaker recognition with long utterances, typically 30 seconds or a minute,” Kenny says. “The issue when you’re diarizing telephone conversations is that a speaker’s turn is generally very short. It’s really an order of magnitude less than the recordings that are used in text-independent speaker recognition.”
“What was completely not obvious, what was surprising, was that this i-vector representation could be used on this very, very different scale, that you could use this method of extracting features on very, very short speech segments, perhaps one second long, corresponding to a speaker turn in a telephone conversation,” Kenny adds. “I think that was the significant contribution of Stephen’s work.”
Automatic Speaker Tracking in Audio Recordings