[speex-dev] Server based audio merge

Thu Nov 20 18:58:04 PST 2003

>True, but there is one critical place where it's necessary to mix at
>least two streams--when someone's trying to break into a stream.  If
>speaker A goes on and on and speaker B (or C, D, E, F...) wants to
>interject or interrupt, how do they do it in-band without mixing?

It doesn't have to be done that way.  You can simply have the server
echo the voice streams back to the various clients.  Leave the job of
mixing the multiple streams to each client's sound device (e.g.
DirectSound or the sound hardware).
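
A minimal sketch of what that client-side mix amounts to, assuming each
incoming stream has already been decoded to 16-bit PCM frames of equal
length.  In practice DirectSound or the hardware would do the summing
for you; `mix_frames` is a hypothetical helper, not part of Speex or
DirectSound.

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Sum equal-length 16-bit PCM frames sample by sample.

    The sum is clipped to the int16 range so that two loud talkers
    saturate instead of wrapping around.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same length")

    mixed = []
    for samples in zip(*frames):
        total = sum(samples)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

Nothing here needs the streams to be re-encoded, which is the point:
the server only relays packets and each client decodes what it gets.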

In most conversations, when one person starts speaking the other tends
to stop; otherwise you cannot understand either person.

>The 'obvious' solution seems to be run N processes to detect 'speech'
>or important audio content on the incoming N streams.  Pick one or
>two that need output, then mix and recode them. 

Again, not recommended: decoding, mixing and recoding at the server,
only to decode again at the client, has a major impact on the total
latency of the voice stream.

>If the detection is
>done in the client, then the server's job is much simpler--arbitrate,
>mix, and encode.  Since the overlap periods of the mixing are going
>to be infrequent and discontinuous, you don't have to be sample 
>exact--no stream synchronization required.

Additionally, clients that aren't transmitting should never send voice
upstream to the server.  You write code to detect transmission.
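
One way such a check could look, as a rough sketch only: an energy
threshold on each captured frame, with a short hangover so word endings
aren't chopped.  The class name, the RMS threshold and the hangover
length are all assumptions picked for illustration; real values would
need tuning against the actual microphone and noise floor.

```python
import math
from typing import Sequence


class TransmitGate:
    """Decide per frame whether the client should send audio upstream."""

    def __init__(self, threshold_rms: float = 500.0, hangover_frames: int = 10):
        self.threshold_rms = threshold_rms      # assumed value, tune per setup
        self.hangover_frames = hangover_frames  # quiet frames still sent after speech
        self._quiet_frames = hangover_frames + 1  # start with the gate closed

    def should_send(self, frame: Sequence[int]) -> bool:
        # Root-mean-square level of the 16-bit PCM frame (0.0 if empty).
        rms = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
        if rms >= self.threshold_rms:
            self._quiet_frames = 0
            return True
        # Below threshold: keep sending only while inside the hangover window.
        self._quiet_frames = min(self._quiet_frames + 1, self.hangover_frames + 1)
        return self._quiet_frames <= self.hangover_frames
```

The client would call `should_send()` on every captured frame and skip
encoding and sending whenever it returns False.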

-----Original Message-----
From: owner-speex-dev at xiph.org [mailto:owner-speex-dev at xiph.org] On
Behalf Of David Willmore
Sent: Thursday, November 20, 2003 5:43 PM
To: speex-dev at xiph.org
Subject: Re: [speex-dev] Server based audio merge

> I tend to disagree.  In normal human conversation it wouldn't make
> much sense to have 2 people talking over each other at the same time.
> Thus, in most scenarios you would have only one talker anyway.
> Additionally, encode->decode/mix/encode->decode isn't a very efficient
> CPU process for a server, it's complicated to keep timing correct and
> it has a negative impact on total latency.

True, but there is one critical place where it's necessary to mix at
least two streams--when someone's trying to break into a stream.  If
speaker A goes on and on and speaker B (or C, D, E, F...) wants to
interject or interrupt, how do they do it in-band without mixing?

> The overhead required to mix, merge and re-encode is usually not worth
> the benefit as in most situations you are not really saving any
> bandwidth.

But the options are *don't transcode* and *always transcode*.  Switching
between them is difficult to do on the fly.

The 'obvious' solution seems to be run N processes to detect 'speech'
or important audio content on the incoming N streams.  Pick one or
two that need output, then mix and recode them.  If the detection is
done in the client, then the server's job is much simpler--arbitrate,
mix, and encode.  Since the overlap periods of the mixing are going
to be infrequent and discontinuous, you don't have to be sample 
exact--no stream synchronization required.
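
A rough sketch of the arbitration step, assuming each client reports a
speech level alongside its audio.  The function name, the level values
and the tie-breaking policy (current talkers keep their slot, loudest
newcomers fill what is left) are assumptions for illustration; the
proposal above only says to pick one or two streams.

```python
from typing import Dict, List


def pick_talkers(levels: Dict[str, float],
                 current: List[str],
                 max_talkers: int = 2) -> List[str]:
    """Choose which active clients get mixed into the output.

    levels  -- client id -> reported speech level, active clients only
    current -- clients selected on the previous frame
    """
    # Talkers who are still active keep their slot, so the mix doesn't flap.
    kept = [c for c in current if c in levels][:max_talkers]
    # Remaining slots go to the loudest newcomers (client id breaks ties).
    newcomers = sorted((c for c in levels if c not in kept),
                       key=lambda c: (-levels[c], c))
    return kept + newcomers[:max_talkers - len(kept)]
```

With one talker selected the server could pass that stream through
untouched; only when two are selected does it need to decode, mix and
recode.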

So, I'd say any machine that can decode two streams, encode one stream,
and do a little overhead should be able to act as a server.

Hey, the client has to be able to encode in speex in real time, anyway,
why waste that effort?

Cheers,
David
--- >8 ----
List archives:  http://www.xiph.org/archives/
Ogg project homepage: http://www.xiph.org/ogg/
To unsubscribe from this list, send a message to
'speex-dev-request at xiph.org'
containing only the word 'unsubscribe' in the body.  No subject is
needed.
Unsubscribe messages sent to the list will be ignored/filtered.
