[Tremor] Re:Tremor lowmem on TI 55x DSP (Roland Wintersteller)

Santosh Kumar santoshkumar at ftdpl.com
Sun Oct 24 21:52:53 PDT 2004


  hi

   I want to thank Mr. Roland for his last mail. It was very helpful to a
newbie like me. I had the memory leakage problem, which I could solve now.

  regards,
  santosh.


  ----- Original Message -----
  From: <tremor-request at xiph.org>
  To: <tremor at xiph.org>
  Sent: Monday, October 25, 2004 12:30 AM
  Subject: Tremor Digest, Vol 5, Issue 16


  > Send Tremor mailing list submissions to
  > tremor at xiph.org
  >
  > To subscribe or unsubscribe via the World Wide Web, visit
  > http://lists.xiph.org/mailman/listinfo/tremor
  > or, via email, send a message with subject or body 'help' to
  > tremor-request at xiph.org
  >
  > You can reach the person managing the list at
  > tremor-owner at xiph.org
  >
  > When replying, please edit your Subject line so it is more specific
  > than "Re: Contents of Tremor digest..."
  >
  >
  > Today's Topics:
  >
  >    1. Tremor lowmem on TI 55x DSP (Roland Wintersteller)
  >    2. Re: TI55xx implementation: stuck ('Monty')
  >
  >
  > ----------------------------------------------------------------------
  >
  > Message: 1
  > Date: Sat, 23 Oct 2004 21:10:53 +0200
  > From: "Roland Wintersteller" <rwinters at europe.com>
  > Subject: [Tremor] Tremor lowmem on TI 55x DSP
  > To: <tremor at xiph.org>
  > Message-ID: <000a01c4b934$048e0230$9600a8c0 at CNH0WINTER>
  > Content-Type: text/plain; charset="us-ascii"
  >
  >
  > I recently noticed that a message I sent to the mailing list a few
  > weeks ago cannot be displayed by the archive because I used HTML
  > format instead of text format. Since I think this message may be
  > helpful to somebody trying to port Tremor to a TI C5x DSP, here is a
  > very similar message, this time in text format...
  >
  >
  >
  > I ported the Tremor lowmem decoder to a C55x DSP half a year ago. Here
  > are my experiences and a few numbers concerning the porting results.
  >
  > A list of the main problems I've had:
  > - 16 bit char:
  >   function floor1_info_unpack(...)
  >     ...
  >     info->class[j].class_subbook[k]=(oggpack_read(opb,8)-1) & 0xff;  // <-- add "& 0xff" here
  >     if(info->class[j].class_subbook[k]>=ci->books &&
  >       info->class[j].class_subbook[k]!=0xff)goto err_out;
  >     ...
  >
  > - 16 bit / 32 bit integer - cast operators have to be added in a few
  > places:
  >   function oggpack_look(...)
  >   ...
  >   if(bits>8){
  >     // added cast to (uint32)
  >     ret|=(uint32)b->headptr[1]<<(8-b->headbit);
  >     if(bits>16){
  >       // added cast to (uint32)
  >       ret|=(uint32)b->headptr[2]<<(16-b->headbit);
  >       if(bits>24){
  >         // added cast to (uint32)
  >         ret|=(uint32)b->headptr[3]<<(24-b->headbit);
  >         if(bits>32 && b->headbit) {
  >           // added cast to (uint32)
  >           ret|=(uint32)b->headptr[4]<<(32-b->headbit);
  >         }
  >       }
  >     }
  >   }
  >   ...
  >
  > - 16 bit integers: all int (int16) types (at least those which need
  > 32 bits) have to be replaced by long (int32)
  >
  > - memory leakage problem: have a look in the archives...
  > http://lists.xiph.org/pipermail/tremor/2004-April/000965.html and
  > http://lists.xiph.org/pipermail/tremor/2004-October/001112.html
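The first two items of the list above can be reproduced on a desktop
compiler. What follows is only a minimal sketch, not code from Tremor: it
assumes that class_subbook is an unsigned char field and uses uint16_t as a
stand-in for the 16-bit char and int of the DSP. The type dsp_uchar and the
four function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the 16-bit unsigned char of the DSP (assumption: emulated
   with uint16_t so the effect shows up on a desktop compiler). */
typedef uint16_t dsp_uchar;

/* Store (read_value - 1) without a mask. A read of 0 gives -1, which ends
   up as 0xffff in a 16-bit char rather than the 0xff value that the
   following comparison checks for. */
dsp_uchar store_unmasked(long read_value) {
  return (dsp_uchar)(read_value - 1);
}

/* The same store with the "& 0xff" fix applied. */
dsp_uchar store_masked(long read_value) {
  return (dsp_uchar)((read_value - 1) & 0xff);
}

/* Emulates a shift evaluated in 16-bit precision: everything moved above
   bit 15 is dropped. */
uint32_t shift_in_16_bits(dsp_uchar byte, int shift) {
  assert(shift >= 0 && shift < 32);
  return (uint16_t)((uint32_t)byte << shift);
}

/* Widening first, as the added (uint32) casts do, keeps the upper bits. */
uint32_t shift_in_32_bits(dsp_uchar byte, int shift) {
  assert(shift >= 0 && shift < 32);
  return (uint32_t)byte << shift;
}
```

With these stand-ins, store_unmasked(0) yields 0xffff while store_masked(0)
yields 0xff, and shifting the byte 0xab left by 16 yields 0 in 16-bit
precision but 0xab0000 after widening.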
  >
  > One additional hint:
  > Try to make your changes also run in a gcc or Windows environment. I've
  > seen that the Windows compiler reports warnings that CCS does not, and
  > vice versa.
  >
  >
  >
  > After these changes I was able to add the decoder to a test environment:
  > - Demo application running on a C5510 DSK, which
  > - reads an Ogg Vorbis file from the external SDRAM and
  > - puts out the decoded samples to the headphone jack with DMA/MCBSP.
  >
  > But the achieved results were pretty poor (stereo with 44.1 kHz
  > samplerate):
  > - 20 kW code
  > - 28 kW heap
  > - 11 kW constants
  > -  1 kW static data
  > -  ? kW stack
  > - 150 MCPS (million cycles per second) for q0
  > - 164 MCPS for q4
  > - 190 MCPS for q7
  > - Accuracy = -410/511 (10 bit)
  >
  > Since then I have worked on optimizing the decoder (reduced lookup
  > tables, a modified IMDCT implementation (see also the diploma thesis and
  > patch of Johannes Sandvall:
  > http://lists.xiph.org/pipermail/tremor/2004-March/000957.html),
  > compressed char buffers, adapted memory management, ...).  The current
  > memory consumption is...
  > +  17 kW code memory (doesn't include floor 0 decoding)
  > +  22 kW heap (up to 128kbit/s (q4); has to be increased to at least 23
  > kW with higher bit rates (244kbit/s <> q7))
  > +   0 kW static data (changed to constants)
  > +   2 kW constants
  > + 0.5 kW stack
  > + 2.5 kW application buffers (as a part of the decoder application;
  > needed for real-time decoding)
  > (Therefore the decoder is suitable to run on a C5502 DSP; code has to be
  > executed from an external memory)
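For orientation, the two memory listings can be totalled. This is a small
sketch under two assumptions that the message does not state: kW means
kilowords of 16 bits, and 1 kW is 1024 words. The array and function names
are illustrative, and the unknown stack size of the first build is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Footprints in tenths of a kiloword (kW), copied from the two lists.
   before: code, heap, constants, static data (stack unknown, omitted)
   after:  code, heap, static data, constants, stack, application buffers */
const int before_tenths_kw[] = { 200, 280, 110, 10 };
const int after_tenths_kw[]  = { 170, 220, 0, 20, 5, 25 };

/* Sum n entries given in tenths of a kW. */
int sum_tenths(const int *v, size_t n) {
  int total = 0;
  assert(v != NULL);
  for (size_t i = 0; i < n; i++) total += v[i];
  return total;
}

/* Convert tenths of a kW to bytes, assuming 1 kW = 1024 words of 2 bytes. */
long tenths_kw_to_bytes(int tenths) {
  return (long)tenths * 1024L * 2L / 10L;
}
```

Under these assumptions the first build adds up to 60 kW (122880 bytes)
without its stack, and the optimized build to 44 kW (90112 bytes), of which
27 kW is data and 17 kW is code.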
  >
  > The code is not assembler optimized yet. The CPU load is still high...
  > + 100 MCPS for q0
  > + 110 MCPS for q4
  > + 130 MCPS for q7
  > With a number of assembler optimizations I expect that the CPU load can
  > be reduced to 40-50 MCPS. (As there is only C code, a performance test
  > on a C54X DSP would be interesting.)
  > The achieved accuracy compared to the vorbis reference decoder is -3/+2
  > digits (3 bit).
  >
  > Regards, Roland
  >
  >
  >
  >
  >
  > ------------------------------
  >
  > Message: 2
  > Date: Sat, 23 Oct 2004 21:09:39 -0400
  > From: 'Monty' <xiphmont at xiph.org>
  > Subject: Re: [Tremor] TI55xx implementation: stuck
  > To: Roland Wintersteller <rwinters at europe.com>
  > Cc: tremor at xiph.org
  > Message-ID: <20041024010939.GB19227 at xiph.org>
  > Content-Type: text/plain; charset=us-ascii
  >
  >
  >
  >
  > On Sat, Oct 23, 2004 at 08:59:01PM +0200, Roland Wintersteller wrote:
  > > Let me briefly sum up (and give some additional) facts of this
  > > discussion:
  > >
  > > - You have had problems on porting Tremor to C5x DSP, but now it
  > > works.
  >
  > It always worked, it just wasn't very fast before writing a lot of
  > assembly, yes :-)
  >
  > > - You were the first guy who managed to make Ogg Vorbis run on the
  > > C5x DSP,
  >
  > First that I know of, there may have been others.  To be pedantic, I
  > participated in the port effort; I was not the only engineer working
  > on it.
  >
  > > but in the meantime several others (including Johannes Sandvall
  > > and me) got Ogg Vorbis to work on C5x too.
  >
  > Yes, and the other Neuros Audio engineers took over maintaining the
  > port for the Neuros after the initial port, and have apparently made
  > substantial improvements to it since I let go of it.
  >
  > > - An ARM uC achieves better (faster) results than the C5x. We should
  > > not forget that the ARM is a 32bit micro processor.
  >
  > The ARM is also running at a fraction of the clock speed.  There are
  > three things that make the ARM somewhat better suited to Tremor:
  >
  > 1) 32 bit math-- not actually as big a deal as it seems, but it makes
  > things easier on the compiler and developer.
  >
  > 2) Shifter on ALU inputs (not outputs).  This is more useful than
  > you'd think; it makes implementing true floating point relatively easy
  > and also eases fixed point math substantially.  It also comes in handy
  > during bit-slicing operations during packet decode.
  >
  > 3) ARM typically uses the on-core SRAM as a zero-wait cache for slower
  > (7-14 wait) off-board memory.  I expect some 5xxx can do this too, but
  > it's a less common arrangement if so.  The TI chips tend to have much
  > more SRAM on the core and designers tend to choose such a chip with
  > the intent of putting everything in on-core storage.  An ARM is
  > usually paired with a small (where small is 8-16 megabit) offboard
  > DRAM.
  >
  > > As the most frequently
  > > used multiply operations I have seen in the sources are 32bit x 32bit
  > > = 64bit>>32 = 32bit
  >
  > Yes, although really only 24x24->48 >> 24 bit depth is needed.
  >
  > > and the C5x DSP only supports 16bit x 16bit =
  > > 32bit>>16 = 16bit, that means, that the ARM is expected to be 4 times
  > > (400%) faster (compared to C5x assembler code). You only saw a
  > > performance gain of 12%.
  >
  > Don't forget the statements in between needed to glue; you have
  > only... two? real registers on TI as well.  Memory is fuzzy here.  ARM
  > addressing is more flexible as well, don't discount that.  OTOH, most
  > ARMs don't do 1 cycle multiplies.  So, the comparison is complicated.
  >
  > The TI chips can do some vectorized math, but I've not been able to
  > arrange it in a way to make use of the available vectorization.  A TI
  > guru could probably manage it though.
  >
  > > - The TI C5x compiler is not able to implement a 32bit x 32bit multiply
  > > in 4 cycles, which is possible in assembler.
  >
  > Right.  It is also missing intrinsics for doing so.  Actually, you
  > only need 3 multiplies if you know you're throwing away low bits.
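One plausible reading of the three-multiply remark, sketched in portable C:
split both operands into 16-bit halves and skip the low x low partial
product. This is an illustration only, not the Neuros or Tremor code; the
names mult32_exact and mult32_three are hypothetical, and whether each
product maps to a single 16x16 multiply on the C55x is not verified here.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: full 32x32->64 product, keeping the high 32 bits. */
int32_t mult32_exact(int32_t a, int32_t b) {
  return (int32_t)(((int64_t)a * b) >> 32);
}

/* Three-multiply variant built from 16-bit halves:
   a = ah*2^16 + al, b = bh*2^16 + bl, ah/bh signed, al/bl unsigned.
   The low*low product al*bl is never computed. */
int32_t mult32_three(int32_t a, int32_t b) {
  int32_t ah = a >> 16;       /* assumes >> is arithmetic for negatives */
  int32_t bh = b >> 16;
  int32_t al = a & 0xffff;
  int32_t bl = b & 0xffff;
  int64_t mid;

  /* The halves must recombine; this holds when >> is arithmetic. */
  assert(ah * 65536 + al == a);
  assert(bh * 65536 + bl == b);

  mid = (int64_t)ah * bl + (int64_t)al * bh;  /* two cross products */
  return (int32_t)(ah * bh + (mid >> 16));    /* plus one high product */
}
```

Dropping al*bl only loses a carry into the kept bits, so the result is
either equal to the exact high word or one unit below it.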
  >
  > > On the other hand the ARM compiler is
  > > probably not able to do a 64bit by 64bit multiply without a call to
  > > stdlib, which is the same considering the bit depth.
  >
  > I can only comment on GCC, but GCC can in fact do a 32x32->64 in one
  > insn.  The ARM Consortium paid Cygnus to write a first-class GCC
  > backend for ARM and I can attest to it being pretty well tuned.  GCC's
  > way of doing inline assembly also adds substantial convenience.
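As a rough illustration of the GCC remarks above, the high word of a
32x32->64 product can be written either as plain C or with GCC extended
inline assembly around the ARM SMULL instruction. The function names are
hypothetical and the sketch is not taken from the Tremor sources; the
preprocessor guard is an assumption made so that the same file still builds
on non-ARM hosts.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a signed 32x32->64 product. On 32-bit ARM (ARM state)
   this goes through SMULL via GCC extended inline assembly; elsewhere it
   falls back to plain C so the sketch stays portable. */
int32_t mult32_hi(int32_t x, int32_t y) {
#if defined(__GNUC__) && defined(__arm__) && !defined(__thumb__)
  int32_t lo, hi;
  __asm__("smull %0, %1, %2, %3"
          : "=&r"(lo), "=&r"(hi)   /* outputs: low word, high word */
          : "r"(x), "r"(y));       /* inputs: the two factors */
  (void)lo;                        /* low word is thrown away */
  return hi;
#else
  return (int32_t)(((int64_t)x * y) >> 32);
#endif
}

/* Small self-check: 2^30 * 4 = 2^32, so the high word is 1. */
void mult32_hi_selfcheck(void) {
  assert(mult32_hi(0x40000000, 4) == 1);
}
```

The plain C expression in the fallback branch is the form a compiler can
turn into a single widening multiply on targets that have one.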
  >
  > > - TI C5x is currently the only 16bit CPU which is able to decode Ogg
  > > Vorbis.
  >
  > That I am uncertain of, but it could well be true.
  >
  > Monty
  >
  >
  > ------------------------------
  >
  >
  > End of Tremor Digest, Vol 5, Issue 16
  > *************************************
  >



