[Vorbis-dev] libvorbis 1.2.1 release?

Mon Jun 2 17:59:59 PDT 2008

Reply-to munging wherefore art thou? Come back, all is forgiven!

> The main point of the assembly versions there was to get a large
> (~10%) performance gain for decode, for almost no effort. I doubt
> calling lrint is going to be as fast...

I am willing to bet a case of your choice of Australian beer, payable
at LCA2008, that lrint will be within +/- 5% of the speed of this:

	#if defined(__i386__) && defined(__GNUC__) && !defined(__BEOS__)
	#  define VORBIS_FPU_CONTROL
	/* both GCC and MSVC are kinda stupid about rounding/casting to int.
	   Because of encapsulation constraints (GCC can't see inside the asm
	   block and so we end up doing stupid things like a store/load that
	   is collectively a noop), we do it this way */

	/* we must set up the fpu before this works!! */

	typedef ogg_int16_t vorbis_fpu_control;

	static inline void vorbis_fpu_setround(vorbis_fpu_control *fpu){
	  ogg_int16_t ret;
	  ogg_int16_t temp;
	  __asm__ __volatile__("fnstcw %0\n\t"
		  "movw %0,%%dx\n\t"
		  "orw $62463,%%dx\n\t"
		  "movw %%dx,%1\n\t"
		  "fldcw %1\n\t":"=m"(ret):"m"(temp): "dx");
	  *fpu=ret;
	}

	static inline void vorbis_fpu_restore(vorbis_fpu_control fpu){
	  __asm__ __volatile__("fldcw %0":: "m"(fpu));
	}

	/* assumes the FPU is in round mode! */
	static inline int vorbis_ftoi(double f){  /* yes, double!  Otherwise,
	                                             we get extra fst/fld to
	                                             truncate precision */
	  int i;
	  __asm__("fistl %0": "=m"(i) : "t"(f));
	  return(i);
	}
	#endif

On x86, x86_64 and PowerPC (the only architectures I've really looked
at), lrint compiles to a single instruction.

Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
"I'd crawl over an acre of 'Visual This++' and 'Integrated
Development That' to get to gcc, Emacs, and gdb.  Thank you."
-- Vance Petree