[flac-dev] FLAC as a format for archiving non-audio (SDR) sample data?

Alistair Buxton a.j.buxton at gmail.com
Wed Mar 31 15:51:11 UTC 2021


Hi,

There are several projects devoted to preserving analog video media such as
LaserDiscs and VHS tapes. These projects use raw sampling and SDR
techniques to recover higher quality versions than is possible using
normal players and capture cards. In the process of this work we end up
with huge files of raw analog sample data. These files range from tens to
hundreds of gigabytes of samples, with a typical sample rate of 25 MHz to
40 MHz. The rate and format vary depending on the hardware used for capture.

We've found that FLAC compresses these much better than gzip, lzma, etc.:
files shrink to about 50% of their original size, versus about 80% for
general-purpose compression algorithms. In my testing it also seems fast
enough to encode in real time.

Currently most people are using their own ad-hoc solutions for archiving
data, but as the author of one of the tools people are using, I'd like to
make it a bit more standardized and automatic. Compatibility with existing
playback hardware is not required of course. So I have some questions about
the internals of the FLAC format, suitability, and how it can be stretched
to our needs.

Is there some way I could store a sample rate with 32 bit precision? It
seems like the sample rate doesn't actually make any difference for raw
data and people are just using 48000 as a placeholder, but it would be
really useful if the true sample rate could be stored as it is required by
the decoding tools. It does need more than 16 bits of precision.
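
To make the precision problem concrete, here is a small sketch. It assumes
the sample rate field in the STREAMINFO block is 20 bits wide, so our capture
rates cannot fit there. The RAW_SAMPLE_RATE key is a made-up name for a
key=value workaround, not an existing convention.

```python
# Assumption: STREAMINFO stores the sample rate in a 20-bit field.
STREAMINFO_RATE_BITS = 20
MAX_STREAMINFO_RATE = (1 << STREAMINFO_RATE_BITS) - 1  # 1048575 Hz


def fits_in_streaminfo(rate_hz: int) -> bool:
    """Return True if rate_hz can be stored in the 20-bit field."""
    return 0 < rate_hz <= MAX_STREAMINFO_RATE


def rate_comment(rate_hz: int, key: str = "RAW_SAMPLE_RATE") -> str:
    """Format the true rate as key=value (the key name is hypothetical)."""
    return f"{key}={rate_hz}"


if __name__ == "__main__":
    for rate in (48_000, 25_000_000, 40_000_000):
        # Print the rate, the number of bits it needs, and whether it fits.
        print(rate, rate.bit_length(), fits_in_streaminfo(rate))
```

A 40 MHz rate needs 26 bits, so the true value would have to live somewhere
other than the stream header, with 48000 kept as the placeholder.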

What about other metadata? Can I store arbitrary information? I don't need
to store things like artist/title that you would expect for audio tracks. I
need to store things like the format of the capture, hardware used, number
of samples per line and number of lines per field.
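
As an illustration of what I mean, the sketch below packs capture fields
into the body of a Vorbis-comment-style block (little-endian lengths, UTF-8
"KEY=value" entries) using only the standard library. The field names and
values are invented for the example, and I am assuming nothing stops a file
from carrying non-audio keys like these.

```python
import struct


def encode_vorbis_comment(vendor: str, fields: dict) -> bytes:
    """Serialize fields as a Vorbis comment body (32-bit LE lengths)."""
    vendor_bytes = vendor.encode("utf-8")
    out = struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    out += struct.pack("<I", len(fields))
    for key, value in fields.items():
        # Field names must be ASCII 0x20..0x7D and must not contain '='.
        if not key or any(not 0x20 <= ord(c) <= 0x7D or c == "=" for c in key):
            raise ValueError(f"invalid field name: {key!r}")
        entry = f"{key}={value}".encode("utf-8")
        out += struct.pack("<I", len(entry)) + entry
    return out


def decode_vorbis_comment(data: bytes):
    """Parse a Vorbis comment body back into (vendor, fields)."""
    pos = 0
    (vendor_len,) = struct.unpack_from("<I", data, pos)
    pos += 4
    vendor = data[pos:pos + vendor_len].decode("utf-8")
    pos += vendor_len
    (count,) = struct.unpack_from("<I", data, pos)
    pos += 4
    fields = {}
    for _ in range(count):
        (entry_len,) = struct.unpack_from("<I", data, pos)
        pos += 4
        key, _sep, value = data[pos:pos + entry_len].decode("utf-8").partition("=")
        fields[key] = value
        pos += entry_len
    return vendor, fields


# Hypothetical capture metadata; none of these keys are standardized.
capture = {
    "CAPTURE_FORMAT": "u8",
    "CAPTURE_HARDWARE": "example-capture-card",
    "SAMPLES_PER_LINE": "2048",
    "LINES_PER_FIELD": "313",
}
body = encode_vorbis_comment("example-archiver", capture)
```

All values come back as strings, so numeric fields would need converting by
the decoding tools.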

I also need a way to mark sections of the file. Think of this like having a
FLAC file of a whole album, and marking in the metadata where each track
begins and ends within the file. In my case, these are the start and end of
different recordings on one VHS tape. These sections will also need their
own metadata. Basically I need to store structured data, not just flat
key=value records.
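
One workaround I can imagine is serializing the structure as JSON inside a
single key=value record. The sketch below shows the idea. The SECTIONS_JSON
key, the field names, and the sample offsets are all hypothetical.

```python
import json

# Hypothetical sections of one tape, addressed by sample offset.
sections = [
    {"start_sample": 0, "end_sample": 1_500_000_000,
     "meta": {"title": "recording 1", "lines_per_field": 313}},
    {"start_sample": 1_500_000_000, "end_sample": 2_750_000_000,
     "meta": {"title": "recording 2", "lines_per_field": 313}},
]


def sections_to_comment(items, key="SECTIONS_JSON"):
    """Pack the structured data into one flat key=value string."""
    return f"{key}=" + json.dumps(items, separators=(",", ":"), sort_keys=True)


def comment_to_sections(comment):
    """Split at the first '=' and parse the JSON value."""
    _key, _sep, value = comment.partition("=")
    return json.loads(value)


def section_at(items, sample):
    """Return the section containing the given sample, or None."""
    for item in items:
        if item["start_sample"] <= sample < item["end_sample"]:
            return item
    return None


comment = sections_to_comment(sections)
restored = comment_to_sections(comment)
```

This works, but every tool then has to agree on the JSON schema, which is
the part I would like to standardize.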

A lot of this information is not available at capture time and can only be
found by decoding the samples - which cannot be done in real-time. So I
need to be able to insert it after the decoding process, without rewriting
the whole file. I gather that it is possible to pad the metadata block to
allow it to grow later. Are there any limits on how much padding I can
insert?
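
For reference, this is my understanding of how a padding block is laid out,
as a rough sketch: a one-byte header (last-block flag plus block type, with
PADDING as type 1), a 24-bit big-endian length, then that many zero bytes.
If that is right, a single block tops out at 16777215 bytes. The function
name is my own.

```python
PADDING_TYPE = 1                    # assumed block type number for PADDING
MAX_BLOCK_LENGTH = (1 << 24) - 1    # the length field is 24 bits wide


def padding_block(length: int, is_last: bool = False) -> bytes:
    """Build a metadata block header followed by `length` zero bytes."""
    if not 0 <= length <= MAX_BLOCK_LENGTH:
        raise ValueError(f"padding length must be 0..{MAX_BLOCK_LENGTH}")
    # The top bit of the first byte marks the last metadata block.
    first_byte = (0x80 if is_last else 0x00) | PADDING_TYPE
    header = bytes([first_byte]) + length.to_bytes(3, "big")
    return header + bytes(length)


block = padding_block(8192)
```

Reserving, say, 8192 bytes at capture time would cost 8196 bytes on disk
including the header.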

Would there be any advantage to using an OGA container instead of straight
FLAC files for this?

Finally, the programming language of choice for this type of work is
Python. Can you suggest a good binding that supports encoding, decoding,
metadata manipulation, and fast, sample-accurate seeking?

I'm interested to hear any thoughts you have about this.

Background information about our projects:

https://zxnet.co.uk/teletext/recovery/
https://github.com/ali1234/vhs-teletext
https://www.domesday86.com/?page_id=978
https://github.com/happycube/ld-decode/

Thanks,

-- 
Alistair Buxton
a.j.buxton at gmail.com