[ogg-dev] advantages of ogg container?

Sun Aug 29 15:55:07 PDT 2010

On 2010-08-30, Alexey Fisher wrote:

> One more question, how about error resistance? For a short time I 
> worked on a computer forensics project where we needed to recover some 
> parts of videos from a broken hard drive. There were many broken files. 
> Are there any options to recover video with a missing header?

The best that I know about in that department is the kind of stuff they 
use in networks when trying to discern encrypted protocol flows based on 
auxiliary characteristics, or the stuff utilized in steganalysis. That 
is, statistical analysis of an extensive set of traits, which are 
heuristically derived from understanding of the protocol flow/behavior, 
and then statistically aggregated, typically using Bayesian methods. 
That sort of thing works extremely well in the interactive network 
environment, but I think it could be passable even in the storage 
forensics one. Still, I haven't seen any real work in the area myself.

My kind of nerd at least goes after the problem directly and generally. 
Map out all of the information you have, subdivide the problem into 
pieces you can crank out some explicit measurement code for, aggregate 
the measurements using statistical methods, and then apply machine 
learning algorithms 
against a corpus of examples to yield a classification model. After 
that, mine the open source "literature" for what people have done in the 
past when something became corrupted and needed to be put back together.
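
A minimal sketch of that measure-then-classify pipeline, assuming a 
Gaussian naive Bayes model as the aggregation step (one Bayesian option 
among many, not something known to be used for media recovery). The 
labels "payload" and "overhead" and the two traits per block are 
hypothetical, picked only for illustration.

```python
import math
from collections import defaultdict

def fit(samples):
    """samples: list of (label, [trait values]) from a labelled corpus.

    Returns {label: (prior, per-trait means, per-trait variances)}.
    """
    by_label = defaultdict(list)
    for label, traits in samples:
        by_label[label].append(traits)
    model = {}
    for label, rows in by_label.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small variance floor avoids division by zero for constant traits.
        variances = [max(sum((x - m) ** 2 for x in c) / len(c), 1e-6)
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(samples), means, variances)
    return model

def classify(model, traits):
    """Return the label with the highest log posterior for these traits."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for x, m, v in zip(traits, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))
```

Each disk block would be reduced to a trait vector by the measurement 
code, and classify() would then say which kind of data it most 
resembles under the independence assumption naive Bayes makes.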

As a simple suggestion for a couple of relevant measurements/traits... 
Most current video protocols are highly compressed and thus mimic noise. 
But only in the part of the data which is actually video; the protocol 
overhead tends to be regular and thus compressible/well-correlated in 
time. So while it's pretty involved, computationally speaking, it's at 
least theoretically possible to use autocorrelative analysis or local 
compressibility measures using heavy duty text compressors as a means to 
identify that sort of structure. I've thought from time to time myself 
that record structures could probably be identified via windowed Fourier 
transforms over traits derived from the data. Textual similarity scores 
based on sets of short sparsified hashes over a sliding window (like 
rsync's first pass) can work, especially if adapted to that 
deterministic framing and protocol overhead I mentioned. Or whatever.
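
Two of those traits can be sketched in a few lines, assuming zlib as a 
cheap stand-in for a heavy duty compressor and a plain byte-match rate 
at a given lag as a crude form of autocorrelation. The function names, 
window size and step are illustrative assumptions, not tuned values.

```python
import zlib

def compressibility_profile(data: bytes, window: int = 4096, step: int = 2048):
    """Return (offset, ratio) pairs, ratio = compressed size / raw size.

    Ratios near or above 1.0 look like noise (compressed payload);
    low ratios suggest regular structure such as framing overhead.
    """
    profile = []
    for start in range(0, max(len(data) - window, 0) + 1, step):
        chunk = data[start:start + window]
        if not chunk:
            break
        profile.append((start, len(zlib.compress(chunk, 9)) / len(chunk)))
    return profile

def lag_match_rate(data: bytes, lag: int) -> float:
    """Fraction of positions i where data[i] == data[i + lag]."""
    n = len(data) - lag
    if n <= 0:
        return 0.0
    return sum(a == b for a, b in zip(data, data[lag:])) / n
```

Scanning lag_match_rate() over a range of lags and looking for peaks is 
one way to guess at a fixed record length; for random bytes the rate 
sits near 1/256 at every lag.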

Then there's a *ton* of combinatoric and statistical reassembly stuff 
being done, especially in the shotgun sequencing biocrowd who use these 
algorithms to reconstruct sequences (files, as it were) of DNA and 
protein. They even have edge continuity/matching based algorithms for 
reconstructing jigsaw puzzles nowadays, which have been applied in 
corporate espionage to automatically reassemble shredded documents, I 
believe.
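
As a toy illustration of the reassembly idea, here is a greedy 
overlap-merge in the spirit of shotgun assembly. It assumes the 
fragments overlap, which sequencing reads do but disk clusters 
generally do not, so for storage forensics the overlap() score would 
have to be swapped for some edge-continuity measure. The function names 
and the min_len default are assumptions made for the sketch.

```python
def overlap(a: bytes, b: bytes, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b, or 0
    if that length is below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=4):
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the list of fragments left when no pair overlaps by at
    least min_len bytes. Greedy merging is a heuristic and can pick
    a wrong join when the data contains repeats.
    """
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags)
                 if k not in (best_i, best_j)] + [merged]
    return frags
```

The pairwise search is quadratic per merge, which is fine for a handful 
of fragments but not for a whole drive's worth.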

It's just that I've never seen all of this applied to media recovery. 
The tech is evidently out there, but if somebody's already applied it 
to broken files, I for one haven't seen it done. So either the NSA 
already hired/killed everybody doing that sort of stuff, or I'm just 
behind my usual game, or there's a dissertation or a startup lurking 
around there, somewhere. ;)
-- 
Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2