[ogg-dev] advantages of ogg container?
Sampo Syreeni
decoy at iki.fi
Sun Aug 29 15:55:07 PDT 2010
On 2010-08-30, Alexey Fisher wrote:
> One more question: how about error resistance? For a short time I worked
> on a computer forensics project where we needed to recover parts of
> videos from a broken hard drive. There were many broken files. Are there
> any options to recover video with a missing header?
The best that I know about in that department is the kind of stuff they
use in networks when trying to discern encrypted protocol flows based on
auxiliary characteristics, or the stuff utilized in steganalysis. That
is, statistical analysis of an extensive set of traits, which are
heuristically derived from understanding of the protocol flow/behavior,
and then statistically aggregated, typically using Bayesian methods.
That sort of thing works extremely well in the interactive network
environment, but I think it could be passable even in the storage
forensic one. Still, I haven't seen any real work in the area myself.
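
To make the Bayesian aggregation step concrete, here is a minimal sketch of combining several independent trait measurements into one probability that a block belongs to a given format. The function name, the prior, and the likelihood numbers are all made up for illustration, and the independence assumption (naive Bayes) is a simplification rather than something any particular tool is known to use.

```python
import math

def posterior(prior, likelihoods):
    """Combine per-trait evidence into P(class | traits).

    prior:       P(class) before looking at any trait, strictly in (0, 1).
    likelihoods: list of (P(trait | class), P(trait | not class)) pairs.
    Assumes the traits are conditionally independent (naive Bayes).
    """
    # Work in log-odds so many small factors do not underflow.
    log_odds = math.log(prior / (1.0 - prior))
    for p_in, p_out in likelihoods:
        log_odds += math.log(p_in / p_out)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical traits: "sync pattern seen" and "byte entropy looks like video".
evidence = [(0.9, 0.1), (0.8, 0.2)]
score = posterior(0.5, evidence)
```

Each trait only nudges the odds; it is the aggregate over many weak traits that ends up doing the classification.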
My kind of nerd at least goes after the problem directly and generally.
Map out all of the information you have, subdivide the problem into
pieces you can crank out some explicit measurement code for, rebuild it
using statistical methods, and then apply machine learning algorithms
against a corpus of examples to yield a classification model. After
that, mine the open source "literature" for what people have done in the
past when something became corrupted and needed to be put back together.
As a simple suggestion for a couple of relevant measurements/traits...
Most current video protocols are highly compressed and thus mimic noise.
But only in the part of the data which is actually video; the protocol
overhead tends to be regular and thus compressible/well-correlated in
time. So while it's pretty involved, computationally speaking, it's at
least theoretically possible to use autocorrelative analysis or local
compressibility measures using heavy duty text compressors as a means to
identify that sort of structure. I've thought from time to time myself
that record structures could probably be identified via windowed Fourier
transforms over traits derived from the data. Textual similarity scores
based on sets of short sparsified hashes over a sliding window (like
rsync's first pass) can work, especially if adapted to that
deterministic framing and protocol overhead I mentioned. Or whatever.
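
As a rough sketch of the local compressibility idea: slide a window over the raw bytes, compress each window, and look at the ratio. Here zlib stands in for the "heavy duty text compressors" mentioned above, and the window and step sizes are arbitrary assumptions, so treat this as an illustration of the measurement rather than a tuned detector.

```python
import os
import zlib

def compressibility_profile(data, window=4096, step=2048):
    """Return (offset, ratio) pairs, ratio = compressed size / raw size.

    Low ratios suggest regular structure (headers, framing, padding);
    ratios near 1.0 suggest already-compressed payload.
    """
    profile = []
    for off in range(0, max(len(data) - window, 0) + 1, step):
        chunk = data[off:off + window]
        if not chunk:
            break
        ratio = len(zlib.compress(chunk, 9)) / len(chunk)
        profile.append((off, ratio))
    return profile

# Synthetic example: 4 KiB of repetitive "overhead" followed by 4 KiB of noise.
sample = b"HEADER\x00\x01" * 512 + os.urandom(4096)
profile = compressibility_profile(sample, window=4096, step=4096)
```

On real disk images the interesting part would be the transitions in the profile, since those are candidate boundaries between framing and payload.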
Then there's a *ton* of combinatoric and statistical reassembly stuff
being done, especially by the shotgun sequencing bio crowd, who use these
algorithms to reconstruct sequences (files, as it were) of DNA and
protein. They even have edge continuity/matching based algorithms for
reconstructing jigsaw puzzles nowadays, which have been applied in
corporate espionage to automatically reassemble shredded documents, I
believe.
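
For a flavor of the reassembly side, here is a toy greedy overlap merge in the spirit of shotgun assembly: repeatedly join the two fragments with the longest suffix/prefix overlap. Real assemblers are far more sophisticated and tolerate errors; the function names and the minimum overlap of 3 bytes are assumptions for illustration only.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Merge fragments pairwise by best overlap until none overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left that overlaps; return what we have
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags

# Fragments given out of order, as they might come off a damaged disk.
pieces = [b"own fox jumps", b"the quick br", b"ick brown fo"]
result = greedy_assemble(pieces)
```

With compressed media the overlaps would have to come from something other than literal byte repeats, such as the edge continuity matching used for jigsaw puzzles.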
It's just that I've never seen all of this applied to media recovery.
The tech is evidently out there, but if somebody's already applied it
to broken files, I for one haven't seen it done. So either the NSA
already hired/killed everybody doing that sort of stuff, or I'm just
behind my usual game, or there's a dissertation or a startup lurking
around there, somewhere. ;)
--
Sampo Syreeni, aka decoy - decoy at iki.fi, http://decoy.iki.fi/front
+358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2