No subject
Thu Apr 11 16:32:37 PDT 2013
It would be helpful to open a discussion on how the FOMS/Xiph community
can help support the underpinnings of OCR compatibility and open,
portable-document formats (i.e., PDF and DjVu) with relatively minimal
effort.
As best I can tell, the hOCR specification has been unmaintained for
the past three years; the 'hocr-tools' project has been largely inactive
for the past four. The spec is elegant in that it is built entirely from
HTML, but there are unresolved ambiguities and a few underdeveloped
sections. It needs a new home...
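To illustrate what "built entirely from HTML" means in practice, here is a rough sketch (mine, not part of hocr-tools or hocr-workflow-tools) that pulls word text and bounding boxes out of an hOCR fragment using only Python's standard HTML parser. The sample markup, the WordBoxes class and the extract_words helper are made up for illustration; the only things taken from the spec are the 'ocr_line' and 'ocrx_word' class names and the 'bbox x0 y0 x1 y1' property in the title attribute.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment: one line containing two words.
SAMPLE = """<span class='ocr_line' title='bbox 10 20 300 60'>
<span class='ocrx_word' title='bbox 10 20 120 60'>Hello</span>
<span class='ocrx_word' title='bbox 140 20 300 60'>world</span>
</span>"""


class WordBoxes(HTMLParser):
    """Collect (text, bbox) pairs for elements with class 'ocrx_word'."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bbox of the word element currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        self._bbox = None
        if "ocrx_word" in classes:
            # The title attribute holds ';'-separated properties.
            for prop in (attrs.get("title") or "").split(";"):
                parts = prop.split()
                if len(parts) == 5 and parts[0] == "bbox":
                    self._bbox = tuple(int(p) for p in parts[1:])

    def handle_endtag(self, tag):
        self._bbox = None

    def handle_data(self, data):
        # Only keep text that sits directly inside a word element.
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), self._bbox))


def extract_words(hocr):
    parser = WordBoxes()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

Because the layout data lives in ordinary HTML attributes, no special parser is needed; the same structure could in principle be mapped onto a PDF or DjVu text layer.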
I've started the 'hocr-workflow-tools' project, making it possible to
mark up images with hOCR (via Inkscape) and then export them all as a
text-searchable PDF (via the command line or Inkscape):
https://groups.google.com/forum/#!msg/hocr/CjeiE5MiqS8/iumxuSXNvRsJ
(Think text-searchable handwriting.)
I'd like to extend hocr-workflow-tools to produce DjVu files as well, as a
separate effort. DjVu is in a unique position: the patent license granted
by LizardTech to the DjVu community only covers implementations of the
current spec; innovation is disallowed. My understanding is that
LizardTech's key patents are expiring now or in the near future; it's a
neat, high-quality "codec" also in need of adoption (possibly). By
contrast PDF is a proprietary specification provided by Adobe, even if
"fair" via standardization ('PDF/A' is especially helpful).
OCR inter-compatibility and portable-document formats are a big deal for
libraries and law offices, and for other organizations too.
Sincerely,
George