No subject

Thu Apr 11 16:32:37 PDT 2013

It would be helpful to open a discussion on how the FOMS/Xiph community
can help support the underpinnings of OCR compatibility and open,
portable-document formats (i.e., PDF and DjVu) with relatively minimal
effort.

As best I can tell, the hOCR specification has been unmaintained for the
past three years; the 'hocr-tools' project has been largely inactive for
the past four.  The spec is elegant in that it is built entirely from
HTML, but there are unresolved ambiguities and a few underdeveloped
sections.  It needs a new home...
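
To illustrate what "built entirely from HTML" means in practice, here is a
minimal sketch in Python (standard library only).  It assumes the common
hOCR convention of marking words with class 'ocrx_word' and storing the
bounding box as 'bbox x0 y0 x1 y1' in the title attribute.  The SAMPLE
fragment and the names WordBoxes and extract_words are made up for this
example; they are not part of hocr-tools or hocr-workflow-tools.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment, invented for illustration only.
SAMPLE = """<span class="ocr_line" title="bbox 10 20 300 60">
<span class="ocrx_word" title="bbox 10 20 120 60">Hello</span>
<span class="ocrx_word" title="bbox 140 20 300 60; x_wconf 93">world</span>
</span>"""


def parse_bbox(title):
    """Return (x0, y0, x1, y1) from an hOCR title string, or None."""
    for prop in title.split(";"):
        parts = prop.split()
        if len(parts) >= 5 and parts[0] == "bbox":
            return tuple(int(v) for v in parts[1:5])
    return None


class WordBoxes(HTMLParser):
    """Collect (text, bbox) pairs for every ocrx_word element.

    Assumes word elements contain plain text only (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None   # bbox of the word currently being read
        self._text = []     # text pieces of the word currently being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "ocrx_word" in classes:
            self._bbox = parse_bbox(attr.get("title") or "")
            self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._bbox is not None:
            self.words.append(("".join(self._text).strip(), self._bbox))
            self._bbox = None


def extract_words(hocr):
    parser = WordBoxes()
    parser.feed(hocr)
    parser.close()
    return parser.words


if __name__ == "__main__":
    for text, bbox in extract_words(SAMPLE):
        print(text, bbox)
```

Because the markup is ordinary HTML, a generic parser is enough to recover
the word boxes; the ambiguities mentioned above lie in what the properties
mean, not in how to read them.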

I've started the 'hocr-workflow-tools' project, which makes it possible to
mark up images with hOCR (via Inkscape) and then export everything as a
text-searchable PDF (via the command line or Inkscape):
 https://groups.google.com/forum/#!msg/hocr/CjeiE5MiqS8/iumxuSXNvRsJ
(Think text-searchable handwriting.)

I'd like to extend hocr-workflow-tools to produce DjVu files as well, as a
separate effort.  DjVu is in a unique position: the patent license granted
by LizardTech to the DjVu community only covers implementations of the
current spec; innovation is disallowed.  My understanding is that
LizardTech's key patents are expiring now or in the near future; it's a
neat, high-quality "codec" that is also (possibly) in need of adoption.
By contrast, PDF is a proprietary specification provided by Adobe, even if
it is made "fair" via standardization ('PDF/A' is especially helpful).

OCR inter-compatibility and portable-document formats are a big deal for
libraries and law offices, and for other organizations too.

Sincerely,
George