Re: SANE V2 - again...

Tom Martone (tom@martoneconsulting.com)
Mon, 30 Aug 1999 22:24:52 -0400

abel deuring wrote:
>
> Oliver Rauch wrote:
> >
> > Nick Lamb wrote:
> >
> > > Should all these data be
> > > > considered as one data stream by the backend / frontend API,
> > > > or should there be provisions to allow multiple data
> > > > streams? I am sure that Tom Martone and other people working
> > > > with this class of scanners have already thought about it,
> > > > so I would like to hear comments from them.
> > >
> > > There is a provision for multiple data frames, so the question
> > > is: Should we use it for this in SANE 2.0, and I think the
> > > answer has to be "Yes", unless there's a better idea.
> > >
> > > (SANE 1.0 doesn't use multiple data frames for this purpose,
> > > but it does provide them for several other reasons anyway)
> > >
> >
> > Hm, can anyone tell me for what that shall be good?
>
> Hi Oliver,
>
> I simply wondered how Tom's backend/frontend combination handles the
> different types and sets of data (ASCII/pixel data) for one scan, and if
> there should be any kind of hints given by the backend to the frontend,
> how to handle these sets of data: should they be stored just in one
> file, or in several files, for example.

The backend/frontend combination handles the different types of data
by lying about it, basically. Both the compressed image data (CCITT G4)
and the decoded barcode text are sent in a SANE_FRAME_GRAY frame. But
the frontend has a --raw command-line option which disables the writing
of the PBM header to the file, so you just get the raw data written out.

I'd rather be able to tell the truth, so to speak, and give a proper
hint as to the format of the frame. Then each frontend could choose
to do with it what it thought was best, with the baseline functionality
being to pass the data onwards uninterpreted, which is the behavior I
get today with the --raw command-line option.
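
To illustrate that baseline, here is a minimal Python sketch of a
frontend deciding what to write for one frame from a format hint. The
hint strings ("gray8" and anything else) and the function name are
hypothetical; they are not part of SANE 1.0 or of any SANE 2.0
proposal. The only point is that a format the frontend does not
recognize falls through to the same uninterpreted output that --raw
produces.

```python
def frame_to_file_bytes(fmt, data, width=0, height=0):
    """Return the bytes a frontend would write to disk for one frame.

    fmt is a hypothetical format hint supplied by the backend.
    """
    if fmt == "gray8":
        # A format the frontend understands: prepend a binary PGM (P5)
        # header so the file is a viewable 8-bit grayscale image.
        header = b"P5\n%d %d\n255\n" % (width, height)
        return header + data
    # Baseline for everything else (compressed image data, decoded
    # barcode text, ...): pass the data on uninterpreted.
    return data
```

With this shape, a new frame format needs no frontend change at all
unless the frontend wants to do something smarter than store it.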

The frontend, scanadf, allows you to specify a scan script which gets
forked off for each image acquired and this allows the user/integrator
great flexibility in doing stuff with each captured file. It provides
a nice separation between the basic frontend and the specifics of
a particular application's requirements. What I typically do with
the g4 data is to convert it to a full-fledged tiff file using a
simple utility called g42tiff, which is a slightly modified
version of fax2tiff from the tools within Sam Leffler's libtiff code.
Our imaging archive system uses tiff as its file format of choice.
You could also scan without compression, getting true _GRAY data
and have the scan script use pnmtotiff to get the same result. It
just seems nice to have the data compressed in the firmware of the
scanner and have a much smaller amount of data flow across the SCSI
bus and through the software. Any savings here would be more noticeable
if you were going through saned/net as well.
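
For the uncompressed case, a scan script could look roughly like the
Python sketch below. It assumes that the script receives the name of
the scanned file and that the file is a PNM image; the function names
and the ".tiff" naming rule are illustrative only. The g4 path would
swap in g42tiff, whose command line is not described here, so it is
left out.

```python
import os
import subprocess


def tiff_path_for(image_path):
    """Name the output after the input, replacing its extension."""
    base, _ext = os.path.splitext(image_path)
    return base + ".tiff"


def convert_to_tiff(image_path):
    """Run pnmtotiff on one scanned file, writing the TIFF next to it."""
    out_path = tiff_path_for(image_path)
    with open(out_path, "wb") as out:
        # pnmtotiff reads the named PNM file and writes TIFF to stdout,
        # so stdout is redirected into the output file.
        subprocess.run(["pnmtotiff", image_path], stdout=out, check=True)
    return out_path
```

A script built on this would call convert_to_tiff() once per
invocation with the file name it was handed.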

Now getting at the barcode data is a different matter. Basically the
encoded data is to be associated with the image in the "document
database" which provides the infrastructure to support flexible
searches for retrieval. In one case, the encoded data is an employee
identifier - the employee who signed and returned the document. This
allows the document (image) to be associated with that person in a
relational database. Then it is a trivial matter to collect all the
documents for a person, etc. The barcoding technique helps to eliminate
a manual data entry process and is quite desirable in terms of labor
savings.

So for barcodes the scan script pulls out the decoded data and stores
it in an index file which is used to update the database. All this
happens during the scan process which streamlines things and allows
for good throughput.
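
As a sketch of that indexing step, the Python below appends one record
per scanned page to an index file. The record layout (image name, a
tab, the decoded barcode text) and the function names are assumptions
made for illustration; the real index format is not described in this
message.

```python
def index_line(image_name, barcode_text):
    """Format one index record: image name, a tab, the decoded barcode."""
    return "%s\t%s\n" % (image_name, barcode_text.strip())


def append_to_index(index_path, image_name, barcode_text):
    """Add one record to the index file, creating the file if needed."""
    # Open in append mode so that each script invocation adds a record
    # without disturbing the ones written for earlier pages.
    with open(index_path, "a", encoding="ascii") as index:
        index.write(index_line(image_name, barcode_text))
```

A separate job can then read the index file and update the database,
which keeps the database work out of the scan loop.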

Now there might be more sophisticated ways of associating a series
of data streams (frames) together as being from the same page, but
I don't really see a dire need for this. As long as the frames
arrive in a well-defined (by the backend), predictable manner,
a custom scan script should be able to make the association simply
by the sequence. The frontend really doesn't know about this at
all, and that's all right; the job gets done.
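
Association by sequence can be sketched in a few lines of Python. The
assumption here, which is mine and not a documented backend behavior,
is that every page yields the same frames in the same order (an image
frame followed by a barcode frame); the names are hypothetical.

```python
def group_by_page(frame_files, kinds=("image", "barcode")):
    """Pair up frame files page by page, relying only on arrival order."""
    per_page = len(kinds)
    if len(frame_files) % per_page != 0:
        # A partial group means the sequence assumption was violated.
        raise ValueError("incomplete page: expected %d frames per page"
                         % per_page)
    pages = []
    for start in range(0, len(frame_files), per_page):
        chunk = frame_files[start:start + per_page]
        # Label each file in the chunk with its position's kind.
        pages.append(dict(zip(kinds, chunk)))
    return pages
```

For example, group_by_page(["f1", "f2", "f3", "f4"]) gives two pages,
each with an "image" and a "barcode" entry. The scheme only works as
long as the backend keeps the frame order fixed, which is exactly the
"well-defined, predictable manner" requirement above.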

I hope this is the information you were seeking. I'm not quite sure
I understood you completely regarding the multiple data frames, so I
may have missed something here.

Tom Martone
