Our overall goal is to add a company logo to the first page of every
invoice in the file invoice.pcl. Because the invoices vary in page length,
we don’t know in advance which pages are first pages. This example shows
how to use SwiftConvert to index invoice.pcl and identify the first page
of each individual invoice.
Note - some PCL files produced by printing on Windows contain "scrambled
text" that cannot be indexed with SwiftConvert.
SwiftPublish can be used to overcome this problem.
There are three steps to indexing a document:
1) Determine a Unique Identifier
Before we can index a document we must determine what we can use to
categorize the document. This must be done manually by analyzing
incoming documents, and assumes the documents have some type of uniformity.
A unique identifier can be a string of text that is in the same location
(e.g. a form field) on the pages you want to identify, in our case the
first page of each invoice. Look carefully; this could be as simple as a
page number, a header, or a word that always appears in one location.
View invoice.pcl. In this example, all of our invoices are uniform in
layout. After looking at the file, you will see that it is easy to find a
unique identifier: a string of the form "Invoice#NNNNN" (e.g.
"Invoice#20001") appears in the same location on the first page of each
individual invoice. While you are viewing the file, save it to your
computer for the next steps: use the "save" button in SwiftView Pro to
save the file as Type "Original PCL5 File" (the actual original file).
2) Find the exact coordinates of the Unique Identifier
To find the coordinates, use the dlltest.exe program included in the
development package download for the SwiftInside DLL.
Download and run the installer, then run c:\program files\SwiftView\dlltest.exe.
In the "Command" line, enter:
ldoc invoice.pcl
draw wide
Click the "Size" button and use the mouse to draw a rubber-band box around the string "Invoice#20001". The window at the bottom will show a string like
UL: 6.241556, 1.035101; LR 7.295382, 1.168056
You will use these four numbers (the x-y positions of the upper-left and lower-right corners of the box) in the next step.
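If you need to do this for several files, the four numbers can be pulled out of the coordinate string with a short script. The following is a minimal sketch in Python, assuming the string always has the "UL: x, y; LR x, y" shape shown above; the function name parse_box is hypothetical and is not part of any SwiftView product.

```python
import re

def parse_box(line):
    """Extract (x1, y1, x2, y2) from a 'UL: x, y; LR x, y' style line.

    Assumption: the line contains exactly four decimal numbers, in the
    order upper-left x, upper-left y, lower-right x, lower-right y.
    """
    numbers = re.findall(r"\d+\.\d+", line)
    if len(numbers) != 4:
        raise ValueError(f"expected four coordinates, got {len(numbers)}")
    return tuple(float(n) for n in numbers)

box = parse_box("UL: 6.241556, 1.035101; LR 7.295382, 1.168056")

# Format the numbers the way they are passed after "rcut" in the next step.
rcut_clause = "rcut " + " ".join(f"{value:.6f}" for value in box)
print(rcut_clause)  # rcut 6.241556 1.035101 7.295382 1.168056
```

The resulting text can be pasted into the SwiftConvert command in step 3.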
3) Find which pages contain the Unique Identifier
Now that we know the exact X-Y coordinates of the Unique Identifier, we
can use SwiftConvert to parse the file and tell us which pages contain
text in this exact location.
Run SwiftConvert using the following command:
sview -c"ldoc invoice.pcl|
save TEXTPOS all c:\indexfile.txt rcut 6.241556 1.035101 7.295382 1.168056 onefile"
This command tells SwiftConvert to open the file invoice.pcl and save
the text positioning information found within the specified coordinates
on every page. It is important that we use TEXTPOS and not TEXT, because
TEXTPOS reports the page number as well as the text.
The result is c:\indexfile.txt, which contains:
page 1
6.253 1.027 7.25 1.191
Invoice#20001
page 2
page 3
page 4
6.253 1.027 7.25 1.191
Invoice#20002
page 5
page 6
6.253 1.027 7.25 1.191
Invoice#20003
page 7
page 8
page 9
page 10
6.253 1.027 7.25 1.191
Invoice#20004
page 11
This is the resulting index file. Each page number is listed, followed
by the positioning information and the text if a string was found within
the specified coordinates on that page. This output file tells us that
pages 1, 4, 6, and 10 all contain the string in that specific location.
Therefore, those are the first pages of each individual document.
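Reading the index file by eye works for a small example, but the same check can be scripted. The following is a minimal sketch in Python, assuming the index file always has the layout shown above: a "page N" line, optionally followed by one line of coordinates and one line of text. The function name pages_with_text is hypothetical, and the sample text is shortened from the listing above.

```python
SAMPLE_INDEX = """\
page 1
6.253 1.027 7.25 1.191
Invoice#20001
page 2
page 3
page 4
6.253 1.027 7.25 1.191
Invoice#20002
page 5
"""

def pages_with_text(index_text):
    """Return {page_number: text} for pages that have text in the box."""
    hits = {}
    current_page = None
    lines_after_page = 0
    for raw in index_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("page "):
            current_page = int(line.split()[1])
            lines_after_page = 0
            continue
        lines_after_page += 1
        # Assumption: line 1 after "page N" is the bounding box and
        # line 2 is the text found at that location.
        if current_page is not None and lines_after_page == 2:
            hits[current_page] = line
    return hits

hits = pages_with_text(SAMPLE_INDEX)
print(sorted(hits))  # [1, 4]
```

To process the real output, read c:\indexfile.txt into a string and pass it to the function in place of SAMPLE_INDEX. Note that this sketch would misread an identifier whose own text begins with "page ".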
One could also conclude from this information that the first invoice
is 3 pages (pages 1-3), the second invoice is 2 pages (pages 4-5), the
third invoice is 4 pages (pages 6-9), and the fourth is 2 pages (pages
10-11). This information may be useful for sorting documents by page
length, for example to determine envelope thickness when the documents
are printed and enveloped.
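The page ranges follow directly from the list of first pages and the total page count: each invoice ends on the page before the next invoice begins, and the last invoice ends on the last page of the file. The following is a minimal sketch of that arithmetic in Python; the function name invoice_ranges is hypothetical, and the inputs are the values read from the index file above.

```python
def invoice_ranges(first_pages, total_pages):
    """Return (start, end, length) for each invoice.

    first_pages: page numbers on which an invoice begins.
    total_pages: number of pages in the whole file.
    """
    starts = sorted(first_pages)
    # Each invoice ends one page before the next one starts;
    # the final invoice runs to the end of the file.
    ends = [start - 1 for start in starts[1:]] + [total_pages]
    return [(s, e, e - s + 1) for s, e in zip(starts, ends)]

ranges = invoice_ranges([1, 4, 6, 10], 11)
for number, (start, end, length) in enumerate(ranges, start=1):
    print(f"invoice {number}: pages {start}-{end} ({length} pages)")
```

For the example file this reproduces the lengths given above: 3, 2, 4, and 2 pages.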
Part II: Markup