Part I:  Indexing
 

Our overall goal is to add a company logo to the first page of every invoice in the file invoice.pcl.  Because the invoices vary in page length, we don’t know in advance which pages are first pages.  This example shows how to use SwiftConvert to index invoice.pcl and identify the first page of each individual invoice.

Note: some PCL files produced by printing on Windows contain "scrambled text" that cannot be indexed with SwiftConvert.  SwiftPublish can be used to overcome this problem.

There are three steps to indexing a document:

1) Determine a Unique Identifier

Before we can index a document, we must decide what to use to categorize its pages.  This must be done manually by analyzing the incoming documents, and it assumes the documents share a uniform layout.  A unique identifier can be any string of text that appears in the same location (e.g. a form field) on the pages you want to identify; in our case, the first page of each invoice.  Look carefully: it could be as simple as a page number, a header, or a word that always appears in one location.

View invoice.pcl.  In this example, all of our invoices are uniform in layout, so a unique identifier is easy to find: the string "Invoice#NNNNN" (e.g. "Invoice#20001") appears in the same location on the first page of each individual invoice, and nowhere else.  While you are viewing the file, save it to your computer for the next steps.  Use the "save" button in SwiftView Pro to save the file as Type "Original PCL5 File" (the actual original file).

2) Find the exact coordinates of the Unique Identifier

To find the coordinates, use the handy dlltest.exe program included in the development package download for the SwiftInside DLL.

Download and run this installer, then run c:\program files\SwiftView\dlltest.exe.  In the "Command" line, enter:
      ldoc invoice.pcl
      draw wide

Click the "Size" button, and use the mouse to draw a rubber-band box around the string "Invoice#20001".  The window at the bottom will show a string like:
      UL: 6.241556, 1.035101; LR 7.295382, 1.168056
You will use these four numbers (i.e. two x-y positions) in the next step.
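
If you script the later steps, the four numbers can be pulled out of that string rather than retyped.  The following is a minimal Python sketch, not part of the SwiftView tools; the function name parse_box is hypothetical, and it assumes the string has exactly the "UL: x, y; LR x, y" shape shown above, with non-negative coordinates.

```python
import re

def parse_box(line):
    """Return (ul_x, ul_y, lr_x, lr_y) from a 'UL: x, y; LR x, y' string.

    Assumes the layout shown in the example above; any other layout
    raises ValueError instead of guessing.
    """
    numbers = re.findall(r"\d+(?:\.\d+)?", line)
    if len(numbers) != 4:
        raise ValueError(f"expected 4 numbers, found {len(numbers)}: {line!r}")
    return tuple(float(n) for n in numbers)

box = parse_box("UL: 6.241556, 1.035101; LR 7.295382, 1.168056")
print(box)
```

The first pair is the upper-left corner of the box and the second pair is the lower-right corner.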

3)  Find which pages contain the Unique Identifier

With the exact X-Y coordinates of the Unique Identifier, we can have SwiftConvert parse the file and report which pages contain text at that location.  Run SwiftConvert using the following command:

sview -c"ldoc invoice.pcl| save TEXTPOS all c:\indexfile.txt rcut 6.241556 1.035101 7.295382 1.168056 onefile"

This command tells SwiftConvert to open the file invoice.pcl and save, for every page, the text positioning information found within the specified coordinates.  It is important to use TEXTPOS rather than TEXT, because TEXTPOS reports the page number as well as the text.
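
When the coordinates change from job to job, the command line can be assembled from its parts.  The following Python sketch only builds the command string in the form shown above; it does not run SwiftConvert, and the function name build_index_command is hypothetical.  The coordinates are passed as strings so they appear in the command exactly as dlltest.exe reported them.

```python
def build_index_command(pcl_file, index_file, box):
    """Build the sview command line used in this example.

    box is (ul_x, ul_y, lr_x, lr_y) as strings. Only the commands that
    appear in the tutorial (ldoc, save TEXTPOS, rcut, onefile) are used.
    """
    ul_x, ul_y, lr_x, lr_y = box
    script = (
        f"ldoc {pcl_file}| save TEXTPOS all {index_file} "
        f"rcut {ul_x} {ul_y} {lr_x} {lr_y} onefile"
    )
    return f'sview -c"{script}"'

command = build_index_command(
    "invoice.pcl",
    r"c:\indexfile.txt",
    ("6.241556", "1.035101", "7.295382", "1.168056"),
)
print(command)
```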

The result is c:\indexfile.txt which contains:

page 1
6.253 1.027 7.25 1.191 Invoice#20001
page 2
page 3
page 4
6.253 1.027 7.25 1.191 Invoice#20002
page 5
page 6
6.253 1.027 7.25 1.191 Invoice#20003
page 7
page 8
page 9
page 10
6.253 1.027 7.25 1.191 Invoice#20004
page 11

This is the resulting index file.  Each page number is listed, followed by the positioning information and text if a string was found within the specified coordinates on that page.  The file tells us that pages 1, 4, 6, and 10 contain a string in that location.  Therefore, those are the first pages of the individual invoices.
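
Reading the index file can be automated.  The following Python sketch assumes the file looks exactly like the listing above: a "page N" line for every page, optionally followed by a line of four coordinates and the text.  The function name first_pages is hypothetical, and the sample text is an abbreviated copy of the listing.

```python
def first_pages(index_text):
    """Map each page that has text in the box to that text."""
    found = {}
    current_page = None
    for raw in index_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("page "):
            current_page = int(line.split()[1])
        elif current_page is not None:
            # Four coordinates, then the text itself (which may contain spaces).
            parts = line.split(None, 4)
            if len(parts) == 5:
                found.setdefault(current_page, parts[4])
    return found

sample = """page 1
6.253 1.027 7.25 1.191 Invoice#20001
page 2
page 3
page 4
6.253 1.027 7.25 1.191 Invoice#20002
page 5
"""
print(first_pages(sample))
```

To process the real output, read c:\indexfile.txt into a string and pass it to the function in place of the sample.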

One could also conclude from this information that the first invoice is 3 pages (pages 1-3), the second invoice is 2 pages (pages 4-5), the third invoice is 4 pages (pages 6-9), and the fourth is 2 pages (pages 10-11).  This information may be useful for sorting documents by page length, for example to determine envelope thickness when the documents are printed and enveloped.
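
The page ranges follow from simple arithmetic: each invoice ends on the page before the next first page, and the last invoice ends on the last page of the file.  A minimal Python sketch of that calculation, using the first pages and the 11-page total from the example (the function name invoice_ranges is hypothetical):

```python
def invoice_ranges(first_pages, total_pages):
    """Return (start, end, length) for each invoice, in page order."""
    starts = sorted(first_pages)
    # Each invoice ends just before the next one starts;
    # the last invoice runs to the end of the file.
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return [(s, e, e - s + 1) for s, e in zip(starts, ends)]

for start, end, length in invoice_ranges([1, 4, 6, 10], 11):
    print(f"pages {start}-{end}: {length} pages")
```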

Part II: Markup



SwiftView®, SwiftConvert, SwiftStamp, SwiftExtract, SwiftReprint, SwiftPublish, and LoanDocs® are trademarks of eLynx, Ltd.
SwiftView, a division of eLynx, 15605 SW 72nd Ave Portland, OR 97224 USA
800-304-5941 or 971-223-2600
  ©2008 eLynx, Ltd.