Last Reviewed: August 20, 2007
Article: DTS0116
Applies to: dtSearch Web 6, 7
See also: How to use dtSearch or dtSearch Web with OCR
dtSearch Web can index and search PDF files, and can highlight hits in retrieved files. (There is also a developer API that provides a way to do this using ASP or ASP.NET. For more information on this API, see "Highlighting hit in PDF files" in the dtSearch Engine API reference.)
Creating PDF Files from Documents
To convert individual documents to PDF, you can use the PDF Writer tool that is included with Adobe Acrobat (www.adobe.com), or any other tool that generates PDF from documents.
Server Requirements
1. Install dtSearch Web.
2. Make sure the web server is running Microsoft Internet Information Server 4 or later. Older web servers do not handle PDF files correctly.
3. Put the PDF files you want to publish in a folder that has a virtual root (or "Alias") name, like /docs or /webfiles. A subdirectory of the root folder for your web site (e.g., c:\inetpub\wwwroot\docs) will work.
4. Index the PDF files with dtSearch Desktop. See the dtSearch Quick Start for more information on how to index documents with dtSearch.
5. Use dtSearch Web's Search Form Builder to build a search form. See the dtSearch Web Quick Start for more information on setting up dtSearch Web.
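The relationship in step 3 between a folder on disk and its virtual root can be sketched in a few lines of Python. This is only an illustration: the function name virtual_url and the sample paths are hypothetical and are not part of dtSearch Web or IIS.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def virtual_url(file_path, physical_root, alias):
    """Map a file stored under physical_root to a URL path under alias.

    Hypothetical helper for illustration only; not a dtSearch Web API.
    """
    file_p = PureWindowsPath(file_path)
    root_p = PureWindowsPath(physical_root)
    # relative_to raises ValueError if the file is outside the folder,
    # which means the web server could not serve it under this alias.
    relative = file_p.relative_to(root_p)
    # Percent-encode each path component so spaces and similar
    # characters are valid in a URL.
    parts = [quote(part) for part in relative.parts]
    return "/".join([alias.rstrip("/")] + parts)


if __name__ == "__main__":
    print(virtual_url(r"c:\inetpub\wwwroot\docs\manuals\user guide.pdf",
                      r"c:\inetpub\wwwroot\docs",
                      "/docs"))
```

A file that is not inside the folder mapped to the alias has no URL under that alias, which is why the PDF files must be placed in (or below) the virtual root folder before indexing.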
Setting up the Client Machines
PDF hit highlighting works best with current versions of the web browser and Adobe Reader. The following are recommended:
- Adobe Reader 8 or later
- Internet Explorer 7 or Firefox 2
To test your client setup, you can use the dtSearch Web demo site (go to www.dtsearch.com and click on the link to the dtSearch Web Demo). Adobe Reader plug-in files are available from www.adobe.com.
Changing the Highlight Color
The color used to highlight hits in Adobe Reader is controlled by the client machine's Display Options in Windows. The specific option is the "Selected Items" color, which defaults to white-on-blue. If you change it to black-on-yellow, Adobe Reader will show highlights in yellow. (This will also make your Windows menus and list boxes use yellow highlights.) There is no way to control this from the server.
Optimizing Page Downloads
When a user clicks on a retrieved PDF file in the search results list, the user initially sees the first page in the document that contains a hit. Downloads of large files are optimized so that only the pages that the user views are downloaded. For example, if a user clicks on a 1000-page document with a hit on page 87, page 87 will be downloaded and displayed first. If the user then navigates to the next page in Adobe Reader, page 88 will be downloaded and displayed.
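Page-at-a-time downloading of this kind is commonly implemented with HTTP byte-range requests, in which the client asks the server for only part of a file. This article does not say which mechanism is used, so the following Python sketch rests on that assumption. The helper names range_header and advertises_byte_ranges are hypothetical and are not part of dtSearch Web.

```python
def range_header(first_byte, last_byte):
    """Build the request header asking for an inclusive byte span.

    Hypothetical helper for illustration only.
    """
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("invalid byte range")
    return {"Range": f"bytes={first_byte}-{last_byte}"}


def advertises_byte_ranges(response_headers):
    """Return True if the response headers advertise byte-range support.

    Header names are compared case-insensitively. A server that omits
    Accept-Ranges may still honor range requests, so False here means
    "not advertised", not "definitely unsupported".
    """
    for name, value in response_headers.items():
        if name.lower() == "accept-ranges":
            return value.strip().lower() == "bytes"
    return False


if __name__ == "__main__":
    print(range_header(0, 1023))
    print(advertises_byte_ranges({"Accept-Ranges": "bytes"}))
```

If a server does not support partial downloads, the viewer generally has to fetch the whole file before showing the page with the hit, which may be one reason older web servers are not recommended.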