Last Reviewed: January 22, 2009

Article: DTS0103

Applies to: dtSearch 7.60 and later

Supported file formats

dtSearch can automatically recognize, index, search, and display documents in the following current formats, with graphic marking of hits and multiple hit and file navigation options. HTML and PDF documents appear with all formatting, embedded images, and links intact, exactly as in the original document. The dtSearch developer product can display XML files with XSL formatting. dtSearch converts other file types to HTML for display with highlighted hits. dtSearch uses its own built-in file viewers for document parsing and display, unless otherwise noted. All file formats are supported through the current release versions, unless otherwise noted.

While extensions are provided to identify some file formats below, dtSearch generally does not rely on extensions to detect file formats.   For example, a Word document named "sample.mp3" would still be identified as a Word document.
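The idea of identifying a file by its content rather than its name can be illustrated with a small sketch. This is not dtSearch's detection logic, which is not described here; it is a generic signature ("magic bytes") check, and the function names and the short signature table are assumptions chosen for illustration.

```python
# Illustrative only: a generic content-based check, not dtSearch's own algorithm.
# Each entry pairs a well-known leading byte sequence with a format label.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (e.g. Word .doc, Excel .xls)"),
    (b"PK\x03\x04", "ZIP container (e.g. .zip, .docx, .xlsx)"),
    (b"\x1f\x8b", "GZIP"),
    (b"{\\rtf", "RTF"),
]


def sniff_format(data: bytes) -> str:
    """Return a format label based on the leading bytes, ignoring any file name."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first bytes of a file and classify them with sniff_format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

With a check of this kind, a Word document saved as "sample.mp3" is still reported as an OLE compound file, because the extension is never consulted.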

Related Topics

International language support:

dtSearch supports all languages through Unicode support. See "Unicode Support" and "International Language Support".

SQL databases:

See "How to index databases with the dtSearch Engine."

Dynamically generated content produced by ASP.NET, CMS, SharePoint and similar products (*.jsp, *.asp, *.aspx, *.php, etc.):

See "How to use dtSearch Web with dynamically-generated web sites".

GroupWise, Lotus Notes, and other message archive formats:

See "Email conversion tools".

To use IFilters to add support for unsupported formats:

See "How to use dtSearch with IFilters".

For scanned document data that requires OCR:

See "How to use dtSearch or dtSearch Web with OCR".

Supported file formats

Adobe Acrobat (*.pdf)

Ami Pro (*.sam)

Ansi Text (*.txt)

ASCII Text (See note 3)

ASF media files (metadata only) (*.asf)

CSV (Comma-separated values) (*.csv)

DBF (*.dbf)

EBCDIC

EML files (emails saved by Outlook Express) (*.eml)

Enhanced Metafile Format (*.emf)

Eudora MBX message files (*.mbx)

Flash (*.swf)

GZIP (*.gz)

HTML (*.htm, *.html)

JPEG (*.jpg)

Lotus 1-2-3 (*.123, *.wk?)

MBOX email archives (including Thunderbird) (*.mbx)

MHT archives (HTML archives saved by Internet Explorer) (*.mht)

MIME messages

MSG files (emails saved by Outlook) (*.msg)

Microsoft Access MDB files (see note 1) (*.mdb, *.accdb)

Microsoft Document Imaging (*.mdi)

Microsoft Excel (*.xls)

Microsoft Excel 2003 XML (*.xml)

Microsoft Excel 2007 (*.xlsx)

Microsoft Outlook/Exchange (See note 2)

Microsoft Outlook Express 5 and 6 (*.dbx) message stores

Microsoft PowerPoint

Microsoft PowerPoint 2007 (*.pptx)

Microsoft Rich Text Format (*.rtf)

Microsoft Searchable Tiff (*.tiff)

Microsoft Word for DOS (*.doc)

Microsoft Word for Windows (*.doc)

Microsoft Word 2003 XML (*.xml)

Microsoft Word 2007 (*.docx)

Microsoft Works (*.wks)

MP3 (metadata only) (*.mp3)

Multimate Advantage II (*.dox)

Multimate version 4 (*.doc)

OpenOffice 2.x and 1.x documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)

Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw)

QuickTime (*.mov, *.m4a, *.m4v)

TAR (*.tar)

TIFF (*.tif)

TNEF (winmail.dat files)

Treepad HJT files (*.hjt)

Unicode (UCS16, Mac or Windows byte order, or UTF-8)

Windows Metafile Format (*.wmf)

WMA media files (metadata only) (*.wma)

WMV video files (metadata only) (*.wmv)

WordPerfect 4.2 (See note 3) (*.wpd, *.wpf)

WordPerfect (5.0 and later) (*.wpd, *.wpf)

WordStar version 1, 2, 3 (See note 3) (*.ws)

WordStar versions 4, 5, 6 (*.ws)

WordStar 2000

Write (*.wri)

XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf)

XML (*.xml)

XML Paper Specification (*.xps) (version 7.40)

XSL

XyWrite (See note 3)

ZIP (*.zip)

Notes

[1] Databases. Beginning with version 7.54, dtSearch no longer uses ODBC or any Microsoft database drivers to index Microsoft Access files. Earlier versions relied on ODBC to parse Access files. Each record of a database is indexed as a separate document. For information on indexing SQL databases, see "How to index databases with the dtSearch Engine."

[2] Outlook and Exchange.  dtSearch Desktop can index Outlook and Exchange message stores using MAPI.  For more information, click here.

[3] Older Word Processor Formats. dtSearch can index and display, but cannot automatically recognize, documents in the following formats:

     WordPerfect 4.2

     WordStar versions before 4

     XyWrite

     ASCII Text

In dtSearch Desktop, click Options > Preferences > File Types to tell dtSearch how to identify these types of files.

[4] Web Sites. dtSearch Desktop/Network includes a spider that can index and search dynamically-generated content or static content on web sites.  For more information, click here.

Automatically-detected fields

The dtSearch Engine automatically detects fields in the following file formats:

 

File format: Fields

Email files (Outlook Express, Eudora, MBOX, EML): Sender, Recipient, Subject, Date, CC, BCC

Outlook items and .MSG files: Sender, Recipient, Subject, Sent Date, CC, BCC, contact fields (StreetAddress, CompanyName, etc.)

Microsoft Word, Excel, PowerPoint: Document summary information fields

OpenOffice/Open Document Format: Document properties fields

HTML: META tags; <TITLE> is indexed as HtmlTitle field; <H1>, <H2>, <H3> are indexed as HtmlH1, HtmlH2, HtmlH3, etc.

XML: All fields

DBF: All fields

CSV: All fields (CSV, or comma-separated values, files must have a .csv extension, a list of field names in the first line, and must use tab, comma, or semicolon delimiters)

PDF files: Document Properties

WordPerfect: Document summary information fields

MP3: All metadata fields

JPG, TIFF: EXIF and IPTC metadata fields; XMP (Vista) metadata supported in version 7.40

ASF, WMA, WMV: All metadata fields
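The CSV requirements listed above (a .csv extension, field names in the first line, and a tab, comma, or semicolon delimiter) can be checked before indexing. The following is a minimal sketch under stated assumptions: the function name csv_field_names is hypothetical, the delimiter is guessed from the header line only, and a header with no delimiter at all is treated as invalid, which is a simplification. It does not reproduce how dtSearch itself parses CSV files.

```python
import csv
import io
from pathlib import Path

# Delimiters named in the requirements above.
ALLOWED_DELIMITERS = ("\t", ",", ";")


def csv_field_names(filename: str, text: str) -> list:
    """Return the field names from the first line, or raise ValueError."""
    if Path(filename).suffix.lower() != ".csv":
        raise ValueError("file must have a .csv extension")
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        raise ValueError("first line must list the field names")
    header = lines[0]
    # Assumption: the allowed delimiter that occurs most often in the header is the one in use.
    delimiter = max(ALLOWED_DELIMITERS, key=header.count)
    if header.count(delimiter) == 0:
        raise ValueError("no tab, comma, or semicolon delimiter found")
    row = next(csv.reader(io.StringIO(header), delimiter=delimiter))
    return [name.strip() for name in row]
```

For example, calling csv_field_names("people.csv", "Name;City\nAnn;Oslo\n") yields the two field names Name and City, while the same text under the name "people.txt" is rejected.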

 

Other File Formats

dtSearch will still index, search, and display other file formats, but they will be treated as binary file types. In other words, binary codes and other non-text data will be displayed along with the text. dtSearch can also use a proprietary binary file filtering algorithm to clean up these file formats. For more information, see Indexing Options in the dtSearch help file.
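dtSearch's binary filtering algorithm is proprietary and is not described here. To give a rough sense of what cleaning up a binary file can mean, the sketch below uses a different, generic technique: it keeps only runs of printable ASCII characters, similar in spirit to the Unix strings utility. The function name and the min_len threshold are assumptions for illustration.

```python
import re


def extract_text_runs(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII characters found in data."""
    # \x20-\x7e covers space through tilde; everything else is treated as binary noise.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]
```

A filter like this discards control bytes and very short fragments, so only word-like text remains available for display or indexing.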

For legacy file types in which multiple messages or log entries are stored in one very large text file, use the dtSearch File Segmentation Rules feature to tell dtSearch how to break up the file into multiple logical subdocuments. For more information, see File Segmentation Rules in the dtSearch help file.
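The effect of splitting one large text file into logical subdocuments can be sketched as follows. The actual syntax of dtSearch File Segmentation Rules is documented in the help file and is not shown here; this example simply assumes that each subdocument begins at a line matching a regular expression, and the function name and the sample pattern are hypothetical.

```python
import re


def split_into_subdocuments(text: str, start_pattern: str) -> list:
    """Split text into chunks, each beginning where start_pattern matches."""
    starts = [m.start() for m in re.finditer(start_pattern, text, flags=re.MULTILINE)]
    if not starts:
        # No separator found: the whole file is one document.
        return [text] if text else []
    chunks = []
    if starts[0] > 0:
        # Keep any text that precedes the first separator as its own chunk.
        chunks.append(text[:starts[0]])
    bounds = starts + [len(text)]
    for begin, end in zip(bounds, bounds[1:]):
        chunks.append(text[begin:end])
    return chunks
```

For a message archive in which every message starts with a line beginning "From ", calling split_into_subdocuments(archive_text, r"^From ") returns one string per message, each of which could then be treated as a separate document.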

Image Formats

dtSearch Desktop/Network can display images in the following formats:

     BMP

     EPSF

     GIF

     IMG

     JPEG

     PCX

     PNG

     TIFF

     Targa

     WMF

     WPG (WPG version 1.0 only)

When viewing multipage images, use PgUp and PgDn to navigate between the pages. The dtSearch image viewer also includes viewing options such as Zoom In, Zoom Out, Invert, Rotate, etc.