dtSearch Corp., a leading supplier of enterprise and developer text retrieval software along with document filters, announces Version 7.72 of its product line. The new version expands dtSearch’s proprietary document filters built into its text retrieval products. For customers in need of data parsing, conversion and extraction only, the dtSearch Engine (with APIs in native 64-bit/32-bit, Win/Linux C++, Java and .NET through current versions) also provides the document filters for separate OEM licensing.
Supported Data Types. dtSearch’s document filters support a broad range of data formats:
• Web-ready static data: covers integrated image and text support in HTML, XML/XSL and PDF.
• Web-based dynamic data: through the dtSearch Spider, covers integrated image and text support in PHP, ASP.NET, SharePoint, etc.
• Other databases: through the dtSearch Engine APIs, covers SQL-type databases along with the full-text of BLOB data; all products support Access, XBASE, XML, CSV, etc.
• MS Office documents: covers integrated image and text support in Word (DOC/DOCX), PowerPoint (PPT/PPTX), Excel (XLS/XLSX) and Access (MDB/ACCDB).
• Other “Office” documents and compression formats: covers PDF with integrated image and text support, RTF, OpenOffice, ZIP, RAR, GZIP/TAR, etc.
• Emails and email attachments: covers MS Exchange, Outlook (PST/MSG), Thunderbird (MBOX/EML), and other popular email types, including nested email attachments.
• Embedded image support: covers images in Word, PowerPoint, Excel, Access and RTF files, as well as Outlook and Thunderbird emails, including images in recursively embedded files.
For all supported formats, the document filters support data parsing and optional extraction, as well as conversion to HTML for browser display with highlighted hits.
New in Document Filters. Version 7.72 adds OneNote (*.one) support through current versions, including support for images and documents embedded in OneNote files. The new version also expands the document filter APIs, enhancing options for text extraction from individual files, nested objects, etc.
Terabyte Indexer. dtSearch enterprise and developer products can index over a terabyte of data in a single index, spanning multiple directories, emails and attachments, online data and other databases. The products can create and search any number of indexes. Indexed search time is typically less than a second, even across terabytes of data. The product line also supports highly concurrent, multithreaded searching.
dtSearch Spider and Federated Searching. dtSearch products offer federated searching across any number of directories, emails (with nested attachments), and databases. The dtSearch Spider adds local and remote, static and dynamic online content to a search. The Spider can index sites to any level of depth, with support for public and private or secure online content, including log-ins and forms-based authentication. dtSearch products support integrated relevancy ranking with highlighted hits of data across both online and offline repositories.
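For readers unfamiliar with federated searching, the idea of one relevancy-ranked hit list across several repositories can be sketched as below. This is only an illustration and does not use the dtSearch API: the function name `federated_top_hits`, the source names, and the scores are all hypothetical, and the sketch assumes each source already returns its hits sorted by descending score on a comparable scale.

```python
import heapq
from itertools import islice


def federated_top_hits(results_by_source, limit):
    """Merge per-source hit lists into one ranking, highest score first.

    Assumes every per-source list is already sorted by descending score
    and that scores are comparable across sources (an assumption of this
    sketch, not a statement about any particular product).
    """
    # Tag each hit with the source it came from: (score, source, document).
    tagged = (
        [(score, source, doc) for doc, score in hits]
        for source, hits in results_by_source.items()
    )
    # heapq.merge lazily combines the sorted inputs without re-sorting them.
    merged = heapq.merge(*tagged, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))


# Hypothetical results from three separate indexes.
results = {
    "file_index": [("report.docx", 0.92), ("notes.txt", 0.40)],
    "mail_index": [("offer.msg", 0.75)],
    "web_index": [("pricing.html", 0.81), ("about.html", 0.10)],
}
top = federated_top_hits(results, 3)
```

Here `top` holds the three best hits overall, each still labeled with the index it came from, so a results page could show offline and online content in a single list.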
25+ Search Options; Advanced Data Classification. The dtSearch product line offers over 25 search options, ranging from basic search types, to positive and negative variable term weighting across full-text and/or fielded data, to special search options for forensically recovered data. In the dtSearch Engine, a wide range of API filters and objects support additional categorization via document full-text contents, document fielded data, database content, or data attributes attached during document indexing.
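As a rough illustration of positive and negative variable term weighting, the sketch below scores documents by summing a per-term weight times the number of occurrences, so a negatively weighted term pushes a document down the ranking. It is a simplified model and not dtSearch query syntax or its ranking method; the function `weighted_score`, the weights, and the sample documents are hypothetical.

```python
from collections import Counter


def weighted_score(text, term_weights):
    """Sum weight * occurrence count for each query term.

    Positive weights raise the score; negative weights lower it.
    Matching is a plain lowercase whitespace split, chosen for brevity.
    """
    counts = Counter(text.lower().split())
    return sum(weight * counts[term] for term, weight in term_weights.items())


# Hypothetical query: favor "merger" and "filing", penalize "rumor".
query = {"merger": 5, "filing": 2, "rumor": -4}
docs = {
    "a.txt": "merger filing confirms merger terms",
    "b.txt": "rumor of merger spreads as rumor",
}
ranked = sorted(docs, key=lambda name: weighted_score(docs[name], query), reverse=True)
```

With these sample weights, `a.txt` ranks ahead of `b.txt` because the two occurrences of the negatively weighted term outweigh the single positive match in `b.txt`.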
International Language Support. dtSearch products support Unicode, including support for right-to-left languages, and special Chinese/Japanese/Korean character options.
Developer SDKs. The dtSearch Engine for Win & .NET and the dtSearch Engine for Linux make dtSearch instant searching and document filters available for a wide range of Internet, Intranet and other commercial applications. The document filters are available both together with searching and for separate licensing. SDKs include native 64-bit and 32-bit C++, Java and .NET APIs (through current versions). For over a hundred developer case studies, please see www.dtsearch.com/casestudies.html.
Other dtSearch Products. In addition to the dtSearch Engine versions, the new release also covers dtSearch Web with Spider for quickly publishing instantly searchable data to an Internet or Intranet site, dtSearch Network with Spider for instantly searching across a network, dtSearch Publish for publishing searchable data to portable media, and dtSearch Desktop with Spider for desktop search.