Why are we not able to index the content of PDF files using Microsoft’s Indexing Service?

Technology Blog

Problem:

We created an indexing catalog in Microsoft’s Indexing Service and added the folders to be indexed. We then performed a merge and restarted the Indexing Service to enable the catalog. A server-side script written in ASP and/or ASP.NET performs the search and displays the results. Results appear for files with the extensions HTML, HTM, ASP, ASPX, DOC, XLS, and PPT. The website also contains many PDF files, but none of them ever show up in the results.

Solution:

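You need to install an IFilter for PDF files. Indexing Service relies on IFilter components to extract the text of each file format, and none of the filters it ships with handles PDF. The Adobe PDF IFilter v6.0 is a download of close to 10 MB; installing it and then re-indexing the catalog will solve the issue.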
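Before and after installing the filter, it can help to confirm whether an IFilter is actually registered for the .pdf extension. The following is a minimal sketch in Python, not part of the original setup. It assumes the common registration layout in which the extension key under HKEY_CLASSES_ROOT carries a PersistentHandler subkey, and the handler's PersistentAddinsRegistered entry for the IFilter interface ID points to the filter's CLSID. The function name `find_ifilter_clsid` is hypothetical, and the check does not cover registrations made through a ProgID or in the 32-bit registry view on 64-bit Windows.

```python
import sys

# Interface ID of IFilter, used as a subkey name under PersistentAddinsRegistered.
IFILTER_IID = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def find_ifilter_clsid(extension):
    """Return the CLSID of the IFilter registered for `extension`, or None.

    Only the direct path (extension -> PersistentHandler -> IFilter) is
    checked; this is an assumption about how the filter was registered.
    """
    if sys.platform != "win32":
        return None  # the registry only exists on Windows

    import winreg

    try:
        # Step 1: read the persistent handler GUID for the extension.
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT,
                            extension + "\\PersistentHandler") as key:
            handler, _ = winreg.QueryValueEx(key, "")

        # Step 2: read the IFilter CLSID registered for that handler.
        path = ("CLSID\\" + handler +
                "\\PersistentAddinsRegistered\\" + IFILTER_IID)
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, path) as key:
            clsid, _ = winreg.QueryValueEx(key, "")
        return clsid
    except OSError:
        # A missing key or value means no filter was found on this path.
        return None


if __name__ == "__main__":
    clsid = find_ifilter_clsid(".pdf")
    print("PDF IFilter CLSID:", clsid if clsid else "not registered")
```

If the script reports that nothing is registered after the installation, the filter setup is worth revisiting before re-indexing the catalog.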

By default, Microsoft’s Indexing Service ships with the following IFilters:

  • MIME Filter (mimefilt.dll) - Multipurpose Internet Mail Extensions (MIME) - .eml and .nws
  • HTML Filter (nlhtml.dll) - HTML 3.0 or earlier - .htm, .html, .asp, .aspx
  • Microsoft Office Document Filter (offfilt.dll) - Word, Excel, Microsoft PowerPoint® - .doc, .mdb, .ppt, and .xlt
  • Default or Plain Text Filter (query.dll) - Plain text files, Default Filter - retrieves only the system properties like FileName, LastWriteTime, FileSize, and Attributes
  • Binary or Null Filter (query.dll) - Binary files, Null Filter - retrieves only the system properties like FileName, LastWriteTime, FileSize, and Attributes
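The list above explains the symptom: a file only becomes searchable by content when one of the content filters claims its extension. As a rough illustration, the sketch below encodes the three content filters exactly as listed here (the table and the function name `content_filter_for` are made up for this example and are not an Indexing Service API) and looks up which filter, if any, would handle a given file.

```python
import os

# Content filters and extensions, copied from the list above.
# The Default and Null filters are left out because they are described
# as returning only system properties, not document text.
DEFAULT_CONTENT_FILTERS = {
    "mimefilt.dll": [".eml", ".nws"],
    "nlhtml.dll": [".htm", ".html", ".asp", ".aspx"],
    "offfilt.dll": [".doc", ".mdb", ".ppt", ".xlt"],
}


def content_filter_for(filename):
    """Return the DLL of the listed content filter for `filename`, or None."""
    extension = os.path.splitext(filename)[1].lower()
    for dll, extensions in DEFAULT_CONTENT_FILTERS.items():
        if extension in extensions:
            return dll
    return None  # no listed content filter: only system properties are indexed


if __name__ == "__main__":
    for name in ["index.html", "minutes.doc", "manual.pdf"]:
        print(name, "->", content_filter_for(name))
```

For `manual.pdf` the lookup returns `None`, which matches the behaviour described in the problem: without an additional IFilter, the text inside PDF files never reaches the catalog.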

You can search the Internet for readily available IFilters for JPEG, GIF, ZIP, RAR, MS Project, MS Visio, MHTML, and other formats. You can also write a custom IFilter for any file format.