Problem:
We have created an indexing catalog in Microsoft’s Indexing Service and
have included the folders to be indexed. We ran a merge and then
restarted the Indexing Service to enable the catalog. A server-side
script written in ASP or ASP.NET performs the search and displays the
results. We are able to see results for files with the extensions
HTML, HTM, ASP, ASPX, DOC, XLS, and PPT. There are many PDF files in
the website, but none of them appear in the search results.
Solution:
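You will have to install an IFilter for PDF file indexing. Indexing
Service relies on IFilter components to extract text from each file
format, and it does not include one for PDF by default. The Adobe PDF
IFilter v6.0 is a download of close to 10 MB; installing it and then
re-indexing the catalog will solve the issue.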
By default, Microsoft’s Indexing Service provides the IFilters listed below:
- MIME Filter (mimefilt.dll) - Multipurpose Internet Mail Extensions (MIME) - .eml and .nws
- HTML Filter (nlhtml.dll) - HTML 3.0 or earlier - .htm, .html, .asp, .aspx
- Microsoft Office Document Filter (offfilt.dll) - Word, Excel, Microsoft PowerPoint® - .doc, .mdb, .ppt, and .xlt
- Default or Plain Text Filter (query.dll) - Plain text files, Default
  Filter - retrieves only the system properties like FileName,
  LastWriteTime, FileSize, and Attributes
- Binary or Null Filter (query.dll) - Binary files, Null Filter -
  retrieves only the system properties like FileName, LastWriteTime,
  FileSize, and Attributes
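
As a rough illustration, the list above can be turned into a small lookup
table to see which files in a site use extensions that the list does not
name, and so may need an extra IFilter. This is only a sketch in Python:
the names LISTED_FILTERS, listed_filter and unlisted_extensions are
made up for this example, and the table holds only the extensions named
above, so a miss means "not named in the list", not "definitely cannot
be indexed".

```python
import os
from collections import Counter

# Extension -> filter DLL, copied from the list above.
# Assumption: this is not an exhaustive list of what each filter handles.
LISTED_FILTERS = {
    ".eml": "mimefilt.dll",
    ".nws": "mimefilt.dll",
    ".htm": "nlhtml.dll",
    ".html": "nlhtml.dll",
    ".asp": "nlhtml.dll",
    ".aspx": "nlhtml.dll",
    ".doc": "offfilt.dll",
    ".mdb": "offfilt.dll",
    ".ppt": "offfilt.dll",
    ".xlt": "offfilt.dll",
}


def listed_filter(filename):
    """Return the filter DLL named in the list for this file, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return LISTED_FILTERS.get(ext)


def unlisted_extensions(filenames):
    """Count files per extension that the list above does not name."""
    counts = Counter()
    for name in filenames:
        if listed_filter(name) is None:
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    site_files = ["index.html", "report.PDF", "manual.pdf", "notes.doc"]
    for ext, count in unlisted_extensions(site_files).items():
        print(ext, count)
```

Any extension this reports, such as .pdf in the problem described here,
is a candidate for a separately installed IFilter.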
You can search the Internet for readily available IFilters for JPEG,
GIF, ZIP, RAR, MS Project, MS Visio, MHTML, etc. You can also write a
custom IFilter for any file format.