Spotlight with Elasticsearch Backend

From SambaWiki

Introduction

Using Elasticsearch as the search engine is the recommended setup for any deployment.

Installation

You have to install the following components: Elasticsearch and fscrawler.

An alternative to fscrawler for indexing has recently emerged: fs2es-indexer.

This is a small Python program with low overhead that indexes only filesystem metadata, not file content.

Configuration

Elasticsearch

Elasticsearch doesn't need any specific configuration to work with Samba. Once it's installed and running, you're ready to index your filesystems with fscrawler.
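Before indexing, it can help to confirm that Elasticsearch really is up and running. The following is a minimal sketch, not part of Samba or fscrawler. It assumes Elasticsearch answers plain HTTP without authentication on the address and port used later in smb.conf. The helper names es_base_url and es_version are made up for this example.

```python
import json
import urllib.request


def es_base_url(address: str = "localhost", port: int = 9200) -> str:
    """Build the base URL (assumption: plain HTTP, no TLS, no auth)."""
    return f"http://{address}:{port}"


def es_version(address: str = "localhost", port: int = 9200, timeout: float = 5.0):
    """Return the Elasticsearch version string, or None if it cannot be reached."""
    try:
        # The root endpoint of Elasticsearch returns a small JSON document
        # that includes {"version": {"number": "..."}}.
        with urllib.request.urlopen(es_base_url(address, port), timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors, ValueError covers bad JSON.
        return None
    return info.get("version", {}).get("number")
```

Calling es_version() returns a version string when the node responds and None otherwise. If security features are enabled on the node, this unauthenticated check will likely return None even though Elasticsearch is running.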

fscrawler

Please consult the fscrawler documentation to learn how to index your filesystems.

Samba

You have to set a few global options to tell Samba how to connect to Elasticsearch, and you have to enable Spotlight on a per-share basis.

   [global]
   spotlight backend = elasticsearch
   elasticsearch:address = localhost
   elasticsearch:port = 9200
   [share]
   ...
   spotlight = yes

See the smb.conf manpage for a detailed explanation of all available parameters.
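To double-check which shares have Spotlight enabled, the settings can be read back programmatically. The sketch below uses Python's configparser on a sample that mirrors the options above. SAMPLE_SMB_CONF, load_smb_conf and spotlight_summary are hypothetical names, and the parsing is a simplification: it ignores include directives, Samba's case- and whitespace-insensitive parameter names, and Spotlight defaults set in [global]. Samba's own testparm tool remains the authoritative check.

```python
import configparser

# Hypothetical sample that mirrors the options shown above.
SAMPLE_SMB_CONF = """\
[global]
spotlight backend = elasticsearch
elasticsearch:address = localhost
elasticsearch:port = 9200

[share]
spotlight = yes
"""


def load_smb_conf(text: str) -> configparser.ConfigParser:
    # Use only "=" as delimiter: the default ":" delimiter would split
    # parametric options such as "elasticsearch:address" in the wrong place.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return parser


def spotlight_summary(text: str) -> dict:
    """Collect the Spotlight-related settings from smb.conf-style text."""
    conf = load_smb_conf(text)
    glob = conf["global"] if conf.has_section("global") else {}
    shares = [
        name
        for name in conf.sections()
        if name != "global" and conf.getboolean(name, "spotlight", fallback=False)
    ]
    return {
        "backend": glob.get("spotlight backend"),
        "address": glob.get("elasticsearch:address"),
        "port": glob.get("elasticsearch:port"),
        "spotlight_shares": shares,
    }
```

For the sample above, spotlight_summary(SAMPLE_SMB_CONF) reports the elasticsearch backend on localhost, port 9200, with Spotlight enabled for the section named share.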

Testing

There's a handy command-line tool that works as a Spotlight client: mdfind. See the mdfind manpage for usage details.

Known issues

Results overflow fixed size response buffer

The Samba Spotlight server currently doesn't support Spotlight RPC packet fragmentation, so the results and their metadata must fit in a fixed-size 32 KB buffer. Depending on the number of requested attributes and the path lengths, this buffer may overflow with the default maximum number of search results, which is 100. Reducing this setting works around the issue:

   elasticsearch:max results = 50
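The effect of this setting can be estimated with simple arithmetic. The sketch below divides the 32 KB buffer evenly across the results. It is a rough upper bound only: it ignores any fixed overhead in the response, whose size is not stated here, and the function names are made up for illustration.

```python
BUFFER_BYTES = 32 * 1024  # fixed-size response buffer described above


def per_result_budget(max_results: int, buffer_bytes: int = BUFFER_BYTES) -> int:
    """Bytes available per result if the buffer is shared evenly."""
    if max_results <= 0:
        raise ValueError("max_results must be positive")
    return buffer_bytes // max_results


def results_that_fit(avg_bytes_per_result: int, buffer_bytes: int = BUFFER_BYTES) -> int:
    """Largest result count that fits, given an assumed average size per result."""
    if avg_bytes_per_result <= 0:
        raise ValueError("avg_bytes_per_result must be positive")
    return buffer_bytes // avg_bytes_per_result


print(per_result_budget(100))  # 327
print(per_result_budget(50))   # 655
```

With the default of 100 results, each result (path plus requested attributes) has roughly 327 bytes on average. Halving the limit to 50 doubles that to roughly 655 bytes. If results on a given share are assumed to average 500 bytes, results_that_fit(500) suggests a limit of about 65.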

Unknown Spotlight attributes and file types

When the query parser encounters an unknown attribute or type in a query expression, it fails, resulting in no visible search results. Samba can be told to ignore such attributes and types by setting:

   elasticsearch:ignore unknown attribute = yes
   elasticsearch:ignore unknown type = yes