Spotlight with Elasticsearch Backend
Introduction
Using Elasticsearch as the search engine is the recommended setup for any deployment.
Installation
You have to install the following components:
- Elasticsearch: the search database engine itself
- fscrawler: the filesystem indexing tool
An alternative to fscrawler for indexing has recently emerged: fs2es-indexer.
This is a small, low-overhead Python program that indexes only filesystem metadata, not file content.
Configuration
Elasticsearch
Elasticsearch doesn't need any specific configuration to work with Samba. Once it's installed and running, you're ready to index your filesystems with fscrawler.
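One way to confirm that Elasticsearch is actually up before starting an indexing run is to query its cluster health endpoint. The following is a minimal sketch, assuming an unsecured HTTP listener on localhost port 9200; the function names are illustrative and are not part of Samba, fscrawler or Elasticsearch. Installations with TLS or authentication enabled would need a different scheme and credentials.

```python
import json
import urllib.request


def cluster_health_url(address="localhost", port=9200, scheme="http"):
    """Build the URL of Elasticsearch's cluster health endpoint."""
    return f"{scheme}://{address}:{port}/_cluster/health"


def summarize_health(body):
    """Extract (cluster_name, status) from a _cluster/health JSON body."""
    data = json.loads(body)
    return data.get("cluster_name"), data.get("status")


def check_elasticsearch(address="localhost", port=9200, timeout=5.0):
    """Query the health endpoint.

    Raises urllib.error.URLError (an OSError) if the server is unreachable.
    Assumption: plain HTTP without authentication.
    """
    url = cluster_health_url(address, port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_health(resp.read().decode("utf-8"))
```

Calling check_elasticsearch() on the Elasticsearch host returns the cluster name and a status of green, yellow or red; an exception indicates the service is not reachable at that address.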
fscrawler
Please consult the fscrawler documentation to learn how to index your filesystems.
Samba
You have to set a few global options to tell Samba how to connect to Elasticsearch, and you have to enable Spotlight on a per-share basis.
[global]
    spotlight backend = elasticsearch
    elasticsearch:address = localhost
    elasticsearch:port = 9200

[share]
    ...
    spotlight = yes
See the smb.conf manpage for a detailed explanation of all available parameters.
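As a quick sanity check of the settings above, the configuration can be read with Python's standard configparser module. This is only a sketch under stated assumptions: the sample text mirrors the example on this page, the helper names are hypothetical, and configparser does not implement Samba's own parsing rules (includes, macros, share-level defaults inherited from [global]), so Samba's testparm utility remains the authoritative check. Note that ':' has to be removed from the delimiter set, because option names such as elasticsearch:address contain a colon.

```python
import configparser

# Sample mirroring the example configuration on this page.
SAMPLE = """\
[global]
spotlight backend = elasticsearch
elasticsearch:address = localhost
elasticsearch:port = 9200

[share]
spotlight = yes
"""


def load_smb_conf(text):
    """Parse smb.conf-style text, treating only '=' as the key/value delimiter."""
    parser = configparser.ConfigParser(
        delimiters=("=",),             # ':' is part of option names here
        comment_prefixes=("#", ";"),
        interpolation=None,            # keep '%' macros untouched
        strict=False,
    )
    parser.read_string(text)
    return parser


def spotlight_shares(parser):
    """Return the non-global sections that explicitly set 'spotlight = yes'."""
    enabled = ("yes", "true", "1")
    return [
        name
        for name in parser.sections()
        if name.lower() != "global"
        and parser.get(name, "spotlight", fallback="no").strip().lower() in enabled
    ]


conf = load_smb_conf(SAMPLE)
print(conf.get("global", "spotlight backend"))   # elasticsearch
print(conf.getint("global", "elasticsearch:port"))  # 9200
print(spotlight_shares(conf))                     # ['share']
```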
Testing
There's a handy command-line tool on macOS that works as a Spotlight client: mdfind. See the mdfind manpage for usage details.
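For scripted tests from a macOS client, mdfind can be driven from Python. The sketch below assumes the share is already mounted; the mount point /Volumes/share and the helper names are hypothetical, and only the -onlyin option of mdfind is used to restrict the search to the mounted share.

```python
import subprocess


def build_mdfind_command(query, mount_point):
    """Return the argv for an mdfind search restricted to mount_point."""
    return ["mdfind", "-onlyin", mount_point, query]


def search_share(query, mount_point):
    """Run mdfind (macOS only) and return the matching paths as a list.

    Raises subprocess.CalledProcessError if mdfind exits with an error.
    """
    result = subprocess.run(
        build_mdfind_command(query, mount_point),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, search_share("report", "/Volumes/share") would list the paths on the mounted share that Spotlight reports for the term "report". An empty list for a file that is known to be indexed suggests a problem on the server side.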
Known issues
Results overflow fixed-size response buffer
The Samba Spotlight server currently doesn't support Spotlight RPC packet fragmentation, so the results and their metadata must fit in a fixed-size 32 KB buffer. Depending on the number of requested attributes and the path lengths, this buffer may overflow when using the default maximum number of search results, which is 100. Reducing this setting works around the issue:
elasticsearch:max results = 50
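A rough back-of-the-envelope calculation shows why halving the limit helps. The sketch below divides the 32 KB buffer evenly among the results and ignores any protocol framing overhead, which is not specified here, so the numbers are optimistic upper bounds rather than exact limits. The 400-byte average result size is a made-up figure for illustration.

```python
BUFFER_BYTES = 32 * 1024  # fixed-size response buffer: 32768 bytes


def per_result_budget(max_results, buffer_bytes=BUFFER_BYTES):
    """Average bytes available per result, ignoring protocol overhead."""
    return buffer_bytes // max_results


def results_that_fit(avg_result_bytes, buffer_bytes=BUFFER_BYTES):
    """How many results of a given average size fit, ignoring overhead."""
    return buffer_bytes // avg_result_bytes


print(per_result_budget(100))  # 327 bytes per result at the default limit
print(per_result_budget(50))   # 655 bytes per result with max results = 50
print(results_that_fit(400))   # 81 results at a hypothetical 400 bytes each
```

With the default of 100 results, each result (path plus requested attributes) has on average only about 327 bytes available, which long paths can easily exceed; at 50 results the budget roughly doubles.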
Unknown Spotlight attributes and file types
When the query parser encounters an unknown attribute or type in a query expression, it fails, resulting in no visible search results. Samba can be told to ignore such attributes and types by setting:
elasticsearch:ignore unknown attribute = yes
elasticsearch:ignore unknown type = yes