WSP: Difference between revisions

From SambaWiki
Revision as of 14:40, 17 October 2024

WSP (Windows Search Protocol) support in samba

support in samba

  • Since samba-4.20, samba ships a command line client for searching using the WSP protocol. The 'wspsearch' cli client does not work against a samba server, because the samba server does not currently implement the WSP protocol

WSP server support

  • The WSP protocol is not supported upstream in samba yet. However, there are a couple of upstream merge requests currently open
1. Support rawpipe services (servers using named pipes but not using the dcerpc protocol), allowing them to be managed in the same way as dcerpc servers are. See here
2. Allow mapping between the authenticated samba user and an elastic/opensearch basic user. This allows samba (using spotlight or WSP) to authenticate over http with a basic elastic/opensearch user. See here
3. A merge request with the WSP stand alone server code.

can I try it out

Yes you can, if you are willing you can build from a git branch. Note: This branch is based off the current samba-4.21.1 branch with all of the merge requests above combined together.

  git clone git://git.samba.org/npower/samba.git samba-wsp
  cd samba-wsp
  git checkout -b current_wsp_421_wip origin/current_wsp_421_wip
  ./configure.developer # (and install all the dependencies)
  make install

WSP running and testing using elasticsearch

install elasticsearch

using elasticsearch-8.15.2 (latest version at time of writing)

rpm -ivh elasticsearch-8.15.2-x86_64.rpm

take note of the generated built-in superuser 'elastic' (output as part of the rpm install)

if desired change the generated superuser password

/usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic

start it

systemctl daemon-reload
systemctl start elasticsearch.service

check if it is running

systemctl status elasticsearch.service

check communication

 curl -k -uelastic:elastic https://127.0.0.1:9200

should respond with

{
 "name" : "localhost.localdomain",
 "cluster_name" : "elasticsearch",
 "cluster_uuid" : "n-AXwOQeTOSddb_p3UXsUQ",
 "version" : {
   "number" : "8.15.2",
   "build_flavor" : "default",
   "build_type" : "rpm",
   "build_hash" : "98adf7bf6bb69b66ab95b761c9e5aadb0bb059a3",
   "build_date" : "2024-09-19T10:06:03.564235954Z",
   "build_snapshot" : false,
   "lucene_version" : "9.11.1",
   "minimum_wire_compatibility_version" : "7.17.0",
   "minimum_index_compatibility_version" : "7.0.0"
 },
 "tagline" : "You Know, for Search"
}

configure elasticsearch

Note: This is a developer setup, not suitable for production, please refer to the elasticsearch documentation for specific information about securing elasticsearch

  • disable ssl
for testing it is convenient to be able to easily see the communication between samba and elasticsearch unencrypted, of course ssl can be re-enabled after a working setup has been established.

in /etc/elasticsearch/elasticsearch.yml:

xpack.security.http.ssl:
-  enabled: true
+  enabled: false

WSP running and testing using opensearch

install opensearch

using opensearch-2.15.0 (latest version at time of writing)

OPENSEARCH_INITIAL_ADMIN_PASSWORD=1234?Changeme rpm -ivh opensearch-2.15.0-linux-x64.rpm

start it

systemctl daemon-reload
systemctl start opensearch.service

check if it is running

systemctl status opensearch.service

check communication

 curl -k -uadmin:1234?Changeme https://127.0.0.1:9200

should respond with

{
 "name" : "localhost.localdomain",
 "cluster_name" : "opensearch",
 "cluster_uuid" : "6fJA5WMmSiK2wc4rHdkVvw",
 "version" : {
   "number" : "7.10.2",
   "build_type" : "rpm",
   "build_hash" : "61dbcd0795c9bfe9b81e5762175414bc38bbcadf",
   "build_date" : "2024-06-20T03:27:31.591886152Z",
   "build_snapshot" : false,
   "lucene_version" : "9.10.0",
   "minimum_wire_compatibility_version" : "7.10.0",
   "minimum_index_compatibility_version" : "7.0.0"
 },
 "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

configure opensearch

Note: This is a developer setup, not suitable for production, please refer to the opensearch documentation for specific information about securing opensearch

  • disable ssl
for testing it is convenient to be able to easily see the communication between samba and opensearch unencrypted, of course ssl can be re-enabled after a working setup has been established.

in /etc/opensearch/opensearch.yml:

-plugins.security.ssl.http.enabled: true
+plugins.security.ssl.http.enabled: false
  • allow fscrawler to talk to opensearch
Add the following line to /etc/opensearch/opensearch.yml
compatibility.override_main_response_version: true

Use fscrawler to index files for elasticsearch or opensearch

using latest fscrawler version 2.10 (at time of writing)

  • identify (or create) some locations on the filesystem (which are accessible from samba shares) that have content you would like to index
  • install fscrawler
unzip fscrawler-distribution-2.10-20240702.144319-374.zip
  • create a user to use to communicate with elasticsearch or opensearch to populate the index
Here we will use the 'elastic' user that comes already set up with elasticsearch. Note: the elasticsearch 'elastic' user is a superuser. You might want to consider creating a specific elasticsearch user for fscrawler to use that has appropriate roles assigned, e.g. with 'just enough' privileges to access the indexes you want to create/modify. The same applies to opensearch.

for elasticsearch see creating users, creating roles, creating API key; similarly for opensearch see here and associated documentation.

  • use fscrawler to create an index
 ./fscrawler-distribution-2.10-SNAPSHOT/bin/fscrawler index_name

you will be prompted (if this is the first time the command is run)

INFO  [f.console] job [index_name] does not exist
INFO  [f.console] Do you want to create it (Y/N)?
answer 'Y'
  • edit the config file ~/.fscrawler/index_name/_settings.yaml created in the last step
  • configure the path for fscrawler to index
url: "/path/to/index"
  • setup fscrawler to disable ssl when communicating with opensearch
-  - url: "https://127.0.0.1:9200"
+  - url: "http://127.0.0.1:9200"

-  ssl_verification: true
+  ssl_verification: false
  • don't stop on error (otherwise any problem indexing a specific file will stop the indexing process)
-  continue_on_error: false
+  continue_on_error: true
  • setup optional stuff
-  attributes_support: false
-  raw_metadata: false
+  attributes_support: true
+  raw_metadata: true
  • run fscrawler again
./fscrawler-distribution-2.10-SNAPSHOT/bin/fscrawler index_name --username admin --loop 1

Configure WSP for samba

use the following global configuration

wsp backend = elasticsearch
elasticsearch:auth=credfile
elasticsearch:auth_file=/etc/samba/usercreds.txt
elasticsearch:wsp_acl_filtering=true

use the following share configuration

wsp = true
elasticsearch:index = index_name
elasticsearch:max results = 200

credfile format

samba_user:opensearch_user%password

'*' can be used in place of a 'samba user' to match all currently unmatched samba users

example

*:admin%1234?Changeme

will map all previously unmapped (in credfile) users to admin.

Note: as the credentials are stored in a local file (which should be root rw only), the opensearch users defined in the credfile should have the most restrictive privileges possible (and no write permissions).
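The lookup described above can be sketched in a few lines of Python. This is only an illustration of the credfile format and the '*' fallback, not the samba implementation: the function names, the 'alice' entry and the choice to keep the first entry for a duplicated samba user are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def parse_credfile(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse 'samba_user:search_user%password' lines into a mapping."""
    mapping: Dict[str, Tuple[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first ':' and then on the first '%'.
        samba_user, _, creds = line.partition(":")
        search_user, _, password = creds.partition("%")
        # Assumption of this sketch: the first entry for a samba user wins.
        mapping.setdefault(samba_user, (search_user, password))
    return mapping


def lookup(mapping: Dict[str, Tuple[str, str]],
           samba_user: str) -> Optional[Tuple[str, str]]:
    """Explicit entries win; '*' catches every otherwise unmapped user."""
    return mapping.get(samba_user, mapping.get("*"))


# Hypothetical credfile: one explicit mapping plus the wildcard from the text.
CREDFILE = """\
alice:reader%s3cret
*:admin%1234?Changeme
"""

creds = parse_credfile(CREDFILE)
print(lookup(creds, "alice"))  # explicit entry
print(lookup(creds, "bob"))    # falls back to the '*' entry
```

With no '*' line in the file, `lookup` returns `None` for an unmapped samba user in this sketch.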

start samba

systemctl start smb.service

use wspsearch cli or windows client to search for content (e.g. pictures)

wspsearch -U$user%$password //$host/$share --kind picture

Securing indexed content

It is important to understand that the samba WSP server only translates the WSP protocol messages that make up a query and/or traversal and retrieval of results into requests that an elastic/opensearch server can understand. Those requests are made against elastic/opensearch using either the anonymous user, a basic user or an api key (only for elasticsearch).

samba WSP communicates with elastic/opensearch using encrypted (or plain text) http connections either using anonymous access or basic authentication.

elastic/opensearch provide for authentication and access control via their own security, so it is the permissions of the authenticated elastic/opensearch user (or more precisely the role associated with that user) that determine what information can be retrieved, and not (at least directly) the authenticated samba user. (note: when using basic authentication (or api keys) there is a mapping defined between the authenticated samba user and the authenticated elastic/opensearch user)

Populating the index is a separate process. fscrawler is a tool/process for scanning files (of many different types: videos, photographs, documents etc.) and populating an elastic/opensearch index with the metadata that it has extracted from those files. In order to access the files, fscrawler of course needs to run as a user that has permission to read those files. The fscrawler configuration can have elastic/opensearch credentials defined that will be used to communicate with the elastic/opensearch instance, or those credentials can be passed on the command line.

so the users involved in searching and retrieving results are

a) the local user on the host machine (where the files to be indexed are located) that fscrawler runs as to read the files and produce the metadata used to populate the index
b) the elastic/opensearch user used by fscrawler to populate the elastic/opensearch index
c) the elastic/opensearch user used by the samba WSP server to query the elastic/opensearch index
d) the authenticated samba user performing the search (who may or may not have permission to access the filenames returned by the search)

There is therefore a likely disconnect between the user used to populate the index and the user used to search that index.

The samba WSP server is however able to acl check the files in the results and filter out results that the authenticated samba user cannot read. This can be enabled with the following setting

   elasticsearch:wsp_acl_filtering [G]

Enabling acl filtering brings with it some penalties

  • query results are now cached
  • you MUST specify a limit on the number of results returned (cached) with elasticsearch:max results [S] or the query will fail
  • depending on the data stored in the indexes, searches could take much longer than expected. For example, a search might yield a large number of results, but if the results that satisfy the acl check are mostly at the end of the result set then a very large number of results (or maybe even all the results) will need to be tested before the 'max results' to be cached is reached
  • because results are now cached the memory footprint will be impacted, and many concurrent searches could affect the available memory of the host system
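To make the cost of the third penalty concrete, here is a small Python simulation. It is a sketch of the general idea only (test hits in order until 'max results' readable ones are cached) and makes no claim about how the samba WSP server is actually implemented; the paths, the `can_read` predicate and the function name are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def acl_filter(hits: Iterable[str],
               can_read: Callable[[str], bool],
               max_results: int) -> Tuple[List[str], int]:
    """Return (cached readable hits, number of acl checks performed)."""
    cached: List[str] = []
    checked = 0
    for path in hits:
        if len(cached) >= max_results:
            break  # cache is full, no need to test further hits
        checked += 1
        if can_read(path):
            cached.append(path)
    return cached, checked


# Hypothetical result set: 10,000 unreadable hits followed by 200 readable ones.
hits = [f"/srv/private/doc{i}" for i in range(10_000)]
hits += [f"/srv/public/doc{i}" for i in range(200)]


def readable(path: str) -> bool:
    # Stand-in for a real acl check against the authenticated samba user.
    return path.startswith("/srv/public/")


cached, checked = acl_filter(hits, readable, max_results=200)
print(len(cached), checked)  # 200 10200
```

When the readable hits come first, the same call performs only 200 checks, so the ordering of the result set alone accounts for the difference in work.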

If at all possible it is better to NOT enable acl filtering. That way the search results don't need to be additionally filtered. However, the flip side is:

  • filenames and paths not readable to the authenticated samba user MAY be returned as results of a search (possible information leak)
  • similarly information such as keywords or content from a document result may be exposed to a user that otherwise might not have permission to access that information

The user to authenticate against elastic/opensearch is specified by the mapping between the authenticated samba user and an elastic/opensearch basic user. This mapping is stored in a credential file which is controlled by the following configuration

elastic/opensearch

       elasticsearch:auth=credfile [G]
       elasticsearch:auth_file=path_to_mapping_file [G]

or (elasticsearch only)

       elasticsearch:auth=apikey [G]
       elasticsearch:auth_file=path_to_mapping_file [G]

In the case of the apikey the details of the key are stored similarly to credfile: in place of username use id and in place of password use api_key, e.g.

for an api key creation result like

{
 "id" : "23mvj5IBhjGmVofgCu1p",
 "name" : "wspserver",
 "api_key" : "ZXfRonkITEyf_dBHk_4o_g",
 "encoded" : "MjNtdmo1SUJoakdtVm9mZ0N1MXA6WlhmUm9ua0lURXlmX2RCSGtfNG9fZw=="
}

you would store

 *:23mvj5IBhjGmVofgCu1p%ZXfRonkITEyf_dBHk_4o_g

to map all authenticated samba users to use that apikey

for more info about creating api keys please see https://www.elastic.co/guide/en/elasticsearch/reference/current/security-api-create-api-key.html
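As a convenience, the mapping line can be derived from the api key creation result with a short script. The sketch below uses the example response from above; the `credfile_line` helper is hypothetical. It also checks that the "encoded" field is the base64 encoding of id and api_key joined by a colon, which guards against copying the wrong field into the mapping file.

```python
import base64
import json

# The api key creation result shown above.
RESPONSE = """{
  "id": "23mvj5IBhjGmVofgCu1p",
  "name": "wspserver",
  "api_key": "ZXfRonkITEyf_dBHk_4o_g",
  "encoded": "MjNtdmo1SUJoakdtVm9mZ0N1MXA6WlhmUm9ua0lURXlmX2RCSGtfNG9fZw=="
}"""


def credfile_line(response_json: str, samba_user: str = "*") -> str:
    """Build a 'samba_user:id%api_key' mapping line from the response."""
    key = json.loads(response_json)
    # Sanity check: 'encoded' should decode to 'id:api_key'.
    decoded = base64.b64decode(key["encoded"]).decode("utf-8")
    if decoded != f'{key["id"]}:{key["api_key"]}':
        raise ValueError("'encoded' does not match id and api_key")
    return f'{samba_user}:{key["id"]}%{key["api_key"]}'


print(credfile_line(RESPONSE))
# *:23mvj5IBhjGmVofgCu1p%ZXfRonkITEyf_dBHk_4o_g
```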

Using elastic/opensearch to limit results

elastic/opensearch users have roles associated with them, and those roles can be configured to limit or allow access to specific indexes. So it is possible to limit access to the information stored in an index based on the role associated with the authenticated elastic/opensearch user. The authenticated elastic/opensearch user used to communicate with elastic/opensearch is determined by the mapping (see above).

Finer granularity can be achieved by using index names that are user specific, e.g. the index name configured to be used by samba when sending a query to elasticsearch can use variable substitutions, so the index name could for example be based on a variation of the username of the authenticated samba user (this is suitable for an index dedicated to a specific share that is a personal data store for that user).

The role associated with the authenticated elastic/opensearch user can also be modified to use field level security, which allows sensitive fields to be included in or excluded from the results of a query.

elasticsearch:

   see https://www.elastic.co/guide/en/elasticsearch/reference/current/document-level-security.html
   see https://www.elastic.co/guide/en/elasticsearch/reference/current/field-level-security.html

opensearch:

   see https://opensearch.org/docs/latest/security/access-control/document-level-security
   see https://opensearch.org/docs/latest/security/access-control/field-level-security/
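As a rough illustration for elasticsearch only, the sketch below builds a role definition body that grants read access to one index while hiding selected fields through field level security. The helper name and the choice of 'content' as the hidden field are assumptions made for the example; check the field names in your own index and the documentation linked above before using anything like this. opensearch uses a different role format, so this does not apply there.

```python
import json
from typing import Dict, Iterable


def fls_role_body(index_name: str, hidden_fields: Iterable[str]) -> Dict:
    """Body for an elasticsearch role with read-only, field-limited access."""
    return {
        "indices": [
            {
                "names": [index_name],
                "privileges": ["read"],
                # Grant every field, then carve out the sensitive ones.
                "field_security": {
                    "grant": ["*"],
                    "except": list(hidden_fields),
                },
            }
        ]
    }


# Hypothetical: hide the extracted document text from search results.
body = fls_role_body("index_name", ["content"])
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent when creating or updating the role that the WSP server's elasticsearch user is assigned.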

Additionally, to align the users that have access to a share (and its files) you could use a combination of 'force user', 'valid groups' or 'valid users' to limit access to shares to be searched with elastic/opensearch. This could, for example, be used to ensure there is a match between the files 'crawled' using a user with a certain group and the group used to access the share (and its files).