SoC/Ideas: Difference between revisions

From SambaWiki
(Update dbwrap back-end for Ceph RADOS - mention new RocksDB librados integration)
m (remove myself from printing protocol task)
Revision as of 13:05, 17 January 2017

Google Summer of Code: Suggested Project ideas

The following are the Samba project ideas for Summer of Code. Of course you are free to come up with ideas not listed here. Please discuss your planned project by either joining us on irc://irc.freenode.net/#samba-technical or by sending email to samba-technical@lists.samba.org.

Most of our projects will require C programming skills, but the Samba section has a couple of Python projects.

Samba

Some additional possible GSoC topics can be found in Bugzilla in the form of bugs which are marked as "Feature request": here. Questions regarding complexity and requirements should be directed to the technical mailing list.


Utilize libsmbclient server-side copy support in file managers

With libsmbclient now supporting server-side copy requests via cli_splice(), file managers making use of libsmbclient can be changed to utilize server-side copy support for greatly improved remote copy performance. Potential file manager targets include GNOME Files/Nautilus (gvfs_smb), Dolphin (kio_smb) and Kodi's File Manager.

  • Difficulty: Easy, Medium
  • Language(s): C, C++
  • Possible mentors: David Disseldorp

Improve libcli/dns

Samba comes with its own asynchronous DNS parser framework developed for the internal DNS server. Basic calls have been implemented for a client-side library as well, but a more fleshed-out implementation is needed. The goal of this project is to implement more high-level calls for handling DNS requests, such as UDP/TCP switchover and client-side GSS-TSIG cryptography. A test suite exercising all the functions is required and can be used to cross-check and complement the existing DNS server tests already shipped by Samba. This test suite should use cmocka.

See Samba's gitweb for the current code.

  • Difficulty: Medium
  • Language(s): C
  • Possible mentors: Kai Blin
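
As an illustration of the UDP/TCP switchover mentioned above: a DNS server that cannot fit its answer into a UDP reply sets the TC (truncation) bit in the message header (RFC 1035, section 4.1.1), and the client is expected to retry over TCP. The following is a minimal sketch only. The function name dns_reply_needs_tcp is hypothetical and is not part of libcli/dns; it only shows the header check, not the asynchronous retry logic a real client library would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Size of the fixed DNS message header (RFC 1035, section 4.1.1). */
#define DNS_HEADER_LEN 12
/* TC (truncation) bit within the 16-bit header flags field. */
#define DNS_FLAG_TC 0x0200

/*
 * Hypothetical helper, not part of libcli/dns: returns true when a UDP
 * reply carries the TC bit, i.e. the query should be repeated over TCP.
 */
bool dns_reply_needs_tcp(const uint8_t *reply, size_t len)
{
	uint16_t flags;

	assert(reply != NULL);
	if (len < DNS_HEADER_LEN) {
		/* Too short to be a DNS message; the caller handles this as an error. */
		return false;
	}
	/* The flags occupy bytes 2-3 of the header, in network byte order. */
	flags = (uint16_t)((reply[2] << 8) | reply[3]);
	return (flags & DNS_FLAG_TC) != 0;
}
```

A caller would run this check on every UDP reply and, when it returns true, resend the same query over a TCP connection to the same server.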

Windows Search Protocol WSP client library and torture tests

The Windows Search Protocol WSP is used to implement remote full filesystem indexing (indexed search) between Windows machines. We would like to support this functionality in Samba, interfacing with existing indexing tools on Unix systems (such as GNOME Tracker).

This is a DCE/RPC protocol. See http://msdn.microsoft.com/en-us/library/cc251767.aspx .

The student should write a (un)marshalling library to push and pull PDUs and an asynchronous client library on top of the Samba raw smb client library.

The student should write sub-tests for smbtorture which should demonstrate how the protocol works against a Windows server. The student doesn't have to implement the Samba server code. Noel Power from SUSE has done some basic server implementation and should be able to give guidance.

  • Difficulty: Medium, Hard
  • Language(s): C, (Python)
  • Possible Mentors: Noel Power
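
To give a feel for what a (un)marshalling library to push and pull PDUs involves, here is a minimal sketch. Everything in it is an assumption made for illustration: the two-field struct example_pdu, its little-endian 8-byte wire format and the example_pdu_push/example_pdu_pull names are hypothetical and do not reproduce the actual WSP message layout or any existing Samba API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-field PDU header; NOT the real WSP wire layout. */
struct example_pdu {
	uint32_t msg_type;
	uint32_t status;
};

#define EXAMPLE_PDU_WIRE_LEN 8

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)((v >> 8) & 0xff);
	p[2] = (uint8_t)((v >> 16) & 0xff);
	p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* "Push": serialise the struct into buf. Fails if buf is too small. */
bool example_pdu_push(const struct example_pdu *pdu, uint8_t *buf, size_t buflen)
{
	assert(pdu != NULL && buf != NULL);
	if (buflen < EXAMPLE_PDU_WIRE_LEN) {
		return false;
	}
	put_le32(buf, pdu->msg_type);
	put_le32(buf + 4, pdu->status);
	return true;
}

/* "Pull": parse buf into the struct. Fails on a short buffer. */
bool example_pdu_pull(const uint8_t *buf, size_t buflen, struct example_pdu *pdu)
{
	assert(buf != NULL && pdu != NULL);
	if (buflen < EXAMPLE_PDU_WIRE_LEN) {
		return false;
	}
	pdu->msg_type = get_le32(buf);
	pdu->status = get_le32(buf + 4);
	return true;
}
```

Explicit byte-wise encoding keeps the code independent of host endianness and struct padding. A push followed by a pull of the same buffer should return the original field values, which is also a natural first smbtorture-style unit test.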

Print System Asynchronous Remote Protocol client library and torture tests

The Print System Asynchronous Remote Protocol (MS-PAR) is a replacement for the synchronous Print System Remote Protocol (MS-RPRN). MS-PAR inherits many message and buffer formats from the old protocol, but allows for asynchronous submission and notification of print jobs. Further details of the protocol can be found in Günther and Andreas' SambaXP presentation.

The student should write a (un)marshalling library to push and pull MS-PAR PDUs, and an asynchronous client library on top of the Samba raw smb client library.

The student should write sub-tests for smbtorture which should demonstrate how the protocol works against a Windows server. The student doesn't have to implement the Samba server code.

  • Difficulty: Medium, Hard
  • Language(s): C
  • Possible Mentors: Andreas Schneider

dbwrap back-end for Ceph RADOS key-value storage

Ceph offers a highly scalable and fault-tolerant storage system. Samba is already capable of sharing data located on the Ceph Filesystem, however scale-out sharing (the same data exposed by multiple Samba nodes) currently requires the use of CTDB for consistent and coherent state across Samba cluster nodes. In such a setup CTDB provides a clustered database with persistent key-value data storage and locking. Database usage is abstracted out via a generic dbwrap interface.

Ceph's librados library provides an API for the storage and retrieval of arbitrary key-value data via the omap functions. A watch/notify protocol is also provided as a mechanism for synchronising client state (locking). Key-value data stored in the RADOS back-end inherits the same redundancy features as regular objects, making it a potentially good candidate as a replacement for CTDB in scale-out Samba clusters.

Alternatively, the RocksDB key-value store includes a Ceph librados back-end, which could perhaps also be plumbed into dbwrap. Doing so would however require architectural changes, to ensure that the RocksDB database is only consumed by a single process on each node.

This task involves the implementation and testing of a new dbwrap back-end that uses librados for the storage, retrieval and locking of Samba key-value state. Ideally, the candidate would also allow time for benchmarking, and an investigation of scalability bottlenecks.

  • Difficulty: Medium, Hard
  • Language(s): C
  • Possible Mentors: David Disseldorp
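
The idea of a pluggable key-value back-end behind a generic interface can be sketched as a table of function pointers. The code below is a hypothetical illustration only: struct kv_ops, struct mem_db and the mem_* functions are invented names and do not reproduce dbwrap's actual interface, which is richer (record locking, traversal and so on). A RADOS back-end would implement the same two operations with librados omap calls in place of the in-memory array used here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KV_MAX_RECORDS 8
#define KV_MAX_KEY 32
#define KV_MAX_VAL 64

/* Hypothetical back-end interface; dbwrap's real one is richer. */
struct kv_ops {
	int (*store)(void *priv, const char *key, const void *val, size_t len);
	int (*fetch)(void *priv, const char *key, void *out, size_t outlen);
};

/* Trivial in-memory back-end standing in for a real storage layer. */
struct mem_record {
	char key[KV_MAX_KEY];
	unsigned char val[KV_MAX_VAL];
	size_t len;
	int used;
};

struct mem_db {
	struct mem_record rec[KV_MAX_RECORDS];
};

static struct mem_record *mem_find(struct mem_db *db, const char *key)
{
	for (size_t i = 0; i < KV_MAX_RECORDS; i++) {
		if (db->rec[i].used && strcmp(db->rec[i].key, key) == 0) {
			return &db->rec[i];
		}
	}
	return NULL;
}

/* Insert or overwrite a record. Returns 0 on success, -1 on failure. */
static int mem_store(void *priv, const char *key, const void *val, size_t len)
{
	struct mem_db *db = priv;
	struct mem_record *r;

	assert(db != NULL && key != NULL && val != NULL);
	if (strlen(key) >= KV_MAX_KEY || len > KV_MAX_VAL) {
		return -1;
	}
	r = mem_find(db, key);
	for (size_t i = 0; r == NULL && i < KV_MAX_RECORDS; i++) {
		if (!db->rec[i].used) {
			r = &db->rec[i];
		}
	}
	if (r == NULL) {
		return -1; /* database full */
	}
	strcpy(r->key, key);
	memcpy(r->val, val, len);
	r->len = len;
	r->used = 1;
	return 0;
}

/* Copy a value into out. Returns its length, or -1 if missing or too big. */
static int mem_fetch(void *priv, const char *key, void *out, size_t outlen)
{
	struct mem_db *db = priv;
	struct mem_record *r;

	assert(db != NULL && key != NULL && out != NULL);
	r = mem_find(db, key);
	if (r == NULL || r->len > outlen) {
		return -1;
	}
	memcpy(out, r->val, r->len);
	return (int)r->len;
}

const struct kv_ops mem_kv_ops = { .store = mem_store, .fetch = mem_fetch };
```

Callers only ever go through the kv_ops table, so swapping the in-memory back-end for a clustered one does not change them. The sketch deliberately leaves out locking, which is the harder part of the task and where the watch/notify mechanism would come in.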

Samba AD DC as the ideal POSIX Directory

Samba is a great Active Directory Domain Controller, but it is not an ideal directory server for a large, passionate and important user base: sites with Samba SMB servers, but also general purpose Linux servers. A smaller subset of these sites also have Linux desktops. These sites may also have Windows servers, but they, like the Windows desktops, are not the focus.

These sites often used Samba + OpenLDAP, and are finding the move to Samba's AD DC a bit difficult, because schema extension is hard, some things are not done automatically (like uidNumber allocation), and in general the focus has been on matching Windows rather than on listening to the needs of this part of our user base.

Specific research should be done into what FreeIPA does well in targeting this user segment, and what customisations advanced users of OpenLDAP apply.

This project would be to propose a number of specific improvements, and to add both tests and an implementation of these improvements to Samba.

  • Difficulty: Hard
  • Language(s): C, Python
  • Possible Mentors: Andrew Bartlett

Linux Kernel CIFS/SMB2/SMB3 client improvements

Interested students should contact Steve French (or Jeff Layton) and discuss possible improvements to the Linux Kernel CIFS VFS client. Here are some ideas to get you started:

File Copy Offload: T10 operations, and improved tools for using CopyChunk

  • Benefits: Improved performance. Copy offload is useful for quickly replicating large files, for backup and for virtualization. The good news is that one copy offload mechanism (CopyChunk) already works. Windows 2012 introduced a second mechanism (https://msdn.microsoft.com/en-us/library/windows/desktop/hh848056(v=vs.85).aspx and also see pages 33 to 42 of http://www.snia.org/sites/default/files/SNIA_SMB3_final.pdf). This may be even more useful if TRIM/DISCARD support is also added. It is also very timely given the recent addition of support for the copy_range API to the Linux kernel VFS.
  • Challenges: Ensuring that the semantics match those of the new copy_range Linux kernel interface and, if they do not, either emulating the alternate semantics, enhancing copy_range or providing additional private ioctls to handle the SMB3 copy offload semantics (CopyChunk vs. ODX).
  • Language: C
  • Difficulty: Low / moderate
  • Possible Mentors: Steve French
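
For orientation, this is roughly what the user-space side of copy offload looks like. The sketch assumes that the copy_range interface mentioned above corresponds to what user space sees as the copy_file_range(2) system call, and that a glibc with a wrapper for it (2.27 or later) is available. The function name copy_range_all is hypothetical. Whether a given call is actually offloaded to the server depends on the filesystem; the caller cannot tell from the return value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to len bytes from the current offset of
 * fd_in to the current offset of fd_out with copy_file_range(2).
 * Returns the number of bytes copied (less than len if fd_in hits end
 * of file), or -1 with errno set.
 */
ssize_t copy_range_all(int fd_in, int fd_out, size_t len)
{
	size_t done = 0;

	assert(fd_in >= 0 && fd_out >= 0);
	while (done < len) {
		/* NULL offsets mean "use and advance the file offsets". */
		ssize_t n = copy_file_range(fd_in, NULL, fd_out, NULL,
					    len - done, 0);
		if (n < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		if (n == 0) {
			break; /* end of input file */
		}
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```

The loop matters because the call may copy fewer bytes than requested. Callers should also be prepared for errors such as ENOSYS or EXDEV on older kernels and fall back to an ordinary read/write copy.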

Multiadapter support

  • Benefits: Big performance advantage for some common cases (e.g. RSS capable adapters, and also two adapter scenarios) and prepares for RDMA in the future which will help cifs.ko in even more workloads.
  • Challenges: Testing may require more physical hardware (two, dual adapter machines to demonstrate performance improvements).
  • Language: C
  • Difficulty: Moderate
  • Possible Mentors: Steve French

Directory oplocks

  • Benefits: Will reduce network load a lot in some workloads, and improve performance as well. Works with recent Windows servers (e.g. Windows 2012 and later).
  • Challenges: Samba does not support it yet (although this might help drive changes to the Server and Linux VFS eventually, if we have client support).
  • Language: C
  • Difficulty: Moderate
  • Possible Mentors: Steve French

Failover/Continuous Availability and HA improvements (Witness protocol)

  • Benefits: Improved reliability, data integrity - may also allow planned migrations (moving data from one server to another). This is very timely given the recent addition of resilient and persistent handle support to the Linux smb3 kernel client.
  • Challenges: Complexity, requires additional RPC infrastructure in client.
  • Language: C
  • Difficulty: High
  • Possible Mentors: Steve French

Support for SELinux

  • MAC (Mandatory Access Control) security label support is important for virtualization and useful for improved security in some workloads. Support for setting/getting these labels over the wire was investigated in the NFS version 4 workgroup. Adding support to the CIFS Unix Extensions (Linux kernel client and Samba server) should be possible, especially if this is just a new class of extended attribute. The goal would be to support this feature of SELinux to allow KVM and other applications to take advantage of security labels. Some of the background requirements are loosely related to the (NFS equivalent of) what is mentioned in: http://tools.ietf.org/html/draft-quigley-nfsv4-sec-label-01
  • Language: C
  • Difficulty: Hard
  • Possible Mentors: Steve French

Create GUI or command-line tools for displaying /proc/fs/cifs statistics and mount/session status

  • Might also involve some cleanup of the in-kernel stats / status output.
  • A mostly complete cifs.ko Performance Co-Pilot (PCP) monitoring agent was implemented in 2013.
  • Language: some C (for kernel code), something else for GUI?
  • Difficulty: Easy
  • Possible Mentors: Steve French

Create a common uid mapping mechanism for Linux nfs and cifs vfs clients

  • or maybe just figure out a way to hook cifs up to rpc.idmapd
  • add a way for the client to remap the uids returned by the server to uids which would be valid on the client (or to a default if such uid does not exist).
  • This is especially helpful when the server supports the CIFS Unix Extensions and has a different uid and gid mapping than the client
  • Difficulty: Hard
  • Possible Mentors: Steve French
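
The remapping rule described above (translate a uid returned by the server to one valid on the client, or to a default if no mapping exists) can be sketched as a simple table lookup. The struct and function names are hypothetical; a real implementation would live in the kernel or hook into rpc.idmapd and would need a way to load and update the table.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* One hypothetical mapping entry: uid sent by the server -> local uid. */
struct uid_map_entry {
	uid_t server_uid;
	uid_t client_uid;
};

/*
 * Translate a server uid to a client uid. Falls back to default_uid
 * when the server uid has no entry in the table.
 */
uid_t remap_server_uid(const struct uid_map_entry *map, size_t n,
		       uid_t server_uid, uid_t default_uid)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (map[i].server_uid == server_uid) {
			return map[i].client_uid;
		}
	}
	return default_uid;
}
```

A linear scan is enough for illustration; with many users a hash table or sorted array would be the obvious replacement. The same approach applies to gids.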

VFS change notification support

  • add VFS support for calling into the filesystem when setting up notifications
  • add code to cifs/smb2 to set up and deal with notifications from the server in response to inotify/dnotify calls
  • Difficulty: Hard
  • Possible Mentors: Steve French

Support for retrieving snapshots, encrypted files, or compressed files from Windows

  • Difficulty: Medium
  • Possible Mentors: Steve French

cifs->Samba automated test facility

  • Do build verification similar to what we can now do with the Samba server and tools in the Samba build farm. Mounts from the Linux SMB3, SMB2 and CIFS kernel clients could be tested with posix file i/o tests which might include modified versions of the "connectathon" and xfstest test suites and others. The goal is to quickly identify problems with newly integrated patches by running automatically against a variety of cifs/smb2/smb3 mounts (and mount options) to ensure that regressions aren't introduced.
  • xfstests support for CIFS was added as part of SoC/2014.
  • Difficulty: Medium
  • Possible Mentors: Steve French

Other Random Ideas

  • Ideas aren't limited to these, feel free to propose something else:
    • Improve integration between cifs.ko and userspace Samba tools and libraries. Allow userspace Samba libraries to use an existing CIFS mount if it exists by passing requests (via an ioctl or other user->kernel IPC) to cifs.ko. This could improve performance but also more naturally allow use of the same credentials for a user across file and management operations (e.g. listing shares via smbclient and mounting that share).
    • Create a GUI for creating and managing Linux cifs mounts, and more easily configuring the many complex cifs mount options, statistics (/proc/fs/cifs)
    • Support for alternate transport protocols (other than TCP sockets). Adding support for SCTP to the cifs/smb2 kernel clients and Samba server or, perhaps more interestingly, adding support for Linux's "virtio" transport to the cifs/smb2 kernel clients and Samba server (to allow optimized mounts and zero-copy transfer of data from virtualized guests to hosts on the same box).
    • Support for features (such as directory delegations) which NFS version 4.1 has but which CIFS, even with the most current CIFS->Samba protocol extensions (CIFS Unix Extensions), does not have. This will probably need server support too.
    • Add additional library support or modify Samba client libraries so they can use existing kernel cifs functions (such as sending SMBs on negotiated sessions when the kernel client already has a session to the server). With the addition of a library to access cifs's pipe (in kernel), Samba client libraries or other DCE/RPC code could use cifs kernel sessions for management of and over cifs mounts.
    • Add libraries and utilities to manage acls (cifs kernel client has an extended attribute for setting/getting "raw" cifs acls but userspace posix acl tools obviously can't be used to manage cifs specific acl features).
  • Difficulty: Low
  • Language(s): C
  • Possible mentors: Steve French