mirror of https://github.com/Mailu/Mailu.git synced 2025-08-10 22:31:47 +02:00
2972: switch to fts-flatcurve r=mergify[bot] a=nextgens

## What type of PR?

bug-fix

## What does this PR do?

Switch from fts-xapian to fts-flatcurve. fts-flatcurve should address the problem of indexes growing too large, and it will be the default in Dovecot 2.4.
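
As a rough illustration of how the `FULL_TEXT_SEARCH` value is interpreted after this change, the sketch below mirrors the two Jinja expressions in the dovecot.conf template touched by this PR. The function names `fts_enabled` and `fts_languages_value` are hypothetical and exist only for this example; Mailu evaluates these expressions in the template itself, not in Python code like this.

```python
from typing import Optional

# Values that the template treats as "full-text search disabled".
DISABLED_VALUES = {"off", "false", "0"}


def fts_enabled(full_text_search: Optional[str]) -> bool:
    """Mirror of: (FULL_TEXT_SEARCH or '').lower() not in ['off', 'false', '0']."""
    return (full_text_search or "").lower() not in DISABLED_VALUES


def fts_languages_value(full_text_search: Optional[str]) -> str:
    """Mirror of the fts_languages line: commas become spaces, default is 'en'.

    Like the template, this does not strip whitespace around the codes.
    """
    if full_text_search:
        return " ".join(full_text_search.split(","))
    return "en"


if __name__ == "__main__":
    for value in ("en", "en,fr", "OFF", None):
        if fts_enabled(value):
            print(repr(value), "-> fts_languages =", fts_languages_value(value))
        else:
            print(repr(value), "-> full-text search disabled")
```

Note that an unset or empty variable still counts as enabled under this logic, and falls back to `en`.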

Don't forget to nuke old indexes to reclaim space.
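
For anyone who prefers to review what will be deleted first, here is a minimal Python sketch of the same idea as the `find ... -name xapian-indexes -prune` command given in the changelog entry of this PR. The helper names and the dry-run flag are assumptions made for this example, and `/mailu/mail` is simply the path used in that changelog command; adjust it to your own mail root.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, List

INDEX_DIR_NAME = "xapian-indexes"  # directory name taken from the changelog command


def find_xapian_indexes(mail_root: str) -> List[Path]:
    """Return every real directory named xapian-indexes below mail_root."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(mail_root):
        if INDEX_DIR_NAME in dirnames:
            candidate = Path(dirpath) / INDEX_DIR_NAME
            if not candidate.is_symlink():
                found.append(candidate)
            # Do not descend into the index directory (equivalent of -prune).
            dirnames.remove(INDEX_DIR_NAME)
    return sorted(found)


def remove_indexes(paths: Iterable[Path], dry_run: bool = True) -> None:
    """Delete the given directories, or only list them when dry_run is True."""
    for path in paths:
        if dry_run:
            print(f"would remove {path}")
        else:
            shutil.rmtree(path)


if __name__ == "__main__":
    # Dry run by default: nothing is deleted until dry_run=False is passed.
    remove_indexes(find_xapian_indexes("/mailu/mail"), dry_run=True)
```

Running it as-is only prints the candidate directories; pass `dry_run=False` once the list looks right.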

### Related issue(s)
- close #2390
- close #2184
- close #2976

## Prerequisites
Before we can consider review and merge, please make sure the following list is done and checked.
If an entry is not applicable, you can check it or remove it from the list.

- [ ] In case of feature or enhancement: documentation updated accordingly
- [x] Unless it's docs or a minor change: add [changelog](https://mailu.io/master/contributors/workflow.html#changelog) entry file.


Co-authored-by: Florent Daigniere <nextgens@freenetproject.org>
bors[bot]
2023-10-12 18:45:55 +00:00
committed by GitHub
12 changed files with 84 additions and 14 deletions

@@ -18,7 +18,7 @@ Features
Main features include:
- **Standard email server**, IMAP and IMAP+, SMTP and Submission with autoconfiguration profiles for clients
- **Advanced email features**, aliases, domain aliases, custom routing
- **Advanced email features**, aliases, domain aliases, custom routing, full-text search of email attachments
- **Web access**, multiple Webmails and administration interface
- **User features**, aliases, auto-reply, auto-forward, fetched accounts, managesieve
- **Admin features**, global admins, announcements, per-domain delegation, quotas

@@ -75,6 +75,8 @@ DEFAULT_CONFIG = {
'API': False,
'WEB_API': '/api',
'API_TOKEN': None,
'FULL_TEXT_SEARCH': 'en',
'FULL_TEXT_SEARCH_ATTACHMENTS': False,
'LOG_LEVEL': 'INFO',
'SESSION_KEY_BITS': 128,
'SESSION_TIMEOUT': 3600,

@@ -78,6 +78,7 @@ ENV \
\
ADMIN_ADDRESS="127.0.0.1" \
FRONT_ADDRESS="127.0.0.1" \
FTS_ATTACHMENTS_ADDRESS="127.0.0.1" \
SMTP_ADDRESS="127.0.0.1" \
IMAP_ADDRESS="127.0.0.1" \
REDIS_ADDRESS="127.0.0.1" \

@@ -81,6 +81,7 @@ ENV \
PATH="/app/venv/bin:${PATH}" \
ADMIN_ADDRESS="admin" \
FRONT_ADDRESS="front" \
FTS_ATTACHMENTS_ADDRESS="tika" \
SMTP_ADDRESS="smtp" \
IMAP_ADDRESS="imap" \
OLETOOLS_ADDRESS="oletools" \

@@ -7,7 +7,8 @@ ARG VERSION
LABEL version=$VERSION
RUN set -euxo pipefail \
; apk add --no-cache --repository=http://dl-cdn.alpinelinux.org/alpine/edge/community --repository=http://dl-cdn.alpinelinux.org/alpine/edge/main 'dovecot<2.4' dovecot-fts-xapian dovecot-lmtpd dovecot-pigeonhole-plugin dovecot-pop3d dovecot-submissiond xapian-core \
; apk add --no-cache --repository=http://dl-cdn.alpinelinux.org/alpine/edge/main 'dovecot<2.4' dovecot-lmtpd dovecot-pigeonhole-plugin dovecot-pop3d dovecot-submissiond \
; apk add --no-cache --repository=http://dl-cdn.alpinelinux.org/alpine/edge/testing dovecot-fts-flatcurve \
; apk add --no-cache rspamd-client \
; mkdir /var/lib/dovecot

@@ -37,7 +37,7 @@ mail_plugins = $mail_plugins quota quota_clone{{ ' ' }}
zlib{{ ' ' }}
{%- endif %}
{%- if (FULL_TEXT_SEARCH or '').lower() not in ['off', 'false', '0'] -%}
fts fts_xapian
fts fts_flatcurve
{%- endif %}
default_vsz_limit = 2GB
@@ -57,11 +57,20 @@ plugin {
quota_clone_dict = proxy:/tmp/podop.socket:quota
{% if (FULL_TEXT_SEARCH or '').lower() not in ['off', 'false', '0'] %}
fts = xapian
fts_xapian = partial=2 full=30
fts = flatcurve
fts_languages = {% if FULL_TEXT_SEARCH %}{{ FULL_TEXT_SEARCH.split(",") | join(" ") }}{% else %}en{% endif %}
fts_tokenizers = generic email-address
fts_autoindex = yes
fts_enforced = yes
fts_autoindex_exclude = \Trash
fts_autoindex_exclude1 = \Junk
fts_filters = normalizer-icu stopwords
fts_filters_en = lowercase english-possessive stopwords
fts_filters_fr = lowercase contractions stopwords
fts_header_excludes = Received DKIM-* ARC-* X-* x-* Comments Delivered-To Return-Path Authentication-Results Message-ID References In-Reply-To Thread-* Accept-Language Content-* MIME-Version
{% if FULL_TEXT_SEARCH_ATTACHMENTS %}
fts_tika = http://{{ FTS_ATTACHMENTS_ADDRESS }}:9998/tika/
{% endif %}
{% endif %}
{% if COMPRESSION in [ 'gz', 'bz2', 'lz4', 'zstd' ] %}

@@ -131,12 +131,15 @@ later classify incoming mail based on the custom part.
The ``DMARC_RUA`` and ``DMARC_RUF`` are DMARC protocol specific values. They hold
the localpart for DMARC rua and ruf email addresses.
Full-text search is enabled for IMAP is enabled by default. This feature can be disabled
(e.g. for performance reasons) by setting the optional variable ``FULL_TEXT_SEARCH`` to ``off``.
The ``FULL_TEXT_SEARCH`` variable (default: ``en``) is a comma-separated list of
language codes as defined in `fts_languages`_. Full-text search can be disabled
(e.g. for performance reasons) by setting the variable to ``off``.
You can set a global ``DEFAULT_QUOTA`` to be used for mailboxes when the domain has
no specific quota configured.
.. _`fts_languages`: https://doc.dovecot.org/settings/plugin/fts-plugin/#fts-languages
.. _web_settings:
Web settings

@@ -24,7 +24,7 @@ popular groupware.
Main features include:
- **Standard email server**, IMAP and IMAP+, SMTP and Submission with autoconfiguration profiles for clients
- **Advanced email features**, aliases, domain aliases, custom routing
- **Advanced email features**, aliases, domain aliases, custom routing, full-text search of email attachments
- **Web access**, multiple Webmails and administration interface
- **User features**, aliases, auto-reply, auto-forward, fetched accounts, managesieve
- **Admin features**, global admins, announcements, per-domain delegation, quotas

@@ -98,8 +98,16 @@ services:
volumes:
- "{{ root }}/mail:/mail"
- "{{ root }}/overrides/dovecot:/overrides:ro"
networks:
- default
{% if tika_enabled %}
- fts_attachments
{% endif %}
depends_on:
- front
{% if tika_enabled %}
- fts_attachments
{% endif %}
{% if resolver_enabled %}
- resolver
dns:
@@ -140,6 +148,21 @@ services:
{% endif %}
{% endif %}
{% if tika_enabled %}
fts_attachments:
image: apache/tika:2.9.0.0-full
hostname: tika
restart: always
networks:
- fts_attachments
depends_on:
{% if resolver_enabled %}
- resolver
dns:
- {{ dns }}
{% endif %}
{% endif %}
antispam:
image: ${DOCKER_ORG:-ghcr.io/mailu}/${DOCKER_PREFIX:-}rspamd:${MAILU_VERSION:-{{ version }}}
hostname: antispam
@@ -257,3 +280,8 @@ networks:
driver: bridge
internal: true
{% endif %}
{% if tika_enabled %}
fts_attachments:
driver: bridge
internal: true
{% endif %}

@@ -49,19 +49,19 @@ DISABLE_STATISTICS={{ disable_statistics or 'False' }}
# Expose the admin interface (value: true, false)
ADMIN={{ admin_enabled or 'false' }}
# Choose which webmail to run if any (values: roundcube, snappymail, none)
# Choose which webmail to run if any (values: roundcube, snappymail, none). To enable this feature, recreate the docker-compose.yml file via setup.
WEBMAIL={{ webmail_type }}
# Expose the API interface (value: true, false)
API={{ api_enabled or 'false' }}
# Dav server implementation (value: radicale, none)
# Dav server implementation (value: radicale, none). To enable this feature, recreate the docker-compose.yml file via setup.
WEBDAV={{ webdav_enabled or 'none' }}
# Antivirus solution (value: clamav, none)
# Antivirus solution (value: clamav, none). To enable this feature, recreate the docker-compose.yml file via setup.
ANTIVIRUS={{ antivirus_enabled or 'none' }}
# Scan Macros solution (value: true, false)
# Scan Macros solution (value: true, false). To enable this feature, recreate the docker-compose.yml file via setup.
SCAN_MACROS={{ oletools_enabled or 'false' }}
###################################
@@ -110,8 +110,10 @@ COMPRESSION={{ compression }}
# change compression-level, default: 6 (value: 1-9)
COMPRESSION_LEVEL={{ compression_level }}
# IMAP full-text search is enabled by default. Set the following variable to off in order to disable the feature.
# FULL_TEXT_SEARCH=off
# IMAP full-text search is enabled by default.
# Set the following variable to off to disable the feature,
# or to a comma-separated list of language codes to support.
FULL_TEXT_SEARCH=en
###################################
# Web settings
@@ -186,3 +188,5 @@ DEFAULT_SPAM_THRESHOLD=80
# This is a mandatory setting for using the RESTful API.
API_TOKEN={{ api_token }}
# Whether tika should be enabled (scan/OCR email attachments). To enable this feature, recreate the docker-compose.yml file via setup.
FULL_TEXT_SEARCH_ATTACHMENTS={{ tika_enabled }}

@@ -64,6 +64,15 @@ the security implications caused by such an increase of attack surface.<p>
<i>Oletools scans documents in email attachments for malicious macros. It has a much lower memory footprint than a full-fledged anti-virus.</i>
</div>
<div class="form-check form-check-inline">
<label class="form-check-label">
<input class="form-check-input" type="checkbox" name="tika_enabled" value="true">
Enable Tika
</label>
<i>Tika enables searching through attachments. It scans documents in email attachments, processes them (OCR, keyword extraction), and then indexes them so that they can be searched efficiently. This requires significant resources (RAM, CPU and storage).</i>
</div>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js"></script>
<script type="text/javascript" src="{{ url_for('static', filename='render.js') }}"></script>

@@ -0,0 +1,12 @@
- Switch from fts-xapian to fts-flatcurve. fts-flatcurve should address the problem of indexes growing too large, and it will be the default in Dovecot 2.4.
- Enable full-text search of email attachments if configured (via Tika: you'll need to re-run setup)
If you would like languages other than English to be supported, please ensure you update your FULL_TEXT_SEARCH configuration variable.
You may also want to dispose of old indexes using a command such as:
find /mailu/mail -type d -name xapian-indexes -prune -exec rm -r {} \+
And proactively force a reindexing using:
docker compose exec imap doveadm index -A '*'