1
0
mirror of https://github.com/laurent22/joplin.git synced 2025-07-06 23:56:13 +02:00

All: Sort search results by average of multiple criteria, including 'Sort notes by' field setting (#3777)

* Weight search results by most recently updated

As discussed here: https://github.com/laurent22/joplin/pull/3777#issuecomment-696491859
Before this commit, results were rarely sorted by date. Content weights and fuzziness were
determined, and then the first criteria to differ would win in sort order (and user_updated_time
was the last criteria checked).

Now the weight score itself will also include age of user_updated_time, surfacing fresh content.
At the current alpha level, results are weighted logarithmically, prioritizing mostly within the
last 30 days, and especially heavily within the past week.

* Updated unit tests to weight search results by last updated date

* Updated unit test title

* Fixed issue with weighted search engine test, and made it more deterministic using mock date

Date was being calculated only at the start of the test suite. It also wasn't using a set mock date, so the milliseconds between the real search engine calculations and the test calculation caused differences in results

* Added initial Search Engine spec

* Added Search Engine spec to README.md

* Renamed Search Sorting spec per laurent22's mentioned naming

* Revised copy in search sorting spec

Co-authored-by: Laurent <laurent22@users.noreply.github.com>
This commit is contained in:
Shawn Axsom
2020-10-09 16:51:11 -04:00
committed by GitHub
parent c42d9cf069
commit 5eb0417b1a
5 changed files with 117 additions and 14 deletions

View File

@ -328,6 +328,21 @@ class SearchEngine {
return idf * (freq * (K1 + 1)) / (freq + K1 * (1 - B + B * (numTokens / avgTokens)));
};
const msSinceEpoch = Math.round(new Date().getTime());
const msPerDay = 86400000;
const weightForDaysSinceLastUpdate = (row) => {
// BM25 weights typically range 0-10, and last updated date should weight similarly, though prioritizing recency logarithmically.
// An alpha of 200 ensures matches in the last week will show up front (11.59) and often so for matches within 2 weeks (5.99),
// but is much less of a factor at 30 days (2.84) or very little after 90 days (0.95), focusing mostly on content at that point.
if (!row.user_updated_time) {
return 0;
}
const alpha = 200;
const daysSinceLastUpdate = (msSinceEpoch - row.user_updated_time) / msPerDay;
return alpha * Math.log(1 + 1 / Math.max(daysSinceLastUpdate, 0.5));
};
for (let i = 0; i < rows.length; i++) {
const row = rows[i];
row.weight = 0;
@ -346,6 +361,8 @@ class SearchEngine {
});
row.wordFound.push(found);
}
row.weight += weightForDaysSinceLastUpdate(row);
}
}