mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-23 21:18:58 +00:00
Update readme.md
parent
c974d72e7e
commit
45dd9fea25
# Lexicon

The lexicon contains a mapping from words to identifiers.

Index construction is simpler when the domain of word identifiers is dense, that is, when there are no gaps between ids: if there are 100 words, they are numbered 0-99 and not 5, 23, 107, 9999, 819235, etc. The lexicon exists to create such a mapping.
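
As a rough illustration of what a dense mapping means, the sketch below hands out ids in insertion order, so that `n` distinct words always receive the ids `0` to `n-1`. The class name `DenseLexiconSketch`, its methods, and the FNV-1a stand-in hash are all assumptions made for this example; they are not taken from the actual lexicon code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual lexicon implementation.
class DenseLexiconSketch {
    // Maps a 64 bit word hash to a dense id in the range [0, size()).
    private final Map<Long, Integer> idsByHash = new HashMap<>();

    // Stand-in 64 bit hash (FNV-1a). The hash function used by the real
    // lexicon is not described in this readme; this one is an assumption.
    static long hash64(String word) {
        long h = 0xcbf29ce484222325L;
        for (byte b : word.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Returns the id already assigned to the word, or assigns the next
    // unused id. Ids are handed out as 0, 1, 2, ... so there are no gaps.
    int getOrInsert(String word) {
        long hash = hash64(word);
        Integer existing = idsByHash.get(hash);
        if (existing != null) {
            return existing;
        }
        int id = idsByHash.size();
        idsByHash.put(hash, id);
        return id;
    }

    // Number of distinct hashes seen so far; also the next id to be assigned.
    int size() {
        return idsByHash.size();
    }
}
```

Because the ids are dense, downstream structures can be plain arrays of length `size()` indexed directly by word id, instead of sparse maps keyed by hash.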

The lexicon is populated from a journal. It does not map the actual word data, but rather a 64 bit hash of each word. As a result of the <a href="https://en.wikipedia.org/wiki/Birthday_problem">birthday paradox</a>, collisions will be rare until the lexicon approaches about 2<sup>32</sup> words.
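
The 2<sup>32</sup> figure can be sanity-checked with the usual birthday approximation, which puts the probability of at least one collision among `n` uniformly distributed 64 bit hashes at roughly `1 - exp(-n(n-1) / 2^65)`. The helper below is a hypothetical sketch for this readme, assuming the hashes behave as uniformly random values; it is not part of the lexicon code.

```java
// Hypothetical helper for estimating hash collision risk; assumes the
// 64 bit hashes are uniformly distributed.
class BirthdayBoundSketch {
    // Approximate probability that n random 64 bit hashes contain at
    // least one collision: 1 - exp(-(n * (n - 1) / 2) / 2^64).
    static double collisionProbability(double n) {
        double pairs = n * (n - 1) / 2.0;
        // -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
        return -Math.expm1(-pairs / Math.pow(2, 64));
    }
}
```

Under that assumption, the estimate is below one in a hundred thousand for 2<sup>24</sup> words and rises to roughly 39% at 2<sup>32</sup> words, which is why the lexicon size should stay comfortably below that mark.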

The lexicon is constructed by [processes/loading-process](../../processes/loading-process) and read when
[services-core/index-service](../../services-core/index-service) interprets queries.
|
||||