mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-23 04:58:59 +00:00
Libraries
These libraries are not strongly coupled to the search engine's business logic. They must not depend on features, services, processes, models, etc.
NOTE: These libraries are co-licensed under the MIT license.
Libraries
- The array library is for memory mapping large memory areas, which Java supports poorly. It's designed to be easily replaced when Java's Foreign Function And Memory API is released.
- The btree library offers a static BTree implementation based on the array library.
- language-processing contains primitives for sentence extraction and POS-tagging.
- The message-queue library.
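
The limitation the array library works around can be illustrated in plain Java: a single `MappedByteBuffer` is indexed with an `int`, so one mapping cannot exceed 2 GiB, and a larger area has to be stitched together from several mappings. The following is a minimal sketch of that idea, not the array library's actual API. The class name `SegmentedLongArray`, its methods, and the segment size are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

/** Hypothetical sketch: a file-backed array of longs, mapped in fixed-size segments. */
final class SegmentedLongArray implements AutoCloseable {
    // Longs per segment (assumed value): 2^20 longs = 8 MiB, far below the 2 GiB buffer limit.
    private static final int SEGMENT_LONGS = 1 << 20;

    private final RandomAccessFile file;
    private final MappedByteBuffer[] segments;
    private final long size;

    SegmentedLongArray(Path path, long size) throws IOException {
        if (size < 0) throw new IllegalArgumentException("size must be non-negative");
        this.size = size;
        this.file = new RandomAccessFile(path.toFile(), "rw");
        // Size the file up front so that every mapping below lies within it.
        file.setLength(size * Long.BYTES);

        FileChannel channel = file.getChannel();
        int segmentCount = (int) ((size + SEGMENT_LONGS - 1) / SEGMENT_LONGS);
        segments = new MappedByteBuffer[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            long firstLong = (long) i * SEGMENT_LONGS;
            long longsInSegment = Math.min(SEGMENT_LONGS, size - firstLong);
            segments[i] = channel.map(FileChannel.MapMode.READ_WRITE,
                    firstLong * Long.BYTES, longsInSegment * Long.BYTES);
        }
    }

    long get(long index) {
        checkIndex(index);
        // Pick the segment, then read at the byte offset within it.
        return segments[(int) (index / SEGMENT_LONGS)]
                .getLong((int) (index % SEGMENT_LONGS) * Long.BYTES);
    }

    void set(long index, long value) {
        checkIndex(index);
        segments[(int) (index / SEGMENT_LONGS)]
                .putLong((int) (index % SEGMENT_LONGS) * Long.BYTES, value);
    }

    long size() {
        return size;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + " out of range for size " + size);
        }
    }

    @Override
    public void close() throws IOException {
        // Closes the file and its channel; the mappings stay valid until garbage collected.
        file.close();
    }
}
```

A caller would construct it with a path and a length in longs, for example `new SegmentedLongArray(path, 3_000_000L)`, and then use `get` and `set` with `long` indices. The bookkeeping needed to cross segment boundaries is the kind of plumbing a dedicated library can hide.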
Micro libraries
- easy-lsh is a simple locality-sensitive hash for document deduplication.
- guarded-regex makes predicated regular expressions clearer.
- random-write-funnel is a tool for reducing write amplification when constructing large files out of order.
- next-prime is a naive brute-force prime sieve.
- braille-block-punch-cards renders bit masks into human-readable dot matrices using the braille block.
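
To make the last item concrete, here is a minimal sketch of rendering a bit mask with the Unicode braille block. It is not the library's implementation: the class `BrailleBits`, the method `render`, and the byte-per-cell layout are assumptions for illustration. It relies only on the fact that the braille patterns start at U+2800 and that bit k of the offset (counting from 0) raises dot k+1 of the cell.

```java
/** Hypothetical sketch; the real library may use a different API and bit layout. */
final class BrailleBits {
    // U+2800 is the empty braille pattern; adding an 8-bit value selects which dots are raised.
    private static final int BRAILLE_BASE = 0x2800;

    /**
     * Renders the low {@code bytes} bytes of {@code mask} as braille cells,
     * most significant byte first, one cell per byte.
     */
    static String render(long mask, int bytes) {
        if (bytes < 1 || bytes > 8) {
            throw new IllegalArgumentException("bytes must be in 1..8");
        }
        StringBuilder sb = new StringBuilder(bytes);
        for (int i = bytes - 1; i >= 0; i--) {
            int b = (int) (mask >>> (8 * i)) & 0xFF; // extract byte i
            sb.append((char) (BRAILLE_BASE + b));    // bit k of b raises dot k+1
        }
        return sb.toString();
    }
}
```

For instance, `BrailleBits.render(0xFFL, 1)` yields the single fully raised cell U+28FF. Because braille dots 1 to 3 run down the left column and 4 to 6 down the right, with 7 and 8 on the bottom row, this direct mapping does not read as a row-major bit grid; a renderer aiming for that would need to permute the bits first.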