Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git (synced 2025-02-23 13:09:00 +00:00)
Merge pull request #118 from MarginaliaSearch/vlofgren-patch-1
Update ROADMAP.md
commit 9899d45ea8
@@ -11,7 +11,7 @@ Major goals:
 * Improve technical ability of indexing and search. Although this area has improved a bit, the
 search engine is still not very good at dealing with longer queries.
 
-## Proper Position Index
+## Proper Position Index (COMPLETED 2024-09)
 
 The search engine uses a fixed width bit mask to indicate word positions. It has the benefit
 of being very fast to evaluate and works well for what it is, but is inaccurate and has the
@@ -21,6 +21,8 @@ word n-grams known beforehand. This limits the ability to interpret longer quer
 The positions mask should be supplemented or replaced with a more accurate (e.g.) gamma coded positions
 list, as is the civilized way of doing this.
 
+Completed with PR https://github.com/MarginaliaSearch/MarginaliaSearch/pull/99
+
 ## Hybridize crawler w/ Common Crawl data
 
 Sometimes Marginalia's relatively obscure crawler is blocked when attempting to crawl a website, or for
@@ -51,7 +53,8 @@ It would be very helpful to find a speaker of a large language other than Englis
 
 Marginalia has experimental RSS preview support for a few domains. This works well and
 it should be extended to all domains. It would also be interesting to offer search of the
-RSS data itself.
+RSS data itself, or use the RSS set to feed a special live index that updates faster than the
+main dataset.
 
 ## Support for binary formats like PDF
 
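The roadmap text in this diff mentions replacing a fixed width positions bit mask with a gamma coded positions list. As a rough illustration of that general idea only, the sketch below delta-encodes a sorted list of word positions with Elias gamma codes. It is not taken from the Marginalia codebase or from PR 99: the function names, the 1-based position convention, and the use of a '0'/'1' string instead of packed bits are all assumptions made for readability.

```python
# Illustrative sketch only: names and conventions here are hypothetical,
# not Marginalia's actual implementation.

def gamma_encode(n: int) -> str:
    """Elias gamma code for an integer n >= 1, as a string of '0'/'1' characters."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]  # binary digits, most significant bit first
    # Prefix of (length - 1) zeros tells the decoder how many bits follow.
    return "0" * (len(binary) - 1) + binary


def encode_positions(positions: list[int]) -> str:
    """Encode strictly increasing, 1-based word positions as gamma-coded gaps."""
    bits = []
    previous = 0
    for pos in positions:
        if pos <= previous:
            raise ValueError("positions must be strictly increasing and >= 1")
        bits.append(gamma_encode(pos - previous))  # store the gap, not the position
        previous = pos
    return "".join(bits)


def decode_positions(bits: str) -> list[int]:
    """Inverse of encode_positions; assumes well-formed input."""
    positions = []
    previous = 0
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        gap = int(bits[i:i + zeros + 1], 2)  # read zeros + 1 value bits
        i += zeros + 1
        previous += gap
        positions.append(previous)
    return positions


if __name__ == "__main__":
    encoded = encode_positions([3, 4, 10])
    print(encoded)
    print(decode_positions(encoded))
```

Unlike a fixed width mask, a list encoded this way keeps every exact position, and small gaps between nearby occurrences cost only a few bits each. A production index would pack the bits into bytes or machine words rather than a string.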