Viktor Lofgren
542690d9f6
(search-service) Hide pagination when there is only 1 page of results
2024-09-28 13:48:09 +02:00
Viktor Lofgren
596a7fb4ea
(actor) Disable the feed scraper on all nodes but the first
2024-09-28 12:36:16 +02:00
Viktor Lofgren
c3f726a01f
(actor) Add a feed scraping actor
...
Add a new actor that polls a URL every 6 hours and amends the domain database with any unseen domains, flagging them to be crawled by the next crawl job.
The URLs are specified in data/scrape-urls.txt. If this file is absent, the actor shuts down.
2024-09-28 12:33:29 +02:00
Viktor Lofgren
4538ade156
(live-capture) Add readme to live-capture function
2024-09-28 11:35:46 +02:00
Viktor Lofgren
f4709d8f32
(live-capture) Handle case when screenshot bytes are empty.
...
Add logic to flag the domain as fetched when the pngBytes array is empty. This ensures we won't try to re-fetch this domain again for a while.
2024-09-27 15:53:17 +02:00
Viktor Lofgren
3dda8c228c
(live-capture) Handle failed screenshot fetch in BrowserlessClient
...
Return an empty byte array when screenshot fetch fails, ensuring downstream processes are not impacted by null responses. Additionally, only attempt to upload the screenshot if the byte array is non-empty, preventing invalid data from being stored.
2024-09-27 14:52:05 +02:00
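A minimal sketch of the "empty array instead of null" convention described in the commit above. This is not the actual BrowserlessClient code; the class name, method names, and the Supplier/Consumer parameters are hypothetical stand-ins for the real fetch and upload calls.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical illustration; none of these names come from the repository.
final class ScreenshotCaptureSketch {

    // Wraps a fetch so callers always get a non-null array:
    // a failed or null fetch is reported as an empty array.
    static byte[] fetchOrEmpty(Supplier<byte[]> fetcher) {
        try {
            byte[] bytes = fetcher.get();
            return bytes != null ? bytes : new byte[0];
        } catch (RuntimeException e) {
            return new byte[0];
        }
    }

    // Uploads only when there is something to upload.
    // Returns true if the uploader was invoked.
    static boolean uploadIfPresent(byte[] pngBytes, Consumer<byte[]> uploader) {
        if (pngBytes.length == 0) {
            return false;
        }
        uploader.accept(pngBytes);
        return true;
    }

    public static void main(String[] args) {
        byte[] failed = fetchOrEmpty(() -> { throw new IllegalStateException("fetch failed"); });
        System.out.println("uploaded: " + uploadIfPresent(failed, b -> { }));
    }
}
```

With this shape, downstream code only needs a length check and never a null check.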
Viktor Lofgren
ccf6b7caf3
(assistant) Refactor scheduling of tasks within SimilarDomainsService
...
Changed the scheduling function to use a single schedule call instead of a fixed delay for the init task. The updateScreenshotInfo method was also moved and slightly refactored for readability and consistency.
2024-09-27 14:43:19 +02:00
Viktor Lofgren
fed33ed64a
(search-service) Update screenshot request handling
...
Always request the main site screenshot to ensure staleness checks and necessary updates. Limit additional screenshot requests for similar and linking domains to a maximum of 5 requests per view, to avoid overloading.
2024-09-27 14:27:25 +02:00
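The request-limiting rule in the commit above could be sketched roughly as follows. This is an illustration only: the class, method, and constant names are hypothetical, and it assumes the limit of 5 applies to the additional (similar and linking) domains, not counting the main site.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the search-service implementation.
final class ScreenshotRequestSketch {

    // Assumption: the cap covers only the extra domains, not the main site.
    static final int MAX_EXTRA_REQUESTS = 5;

    // Returns the domains to request screenshots for: the main domain
    // always comes first, followed by at most MAX_EXTRA_REQUESTS related domains.
    static List<String> domainsToRequest(String mainDomain, List<String> relatedDomains) {
        List<String> requests = new ArrayList<>();
        requests.add(mainDomain);
        for (String domain : relatedDomains) {
            if (requests.size() - 1 >= MAX_EXTRA_REQUESTS) {
                break;
            }
            requests.add(domain);
        }
        return requests;
    }

    public static void main(String[] args) {
        List<String> related = List.of("a.test", "b.test", "c.test", "d.test",
                                       "e.test", "f.test", "g.test");
        System.out.println(domainsToRequest("main.test", related));
    }
}
```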
Viktor Lofgren
ca27d95ce1
(assistant) Add bounds checks for domain idx
2024-09-27 14:24:04 +02:00
Viktor Lofgren
3566fe296a
(assistant) Add scheduled update job for screenshot information
2024-09-27 14:16:28 +02:00
Viktor Lofgren
c91435e314
(assistant) Don't attempt to respond to similarity and linkedness queries before the data is ready
...
This will reduce the number of exceptions in the assistant logs quite significantly.
2024-09-27 14:08:08 +02:00
Viktor Lofgren
31f30069a4
(live-capture) Dial down logging a bit
2024-09-27 14:00:55 +02:00
Viktor
e5726a75d2
Merge pull request #120 from MarginaliaSearch/live-capture-function
...
Add a new function 'Live Capture' for on-demand screenshot capture
2024-09-27 13:48:53 +02:00
Viktor Lofgren
c757d116bf
(misc) Fix Broken Tests
2024-09-27 13:46:34 +02:00
Viktor Lofgren
23cce0c78a
Add a new function 'Live Capture' for on-demand screenshot capture
...
The screenshots are requested by the site-service, and triggered via the site-info view.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
1bd29a586c
(service-discovery) Add common base interface to all Grpc services
...
To be able to tell service discovery whether to enable a service on a particular runtime, a common base interface, DiscoverableService (which extends BindableService), was added.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
4565bfe359
(crawler) Make the crawler report crawling progress correctly when stopped and resumed.
2024-09-26 18:30:29 +02:00
Viktor Lofgren
336d6fdd14
(index-client) Fix error when zero results are found
2024-09-25 20:23:13 +02:00
Viktor Lofgren
95cde242ca
(assistant) Fix NPE when IP information is absent
2024-09-25 20:19:17 +02:00
Viktor
9224176202
Merge pull request #119 from MarginaliaSearch/result-pagination
...
Add pagination support for the search results
2024-09-25 14:29:24 +02:00
Viktor Lofgren
0d2390fd13
(search-service) Only autofocus on the query when the query is empty
2024-09-25 14:27:03 +02:00
Viktor Lofgren
4a0356e26f
(search-service) Add pagination support to the search GUI
2024-09-25 14:26:49 +02:00
Viktor Lofgren
73f973cc06
(search-query) Add pagination to search query API and the direct query-service interface
2024-09-25 14:20:59 +02:00
Viktor Lofgren
e9e8580913
(converter) Fix NPE bugs in converter due to the reintroduction of CrawledDocument.headers
2024-09-25 12:18:56 +02:00
Viktor Lofgren
8b85a58fea
(search UX) Autofocus on the search form
2024-09-24 15:56:03 +02:00
Viktor Lofgren
40512511af
(crawler) Refactor boundary between CrawlerRetreiver and HttpFetcherImpl
...
This code is still a bit too complex, but it's slowly getting better.
2024-09-24 15:08:22 +02:00
Viktor
10d8fc4fe7
Update ROADMAP.md
2024-09-24 14:57:30 +02:00
Viktor
9899d45ea8
Merge pull request #118 from MarginaliaSearch/vlofgren-patch-1
...
Update ROADMAP.md
2024-09-24 14:13:47 +02:00
Viktor
3eea471ca6
Update ROADMAP.md
2024-09-24 14:13:32 +02:00
Viktor Lofgren
3dec4b6b34
(index) Fix bug where tcfFirstPosition lit up because one term was in the title and the other was missing from the document
...
This was because firstPosition calculation was not invalidated when positions were missing.
2024-09-24 13:33:37 +02:00
Viktor Lofgren
162fc25ebc
(minor) Fix accidental commit errors
2024-09-23 18:03:09 +02:00
Viktor Lofgren
e9854f194c
(crawler) Refactor
...
* Restructure the code to make a bit more sense
* Store full headers in crawl data
* Fix bug in retry-after header handling that assumed the value was in milliseconds and then clamped it to a lower bound of 500ms, meaning the header was almost always handled wrong
2024-09-23 17:51:07 +02:00
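For context on the retry-after fix above: the HTTP Retry-After header carries either a delay in whole seconds or an HTTP-date, never milliseconds. The sketch below shows one way to read the seconds form. It is not the crawler's code: the class and method names, the fallback, and the upper bound are all hypothetical, and the HTTP-date form is deliberately left to the fallback.

```java
import java.time.Duration;

// Hypothetical illustration; not taken from the crawler.
final class RetryAfterSketch {

    // Parses the delta-seconds form of Retry-After.
    // Missing, negative, or non-numeric values (including the HTTP-date form,
    // which this sketch does not handle) yield the fallback.
    // Values above 'max' are clamped down to 'max'.
    static Duration parseRetryAfter(String headerValue, Duration fallback, Duration max) {
        if (headerValue == null) {
            return fallback;
        }
        try {
            long seconds = Long.parseLong(headerValue.trim());
            if (seconds < 0) {
                return fallback;
            }
            Duration delay = Duration.ofSeconds(seconds); // seconds, not milliseconds
            return delay.compareTo(max) > 0 ? max : delay;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Duration fallback = Duration.ofSeconds(1);
        Duration max = Duration.ofMinutes(1);
        System.out.println(parseRetryAfter("2", fallback, max));
        System.out.println(parseRetryAfter("3600", fallback, max));
    }
}
```

Treating a value such as `2` as milliseconds and then raising it to a 500ms floor would wait half a second where the server asked for two seconds, which matches the failure described in the commit.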
Viktor Lofgren
9c292a4f62
(doc) Fix outdated links in documentation
2024-09-22 13:56:17 +02:00
Viktor Lofgren
edb42836da
(vcs) Fix shared state issues with VarintCodedSequence's iterators.
...
Also cleans up the code a bit.
2024-09-21 16:09:15 +02:00
Viktor Lofgren
1ff88ff0bc
(vcs) Stopgap fix for quoted queries with the same term appearing multiple times
...
There are reentrancy issues with VarintCodedSequence; this hides the symptom, but the underlying issues need to be corrected properly.
2024-09-21 14:07:59 +02:00
Viktor Lofgren
28e7c8e5e0
Increase temporal bias weight to give the recent results filter a bit more recency
2024-09-17 18:11:40 +02:00
Viktor
463b3ed0ce
Merge pull request #99 from MarginaliaSearch/term-positions
...
Improve term positions accuracy
2024-09-17 15:30:04 +02:00
Viktor Lofgren
8e78286068
Merge branch 'master' into term-positions
2024-09-17 15:20:46 +02:00
Viktor Lofgren
f4eeef145e
(index) Reduce fetch size to improve timeout characteristics
2024-09-17 15:20:41 +02:00
Viktor Lofgren
87aa869338
(index) Correct positions mask to take into account offsets when overlapping
2024-09-17 14:40:37 +02:00
Viktor Lofgren
60ad4786bc
(index) Use MemorySegment.copy for LongArray->LongArray transfers
2024-09-17 13:56:31 +02:00
Viktor Lofgren
a74df7f905
(index) Increase buffer size for PrioDocIdsTransformer
2024-09-17 13:52:52 +02:00
Viktor Lofgren
9f9c6736ab
(index) Use MemorySegment.copy for LongArray->LongArray transfers
2024-09-17 13:49:02 +02:00
Viktor Lofgren
b95646625f
(index) Correct prio index construction with mmap
...
Accidentally snuck in behavior from full index
2024-09-17 13:39:08 +02:00
Viktor Lofgren
6e47eae903
(index) Correct strange close handling of PositionsFileConstructor
2024-09-13 16:34:14 +02:00
Viktor Lofgren
934af0dd4b
(index) Correct units in log message when shrinking the documents file
2024-09-13 16:33:19 +02:00
Viktor Lofgren
a8bec13ed9
(index) Evaluate using mmap reads during index construction instead of filechannel reads
...
It's likely that this will be faster, as the reads are on average small and sequential, and can't be buffered easily.
2024-09-13 16:14:56 +02:00
Viktor Lofgren
1cf62f5850
(doc) Correct dead links and stale information in the docs
2024-09-13 11:02:13 +02:00
Viktor Lofgren
8047e77757
(doc) Correct dead links and stale information in the docs
2024-09-13 11:01:05 +02:00
Viktor Lofgren
2a92de29ce
(loader) Fix it so that the loader doesn't explode if it sees an invalid URL
2024-09-12 11:36:00 +02:00