Viktor Lofgren
eab61cd48a
Merge branch 'master' into serp-redesign
2024-12-11 17:09:27 +01:00
Viktor Lofgren
cf7f84f033
(rank) Reduce the impact of domain rank bonus, and only apply it to cancel out negative penalties, never to increase the ranking
2024-12-10 22:04:12 +01:00
Viktor Lofgren
f3382b5bd8
(search) Completely remove all old hdb templates
...
Create new views for conversion results, dictionary results, and site crosstalk.
2024-12-10 15:04:49 +01:00
Viktor Lofgren
f050bf5c4c
(WIP) Initial semi-working transformation to new tailwind UI
...
Still missing is a proper build; we're currently pulling in tailwind from a CDN, which is no bueno in prod.
There's also a lot of polish remaining everywhere, dead links, etc.
2024-12-05 14:00:17 +01:00
Viktor Lofgren
c97c66a41c
(ranking) Reduce the verbatim score multiplier
2024-11-28 13:37:11 +01:00
Viktor Lofgren
923ebbac81
(feeds) Add logic to handle URI fragments in feed items
...
Introduced a method to decide whether to retain URI fragments in feed items based on their uniqueness. Enhanced FeedItem processing to conditionally strip fragments to maintain clean URLs where applicable.
2024-11-23 16:38:56 +01:00
Viktor Lofgren
4d23fe6261
(feeds) Simplify RSS User-Agent header
...
Removed the redundant "RSS Feed Fetcher" suffix from the User-Agent header in the FeedFetcherService. This should help the feed fetcher avoid triggering bot mitigation that accepts the regular UA-string.
2024-11-21 16:43:56 +01:00
Viktor Lofgren
a91ab4c203
(live-crawler) Crude first-try process for live crawling #WIP
...
Some refactoring is still needed, but a dummy actor is in place, along with a process that crawls URLs from the livecapture service's RSS endpoints and takes them all the way to being indexable.
2024-11-19 19:35:01 +01:00
Viktor Lofgren
c728a1e2f2
(rss) Add endpoint for extracting URLs changed within a timespan.
2024-11-18 14:59:32 +01:00
Viktor Lofgren
d874d76a09
(rss) Add an endpoint that can be used for identifying when RSS data has changed
2024-11-18 14:22:17 +01:00
Viktor Lofgren
9eb16cb667
(test) Remove tests from fast suite
...
Adding a new @Tag("flaky") for tests that do not reliably return successes. These may still be valuable during development, but should not run in CI.
Also tagging a few of the slower tests with the old @Tag("slow"), to speed up the run-time.
2024-11-17 19:45:59 +01:00
Viktor Lofgren
e5db3f11e1
(chore) Clean up some of the uglier delomboking artifacts
2024-11-15 13:57:20 +01:00
Viktor Lofgren
9f47ce8d15
(chore) Remove lombok
...
There are likely some instances of delombok gore with this commit.
2024-11-11 21:14:38 +01:00
Viktor Lofgren
a5b4951f23
(chore) Remove use of deprecated STR.-style string templates
2024-11-11 18:02:28 +01:00
Viktor Lofgren
a456ec9599
(feed) Use the message queue to permit the feeds service to tell the calling actor when it's finished
2024-11-10 18:30:28 +01:00
Viktor Lofgren
a2bc9a98c0
(feed) Use the message queue to permit the feeds service to tell the calling actor when it's finished
2024-11-10 17:45:20 +01:00
Viktor Lofgren
e24a98390c
(feed) Update API to allow specifying clean vs refresh update
...
Move the logic deciding which operation to perform into the actor, updating its state graph to incorporate a counter that runs a clean update once in a blue moon.
2024-11-09 18:43:47 +01:00
Viktor Lofgren
a293266ccd
(feed) Wipe the feeds db and start over from system URLs periodically.
2024-11-09 18:17:16 +01:00
Viktor Lofgren
d774c39031
(feeds) Reduce log spam
2024-11-09 17:56:43 +01:00
Viktor Lofgren
ab17af99da
(feeds) Refresh the feed db using the previous db, when it is available.
2024-11-09 17:56:43 +01:00
Viktor Lofgren
b0ac3c586f
(feeds) Correct parallelism using SimpleBlockingThreadPool
2024-11-09 17:56:43 +01:00
Viktor Lofgren
139fa85b18
(feeds) Add working heartbeat tracking progress
2024-11-09 17:56:43 +01:00
Viktor Lofgren
bfeb9a4538
(feeds) Retire feedlot the feed bot, move RSS capture into the live-capture service
2024-11-09 17:56:43 +01:00
Viktor Lofgren
89f7f3c17c
(query-parser) Fix regression where advice terms weren't parsed properly
2024-10-14 13:46:37 +02:00
Viktor Lofgren
2ee58f4bc9
(index) Adjust ranking parameters to dial down the importance of tcfProximity and firstPosition
2024-09-29 15:33:12 +02:00
Viktor Lofgren
4538ade156
(live-capture) Add readme to live-capture function
2024-09-28 11:35:46 +02:00
Viktor Lofgren
f4709d8f32
(live-capture) Handle case when screenshot bytes are empty.
...
Add logic to flag the domain as fetched when the pngBytes array is empty. This ensures we won't try to re-fetch this domain again for a while.
2024-09-27 15:53:17 +02:00
Viktor Lofgren
3dda8c228c
(live-capture) Handle failed screenshot fetch in BrowserlessClient
...
Return an empty byte array when screenshot fetch fails, ensuring downstream processes are not impacted by null responses. Additionally, only attempt to upload the screenshot if the byte array is non-empty, preventing invalid data from being stored.
2024-09-27 14:52:05 +02:00
Viktor Lofgren
ccf6b7caf3
(assistant) Refactor scheduling of tasks within SimilarDomainsService
...
Changed the scheduling function to use a single schedule call instead of a fixed delay for the init task. The updateScreenshotInfo method was also moved and slightly refactored for readability and consistency.
2024-09-27 14:43:19 +02:00
Viktor Lofgren
ca27d95ce1
(assistant) Add bounds checks for domain idx
2024-09-27 14:24:04 +02:00
Viktor Lofgren
3566fe296a
(assistant) Add scheduled update job for screenshot information
2024-09-27 14:16:28 +02:00
Viktor Lofgren
c91435e314
(assistant) Don't attempt to respond to similarity and linkedness queries before the data is ready
...
This will reduce the number of exceptions in the assistant logs quite significantly.
2024-09-27 14:08:08 +02:00
Viktor Lofgren
31f30069a4
(live-capture) Dial down logging a bit
2024-09-27 14:00:55 +02:00
Viktor Lofgren
23cce0c78a
Add a new function 'Live Capture' for on-demand screenshot capture
...
The screenshots are requested by the site-service, and triggered via the site-info view.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
1bd29a586c
(service-discovery) Add common base interface to all Grpc services
...
To be able to tell service discovery whether to enable a service on a particular runtime, a common base interface DiscoverableService extends BindableService was added.
2024-09-27 13:46:34 +02:00
Viktor Lofgren
c757d116bf
(misc) Fix Broken Tests
2024-09-27 13:46:34 +02:00
Viktor Lofgren
95cde242ca
(assistant) Fix NPE when IP information is absent
2024-09-25 20:19:17 +02:00
Viktor Lofgren
4a0356e26f
(search-service) Add pagination support to the search GUI
2024-09-25 14:26:49 +02:00
Viktor Lofgren
73f973cc06
(search-query) Add pagination to search query API and the direct query-service interface
2024-09-25 14:20:59 +02:00
Viktor Lofgren
28e7c8e5e0
Increase temporal bias weight to give the recent results filter a bit more recency
2024-09-17 18:11:40 +02:00
Viktor Lofgren
99523ca079
(query-parser) Remove test that is no longer relevant
2024-09-10 10:35:56 +02:00
Viktor Lofgren
50ec922c2b
(index) Fix broken index tests
...
Also cleaned up the tests to be less fragile to ranking algorithm changes.
2024-09-10 10:23:46 +02:00
Viktor Lofgren
50ba8fd099
(query-parsing) Correct handling of trailing parentheses
2024-09-03 11:45:14 +02:00
Viktor Lofgren
99b3b00b68
(query-parsing) Merge QueryTokenizer into QueryParser and add escaping of query grammar
2024-09-03 11:35:32 +02:00
Viktor Lofgren
f6d981761d
(query-parsing) Drop search term elements that aren't indexed by the search engine
2024-09-03 11:24:05 +02:00
Viktor Lofgren
8290c19e24
(query-parsing) Drop search term elements that aren't indexed by the search engine
2024-09-03 11:21:01 +02:00
Viktor Lofgren
bb5d946c26
(index, EXPERIMENTAL) Clean up ranking code
2024-08-29 11:34:23 +02:00
Viktor Lofgren
4fbcc02f96
(index) Adjust sensible defaults for ranking parameters
2024-08-25 11:24:16 +02:00
Viktor Lofgren
9aa8f13731
(index) Remove tcfAvgDist ranking parameter
...
This is captured by tcfProximity already
2024-08-25 11:20:19 +02:00
Viktor Lofgren
0999f07320
(search-query) Add new ranking parameters for proximity and verbatim matches
2024-08-25 10:34:12 +02:00