Viktor Lofgren
1b27c5cf06
(search) Add a copy of the old UI as a separate service, search-service-legacy
2025-01-02 18:02:17 +01:00
Viktor Lofgren
8b05c788fd
(Search) Enable gzip compression of responses
2025-01-01 18:34:42 +01:00
Viktor Lofgren
236f033bc9
(Search) Reduce whitespace in explore view on all resolutions
2025-01-01 18:23:35 +01:00
Viktor Lofgren
510fc75121
(Search) Reduce whitespace in explorer view on mobile
2025-01-01 18:18:09 +01:00
Viktor Lofgren
0376f2e6e3
Merge branch 'master' into serp-redesign
...
# Conflicts:
# code/services-application/search-service/resources/templates/search/index/index.hdb
2025-01-01 18:15:09 +01:00
Viktor Lofgren
0b65164f60
(chore) Fix broken test
2025-01-01 18:06:29 +01:00
Viktor Lofgren
9be477de33
(domain-info) Add a feed flag to domain info
...
This is a bit of a sketchy solution that requires both assistant services to run on the same physical machine.
2025-01-01 18:02:33 +01:00
Viktor Lofgren
84f55b84ff
(search) Add experimental OPML-export function for feed subscriptions
2025-01-01 17:17:54 +01:00
Viktor Lofgren
ab5c30ad51
(search) Fix site info view for completely unknown domains
...
Also correct the DbDomainQueries.getDomainId so that it throws NoSuchElementException when domain id is missing, and not UncheckedExecutionException via Cache.
2025-01-01 16:29:01 +01:00
Viktor Lofgren
0c839453c5
(search) Fix crosstalk link
2025-01-01 16:09:19 +01:00
Viktor Lofgren
5e4c5d03ae
(search) Clean up breakpoints in site overview
2025-01-01 16:06:08 +01:00
Viktor Lofgren
710af4999a
(feed-fetcher) Add " entity mapping in feed fetcher
2025-01-01 15:45:17 +01:00
Viktor Lofgren
a5b0a1ae62
(search) Move linked/similar domains to a popover style menu on mobile
...
Fix scroll
2025-01-01 15:37:35 +01:00
Viktor Lofgren
e9f71ee39b
(search) Move linked/similar domains to a popover style menu on mobile
2025-01-01 15:23:25 +01:00
Viktor Lofgren
baeb4a46cd
(search) Reintroduce query rewriting for recipes, add rules for wikis and forums
2024-12-31 16:05:00 +01:00
Viktor Lofgren
0ea8092350
(search) Add link promoting the redesign beta
2024-12-30 15:47:13 +01:00
Viktor Lofgren
bae44497fe
(crawler) Add a new system property crawler.maxFetchSize
...
This gives the same upper limit to the live crawler and the big boy crawler, though the live crawler will reject items too large, and the big crawler will truncate at that point.
2024-12-30 15:10:11 +01:00
Viktor Lofgren
0d59202aca
(crawler) Do not remove W/-prefix on weak e-tags
...
The server expects to get them back prefixed, as we received them.
2024-12-27 20:56:42 +01:00
Viktor Lofgren
0ca43f0c9c
(live-crawler) Improve live crawler short-circuit logic
...
We should not wait until we've fetched robots.txt to decide whether we have any data to fetch! This makes the live crawler very slow and leads to unnecessary requests.
2024-12-27 20:54:42 +01:00
Viktor Lofgren
3bc99639a0
(feed-fetcher) Make feed fetcher requests conditional
...
Add `If-None-Match` and `If-Modified-Since` headers as appropriate to the feed fetcher's requests. On well-configured web servers, this should short-circuit the request and reduce the amount of bandwidth and processing that is necessary.
A new table was added to the FeedDb to hold one etag per domain.
If-Modified-Since semantics are based on the creation date for the feed database, which should serve as a cutoff date for the earliest update we can have received.
This completes the changes for Issue #136 .
2024-12-27 15:10:15 +01:00
Viktor Lofgren
927bc0b63c
(live-crawler) Add Accept-Encoding: gzip to outbound requests
...
This change adds `Accept-Encoding: gzip` to all outbound requests from the live crawler and feed fetcher, and the corresponding decoding logic for the compressed response data.
The change addresses issue #136 , save for making the fetcher's requests conditional.
2024-12-27 03:59:34 +01:00
Viktor Lofgren
d968801dc1
(converter) Drop feed data from SlopDomainRecord
...
Also remove feed extraction from converter. This is the crawler's responsibility now.
2024-12-26 17:57:08 +01:00
Viktor Lofgren
89db69d360
(crawler) Correct feed URLs in domain state db
...
Discovered feed URLs were given a double slash after their domain name in the DB. This will go away in the URL normalizer, so the URLs are still viable, but the commit fixes the issue regardless.
2024-12-26 15:18:31 +01:00
Viktor Lofgren
895cee7004
(crawler) Improved feed discovery, new domain state db per crawlset
...
Feed discover is improved with by probing a few likely endpoints when no feed link tag is provided. To store the feed URLs, a sqlite database is added to each crawlset that stores a simple summary of the crawl job, including any feed URLs that have been discovered.
Solves issue #135
2024-12-26 15:05:52 +01:00
Viktor Lofgren
4bb71b8439
(crawler) Correct content type probing to only run on URLs that are suspected to be binary
2024-12-26 14:26:23 +01:00
Viktor Lofgren
e4a41f7dd1
(crawler) Correct content type probing to only run on URLs that are suspected to be binary
2024-12-26 14:13:17 +01:00
Viktor Lofgren
81cdd6385d
Add rendering tests for most major views
...
This will prevent accidentally deploying a broken search service
2024-12-25 15:22:26 +01:00
Viktor Lofgren
e76c42329f
Correct dark mode for infobox in site focused search
2024-12-25 15:06:05 +01:00
Viktor Lofgren
e6ef4734ea
Fix tests
2024-12-25 15:05:41 +01:00
Viktor Lofgren
41a59dcf45
(feed) Sanitize illegal HTML entities out of the feed XML before parsing
2024-12-25 14:53:28 +01:00
Viktor Lofgren
df4bc1d7e9
Add update time to front page subscriptions
2024-12-25 14:42:00 +01:00
Viktor Lofgren
2b222efa75
Merge branch 'master' into serp-redesign
2024-12-25 14:22:42 +01:00
Viktor Lofgren
94d4d2edb7
(live-crawler) Add refresh date to feeds API
...
For now this is just the ctime for the feeds db. We may want to store this per-record in the future.
2024-12-25 14:20:48 +01:00
Viktor Lofgren
56d14e56d7
(live-crawler) Improve LiveCrawlActor resilience to FeedService outages
2024-12-23 23:33:54 +01:00
Viktor Lofgren
a557c7ae7f
(live-crawler) Limit concurrent accesses per domain using DomainLocks from main crawler
2024-12-23 23:31:03 +01:00
Viktor Lofgren
b66879ccb1
(feed) Add support for date discovery through atom:issued and atom:created
...
This is specifically to help parse monadnock.net's Atom feed.
2024-12-23 20:05:58 +01:00
Viktor Lofgren
0da2047eae
(live-capture) Correctly update processed count, disable poll rate adjustment based on freshness.
2024-12-23 15:56:27 +01:00
Viktor Lofgren
5ca8523220
(math) Reduce log error spam from null unit conversions
2024-12-21 18:51:45 +01:00
Viktor Lofgren
1118657ffd
(system) Supply local IP to service discovery if multiFace is enabled
2024-12-19 22:20:19 +01:00
Viktor Lofgren
b1f970152d
(system) To support configurations with multiple docker networks, bind to the "most local" interface.
...
Make the behavior optional.
2024-12-19 20:26:31 +01:00
Viktor Lofgren
e1783891ab
(system) To support configurations with multiple docker networks, bind to the "most local" interface.
2024-12-19 20:18:57 +01:00
Viktor Lofgren
8c963bd4ba
(feeds) Remove Content-Encoding: gzip from feed fetcher
...
We don't support decompressing gzip, so this just gives us errors at this point should the server support it.
2024-12-18 22:23:44 +01:00
Viktor Lofgren
6a079c1c75
(feeds) Add per-domain throttling for feed fetcher.
2024-12-18 22:06:46 +01:00
Viktor Lofgren
2dc9f2e639
(feeds) Make feed XML parsing more lenient
...
... by consuming BOM markers and leading whitespace.
2024-12-18 17:18:41 +01:00
Viktor Lofgren
b66fb9caf6
(feeds) Improve error handling in the feed fetcher.
2024-12-18 17:02:13 +01:00
Viktor Lofgren
6d18e6d840
(search) Add clustering to subscriptions view
2024-12-18 15:36:05 +01:00
Viktor Lofgren
2a3c63f209
(search) Exclude generated style.css from git
2024-12-18 15:24:31 +01:00
Viktor Lofgren
9f70cecaef
(search) Add site subscription feature that puts RSS updates on the front page
2024-12-18 15:24:31 +01:00
Viktor Lofgren
c08203e2ed
(search) Prevent paperdoll from being run as a test by CI
2024-12-14 20:35:57 +01:00
Viktor Lofgren
86497fd32f
(site-info) Mobile layout fix
2024-12-14 16:19:56 +01:00