Viktor Lofgren
8c8f2ad5ee
(search) Add an indicator when a link has a feed in the similar/linked domains views
2025-01-02 18:11:57 +01:00
Viktor Lofgren
f71e79d10f
(search) Add a copy of the old UI as a separate service, search-service-legacy
2025-01-02 18:03:42 +01:00
Viktor Lofgren
1b27c5cf06
(search) Add a copy of the old UI as a separate service, search-service-legacy
2025-01-02 18:02:17 +01:00
Viktor Lofgren
67edc8f90d
(domain-info) Only flag domains with rss feed items as having a feed
2025-01-02 17:41:52 +01:00
Viktor Lofgren
5f576b7d0c
(query-parser) Strip leading underlines
...
This addresses issue #140, where __builtin_ffs gives no results.
2025-01-02 14:39:03 +01:00
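A minimal sketch of the stripping step described in 5f576b7d0c, assuming the "underlines" are leading underscore characters as in `__builtin_ffs`. The class and method names are hypothetical and are not taken from the query parser.

```java
final class QueryTermSketch {
    /**
     * Removes every leading '_' from a query term.
     * Hypothetical helper; the real query parser may behave differently.
     */
    static String stripLeadingUnderscores(String term) {
        int start = 0;
        // Advance past the run of underscores at the start of the term
        while (start < term.length() && term.charAt(start) == '_') {
            start++;
        }
        return term.substring(start);
    }
}
```

With this sketch, `__builtin_ffs` becomes `builtin_ffs`, while underscores inside the term are left alone.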
Viktor Lofgren
8b05c788fd
(Search) Enable gzip compression of responses
2025-01-01 18:34:42 +01:00
Viktor Lofgren
236f033bc9
(Search) Reduce whitespace in explore view on all resolutions
2025-01-01 18:23:35 +01:00
Viktor Lofgren
510fc75121
(Search) Reduce whitespace in explorer view on mobile
2025-01-01 18:18:09 +01:00
Viktor Lofgren
0376f2e6e3
Merge branch 'master' into serp-redesign
...
# Conflicts:
# code/services-application/search-service/resources/templates/search/index/index.hdb
2025-01-01 18:15:09 +01:00
Viktor Lofgren
0b65164f60
(chore) Fix broken test
2025-01-01 18:06:29 +01:00
Viktor Lofgren
9be477de33
(domain-info) Add a feed flag to domain info
...
This is a bit of a sketchy solution that requires both assistant services to run on the same physical machine.
2025-01-01 18:02:33 +01:00
Viktor Lofgren
84f55b84ff
(search) Add experimental OPML-export function for feed subscriptions
2025-01-01 17:17:54 +01:00
Viktor Lofgren
ab5c30ad51
(search) Fix site info view for completely unknown domains
...
Also correct DbDomainQueries.getDomainId so that it throws NoSuchElementException when the domain id is missing, rather than UncheckedExecutionException via Cache.
2025-01-01 16:29:01 +01:00
Viktor Lofgren
0c839453c5
(search) Fix crosstalk link
2025-01-01 16:09:19 +01:00
Viktor Lofgren
5e4c5d03ae
(search) Clean up breakpoints in site overview
2025-01-01 16:06:08 +01:00
Viktor Lofgren
710af4999a
(feed-fetcher) Add &quot; entity mapping in feed fetcher
2025-01-01 15:45:17 +01:00
Viktor Lofgren
a5b0a1ae62
(search) Move linked/similar domains to a popover style menu on mobile
...
Fix scroll
2025-01-01 15:37:35 +01:00
Viktor Lofgren
e9f71ee39b
(search) Move linked/similar domains to a popover style menu on mobile
2025-01-01 15:23:25 +01:00
Viktor Lofgren
baeb4a46cd
(search) Reintroduce query rewriting for recipes, add rules for wikis and forums
2024-12-31 16:05:00 +01:00
Viktor Lofgren
5e2a8e9f27
(deploy) Add capability of adding tags to deploy script
2024-12-31 16:04:13 +01:00
Viktor
cc1a5bdf90
Merge pull request #138 from MarginaliaSearch/vlofgren-patch-1
...
Update ROADMAP.md
2024-12-31 14:41:02 +01:00
Viktor
7f7b1ffaba
Update ROADMAP.md
2024-12-31 14:40:34 +01:00
Viktor Lofgren
0ea8092350
(search) Add link promoting the redesign beta
2024-12-30 15:47:13 +01:00
Viktor Lofgren
483d29497e
(deploy) Add hashbang to deploy script
2024-12-30 15:47:13 +01:00
Viktor Lofgren
bae44497fe
(crawler) Add a new system property crawler.maxFetchSize
...
This gives the same upper limit to the live crawler and the big boy crawler, though the live crawler will reject items that are too large, and the big crawler will truncate at that point.
2024-12-30 15:10:11 +01:00
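A rough sketch of how a shared `crawler.maxFetchSize` limit could produce the two behaviors described in bae44497fe. Only the property name comes from the commit. The unit (bytes), the default value, and all class and method names are assumptions made for illustration.

```java
import java.util.Arrays;

final class FetchSizeLimitSketch {
    // Assumed default; the actual default is not stated in the commit
    static final int ASSUMED_DEFAULT_BYTES = 10 * 1024 * 1024;

    /** Reads the limit from the system property, falling back to the assumed default. */
    static int maxFetchSize() {
        return Integer.getInteger("crawler.maxFetchSize", ASSUMED_DEFAULT_BYTES);
    }

    /** Live-crawler style: a body over the limit is rejected outright. */
    static boolean acceptWhole(byte[] body) {
        return body.length <= maxFetchSize();
    }

    /** Main-crawler style: a body over the limit is cut off at the limit. */
    static byte[] truncate(byte[] body) {
        int limit = maxFetchSize();
        return body.length <= limit ? body : Arrays.copyOf(body, limit);
    }
}
```

The property would be set on the JVM command line, for example `-Dcrawler.maxFetchSize=...`.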
Viktor Lofgren
0d59202aca
(crawler) Do not remove W/-prefix on weak e-tags
...
The server expects to get them back prefixed, as we received them.
2024-12-27 20:56:42 +01:00
Viktor Lofgren
0ca43f0c9c
(live-crawler) Improve live crawler short-circuit logic
...
We should not wait until we've fetched robots.txt to decide whether we have any data to fetch! This makes the live crawler very slow and leads to unnecessary requests.
2024-12-27 20:54:42 +01:00
Viktor Lofgren
3bc99639a0
(feed-fetcher) Make feed fetcher requests conditional
...
Add `If-None-Match` and `If-Modified-Since` headers as appropriate to the feed fetcher's requests. On well-configured web servers, this should short-circuit the request and reduce the amount of bandwidth and processing that is necessary.
A new table was added to the FeedDb to hold one etag per domain.
If-Modified-Since semantics are based on the creation date for the feed database, which should serve as a cutoff date for the earliest update we can have received.
This completes the changes for Issue #136.
2024-12-27 15:10:15 +01:00
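The conditional request described in 3bc99639a0 might look roughly like the sketch below, which uses the JDK's `java.net.http.HttpRequest`. The class and method names are hypothetical, and whether the feed fetcher uses this HTTP client is an assumption. The stored ETag is sent back exactly as received, including any `W/` prefix, in line with 0d59202aca.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class ConditionalFeedRequestSketch {
    // HTTP-date in IMF-fixdate form, e.g. "Fri, 27 Dec 2024 14:10:15 GMT"
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
                    .withZone(ZoneOffset.UTC);

    /**
     * Builds a GET request for a feed, adding validators when they are known.
     *
     * @param etag      ETag stored for the domain, or null if none (hypothetical parameter)
     * @param dbCreated creation time of the feed database, or null (hypothetical parameter)
     */
    static HttpRequest build(URI feedUri, String etag, Instant dbCreated) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(feedUri).GET();
        if (etag != null) {
            // Sent back verbatim; a weak ETag keeps its W/ prefix
            builder.header("If-None-Match", etag);
        }
        if (dbCreated != null) {
            builder.header("If-Modified-Since", HTTP_DATE.format(dbCreated));
        }
        return builder.build();
    }
}
```

A server that honors these headers can answer `304 Not Modified` with no body, which is where the bandwidth saving comes from.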
Viktor Lofgren
927bc0b63c
(live-crawler) Add Accept-Encoding: gzip to outbound requests
...
This change adds `Accept-Encoding: gzip` to all outbound requests from the live crawler and feed fetcher, and the corresponding decoding logic for the compressed response data.
The change addresses issue #136, save for making the fetcher's requests conditional.
2024-12-27 03:59:34 +01:00
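The decoding side of 927bc0b63c can be sketched as follows, assuming the response body has already been read into a byte array and that the `Content-Encoding` header value is passed in as a string. The class and method names are hypothetical; the commit does not show how the crawler actually structures this.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class GzipBodySketch {
    /**
     * Returns the decoded body: gunzipped when the server reported
     * Content-Encoding: gzip, otherwise the bytes unchanged.
     */
    static byte[] decodeBody(byte[] body, String contentEncoding) throws IOException {
        if (contentEncoding == null || !contentEncoding.trim().equalsIgnoreCase("gzip")) {
            // Server ignored Accept-Encoding or used no compression
            return body;
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        }
    }
}
```

Checking the response header matters because `Accept-Encoding: gzip` is only a request; the server may still reply with an uncompressed body.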
Viktor Lofgren
d968801dc1
(converter) Drop feed data from SlopDomainRecord
...
Also remove feed extraction from converter. This is the crawler's responsibility now.
2024-12-26 17:57:08 +01:00
Viktor Lofgren
89db69d360
(crawler) Correct feed URLs in domain state db
...
Discovered feed URLs were given a double slash after their domain name in the DB. This will go away in the URL normalizer, so the URLs are still viable, but the commit fixes the issue regardless.
2024-12-26 15:18:31 +01:00
Viktor Lofgren
895cee7004
(crawler) Improved feed discovery, new domain state db per crawlset
...
Feed discovery is improved by probing a few likely endpoints when no feed link tag is provided. To store the feed URLs, a sqlite database is added to each crawlset that stores a simple summary of the crawl job, including any feed URLs that have been discovered.
Solves issue #135
2024-12-26 15:05:52 +01:00
Viktor Lofgren
4bb71b8439
(crawler) Correct content type probing to only run on URLs that are suspected to be binary
2024-12-26 14:26:23 +01:00
Viktor Lofgren
e4a41f7dd1
(crawler) Correct content type probing to only run on URLs that are suspected to be binary
2024-12-26 14:13:17 +01:00
Viktor
69ad6287b1
Update ROADMAP.md
2024-12-25 21:16:38 +00:00
Viktor Lofgren
81cdd6385d
Add rendering tests for most major views
...
This will prevent accidentally deploying a broken search service
2024-12-25 15:22:26 +01:00
Viktor Lofgren
e76c42329f
Correct dark mode for infobox in site focused search
2024-12-25 15:06:05 +01:00
Viktor Lofgren
e6ef4734ea
Fix tests
2024-12-25 15:05:41 +01:00
Viktor Lofgren
41a59dcf45
(feed) Sanitize illegal HTML entities out of the feed XML before parsing
2024-12-25 14:53:28 +01:00
Viktor Lofgren
df4bc1d7e9
Add update time to front page subscriptions
2024-12-25 14:42:00 +01:00
Viktor Lofgren
2b222efa75
Merge branch 'master' into serp-redesign
2024-12-25 14:22:42 +01:00
Viktor Lofgren
94d4d2edb7
(live-crawler) Add refresh date to feeds API
...
For now this is just the ctime for the feeds db. We may want to store this per-record in the future.
2024-12-25 14:20:48 +01:00
Viktor Lofgren
7ae19a92ba
(deploy) Improve deployment script to allow specification of partitions
2024-12-24 11:16:15 +01:00
Viktor Lofgren
56d14e56d7
(live-crawler) Improve LiveCrawlActor resilience to FeedService outages
2024-12-23 23:33:54 +01:00
Viktor Lofgren
a557c7ae7f
(live-crawler) Limit concurrent accesses per domain using DomainLocks from main crawler
2024-12-23 23:31:03 +01:00
Viktor Lofgren
b66879ccb1
(feed) Add support for date discovery through atom:issued and atom:created
...
This is specifically to help parse monadnock.net's Atom feed.
2024-12-23 20:05:58 +01:00
Viktor Lofgren
f1b7157ca2
(deploy) Add basic linting ability to deployment script.
2024-12-23 16:21:29 +01:00
Viktor Lofgren
7622335e84
(deploy) Correct deploy script, set correct name for assistant
2024-12-23 15:59:02 +01:00
Viktor Lofgren
0da2047eae
(live-capture) Correctly update processed count, disable poll rate adjustment based on freshness.
2024-12-23 15:56:27 +01:00
Viktor Lofgren
5ee4321110
(ci) Correct deploy script
2024-12-22 20:08:37 +01:00