Viktor Lofgren
c08203e2ed
(search) Prevent paperdoll from being run as a test by CI
2024-12-14 20:35:57 +01:00
Viktor Lofgren
86497fd32f
(site-info) Mobile layout fix
2024-12-14 16:19:56 +01:00
Viktor Lofgren
3b998573fd
Adjust colors on dark mode for site overview
2024-12-13 21:51:25 +01:00
Viktor Lofgren
e161882ec7
(search) Fix layout for light mode
2024-12-13 21:47:29 +01:00
Viktor Lofgren
357f349e30
(search) Table layout fixes for dictionary lookup
2024-12-13 21:47:08 +01:00
Viktor Lofgren
e4769f541d
(search) Sort and deduplicate search results for better relevance.
...
Added a custom sorting mechanism to prioritize HTTPS over HTTP and domain-based URLs over raw IPs during deduplication. Ensures "bad duplicates" are discarded while maintaining the original presentation order for user-facing results.
2024-12-13 21:47:08 +01:00
Viktor Lofgren
2a173e2861
(search) Dark Mode
2024-12-13 21:47:07 +01:00
Viktor Lofgren
a6a900266c
(search) Fix redirects
2024-12-13 02:40:51 +01:00
Viktor Lofgren
bdba53f055
(site) Update domain parameter type from PathParam to QueryParam
2024-12-13 02:15:35 +01:00
Viktor Lofgren
eb2fe18867
(sideload) Add LSH generation for sideloaded StackExchange data
...
Previously, the sideloader did not generate a locality-sensitive hashCode for document details. This caused all documents from the same domain to be considered duplicates by the deduplication logic.
2024-12-13 02:10:52 +01:00
Viktor Lofgren
a7468c8d23
(converter) Ensure paths are created for converter batch writer
2024-12-13 01:35:07 +01:00
Viktor Lofgren
fb2beb1eac
(converter) Fix data-loss bug where the converter writer would remove all but the last batch of processed data
2024-12-13 01:19:30 +01:00
Viktor Lofgren
0fb03e3d62
(export) Add logging to AtagExporter for error handling
2024-12-12 22:54:32 +01:00
Viktor Lofgren
67db3f295e
(index) Revert some optimization changes
2024-12-12 22:14:24 +01:00
Viktor Lofgren
dafaab3ef7
(index) Additional optimization pass
2024-12-12 18:57:33 +01:00
Viktor Lofgren
3f11ca409f
(index) Increase thread limit and optimize search result handling
...
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 17:07:06 +01:00
Viktor Lofgren
694eed79ef
(index) Increase thread limit and optimize search result handling
...
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:32:31 +01:00
Viktor Lofgren
4220169119
(index) Increase thread limit and optimize search result handling
...
Updated the default "index.valuationThreads" to 16 for improved concurrency. Expanded buffer sizes and restructured result handling logic for better memory management and performance.
2024-12-12 15:31:11 +01:00
Viktor Lofgren
bbdde789e7
Merge branch 'master' into serp-redesign
2024-12-11 19:45:17 +01:00
Viktor Lofgren
0a53ac68a0
Add specialization for Steam store and GOG
2024-12-11 18:32:45 +01:00
Viktor Lofgren
eab61cd48a
Merge branch 'master' into serp-redesign
2024-12-11 17:09:27 +01:00
Viktor Lofgren
e65d75a0f9
(crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets
2024-12-11 17:01:52 +01:00
Viktor Lofgren
3b99cffb3d
(link-parser) Filter out URLs with binary file suffixes in LinkParser
...
Added an additional filter step to ensure URLs with binary suffixes are excluded during crawling. This prevents unnecessary processing of non-HTML content, improving the efficiency of the link parsing process.
2024-12-11 16:42:47 +01:00
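Note on 3b99cffb3d: a suffix filter of the kind described might look like the sketch below. The suffix list and the function names are hypothetical (the commit message does not enumerate the real suffixes), and matching on the URL path only, ignoring the query string, is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical suffix list; the commit does not say which suffixes are used.
BINARY_SUFFIXES = (".pdf", ".zip", ".exe", ".png", ".jpg", ".mp3", ".mp4", ".gz")


def has_binary_suffix(url: str) -> bool:
    """Check the URL path (not the query string) against the suffix list."""
    path = urlsplit(url).path.lower()
    return path.endswith(BINARY_SUFFIXES)


def filter_links(urls: list[str]) -> list[str]:
    """Drop URLs that look like binary downloads before they are queued."""
    return [u for u in urls if not has_binary_suffix(u)]
```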
Viktor Lofgren
a97c05107e
Add synthetic meta flag for root path documents
...
If the document's URL path is "/", a "special:root" meta flag is now added with the "Synthetic" bit set. This will help searching only for the root document of a website, neat stuff ahead :D
2024-12-11 16:10:44 +01:00
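Note on a97c05107e: the rule in this commit (path equal to "/" yields a "special:root" keyword carrying the Synthetic bit) can be illustrated with a small sketch. The bit position, the dictionary representation of keyword flags, and the function name are assumptions made for the example; only the keyword string and the path test come from the commit message.

```python
from urllib.parse import urlsplit

SYNTHETIC_BIT = 1 << 0  # hypothetical bit position for the "Synthetic" flag


def root_keywords(url: str) -> dict[str, int]:
    """Return extra keyword -> flag-bits entries for a document URL."""
    if urlsplit(url).path == "/":
        return {"special:root": SYNTHETIC_BIT}
    return {}
```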
Viktor Lofgren
5002870d1f
(converter) Refactor sideloaders to improve feature handling and keyword logic
...
Centralized HTML feature handling with `applyFeatures` in StackexchangeSideloader and added dynamic synthetic term generation. Improved HTML structure in RedditSideloader and enhanced metadata processing with feature-based keywords. Updated DomainLinks to correctly compute link counts using individual link occurrences.
2024-12-11 16:01:38 +01:00
Viktor Lofgren
73861e613f
(ranking) Downtune score boost for unordered heading matches
2024-12-11 15:44:29 +01:00
Viktor Lofgren
0ce2ba9ad9
(jooby) Fix asset handler
2024-12-11 14:38:04 +01:00
Viktor Lofgren
3ddcebaa36
(search) Give serp/start a more consistent name to siteinfo/start
...
The change also cleans up the layout a bit.
2024-12-11 14:33:57 +01:00
Viktor Lofgren
b91463383e
(jooby) Clean up initialization process
2024-12-11 14:33:18 +01:00
Viktor Lofgren
7444a2f36c
(site-info) Add placeholder when a feed item lacks a title.
2024-12-10 22:46:12 +01:00
Viktor Lofgren
461bc3eb1a
(generator) Add special workaround to flag fextralife as a wiki
2024-12-10 22:22:52 +01:00
Viktor Lofgren
cf7f84f033
(rank) Reduce the impact of domain rank bonus, and only apply it to cancel out negative penalties, never to increase the ranking
2024-12-10 22:04:12 +01:00
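Note on cf7f84f033: one way to read "only apply it to cancel out negative penalties, never to increase the ranking" is as a clamp at zero. The sketch below assumes a sign convention that the commit message does not state (penalties are negative numbers, the bonus is non-negative), and the function name is hypothetical.

```python
def apply_domain_rank_bonus(penalty: float, bonus: float) -> float:
    """Return the adjusted penalty term.

    Assumed convention: penalty < 0 lowers a result's score and
    bonus >= 0 is the domain rank bonus. The bonus may lift a negative
    penalty toward zero, but never past it.
    """
    if penalty >= 0:
        return penalty  # nothing to cancel, so the bonus is ignored
    return min(0.0, penalty + bonus)
```

Under this reading a well-ranked domain can at most neutralize its penalties; it cannot push a result above where it would sit with no penalties at all.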
Viktor Lofgren
fdee07048d
(search) Remove Spark and migrate to Jooby for the search service
2024-12-10 19:13:13 +01:00
Viktor Lofgren
2fbf201761
(search) Adjust crosstalk flex-basis
2024-12-10 15:12:51 +01:00
Viktor Lofgren
4018e4c434
(search) Add crosstalk to paperdoll
2024-12-10 15:12:39 +01:00
Viktor Lofgren
f3382b5bd8
(search) Completely remove all old hdb templates
...
Create new views for conversion results, dictionary results, and site crosstalk.
2024-12-10 15:04:49 +01:00
Viktor Lofgren
9fc82574f0
(fingerprint) Add FluxGarden as a wiki generator
...
#130
2024-12-10 13:51:42 +01:00
Viktor
589f4dafb9
Merge pull request #129 from MarginaliaSearch/atags-counts
...
(WIP) Improve atag sentence matching
2024-12-10 12:42:34 +00:00
Viktor Lofgren
c5d657ef98
(live-crawler) Flag live crawled documents with a special keyword
2024-12-10 13:42:10 +01:00
Viktor Lofgren
3c2bb566da
(converter) Wipe the converter output path on initialization to avoid lingering stale data.
2024-12-10 13:41:05 +01:00
Viktor Lofgren
9287ee0141
(search) Improve hyphenation logic for titles
2024-12-09 15:29:10 +01:00
Viktor Lofgren
2769c8f869
(search) Remove sticky search bar to aid with performance on Firefox (and iOS?)
2024-12-09 15:20:33 +01:00
Viktor Lofgren
ddb66f33ba
(search) Add more feedback when pressing some buttons
2024-12-09 15:07:23 +01:00
Viktor Lofgren
79500b8fbc
(search) Move search bar back up top on mobile, put filter button at the bottom instead.
2024-12-09 14:55:37 +01:00
Viktor Lofgren
187eea43a4
(search) Remove redundant @if
2024-12-09 14:46:02 +01:00
Viktor Lofgren
a89ed6fa9f
(search) Fix rendering on site overview, more dense serp layout on mobile
2024-12-09 14:45:45 +01:00
Viktor Lofgren
e0c0ed27bc
(keyword-extraction) Clean up code and add tests for position and spans calculation
...
This code has been a bit of a mess and historically significantly flaky, so some test coverage is more than overdue.
2024-12-08 14:14:52 +01:00
Viktor Lofgren
20abb91657
(loader) Correct DocumentLoaderService to properly do bulk inserts
...
Fixes issue #128
2024-12-08 13:12:52 +01:00
Viktor Lofgren
291ca8daf1
(converter/index) Improve atag sentence matching by taking into consideration how many times a sentence appears in the links
...
This change breaks the format of the atags.parquet file.
2024-12-08 00:27:11 +01:00
Viktor Lofgren
8d168be138
(search) Typeahead search, etc.
2024-12-07 15:47:01 +01:00