The first change restores parallel index construction, which is how it previously worked; it had been switched to run sequentially to see how that would affect performance. Performance got worse, so the change is reverted.
However, it's been noted that sorting in parallel is likely not a good idea, as it leads to a lot of I/O thrashing, so sorting is changed to run sequentially.
This lets the slop library be stand-alone without dependence on coded-sequence.
The change also gets rid of the vestigial seek() method in ColumnReader.
The most common error when dealing with Slop columns is that they can fall out of sync with each other if the programmer accidentally does a conditional read and forgets to skip.
The second most common error is forgetting to close one of the columns in a reader or writer.
To deal with both cases, a new class SlopTable is added that keeps track of the lifecycle of all slop columns and checks, when they are closed, that they are in sync.
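To illustrate, here is a minimal sketch of both the failure mode and the kind of close-time check SlopTable performs. The column API below is hypothetical and heavily simplified; it is not the library's actual signatures.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified column API for illustration only.
interface Column extends AutoCloseable {
    long position();              // current row index
    void close() throws IOException;
}

// The failure mode: a conditional read that advances one column but
// not the other leaves the readers on different rows, e.g.
//
//   if (flags.get() != 0)
//       values.get();            // read only sometimes...
//   // BUG: no values.skip() in the other branch; from here on every
//   // flag is paired with the value of some earlier row.
//
// A SlopTable-style lifecycle check catches this at close time: all
// registered columns must sit at the same row index.
class Table implements AutoCloseable {
    private final List<Column> columns = new ArrayList<>();

    <T extends Column> T register(T column) {
        columns.add(column);
        return column;
    }

    @Override
    public void close() throws IOException {
        long expected = columns.isEmpty() ? 0 : columns.get(0).position();
        for (Column col : columns) {
            if (col.position() != expected)
                throw new IllegalStateException("Slop columns out of sync");
            col.close();
        }
    }
}
```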
Refactoring keyword extraction to also extract span information.
Modifying the intermediate storage of converted data to use the new slop library, which allows for easier storage of ad-hoc binary data like spans and positions.
This is a bit of a Katamari Damacy commit that ended up dragging along a bunch of other fairly tangentially related changes that are hard to break out into separate commits after the fact. Will push as-is to get back to being able to do more isolated work.
Expected behavior changed since the ranking algorithm now takes into account the number of positions of the keyword, and the test loader was previously modified to generate positions based on prime factors of the document id.
Fix a rare bug where the takeWhileZero method would fail to repopulate the underlying buffer. This caused intermittent decompression errors if takeWhileZero happened at a 64-bit boundary while the underlying buffer was empty.
The change also alters how sequence-lengths are encoded, to more consistently use the getGamma method instead of adding special significance to a zero first byte.
Finally, assertions are added checking the invariants of the gamma and delta coding logic, as well as UrlIdCodec, to detect issues earlier.
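For context, Elias gamma codes a positive integer as a run of zeros whose length gives the width of the payload that follows; counting that run is exactly what takeWhileZero does. A toy version over a BitSet (the real coder works on 64-bit words, which is where the boundary bug lived):

```java
import java.util.BitSet;

class EliasGamma {
    // Append the gamma code for n >= 1 at bit position pos; returns the new position.
    static int write(BitSet out, int pos, long n) {
        int k = 63 - Long.numberOfLeadingZeros(n);   // floor(log2 n)
        pos += k;                                    // k leading zeros (BitSet bits default to 0)
        for (int i = k; i >= 0; i--)                 // k+1 payload bits, msb (always 1) first
            out.set(pos++, ((n >>> i) & 1) != 0);
        return pos;
    }

    // Decode one value starting at pos[0], advancing pos[0] past it.
    static long read(BitSet in, int[] pos) {
        int k = 0;
        while (!in.get(pos[0])) { k++; pos[0]++; }   // the takeWhileZero step
        long n = 0;
        for (int i = 0; i <= k; i++)
            n = (n << 1) | (in.get(pos[0]++) ? 1 : 0);
        return n;
    }
}
```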
The change also restructures the internal API a bit, moving resultsFromDomain from RpcRawResultItem into RpcDecoratedResultItem, as the previous arrangement was driving complexity in the code that generates these objects, and the consumer side puts all this data in the same object regardless.
Adding new ranking parameters to the API and routing them through the system, in order to permit integration of the new position data with the ranking algorithm.
The change also cleans out several parameters that no longer served any function.
The priority index documents file can be trivially compressed to a large degree.
Compression schema:
```
00b -> diff docord (E gamma)
01b -> diff domainid (E delta) + (1 + docord) (E delta)
10b -> rank (E gamma) + domainid,docord (raw)
11b -> 30 bit size header, followed by 1 raw doc id (61 bits)
```
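Read as a decoder, the scheme might look roughly like the sketch below. The BitReader interface, the raw field widths, and the final id packing are all assumptions for illustration; only the tag layout comes from the schema above.

```java
// Illustrative decoder shape; BitReader is a stand-in, not the project's
// actual reader.
interface BitReader {
    long get(int bits);   // next n raw bits
    long getGamma();      // one Elias gamma-coded value
    long getDelta();      // one Elias delta-coded value
}

class PrioDocIdsDecoder {
    long rank, domainId, docOrd;

    long next(BitReader r) {
        switch ((int) r.get(2)) {                 // 2-bit tag
            case 0b00 -> docOrd += r.getGamma();  // same domain: doc ordinal diff
            case 0b01 -> {                        // same rank: new domain
                domainId += r.getDelta();
                docOrd = r.getDelta() - 1;        // stored as 1 + docord
            }
            case 0b10 -> {                        // new rank, raw ids
                rank = r.getGamma();
                domainId = r.get(30);             // assumed raw widths
                docOrd = r.get(26);
            }
            case 0b11 -> {                        // size header + raw doc id
                r.get(30);
                return r.get(61);
            }
        }
        return (rank << 56) | (domainId << 26) | docOrd;  // hypothetical 5+30+26 bit packing
    }
}
```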
Previously this was the responsibility of the caller, which led to the possibility of passing in improperly prepared buffers and receiving bad outcomes.
The btree index adds overhead and disk space usage, and doesn't serve any function for the prio index.
* Update finalize logic with a new IO transformer that copies the data and prepends a size (sketched below)
* Update the reader to read the new format
* Add a test
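A sketch of what the size-prepending copy could look like with plain NIO channels; the project's actual IO transformer interface isn't shown here, and the 8-byte header is an assumption.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class SizePrependingCopy {
    // Copy src into dest, prefixed with the payload size so the reader
    // knows how much compressed data follows.
    static void copy(FileChannel src, FileChannel dest) throws IOException {
        ByteBuffer header = ByteBuffer.allocate(8);
        header.putLong(src.size()).flip();
        while (header.hasRemaining())
            dest.write(header);
        long pos = 0, size = src.size();
        while (pos < size)                        // transferTo may copy partially
            pos += src.transferTo(pos, size - pos, dest);
    }
}
```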
The term data iterator is quite hot and was performing buffer slice operations that were not necessary. These are replaced with a fixed pointer alias that can be repositioned over the relevant data.
The positions data was also being wrapped in a GammaCodedSequence only to be immediately unwrapped. This unnecessary step is removed in favor of copying the buffer directly.
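The idea, sketched with a plain ByteBuffer (the actual iterator works against the index's own buffer types): slicing allocates a new buffer object per term, while a single duplicated view can be repositioned for free.

```java
import java.nio.ByteBuffer;

class TermDataCursor {
    private final ByteBuffer alias;   // one reusable view over the backing data

    TermDataCursor(ByteBuffer data) {
        this.alias = data.duplicate();
    }

    // Reposition the alias over [offset, offset + len); no allocation,
    // unlike data.slice(offset, len), which creates a new buffer per call.
    ByteBuffer at(int offset, int len) {
        alias.limit(offset + len).position(offset);
        return alias;
    }
}
```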
IntArray gets the YAGNI axe. The array library had two implementations: one for longs, which was used, and one for ints, which only ever saw bit rot. Removing the latter, as all it ever did was clutter up the codebase and add technical debt. If we need int arrays, we fork LongArray again (or add int capabilities to it).
Also cleaning up the interfaces, removing layers of redundant abstractions and adding javadocs.
Finally adding sz=2 specializations to the quicksort and insertion sort algorithms. It seems the JIT isn't optimizing these particularly well; this is an attempt to help it out a bit.
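The shape of such a specialization, as a sketch: for two elements, a single compare-and-swap replaces the general loop, which gives the JIT something trivial to compile.

```java
class Sort {
    // Sorts a[lo, hi) ascending.
    static void insertionSort(long[] a, int lo, int hi) {
        int n = hi - lo;
        if (n <= 1)
            return;
        if (n == 2) {                         // sz=2 specialization
            if (a[lo] > a[lo + 1]) {
                long tmp = a[lo];
                a[lo] = a[lo + 1];
                a[lo + 1] = tmp;
            }
            return;
        }
        for (int i = lo + 1; i < hi; i++) {   // general case
            long v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }
}
```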
Roll back to JDK 21 for now, and make Java version configurable in the root build.gradle
The project has run into no less than three distinct show-stopping bugs in JDK 22, across multiple vendors, and Gradle still doesn't fully support it, meaning you need multiple JDK versions installed.
Since the performance fix in 3359f72239 had a huge positive impact without reducing result quality, it's possible to remove the QueryBranchWalker and associated code.
This functionality fell into disrepair some while ago. It's supposed to allow non-mandatory search terms that boost the ranking if they are present in the document.
The change set cleans up the data model for the term-level data, which used to contain a bunch of fields with document-level metadata. This data duplication meant a larger memory footprint and worse memory locality.
The ranking code is also modified to not accept SearchResultKeywordScores, but rather CompiledQueryLong and CqDataInts containing only the term metadata and the frequency information needed for ranking. This is again an effort to improve memory locality.
We no longer break the query into "sets" of search terms and need to adapt the code to not use this assumption.
For the API service, we'll simulate the old behavior to keep the API stable.
For the search service, we'll introduce a new way of calculating positions through tree aggregation.
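A guess at what tree aggregation of positions could look like: AND nodes intersect their children's position sets, OR nodes union them. The node types and the BitSet representation are illustrative stand-ins, not the project's actual classes.

```java
import java.util.BitSet;
import java.util.List;

sealed interface QueryNode permits Term, And, Or {}
record Term(BitSet positions) implements QueryNode {}  // positions of one search term
record And(List<QueryNode> children) implements QueryNode {}
record Or(List<QueryNode> children) implements QueryNode {}

class PositionAggregator {
    // Fold position sets up the query tree: intersect across AND,
    // union across OR.
    static BitSet aggregate(QueryNode node) {
        return switch (node) {
            case Term t -> (BitSet) t.positions().clone();
            case And a -> {
                BitSet acc = aggregate(a.children().get(0));
                for (int i = 1; i < a.children().size(); i++)
                    acc.and(aggregate(a.children().get(i)));
                yield acc;
            }
            case Or o -> {
                BitSet acc = new BitSet();
                for (QueryNode c : o.children())
                    acc.or(aggregate(c));
                yield acc;
            }
        };
    }
}
```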
Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.
Netty and gRPC by default spawn an incredible number of threads on high-core-count CPUs, which amounts to a fair bit of RAM usage.
Add custom executors that throttle this behavior.
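One way to do that with grpc-netty's public builder API; the thread counts below are arbitrary examples, not the values actually used.

```java
import io.grpc.Server;
import io.grpc.netty.NettyServerBuilder;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;

import java.util.concurrent.Executors;

class GrpcServerFactory {
    static Server create(int port) {
        return NettyServerBuilder.forPort(port)
                // cap the application executor (default is an unbounded cached pool)
                .executor(Executors.newFixedThreadPool(16))
                .bossEventLoopGroup(new NioEventLoopGroup(1))
                // cap Netty workers (default is 2 * available processors)
                .workerEventLoopGroup(new NioEventLoopGroup(4))
                .channelType(NioServerSocketChannel.class)  // needed with custom groups
                .build();
    }
}
```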
Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one.
While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's *modul*ar. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.