MarginaliaSearch

mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git synced 2025-02-24 13:19:02 +00:00

Author	SHA1	Message	Date
Jaseem Abid	0dd14a4bd0	Specify C++ standard in build command. The default C++ language standard on macOS is gnu++98, which won't build this module. Full error: ``` > Task :code:libraries:array:cpp:compileCpp FAILED src/main/cpp/cpphelpers.cpp:28:5: error: expected expression [](const p64x2& fst, const p64x2& snd) { ^ ```	2024-06-12 12:47:10 +01:00
Jaseem Abid	9974b31a09	Don't track build files(libcpp.so) with git	2024-06-12 12:45:49 +01:00
Viktor Lofgren	a07cf1ba93	(array/cpp) Update gitignore to properly exclude libcpp.so	2024-06-06 13:06:08 +02:00
Sam Storment	e2f68d9ccf	Add a theme select to the header that lets users toggle their theme independent of their OS theme	2024-06-02 21:02:52 -05:00
Viktor Lofgren	ab4e2b222e	(array) Fix broken benchmarks	2024-05-18 13:41:24 +02:00
Viktor Lofgren	19163fa883	(array) Clean up the Array library. IntArray gets the YAGNI axe. The array library had two implementations, one for longs which was used, and one for ints, which only ever saw bit rot. Removing the latter, as all it ever did was clutter up the codebase and add technical debt. If we need int arrays, we fork LongArray again (or add int capabilities to it). Also cleaning up the interfaces, removing layers of redundant abstractions and adding javadocs. Finally adding sz=2 specializations to the quick- and insertion sort algorithms. It seems the JIT isn't optimizing these particularly well; this is an attempt to help it out a bit.	2024-05-18 13:23:06 +02:00
Viktor Lofgren	650f3843bb	(array) Clean up search function jungle. Retire search functions that weren't used, including the native implementations. Drop confusing suffixes on search function names. Search functions no longer encode search misses as negative values. Replaced binary search function with a branchless version that is much faster. Cleaned up benchmark code.	2024-05-17 14:31:02 +02:00
Viktor Lofgren	9e766bc056	(array) Clean up search function jungle. Retire search functions that weren't used, including the native implementations. Drop confusing suffixes on search function names. Search functions no longer encode search misses as negative values. Replaced binary search function with a branchless version that is much faster. Cleaned up benchmark code.	2024-05-17 14:30:06 +02:00
Viktor Lofgren	48aff52e00	(array) Increase LongArray on-heap alignment to 16 bytes. This primarily affects benchmarks, making performance more consistent for the 128 bit operations, as the system mostly works with memory mapped data.	2024-05-16 19:12:36 +02:00
Viktor Lofgren	9d7616317e	(array) Clean up native code a bit	2024-05-16 14:47:10 +02:00
Viktor Lofgren	f48cf77c4d	(array, experimental) Add benchmark results for quicksort	2024-05-14 18:15:30 +02:00
Viktor Lofgren	3549be216f	(array, experimental) Documentation for native algos	2024-05-14 17:43:05 +02:00
Viktor Lofgren	55a7c1db00	(array, experimental) Call C++ helper methods to do some low level stuff a bit faster than is possible with Java	2024-05-14 12:54:14 +02:00
Viktor Lofgren	4668b1ddcb	(build) Java 22 and its consequences has been a disaster for Marginalia Search. Roll back to JDK 21 for now, and make Java version configurable in the root build.gradle. The project has run into no less than three distinct show-stopping bugs in JDK22, across multiple vendors, and gradle still doesn't fully support it, meaning you need multiple JDK versions installed.	2024-04-24 13:54:04 +02:00
Viktor Lofgren	deaba0152d	(index) Explicitly free LongQueryBuffers	2024-04-16 19:23:00 +02:00
Viktor Lofgren	ae7c760772	(index) Clean up new index query code	2024-04-05 13:30:49 +02:00
Viktor Lofgren	81815f3e0a	(qs, index) New query model integrated with index service. Seems to work, tests are green and initial testing finds no errors. Still a bit untested, committing WIP as-is because it would suck to lose weeks of work due to a drive failure or something.	2024-04-04 20:17:58 +02:00
Viktor Lofgren	002afca1c5	(sys) Upgrade to JDK22. This also entails upgrading JIB to 3.4.1 and Lombok to 1.18.32.	2024-03-21 14:33:27 +01:00
Viktor Lofgren	e696fd9e92	(docs) Begin un-fucking the docs after refactoring	2024-02-27 21:22:21 +01:00
Viktor Lofgren	67aa20ea2c	(array) Attempting to debug strange errors	2024-02-27 21:22:18 +01:00
Viktor Lofgren	1d34224416	(refac) Remove src/main from all source code paths. Look, this will make the git history look funny, but trimming unnecessary depth from the source tree is a very necessary sanity-preserving measure when dealing with a super-modularized codebase like this one. While it makes the project configuration a bit less conventional, it will save you several clicks every time you jump between modules. Which you'll do a lot, because it's modular. The src/main/java convention makes a lot of sense for a non-modular project though. This ain't that.	2024-02-23 16:13:40 +01:00
Viktor Lofgren	0307c55f9f	(refac) Zookeeper for service-discovery, kill service-client lib (WIP). To avoid having to either hard-code or manually configure service addresses (possibly several dozen), and to reduce the project's dependency on docker to deal with routing and discovery, the option to use [Zookeeper](https://zookeeper.apache.org/) to manage services and discovery has been added. A service registry interface was added, with a Zookeeper implementation and a basic implementation that only works on docker and hard-codes everything. The last remaining REST service, the assistant-service, has been migrated to gRPC. This also proved a good time to clear out primordial technical debt from the root of the codebase. The 'service-client' library has been taken behind the barn and given a last farewell. It's replaced by a small library for managing gRPC channels. Since it's no longer used by anything, RxJava has been removed as a dependency from the project. Although the current state seems reasonably stable, this is a work-in-progress commit.	2024-02-20 11:41:14 +01:00
Viktor Lofgren	300b1a1b84	(index-query) Add some tests for the QueryFilter code	2024-02-15 12:03:30 +01:00
Viktor Lofgren	6c3b49417f	(index-query) Improve documentation and code quality	2024-02-15 11:33:50 +01:00
Viktor Lofgren	95d1bd98e4	(array) Update documentation, make unsafe configurable. The readme for the array library was extremely out of date. Updating it with accurate information about how the library works, and a demo that should compile. Also added a system property for disabling the use of sun.misc.Unsafe.	2024-02-07 12:26:47 +01:00
Viktor Lofgren	d986f90074	(index) Fix consistency between RandomFileAssembler implementations. The RandomFileAssembler implementations, introduced in commit `53c575db3f`, were all acting subtly differently. The RWF implementation wrote BigEndian longs instead of the native endianness used by the other implementations (and expected by the index construction code); further, the mmap implementation exposed a bug in LongArray.write() that caused it to create a larger file than necessary. A test was built to ensure the output of these implementations is equivalent.	2024-02-05 21:01:32 +01:00
Viktor Lofgren	400f4840ad	(*) Fix broken code in jmh	2024-01-23 17:08:21 +01:00
Viktor Lofgren	1eb0adf6d3	(array) Add sun.misc.Unsafe variant of LongArray	2024-01-22 13:38:42 +01:00
Viktor Lofgren	6a1bfd6270	(array) Remove unused 'madvise' code and 3rd party dependency on 'uppend'. This wasn't actually hooked in anywhere. Removing the dependency and code. If it turns out we need madvise in the future, we'll re-introduce it.	2024-01-22 13:01:57 +01:00
Viktor Lofgren	7c6e18f7a7	(*) Overhaul settings and properties. Use a system.properties file to configure the system. This is loaded statically by MainClass or ProcessMainClass. Update the property names to be more consistent, and update the documentation to reflect the changes.	2024-01-13 17:12:18 +01:00
Viktor Lofgren	f613f4f2df	(array) Fix spurious search results. This was caused by a bug in the binary search algorithm causing it to sometimes return positive values when encoding a search miss. It was also necessary to get rid of the vestiges of the old LongArray and IntArray classes to make this fix doable.	2023-10-26 15:27:02 +02:00
Viktor	8e1abc3f10	(index-reverse) Parallel construction of the reverse indexes. (#52) * (index-reverse) Parallel construction of the reverse indexes. * (array) Remove wasteful calculation of numDistinct before merging two sorted arrays. * (index-reverse) Force changes to disk on close, reduce logging. * (index-reverse) Clean up merging process and add back logging * (run) Add a conservative default for INDEX_CONSTRUCTION_PROCESS_OPTS's parallelism as it eats a lot of RAM * (index-reverse) Better logging during processing * (array) 2GB+ compatible write() function * (array) 2GB+ compatible write() function * (index-reverse) We are logging like Bolsonaro and I will not have it. * (reverse-index) Self-diagnostics * (btree) Fix bug in btree reader to do with large data sizes	2023-10-07 10:00:00 +02:00
Viktor Lofgren	f6e9ef6de9	(array) Fix transferFrom() so it survives larger than 2 GB transfers	2023-10-04 13:57:36 +02:00
Viktor Lofgren	40768e935b	(test) Removing /tmp-guardrails as it doesn't hold in CI	2023-10-02 16:52:59 +02:00
Viktor Lofgren	cd12f49fc0	(long-array) Return slices SegmentLongArray of itself for range() &c	2023-09-24 11:31:54 +02:00
Viktor Lofgren	d0aa754252	(long-array) Implement java.lang.foreign.Arena based lifecycle control for LongArray. Further de-ByteBuffer:ing of these classes is to be done, but this is the smallest most urgently needed benefit. This commit is a WIP but in a fully working state, pushing due to the importance of the changes to offer lifecycle control over mmaps.	2023-09-24 10:40:06 +02:00
Viktor Lofgren	dbe9235f3a	(*) Upgrade to JDK21 with preview enabled. ... also move some common configuration into the root build.gradle-file. Support for JDK21 in lombok is a bit sketchy at the moment, but it seems to work. This upgrade is kind of important as the new index construction really benefits from Arena based lifecycle control over off-heap memory.	2023-09-24 10:38:59 +02:00
Viktor Lofgren	f74b9df0a7	(array) Don't use paging arrays when mapping small files for writing	2023-08-31 20:15:10 +02:00
Viktor Lofgren	f321fa5ad3	(array) Override to Paging...Array$range(). This is a big performance boost in array.range().get(). Without an override, each access will go through pages[page].get(...) for each get()-operation. This adds up very quickly. BTreeReader does a bunch of get():s on a range()'d array during traversal in the queryData... methods.	2023-08-31 13:52:29 +02:00
Viktor Lofgren	3101b74580	(index) Move to a lexicon-free index design. This is a system-wide change. The index used to have a lexicon, mapping words to wordIds using a large in-memory hash table. This made index-construction easier, but it also added a fairly significant RAM penalty to both the index service and the loader. The new design moves to 64 bit word identifiers calculated using the murmur hash of the keyword, and an index construction based on merging smaller indices. It also became necessary half-way through to upgrade guice as its error reporting wasn't quite compatible with JDK20.	2023-08-28 14:02:23 +02:00
Viktor Lofgren	aa0d256d6a	Upgrade code to Java 20. * Change language version * Upgrade Lombok to a JDK20 compatible version	2023-08-23 13:37:49 +00:00
Viktor Lofgren	4e9e79454f	Fix broken transformation functions in the PagingArray classes.	2023-05-28 13:31:05 +02:00
Viktor Lofgren	b0bc07b4e7	Insertion sort was super busted. I don't even know how it worked	2023-05-28 12:17:50 +02:00
Viktor Lofgren	6814c90625	Fix N-width sorting bug	2023-05-28 11:57:06 +02:00
Viktor	96bac70b85	Tools for merging sorted lists, and merging btrees. (#14) * Utilities for merging BTrees of entity size 1 and 2. * Isolate and clean up sorting algorithms. * Functions for keeping distinct items in a LongArray	2023-04-20 15:28:09 +02:00
Viktor	1b9ae7b42d	Update readme.md	2023-03-21 16:38:39 +01:00
Viktor Lofgren	1bb1248ab0	Optimize array library, jmh benchmarks.	2023-03-21 16:02:31 +01:00
Viktor Lofgren	616effdb3c	The refactoring will continue until morale improves.	2023-03-12 10:04:48 +01:00
Viktor Lofgren	ad1be7c835	Move all code to a code directory.	2023-03-07 17:14:32 +01:00

49 Commits
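Commits 650f3843bb and 9e766bc056 replace the binary search with a branchless version and stop encoding misses as negative values. The Java sketch below illustrates that general idea. It rests on several assumptions. The class and method names are hypothetical. It searches a plain `long[]` instead of the library's `LongArray`, and it uses a textbook lower-bound formulation, not the code from the commit. A miss returns the insertion point, which is never negative, and membership is checked separately.

```java
// Hypothetical sketch, not the MarginaliaSearch implementation: it uses a plain
// long[] with int offsets rather than the library's LongArray.
public final class BranchlessSearchSketch {

    /**
     * Returns the index of the first element in sorted[from, to) that is >= key,
     * or `to` if there is none. A miss is an insertion point, never a negative value.
     */
    public static int lowerBound(long[] sorted, int from, int to, long key) {
        int n = to - from;
        if (n <= 0) {
            return from;
        }
        int base = from;
        // Invariant: the answer lies in [base, base + n].
        while (n > 1) {
            int half = n >>> 1;
            // A value select instead of an if/else around the loop body; the JIT
            // may lower this to a conditional move, but that is not guaranteed.
            base = (sorted[base + half] < key) ? base + half : base;
            n -= half;
        }
        return base + (sorted[base] < key ? 1 : 0);
    }

    /** True if key occurs in sorted[from, to). */
    public static boolean contains(long[] sorted, int from, int to, long key) {
        int idx = lowerBound(sorted, from, to, key);
        return idx < to && sorted[idx] == key;
    }

    public static void main(String[] args) {
        long[] data = {1, 3, 5, 7};
        System.out.println(lowerBound(data, 0, data.length, 5)); // 2
        System.out.println(lowerBound(data, 0, data.length, 4)); // 2
        System.out.println(lowerBound(data, 0, data.length, 8)); // 4
        System.out.println(contains(data, 0, data.length, 4));   // false
    }
}
```

Each step halves the candidate range with the same amount of work, whichever way the comparison goes, so there is no data-dependent branch for the CPU to mispredict. This sketch makes no claim about its actual speed.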
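Commit 19163fa883 adds "sz=2 specializations" to the sort algorithms. Commit 96bac70b85 mentions entity sizes 1 and 2. The sketch below shows what a width-2 insertion sort could look like. It assumes that each entry is two consecutive longs in a flat array and that the first long is the sort key. Both assumptions are ours. The class and method names are also hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: entries of two longs (key, value) stored flat in a long[].
// The key-first layout is an assumption, not taken from the library.
public final class Width2InsertionSortSketch {

    /**
     * Sorts entries [fromEntry, toEntry) by key, where entry i occupies
     * a[2*i] (key) and a[2*i + 1] (value). Stable: equal keys keep their order.
     */
    public static void sortWidth2(long[] a, int fromEntry, int toEntry) {
        for (int i = fromEntry + 1; i < toEntry; i++) {
            long key = a[2 * i];
            long value = a[2 * i + 1];
            int j = i - 1;
            // Shift entries with a larger key one entry (two longs) to the right.
            while (j >= fromEntry && a[2 * j] > key) {
                a[2 * j + 2] = a[2 * j];
                a[2 * j + 3] = a[2 * j + 1];
                j--;
            }
            // Drop the saved entry into the gap left behind.
            a[2 * j + 2] = key;
            a[2 * j + 3] = value;
        }
    }

    public static void main(String[] args) {
        long[] pairs = {30, 300, 10, 100, 20, 200};
        sortWidth2(pairs, 0, 3);
        System.out.println(Arrays.toString(pairs)); // [10, 100, 20, 200, 30, 300]
    }
}
```

A generic version would copy each entry with an inner loop over the entry size. Fixing the size at 2 replaces that loop with two explicit assignments, which is the kind of simplification a specialization offers the JIT. Whether it actually helps is what the commit set out to try, and this sketch makes no performance claim.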