MarginaliaSearch/code/processes/converting-process/java/nu/marginalia/converting/sideload
Viktor Lofgren 3714104976 Add loader for slop data in converter.
Also alter CrawledDocument to not require String parsing of the underlying byte[] data. This should reduce the number of large memory allocations quite significantly, hopefully reducing the GC churn a bit.
2024-12-17 15:40:24 +01:00
dirtree (chore) Remove lombok 2024-11-11 21:14:38 +01:00
encyclopedia (encyclopedia-sideloader) Add test suite and clean up urlencoding logic 2024-11-26 13:34:15 +01:00
reddit (converter) Refactor sideloaders to improve feature handling and keyword logic 2024-12-11 16:01:38 +01:00
stackexchange (sideload) Add LSH generation for sideloaded StackExchange data 2024-12-13 02:10:52 +01:00
warc (chore) Remove lombok 2024-11-11 21:14:38 +01:00
SideloaderProcessing.java Add loader for slop data in converter. 2024-12-17 15:40:24 +01:00
SideloadSource.java (refac) Remove src/main from all source code paths. 2024-02-23 16:13:40 +01:00
SideloadSourceFactory.java (crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets 2024-12-11 17:01:52 +01:00