Mirror of https://github.com/MarginaliaSearch/MarginaliaSearch.git, synced 2025-02-24 21:29:00 +00:00
fb673de370
MarginaliaSearch/code/processes/crawling-process/model/java/nu/marginalia
Viktor Lofgren
c8b0a32c0f
(crawler) Reduce long retention of CrawlDataReference objects and their associated SerializableCrawlDataStreams
2025-01-26 15:40:17 +01:00
io
(converter) Refactor to remove CrawledDomainReader and move its functionality into SerializableCrawlDataStream
2025-01-26 14:46:50 +01:00
model
(converter) Add truncation at the parser step to prevent the converter from spending too much time on excessively large documents
2025-01-26 14:28:53 +01:00
parquet/crawldata
(crawler) Migrate away from using OkHttp in the crawler, use Java's HttpClient instead.
2025-01-19 15:07:11 +01:00
slop
(crawler) Reduce long retention of CrawlDataReference objects and their associated SerializableCrawlDataStreams
2025-01-26 15:40:17 +01:00
ContentTypes.java
(crawler) Reintroduce content type probing and clean out bad content type data from the existing crawl sets
2024-12-11 17:01:52 +01:00