mirror of
https://github.com/MarginaliaSearch/MarginaliaSearch.git
synced 2025-02-24 05:18:58 +00:00
Viktor Lofgren
This commit also refactors the executor a bit and introduces a new converter feature, called data-extractors, for this class of jobs.
Contains converter-like extraction jobs that operate on crawled data to produce export files.
Important classes
- AtagExporter - extracts anchor texts from the crawled data.
- FeedExporter - tries to find RSS/Atom feeds within the crawled data.
- TermFrequencyExporter - exports term frequencies (the 'TF' part of TF-IDF).
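
To make the term-frequency job above more concrete, here is a minimal sketch of what counting term frequencies over a set of documents can look like. It is not the actual TermFrequencyExporter code: the class name `TermFrequencySketch`, the method `countTerms`, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of the MarginaliaSearch code base.
final class TermFrequencySketch {

    /** Counts how many times each term occurs across all given documents. */
    static Map<String, Integer> countTerms(List<String> documents) {
        // TreeMap keeps the output sorted by term, which makes export files diffable.
        Map<String, Integer> counts = new TreeMap<>();

        for (String doc : documents) {
            // Assumed tokenization: lowercase, then split on runs of
            // characters that are neither letters nor digits.
            String[] terms = doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");

            for (String term : terms) {
                if (term.isEmpty()) {
                    continue; // split() yields an empty token when the text starts with a delimiter
                }
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Calling `TermFrequencySketch.countTerms(List.of("The quick fox", "the lazy dog, the end"))` returns a map with six distinct terms, in which "the" is counted three times. A real exporter would read from crawled data and write the counts to an export file rather than take strings in memory.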