MarginaliaSearch/code/common/db
src (warc) Toggle for saving WARC data 2024-01-12 13:45:14 +01:00
build.gradle (db) Fix migrations, bump flyway to 10.0.1 2023-11-21 20:04:35 +01:00
readme.md (doc) Remove confusingly outdated ER-diagrams 2023-09-21 15:08:27 +02:00

DB

This module primarily contains SQL files for the URLs database. The most central tables are EC_DOMAIN, EC_URL and EC_PAGE_DATA.

Flyway

The system uses Flyway to track database changes and allow easy migrations. It is accessible via these Gradle tasks:

  • flywayMigrate
  • flywayBaseline
  • flywayRepair
  • flywayClean (dangerous: it wipes your entire database)

Refer to the Flyway documentation for guidance. It's well documented and these are probably the only four tasks you'll ever need.

If you are not running the system via Docker, you need to provide connection details other than the defaults (TODO: how?).

The migration files are in resources/db/migration. The file names incorporate the project's cal-ver versioning, and the migrations are applied in lexicographical order of file name.

VYY_MM_v_nnn__description.sql
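
To make the naming convention concrete, here is a minimal Java sketch that checks file names against the pattern and sorts them the way they would be applied. It is not part of this module. The class name MigrationNames, the example file names, and the exact regex are assumptions made for illustration. In particular, it reads YY and MM as two digits each, v as one or more digits, and nnn as exactly three digits. The sketch only mirrors the lexicographical ordering described above and does not call Flyway.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class MigrationNames {
    // Assumed reading of VYY_MM_v_nnn__description.sql:
    // two-digit year, two-digit month, a numeric version part,
    // a three-digit sequence number, then a description.
    private static final Pattern NAME =
            Pattern.compile("V\\d{2}_\\d{2}_\\d+_\\d{3}__[A-Za-z0-9_]+\\.sql");

    // True if the file name follows the assumed convention.
    public static boolean isValid(String fileName) {
        return NAME.matcher(fileName).matches();
    }

    // Keeps only the valid names and sorts them with plain String ordering.
    public static List<String> applyOrder(List<String> fileNames) {
        List<String> valid = new ArrayList<>();
        for (String name : fileNames) {
            if (isValid(name)) {
                valid.add(name);
            }
        }
        Collections.sort(valid);
        return valid;
    }

    public static void main(String[] args) {
        // Hypothetical file names, not taken from the repository.
        List<String> names = List.of(
                "V24_01_0_001__add_index.sql",
                "V23_11_0_002__add_column.sql",
                "V23_11_0_001__create_table.sql",
                "notes.txt");
        for (String name : applyOrder(names)) {
            System.out.println(name);
        }
    }
}
```

Because every numeric field is zero-padded to a fixed width in these examples, plain string ordering matches chronological ordering. A name with an unpadded month or sequence number would sort out of place.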

Central Paths

See Also

  • common/service implements DatabaseModule, which is where the services get their database connections.