(control) Add warnings about domain data contamination

Viktor Lofgren 2024-01-25 18:26:15 +01:00
parent 0b105b5986
commit 182c0cf28e
3 changed files with 13 additions and 1 deletion

@@ -1,8 +1,15 @@
 <h1 class="my-3">Download Sample Data</h1>
 <div class="my-3 p-3 border bg-light">
-This will download sample crawl data from <a href="https://downloads.marginalia.nu">downloads.marginalia.nu</a> onto Node {{node.id}}.
+<p>This will download sample crawl data from <a href="https://downloads.marginalia.nu">downloads.marginalia.nu</a> onto Node {{node.id}}.
 This is a sample of real crawl data. It is intended for demo, testing and development purposes. Several sets are available.
+</p>
+<p>
+<span class="text-danger">Warning</span> While processing the sample data, the domains associated with it will be loaded
+into the domain database. This means that if you run the re-crawl action on this machine, regardless of which crawl data
+is specified, the domains in the sample data will be crawled!
+</p>
 </div>
 <form method="post" action="actions/download-sample-data">

@@ -6,6 +6,9 @@
 If you are just looking to test the software, feel free to use <a href="https://downloads.marginalia.nu/domain-list-test.txt">this
 short list of marginalia-related websites</a>, that are safe to crawl repeatedly without causing any problems.
 </p>
+<p><span class="text-danger">Warning</span> Ensure <a href="?view=download-sample-data">downloaded sample data</a> has not been loaded onto this instance
+before performing this action, otherwise those domains will also be crawled while re-crawling in the future!</p>
 </div>
 <form method="post" action="actions/new-crawl-specs">

@@ -18,6 +18,8 @@
 crawl spec. If the document has changed, it will be re-crawled. If it has not changed, it will be skipped,
 and the previous data will be retained. This is both faster and easier on the target server.
 </p>
+<p><span class="text-danger">Warning</span> Ensure <a href="?view=download-sample-data">downloaded sample data</a>
+has not been loaded onto this instance before performing this action, otherwise those domains will also be crawled!</p>
 </div>
 <form method="post" action="actions/recrawl">