Done


Done 2022-01-30

- generate better default thumbnail on the fly (/)


Done 2022-01-19

- public API gateway (/)

Done 2022-01-16

- overhaul CSS of MEMEX (/)


Done 2022-01-15

- Improved random (/)
      INSERT INTO EC_RANDOM_DOMAINS 
            SELECT DISTINCT(EC_DOMAIN.ID) FROM EC_DOMAIN_NEIGHBORS A
            INNER JOIN EC_DOMAIN_NEIGHBORS B ON B.NEIGHBOR_ID=A.DOMAIN_ID
            INNER JOIN EC_DOMAIN_NEIGHBORS C ON C.NEIGHBOR_ID=B.DOMAIN_ID
            INNER JOIN EC_DOMAIN ON A.DOMAIN_ID=EC_DOMAIN.ID
            WHERE C.DOMAIN_ID IN (SELECT ID FROM EC_DOMAIN WHERE URL_PART IN (secret-sauce))
            AND EC_DOMAIN.STATE>=0;

Done 2022-01-14

- Dark Mode (/)
- Screengrabs by domain (/)
- Revise exploration mode (/)
- Improve keyboard navigation (/)

Done 2022-01-12

- Search redesign (/)
- Fixed dictionary corruption bug (/)

Done 2022-01-04

- Improve site:-query QOL (/)
- Fix byte folder bug (/)

- refactor EC_URL (/)
  ALTER TABLE EC_URL MODIFY COLUMN PROTO ENUM('http', 'https', 'gemini') NOT NULL;
-- put visit-metadata in separate table (/)


Done 2021-12-03


- fix bug in language detection (/)
-- re-fetching some pages (/)

Done 2021-12-02


- new approach for query rewriting (/)

Done 2021-11-14


- make site:-queries return a dummy entry when no site information is available (/)

Done 2021-11-11


- hybridized ordering of domains on reindex, F(previous rank, previous quality). (/)

- mark documents with audio, video, object tags (/)

Done 2021-11-10


- car service <2021-11-18> (/)


Done 2021-10-30

- Add auto redirects for guesswork rss/atom/feed-requests to /log/feed.xml (/)


Done 2021-10-29


- investigate extracting more keywords (/)
-- textrank (/)
-- sideload additional keywords for most popular sites (/)

Done 2021-10-12


- refactor index converter (/)
- clean up code garbage (/)

Done 2021-10-05

- trial more vanilla PageRank approach as a tertiary algorithm (/)

- fix a search result prioritization bug for mixed rankings (/)

    It is reportedly broken

- fix potential DoS where certain search queries with a large number of common but mutually exclusive terms would take forever to process. (/)
    test query: generic stores underground unusual

Done 2021-10-03


- prioritize n-gram matches over word matches (/)
- show informative error page when the index server reboots (/)

Done 2021-10-02


- Personalized Page Rank (/)
- Duelling Algorithms (/)
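
A minimal sketch of how personalized PageRank differs from the vanilla variant, assuming a plain power iteration over an adjacency dict. The function name, the damping factor of 0.85 and the iteration count are illustrative assumptions, not the search engine's ranking code.

```python
def pagerank(links, personalization=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [outgoing neighbours]}.

    personalization maps nodes to teleport weights; None means uniform
    weights, which gives the vanilla algorithm.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    if personalization is None:
        personalization = {n: 1.0 for n in nodes}
    total = sum(personalization.get(n, 0.0) for n in nodes)
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: hand its rank back via the teleport vector.
                for t in nodes:
                    new_rank[t] += damping * rank[n] * teleport[t]
        rank = new_rank
    return rank


# Hypothetical four-domain link graph.
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
vanilla = pagerank(links)
personal = pagerank(links, personalization={"d": 1.0})
```

Biasing the teleport vector toward a handful of seed nodes is the only change between the two runs; the link-following step is identical.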

Done 2021-10-30


- Launch October Update (/)

Done 2021-09-26

- fix broken search use-cases (/)
-- c language (/)
-- 67 chevy (/)
-- 68000 (/)
-- c# (/)
-- @twitterhandle (/)
-- #hashtag (/)

- trial tar based archiving to save the poor ext4 fs (/)

- use words to tag document format etc (/)

- dynamic re-bucketing based on something like (/)
    SELECT DEST.URL_PART, EXP(DEST.QUALITY)*SUM(EXP(SOURCE.QUALITY)) AS Q
    FROM EC_DOMAIN DEST
    INNER JOIN EC_DOMAIN_LINK ON DEST.ID=DEST_DOMAIN_ID
    INNER JOIN EC_DOMAIN SOURCE ON SOURCE.ID=SOURCE_DOMAIN_ID
    WHERE DEST.INDEXED>0
    GROUP BY DEST_DOMAIN_ID
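
The same score can be computed outside the database. A rough sketch, assuming the quality values and the domain links are already loaded into memory; the function and parameter names are made up for illustration.

```python
import math
from collections import defaultdict


def rebucket_scores(quality, indexed, links):
    """Score each indexed destination domain as exp(q_dest) * sum(exp(q_source)).

    quality: {domain: quality}, indexed: set of indexed domains,
    links: iterable of (source_domain, dest_domain) pairs.
    """
    incoming = defaultdict(float)
    for source, dest in links:
        if dest in indexed:
            incoming[dest] += math.exp(quality[source])
    # Domains with no incoming links are absent, like rows dropped by the inner join.
    return {dest: math.exp(quality[dest]) * total for dest, total in incoming.items()}
```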


Done 2021-09-19


- Fix several indexing bugs that hid relevant search results (/)

Done 2021-09-17


- Added search profiles (/)

Done 2021-09-16


- Rephrased an error message that some people took to mean they weren't speaking a proper language (/)

Done 2021-09-15


- Using in-site domain link-names to add search terms (/)
- Fixed buggy default content-type (/)
- Even more aggressive unicode language detection (/)

Done 2021-09-11

- Status flag for domains (/)
    Indexed, Active, Blocked
- Improve topic detection (/)

Done 2021-09-09


- Tuned search results to demote very short results (/)

Done 2021-09-08


- Encyclopedia tries harder to find the right article if the case match isn't exact (/)
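
One way such a fallback could work, as a sketch only: try the exact title first, then compare case-folded titles. The function name and the in-memory list of titles are assumptions; the real lookup is not described here.

```python
def find_article(title, articles):
    """Return the stored title matching the query, preferring an exact match."""
    if title in articles:
        return title
    folded = title.casefold()
    for name in articles:
        if name.casefold() == folded:
            return name
    return None
```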

Done 2021-09-06


- Breaking changes for next Index-rebuild (/)
-- Change writer bucket scaling to 1/4 (/)
-- Move protocol and port from EdgeDomain to EdgeURL (/)
-- Change database schemas to reflect this (/)
-- ISO-8859-1/UTF-8 charset sniffer (/)
-- Fixed a bug that would occasionally cause the crawler to re-index the same working set multiple times (/)
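
A minimal sketch of an ISO-8859-1/UTF-8 sniffer, assuming the simplest possible rule: if the bytes are valid UTF-8, call it UTF-8, otherwise fall back to ISO-8859-1. The function name is hypothetical and the crawler's actual heuristic may differ.

```python
def sniff_charset(data):
    """Guess between UTF-8 and ISO-8859-1 for a byte string."""
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so it works as the fallback.
        return "ISO-8859-1"
```

Pure ASCII input is valid in both encodings and is reported as UTF-8 here.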



Done 2021-09-02


- improve edge-director throughput (/)
- give edge-director state for semi-blocking tasks (/)

Done 2021-08-31


- optimize URL index size (/)

Done 2021-08-28


- clean up gemini navigation (/)
- Atom feed for HTTPS and Gemini (/)


Done 2021-08-27

- Feed gemini server with rendered gmi-content (/)
-- Output the content (/)
-- Generate feeds (/)
-- Switch over (/)


Done 2021-08-26

- Absorb gemini server into WMSA (/)

Done 2021-08-25

- wildcard domain for marginalia.nu (/)
-- move memex to memex-subdomain (/)

- feeds on FEED pragma (/)

Done 2021-08-24

- Top nav bar overhaul (/)

Done 2021-08-23

- add marker for which files are todo files (/)
    Added %%%/pragmas for toggling behavior
-- Added template helpers for consuming pragmas (/)
-- Used to improve topic pages (/)

- Fixes for git (/)

Done 2021-08-22

- File manager (/)
-- Delete (/)
-- Move/Rename (/)
--- System for tombstones/redirects (/)

- Edit for / does not work (/)
    Needed better support for non-normalized URLs, e.g. //index.gmi
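
A sketch of the kind of normalization that makes //index.gmi and /index.gmi refer to the same document. The helper name is an assumption, and this only collapses repeated slashes; it does not resolve . or .. segments.

```python
import re


def normalize_path(path):
    """Collapse repeated slashes and make sure the path is rooted at /."""
    collapsed = re.sub(r"/+", "/", path)
    return collapsed if collapsed.startswith("/") else "/" + collapsed
```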

- Backlinks for index (/)


Done 2021-08-21

- Git Integration (/)
-- Use commit hooks to trigger pull (/)
https://git-scm.com/book/uz/v2/Appendix-B%3A-Embedding-Git-in-your-Applications-JGit

- Recursive directory watch (/)

- Two column layout (/)

Done 2021-08-20

- Overhaul MEMEX navigation (/)
-- Navigation bar (/)
-- Editing (/)
--- Add update-root link (/)

- Tombstones aren't generated properly on-delete (/)
  The tombstone db wasn't properly
  reloaded after being updated.

- Just write static files to disk instead of using an intermediary backend server. (/)
-- Use alias directive to set different root for memex path. (/)
-- Content-type is finicky (/)
  I want to serve html-wrapped .gmi and .html 
    location ~* \.(gmi|png)$ {
        types {
            text/html gmi;
            text/html png;
        }
    }


Done 2021-08-19

- Move away from statically generated HTML forms in memex (/)

- Fix stability of podcast scraper (/)

- Get crawling up again (/)
-- Monitoring (/)
--- Extraction (/)
--- Status page (/)
-- Scraper config (/)
-- DNS cache (?)
-- IP Block CDNs (/)
--- Parse CIDR (/)
    Apache Commons Net SubnetUtils seems to
    do the job, although it can't deal 
    with IPv6 :-/
--- CloudFlare (/)
    173.245.48.0/20
    103.21.244.0/22
    103.22.200.0/22
    103.31.4.0/22
    141.101.64.0/18
    108.162.192.0/18
    190.93.240.0/20
    188.114.96.0/20
    197.234.240.0/22
    198.41.128.0/17
    162.158.0.0/15
    172.64.0.0/13
    131.0.72.0/22
    104.16.0.0/13
    104.24.0.0/14
    2400:cb00::/32
    2606:4700::/32
    2803:f800::/32
    2405:b500::/32
    2405:8100::/32
    2a06:98c0::/29
    2c0f:f248::/32
--- Fastly (/)
    23.235.32.0/20
    43.249.72.0/22
    103.244.50.0/24
    103.245.222.0/23
    103.245.224.0/24
    104.156.80.0/20
    146.75.0.0/17
    151.101.0.0/16
    157.52.64.0/18
    167.82.0.0/17
    167.82.128.0/20
    167.82.160.0/20
    167.82.224.0/20
    172.111.64.0/18
    185.31.16.0/22
    199.27.72.0/21
    199.232.0.0/16
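
For comparison, a sketch of the CIDR check using Python's standard ipaddress module, which handles IPv4 and IPv6 ranges alike. Only a few of the ranges above are included, and BLOCKED_RANGES and is_blocked are illustrative names, not the crawler's code.

```python
import ipaddress

# A few of the ranges from the lists above; IPv4 and IPv6 can be mixed.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("173.245.48.0/20", "104.16.0.0/13", "2606:4700::/32", "151.101.0.0/16")
]


def is_blocked(address):
    """True if address falls inside any of the blocked CIDR ranges."""
    ip = ipaddress.ip_address(address)
    # A membership test across IP versions is simply False, so no special casing is needed.
    return any(ip in net for net in BLOCKED_RANGES)
```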

- Refactor task management (/)
-- Fix prepend (/)
-- Add tests (/)

- Refactor Floyd-Steinberg ditherer (/)

- Todo move-to-done function puts header last in #Done (/)

Done 2021-08-16

- Pictures-in-HTML (/)
-- Implement compression via Floyd-Steinberg dithering (/)
https://encyclopedia.marginalia.nu/wiki/Floyd%E2%80%93Steinberg_dithering
http://image4j.sourceforge.net/javadoc/index.html?net/sf/image4j/util/ConvertUtil.html
--- Ensure 4 bit (/)
--- On upload (/)
-- Render image views (/)
--- Add to index (/)
-- Upload form (/)
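
A sketch of Floyd-Steinberg dithering down to 4 bits, assuming a grayscale image stored as rows of 0..255 values to keep it short. The function name and the grayscale simplification are assumptions; a colour palette would need a nearest-colour search instead of the rounding step.

```python
def dither_4bit(pixels):
    """Floyd-Steinberg dither a grayscale image (rows of 0..255 values) down to 16 levels.

    Returns a new image of palette indices in 0..15.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    work = [[float(v) for v in row] for row in pixels]  # working copy collects error
    out = [[0] * width for _ in range(height)]
    step = 255 / 15  # distance between adjacent 4-bit gray levels

    for y in range(height):
        for x in range(width):
            old = min(255.0, max(0.0, work[y][x]))
            index = int(round(old / step))
            out[y][x] = index
            err = old - index * step
            # Push the quantization error to unvisited neighbours (7, 3, 5, 1 sixteenths).
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out
```

A flat mid-gray image comes out as a mix of the two nearest levels, which is what lets 16 levels approximate the shades in between.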

Done 2021-08-15

- CSS fixes for mobile (/)
-- text align for tasks (/)
-- indent overflowed tasks (/)

- Fix CME (/)
    java.util.ConcurrentModificationException: null
        at java.util.HashMap.forEach(HashMap.java:1428) ~[?:?]
        at nu.marginalia.wmsa.memex.MemexData.forEach(MemexData.java:51) ~[WMSA-1628951793.jar:?]
        at nu.marginalia.wmsa.memex.Memex.reRender(Memex.java:49) ~[WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.core.Scheduler$PeriodicDirectTask.run(Scheduler.java:566) [WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.core.Scheduler$Worker$PeriodicTask.run(Scheduler.java:513) [WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:65) [WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:56) [WMSA-1628951793.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]
    ERROR 2021-08-14 16:36:39,467 RxCachedThreadScheduler-2 MemexMain           : Uncaught exception
    java.util.ConcurrentModificationException: null
        at java.util.HashMap.forEach(HashMap.java:1428) ~[?:?]
        at nu.marginalia.wmsa.memex.MemexData.forEach(MemexData.java:51) ~[WMSA-1628951793.jar:?]
        at nu.marginalia.wmsa.memex.Memex.reRender(Memex.java:49) ~[WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.core.Scheduler$PeriodicDirectTask.run(Scheduler.java:566) ~[WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.core.Scheduler$Worker$PeriodicTask.run(Scheduler.java:513) ~[WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.run(ScheduledRunnable.java:65) [WMSA-1628951793.jar:?]
        at io.reactivex.rxjava3.internal.schedulers.ScheduledRunnable.call(ScheduledRunnable.java:56) [WMSA-1628951793.jar:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
        at java.lang.Thread.run(Thread.java:832) [?:?]

Done 2021-08-14

- Automatic TODO task categorization (/)
- Login API on separate service (/)
-- Set up service (/)
-- Route requests (/)

- Fix header auto-location (/)

- Display top tasks in index (/)

Done 2021-08-10


-- + in URLs? (/)
  proxy_pass with / forces nginx to parse the url (why?)
  Bad:
        proxy_pass http://127.0.0.1:5025/public/wiki/;
  Good:
        rewrite ^ $request_uri;
        rewrite ^/(.*) /public/$1 break;
        return 400;
        proxy_pass http://127.0.0.1:5025$uri;

- Encyclopedia (/)
-- Search API (/)
-- code tags (/)


Done 2021-08-06


- Memex (/)
-- GemtextParser (/)
-- Service skeleton (/)
-- Link extraction (/)
-- Rendering (/)
--- Stylesheet (/)
-- Updates (/)
--- API (/)
--- Form (/)

Done 2021-08-04


- Service Lockdown (/)
-- X-Public header in code (/)

-- Move endpoints (/)
--- Resource Store (/)
--- Search (/)
--- Assistant (/)

-- Update clients (/)
--- Resource Store (/)
--- Search Service (/)

-- Update nginx (/)
-- Update links on website (/)

- Tune wiki archive fs (/)
    sudo tune2fs -O ^dir_index /dev/nvme0n1p2
    
- marginalia.nu:9999 "BBS" (/)

Done 2021-08-03


- encyclopedia.marginalia.nu (/)

- Verify automatic backup of git (/)

- Reddit frontend (/)
-- Scraper: (/)
-- API: Marginalia 2: (/)

- Wiki (/)
-- on Optane (/)
-- fix Hildegard of Bingen (/)

- Block bots on nginx (/)
    https://kb.linuxlove.xyz/nginx-badbotblocker.html

Done 2021-08-02


- Install Optane (/)
-- Migrate MariaDB (/)

- Wiki (/)
-- redirects (/)
-- top notices (/)

- Bucket4J rate limiting (/)
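
Bucket4J is a token-bucket library. The idea in a few lines, as a sketch; this is not Bucket4J's API, and the class name, parameters and injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_consume(self, n=1):
        """Take n tokens if available; return False (reject the request) otherwise."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

The capacity sets how large a burst is tolerated, while the refill rate sets the sustained request rate.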

- Service Monitoring (/)

Done 2021-08-01


- Update Cert (/)
- Backups for git (/)

Done 2021-07-30


- Load Wikidata from ZIM (/)
- Migrate Server to Debian Buster (/)

Done 2021-07-28


- Update description generation algorithm (/)
-- Recalculate descriptions (...) (/)

- Wiki data (/)
-- Load data (/)
-- Wrap wikipedia (/)
-- Wikipedia Cleaner (/)

Done 2021-07-27


- Spell checker service? (/)
https://github.com/wolfgarbe/SymSpell

- Calculations (/)
-- Detection (/)
-- Parser (/)
-- Unit conversion (/)
--- Temperature (/)
--- Distance (/)
--- Weight (/)
--- Area (/)
--- Volume (/)
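
A sketch of table-driven unit conversion, assuming each dimension has a base unit and every other unit is a factor of it. Temperature needs an offset and is handled separately. The table contents and names are illustrative, and area and volume are left out for brevity.

```python
# Factor from each unit to the base unit of its dimension.
LINEAR_UNITS = {
    "distance": {"m": 1.0, "km": 1000.0, "ft": 0.3048, "mi": 1609.344},
    "weight": {"kg": 1.0, "g": 0.001, "lb": 0.45359237},
}


def convert(value, src, dst):
    """Convert value from unit src to unit dst; both must share a dimension."""
    if src in ("C", "F") and dst in ("C", "F"):
        # Temperature has an offset, so it cannot use a plain factor table.
        if src == dst:
            return value
        return value * 9 / 5 + 32 if src == "C" else (value - 32) * 5 / 9
    for factors in LINEAR_UNITS.values():
        if src in factors and dst in factors:
            return value * factors[src] / factors[dst]
    raise ValueError(f"cannot convert {src} to {dst}")
```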

Done 2021-07-26


- Save websites to disk? (/)
-- GZipped (/)
-- XFS (?)

- Local backlinks in GMI (/)
-- Parse GMI for links and titles (/)
-- Create tags system (/)

- Use prime sizing for HashMap! (/)
-- How to find primes (/)
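
A sketch of one way to find a prime table size, assuming trial division is fast enough because the size is only computed when the table is created or resized. The function names are made up for the example.

```python
def is_prime(n):
    """Trial division; fine for table sizes that are computed once."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def next_prime(n):
    """Smallest prime >= n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate
```

With buckets chosen as hash modulo size, a prime size keeps keys that share a common factor from piling up in a few buckets.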

- Arbitrary size HashMap (/)

Done 2021-07-25


- Syntax for orgmode + GMI in kate (/)
  Use /usr/share/kde4/apps/katepart/syntax/markdown.xml 

Done 2021-07-23


- Dictionary analysis in scraping (/)
   It seems viable to estimate 
   the language of a document 
   based on the overlap with an
   N-most-common-words dictionary. 
   Threshold 0.05 ok?
-- English (/)
-- Swedish (/)
-- Latin (/)
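
The overlap idea above can be sketched as follows, assuming the overlap is the fraction of a document's words found in a language's common-words dictionary. The tiny dictionaries and the function names are stand-ins for illustration; only the 0.05 threshold comes from the note.

```python
def dictionary_overlap(text, common_words):
    """Fraction of the words in text that appear in common_words."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in common_words)
    return hits / len(words)


def guess_language(text, dictionaries, threshold=0.05):
    """Return the best-matching language name, or None if no overlap reaches the threshold."""
    best_lang, best_score = None, 0.0
    for lang, common_words in dictionaries.items():
        score = dictionary_overlap(text, common_words)
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang if best_score >= threshold else None


# Tiny stand-in dictionaries; a real one would hold the N most common words.
DICTIONARIES = {
    "english": {"the", "and", "of", "to", "is", "in"},
    "swedish": {"och", "att", "det", "som", "är", "en"},
    "latin": {"et", "in", "est", "non", "ad", "ut"},
}
```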

- Clean up tests (/)

Done 2021-07-22


GZip Compression stats:
   63% old
   21% new

- Hash map (/)
-- Contiguous memory bins (/)

- Key Folding (/)
-- For strings (/)
-- For integers (/)

- Debian Desktop (/)
-- Docker (/)
-- Java 14 (/)
-- IntelliJ (/)
-- Code (/)
-- Gradle (/)
-- OrgMode (/)

Done 2021-07-21


- Bugfix: Domain Resolution (/)

Done 2021-07-20


- Index Changes (/)
-- Remove Junk Logging (/)
-- Split Query (/)
-- Implement in Frontend (/)


- Dictionary Service (/)
-- Add Index To Table (/)
-- Populate test db (/)
-- Build tests (/)
-- Integrate into frontend (/)


- Site Information (/)
-- Fetch (/)
-- 404 (/)