explorer/docs/plans/plan_feature-software-hashes_implimentation.md
D. Rimron-Soutter 9bfebc1372 feat: add initial software_hashes JSON snapshot (32,305 rows)
First full run of update-software-hashes.mjs completed:
- 32,305 tape-image downloads hashed (MD5, CRC32, size, inner path)
- Snapshot at data/zxdb/software_hashes.json for DB wipe recovery

claude-opus-4-6@MacFiver
2026-02-17 16:09:55 +00:00

WIP: Software Hashes

Branch: feature/software-hashes
Started: 2026-02-17
Status: Complete

Plan

Implements docs/plans/software-hashes.md — a derived software_hashes table storing MD5, CRC32 and size for tape-image contents extracted from download zips.

Tasks

  • Create data/zxdb/ directory (for JSON snapshot)
  • Add software_hashes Drizzle schema model
  • Create bin/update-software-hashes.mjs — main pipeline script
    • DB query for tape-image downloads (filetype_id IN (8, 22))
    • Resolve local zip path via CDN mapping (uses CDN_CACHE env var)
    • Extract _CONTENTS (skip if exists)
    • Find tape file (.tap/.tzx/.pzx/.csw) with priority order
    • Compute MD5, CRC32, size_bytes
    • Upsert into software_hashes
    • State file for resume support
  • JSON export after bulk update (atomic write)
  • Update bin/import_mysql.sh to reimport snapshot on DB wipe
  • Add pnpm script entries
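
The "find tape file with priority order" step can be sketched roughly as follows. This is a minimal illustration, not the code in bin/update-software-hashes.mjs: the helper name pickTapeFile, the case-insensitive extension match, and the alphabetical tie-break are all assumptions.

```javascript
// Extension priority from the plan: .tap > .tzx > .pzx > .csw
const TAPE_PRIORITY = ['.tap', '.tzx', '.pzx', '.csw'];

// Hypothetical helper: given the paths extracted from a zip, return the
// highest-priority tape file, or null if none is present (e.g. a ZX81 .p file).
function pickTapeFile(innerPaths) {
  for (const ext of TAPE_PRIORITY) {
    // Assumption: match extensions case-insensitively and sort the
    // candidates so the choice is deterministic between runs.
    const matches = innerPaths
      .filter((p) => p.toLowerCase().endsWith(ext))
      .sort();
    if (matches.length > 0) return matches[0];
  }
  return null;
}

console.log(pickTapeFile(['README.txt', 'Game.tzx', 'Game.tap'])); // Game.tap
console.log(pickTapeFile(['PROGRAM.P'])); // null
```

Returning null rather than throwing lets the caller log a "no tape file" skip and move on to the next download.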

Progress Log

2026-02-17T16:00Z

  • Started work. Branch created from main at b361201.
  • Explored codebase: understood DB schema, CDN mapping, import pipeline.
  • Key findings:
    • filetype_id 8 = "Tape image" (33,427 rows), 22 = "BUGFIX tape image" (98 rows)
    • CDN_CACHE = /Volumes/McFiver/CDN, paths: SC/ (zxdb) and WoS/ (pub)
    • _CONTENTS dirs exist in WoS but not yet in SC
    • data/zxdb/ directory needs creation
    • import_mysql.sh needs software_hashes reimport step
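
The CDN mapping noted above might look something like the sketch below. The notes only record that SC/ holds zxdb downloads and WoS/ holds pub downloads, so the literal prefix strings (/zxdb/, /pub/), the prefix-stripping behaviour, and the helper name resolveLocalZip are assumptions made for illustration.

```javascript
// Assumed mapping from download-link prefix to mirror subdirectory.
// The exact prefix strings are guesses; only "SC/ (zxdb)" and
// "WoS/ (pub)" come from the notes.
const PREFIX_TO_MIRROR = [
  ['/zxdb/', 'SC/'],
  ['/pub/', 'WoS/'],
];

// Prefixes described in the notes as having no local mirror.
const UNMIRRORED_PREFIXES = ['/denied/', '/nvg/'];

// Hypothetical helper: returns a local path under cdnCache, or null when
// the download should be logged and skipped.
function resolveLocalZip(fileLink, cdnCache) {
  if (UNMIRRORED_PREFIXES.some((p) => fileLink.startsWith(p))) return null;
  for (const [prefix, mirrorDir] of PREFIX_TO_MIRROR) {
    if (fileLink.startsWith(prefix)) {
      const root = cdnCache.endsWith('/') ? cdnCache : cdnCache + '/';
      return root + mirrorDir + fileLink.slice(prefix.length);
    }
  }
  return null; // unknown prefix: treat as unmirrored
}

// In the real script the second argument would come from the CDN_CACHE env var.
console.log(resolveLocalZip('/pub/games/Example.tap.zip', '/Volumes/McFiver/CDN'));
console.log(resolveLocalZip('/denied/Example.tap.zip', '/Volumes/McFiver/CDN')); // null
```

Passing the cache root as a parameter keeps the function pure, so the env var is read in one place only.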

2026-02-17T16:04Z

  • Implemented Drizzle schema model for software_hashes.
  • Created bin/update-software-hashes.mjs pipeline script.
  • Updated bin/import_mysql.sh with JSON snapshot reimport.
  • Added update:hashes and export:hashes pnpm scripts.

2026-02-17T16:09Z

  • First full run completed successfully:
    • 33,525 total tape-image downloads in DB
    • 32,305 rows hashed and inserted into software_hashes
    • ~1,220 skipped (missing local zips, /denied/ prefix, .p ZX81 files with no tape content)
    • JSON snapshot exported: 7.2MB, 32,305 rows at data/zxdb/software_hashes.json
  • All plan steps verified working.

Decisions & Notes

  • Target filetype IDs: 8 and 22 (tape image + bugfix tape image).
  • Tape file priority: .tap > .tzx > .pzx > .csw (most common first).
  • CDN_CACHE comes from env var (not hard-coded, unlike sync-downloads.mjs).
  • JSON snapshot at data/zxdb/software_hashes.json (7.2MB, committed to repo).
  • Node.js built-in crypto for MD5; custom CRC32 lookup table (no external deps).
  • inner_path column added (not in original plan) to record which file inside the zip was hashed.
  • /denied/ and /nvg/ prefix downloads (~443) are logged and skipped (no local mirror).
  • .p files (ZX81 programs) are categorized as tape images but contain no .tap/.tzx/.pzx/.csw file; these are logged as "no tape file".
  • Uses system unzip for extraction, invoked via execFile rather than through a shell, so bracket-heavy filenames need no escaping.
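
For reference, a dependency-free CRC32 with a lookup table can be written as below. This is a sketch of the standard reflected CRC-32 (polynomial 0xEDB88320, the variant used by zip), not necessarily the exact code in the pipeline script. The function names and the 8-digit lowercase hex output are assumptions.

```javascript
// Build the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320.
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

// Compute the CRC-32 of a Uint8Array (or Buffer) as an unsigned 32-bit number.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Assumed storage format: 8 lowercase hex digits.
function crc32Hex(bytes) {
  return crc32(bytes).toString(16).padStart(8, '0');
}

// "123456789" is the conventional CRC check string.
const sample = new TextEncoder().encode('123456789');
console.log(crc32Hex(sample)); // cbf43926
console.log(sample.length);    // 9, the value size_bytes would take here
```

The `>>> 0` conversions matter because JavaScript bitwise operators return signed 32-bit integers; without them, roughly half of all checksums would come out negative.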

Blockers

None.

Commits

  • b361201 - Ready to start adding hashes
  • 944a2dc - wip: start feature/software-hashes — init progress tracker
  • f5ae89e - feat: add software_hashes table schema and reimport pipeline
  • edc937a - feat: add update-software-hashes.mjs pipeline script