# WIP: Software Hashes

**Branch:** `feature/software-hashes`
**Started:** 2026-02-17
**Status:** In Progress

## Plan

Implements [docs/plans/software-hashes.md](software-hashes.md): a derived `software_hashes` table storing MD5, CRC32 and size for tape-image contents extracted from download zips.

### Tasks

- [ ] Create `data/zxdb/` directory (for JSON snapshot)
- [ ] Add `software_hashes` Drizzle schema model
- [ ] Create `bin/update-software-hashes.mjs` (main pipeline script)
  - [ ] DB query for tape-image downloads (`filetype_id IN (8, 22)`)
  - [ ] Resolve local zip path via CDN mapping
  - [ ] Extract `_CONTENTS` (skip if exists)
  - [ ] Find tape file (.tap/.tzx/.pzx/.csw) with priority order
  - [ ] Compute MD5, CRC32, size_bytes
  - [ ] Upsert into software_hashes
  - [ ] State file for resume support
  - [ ] JSON export after bulk update (atomic write)
- [ ] Update `bin/import_mysql.sh` to reimport snapshot on DB wipe
- [ ] Add pnpm script entries

## Progress Log

### 2026-02-17T16:00Z

- Started work. Branch created from `main` at `b361201`.
- Explored codebase: understood DB schema, CDN mapping, import pipeline.
- Key findings:
  - filetype_id 8 = "Tape image" (33,427 rows), 22 = "BUGFIX tape image" (98 rows)
  - CDN_CACHE = /Volumes/McFiver/CDN, paths: SC/ (zxdb) and WoS/ (pub)
  - `_CONTENTS` dirs exist in WoS but not yet in SC
  - data/zxdb/ directory needs creation
  - import_mysql.sh needs software_hashes reimport step

## Decisions & Notes

- Target filetype IDs: 8 and 22 (tape image + bugfix tape image).
- Tape file priority: .tap > .tzx > .pzx > .csw (most common first).
- CDN_CACHE hard-coded to /Volumes/McFiver/CDN (same as sync-downloads).
- JSON snapshot at data/zxdb/software_hashes.json.
- Use Node.js built-in crypto for MD5, crc32 from buffer-based calculation.

## Blockers

None currently.

## Commits

b361201 - Ready to start adding hashes
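
## Sketch: Hashing Step

The "find tape file" and "compute MD5, CRC32, size_bytes" tasks could look roughly like the sketch below. It is illustrative only and is not the contents of `bin/update-software-hashes.mjs`. The names `pickTapeFile`, `crc32` and `hashTapeBuffer` are hypothetical. The lowercase hex output and the alphabetical tie-break between files with the same extension are assumptions, not decisions recorded in these notes. The CRC32 is the standard table-driven variant (reflected polynomial `0xEDB88320`) computed over a buffer.

```javascript
import { createHash } from 'node:crypto';
import path from 'node:path';

// Extension priority taken from "Decisions & Notes": .tap > .tzx > .pzx > .csw
const TAPE_EXTENSIONS = ['.tap', '.tzx', '.pzx', '.csw'];

// Hypothetical helper: return the highest-priority tape file name, or null.
// Ties within one extension are broken alphabetically (an assumption).
function pickTapeFile(fileNames) {
  for (const ext of TAPE_EXTENSIONS) {
    const match = fileNames
      .filter((name) => path.extname(name).toLowerCase() === ext)
      .sort()[0];
    if (match) return match;
  }
  return null;
}

// Precomputed lookup table for CRC32 (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

// Buffer-based CRC32; returns an unsigned 32-bit integer.
function crc32(buffer) {
  let crc = 0xffffffff;
  for (const byte of buffer) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: compute the three values stored per tape file.
function hashTapeBuffer(buffer) {
  return {
    md5: createHash('md5').update(buffer).digest('hex'),
    crc32: crc32(buffer).toString(16).padStart(8, '0'),
    size_bytes: buffer.length,
  };
}
```

A quick sanity check for the CRC32 routine is the standard check value: the ASCII string `123456789` should hash to `cbf43926`. In the real pipeline the buffer would presumably come from reading the chosen file inside the extracted `_CONTENTS` directory.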