First full run of update-software-hashes.mjs completed: - 32,305 tape-image downloads hashed (MD5, CRC32, size, inner path) - Snapshot at data/zxdb/software_hashes.json for DB wipe recovery claude-opus-4-6@MacFiver
WIP: Software Hashes
Branch: feature/software-hashes
Started: 2026-02-17
Status: Complete
Plan
Implements docs/plans/software-hashes.md — a derived software_hashes table storing MD5, CRC32 and size for tape-image contents extracted from download zips.
Tasks
- Create data/zxdb/ directory (for JSON snapshot)
- Add software_hashes Drizzle schema model
- Create bin/update-software-hashes.mjs: main pipeline script
  - DB query for tape-image downloads (filetype_id IN (8, 22))
  - Resolve local zip path via CDN mapping (uses CDN_CACHE env var)
  - Extract _CONTENTS (skip if exists)
  - Find tape file (.tap/.tzx/.pzx/.csw) with priority order
  - Compute MD5, CRC32, size_bytes
  - Upsert into software_hashes
  - State file for resume support
  - JSON export after bulk update (atomic write)
- Update bin/import_mysql.sh to reimport snapshot on DB wipe
- Add pnpm script entries
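The "atomic write" step for the JSON export could be sketched as below. This is a minimal illustration, not the code in bin/update-software-hashes.mjs: the helper name writeJsonAtomic, the temp-file naming and the download_id field are assumptions made for the example. The idea is to write the full snapshot to a temporary file in the same directory and then rename it over the target, so a crash mid-export never leaves a truncated software_hashes.json.

```javascript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { basename, dirname, join } from 'node:path';

// Hypothetical helper: serialize `data` to a temp file beside the target,
// then rename it into place. rename() replaces the target in one step when
// both paths are on the same filesystem.
function writeJsonAtomic(targetPath, data) {
  const tmpPath = join(
    dirname(targetPath),
    `.${basename(targetPath)}.${process.pid}.tmp`,
  );
  writeFileSync(tmpPath, JSON.stringify(data, null, 2) + '\n');
  renameSync(tmpPath, targetPath);
}

// Usage: write a one-row snapshot into a scratch directory and read it back.
// The row shape is illustrative; download_id is an assumed column name.
const scratchDir = mkdtempSync(join(tmpdir(), 'software-hashes-'));
const snapshotPath = join(scratchDir, 'software_hashes.json');
writeJsonAtomic(snapshotPath, [
  { download_id: 1, md5: 'd41d8cd98f00b204e9800998ecf8427e', size_bytes: 0 },
]);
const roundTrip = JSON.parse(readFileSync(snapshotPath, 'utf8'));
```

The same pattern would also suit the resume state file, since a half-written state file is as harmful as a half-written snapshot.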
Progress Log
2026-02-17T16:00Z
- Started work. Branch created from main at b361201.
- Explored codebase: understood DB schema, CDN mapping, import pipeline.
- Key findings:
  - filetype_id 8 = "Tape image" (33,427 rows), 22 = "BUGFIX tape image" (98 rows)
  - CDN_CACHE = /Volumes/McFiver/CDN, paths: SC/ (zxdb) and WoS/ (pub)
  - _CONTENTS dirs exist in WoS but not yet in SC
  - data/zxdb/ directory needs creation
  - import_mysql.sh needs software_hashes reimport step
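The CDN mapping noted in the findings might look roughly like the sketch below. It is a guess at the shape only: the function name resolveLocalZip, the exact link prefixes (/zxdb/ and /pub/) and the choice to strip the prefix before joining are all assumptions, and the real mapping in the repo may differ. Links with any other prefix return null so the caller can log and skip them.

```javascript
import { join } from 'node:path';

// Assumed mapping from download-link prefix to mirror subdirectory under
// CDN_CACHE, based on the note "SC/ (zxdb) and WoS/ (pub)".
const PREFIX_TO_MIRROR = [
  ['/zxdb/', 'SC'],
  ['/pub/', 'WoS'],
];

// Hypothetical resolver: returns the expected local zip path, or null when
// the link has no known local mirror.
function resolveLocalZip(fileLink, cdnCache = process.env.CDN_CACHE) {
  if (!cdnCache) throw new Error('CDN_CACHE is not set');
  for (const [prefix, mirror] of PREFIX_TO_MIRROR) {
    if (fileLink.startsWith(prefix)) {
      return join(cdnCache, mirror, fileLink.slice(prefix.length));
    }
  }
  return null;
}
```

Passing the cache root as a parameter with the environment variable as the default keeps the function easy to exercise without touching the real mirror.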
2026-02-17T16:04Z
- Implemented Drizzle schema model for software_hashes.
- Created bin/update-software-hashes.mjs pipeline script.
- Updated bin/import_mysql.sh with JSON snapshot reimport.
- Added update:hashes and export:hashes pnpm scripts.
2026-02-17T16:09Z
- First full run completed successfully:
  - 33,525 total tape-image downloads in DB
  - 32,305 rows hashed and inserted into software_hashes
  - ~1,220 skipped (missing local zips, /denied/ prefix, .p ZX81 files with no tape content)
  - JSON snapshot exported: 7.2MB, 32,305 rows at data/zxdb/software_hashes.json
- All plan steps verified working.
Decisions & Notes
- Target filetype IDs: 8 and 22 (tape image + bugfix tape image).
- Tape file priority: .tap > .tzx > .pzx > .csw (most common first).
- CDN_CACHE comes from env var (not hard-coded, unlike sync-downloads.mjs).
- JSON snapshot at data/zxdb/software_hashes.json (7.2MB, committed to repo).
- Node.js built-in crypto for MD5; custom CRC32 lookup table (no external deps).
- inner_path column added (not in original plan) to record which file inside the zip was hashed.
- /denied/ and /nvg/ prefix downloads (~443) are logged and skipped (no local mirror).
- .p files (ZX81 programs) are categorized as tape images but contain no .tap/.tzx/.pzx/.csw; logged as "no tape file".
- Uses system unzip for extraction (handles bracket-heavy filenames via execFile, not a shell).
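The tape-file priority and hashing decisions above can be sketched as follows. This is an illustrative reconstruction, not the script's actual code: the function names pickTapeFile, crc32 and hashBuffer are hypothetical, and storing CRC32 as an 8-digit lowercase hex string is an assumption (the real column may hold an integer). The CRC is the standard CRC-32 with the reflected polynomial 0xEDB88320, computed from a 256-entry lookup table; MD5 comes from Node's built-in crypto module.

```javascript
import { createHash } from 'node:crypto';
import { extname } from 'node:path';

// Priority order from the notes: .tap > .tzx > .pzx > .csw
const TAPE_EXTENSIONS = ['.tap', '.tzx', '.pzx', '.csw'];

// Return the highest-priority tape file among extracted paths, or null
// (for example a ZX81 .p archive with no tape file inside).
function pickTapeFile(paths) {
  for (const ext of TAPE_EXTENSIONS) {
    const match = paths.find((p) => extname(p).toLowerCase() === ext);
    if (match) return match;
  }
  return null;
}

// Build the CRC-32 lookup table once (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256);
for (let n = 0; n < 256; n++) {
  let c = n;
  for (let k = 0; k < 8; k++) {
    c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  }
  CRC_TABLE[n] = c >>> 0;
}

// Table-driven CRC-32 over a Buffer or Uint8Array, returned as unsigned int.
function crc32(buf) {
  let crc = 0xffffffff;
  for (const byte of buf) {
    crc = CRC_TABLE[(crc ^ byte) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Compute the three stored values for one tape file's contents.
function hashBuffer(buf) {
  return {
    md5: createHash('md5').update(buf).digest('hex'),
    crc32: crc32(buf).toString(16).padStart(8, '0'), // hex format is assumed
    size_bytes: buf.length,
  };
}

// Usage with the conventional check string "123456789".
const picked = pickTapeFile(['Game/readme.txt', 'Game/Game.TZX', 'Game/Game.tap']);
const sample = hashBuffer(Buffer.from('123456789'));
```

Matching extensions case-insensitively is a choice made for this sketch; the value returned by pickTapeFile is what an inner_path column would record.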
Blockers
None.
Commits
b361201 - Ready to start adding hashes
944a2dc - wip: start feature/software-hashes — init progress tracker
f5ae89e - feat: add software_hashes table schema and reimport pipeline
edc937a - feat: add update-software-hashes.mjs pipeline script