Software Hashes Plan
Plan for adding a derived `software_hashes` table, its update pipeline, and a JSON snapshot lifecycle to survive DB wipes.
1) Goals and Scope (Plan Step 1)
- Create and maintain `software_hashes` for (at this stage) tape-image downloads.
- Preserve existing `_CONTENTS` folders; only create missing ones.
- Export `software_hashes` to JSON after each bulk update.
- Reimport the `software_hashes` JSON during a DB wipe in `bin/import_mysql.sh` (or a helper script it invokes).
- Ensure all scripts are idempotent and resume-safe.
2) Confirm Pipeline Touchpoints (Plan Step 2)
- Verify that `bin/import_mysql.sh` is the authoritative DB wipe/import entry point.
- Confirm that `bin/sync-downloads.mjs` remains responsible only for CDN cache sync.
- Confirm that `src/server/schema/zxdb.ts` uses `downloads.id` as the natural FK target.
3) Define Data Model: software_hashes (Plan Step 3)
Table naming and FK alignment
- Table: `software_hashes`.
- FK: `download_id` → `downloads.id`.
- Column names follow the existing DB `snake_case` conventions.

Planned columns

- `download_id` (PK or unique index; FK to `downloads.id`)
- `md5`
- `crc32`
- `size_bytes`
- `updated_at`

Planned indexes / constraints

- Unique index on `download_id`.
- Index on `md5` for reverse lookup.
- Index on `crc32` for reverse lookup.
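The model above could be sketched as MySQL DDL roughly like this (column types and index names are illustrative and still pending the decisions in step 10; `downloads.id` is assumed to be an integer key):

```sql
CREATE TABLE software_hashes (
  download_id INT NOT NULL,       -- assumed type; must match downloads.id
  md5         CHAR(32) NOT NULL,  -- hex-encoded MD5
  crc32       CHAR(8)  NOT NULL,  -- hex-encoded CRC-32
  size_bytes  BIGINT UNSIGNED NOT NULL,
  updated_at  DATETIME NOT NULL,
  PRIMARY KEY (download_id),                    -- unique by construction
  KEY idx_software_hashes_md5 (md5),            -- reverse lookup by md5
  KEY idx_software_hashes_crc32 (crc32),        -- reverse lookup by crc32
  CONSTRAINT fk_software_hashes_download
    FOREIGN KEY (download_id) REFERENCES downloads (id)
);
```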
4) Define JSON Snapshot Format (Plan Step 4)
Location
- Default: `data/zxdb/software_hashes.json` (or another agreed path).
Structure
{
"exportedAt": "2026-02-17T15:18:00.000Z",
"rows": [
{
"download_id": 123,
"md5": "...",
"crc32": "...",
"size_bytes": 12345,
"updated_at": "2026-02-17T15:18:00.000Z"
}
]
}
Planned import policy
- If a snapshot exists: truncate `software_hashes` and bulk insert.
- If the snapshot is missing: log and continue without error.
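The import policy might look like the following sketch; `loadSnapshot` and `importSnapshot` are hypothetical names, and `db` is assumed to expose a `query(sql, params)` method in the style of `mysql2/promise`:

```javascript
import { readFile } from "node:fs/promises";

// Reads the snapshot and returns its rows, or null when the file is absent.
// A missing snapshot is not an error: the caller logs and continues.
async function loadSnapshot(path) {
  let raw;
  try {
    raw = await readFile(path, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return null; // no snapshot: skip quietly
    throw err; // other I/O errors are real failures
  }
  const snapshot = JSON.parse(raw);
  if (!Array.isArray(snapshot.rows)) {
    throw new Error(`malformed snapshot: ${path}`);
  }
  return snapshot.rows;
}

// Truncate-and-reinsert policy; returns the imported row count.
async function importSnapshot(db, path) {
  const rows = await loadSnapshot(path);
  if (rows === null) {
    console.log(`no snapshot at ${path}, continuing without import`);
    return 0;
  }
  await db.query("TRUNCATE TABLE software_hashes");
  for (const r of rows) {
    await db.query(
      "INSERT INTO software_hashes (download_id, md5, crc32, size_bytes, updated_at) VALUES (?, ?, ?, ?, ?)",
      [r.download_id, r.md5, r.crc32, r.size_bytes, new Date(r.updated_at)]
    );
  }
  console.log(`imported ${rows.length} software_hashes rows`);
  return rows.length;
}
```

A production version would likely batch the inserts, but the truncate-then-insert shape matches the planned policy.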
5) Implement Tape Image Update Workflow (Plan Step 5)
Planned script
- `bin/update-software-hashes.mjs` (name can be adjusted).
Planned input dataset
- Query `downloads` for tape-image rows (filter by `filetype_id` or via a join to the `filetypes` table).
Planned per-item process
- Resolve the local zip path using the same CDN mapping used by `sync-downloads`.
- Compute the `_CONTENTS` folder name: `<zip filename>_CONTENTS` (exact match).
- If `_CONTENTS` exists, keep it untouched.
- If missing, extract the zip into `_CONTENTS` using a library that avoids shell expansion issues with brackets.
- Locate the tape file inside (`.tap`, `.tzx`, `.pzx`, `.csw`):
  - Apply a deterministic priority order.
  - If multiple candidates remain, log and skip (or record the ambiguity).
- Compute `md5`, `crc32`, and `size_bytes` for the selected file.
- Upsert into `software_hashes` keyed by `download_id`.
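The hashing step can be sketched as follows; `hashTapeFile` is an illustrative name, and CRC-32 is hand-rolled here (IEEE polynomial, as used by zip) so the sketch does not depend on the `zlib.crc32` helper that only newer Node versions ship:

```javascript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

// Precomputed table for table-driven CRC-32 (polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c;
});

// Returns the CRC-32 of a Buffer as an 8-char lowercase hex string.
function crc32(buf) {
  let c = 0xffffffff;
  for (const byte of buf) c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  return ((c ^ 0xffffffff) >>> 0).toString(16).padStart(8, "0");
}

// Computes the md5 / crc32 / size_bytes triple for one selected tape file.
async function hashTapeFile(path) {
  const buf = await readFile(path);
  return {
    md5: createHash("md5").update(buf).digest("hex"),
    crc32: crc32(buf),
    size_bytes: buf.length,
  };
}
```

Reading the whole file into memory is fine for tape images, which are small; a streaming hash would be the alternative for large files.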
Planned error handling
- Log missing zips or missing tape files.
- Continue after recoverable errors; fail only on critical DB errors.
6) Implement JSON Export Lifecycle (Plan Step 6)
- After each bulk update, export `software_hashes` to JSON.
- Write atomically (temp file + rename).
- Include an `exportedAt` timestamp in the snapshot.
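The atomic write could look like this (`exportSnapshot` is a hypothetical helper; `rename` is atomic on POSIX filesystems when source and target are on the same volume, which writing the temp file next to the target ensures):

```javascript
import { writeFile, rename } from "node:fs/promises";

// Exports rows as a JSON snapshot: write to a temp file, then rename into
// place so readers never observe a half-written snapshot.
async function exportSnapshot(rows, path) {
  const snapshot = {
    exportedAt: new Date().toISOString(),
    rows,
  };
  const tmpPath = `${path}.tmp`; // same directory keeps the rename atomic
  await writeFile(tmpPath, JSON.stringify(snapshot, null, 2));
  await rename(tmpPath, path);
}
```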
7) Reimport During Wipe (bin/import_mysql.sh) (Plan Step 7)
Planned placement
- Immediately after the database is created and the ZXDB SQL import completes.
Planned behavior
- Attempt to read the JSON snapshot.
- If present, truncate and reinsert `software_hashes`.
- Log the imported row count.
8) Add Idempotency and Resume Support (Plan Step 8)
- State file similar to `.sync-downloads.state.json` to track the last `download_id` processed.
- CLI flags:
  - `--resume` (default)
  - `--start-from-id`
  - `--rebuild-all`
- Reprocess when the zip file size or mtime changes.
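A minimal sketch of the state and flag handling, assuming a state file shaped like `{ "lastDownloadId": n }` (the path and field names are illustrative, not the actual `.sync-downloads.state.json` format):

```javascript
import { readFile, writeFile, rename } from "node:fs/promises";

const STATE_PATH = ".update-software-hashes.state.json"; // hypothetical path

// Missing state file means a fresh run; other read errors are fatal.
async function loadState() {
  try {
    return JSON.parse(await readFile(STATE_PATH, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return { lastDownloadId: 0 };
    throw err;
  }
}

// Persist state atomically so an interrupted run never corrupts it.
async function saveState(state) {
  await writeFile(`${STATE_PATH}.tmp`, JSON.stringify(state));
  await rename(`${STATE_PATH}.tmp`, STATE_PATH);
}

// Decide where to start from the CLI flags; --rebuild-all ignores saved state.
function startingId(state, flags) {
  if (flags.rebuildAll) return 0;
  if (flags.startFromId !== undefined) return flags.startFromId;
  return state.lastDownloadId; // --resume is the default
}

// A previously hashed zip is reprocessed when its size or mtime changed
// (recorded is the stat info saved on the last run, stat is the current one).
function shouldReprocess(recorded, stat) {
  return (
    recorded === undefined ||
    recorded.size !== stat.size ||
    recorded.mtimeMs !== stat.mtimeMs
  );
}
```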
9) Validation Checklist (Plan Step 9)
- `_CONTENTS` folders are never deleted.
- Hashes match expected MD5/CRC32 values for known samples.
- JSON snapshot is created and reimported correctly.
- Reverse lookup by `md5`/`crc32`/`size_bytes` identifies misnamed files.
- Script can resume safely after interruption.
10) Open Questions / Confirmations (Plan Step 10)
- Final `software_hashes` column list and types.
- Exact JSON snapshot path.
- Filetype IDs that map to “Tape Image” in `downloads`.