Software Hashes Plan
Plan for adding a derived `software_hashes` table, its update pipeline, and a JSON snapshot lifecycle to survive DB wipes.
1) Goals and Scope (Plan Step 1)
- Create and maintain `software_hashes` for (at this stage) tape-image downloads.
- Preserve existing `_CONTENTS` folders; only create missing ones.
- Export `software_hashes` to JSON after each bulk update.
- Reimport the `software_hashes` JSON during a DB wipe in `bin/import_mysql.sh` (or a helper script it invokes).
- Ensure all scripts are idempotent and resume-safe.
2) Confirm Pipeline Touchpoints (Plan Step 2)
- Verify that `bin/import_mysql.sh` is the authoritative DB wipe/import entry point.
- Confirm that `bin/sync-downloads.mjs` remains responsible only for CDN cache sync.
- Confirm that `src/server/schema/zxdb.ts` uses `downloads.id` as the natural FK target.
3) Define Data Model: software_hashes (Plan Step 3)
Table naming and FK alignment
- Table: `software_hashes`.
- FK: `download_id` → `downloads.id`.
- Column names follow the existing DB `snake_case` conventions.

Planned columns

- `download_id` (PK or unique index; FK to `downloads.id`)
- `md5`
- `crc32`
- `size_bytes`
- `updated_at`

Planned indexes / constraints

- Unique index on `download_id`.
- Index on `md5` for reverse lookup.
- Index on `crc32` for reverse lookup.
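The model above could be sketched as MySQL DDL roughly like this (column types and index names are illustrative and still pending the decisions in step 10; `downloads.id` is assumed to be an integer key):

```sql
CREATE TABLE software_hashes (
  download_id INT NOT NULL,       -- assumed type; must match downloads.id
  md5         CHAR(32) NOT NULL,  -- hex-encoded MD5
  crc32       CHAR(8)  NOT NULL,  -- hex-encoded CRC-32
  size_bytes  BIGINT UNSIGNED NOT NULL,
  updated_at  DATETIME NOT NULL,
  PRIMARY KEY (download_id),                    -- unique by construction
  KEY idx_software_hashes_md5 (md5),            -- reverse lookup by md5
  KEY idx_software_hashes_crc32 (crc32),        -- reverse lookup by crc32
  CONSTRAINT fk_software_hashes_download
    FOREIGN KEY (download_id) REFERENCES downloads (id)
);
```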
4) Define JSON Snapshot Format (Plan Step 4)
Location
- Default: `data/zxdb/software_hashes.json` (or another agreed path).
Structure
{
"exportedAt": "2026-02-17T15:18:00.000Z",
"rows": [
{
"download_id": 123,
"md5": "...",
"crc32": "...",
"size_bytes": 12345,
"updated_at": "2026-02-17T15:18:00.000Z"
}
]
}
Planned import policy
- If a snapshot exists: truncate `software_hashes` and bulk insert.
- If the snapshot is missing: log and continue without error.
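The import policy might look like the following sketch; `loadSnapshot` and `importSnapshot` are hypothetical names, and `db` is assumed to expose a `query(sql, params)` method in the style of `mysql2/promise`:

```javascript
import { readFile } from "node:fs/promises";

// Reads the snapshot and returns its rows, or null when the file is absent.
// A missing snapshot is not an error: the caller logs and continues.
async function loadSnapshot(path) {
  let raw;
  try {
    raw = await readFile(path, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return null; // no snapshot: skip quietly
    throw err; // other I/O errors are real failures
  }
  const snapshot = JSON.parse(raw);
  if (!Array.isArray(snapshot.rows)) {
    throw new Error(`malformed snapshot: ${path}`);
  }
  return snapshot.rows;
}

// Truncate-and-reinsert policy; returns the imported row count.
async function importSnapshot(db, path) {
  const rows = await loadSnapshot(path);
  if (rows === null) {
    console.log(`no snapshot at ${path}, continuing without import`);
    return 0;
  }
  await db.query("TRUNCATE TABLE software_hashes");
  for (const r of rows) {
    await db.query(
      "INSERT INTO software_hashes (download_id, md5, crc32, size_bytes, updated_at) VALUES (?, ?, ?, ?, ?)",
      [r.download_id, r.md5, r.crc32, r.size_bytes, new Date(r.updated_at)]
    );
  }
  console.log(`imported ${rows.length} software_hashes rows`);
  return rows.length;
}
```

A production version would likely batch the inserts, but the truncate-then-insert shape matches the planned policy.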
5) Implement Tape Image Update Workflow (Plan Step 5)
Planned script
- `bin/update-software-hashes.mjs` (name can be adjusted).
Planned input dataset
- Query `downloads` for tape-image rows (filter by `filetype_id` or via a join to the `filetypes` table).
Planned per-item process
- Resolve the local zip path using the same CDN mapping used by `sync-downloads`.
- Compute the `_CONTENTS` folder name: `<zip filename>_CONTENTS` (exact match).
- If `_CONTENTS` exists, keep it untouched.
- If missing, extract the zip into `_CONTENTS` using a library that avoids shell expansion issues with brackets.
- Locate the tape file inside (`.tap`, `.tzx`, `.pzx`, `.csw`):
  - Apply a deterministic priority order.
  - If multiple candidates remain, log and skip (or record the ambiguity).
- Compute `md5`, `crc32`, and `size_bytes` for the selected file.
- Upsert into `software_hashes` keyed by `download_id`.
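The hashing step can be sketched as follows; `hashTapeFile` is an illustrative name, and CRC-32 is hand-rolled here (IEEE polynomial, as used by zip) so the sketch does not depend on the `zlib.crc32` helper that only newer Node versions ship:

```javascript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

// Precomputed table for table-driven CRC-32 (polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c;
});

// Returns the CRC-32 of a Buffer as an 8-char lowercase hex string.
function crc32(buf) {
  let c = 0xffffffff;
  for (const byte of buf) c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  return ((c ^ 0xffffffff) >>> 0).toString(16).padStart(8, "0");
}

// Computes the md5 / crc32 / size_bytes triple for one selected tape file.
async function hashTapeFile(path) {
  const buf = await readFile(path);
  return {
    md5: createHash("md5").update(buf).digest("hex"),
    crc32: crc32(buf),
    size_bytes: buf.length,
  };
}
```

Reading the whole file into memory is fine for tape images, which are small; a streaming hash would be the alternative for large files.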
Planned error handling
- Log missing zips or missing tape files.
- Continue after recoverable errors; fail only on critical DB errors.
6) Implement JSON Export Lifecycle (Plan Step 6)
- After each bulk update, export `software_hashes` to JSON.
- Write atomically (temp file + rename).
- Include an `exportedAt` timestamp in the snapshot.
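The atomic write could look like this (`exportSnapshot` is a hypothetical helper; `rename` is atomic on POSIX filesystems when source and target are on the same volume, which writing the temp file next to the target ensures):

```javascript
import { writeFile, rename } from "node:fs/promises";

// Exports rows as a JSON snapshot: write to a temp file, then rename into
// place so readers never observe a half-written snapshot.
async function exportSnapshot(rows, path) {
  const snapshot = {
    exportedAt: new Date().toISOString(),
    rows,
  };
  const tmpPath = `${path}.tmp`; // same directory keeps the rename atomic
  await writeFile(tmpPath, JSON.stringify(snapshot, null, 2));
  await rename(tmpPath, path);
}
```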
7) Reimport During Wipe (bin/import_mysql.sh) (Plan Step 7)
Planned placement
- Immediately after the database is created and the ZXDB SQL import completes.
Planned behavior
- Attempt to read the JSON snapshot.
- If present, truncate and reinsert `software_hashes`.
- Log the imported row count.
8) Add Idempotency and Resume Support (Plan Step 8)
- State file similar to `.sync-downloads.state.json` to track the last `download_id` processed.
- CLI flags:
  - `--resume` (default)
  - `--start-from-id`
  - `--rebuild-all`
- Reprocess when the zip file size or mtime changes.
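A minimal sketch of the state and flag handling, assuming a state file shaped like `{ "lastDownloadId": n }` (the path and field names are illustrative, not the actual `.sync-downloads.state.json` format):

```javascript
import { readFile, writeFile, rename } from "node:fs/promises";

const STATE_PATH = ".update-software-hashes.state.json"; // hypothetical path

// Missing state file means a fresh run; other read errors are fatal.
async function loadState() {
  try {
    return JSON.parse(await readFile(STATE_PATH, "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return { lastDownloadId: 0 };
    throw err;
  }
}

// Persist state atomically so an interrupted run never corrupts it.
async function saveState(state) {
  await writeFile(`${STATE_PATH}.tmp`, JSON.stringify(state));
  await rename(`${STATE_PATH}.tmp`, STATE_PATH);
}

// Decide where to start from the CLI flags; --rebuild-all ignores saved state.
function startingId(state, flags) {
  if (flags.rebuildAll) return 0;
  if (flags.startFromId !== undefined) return flags.startFromId;
  return state.lastDownloadId; // --resume is the default
}

// A previously hashed zip is reprocessed when its size or mtime changed
// (recorded is the stat info saved on the last run, stat is the current one).
function shouldReprocess(recorded, stat) {
  return (
    recorded === undefined ||
    recorded.size !== stat.size ||
    recorded.mtimeMs !== stat.mtimeMs
  );
}
```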
9) Validation Checklist (Plan Step 9)
- `_CONTENTS` folders are never deleted.
- Hashes match expected MD5/CRC32 values for known samples.
- JSON snapshot is created and reimported correctly.
- Reverse lookup by `md5`/`crc32`/`size_bytes` identifies misnamed files.
- Script can resume safely after interruption.
10) Open Questions / Confirmations (Plan Step 10)
- Final `software_hashes` column list and types.
- Exact JSON snapshot path.
- Filetype IDs that map to “Tape Image” in `downloads`.