# Software Hashes Plan

Plan for adding a derived `software_hashes` table, its update pipeline, and a JSON snapshot lifecycle so that the hashes survive DB wipes.

---

## 1) Goals and Scope (Plan Step 1)

- Create and maintain `software_hashes`, at this stage for tape-image downloads only.
- Preserve existing `_CONTENTS` folders; only create missing ones.
- Export `software_hashes` to JSON after each bulk update.
- Reimport the `software_hashes` JSON during a DB wipe in `bin/import_mysql.sh` (or a helper script it invokes).
- Ensure all scripts are idempotent and resume-safe.
---

## 2) Confirm Pipeline Touchpoints (Plan Step 2)

- Verify `bin/import_mysql.sh` is the authoritative DB wipe/import entry point.
- Confirm `bin/sync-downloads.mjs` remains responsible only for CDN cache sync.
- Confirm `src/server/schema/zxdb.ts` uses `downloads.id` as the natural FK target.
---

## 3) Define Data Model: `software_hashes` (Plan Step 3)

### Table naming and FK alignment

- Table: `software_hashes`.
- FK: `download_id` → `downloads.id`.
- Column names follow existing DB `snake_case` conventions.

### Planned columns

- `download_id` (PK or unique index; FK to `downloads.id`)
- `md5`
- `crc32`
- `size_bytes`
- `updated_at`

### Planned indexes / constraints

- Unique index on `download_id`.
- Index on `md5` for reverse lookup.
- Index on `crc32` for reverse lookup.
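
As a rough illustration of the columns and indexes listed above, the table could be created with MySQL DDL along these lines. Every column type, the index and constraint names, and the `SOFTWARE_HASHES_DDL` constant are assumptions (the final column list and types are still open), and `download_id` would have to match whatever type `downloads.id` actually uses.

```javascript
// Hypothetical DDL for `software_hashes`; all column types are assumptions.
// `download_id` is shown as the primary key, which also provides the unique index.
const SOFTWARE_HASHES_DDL = `
CREATE TABLE IF NOT EXISTS software_hashes (
  download_id INT NOT NULL,              -- assumed to match the type of downloads.id
  md5         CHAR(32) NOT NULL,         -- hex digest
  crc32       CHAR(8) NOT NULL,          -- hex, zero-padded
  size_bytes  BIGINT UNSIGNED NOT NULL,
  updated_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (download_id),
  KEY idx_software_hashes_md5 (md5),     -- reverse lookup by md5
  KEY idx_software_hashes_crc32 (crc32), -- reverse lookup by crc32
  CONSTRAINT fk_software_hashes_download
    FOREIGN KEY (download_id) REFERENCES downloads (id)
)`;
```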
---

## 4) Define JSON Snapshot Format (Plan Step 4)

### Location

- Default: `data/zxdb/software_hashes.json` (or another agreed path).

### Structure

```json
{
  "exportedAt": "2026-02-17T15:18:00.000Z",
  "rows": [
    {
      "download_id": 123,
      "md5": "...",
      "crc32": "...",
      "size_bytes": 12345,
      "updated_at": "2026-02-17T15:18:00.000Z"
    }
  ]
}
```

### Planned import policy

- If snapshot exists: truncate `software_hashes` and bulk insert.
- If snapshot missing: log and continue without error.
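
The import policy above could be sketched as follows. This is a minimal sketch, not the planned helper itself: `importSnapshot`, the batch size, and the `db` object (anything exposing `query(sql, params)` that returns a promise) are hypothetical, and the `VALUES ?` bulk form assumes a driver that expands nested arrays the way the `mysql2` `query()` method does.

```javascript
import { readFile } from "node:fs/promises";

// Replaces the contents of `software_hashes` from a JSON snapshot.
// Returns the number of imported rows (0 when the snapshot is missing).
async function importSnapshot(db, snapshotPath, log = console.log) {
  let text;
  try {
    text = await readFile(snapshotPath, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") {
      // Missing snapshot is not an error: log and carry on.
      log(`No snapshot at ${snapshotPath}; skipping software_hashes import.`);
      return 0;
    }
    throw err;
  }

  const { rows } = JSON.parse(text);
  if (!Array.isArray(rows)) throw new Error("Snapshot has no `rows` array");

  await db.query("TRUNCATE TABLE software_hashes");

  const BATCH = 500; // assumed batch size
  for (let i = 0; i < rows.length; i += BATCH) {
    const values = rows
      .slice(i, i + BATCH)
      .map((r) => [r.download_id, r.md5, r.crc32, r.size_bytes, new Date(r.updated_at)]);
    await db.query(
      "INSERT INTO software_hashes (download_id, md5, crc32, size_bytes, updated_at) VALUES ?",
      [values],
    );
  }

  log(`Imported ${rows.length} software_hashes rows.`);
  return rows.length;
}
```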
---

## 5) Implement Tape Image Update Workflow (Plan Step 5)

### Planned script

- `bin/update-software-hashes.mjs` (name can be adjusted).

### Planned input dataset

- Query `downloads` for tape-image rows (filter by `filetype_id` or a joined `filetypes` table).

### Planned per-item process

1. Resolve local zip path using the same CDN mapping used by `sync-downloads`.
2. Compute `_CONTENTS` folder name: `<zip filename>_CONTENTS` (exact match).
3. If `_CONTENTS` exists, keep it untouched.
4. If missing, extract the zip into `_CONTENTS` using a zip library rather than a shell command, which avoids shell expansion issues with brackets in filenames.
5. Locate tape file inside (`.tap`, `.tzx`, `.pzx`, `.csw`):
   - Apply a deterministic priority order.
   - If multiple candidates remain, log and skip (or record ambiguity).
6. Compute `md5`, `crc32`, and `size_bytes` for the selected file.
7. Upsert into `software_hashes` keyed by `download_id`.
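
Steps 2, 5, and 6 of the per-item process could look roughly like this. It is a sketch under stated assumptions: the helper names, the extension order in `TAPE_PRIORITY`, and the choice to place `_CONTENTS` next to the zip are hypothetical. The CRC-32 is the standard table-driven variant (reflected polynomial `0xEDB88320`), written out by hand so the sketch does not depend on a particular Node.js version. Hashing a whole buffer in memory assumes tape images are small.

```javascript
import { createHash } from "node:crypto";
import { extname } from "node:path";

// Assumed priority order; the plan only requires that it be deterministic.
const TAPE_PRIORITY = [".tap", ".tzx", ".pzx", ".csw"];

// Step 2: `<zip filename>_CONTENTS`, assumed to sit next to the zip itself.
function contentsDirFor(zipPath) {
  return `${zipPath}_CONTENTS`;
}

// Step 5: walk the priority list; the first extension with matches decides.
// Exactly one match is selected; several matches are reported as ambiguous.
function pickTapeFile(fileNames) {
  for (const ext of TAPE_PRIORITY) {
    const candidates = fileNames.filter((name) => extname(name).toLowerCase() === ext);
    if (candidates.length === 1) return { file: candidates[0], candidates };
    if (candidates.length > 1) return { file: null, candidates }; // caller logs and skips
  }
  return { file: null, candidates: [] }; // no tape file found
}

// Lookup table for CRC-32 (reflected polynomial 0xEDB88320).
const CRC_TABLE = new Uint32Array(256).map((_, n) => {
  let c = n;
  for (let k = 0; k < 8; k++) c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
  return c;
});

function crc32(buf) {
  let c = 0xffffffff;
  for (const byte of buf) c = CRC_TABLE[(c ^ byte) & 0xff] ^ (c >>> 8);
  return (c ^ 0xffffffff) >>> 0; // force unsigned 32-bit result
}

// Step 6: values for one `software_hashes` row (without download_id).
function hashBuffer(buf) {
  return {
    md5: createHash("md5").update(buf).digest("hex"),
    crc32: crc32(buf).toString(16).padStart(8, "0"),
    size_bytes: buf.length,
  };
}
```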
### Planned error handling

- Log missing zips or missing tape files.
- Continue after recoverable errors; fail only on critical DB errors.
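
One way to express this split between recoverable and critical failures is a marker error class. The sketch below assumes a hypothetical `RecoverableError` thrown for missing zips or missing tape files, and a hypothetical `processItem` callback that performs the per-item steps; anything else (for example a DB failure) is rethrown and stops the run.

```javascript
// Hypothetical marker for failures that should only be logged and skipped.
class RecoverableError extends Error {}

// Runs `processItem` for every download, skipping recoverable failures.
async function processAll(items, processItem, log = console.warn) {
  const summary = { ok: 0, skipped: 0 };
  for (const item of items) {
    try {
      await processItem(item);
      summary.ok += 1;
    } catch (err) {
      // Anything that is not explicitly recoverable aborts the whole run.
      if (!(err instanceof RecoverableError)) throw err;
      log(`download ${item.id}: ${err.message}`);
      summary.skipped += 1;
    }
  }
  return summary;
}
```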
---

## 6) Implement JSON Export Lifecycle (Plan Step 6)

- After each bulk update, export `software_hashes` to JSON.
- Write atomically (temp file + rename).
- Include `exportedAt` timestamp in snapshot.
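
A minimal sketch of the atomic export, assuming the rows have already been read from `software_hashes`. The `exportSnapshot` name and the temp-file naming are hypothetical. The temp file is created in the same directory as the target so that the rename stays on one filesystem.

```javascript
import { writeFile, rename } from "node:fs/promises";

// Writes the snapshot to a temp file, then renames it over the target,
// so readers never observe a partially written JSON file.
async function exportSnapshot(rows, snapshotPath, now = new Date()) {
  const snapshot = { exportedAt: now.toISOString(), rows };
  const tmpPath = `${snapshotPath}.${process.pid}.tmp`; // hypothetical naming
  await writeFile(tmpPath, JSON.stringify(snapshot, null, 2) + "\n", "utf8");
  await rename(tmpPath, snapshotPath);
  return snapshot;
}
```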
---

## 7) Reimport During Wipe (`bin/import_mysql.sh`) (Plan Step 7)

### Planned placement

- Immediately after database creation and ZXDB SQL import completes.

### Planned behavior

- Attempt to read JSON snapshot.
- If present, truncate and reinsert `software_hashes`.
- Log imported row count.

---

## 8) Add Idempotency and Resume Support (Plan Step 8)

- State file similar to `.sync-downloads.state.json` to track last `download_id` processed.
- CLI flags:
  - `--resume` (default)
  - `--start-from-id`
  - `--rebuild-all`
- Reprocess when zip file size or mtime changes.
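
The resume rules could be reduced to two small pure functions, sketched below. The state shape (`lastDownloadId` plus a per-download `zips` map of `size` and `mtimeMs`), the function names, and the convention that `--start-from-id` takes its value as the next argument are all assumptions; the plan does not fix any of them.

```javascript
// Parses the three planned flags; `--resume` is the default mode.
function parseFlags(argv) {
  const flags = { mode: "resume", startFromId: null };
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "--rebuild-all") flags.mode = "rebuild-all";
    else if (argv[i] === "--resume") flags.mode = "resume";
    else if (argv[i] === "--start-from-id") {
      flags.mode = "start-from-id";
      flags.startFromId = Number(argv[++i]); // value assumed to follow the flag
    }
  }
  return flags;
}

// Decides whether one download needs (re)processing.
// Assumed state shape: { lastDownloadId, zips: { [downloadId]: { size, mtimeMs } } }
function shouldProcess(download, zipStat, state, flags) {
  if (flags.mode === "rebuild-all") return true;
  const seen = state.zips[download.id];
  const changed = !seen || seen.size !== zipStat.size || seen.mtimeMs !== zipStat.mtimeMs;
  if (changed) return true; // new or modified zip is always reprocessed
  if (flags.mode === "start-from-id") return download.id >= flags.startFromId;
  return download.id > state.lastDownloadId; // plain resume
}
```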
---

## 9) Validation Checklist (Plan Step 9)

- `_CONTENTS` folders are never deleted.
- Hashes match expected MD5/CRC32 for known samples.
- JSON snapshot is created and reimported correctly.
- Reverse lookup by `md5`/`crc32`/`size_bytes` identifies misnamed files.
- Script can resume safely after interruption.

---

## 10) Open Questions / Confirmations (Plan Step 10)

- Final `software_hashes` column list and types.
- Exact JSON snapshot path.
- Filetype IDs that map to “Tape Image” in `downloads`.