|
GE-115 Emulator
An Emulator of the General Electrics GE-115 computer
|
How the .cap card-image captures are structured, and the four hole-pattern → byte conversions used to read GE-120 decks. The column-binary ("by-pass") conversion is the one this project cracked empirically; the others are documented here alongside it because they share the same on-card geometry.
Scope: this is a reverse-engineering findings document. Verified claims are backed by the Burroughs reader firmware (ground truth), the
gemutranscoder, and direct decode ofsoftware/DUMP1/funktionalcpu.cap. OCR/uncertain items are flagged. Confidence labels: high / medium / low.
The capture hardware is the project's own Raspberry-Pi-Pico reader; its firmware defines the .cap format, so it is the authority for every bit position:
software/burroughs-card-reader/main.c — sampling + dump format.software/gemu/cap.c, cap.h — the .cap parser used by the emulator.software/gemu/transcode.c, transcode.h — the four conversions.software/DUMP1/funktionalcpu.cap / .bin — the oracle pair.Everything below that says "verified" was checked against one of these.
A standard 80-column Hollerith card has 12 punch rows per column. Top to bottom they are:
A column is the set of rows punched in that vertical slice. The reader samples all 12 rows of a column simultaneously into one value.
A .cap file is plain ASCII text produced by the reader's USB console. It contains two independent dumps of the same deck, selected by which key the operator pressed during capture (main.c:191/:213):
a/A): for each card, a header line Card n. N followed by 80 four-hex-digit tokens, one per column — the raw 13-bit sample value (main.c:217, printf("%04X ", …)).Card n. N followed by 12 rows of 80 characters, * = hole, _ = no hole, printed in row order 12, 11, 0, 1, 2 … 9 (main.c:194-207).So a deck of C cards yields 2·C Card n. headers. For funktionalcpu.cap: 228 headers = 114 hex cards + 114 ASCII cards (verified by count). The emulator's parser (cap.c) keys on the literal Card n. prefix, collects only strict 4-hex-digit tokens, ignores everything else (counters, FEED ON/OFF, the hole-art), and masks each value to 13 bits.
Each sample is gpio_get_all() & BITMASK, where **BITMASK = 0x1BFF** (main.c:14). In the 13-bit value:
| cap value bit | card row | notes |
|---|---|---|
| 0 | row 0 | the "0 / ten" zone row |
| 1 … 9 | rows 1 … 9 | digit rows |
| 10 | — | always 0 — GPIO 10 is not wired (main.c:141, BITMASK clears 0x400) |
| 11 | row 11 | X zone |
| 12 | row 12 | Y zone |
The bit-10 hole is the crux. The reader's data pins are GPIO 0–12 except 10 (deliberately skipped in init and masked out of every sample). So a cap value's bit b equals card row b for b ∈ {0–9, 11, 12}, and bit 10 is structurally absent. Miss this and every high-nibble decode is shifted.
The single fact that unblocked the column-binary decode:
What the firmware labels "row 0 / bit 0" is the punch-card "ten" row — the zero-zone row that sits third from the top (12, 11, 0, 1…9), not a digit position below row 9.
Two consequences followed:
0,1,2,3, …, the opposite of treating row 9 as the high bit.Combined, these gave the B2R row map in §4.3. Confidence: high — the resulting decode produces valid, contiguously-loadable GE-120 code (§5).
The same column geometry feeds four different read modes. Mode is chosen by the reader command / deck type, not by the card.
| Mode | enum | Use | Output rule |
|---|---|---|---|
| Normal (Hollerith→GE) | TC_NORMAL | text / loader cards | 13-bit lookup table |
| Binary passthrough | TC_BINARY | raw low-8-bit reads | column & 0xFF |
| Loader hex-nibble | TC_HEX | the bootstrap loader listing | nibble = Σ set digit-row indices |
| Column-binary / by-pass | TC_COLBIN | program (data) cards | byte bit i ← row B2R[i] |
An empirically derived 13-bit→byte table (transcode.c, 8192 entries). Built by aligning all 9120 (card,column) pairs of the 114 hex cards against the 9120 bytes of funktionalcpu.bin: 267 distinct inputs, 0 collisions, 18 non-space outputs; everything else defaults to 0x20 (GE space / blank column). The observed digit mappings (single-row punches) are:
plus a handful of zone+digit combos (<, @, #, X, Y, ,, +, …). funktionalcpu.cap → funktionalcpu.bin is an exact byte match, which is the table's acceptance test (tests/transcode.c).
⚠️ These output codes are EBCDIC, not the GE machine's own character code.
0xF0–0xF9= digits,0xE7=X,0x6B=,are EBCDIC code points. The documented GE-120 transcoder (CRZ[2] §5.3 Table 3) instead converts card punches to the GE-100 internal graphic code (digit0=0x40,J=0xA1; APS Reference Manual Fig 3 — seedocs/ISA.md§2.1).TC_NORMALmatches the externally-supplied.bin(an EBCDIC rendering), not the machine code; it is only used for NORMAL-mode text, never for program loading (which is by-pass/COLBIN, §4.3). TreatTC_NORMAL's glyph labels as EBCDIC.
The bootstrap loader is punched as hex digits where a nibble equals the sum of the set digit-row indices (rows 0–9; zone rows ignored), and two columns are nibble-packed into one byte (transcode.c:562). E.g. row6+row8 → 6+8 = 14 = 0xE. Verified against the loader listing (38 of 40 bytes; the 2 differences are OCR noise in a trailing descriptor) — confidence medium on those 2 bytes.
One column → one 8-bit byte, NOT nibble-packed. Each output bit is taken from a specific row:
So the digit rows split into two nibbles around the unused middle rows 4/5, and the zone rows 11/12 carry no data in this mode.
Confirmed by the manual (was empirical). CPU[1] *(GE 120 CENTRAL PROCESSOR [1]*, the DIAGNOSTIC ORGANIZATION volume, dwg 4T4714100UA, folio 53a / PDF p.332) prints the binary card bit↔row "law of correspondence" verbatim:
bit 07 06 05 04 03 02 01 00 (note: bit 00 = minimum weight)row 0 1 2 3 6 7 8 9This matches
B2Rexactly (bit7←row0 … bit0←row9; rows 4/5 skipped), upgrading the COLBIN decode from reverse-engineered to document-confirmed (high). The same encoding is the documented binary-card format for the CPU ISOLATION TEST medium (§5.1 below), so it is the general GE column-binary convention, not specific to funktionalcpu.
Worked example — the same column value 0x0202 (rows 9 and 1 punched) under each mode:
| Mode | result | reasoning |
|---|---|---|
TC_COLBIN | 0x41 | bit0←row9=1, bit6←row1=1 → 0100 0001 |
TC_NORMAL | 0x20 | 0x0202 not in the observed table → space |
TC_BINARY | 0x02 | low 8 bits of 0x0202 |
The mode you pick changes the byte completely — which is why the program payload read as nonsense under TC_NORMAL.
Decoding the 114 hex cards of funktionalcpu.cap with TC_COLBIN reveals a clean, self-describing data-card format. Cards 6–111 are the program payload; each is:
| Field | Cols | Value (verified) | Meaning |
|---|---|---|---|
| Identifier prefix | 0–7 | 00 04 40 00 20 40 40 42 (this deck) | card-type tag — per-deck constant (see below) |
LL | 8 | 0x41 (= 65) | payload length − 1 → 66 payload bytes (LL+1) |
| Load address | 9–10 | big-endian | absolute target address of byte 0 of payload |
| Payload | 11–76 | 66 bytes | the program image fragment |
| Sequence/check | 79 | rotating (40,20,10,08,04,02,01,00…) | per-card sequence or check byte (meaning unconfirmed) |
Contiguous, self-describing load (verified): the load addresses step by exactly **66 (0x42)** per card — 0x0100, 0x0142, 0x0184, 0x01C6, … — covering **0x0100 … 0x1C54** with no gaps. Combined with an entry jump JU 0x0100, the deck boots into the program at 0x0100.
The identifier prefix is PER-DECK, not universal (verified, high — 2026-05-29). The earlier "constant tag" claim holds within a deck but not across decks. Decoding the full software/DUMP1/*.cap set with the gdis depuncher shows each CPU-program deck carries its own constant 8-byte prefix on its data cards:
| Deck | Data-card prefix | Data cards |
|---|---|---|
funktionalcpu | 00 04 40 00 20 40 40 42 | 108 |
control-program-cr (+ -copia) | 00 04 80 04 02 40 40 04 | 61 |
ls600-doe | 00 04 80 00 00 40 10 02 | 82 |
printermechanicaltest | 00 04 40 01 20 40 80 02 | 40 |
The family signature is 00 04 .. .. .. 40 .. .. (bytes 0–1 = 00 04, byte 5 = 40; bytes 2,3,4,6,7 vary per deck — likely a deck/title identifier or checksum, meaning unconfirmed). Consequence for tooling: a depuncher must auto-detect the deck's dominant prefix rather than hardcode one value; gdis does this. Confidence: high for "per-deck constant", low for the meaning of the varying bytes.
A second card framing exists — now documented (2026-05-29). The isolationcpu01/02/03 and isolat-dsu-erganz-cpu decks do not use the 00 04 .. record format above; they are the CPU ISOLATION TEST medium, whose card layout CPU[1] (DIAGNOSTIC ORGANIZATION, dwg 4T4714100UA folio 52 / PDF p.330) specifies directly:
.cap mis-frames them);001,002,…,999,A00,A02,…,A99,B00,…; digit = punch in that row, A/B/C = row 12 + row 1/2/3) — metadata, not payload;So extracting these decks means: drop the title + summary cards, read cols 1–76 as 76 COLBIN bytes per card (ignore 77–80), then interpret the byte stream as SMAC WORDS via INTE — a separate effort from the funktionalcpu loader. There is also a distinct SMAC "system composition card" (CPU[1] folio 45, PDF p.65): cols 9–14 = length (L−1) + system-table address, cols 15–74 = peripheral codes.
**gdis now implements this extraction family-aware** (auto-detected, or forced with --iso). It emits cols 1–76 (COLBIN) of every card whose Hollerith identifier (cols 77–79) decodes cleanly and progresses as the isolation deck's card number, concatenated in deck order at --org, and skips cards with a blank/garbled identifier (a stray title/summary/blank separator) rather than assuming fixed first/last positions — the captured decks already have the title and summary removed. Verified on the four CPU-ISOLATION decks, whose identifiers run contiguously (001–210, 211–502, 503–…, A00–…). The resulting byte stream is the SMAC program; interpreting it (loader + INTE + WORDS) is future work — gdis --hex / --image give the bytes today.
The ls600-* / sat-ls600 / printer* / reading-* decks are peripheral test decks — even where they share the 00 04 framing, their payloads are test patterns, not 0x0100 CPU programs.
Order-independence (verified, notable): cards 107–108 are physically out of sequence in the capture (their addresses 0x18FA, 0x193C belong between cards 98 and 99), yet because each card carries its own load address the image is still assembled correctly. The deck can be shuffled without corrupting the load — a deliberate robustness property of the format.
A few cards are not 66-byte payload cards (different LL); they appear to be loader/patch/terminator records. Roles are inferred, not confirmed:
| Card | LL | Addr | Note |
|---|---|---|---|
| 1–5 | mixed | 0404, 4010, 0A00, 4001, 0A02 | loader / patch / control (purpose TBD) |
| 112 | 00 | **0E00** | all-zero payload at **0x0E00** — the console test-option byte (explains the idle-HLT: no test selected) |
| 113 | 07 | 0000 | short record near address 0 (entry/jump area) |
| 114 | 20 | 1040 | trailing record |
The card-112 → 0x0E00 correspondence is consistent with the observed run halting in the "no test selected" branch when that byte is zero.
| Claim | How verified | Confidence |
|---|---|---|
.cap = 2 dumps (hex + hole-art), 228 = 114+114 | header count + main.c dump code | high |
| cap bit b = row b; bit 10 absent | BITMASK 0x1BFF, GPIO init skips 10 | high |
TC_NORMAL table | exact match funktionalcpu.cap→<tt>.bin (9120 B) | high |
TC_COLBIN B2R map | decodes to valid GE code; program runs to HLT | high |
TC_HEX nibble-sum | loader listing match (38/40 bytes) | medium |
| Data-card layout (LL/addr/66 B/prefix) | direct decode of 106 cards, addresses step 66 | high |
| col-79 meaning | pattern observed, semantics not pinned | low |
| Control-card roles (1–5, 112–114) | addresses decoded, purpose inferred | medium |
40,20,10,08,04,02,01,00…; is it a sequence number, a checksum, or a column-sync mark? (low)TC_COLBIN; do any GE program decks use them (e.g. for a sign/flag)? (open)0x0404 overlapping the main image. (medium)TC_HEX 2-byte discrepancy** — confirm whether OCR noise or a real end-of-record descriptor. (medium)cap.c → per-card arrays of 80 × 13-bit columns.TC_HEX; text → TC_NORMAL; program payload → TC_COLBIN.LL, bytes 9–10 = load address, bytes 11..(11+LL) = LL+1 payload bytes; write them to mem[address …].0x0100 (or the deck's entry jump).The gdis depuncher implements exactly this and can write the reconstructed image as a gemu-loadable unified binary (gdis --image -o deck.bin deck.cap; see docs/binformat.md), bridging a Hollerith .cap deck to the binary load path (./ge deck.bin).
When a base deck needs extra payload cards appended, use tools/capcat:
capcat copies the base deck's parsed hex-token cards, auto-detects its dominant 8-byte program-card prefix, and emits appended self-addressed COLBIN cards (LL, load address, payload). The result is a real .cap medium that can be scatter-loaded (./ge combined.cap) or run through the authentic reader path (./ge --deck combined.cap).
Findings derived from the project's own reader firmware and the gemu transcoder, cross-checked against funktionalcpu.cap/.bin. The column-binary (B2R) decode was solved 2026-05-28; the program-card layout was re-verified by direct decode for this document.