C64 NMI Setup: Non-Maskable Interrupt Guide
Overview
Documentation covering Non-Maskable Interrupts (NMIs) for Commodore 64 development remains sparse across available resources. This reference guide addresses that gap by exploring NMI functionality, an advanced interrupt mechanism that operates in parallel with standard raster interrupts, and gives assembly programmers thorough implementation guidance.
The Non-Maskable Interrupt represents one of the 6502 processor’s most powerful yet underutilized features on the Commodore 64. Unlike standard IRQs that can be disabled through the SEI instruction, NMIs cannot be masked by software—hence their designation as “non-maskable.” This characteristic makes them invaluable for timing-critical operations where guaranteed execution takes precedence over flexibility.
Within the C64’s architecture, NMIs serve multiple potential roles: they can function as a secondary interrupt system running independently from raster-based IRQs, provide ultra-precise timing for audio synchronization, or enable complex multi-layer visual effects requiring sub-scanline precision. Professional demoscene productions frequently exploit NMI capabilities to achieve effects impossible through conventional interrupt handling alone.
NMI Fundamentals
NMIs operate through CIA Chip #2’s timer system rather than the VIC-II’s raster-triggered IRQ mechanism. The key distinction: NMIs activate based on accumulated CPU cycles instead of vertical screen positioning. This fundamental difference creates both opportunities and challenges for developers.
The 6502 processor’s NMI line connects to CIA#2’s interrupt output through hardware logic. When CIA#2’s Timer A or Timer B underflows (counts down past zero, at which point it reloads from its latch and then either stops in one-shot mode or keeps running in continuous mode), the chip can trigger an NMI by pulling the processor’s NMI pin low. The processor completes its current instruction, pushes the program counter and status register to the stack, then vectors through addresses $FFFA-$FFFB to the NMI handler.
The hardware priority system ensures NMIs always preempt active IRQ handlers. If an IRQ routine is executing when an NMI triggers, the processor suspends IRQ processing immediately, services the NMI, then resumes the interrupted IRQ handler. This hierarchical execution model enables sophisticated multi-threaded programming patterns—a critical capability for complex visual effects requiring both raster-synchronized and cycle-accurate timing simultaneously.
One important consideration involves the NMI edge detection mechanism. The 6502 triggers on the falling edge of the NMI signal, not the level. This means the NMI line must return high before another NMI can register. Reading the CIA#2 interrupt control register ($DD0D) acknowledges pending interrupts and releases the NMI line, enabling subsequent triggers. Failure to properly acknowledge NMI sources creates conditions where subsequent interrupts appear to be ignored.
Cycle Timing
PAL system timing calculations follow this pattern:
target raster lines × 63 cycles per line = cycle count
A practical example: achieving a 10-line interval requires configuring 630 CPU cycles (10 × 63 = 630).
Understanding the relationship between CPU cycles and display timing proves essential for effective NMI programming. The PAL C64 operates at 985,248 Hz (approximately 0.985 MHz), executing one cycle per clock period. Each complete video frame spans 312 raster lines, with each line consuming exactly 63 CPU cycles. Simple multiplication yields 19,656 cycles per frame—a critical value for wraparound calculations.
NTSC systems operate at 1,022,727 Hz with 65 cycles per line across 263 lines per frame, totaling 17,095 cycles. Developers targeting both standards must account for these timing differences, either through runtime detection and adjustment or by providing separate code paths for each video standard.
The 16-bit timer registers ($DD04-$DD05 for Timer A, $DD06-$DD07 for Timer B) accept values from 0 to 65,535. Since timers count down and trigger on underflow (transition from 0 to $FFFF), the actual delay equals the loaded value plus one cycle. Loading $0000 creates a one-cycle delay; loading $FFFF creates a 65,536-cycle delay—sufficient for approximately 3.3 PAL frames or 3.8 NTSC frames.
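As a rough sketch of the arithmetic above, the fragment below derives a Timer A latch value for a 10-line interval on either video standard. It uses generic 6502 syntax (ACME/64tass style), and the label names are illustrative. It assumes the KERNAL has already set its PAL/NTSC flag at $02A6 (0 = NTSC, 1 = PAL), and it subtracts one from each cycle count because the delay equals the loaded value plus one.

```asm
TIMER_A_LO = $DD04            ; CIA#2 Timer A latch, low byte
TIMER_A_HI = $DD05            ; CIA#2 Timer A latch, high byte
PAL_FLAG   = $02A6            ; KERNAL flag: 0 = NTSC, 1 = PAL

PAL_LATCH  = 10 * 63 - 1      ; 629, giving 630 cycles between underflows
NTSC_LATCH = 10 * 65 - 1      ; 649, giving 650 cycles between underflows

SET_INTERVAL
        LDA PAL_FLAG
        BEQ USE_NTSC          ; flag is 0 on NTSC machines
        LDA #<PAL_LATCH
        LDX #>PAL_LATCH
        JMP WRITE_LATCH
USE_NTSC
        LDA #<NTSC_LATCH
        LDX #>NTSC_LATCH
WRITE_LATCH
        STA TIMER_A_LO        ; low byte first
        STX TIMER_A_HI        ; then high byte
        RTS
```

Selecting the value once at startup keeps the handlers themselves free of standard-specific branches.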
Precise timing calculations must account for several additional factors: the latency between timer underflow and NMI handler entry (7 cycles for the processor’s NMI response sequence), any jitter introduced by instruction boundaries (the 6502 completes its current instruction before responding, adding 0-7 cycles of variance), and the overhead of the handler’s entry sequence itself. Professional implementations employ jitter compensation techniques to achieve cycle-exact timing despite these variables.
Setup Procedure
Recommended initialization follows three sequential phases:
- Configure a temporary raster interrupt targeting a specific scanline
- Trigger NMI chain initialization from within that IRQ handler
- Deactivate the bootstrap interrupt and establish primary IRQ sequencing
This phased approach ensures NMI timing synchronizes with the display frame from the first trigger. Direct initialization from main code produces unpredictable starting positions relative to screen rendering, potentially causing visual artifacts during the initial frames of execution.
Phase 1: Bootstrap IRQ Configuration
Establish a standard raster interrupt targeting a scanline safely above your intended NMI operating region. The bootstrap handler will execute once, configure the NMI system, then deactivate itself. This approach guarantees consistent frame-relative timing for all subsequent NMI triggers.
Phase 2: NMI System Activation
Within the bootstrap IRQ handler, configure CIA#2’s timer registers with your calculated initial delay value. Enable NMI generation by writing to $DD0D with bit 7 set (source enable) and the appropriate timer bit (bit 0 for Timer A, bit 1 for Timer B). Start the timer by writing $DD0E (Timer A) or $DD0F (Timer B): bit 0 starts the timer and bit 3 selects the run mode (0 = continuous, 1 = one-shot). Chained NMI sequences typically use continuous mode, in which the timer reloads from its latch on every underflow and keeps counting while the handler runs.
Phase 3: Transition to Production Configuration
After NMI initialization completes, the bootstrap handler reconfigures itself as your primary raster interrupt handler (or removes itself entirely if NMIs alone handle all timing requirements). The NMI system now operates independently, triggering at calculated intervals regardless of main code execution or IRQ activity.
A critical implementation detail: the KERNAL’s default NMI handler at $FE47 checks for the RESTORE key and performs cartridge detection. Production code must redirect the NMI vector at $0318-$0319 to bypass KERNAL processing, or operate with KERNAL ROM disabled by configuring the processor port at address $01. With the ROM switched out, the CPU reads the hardware vector at $FFFA-$FFFB from RAM, so that vector must point at the handler directly.
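The three phases can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the KERNAL ROM stays enabled (so the $0314 and $0318 RAM vectors are used), a PAL machine, Timer A in continuous mode, and generic ACME/64tass-style syntax. BOOT_LINE, FIRST_DELAY, ZP_HOLD_A, and all labels are hypothetical.

```asm
BOOT_LINE   = 40              ; hypothetical bootstrap raster line
FIRST_DELAY = 10 * 63 - 1     ; hypothetical first interval (10 PAL lines)
ZP_HOLD_A   = $FB             ; hypothetical zero-page save slot

INSTALL                       ; Phase 1: set up the bootstrap raster IRQ
        SEI
        LDA #$7F
        STA $DC0D             ; CIA#1: disable all IRQ sources
        STA $DD0D             ; CIA#2: disable all NMI sources
        LDA $DC0D             ; acknowledge anything pending
        LDA $DD0D
        LDA #BOOT_LINE
        STA $D012             ; raster compare, low 8 bits
        LDA $D011
        AND #$7F
        STA $D011             ; raster compare bit 8 = 0
        LDA #<BOOT_IRQ
        STA $0314             ; KERNAL IRQ vector
        LDA #>BOOT_IRQ
        STA $0315
        LDA #$01
        STA $D019             ; clear a stale raster flag
        STA $D01A             ; enable the raster interrupt source
        CLI
        RTS

BOOT_IRQ                      ; Phase 2: runs once at BOOT_LINE
        LDA #<NMI_HANDLER
        STA $0318             ; redirect the KERNAL NMI vector
        LDA #>NMI_HANDLER
        STA $0319
        LDA #<FIRST_DELAY
        STA $DD04             ; Timer A latch, low byte
        LDA #>FIRST_DELAY
        STA $DD05             ; Timer A latch, high byte
        LDA #$81
        STA $DD0D             ; bit 7 set + bit 0: enable Timer A as NMI source
        LDA #$11
        STA $DD0E             ; force-load latch and start, continuous mode
        LDA #$00              ; Phase 3: retire the bootstrap
        STA $D01A             ; disable the raster interrupt source
        LDA #$01
        STA $D019             ; acknowledge this raster interrupt
        JMP $EA81             ; KERNAL exit: pull Y, X, A and RTI

NMI_HANDLER
        STA ZP_HOLD_A
        LDA $DD0D             ; acknowledge CIA#2 so the next NMI registers
        INC $D020             ; visible marker: change border color
        LDA ZP_HOLD_A
        RTI
```

Because this sketch turns off both the CIA#1 timer and the raster source, no IRQs run afterwards; a real program would instead repoint $0314 at its production raster handler in Phase 3.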
Register Preservation
Zero-page storage outperforms stack operations for register preservation:
; Handler entry (3 cycles)
STA ZP_NMI_HOLD_A
; Handler exit (3 cycles)
LDA ZP_NMI_HOLD_A
; Combined: 6 cycles versus 7 cycles for PHA (3) plus PLA (4)
The cycle savings come from two sources. Pulling from the stack (PLA, PLP) takes 4 cycles against 3 for a zero-page load, and X and Y can only reach the stack through A, which adds a 2-cycle transfer (TXA/TYA on entry, TAX/TAY on exit) per register. Zero-page addressing stores and loads each register directly in 3 cycles.
For complete register preservation, allocate three dedicated zero-page locations for A, X, and Y register storage. The entry sequence becomes:
STA ZP_HOLD_A ; 3 cycles
STX ZP_HOLD_X ; 3 cycles
STY ZP_HOLD_Y ; 3 cycles
; Total: 9 cycles versus 13 for PHA/TXA/PHA/TYA/PHA (18 versus 29 including the restore)
Exit sequences mirror entry; because each register has its own zero-page slot, the restore order does not matter. The processor status register needs no explicit preservation: the CPU pushes it onto the stack automatically on NMI entry, and RTI restores it on exit.
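Putting entry and exit together, a complete handler skeleton might look like the sketch below. The zero-page addresses $FB-$FD are an assumption (they are normally unused by BASIC and the KERNAL), and the syntax is generic ACME/64tass style.

```asm
ZP_HOLD_A = $FB               ; hypothetical free zero-page bytes
ZP_HOLD_X = $FC
ZP_HOLD_Y = $FD

NMI_HANDLER
        STA ZP_HOLD_A         ; 3 cycles
        STX ZP_HOLD_X         ; 3 cycles
        STY ZP_HOLD_Y         ; 3 cycles
        LDA $DD0D             ; acknowledge CIA#2 so the next NMI can register
        ; ... handler work goes here ...
        LDY ZP_HOLD_Y         ; 3 cycles
        LDX ZP_HOLD_X         ; 3 cycles
        LDA ZP_HOLD_A         ; 3 cycles
        RTI                   ; 6 cycles; restores status and PC from the stack
```

Fixed save slots are not reentrant: if a second NMI arrives before this handler returns, the saved values are overwritten, so each handler in a chain must finish before the next trigger.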
Some implementations eliminate register preservation entirely by dedicating specific registers to NMI-exclusive use. If the NMI handler always uses the X register for indexing and never requires the main program’s X value, preservation overhead disappears. This approach demands careful coordination between main code and interrupt handlers but maximizes available cycles for actual processing.
Sequential NMI Handlers
An essential complexity: each NMI routine configures the cycle delay for the subsequent trigger rather than its own. This produces cascading dependencies:
- Handler #1 configures delay preceding Handler #2
- Handler #2 configures delay preceding Handler #3
- Pattern continues through the chain…
This forward-configuration model creates a conceptual shift from standard raster interrupt programming. With raster IRQs, each handler typically configures its own trigger line for the next frame. With NMI chains, handlers configure the timing for the next handler in sequence, not themselves. The final handler in the chain configures the wraparound delay that positions the first handler correctly for the subsequent frame.
Chain Implementation Patterns
Two primary approaches exist for managing NMI handler chains:
Vector Table Method: Maintain a table of handler addresses indexed by a chain position counter. Each handler increments the counter (with wraparound) and updates the NMI vector to point to the next handler. This approach offers flexibility—handlers can be added, removed, or reordered by modifying table contents—but incurs overhead for vector updates and counter management.
Self-Modifying Method: Each handler contains hardcoded instructions to set the NMI vector directly to the next handler’s address. The final handler sets the vector back to the first handler. This eliminates table lookup overhead but creates tightly coupled code requiring careful maintenance when modifying the chain structure.
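A minimal two-handler sketch of the hardcoded approach (the one the text calls the Self-Modifying Method) is shown below. It assumes the KERNAL ROM is enabled so the $0318 vector is used, that the timer is already running, and it leaves out the timer latch writes to keep the linking visible. Labels and the border-color placeholders are illustrative.

```asm
NMI_VECTOR = $0318            ; KERNAL NMI vector (high byte at $0319)
ZP_HOLD_A  = $FB              ; hypothetical zero-page save slot

NMI_FIRST
        STA ZP_HOLD_A
        LDA $DD0D             ; acknowledge CIA#2
        LDA #<NMI_SECOND      ; hardcoded link to the next handler
        STA NMI_VECTOR
        LDA #>NMI_SECOND
        STA NMI_VECTOR+1
        INC $D020             ; placeholder for the real work
        LDA ZP_HOLD_A
        RTI

NMI_SECOND
        STA ZP_HOLD_A
        LDA $DD0D             ; acknowledge CIA#2
        LDA #<NMI_FIRST       ; last handler links back to the first
        STA NMI_VECTOR
        LDA #>NMI_FIRST
        STA NMI_VECTOR+1
        DEC $D020             ; placeholder for the real work
        LDA ZP_HOLD_A
        RTI
```

If every handler sits in the same 256-byte page, the high-byte write can be dropped from each handler, saving 6 cycles per link.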
Wraparound Calculation
Determining the final wraparound interval:
Frame cycles total − (accumulated handler gaps) − (handler count) = wraparound value
PAL example: With 19,656 frame cycles (63 × 312), gaps totaling 6,040, and 5 handlers, the wraparound equals 19,656 − 6,040 − 5 = 13,611 cycles.
The calculation requires accounting for every cycle consumed within the frame. “Handler gaps” represents the sum of all inter-handler delays configured throughout the chain. “Handler count” accounts for the one-cycle timer underflow behavior (timers trigger on the transition from 0, not when reaching 0). Accurate wraparound values ensure the first handler triggers at the same raster position each frame, maintaining stable visual output.
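The PAL example can be expressed as assembler constants so the wraparound value tracks any change to the gaps. This is a sketch in generic ACME/64tass-style syntax; the constant names are illustrative, and the fragment is meant to sit inside the final handler after A has been saved.

```asm
CYCLES_PER_LINE = 63
LINES_PER_FRAME = 312
FRAME_CYCLES    = CYCLES_PER_LINE * LINES_PER_FRAME          ; 19656
GAP_TOTAL       = 6040        ; sum of the inter-handler delays
HANDLER_COUNT   = 5
WRAP_DELAY      = FRAME_CYCLES - GAP_TOTAL - HANDLER_COUNT   ; 13611 = $352B

SET_WRAP_DELAY                ; fragment for the final handler
        LDA #<WRAP_DELAY      ; $2B
        STA $DD04             ; Timer A latch, low byte
        LDA #>WRAP_DELAY      ; $35
        STA $DD05             ; Timer A latch, high byte
        ; ... rest of the handler follows ...
```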
Practical debugging often reveals off-by-one or off-by-few-cycles errors in wraparound calculations. Visual indicators include handlers that drift slowly downward (wraparound too large) or upward (wraparound too small) across frames. Single-cycle errors accumulate to one-scanline drift every 63 frames on PAL systems—subtle but visible during extended observation.
Dynamic Chain Modification
Advanced implementations modify NMI chains during runtime—enabling or disabling handlers based on game state, adjusting timing for different screen regions, or reconfiguring entirely for scene transitions. Such modifications require careful synchronization to prevent chain corruption. The safest approach disables NMI generation momentarily, applies modifications atomically, then re-enables with appropriate delay values.
Implementation Challenges
Raster Interrupt Blocking
Standard IRQ processing cannot preempt active NMI execution, because the CPU sets the interrupt-disable flag whenever it enters an interrupt handler. A raster interrupt that falls due during NMI handling stays pending and is serviced late, potentially disrupting the entire IRQ sequence.
This interaction creates critical timing constraints. If an NMI handler executes during the precise scanline when a raster IRQ should trigger, the IRQ remains pending but unserviced until the NMI completes. By that point, the raster beam has moved past the intended trigger line, and the IRQ handler executes at an incorrect screen position—or misses its window entirely if the handler includes position-dependent logic.
Mitigation strategies include: designing NMI handlers to complete before approaching critical raster positions, scheduling NMIs to fall between raster interrupt trigger points, or implementing hybrid architectures where NMIs handle time-critical operations and raster IRQs manage less position-sensitive tasks. Some implementations avoid the conflict entirely by using NMIs as the sole interrupt source, eliminating raster IRQs completely.
Positional Drift
NMIs differ from raster interrupts in recovery behavior. When handlers exceed allocated time, NMI positions shift progressively downward on subsequent frames rather than recovering within the same frame.
Consider the mechanism: raster interrupts trigger at absolute screen positions defined by the $D012/$D011 registers. If a raster handler runs long, subsequent handlers in the same frame trigger late, but the next frame’s first handler still triggers at its configured absolute position—the system “resets” each frame.
NMI timing operates on accumulated cycles with no frame-relative anchor. If handler execution consumes more cycles than allocated, those excess cycles delay all subsequent triggers. The delay accumulates across handlers and persists into the next frame, creating visible downward drift. Without correction mechanisms, drift continues indefinitely until the chain wraps completely around the frame—a dramatic failure mode visible as rapidly scrolling handler positions.
Prevention requires conservative handler timing budgets with safety margins, or implementing drift detection and correction logic. Detection typically involves comparing actual handler raster positions against expected values (readable from $D012) and adjusting wraparound delays to compensate. This creates a feedback loop that maintains stable positioning despite minor timing variations.
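One possible shape for such a feedback loop is sketched below, for the handler that writes the wraparound delay. It is an assumption-laden illustration rather than a tested routine: it detects only whole-line drift, it assumes the handler's expected raster line is below 256, that A and X are already saved, and it reuses the PAL wraparound value from the earlier example. All names are hypothetical.

```asm
EXPECTED_LINE = 250           ; hypothetical line where this handler should run
LINE_CYCLES   = 63            ; PAL
WRAP_DELAY    = 13611         ; nominal wraparound latch value
WRAP_SHORT    = WRAP_DELAY - LINE_CYCLES   ; pulls the chain one line earlier
WRAP_LONG     = WRAP_DELAY + LINE_CYCLES   ; pushes the chain one line later

ADJUST_WRAP                   ; fragment: A and X already saved
        LDA $D012             ; current raster line, low 8 bits
        CMP #EXPECTED_LINE
        BEQ ON_TIME
        BCC RAN_EARLY         ; current line < expected line
        LDA #<WRAP_SHORT      ; ran late: shorten this frame's wraparound
        LDX #>WRAP_SHORT
        JMP WRITE_WRAP
RAN_EARLY
        LDA #<WRAP_LONG       ; ran early: lengthen this frame's wraparound
        LDX #>WRAP_LONG
        JMP WRITE_WRAP
ON_TIME
        LDA #<WRAP_DELAY      ; on target: use the nominal value
        LDX #>WRAP_DELAY
WRITE_WRAP
        STA $DD04             ; Timer A latch, low byte
        STX $DD05             ; Timer A latch, high byte
        ; ... rest of the handler follows ...
```

Because the nominal value is rewritten on every frame, each correction applies once and the chain settles back to WRAP_DELAY as soon as the handler lands on its expected line.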
Optimization Strategies
Jitter Compensation
CIA#1’s Timer A enables precise stabilization through the “Inverted Timer Method”—measuring execution variance and routing through instruction sequences of varying lengths to achieve cycle-accurate timing.
The technique exploits timer countdown behavior to measure jitter precisely. Before critical timing sections, start a short-duration timer. Upon NMI entry (or at any synchronization point), read the timer value. This value represents accumulated jitter—the variance in execution timing due to instruction boundary effects and interrupt response latency.
The measured jitter value becomes an index into a delay table. Each table entry contains a series of NOP instructions or other delay-producing code, with lengths calibrated to compensate for the corresponding jitter amount. Executing through the appropriate table entry consumes the variable cycles needed to achieve consistent total timing, regardless of initial variance.
Implementation complexity lies in table construction: each entry must delay precisely the right number of cycles, and the table-lookup mechanism itself consumes cycles that factor into the calibration. Standard implementations require careful measurement and adjustment during development to achieve true cycle-accuracy.
Exit Optimization
Standard BIT $DD0D + RTI exit sequences can be improved: store #$40 (the RTI opcode) at $DD0C, then execute JMP $DD0C to eliminate one cycle per handler exit.
The underlying principle exploits CIA register memory mapping. Addresses $DD0C (the CIA#2 serial data register) and $DD0D occupy adjacent memory locations. Reading $DD0D acknowledges the NMI interrupt (necessary to re-enable subsequent triggers). When the processor fetches the RTI opcode ($40) from $DD0C, its second cycle performs a discarded read of the following byte at $DD0D, as it does for every single-byte instruction. That implicit read acknowledges the interrupt, so the 4-cycle BIT $DD0D is replaced by a 3-cycle JMP.
The savings appear modest—one cycle per handler exit—but accumulate across complex systems. A five-handler chain executing 50 frames per second saves 250 cycles per second. In cycle-constrained implementations, such savings may enable additional features or improve timing margins.
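A sketch of the technique as described above, in generic ACME/64tass-style syntax. It assumes nothing else in the program writes to $DD0C (for example, serial-port code), and the labels and zero-page slot are hypothetical.

```asm
ZP_HOLD_A = $FB               ; hypothetical zero-page save slot

PREPARE_EXIT                  ; call once during setup
        LDA #$40              ; opcode for RTI
        STA $DD0C             ; park it in the CIA#2 serial data register
        RTS

NMI_HANDLER
        STA ZP_HOLD_A
        ; ... handler work, with no explicit read of $DD0D ...
        LDA ZP_HOLD_A
        JMP $DD0C             ; 3 cycles; the RTI at $DD0C (6 cycles) also
                              ; reads $DD0D, acknowledging the NMI
```

The exit costs 9 cycles (JMP plus RTI) against 10 for BIT $DD0D followed by RTI.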
Undocumented Opcodes
The 6502 processor responds to all 256 possible opcode values, though only 151 are officially documented. Many undocumented opcodes perform useful combined operations at reduced cycle counts. Within NMI handlers, opcodes like LAX (load A and X simultaneously), SAX (store A AND X), or ANC (AND with carry update) provide cycle savings unavailable through documented instructions.
Careful consideration applies: undocumented opcode behavior varies slightly across processor revisions, and some opcodes produce unpredictable results. Well-characterized “safe” undocumented opcodes appear in established 6502 references, but testing across target hardware variants remains advisable for production code.
CIA#2 Register Map
| Address | Function |
|---|---|
| $DD04-$DD05 | Timer A interval (inter-NMI cycle count) |
| $DD06-$DD07 | Timer B interval |
| $DD0D | Interrupt control/status (bit 7 indicates NMI state) |
| $DD0E/$DD0F | Timer A/B configuration |
Performance Trade-offs
NMIs provide sub-scanline timing precision exceeding standard raster interrupts, which operate at full-line resolution. This granularity requires significantly more complex implementation and debugging.
The precision advantage manifests in several ways. Raster interrupts trigger once per scanline at most, giving 312 possible positions per frame on PAL systems. NMI timers subdivide time into individual CPU cycles, providing 19,656 possible trigger points per frame. This 63× increase in temporal resolution enables effects requiring mid-scanline timing changes, such as sprite multiplexing within horizontal bands narrower than sprite height.
However, increased precision creates proportionally increased complexity. Debugging NMI timing issues requires cycle-level analysis rather than scanline-level observation. Visual symptoms of NMI problems often appear subtle—slight positional jitter, intermittent glitches, or gradual drift visible only over extended observation periods. Development time investment scales accordingly.
When to Use NMIs
NMI-based architectures prove advantageous when:
- Multiple precisely-timed operations must occur within single scanlines
- Timer-based timing provides more natural program structure than position-based triggering
- Guaranteed interrupt execution (non-maskable property) outweighs flexibility concerns
- Combined NMI/IRQ architectures enable complexity beyond either system alone
Raster interrupts remain preferable when position-synchronized operations dominate requirements, when simpler debugging justifies slightly reduced capabilities, or when team familiarity with IRQ programming exceeds NMI experience.
Practical Applications
Common NMI applications in C64 development include:
Sprite Multiplexing: NMIs excel at repositioning sprites during screen rendering. The precise timing enables “split” techniques where sprites are reconfigured every few scanlines, dramatically increasing effective sprite counts.
Audio Synchronization: Music players requiring consistent playback timing benefit from NMI-based scheduling, executing update routines at fixed intervals independent of main program complexity.
Multi-Layer Parallax: Complex scrolling systems with numerous speed layers require many screen splits. NMI chains can manage a dozen or more parallax boundaries where raster IRQ chains become unwieldy.
Hybrid Processing: Combining NMIs for timing-critical sprite operations with raster IRQs for position-critical color changes leverages the strengths of both interrupt types.
Summary
Despite increased implementation difficulty compared to raster interrupts, NMIs unlock advanced multi-interrupt architectures essential for complex visual effects—particularly multi-layer parallax scrolling requiring numerous precise screen splits. The investment in understanding NMI mechanics pays dividends across ambitious Commodore 64 projects, enabling visual sophistication that distinguishes professional-quality productions from conventional implementations.
Developers approaching NMI programming for the first time should expect an extended learning curve. Prototype implementations, thorough testing, and incremental complexity increases produce better outcomes than attempting full NMI architectures immediately. The techniques documented here provide a foundation, but practical experience remains essential for mastery.
See also: interrupt register preservation · sprite multiplexing via frame alternation · Deep Winter prototype using NMI-driven sprites