The Wild Wood Deconstructed
- IRST-NMI hybrid interrupt engine
- Fast colour scroll
- Interwoven char scrolling
- Scroll tables
- Maintaining a stable raster
- Avoiding sprite jitter / splitting
- Sprite data streaming
- Future optimisations
- Conclusion & some little requests
As subscribers to my newsletter will already know, over the summer I took a break from coding
Parallaxian to help out my very good friend, John Henderson,
with his magnificent game in development for the Commodore 64, The Wild Wood.
My role in the project is to code - after I finish Parallaxian - the "chase levels", which feature ambitious parallax scrolling landscapes and high octane gameplay. A preview video of the first such level (the Moonlit Fields) was recently released as a tweet on Twitter and in higher quality on YouTube (see the clip below).
I felt that, since the core tech for this kind of parallax scrolling game world already existed for my own projects, it would be a relatively straightforward matter to port it
into John's game.
As things transpired, it was a little more complicated than that; it always is with C64 programming!
So, to coincide with the preview video release, this article delves into the technical side of the game to reveal some of the secrets used to squeeze 8 layers of parallax scrolling (most of which are full colour scrollers) into each screen refresh along with extensive sprite multiplexing, collision detection routines, sprite feeder + scroller routines, game logic and music.
Of course, these tricks or hacks or whatever one wishes to call them are built on the foundations of using non-standard opcodes for additional speed gains where possible and that longstanding staple of fast coding on the C64 we call the unrolled loop (or "speedcode" as some like to describe it).
IRST-NMI HYBRID INTERRUPT ENGINE
As stated in previous blog posts, both of my next generation Commodore 64 games in development (Parallaxian and Deep Winter) use a hybrid NMI-IRST interrupt configuration (NMI = Non-Maskable Interrupt, IRST = Raster Interrupt).
This "dual threaded" interrupt approach facilitates sprite multiplexing that overlaps parallax scrolling landscapes in a very CPU-cycle-efficient fashion that would be impossible if we were to rely on the IRST alone to perform all raster-precise on-screen tasks.
This is because the NMI takes precedence over any IRST code being executed, meaning that when an NMI is triggered, the CPU suspends running the IRST handler code and instead executes the NMI handler code, not resuming the IRST handler code until the NMI handler code finishes executing.
In plainer terms, the NMI is an interrupt that interrupts other interrupts.
The practical, real world application of this as used in my games - and now also, as used in The Wild Wood chase levels - is to deploy the NMI to interrupt lengthy IRST tasks, specifically the time-consuming char and colour RAM scrolling, by nipping in to plex sprites and out again ultra rapidly, without any risk of stalling the IRST or crashing the system.
Consider, then, the interrupt schema below.
The thin raster bars represent NMI instances firing at fixed screen positions to plex the large tree sprites, regardless of what's going on with the IRST and are, as you might expect, 21 raster lines apart, since 21 pixels is the height of an unexpanded (in y-direction) sprite.
The sky blue area that most of them slice through represents the character and colour RAM scroll code for the foreground foliage (the 6th layer of parallax scrolling), executed when required by the IRST handler for the 3rd layer of the landscape (remember, we never perform scroll code for any given layer within that layer's own IRST zone as that would cause jerkiness / flicker).
The NMIs cut through the IRST code like a hot knife through butter, performing the sprite plexing ultra fast before leaving to allow the IRST code to continue where it left off.
In very emphatic terms, therefore, the IRST-NMI hybrid interrupt approach is a veritable game-changer... in both metaphorical and literal senses!
SPECIAL TERM: INTERRUPT HANDLER
This is merely the code that is executed to perform whatever useful tasks are assigned to it when the interrupt fires.
For split-screen scenarios, each split will typically have its own unique handler which, after it executes its main task(s), alters the interrupt vectors so that the unique handler for the next screen split will begin executing as soon as the raster reaches the designated trigger position (in the y-axis) of the next interrupt handler.
With a raster interrupt (IRST), the next trigger position is set by a write to the raster register, $D012, with a value equal to the line on which the next screen split must fire, whereas with a timer interrupt (the NMI or default IRQ) it's a case of altering the relevant CIA chip's timer settings to match the number of raster lines (or more precisely, the equivalent number of CPU clock cycles) required before the next split is to occur.
When the final handler (i.e. at the bottom of the screen) has finished its main task(s), it sets the vectors to the very first handler allocated to the top of the screen and the process repeats ad infinitum.
(Back "in the day", before the internet was a thing in the modern sense, there was often no consensus of terminology and so a lot of coders - myself included - would simply describe handlers as "interrupts" and leave it at that).
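To put numbers on the timer side of that, here is a small sketch in Python (used purely as a calculator, not as C64 code) of how a gap of 21 raster lines translates into a CIA timer latch value. It assumes a PAL machine with 63 CPU cycles per raster line and the usual CIA behaviour of underflowing every latch + 1 cycles; the function name is hypothetical and not taken from the game's source.

```python
PAL_CYCLES_PER_LINE = 63  # assumption: PAL C64 (NTSC machines differ)

def cia_latch_for_lines(lines, cycles_per_line=PAL_CYCLES_PER_LINE):
    """Return (lo, hi) latch bytes for a timer that fires every `lines` raster lines."""
    cycles = lines * cycles_per_line
    latch = cycles - 1          # a CIA timer underflows every latch + 1 cycles
    if not 0 <= latch <= 0xFFFF:
        raise ValueError("interval does not fit a 16-bit CIA timer")
    return latch & 0xFF, latch >> 8

# 21 lines = one unexpanded sprite height between NMIs.
lo, hi = cia_latch_for_lines(21)
# On real hardware these bytes would go to CIA #2 Timer A ($DD04 / $DD05).
print(f"lo=${lo:02X} hi=${hi:02X}")
```

Because the CIA counts system clock cycles regardless of what the CPU is doing, a fixed latch like this keeps the NMIs evenly spaced without any per-frame recalculation.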
FAST COLOUR SCROLL
Whereas it is possible to drastically reduce the raster time consumed in character scrolling by using a double-buffering approach, as described in an earlier blog article of mine, it is impossible to apply the same methodology to the colour RAM as that is always at a fixed location on the Commodore 64: $D800 - $DBE7.
(NOTE: The Commodore 128, however, does allow the colour RAM to be switched, thus opening the door to double-buffered colour RAM scrolling on that machine in its native mode).
There is no natural remedy for this (with regard to horizontal scrolling) on the Commodore 64 beyond the VSP scroll, as most notably used in Mayhem in Monsterland, but that technique appears in some quarters to have fallen into disrepute due to the risk it poses to real C64 hardware, despite there being a (somewhat cumbersome) workaround.
However, in scenarios where there are long blocks of continuous colour, for example, 8+ chars wide in any given row, there is another way.
Essentially, this method entails only scrolling the start and end positions of a colour block on any given row of characters.
So, instead of scrolling the entire row of colour RAM values in sync with the characters, we just scroll the interfaces between continuous colour blocks (i.e. the "ends" of each such block), giving the illusion of a full row's worth of colour RAM scrolling but with the advantage of far less raster time being consumed.
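The idea can be modelled off-target in a few lines of Python. This is only a sketch under my own assumptions, not the game's code: one 40-column row of colour values, a list of tracked block boundaries (the hypothetical `edges`), and a count of colour RAM writes per scroll step so the saving is visible.

```python
WIDTH = 40  # columns in one row of colour RAM

def full_scroll(row, incoming):
    """Vanilla method: every one of the 40 colour cells is rewritten."""
    return row[1:] + [incoming]

def find_edges(row):
    """Columns where a new colour block starts (column 0 is implicit)."""
    return [x for x in range(1, len(row)) if row[x] != row[x - 1]]

def edge_scroll(row, edges, incoming):
    """Fast method: rewrite only the cell just left of each block boundary."""
    writes = 0
    for x in edges:                      # ascending, so row[x] is still unmodified
        row[x - 1] = row[x]
        writes += 1
    edges[:] = [x - 1 for x in edges if x > 1]   # boundaries move one column left
    if incoming != row[-1]:              # a new block enters on the right
        row[-1] = incoming
        edges.append(len(row) - 1)
        writes += 1
    return writes

# Assumed example row: green (5) and light green (13) blocks.
reference = [5] * 12 + [13] * 20 + [5] * 8
fast = list(reference)
edges = find_edges(fast)
total_writes = 0
for incoming in [5, 5, 13, 13, 13, 13, 13, 13, 13, 13]:
    reference = full_scroll(reference, incoming)
    total_writes += edge_scroll(fast, edges, incoming)

print(fast == reference, total_writes, "writes vs", 10 * WIDTH)
```

The model keeps the boundary positions in a list rather than searching for them each frame, since comparing every cell would cost as much as simply copying it.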
In the case of The Wild Wood, this method is used for the rolling hills in the background - not on every row of the hills, though, just on the rows that have appropriately long continuous blocks of the same colour; the unsuitable rows are full-colour scrolled using the vanilla method.
The net outcome is something of the order of 6-8 raster lines saved on the full colour scrolling for those hills, which contributes massively to making this game sequence possible.
(NOTE: I am unaware of any community-received term for this colour scroll hack, hence the best I could think of was the rather unimaginative fast colour scroll!)
This trick - which, by the way, is not used in my other games in development (so far, that is!) - could be combined with double-buffering at a later stage on the hills, if RAM allows, for even more raster time saving on scrolling that layer, but for now, with every byte of RAM in the selected 16K VIC bank being used, the decision was made to hold off on that.
It could also conceivably be used with simple char scrolling, perhaps on a Scramble style game.
INTERWOVEN CHAR SCROLLING
With different screen zones scrolling at different speeds (which is, by definition, parallax scrolling), it becomes possible to synchronise some of the time-consuming char scrolls
(which occur when the 8 pixel horizontal hardware scroll register, $D016, is reset back by 8 pixels) so as to never occur for two different scrolling zones on the same screen frame
(where frame is defined as the entire screen rendering process from top to bottom, which takes 1/50th of a second in PAL machines).
For example, if layer 3 scrolls at 3 pixels per frame at maximum speed, layer 5 might scroll at 6 pixels per frame, i.e., at a fixed ratio of 2:1 between layers 5 and 3 respectively.
And if we set their default hardware scroll positions to be offset from one another by an odd number of pixels, say 3, then no matter what speed the landscape scrolls at, layer 3 and layer 5 should never perform their respective time-consuming char scrolls on the same frame.
With variable running speeds for the hare, though, the door is opened to occasional instances of the supposedly interwoven char scrollers firing on the same frame, so we use a flag which, when raised, postpones one of the char scrollers until the frame following the one that would cause a clash without said flag.
This interwoven approach is thus very useful, because it means we can handle the char scrolling for two parallax layers from within a single on-screen raster zone not much bigger (in the y-direction) than the more time-consuming of the two char scrollers requires in terms of raster time.
By interweaving the char scrolling like this, we might also be able to save some RAM by making the smaller of the two char scrollers a rolled loop (as opposed to unrolled), if doing so still keeps it from taking more raster time than the other scroller it is interwoven with takes.
This, incidentally, is another method I first used in Parallaxian.
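A toy model helps to see both the offset and the postponement flag at work. The Python sketch below is an illustration under assumed numbers, not the game's logic: two layers at 1 and 2 pixels per frame (the same 2:1 ratio as above, but slower), a fine scroll that wraps at 8 pixels, and a hypothetical `interweave` that pushes layer B's char scroll to the next free frame on a clash. It ignores the fact that a real deferral also has to hold the hardware scroll value for that frame.

```python
def char_scroll_frames(speed, offset, frames):
    """Frames on which a layer's fine scroll wraps and a char scroll is due."""
    due = []
    fine = offset % 8
    for frame in range(1, frames + 1):
        fine += speed                # pixels per frame, assumed to be below 8
        if fine >= 8:
            fine -= 8
            due.append(frame)
    return due

def interweave(due_a, due_b):
    """Map frame -> layer, postponing layer B whenever a frame is already taken."""
    taken = set(due_a)
    schedule = {frame: "A" for frame in due_a}
    for frame in due_b:
        while frame in taken:        # the "postpone" flag is raised
            frame += 1
        taken.add(frame)
        schedule[frame] = "B"
    return schedule

slow = char_scroll_frames(speed=1, offset=0, frames=32)
fast_offset = char_scroll_frames(speed=2, offset=3, frames=32)
fast_aligned = char_scroll_frames(speed=2, offset=0, frames=32)

print(sorted(set(slow) & set(fast_offset)))    # clashes with a 3 pixel offset
print(sorted(set(slow) & set(fast_aligned)))   # clashes with no offset
print(interweave(slow, fast_aligned))
```

In this model the 3 pixel offset keeps the two layers apart entirely, while the aligned case collides on every char scroll of the slower layer and relies on the postponement to separate them.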
SCROLL TABLES
The scrolling, though parallax, isn't just a simple matter of moving everything at a 1:2:3:4:5:6:7:8 parallax ratio (actually it's a little different to that in the game, but you get the idea).
There are 4 forward speeds plus stationary for each parallax zone because the hare has to be able to speed up and slow down, not just go from standing still to full speed in one step.
Care has to be taken too, because no matter which of these 4 speeds the hare runs at, the parallax ratio must be maintained.
So, each parallax layer has to have its own unique set of scroll speeds, i.e., a scroll table, decoded each frame for each layer depending on the hare's actual running speed.
To set this up right, we have to think in terms of how many pixels of scrolling each layer can make over, for example, 12 frames and build the table accordingly.
In that example, the speeds might be as follows:
- Speed 0: 0 pixels every 12 frames
- Speed 1: 3 pixels every 12 frames
- Speed 2: 6 pixels every 12 frames
- Speed 3: 9 pixels every 12 frames
- Speed 4: 12 pixels every 12 frames
The idea, then, is to define the parallax scroll speeds in terms of cumulative pixels scrolled over a fixed number of frames and build the speed tables around that, taking care that they increase in consistent increments depending on the hare's speed (in this example, 3 pixels per speed increase) and that they do so while maintaining the overall speed ratios of the parallax landscape.
(It should go without saying that this taxed my concentration at times and had me frantically scribbling out fractions and scroll tables on scraps of paper!)
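For anyone who would rather not do the fractions on scraps of paper, the table-building idea above can be sketched in Python. This is a minimal illustration, assuming a 12-frame cycle and an even spread of the pixels across it; the function names and the second, faster layer are hypothetical and not taken from the game.

```python
FRAMES = 12  # length of one table cycle, as in the example above

def scroll_table(pixels, frames=FRAMES):
    """Per-frame pixel steps that add up to `pixels` over one cycle, spread evenly."""
    return [(i + 1) * pixels // frames - i * pixels // frames
            for i in range(frames)]

def layer_tables(pixels_per_speed_step, speeds=5):
    """One table per hare speed (0 = stationary) for a single parallax layer."""
    return [scroll_table(pixels_per_speed_step * s) for s in range(speeds)]

example_layer = layer_tables(3)   # 0, 3, 6, 9, 12 pixels per 12 frames
faster_layer = layer_tables(6)    # assumed layer moving at twice that rate

for speed, table in enumerate(example_layer):
    print(speed, table, sum(table))
```

Because every table for a layer is derived from the same pixels-per-speed-step figure, the ratio between two layers stays the same at every hare speed, which is the property the hand-built tables have to preserve.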
MAINTAINING A STABLE RASTER
If you are reading this as an experienced coder and were eagle-eyed enough to notice from the preview clips that the hare can jump quite high, traversing several raster splits, you might know how
this was done without inducing jitter / destabilising the raster at the splits.
Then again, you might still be curious to know how it was done, so here goes...
The hare's y-position is read every frame for various purposes, one of which includes using it to modify - on-the-fly - a CPU-cycle delay on the scanline affected by the hare's traverse.
This is necessary because sprites, as you probably know, steal CPU time and act, after a fashion, like another kind of interrupt that can distort or destabilise IRST trigger lines and eat into the raster time required to complete the tasks within an interrupt handler.
The simplest and most efficient workaround for that is to adjust the x-reg timer counting down to zero at the end of the Double IRQ fix used to stabilise the affected IRST trigger line, with sprite-induced jitter / instability shunted off-screen beyond the far right hand side.
With that fix in place, the hare sprite can vertically traverse the split with no adverse jitter effects.
The code snippet below shows what I mean:
IRST2       PHA             ; IRST handler begins here
            ; Double IRQ jitter countermeasure.
IRST2WSET   LDX #$0F        ; value self-modified each frame as f(hare's y-pos)
This is not the only way to achieve the desired effect; some other loop could be set up after the Double IRQ code and dynamically altered in response to the hare's y-position, but why do that when there is already a delay loop in place at the end of the Double IRQ?
AVOIDING SPRITE JITTER / SPLITTING
The large trees and their smaller counterparts, along with signposts, scarecrows, etc., consist of vertically stacked sprites, multiplexed in a very tight, zero-gap order.
This presents a coding challenge, or more accurately, a timing challenge to avoid jitters and tearing effects.
Plex sprites too early within an interrupt handler and they either vanish or leave gaps or splits or artefacts from previous plexings, which is very unsightly (as you can see in the image of a deliberately mistimed sprite rendering setup above) and unacceptable for a commercial game in the modern era - these issues are described in greater detail in this great article on multiplexing.
In this case, the remedy involved the judicious application of NOPs or carefully timed and actually useful, non-multiplexing code to kill CPU time where "too early" sprite-rendering was the concern.
I'm not going to pretend I got this right the first time or anywhere near it; rather, it took a lot of trial and error before it worked as required.
There was also an occasional issue where rendering plexed sprites off-screen on the RHS, where their MSB = 1, led to unwanted "ghost-rendering" of a "rogue byte" of the relevant sprite in the corresponding MSB = 0 side of the screen; the crude but effective fix for that was to point the affected sprites to an empty sprite definition for the duration of the unwanted artefact.
SPRITE DATA STREAMING
An unavoidable issue raised by a detailed landscape such as the one in The Wild Wood preview is insufficient space in the designated VIC bank for all the graphics data.
For example, more tree and woodland silhouette definitions were required than available space could supply, so the solution was simple: stream the additional sprite data in from RAM outside the designated VIC bank.
Again, this is done on-the-fly by swapping the initial definitions with the replacement ones as required.
The swapping process doesn't have to be ultra fast as it is performed by the main loop of the game rather than by any interrupt; remember, interrupts are for timing-critical events whereas the main loop is for everything else.
(This case highlights one of the limitations of approaching C64 game design with the notion that everything should run exclusively from interrupts.)
So, it works by waiting for, say, the large multiplexed tree to scroll out of view on the LHS of the screen and then the swapping occurs.
By the time the new tree definition is needed, all the sprites involved have been redrawn.
The swapping code itself uses a rolled loop; remember, it happens while the sprites involved are not visible on-screen, so there is no need to waste RAM on speedcode.
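As a rough model of that swap, the Python sketch below treats the C64's memory as a 64K byte array and copies one 64-byte sprite definition (63 bytes of image data plus a padding byte) into the VIC bank with a plain rolled loop. The addresses, the bank base and the pointer value are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
SPRITE_BYTES = 64  # one sprite definition block

def sprite_address(bank_base, pointer):
    """Address of the definition that a sprite pointer value selects."""
    return bank_base + pointer * SPRITE_BYTES

def stream_sprite(ram, source, bank_base, pointer):
    """Copy one definition from outside the VIC bank over an existing one."""
    dest = sprite_address(bank_base, pointer)
    for i in range(SPRITE_BYTES):      # rolled loop: speed is not critical here
        ram[dest + i] = ram[source + i]
    return dest

ram = bytearray(65536)
# Hypothetical replacement definition stored at $A000, outside the VIC bank.
ram[0xA000:0xA000 + SPRITE_BYTES] = bytes(range(SPRITE_BYTES))

# Hypothetical setup: VIC bank at $4000, sprite pointer $80.
dest = stream_sprite(ram, source=0xA000, bank_base=0x4000, pointer=0x80)
print(hex(dest))
```

Since the sprite pointer itself never changes, the multiplexer needs no knowledge of the swap; it simply finds different data at the same location the next time the tree scrolls into view.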
FUTURE OPTIMISATIONS
As alluded to earlier, double-buffering might be introduced for some extra raster time saving on the slower scrolling layers, not simply by reducing the raster time that char scrolling those layers would consume on any given frame, but by removing the said char scrolling element from the interrupts totally, performing it in the main loop long before its outcome is ever needed.
The music player might also be optimised; at present, it's just generic Goattracker output, but it could be made to run faster.
And of course, the planned native Commodore 128 version would avail of the 2MHz CPU speed in the upper and lower borders; in fact, the code infrastructure for that is already in place.
The C128 version might also use the colour RAM double-buffering which is also described above.
CONCLUSION & SOME LITTLE REQUESTS
Most of the techniques described above were developed and refined over a long period, chiefly for Parallaxian but also for Deep Winter, which made it a comparatively fast
process to work up the preview of The Wild Wood.
Hopefully, some of the above will prove useful if you're a games developer yourself, or at least mildly interesting if you're not.
Either way, if you haven't done so already, kindly subscribe to John's free newsletter (he's giving an exclusive Wild Wood C64 bitmap image to everyone who subscribes) and also please consider subscribing to his shiny new YouTube Channel.
And if you REALLY want to help, please also make a donation on his Ko-Fi page to support him in his work on this game.
I would, of course, also urge you to follow him on Twitter!
SUBSCRIBE TO THE KODIAK64 NEWSLETTER!
One final thing.
If you haven't already done so, kindly subscribe to the Kodiak64 newsletter, which is 100% free and 100% spam-free.
You can download a sample newsletter here (PDF format): October 2020 Newsletter