James Cat's rough && ready blog

8th A&B computing article > Making the Most of Assembler #1

Posted in 6502 by zzjames on January 25, 2013

Another introduction to using the BBC assembler. This one deals with two ways of setting up memory for the code.




The code is short, so you can type it in if you feel like it.

5th A&B computing article > Machine Code Capers #1

Posted in 6502 by zzjames on January 24, 2013

This article explains that doing graphics on the BBC Micro consists solely of pushing a byte stream to the VDU drivers (the OS routines that sit in front of the video hardware, which is built around a Motorola 6845 CRT controller chip). It takes one program, first written using BASIC commands (MOVE, DRAW, PRINT, COLOUR, GCOL), and rewrites it using BASIC VDU statements, one for each of the previous commands. It then rewrites it again so that a single VDU statement sends a byte stream read from a lookup table set up by BASIC READ/DATA statements. Finally, the same READ/DATA statements are used in conjunction with OSASCI (the OS routine at &FFE3) to achieve the same effect from machine code. Although this features no sprites or image composition, it does show data being read from a table and used for graphics.
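To make the "everything is a byte stream" idea concrete, here is a small model in Python rather than BBC BASIC, so it can be checked away from the emulator. It is only a sketch: the helper names are made up for illustration, and the coordinates are arbitrary. The VDU codes themselves (17 for COLOUR, 18 for GCOL, 22 for MODE, 25 for PLOT, with MOVE and DRAW being PLOT 4 and PLOT 5) are the standard ones.

```python
def word(n):
    """Split a 16-bit value into [low, high], as `n;` does in a VDU statement."""
    n &= 0xFFFF
    return [n & 0xFF, n >> 8]


def mode(n):
    # MODE n is the same as VDU 22,n
    return [22, n]


def colour(c):
    # COLOUR c is the same as VDU 17,c
    return [17, c]


def gcol(action, c):
    # GCOL action,c is the same as VDU 18,action,c
    return [18, action, c]


def plot(k, x, y):
    # PLOT k,x,y is the same as VDU 25,k,x;y;
    return [25, k] + word(x) + word(y)


def move(x, y):
    # MOVE x,y is PLOT 4,x,y
    return plot(4, x, y)


def draw(x, y):
    # DRAW x,y is PLOT 5,x,y
    return plot(5, x, y)


# One line drawn in mode 2, expressed as the bytes the VDU drivers receive.
stream = mode(2) + gcol(0, 1) + move(100, 100) + draw(400, 300)
print(stream)
```

A list like `stream` is what ends up in the DATA statements: the machine code version only has to walk the table and hand each byte to the OS.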



REM bytes to shoot at the VDU controller

UNTIL I%=255

OSASCI=&FFE3 :REM store address of os routine for readability
FOR O = 1 TO 3 STEP 2
\ routine to output vdu sequence
STX&71 \ address high byte (data starts at &0C00)
STA&70 \ address low byte
LDA(&70),Y \load character
CMP #&FF \ check end of sequence
JSR OSASCI \ output character
INY \ increment the index
BNE LOOP \ check page boundary
INC &71 \ inc page if boundary crossed


4th A&B computing article > Machine Code Made Easy #4

Posted in 6502 by zzjames on January 23, 2013

These are scans from volume 1 number 13 of A&B magazine, part of a series introducing BBC machine code programming. This one is from pages 84-86.

This one plays a tune using the interrupt system provided by the hardware, but there’s still enough code to demonstrate each of the different kinds of instructions.

Below is the article (click to expand jpegs) and then the code. For some reason I had to alter the slashes for the comments; I’m not convinced the emulator is getting the right ASCII codes…




Now the code (watch the line breaks, they shouldn’t happen in a comment):

REM ----------------------------------------------
REM ----------------------------------------------
FOR X% = 0 TO 19 : READ P%,D% : ?(&0D80+X%*2) = P% : ?(&0D80+X%*2+1) = D% : NEXT
DATA 69,16, 69,4, 73,4, 73,12, 69,2, 73,8, 69,2, 61,8, 53,2, 49,16, 81,12, 73,12, 69,2, 53,2, 73,2, 81,6, 73,6, 69,12, 61,12, 53,12

REM ----------------------------------------------
REM ----------------------------------------------
FOR I% = 0 TO 3 STEP 3
LDA#0 : STA&72 : STA&76 : STA&78 : LDA#&FF : STA&74 \ set the fixed parameters of the OSWORD parameter block in zero-page
STX&79 : STY&7A \ collect the tune length and the data offset from x% and y% respectively

\ store the variable parameters in the block including collecting the first note from the data store
.start
LDA#1 : STA&71 : LDA#&F1 : STA&73 : LDY&7A : LDA&0D80,Y : STA&75 : LDA&0D81,Y : STA&77

.check
LDY#&FF : LDX#&FA : LDA#&80 : JSR&FFF4 : TXA : BEQcheck \ check the sound buffer to see if there is space in it, check again until there is
LDX#&71 : LDY#0 : LDA#7 : JSR&FFF1 \ set x&y registers to indicate the parameter block and call OSWORD with a=7 – the SOUND command – feeds note to channel 1
LDA#&F6 : STA&73 : LDA#2 : STA&71 \ change the volume to -10 and the channel to 2
LDX#&71 : LDY#0 : CLC : LDA&75 : ADC#&60 : STA&75 : LDA#7 : JSR&FFF1 \ reset to the parameter block and add 2 octaves (96) to the pitch, then feed the note to channel 2
\ increment the data pointer to the next note data – 2 bytes forward – and decrement the number of notes to be played – if zero finish else go and do another note
INC&7A : INC&7A : DEC&79 : BEQend : JMPstart
.end RTS
MODE 7 : PRINTTAB(5,10) "yeah yeah"
X% = 20 : Y% = 0 : CALL&D00
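
The eight bytes the listing builds at &71 to &78 are the OSWORD 7 (SOUND) parameter block: channel, amplitude, pitch and duration, each a 16-bit value stored low byte first. As a rough cross-check, the same block can be packed in Python (not part of the article; `sound_block` is a made-up helper name):

```python
import struct


def sound_block(channel, amplitude, pitch, duration):
    """Pack SOUND channel,amplitude,pitch,duration as four little-endian 16-bit words."""
    return struct.pack("<4h", channel, amplitude, pitch, duration)


# First note of the tune on channel 1 at volume -15 ...
first = sound_block(1, -15, 69, 16)
# ... and the same note on channel 2, two octaves (96) higher, at volume -10.
second = sound_block(2, -10, 69 + 96, 16)

print(first.hex(" "))
print(second.hex(" "))
```

This is why the listing stores &F1 and &FF for the amplitude: together they are -15 as a 16-bit two's complement number, and &F6 with the same &FF high byte is -10.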

3rd A&B computing article > Machine Code Made Easy #3

Posted in 6502 by zzjames on January 23, 2013

These are scans from volume 1 number 11 of A&B magazine, part of a series introducing BBC machine code programming. This one is about the instruction set and the different categories of instructions. The scans are from pages 82-85 and do not have any type-in code; it is just an introductory article (click to expand jpegs).












2nd A&B computing article > Machine Code Made Easy #2

Posted in 6502 by zzjames on January 23, 2013

These are scans from volume 1 number 4 of A&B magazine, which introduces the concept of machine code addressing modes, featuring the fantastic ‘houses and hotels’ metaphor. This makes me think the articles were written by a comp. sci. professor, as my teacher used a hotel metaphor to teach data structures with pointers. The scans are from pages 96-99 and do not have any type-in code; it is just an introductory article (click to expand jpegs).








1st A&B computing article > Machine Code Made Easy #1

Posted in 6502 by zzjames on January 23, 2013

These are scans from volume 1 number 3 of A&B magazine, which introduce BBC machine code programming. They are from pages 94-96 and do not have any type-in code; it is just an introductory article (click to expand jpegs).




Moving multicoloured objects (electrem)

Posted in 6502 by zzjames on January 23, 2013

I got this code out of an advanced programming book for the Electron:

Advanced Electron Machine Code Techniques

For some reason it doesn’t work on BeebEm, although I’m sure it should. For this reason I downloaded the Electron emulator, ElectrEm.

I have added a lot of comments to the code. Some of the spacing of the comment bars looks odd, but when pasted into ElectrEm in mode 3 (80 column text mode) it looks mostly OK.

The hairiest part is the .calcaddress subroutine, which is very specific to how the RAM addresses relate to the pixels on screen. From the .draw routine we call .calcaddress with the x and y coordinates we want the screen address for, and when the subroutine returns to .draw the memory address of that pixel is in the zero-page location loc, which is &75 (and &76 for the high byte).
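
Going by the comments in the listing, what .calcaddress works out is: screen start (&3000 in mode 2), plus 640 bytes for every full character row above the point, plus 8 bytes for every byte column to its left, plus the row within the character cell. Here is a sketch of that arithmetic in Python rather than 6502, so it can be checked away from the emulator. It assumes x is a byte column (0 to 79) and y a pixel row counted from the top (0 to 255); the function names are made up.

```python
SCREEN_START = 0x3000      # start of mode 2 screen memory
BYTES_PER_CHAR_ROW = 640   # 80 byte columns * 8 pixel rows per character row


def screen_address(x, y):
    """Address of byte column x (0..79) on pixel row y (0..255, from the top)."""
    return SCREEN_START + (y // 8) * BYTES_PER_CHAR_ROW + x * 8 + (y % 8)


def screen_address_shifts(x, y):
    """The same sum built only from masks, shifts and adds, as the 6502 must."""
    row = y & 0xF8            # 8 * (y div 8): the low three bits are dropped
    row_512 = row << 6        # 64 * row = 512 * (y div 8)
    row_128 = row_512 >> 2    # a quarter of that = 128 * (y div 8)
    return SCREEN_START + row_512 + row_128 + (x << 3) + (y & 7)


print(hex(screen_address(0, 0)))      # top left byte of the screen
print(hex(screen_address(34, 200)))   # starting position of sprite 2
```

The 6502 has no multiply instruction, which is why the assembler version builds 640 as 512 + 128 with shifts instead of multiplying directly.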

This just gives us a flavour of assembly language programming, e.g. how you must take care not to overwrite values in registers when calling subroutines (just assume all variables are global). Because some operations can only be performed using particular registers, you sometimes have to ‘save’ the register contents into RAM before you call a subroutine and load them back when it returns.

Here’s the source code:

GOTO 140

DEF FNdataTable(N)
FOR item=1 TO N
NEXT item

REM – this is where shit lives:


REM – start two pass compilation loop:



\ set up mode 2
LDA #&16 \ VDU 22 selects a screen mode
JSR &FFEE \ OSWRCH
LDA #2 \ mode 2
JSR &FFEE \ OSWRCH

\ pointers to bitmap data – addresses are 16 bit so need 2 locations

LDA #(loadbitmap1 MOD 256) \ bitmap 1 low byte
STA bitmap1
LDA #(loadbitmap1 DIV 256) \ bitmap 1 high byte
STA bitmap1+1

LDA #(loadbitmap2 MOD 256) \ bitmap 2 low byte
STA bitmap2
LDA #(loadbitmap2 DIV 256) \ bitmap 2 high byte
STA bitmap2+1

\ set up initial coordinates
LDA #0
STA sprite1_X \ 1 is 0,0
STA sprite1_Y

LDA #34
STA sprite2_X \ 2 is 34,200
LDA #200
STA sprite2_Y

\ ——————————————————————– \

.LOOP \ game loop
INC sprite1_X \ sprite movements
INC sprite1_Y \ sprite movements
DEC sprite2_Y \ sprite movements

JSR SCREEN \ this subroutine call is in the loop

LDA sprite2_Y \ test for exit conditions
CMP #0 \ is 2 off the screen yet?

\ ——————————————————————– \

\ ————————————————–\
\ screen subroutine loads bitmapN \
\ pointer into $data and spriteN_X & \
\ spriteN_Y into X and Y regs \
\ ————————————————– \

.SCREEN \ store pointer to bitmap 1 in $data
LDA bitmap1
STA data
LDA bitmap1+1
STA data+1

LDA #&13 \ *FX 19 (&13): wait for the next vertical sync before drawing
JSR OSBYTE \ call this now as everything is safely stored out of registers

LDX sprite1_X \ store x and y coords in x and y registers
LDY sprite1_Y

JSR draw \ call draw subroutine

LDA bitmap2 \ store pointer to bitmap 2 in $data
STA data
LDA bitmap2+1
STA data+1

LDX sprite2_X \ store x and y coords in x and y registers
LDY sprite2_Y

\ call draw subroutine
JSR draw

\ ——————————————————————– \

\ ——————————————————-\
\ draw subroutine writes to screen \
\ ram from top-left which uses calcaddress\
\ to get the memory location of the top-left \
\ ———————————————————— \

STX thisSpriteX \store current x,y
STY thisSpriteY \ these are used by calcaddress

LDY #0 \ initialise Y to zero for loop with Y

LDA (data),Y
STA height \ first data item is height
INY \ move on to the next data item
LDA (data),Y
STA width \ second data item is width

LDX #2 \ we’re on the second data item, store this fact in x….

LDA #0
STA Yreg \ zero out Yregistration
LDA width \ load width
STA wcount \ put width into wcount

JSR CALCADDRESS \ calculate screen addresses from coordinates

TXA \transfer x to the accumulator
TAY \ transfer the accumulator to Y (y = x)
\ (so first time) we pick up the loop from the 2 we stashed in x
LDA(data),Y \ start loading the bitmap into accumulator

LDY Yreg \ use y register as start off y offset
STA (LOC),Y \ write pixel value of bitmap starting at loc indirect index mode is 16bit so uses LOC and LOC+1

TYA \ move on to the next byte column: transfer the offset to the accumulator
ADC #8 \ adding 8 (the next column of bytes is 8 bytes further on in screen memory)
STA Yreg \ transferring back to yreg location

INX \ increase count in x register
DEC wcount \ decrease width count
BNE newcolumn \ if wcount is not zero then loop

INC thisSpriteY \ thisSpriteY used in calcaddress subroutine
DEC height
BNE newrow

\ ——————————————————————– \

\ zero out some stuff
LDA #0
STA STORE+1 \ store only used in this subroutine
STA LOC \ loc and loc+1 used to return the screen address

\\ do X coordinate

LDA thisSpriteX \ load the current value in thisSpriteX
ASL A \ we need to multiply x by 8 so we shift left 3 times
ASL A \ we assume all values of x < 127 so we only need to
ROL STORE+1 \ start picking up the carry flag after the 2nd shift
ASL A \ third shift
ROL STORE+1 \ again pick up any carry
STA STORE \ put result in store (so store + 1 is msb)

\\ do Y coordinate

LDA thisSpriteY \ load the current thisSpriteY
AND #&F8 \ same as 8*(y div 8): loses the 3 least significant bits
LSR A \ two right shifts give the same high byte as
LSR A \ six left shifts on a 2 byte number
STA LOC+1 \ store 64*(8*(y div 8)) = 512*(y div 8) as loc’s high byte
\ we don’t need to store a low byte because we know the last 3 bits were zero from the AND #&F8

LSR A \ two more right shifts divide by 4: 64*(8*(y div 8)) / 4 = 16*(8*(y div 8))
ROR LOC \ pick up the carry in loc’s low byte
LSR A
ROR LOC

\\ add them together
ADC LOC+1 \ add back the previously stored 64*(8*(y div 8)) to give 80*(8*(y div 8)) = 640*(y div 8)

TAY \ stash in Y register

\\ do the final y mod 8 to get the number of pixels down into the block
LDA thisSpriteY
AND #7

\\ add the components together

ADC LOC \ add to LOC’s low byte (as y mod 8 will only be a single byte)
ADC STORE \ add store low byte (8x value)
STA LOC \ store this subtotal in LOC
TYA \ get back the 80*(y div 8) result
ADC STORE+1 \ add the store high byte
ADC #&30 \ add on the &3000
STA LOC+1 \ store in loc high byte

\ ————-load bitmap data into sprite 1 & 2 ————— \
OPT FNdataTable(86)
OPT FNdataTable(62)

] : REM back to basic

REM \ —————————————————————– \


REM \ —————————————————————– \

DATA 0,0,0,0,0,0
DATA 0,0,0,0,0,0
DATA 0,0,4F,8F,0,0
DATA 0,0,1,2,0,0
DATA 0,0,1,2,0,0
DATA 0,0,3,3,0,0
DATA 0,1,3,3,2,0
DATA 0,3,9,6,3,0
DATA 0,3,3,3,3,0
DATA 0,3,3,3,3,0
DATA 0,3,3,3,3,0
DATA 0,0,2,1,0,0
DATA 0,0,0,0,0,0
DATA 0,0,0,0,0,0

DATA 0,0,0,0,0,0
DATA 0,0,0,0,0,0
DATA 0,0,1,2,0,0
DATA 0,0,3,3,0,0
DATA 0,1,3,3,2,0
DATA 0,3,9,6,3,0
DATA 0,3,3,3,3,0
DATA 0,0,2,1,0,0
DATA 0,0,0,0,0,0
DATA 0,0,0,0,0,0

Without going into detail: if you don’t know much about the Acorn assembler and the MOS, some of that is going to be a big mystery, but more on that later.

Set up and workflow

Posted in 6502 by zzjames on January 23, 2013

The emulator is easy enough to install. I’m using a text editor to write the code, then selecting all, copying and pasting it into the emulator. I’m not entering the line numbers when I type the code, but I do have a text editor that shows line numbers. Just before pasting into the emulator I type NEW and press return, then type AUTO and press return, then paste using the menu (using ctrl+v sometimes puts odd characters into the emulator).

If you’ve done everything right, the line numbers shown in the editor will line up with the x10 numbers added by the AUTO command, i.e. line 1 will have a 10 in front, line 2 a 20, etc. This is important because the BASIC code that sets up the machine code sometimes contains GOTO statements, which use the line numbers.
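
The mapping between editor lines and BASIC line numbers is simple enough to check by hand, but here it is as a tiny Python sketch (not something that runs on the emulator; the function names are made up, and it assumes AUTO is used with its default start and step of 10):

```python
def basic_line_number(editor_line, start=10, step=10):
    """BASIC line number that AUTO gives to a 1-based editor line."""
    return start + (editor_line - 1) * step


def editor_line(basic_line, start=10, step=10):
    """Editor line (1-based) that a GOTO target refers to."""
    return (basic_line - start) // step + 1


print(basic_line_number(2))   # second line of the file
print(editor_line(140))       # where a GOTO 140 lands in the editor
```

So a `GOTO 140` in a pasted listing is aiming at line 14 of the text file, provided no lines were added or removed above it.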

I do this rather than type the code into the emulator directly so that I have a reliable and portable copy of the source code, and because sometimes I need to type # and ~ characters and the keyboard mapping to the emulator can be a bit off for special characters.

The downside is that if you fix typos in the version of the code in the emulator, you must be sure to fix them in the text editor too. Version control and all that.

Emscripten and LLVM

Posted in background by zzjames on January 22, 2013

So now we’re away from the deadline of the MSc project I’ve turned my attention to LLVM bitcode, which may seem odd if you don’t know about Emscripten. Emscripten is an LLVM bitcode -> JavaScript compiler which uses typed arrays (introduced alongside WebGL) to create static (machine) types in memory, avoiding the boxing/unboxing overhead, manually managing memory (so no garbage collection) and allowing the JS JIT compiler to easily optimise memory access.

This has led through a convoluted set of connections, which has me now figuring out how to make a sprite engine/blitter using assembly.

I don’t know how to write assembly language, so in order to build up to the more difficult steps, and also for nostalgic reasons, I’m writing a sprite based game on an 8-bit micro computer. This is not as difficult as you would think, as there are plenty of retro/vintage computing nerds preserving information from that era. The culture surrounding micro computers at the time also had an appetite for highly technical information; it was a much more rough and ready DIY kind of scene.

I’ve some coding experience with BBC BASIC from childhood, so I decided I would write this on an Acorn BBC B or Electron emulator. Since those computers had an educational remit (due to the BBC Computer Literacy Project) and Acorn had close ties to Cambridge University, there was even more good technical information made public about them than about most micros.

So I have downloaded a BBC emulator from here:

and gotten hold of the first 24 editions of A&B Computing magazine, which contain 12 articles on machine code/assembly, some articles on working directly with screen RAM and 2 full listings of machine code arcade games which were commercial releases.

So I am going to blog what I do, and how ultimately this relates to the web browser. The steps should be:

  1. build a simple sprite based game in 6502 assembler on a BBC Micro emulator
  2. recreate this in 16-bit HLA using VGA under DOSBox (unless there’s a better PC based intermediate)
  3. incorporate some knowledge / information on 8086 assembler game optimisation
  4. try to either (a) create this in LLVM bitcode or (b) reverse engineer it to C (as close as possible)
  5. transpile the LLVM or C to JavaScript.

This should give us some basis for building an efficient JS sprite engine, and we will learn a lot along the way, you and me, dear reader, whoever you are…

Object Pooling

Posted in Msc by zzjames on August 10, 2012

Memory pooling is necessary for high performance JavaScript, at least for any high performance JavaScript which involves objects of any real size and number. Of course you can get some little program to go really fast without object pooling, but for a complex web app that requires high performance, chances are that performance will be improved by it.

Memory allocation in JavaScript

In C++ we can allocate a local object to the stack or the heap. Stack allocation is very fast as the only overhead is decreasing the stack pointer by the size of the object. Stack memory is fast as it is likely to be in the cache. If an object is allocated to the heap, the memory manager will have to find a free chunk of memory and mark it as in use. Heap allocation is therefore slower than stack allocation. For temporary objects whose lifetime is the length of the function the stack is a more efficient method for allocating memory.

JavaScript is a high-level scripting language. Objects in JavaScript are always allocated on the heap. Since JavaScript was not designed for professional programmers, memory allocation was not designed to be the programmer’s responsibility, but instead something that the JavaScript run-time finds the optimal solution for.

Unfortunately, this means there is no way of creating temporary objects in a cheap way and JavaScript run-times are not good at optimizing cases where temporary objects are being used.

On each occasion where we use a temporary object, instead of creating a new local object on the heap we could reuse an existing ‘temporary’ object. This strategy is known as ‘Object Pooling’ where temporary objects are fetched from an existing pool rather than created, and returned to this pool rather than being destroyed.

Thus we implement functions with a strict rule of not allocating new local objects in the methods whenever possible. ‘Whenever possible’ is included in this rule because sometimes we might not have enough objects in our object pool; for this case a method of creating a new object in the pool is provided.
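
The pattern itself is not tied to JavaScript. Below is a minimal sketch of it in Python, chosen only to keep the example short and checkable. The names (`ObjectPool`, `acquire`, `release`, `Vec2`) are made up for illustration and are not taken from any library.

```python
class Vec2:
    """A small temporary object of the kind that gets created in inner loops."""

    def __init__(self):
        self.x = 0.0
        self.y = 0.0

    def reset(self):
        self.x = 0.0
        self.y = 0.0


class ObjectPool:
    def __init__(self, factory, size):
        self._factory = factory
        self._free = [factory() for _ in range(size)]  # allocate up front
        self.created = size

    def acquire(self):
        """Fetch an object from the pool, growing the pool only if it is empty."""
        if self._free:
            return self._free.pop()
        self.created += 1
        return self._factory()

    def release(self, obj):
        """Return an object to the pool instead of leaving it for the collector."""
        obj.reset()
        self._free.append(obj)


pool = ObjectPool(Vec2, size=2)
a = pool.acquire()
b = pool.acquire()
c = pool.acquire()      # pool was empty, so this one is newly created
pool.release(a)
d = pool.acquire()      # reuses the object that was just released
print(pool.created, d is a)
```

The important property is that, once the pool is warm, a loop that acquires and releases the same number of objects per iteration allocates nothing new.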

Implementing Object Pooling in JavaScript

There is at least one well documented, publicly available implementation of object pooling in JavaScript. It belongs to a project called gamecore.js, and the author(s) have very kindly spent some time making a tutorial on the subject.

Today I am looking at how their implementation will fit with my existing code. gamecore.js uses class.js, a script that builds class-like behaviour on top of JavaScript’s prototype object model. Although I have been using the prototype chain as an equivalent of a class definition, especially for sub-classing in the (thankfully rare!) cases where it is needed, I am trying to stay away from just ‘fixing’ the prototype system by adding a layer of code purely to implement classes. This is an experiment to challenge my own thinking and to prove or disprove that C++/Java style OOP is the be-all and end-all. The downside is that (I think, I’ve not read the whole thing yet!) the gamecore.js object pooling code is written for a ‘class’ based object system. I appreciate that it basically works like the Singleton pattern, in that it uses a static ‘factory’ method on the class, but I don’t know where my static methods would live. In the constructor? In the prototype? I’m guessing the prototype would be faster, as __proto__ links are optimised (they’re invariants) and .constructor links are not (although with inline caching I think the .constructor property would get optimised if you hit it a lot).

This issue will be solved by Saturday night, or I will, um, be sorry.