yCPU

A simple CPU ISA, currently finalized in its 11th revision.

Intro

Being very into technology, I often consumed technology content on YouTube. Digging myself deeper and deeper, I eventually stumbled upon Ben Eater. One video in particular caught my attention: "Hello, world" from scratch on a 6502 - Part 1. This video ended up sparking my interest in breadboard computers.

Searching for more content, I found a few more videos where people built not only their own computer on a breadboard, but their own CPU with an entirely custom-made ISA! This interested me even more, and after informing myself a little bit (not enough), I started designing my first ISA on paper.

v1 and the beginning

I knew from the start that I wanted to build a RISC machine, so I quickly settled on the very uncreative name "Ultimate RISC", or "U-RISC" for short. The specs for v1 sadly seem lost to time, but as far as I remember, it had three 64-bit registers (A, B, X), where X behaved like an instruction counter, roughly 16 instructions, and many missing concepts that would be necessary to write any program comfortably.

Regardless, it provided a crucial step in understanding the concepts needed for a reasonable ISA and the results I achieved "emulating" my CPU on paper pushed me to drive it further.

v2 and the ableCPU situation

v2 followed a radically different approach. I had noticed that aligning the single byte of the instruction with the 128 bits of instruction data was turning out to be rather difficult, so I decided to get rid of the single byte. You might wonder how that could be done. An ISA without any instructions? (Unless you have heard of "The Movfuscator".)

I never ended up finishing this revision, but it would have worked by mapping each operation into one memory space. Around this time I started to become more active in online programming communities and met someone whom I will refer to as "Able". They maintained an impressive collection of projects, most of them their own, but some simply branded as "able" by others.

I kept getting more involved in this collection and was eventually encouraged to integrate my U-RISC project and rebrand it as "ableCPU". I cannot deny that this gained me a few supporters and encouraged me to push this project further, but eventually I ended up pulling my project from the collection due to personal disputes with Able.

v3 and the first milestone

Deciding the "MOV-machine" was too annoying an approach to implement, I returned to the approach v1 took, attempting to solve the alignment issue by making the memory region responsible for instructions 8 bits wide, as opposed to the 64 bits of the rest.

Wait, memory region? Yes, this version is the first of which documentation survived, and despite it not being written all that well, it gives an insight into how everything was supposed to work out, including a complete memory map and instruction table. For whatever reason, I decided that I was already going to implement multi-core, and so the addresses below 131072 are unique per core, split between data memory and instruction memory. The memory at 131072 and beyond is shared by every core and not specified to be anything in particular.
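As a rough illustration of that split, here is a minimal sketch of how an emulator could tell per-core addresses from shared ones. It assumes that everything below 131072 is per-core; the names `Region`, `resolve` and `SHARED_BASE` are made up for this example, and the division between data and instruction memory is deliberately left out.

```rust
/// First shared address according to the text: 131072 = 2^17.
pub const SHARED_BASE: u64 = 1 << 17;

/// Where an address ends up (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum Region {
    /// Private to one core; the data/instruction split is not modelled here.
    PerCore { core: usize, offset: u64 },
    /// Visible to every core.
    Shared { offset: u64 },
}

/// Resolve an address as seen by `core` into a per-core or shared location.
pub fn resolve(core: usize, addr: u64) -> Region {
    if addr < SHARED_BASE {
        Region::PerCore { core, offset: addr }
    } else {
        Region::Shared { offset: addr - SHARED_BASE }
    }
}
```

With this, two cores using the same low address land in different places, while any address from 131072 upwards resolves to the same shared location regardless of the core.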

In hindsight, this already sounds quite rough. Just about 1 megabyte of data memory and 23 thousand instructions would have been ridiculously underpowered for anything more complex. Even some Arduino boards have more than that. The instruction count increased from 16 to 39, and the registers from v1 were extended by an S register, which some instructions used to dynamically define their output or source location. This was necessary because writing any full address to instruction memory would have taken 8 other instructions, and from a higher-level perspective, pointer arithmetic would be far easier to implement that way.

The instruction set itself was also fairly simple: a few memory reads and writes, addition, subtraction, multiplication, division and a few conditionals. It turned out to be quite workable, but my inexperience still held it back massively and I was aware of it. Following the specification I had written, I then implemented an emulator, the source code of which can be found here.

I had more plans for this v3 version, like an assembler, a compiler, a web emulator and even an implementation in something like Logisim, but I ended up scrapping any such advances, since I felt the base ISA was just not good enough to be worth the effort of all that.

v4-v10 and the struggles

After v3 I really went through quite a few iterations, just never being able to get it to feel quite right. Some of this process is captured in the git history. As most of it happened on paper, which is very likely lost, I can only describe the concepts I toyed with from memory.

The first change made to v3 was getting rid of the fixed 64-bit-ness of the ISA, instead adapting it to work on a multitude of word widths. Sometime later, I also got rid of multi-core support, since I figured any such implementation was too much of a hassle to both implement and fully comprehend. After that, I burned through a multitude of different instruction layouts, only one of which is still accessible if you dig through the commits in the repository.

This, presumably tenth, version already bore a bit of resemblance to v11: there were no registers, only mapped memory. Each instruction consisted of an OP-code, of which there were eight, some instruction flags and two arguments. There was an instruction to load a constant into a memory address, one to copy from one address to another, addition, subtraction, multiplication and division, and a compare instruction that jumped ahead a specific number of instructions, depending on the result of the comparison.

The instruction flags were fairly pointless: 3 bits were assigned to some weird form of CPU-internal error handling and 2 specified the number of cycles the instruction should count for. Looking back, those made very little sense, but they also did not have to. Mostly they just filled the empty space left by the minimal set of OP-codes.
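To make the bit budget concrete: eight OP-codes need 3 bits, and together with the 3 error-handling bits and the 2 cycle bits that adds up to exactly one byte. The following is a minimal sketch of how such a header byte could be packed and unpacked. The field order and the names `V10Header`, `pack` and `unpack` are assumptions made for this example and are not taken from the original spec.

```rust
/// Hypothetical v10-style instruction header: 3 + 3 + 2 = 8 bits.
/// The bit positions are an assumption made for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct V10Header {
    pub opcode: u8, // 0..=7, one of the eight OP-codes
    pub error: u8,  // 0..=7, the three error-handling bits
    pub cycles: u8, // 0..=3, the two cycle-count bits
}

impl V10Header {
    /// Pack the fields as `ooo eee cc`, with the OP-code in the top bits.
    /// Values that do not fit their field are masked down.
    pub fn pack(self) -> u8 {
        ((self.opcode & 0b111) << 5) | ((self.error & 0b111) << 2) | (self.cycles & 0b11)
    }

    /// Split a header byte back into its three fields.
    pub fn unpack(byte: u8) -> Self {
        V10Header {
            opcode: byte >> 5,
            error: (byte >> 2) & 0b111,
            cycles: byte & 0b11,
        }
    }
}
```

Packing and then unpacking a header gives back the same fields, as long as each value fits into its bit range.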

The memory layouts for the various bit widths are not all that interesting. They were mostly identical to v3, but adjusted for the size of the total available memory region. The most interesting change resulted from the total lack of registers: since some way of jumping was still necessary, the first address was mapped to the internal instruction pointer, so any write to it would jump execution.
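The memory-mapped instruction pointer is easiest to see in code. Below is a minimal sketch of a register-less machine in which a jump is nothing more than a store to the first address. It uses 8-bit cells and 256 addresses purely for brevity, and the names `Machine`, `IP_ADDR`, `write` and `copy` are invented for this example.

```rust
/// Assumption for this sketch: address 0 is the memory-mapped instruction pointer.
const IP_ADDR: usize = 0;

/// A tiny register-less machine whose only state is its memory.
pub struct Machine {
    pub mem: [u8; 256],
}

impl Machine {
    /// Start with zeroed memory, so execution begins at instruction 0.
    pub fn new() -> Self {
        Machine { mem: [0; 256] }
    }

    /// Reading the instruction pointer is just a memory read.
    pub fn ip(&self) -> u8 {
        self.mem[IP_ADDR]
    }

    /// Every store goes through here, so a store to `IP_ADDR` is a jump.
    pub fn write(&mut self, addr: u8, value: u8) {
        self.mem[addr as usize] = value;
    }

    /// Copy one cell to another; copying into address 0 redirects execution.
    pub fn copy(&mut self, src: u8, dst: u8) {
        let value = self.mem[src as usize];
        self.write(dst, value);
    }
}
```

The design choice here is that no dedicated jump instruction is needed: a plain copy with address 0 as its destination does the job.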

v11 and the eight bits

The eleventh revision started out just like all the ones that had come before it, as just another attempt to be checked for soundness. I ditched all other bit widths and settled exclusively on 8 bits, while also doubling the number of OP-codes. Very fortunately, this time I did not stumble upon any issues while working everything out, and the latest and probably final version was born.

The 8 other OP-codes were used to add various bit-wise logic operations that were crucially missing from all previous versions. The load constant instruction was also removed, since it could just be replaced with a copy. Instruction flags were kept, but made vastly more useful. The first 2 still define behavior for error handling, but the other 2 define the format of the 2 arguments, either signed or unsigned.
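What the two format flags mean in practice can be sketched as follows. The example assumes that a signed argument is read as two's complement and that the flags sit in the lowest two bits of a 4-bit flag field. Both of these, as well as the names `arg_value`, `decode_args` and the `FLAG_*` constants, are assumptions for illustration only.

```rust
/// Hypothetical flag bits; the real bit positions are not given in the text.
pub const FLAG_ARG1_SIGNED: u8 = 0b0010;
pub const FLAG_ARG2_SIGNED: u8 = 0b0001;

/// Interpret a raw 8-bit argument according to its format flag.
/// `signed == true` reads the byte as two's complement, otherwise as unsigned.
pub fn arg_value(raw: u8, signed: bool) -> i16 {
    if signed {
        i16::from(raw as i8)
    } else {
        i16::from(raw)
    }
}

/// Decode both arguments of one instruction using its flag bits.
pub fn decode_args(flags: u8, arg1: u8, arg2: u8) -> (i16, i16) {
    (
        arg_value(arg1, flags & FLAG_ARG1_SIGNED != 0),
        arg_value(arg2, flags & FLAG_ARG2_SIGNED != 0),
    )
}
```

The same byte `0xFF` therefore means 255 as an unsigned argument and -1 as a signed one, which is what lets a single instruction cover both cases.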

While changing the width to 8 bits solved any sort of alignment issue, it also severely limited the available memory space, with only 42 instructions and 63 bytes of general memory available, a problem that I chose to solve with banked memory. The rest of the memory is mapped to certain devices or actions like the banking controllers. Overall, the limitations imposed turned out to not be as relevant, since programs remained fairly small in size.
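The general idea behind banking can be sketched like this: the CPU only ever addresses a small window, and a write to a controller decides which bank sits behind that window. The sketch below reuses the 63-byte figure from the text, but the type `Banked`, its methods and the wrap-around behavior for out-of-range bank numbers are assumptions made for this example, not a description of the actual banking controllers.

```rust
/// Size of the banked window; 63 bytes matches the general memory mentioned in the text.
const WINDOW: usize = 63;

/// A banked memory window: only `WINDOW` bytes are visible at any time.
pub struct Banked {
    banks: Vec<[u8; WINDOW]>,
    selected: usize,
}

impl Banked {
    /// Create `count` zeroed banks with bank 0 selected. `count` must be at least 1.
    pub fn new(count: usize) -> Self {
        assert!(count > 0, "need at least one bank");
        Banked { banks: vec![[0; WINDOW]; count], selected: 0 }
    }

    /// What a write to the (hypothetical) bank controller would do.
    /// Out-of-range selections wrap around; that is a choice made for this sketch.
    pub fn select(&mut self, bank: u8) {
        self.selected = bank as usize % self.banks.len();
    }

    /// Read from the currently selected bank. Panics if `offset >= WINDOW`.
    pub fn read(&self, offset: usize) -> u8 {
        self.banks[self.selected][offset]
    }

    /// Write to the currently selected bank. Panics if `offset >= WINDOW`.
    pub fn write(&mut self, offset: usize, value: u8) {
        self.banks[self.selected][offset] = value;
    }
}
```

A program would write a value into one bank, switch banks to get a fresh 63 bytes at the same offsets, and switch back later to find the original value untouched.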

For this version there also exists an emulator, which can be found here and is mostly finished. Only the implementations for certain devices are missing, if I ever get around to that.

A related project, almost as interesting to me as the ISA itself, is the assembler I implemented. The first version sucked fairly badly, but thanks to the very helpful Rust community I was able to improve it greatly and learn a whole bunch about proper data structures. It was also the first time that I had received a pull request on any of my repositories. The assembler provides a very simple syntax, including instructions, constants and labels. Any abstraction is represented by macros, which are then resolved by the assembler.
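To give an idea of what resolving labels involves, here is a minimal sketch of a label-collection pass, a common first step in assemblers. It is not the actual assembler's code, and the syntax is assumed for this example: a line ending in `:` defines a label, and every other non-empty line counts as one instruction. The function name `collect_labels` is likewise hypothetical.

```rust
use std::collections::HashMap;

/// Record the instruction index each label points at.
/// Assumed syntax for this sketch: a line ending in `:` defines a label,
/// any other non-empty line is exactly one instruction.
pub fn collect_labels(source: &str) -> HashMap<String, usize> {
    let mut labels = HashMap::new();
    let mut index = 0;
    for line in source.lines().map(str::trim).filter(|l| !l.is_empty()) {
        match line.strip_suffix(':') {
            // A label does not occupy an instruction slot itself.
            Some(name) => {
                labels.insert(name.to_string(), index);
            }
            // Anything else advances the instruction counter.
            None => index += 1,
        }
    }
    labels
}
```

A second pass could then replace every use of a label with the index recorded here, which is what makes forward references to labels possible.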

Summary

Overall, the current state allows for the implementation of a multitude of programs in a fairly programmer-friendly way. I feel quite proud of everything I have achieved with this project and I am very glad for all the things I have learned. This also contributed to the eventual renaming of ableCPU to yCPU. Why yCPU? That's a whole other thing I cannot get into here. Maybe I will write about that some other time.