LLBoy is a Nintendo® GameBoy™ emulator based upon Imran Nazar's jsGB. Its distinguishing feature is incremental translation of the GameBoy cart to native instructions with LLVM. Source code is currently available on GitHub.
LLBoy doesn't aim to be a complete GameBoy emulator. Its core purpose is as a vehicle for becoming familiar with LLVM, C/C++, CMake, and Qt. The core functional goal is to play a mostly native version of a single-banked game (e.g., Tetris) with user input, video, and no sound. Subsequent goals would include memory banking, audio, aggressive optimization, save states, and a friendly debugger.
As with any emulator, concerns will be raised about the possibility of piracy. To alleviate these concerns, the following policies and design goals are in effect:
The primary selection criteria for the involved technologies and libraries were personal interest and ease of use. These also seemed to coincide with the "best of breed" and "most popular" choices in the software ecosystem, though many alternatives would be more appropriate given different constraints for memory, performance, or licensing.
LLVM is a particularly interesting tool, in that it can be used to generate and optimize native machine code with a reasonably friendly C++ API. It was selected on the basis of its relative popularity and ability to store its intermediate bitcode representation (generated by a simple call to clang -c -emit-llvm) for future use with other dynamically-generated functions. LLVM has also become somewhat "trendy" in compiler circles thanks to its inclusion on mainstream Apple hardware. The llvm-qemu project has also done some experimental work on dynamic binary translation in the context of the QEMU x86 emulator, indicating that this is a somewhat viable approach despite their observed slowdown and memory/compilation overhead.
The Qt Framework is a very solid cross-platform library that supports all of the major desktop environments (plus Linux) and mobile. It also offers a rich library and abstractions for native system functions, file management, threading, and user interface, possibly rivalling the Java class library. Only a small subset of the available features will see any use, as most of the complexity lies in the emulation backend. Qt will be used to handle basic user input and to provide a drawing surface to the emulator.
The choice to use Qt and LLVM necessitates usage of C/C++ as there aren't any Java bindings, and OCaml is not on my (current) list of interests. It is also nice to work with something other than Java for a change -- variety in development languages breeds adaptability in the face of change, and allows for hedging one's bets given the uncertainty of the Java platform at the hands of Oracle. It would still be interesting to build a Java port with BCEL or ASM, though.
To support C/C++ with unusual intermediary build steps, CMake proves to be an interesting cross-platform build and testing system. It has the advantage of being something other than the Autotools toolchain. It also integrates reasonably well with Qt thanks to its heavy involvement in KDE. Profiling the application will prove interesting, possibly requiring some time with DTrace.
To reduce initial implementation complexity and duration, several aspects of the system are being ignored as they are not core to the main goals:
The system is fundamentally designed in the same manner as a traditional emulator, calling state-altering micro-ops from a fetch-decode-execute loop. LLVM will be used to assemble sequences of these micro-ops (and their arguments where possible) into a large switch statement that can be inlined, optimized, and JITed for native execution. The switch statement should make use of fall through to allow for execution of continuous runs and jumps within runs. If the program counter lands at an unknown location, the switch can fall through to simply calling the function and adding an element to the cache, or splitting an existing block where possible.
The instruction cache is a simple array of metadata that matches the Cart ROM layout, providing information about where jump targets land and where sequences must be broken. This metadata can be utilized to generate an LLVM representation of the game as needed.
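As a rough illustration of the metadata array described above, the following is a minimal sketch, not LLBoy's actual data structure. The names RomByteInfo and InstructionCache, and all of their fields and methods, are hypothetical; the only idea taken from the text is that there is one entry per ROM byte, recording where instructions start, where jumps land, and where a run must be broken.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-byte metadata; the field names are illustrative only.
struct RomByteInfo {
    bool is_instruction_start = false; // an opcode has been seen at this address
    bool is_jump_target = false;       // some branch is known to land here
    bool ends_block = false;           // the run must be broken after this instruction
};

// One entry per ROM byte, so a program counter indexes the cache directly.
class InstructionCache {
public:
    explicit InstructionCache(std::size_t rom_size) : info_(rom_size) {}

    void mark_instruction(std::uint16_t pc) { info_.at(pc).is_instruction_start = true; }

    // Record a jump located at `from` that lands on `to`.
    void mark_jump(std::uint16_t from, std::uint16_t to) {
        info_.at(from).ends_block = true;
        info_.at(to).is_jump_target = true;
    }

    bool known(std::uint16_t pc) const { return info_.at(pc).is_instruction_start; }

    // Addresses that need their own entry point in the generated code,
    // in ascending order.
    std::vector<std::uint16_t> block_starts() const {
        std::vector<std::uint16_t> out;
        for (std::size_t addr = 0; addr < info_.size(); ++addr) {
            if (info_[addr].is_jump_target) {
                out.push_back(static_cast<std::uint16_t>(addr));
            }
        }
        return out;
    }

private:
    std::vector<RomByteInfo> info_;
};
```

A code generator could walk block_starts() to decide where to emit case labels. A 32 KB single-banked cart fits comfortably in 16-bit addresses, which is why the sketch assumes std::uint16_t.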
Expressing the generated code as a (surprisingly legal) C fragment, several key properties can be observed. Case 0x0 begins an uninterrupted sequence of instructions, continuing on to case 0x4 and ultimately ending the sequence. A future execution around case 0x10 would fall through to the default case, signalling that no native implementation is available, allowing some other function to handle updating the game cache and calling the unoptimized micro_op directly.
A primitive loop or branch instruction can also be observed here as part of case 0x27, which uses goto as a means of jumping to a known block. If implemented with care, LLVM branching instructions can be emitted by the code generator for JIT and subsequent native execution. Further gains can also be realized by making MMU access logic available to the optimizer, as some constants can be known, allowing for removal of untraversed branches, or possible replacement of MMU address calculations with a constant address or value.
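The properties described above can be sketched in plain C++ as follows. This is an illustrative reconstruction under assumptions, not the project's generated code: the Cpu struct, the op_* micro-ops, the opcode lengths, and the specific instructions placed at each address are all hypothetical. Only the shape (fall-through from case 0x0 into case 0x4, a goto from case 0x27, and a default case for untranslated addresses) follows the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU state and micro-ops; names and encodings are illustrative.
struct Cpu {
    std::uint16_t pc = 0;
    std::uint8_t a = 0;
    bool zero = false;
};

inline void op_nop(Cpu& c) { c.pc += 1; }
inline void op_ld_a_n(Cpu& c, std::uint8_t n) { c.a = n; c.pc += 2; }
inline void op_dec_a(Cpu& c) { c.a -= 1; c.zero = (c.a == 0); c.pc += 1; }
inline void op_jp(Cpu& c, std::uint16_t target) { c.pc = target; }

// Conditional jump: returns true when the branch is taken.
inline bool op_jp_nz(Cpu& c, std::uint16_t target) {
    if (!c.zero) { c.pc = target; return true; }
    c.pc += 3;
    return false;
}

// Returns true if translated code handled c.pc, false if the caller must
// fall back to the unoptimized micro-op and update the cache.
bool run_translated(Cpu& c) {
    switch (c.pc) {
    case 0x0000:                    // start of an uninterrupted run
        op_nop(c);                  // 0x0000
        op_ld_a_n(c, 2);            // 0x0001
        op_nop(c);                  // 0x0003
        // fall through into the next known entry point
    case 0x0004:
    block_0x0004:
        op_nop(c);                  // 0x0004
        op_jp(c, 0x0010);           // 0x0005: target not translated yet
        return true;                // end of this run

    case 0x0027:
        op_dec_a(c);                // 0x0027
        if (op_jp_nz(c, 0x0004))    // 0x0028: branch to a known block
            goto block_0x0004;
        return true;                // branch not taken

    default:
        return false;               // no native implementation for this pc
    }
}
```

Placing an ordinary label next to the case label is what lets the branch in case 0x27 reach the block without going back through the switch dispatch, which is roughly what an emitted LLVM branch instruction would do.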
The generated code will be written out as LLVM bitcode and stored alongside the ROM image. This LLVM cache will be loaded and executed on the next run or on demand.
The game loop should attempt to run cached operations, falling back to instruction generation and one-off execution.
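One way to picture that loop is the sketch below. It is a simplified assumption-laden model rather than the real dispatcher: Machine, Block, and Dispatcher are hypothetical names, a std::function stands in for a JIT-compiled block, and the placeholder interpreter treats every opcode as a one-byte NOP.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical machine state; the counters only make the control flow visible.
struct Machine {
    std::uint16_t pc = 0;
    int native_runs = 0;
    int interpreted_ops = 0;
};

// Stand-in for a JIT-compiled block: any callable that advances the machine.
using Block = std::function<void(Machine&)>;

class Dispatcher {
public:
    void add_block(std::uint16_t pc, Block block) { cache_[pc] = std::move(block); }

    // Run cached code when it exists; otherwise note the miss and execute
    // a single instruction the slow way.
    void step(Machine& m) {
        auto hit = cache_.find(m.pc);
        if (hit != cache_.end()) {
            hit->second(m);
            ++m.native_runs;
            return;
        }
        misses_.push_back(m.pc);    // candidate for later translation
        interpret_one(m);
        ++m.interpreted_ops;
    }

    const std::vector<std::uint16_t>& misses() const { return misses_; }

private:
    // Placeholder interpreter: treats every opcode as a one-byte NOP.
    static void interpret_one(Machine& m) { m.pc += 1; }

    std::unordered_map<std::uint16_t, Block> cache_;
    std::vector<std::uint16_t> misses_;
};
```

Recording misses separately from executing them keeps the slow path cheap; a translator could drain the miss list in batches rather than recompiling on every unknown address.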
It would be ideal to support straight interpretation with pre-translation and dynamic translation to work around the compilation overhead experienced by llvm-qemu. The core cases for implementation of micro ops vary based upon the addressing mode used by the instruction. The items requiring the most attention are Immediate Data instructions, as the operation can be read and the arguments may be emitted directly into the native instruction stream.
Directly emitting immediate-mode instructions with constant data may allow the LLVM optimizer to partially or fully replace the micro-op during arithmetic optimization and instruction combining. This can turn multiple function calls, each with its own program counter increments and memory accesses, into a few native add and set instructions.
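// Look up the previously declared micro-op function in the module.
Function* op = m->getFunction("op_0xf123");
// Create an entry block in that function and point a builder at it.
IRBuilder<>* builder = new IRBuilder<>(BasicBlock::Create(m->getContext(), "entry", op));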
Core cases: Immediate Mode operation (NOP, LDHLnn), Indirect Mode operation (LDHLIA)
Initially, treat all ops as equal, allowing them to go through the MMU to get their arguments. Immediate-mode instructions can be specialized later to hard-code their arguments.
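The difference between the two stages can be sketched as follows, using LD HL,nn as the example. This is only an illustration: the Mmu and Regs structs and both function names are hypothetical, and the specialized form assumes the operand bytes live in ROM and therefore never change after translation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical flat memory with a read counter to show the cost of each form.
struct Mmu {
    std::array<std::uint8_t, 0x10000> mem{};
    int reads = 0;
    std::uint8_t rb(std::uint16_t addr) { ++reads; return mem[addr]; }
};

struct Regs {
    std::uint16_t pc = 0;
    std::uint8_t h = 0;
    std::uint8_t l = 0;
};

// Generic form: both operand bytes are fetched through the MMU at run time.
void ld_hl_nn(Regs& r, Mmu& mmu) {
    r.l = mmu.rb(static_cast<std::uint16_t>(r.pc + 1)); // low byte first
    r.h = mmu.rb(static_cast<std::uint16_t>(r.pc + 2));
    r.pc += 3;
}

// Specialized form: the operand was read once at translation time and is
// passed in as a constant, so no MMU access is needed when the code runs.
void ld_hl_const(Regs& r, std::uint16_t nn) {
    r.l = static_cast<std::uint8_t>(nn & 0xFF);
    r.h = static_cast<std::uint8_t>(nn >> 8);
    r.pc += 3;
}
```

Both functions leave the registers in the same state, but the second gives an optimizer plain constants to fold instead of opaque memory reads.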