On 12/15/2024 5:32 PM, Marcus wrote:
> Some progress...
>
> Earlier this year I spent some time porting Quake 2 to my MRISC32 based
> computer. It required some refactoring since Quake 2 used a modular
> rendering and game logic system based on dynamically loaded libraries
> (DLLs). My computer isn't that fancy, so I had to get everything
> statically linked into a single executable ELF32 binary (and the
> Quake 2 source code didn't support that at all).
>
> My patched source code: https://gitlab.com/mbitsnbites/mc1-quake2
>
> When I finally got a working build, it only worked in my simulator but
> not on my FPGA board, so I dropped the effort.
>
> Yesterday, however, I went and bumped my GNU toolchain to GCC 15.x and
> fixed a few bugs in my MRISC32 back end, and lo and behold, the binary
> actually started working on the FPGA (not sure if it was a compiler bug
> or if it's a CPU implementation bug that got hidden by the compiler
> update).
>
> Video: https://vimeo.com/1039476687
>
> It's not much (about 10 FPS at 320x180 resolution), but at least it's
> progress.
Yeah, I am still mostly limited to single-digit framerates in Quake 1, and pretty much entirely unplayable framerates in Quake 3.
Not tried porting Quake 2 yet.
Though, if I did, it may make sense to leave it as 256-color and then convert to hi-color later. In my Quake 1 port, I had modified the software renderer to operate internally on hi-color directly, which potentially increases cache pressure and similar.
Note that Quake 3 had been delayed for a while in my case due to needing virtual memory and DLL loading, but I have these now.
But, now there is the chaos of me trying to turn things into proper user-mode (so, ATM, Quake3 is broken again due to how I needed to link things to make it work).
And there is the partial irony that GLQuake is slightly faster than SW Quake, though my GLQuake port had been modified to use static vertex lighting. This information isn't stored in the BSP, so it needs to be regenerated on map load, and isn't super accurate (not as good looking as the vertex-lighting mode in Quake 3).
There is a hardware rasterizer module, which helps "slightly", but in my OpenGL implementation the bigger limiting factor is the front-end geometry transform.
My HW rasterizer basically only does edge-walking, so a lot of the heavy lifting (transform/projection and geometric subdivision) needs to be done CPU side. Generally, it seems the rasterizer module rasterizes things faster than the CPU side code can feed requests into it.
...
Oh well, I am distracted some.
I am working on trying to build a specialized printer which might (eventually) be used for printed electronics. At present, I am building it by CNC-converting an X/Y table, and using syringe pumps for ink (got some 100ml syringes to use for this; they will be driven using NEMA-17 steppers, as these seem to be the cheapest option).
Current idea for the print head is to use four 22ga blunt-tip needles, spaced roughly 0.25" apart (they will be offset in software to re-align the layers).
For the synthesis, I had the idea that rather than trying to use large tiles representing more complex logic to fake an FPGA (such as LUT4s or LUTRAMs), it may make more sense to first decompose these into logic gates, and then do all the final "place and route" stuff mostly at the level of logic gates.
Would likely have: AND, OR, NAND, NOR, BUF, NOT.
May skip the XOR gate, as it is larger than the others. Including XOR as a fundamental gate while trying to lay everything out on a grid would require making everything else bigger.
In terms of truth tables (outputs only, listed for inputs B A = 00, 01, 10, 11):
0 0 0 0: Zero, resistor to GND.
0 0 0 1: AND
0 0 1 0: AND (~A)
0 0 1 1: Input B (BUF)
0 1 0 0: AND (~B)
0 1 0 1: Input A (BUF)
0 1 1 0: XOR
0 1 1 1: OR
1 0 0 0: NOR
1 0 0 1: XNOR
1 0 1 0: NOT Input A (NOT)
1 0 1 1: OR (~A)
1 1 0 0: NOT Input B (NOT)
1 1 0 1: OR (~B)
1 1 1 0: NAND
1 1 1 1: HI (resistor to Vcc)
Pretty much all of the more complex logic elements can be synthesized from logic gates.
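Since each 2-input gate is just a 4-bit truth table, the list above can be checked mechanically. A minimal sketch (the encoding and names here are mine, not from the design): the output column becomes a 4-bit mask, with bit (B*2 + A) holding the output for inputs B and A.

```c
#include <assert.h>

/* Each 2-input gate is a 4-bit truth table: bit (B*2 + A) of the mask
 * is the gate's output for inputs B and A.  These masks correspond to
 * the output columns listed above, read left-to-right as inputs
 * 00, 01, 10, 11. */
enum {
    TT_ZERO = 0x0,  /* 0 0 0 0 */
    TT_AND  = 0x8,  /* 0 0 0 1 -> only bit 3 (B=1,A=1) set */
    TT_XOR  = 0x6,  /* 0 1 1 0 */
    TT_OR   = 0xE,  /* 0 1 1 1 */
    TT_NOR  = 0x1,  /* 1 0 0 0 */
    TT_XNOR = 0x9,  /* 1 0 0 1 */
    TT_NAND = 0x7,  /* 1 1 1 0 */
    TT_HI   = 0xF   /* 1 1 1 1 */
};

/* Evaluate a truth-table mask for inputs a, b (each 0 or 1). */
static int tt_eval(unsigned mask, int a, int b)
{
    return (mask >> ((b << 1) | a)) & 1;
}
```

The same masks also make the "synthesize complex logic from gates" step easy to test, since any 2-input function is one of the 16 values.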
Also recently got around to implementing an experimental filesystem.
General:
  Superblock: Follows a similar pattern to NTFS;
  inode table:
    Represents itself as an inode (0);
    inodes currently 256 or 512 bytes.
Block indexing:
  Normal files (32-bit block numbers):
    16 direct blocks
    8 indirect 1-level blocks
    4 indirect 2-level blocks
    2 indirect 3-level blocks
    1 indirect 4-level block
    1 indirect 5-level block
  Compressed / large volume, 256 byte inode:
    8 direct blocks
    4 indirect 1-level blocks
    1 indirect 2-level block
    1 indirect 3-level block
    1 indirect 4-level block
    1 indirect 5-level block
  Compressed / large volume, 512 byte inode:
    16 direct blocks
    8 indirect 1-level blocks
    4 indirect 2-level blocks
    1 indirect 3-level block
    1 indirect 4-level block
    1 indirect 5-level block
    1 indirect 6-level block
Blocks are allocated via a bitmap, and assigned into table indices within the inodes.
The 32-bit index has 5 levels, though the 5th level is kind of overkill, since with the currently valid block sizes it will not be used (4 levels can address the entire range of 2^32 blocks, or 4TB with 1K blocks).
So, it mostly just serves to pad the structure to 128 bytes.
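As a rough sanity check on the numbers above (a sketch; the function names are mine): with 4-byte block numbers, each indirect block holds block_size/4 entries, so the mappable block count falls out directly.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: blocks mappable by the "normal file" layout above using only
 * the first four levels (16 direct, 8/4/2 indirect at 1..3 levels,
 * one 4-level indirect), with 32-bit (4-byte) block numbers. */
static uint64_t max_blocks_4lvl(uint64_t block_size)
{
    uint64_t p = block_size / 4;   /* entries per indirect block */
    return 16 + 8*p + 4*p*p + 2*p*p*p + p*p*p*p;
}

/* Full layout, including the mostly-padding 5-level indirect block. */
static uint64_t max_blocks(uint64_t block_size)
{
    uint64_t p = block_size / 4;
    return max_blocks_4lvl(block_size) + p*p*p*p*p;
}
```

With 1K blocks (p = 256), the first four levels already cover more than 2^32 blocks (256^4 = 2^32 for the 4-level indirect alone), which is why the 5th level goes unused in practice.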
For compressed files, 64 bit block index numbers would be used. Similar for large volumes with more than 2^32 blocks.
At present, largest valid volume size (at 1K blocks), would be 256PB.
This would be larger than EXT2/3, which are limited to 32-bit block numbers (~ 16TB with 4K blocks); EXT4 with its 64bit feature extends block numbers to 48 bits.
For compressed files, the high order bits of the block numbers would be used for some additional metadata (such as the span of disk blocks for larger compressed blocks; or the location of the packed-data within packed blocks, *).
*: Block compression may use one of several strategies:
  Store: stores the block raw / uncompressed as N disk blocks.
  Compressed: compressed data stored as a smaller number of disk blocks.
  Packed: compressed data is held within data in another inode.
    May be used for small or highly compressible blocks.
  Skip: block was all zeroes, so the index entry is left as 0.
    No disk storage is used for compressed all-zero blocks.
I am debating whether to use/allow block-skipping for non-compressed files. Could make sense for some use-cases but be very bad in others.
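To make the strategy selection concrete, here is a tiny decode sketch. The bit layout is purely my assumption for illustration (the post doesn't specify one): a zero entry means a skipped all-zero block; otherwise the top two bits of the 64-bit entry tag the strategy.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: this bit layout is an assumption, not the actual
 * on-disk format.  Entry 0 = skipped (all-zero) block; otherwise the
 * top 2 bits tag the strategy and the low bits hold the block number
 * plus any extra metadata. */
enum blk_kind { BLK_SKIP, BLK_STORE, BLK_COMPRESSED, BLK_PACKED };

static enum blk_kind blk_kind_of(uint64_t entry)
{
    if (entry == 0)
        return BLK_SKIP;             /* no disk storage used        */
    switch (entry >> 62) {
    case 0:  return BLK_STORE;       /* raw, N disk blocks          */
    case 1:  return BLK_COMPRESSED;  /* fewer disk blocks           */
    default: return BLK_PACKED;      /* packed into another inode   */
    }
}
```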
Directories are using a modified form of AVL trees:
  Each dirent is 64 bytes;
    Has a 48 byte name field;
    Has left, right, and parent pointers, 21 bits each.
      Was originally left/right at 32 bits, but...
      I needed a parent pointer to make some operations viable.
      Didn't have any space, nor did I want to make the dirent bigger.
    Has an inode index, Z height, and entry type.
      inode index: 48 bit
      Z height: 8 bit
      Entry Type: 8 bit
        High 4 bits: File Type (may be cached from inode)
        Bit 3: Deletion Flag
        Bits 2..0: Dirent Base Type (Free/ShortName/LongName/NameFrag)
For names longer than 48 bytes, the name is broken into multiple parts (like FAT32), but this should be much rarer (something like 99.5% of filenames fit into 48 bytes).
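The 64-byte layout packs exactly (48 + 8 + 6 + 1 + 1 = 64). A struct sketch, with field names of my own choosing; the three 21-bit links fit in one 64-bit word (3*21 = 63 bits, one bit spare).

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the 64-byte dirent described above; field names are mine.
 * left/right/parent tree links are 21 bits each, packed into one
 * 64-bit word. */
typedef struct {
    char     name[48];   /* name field (or name fragment)        */
    uint64_t links;      /* left | right<<21 | parent<<42        */
    uint8_t  inode[6];   /* 48-bit inode index                   */
    uint8_t  zheight;    /* AVL Z height                         */
    uint8_t  type;       /* file type, deletion flag, base type  */
} dirent64;

#define DE_LEFT(l)    ((uint32_t)((l)         & 0x1FFFFF))
#define DE_RIGHT(l)   ((uint32_t)(((l) >> 21) & 0x1FFFFF))
#define DE_PARENT(l)  ((uint32_t)(((l) >> 42) & 0x1FFFFF))
```

On typical ABIs the struct has no padding (the uint64_t lands at offset 48, which is 8-aligned), so sizeof comes out at exactly 64.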
The logic for directories ended up more complicated than I would like.
Node balancing in AVL trees is kind of a pain.
  Though, I did relax the balance condition to +/- 2;
    This significantly reduces the number of rotates needed;
    It also eliminates needing a special-case double-rotate scenario.
    It is always possible to restore balance with single rotates.
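A minimal sketch of the single-rotation machinery (node layout and names are mine; heights are recomputed rather than cached, purely for illustration). With the relaxed invariant, a node is only rebalanced once the height difference reaches 3; shown here is the right-heavy case, where one left rotation pulls it back inside +/- 2.

```c
#include <assert.h>

/* Sketch: child pointers only; heights recomputed on the fly. */
typedef struct node { struct node *l, *r; } node;

static int height(node *n)
{
    if (!n) return 0;
    int hl = height(n->l), hr = height(n->r);
    return (hl > hr ? hl : hr) + 1;
}

/* Balance factor: positive = right-heavy.  With the relaxed AVL
 * invariant above, values in -2..+2 are acceptable. */
static int balance(node *n)
{
    return n ? height(n->r) - height(n->l) : 0;
}

/* Single left rotation: n's right child becomes the subtree root. */
static node *rotate_left(node *n)
{
    node *r = n->r;
    n->r = r->l;
    r->l = n;
    return r;
}
```

For example, a 4-node right chain has a root balance of +3; one rotate_left() on the root drops it to +1, back inside the relaxed bound without any double-rotate.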
I still think AVL trees require less code complexity than B-Trees would have (while also being significantly faster than linear search once one gets past a small number of files in a directory).
Would be much simpler though for a read-only filesystem driver (binary tree walk isn't that complicated).
Hash-chains were skipped, as these have a more limited range of scalability:
  Small N: they waste space on a hash table.
    Linear search would have been preferable for small N.
    Or, one needs separate hashed/non-hashed directories (like EXT2/3).
  Big N: they are slower than a tree structure.
Though, seemingly, pretty much everyone else (excluding EXTn) went with B-Tree variants for whatever reason, but usually with significantly higher code complexity.
...
As for whether or not it can "replace" FAT32, that is another issue. Technically, if a new filesystem image were put into a partition on an SD card or similar, Windows would have no real viable way to access it (and a command-line tool to access the volume that only works if one does "Run as Administrator" would be kinda lame).
The other option being to put the FS image as files within a FAT32 volume, which is at least accessible from Windows, if crufty.
Experimentally, for now, was using partitions in the emulator, with a partition tag of 0x3E.