Z80 timing

Page 4/5

By mi-chi

Scribe (37)

17-02-2022, 07:47

If you are looking for something to develop and tweak single ASM *routines* (e.g. multiplication or unpackers), did you give Z80Runner a try? It's a testbed for developing assembly routines: an interactive debugger for ASM code that shows you each instruction's cycle count and can measure the actual execution time of whole subroutines, depending on input parameters. When you are done, you take the routine back to your production code or library.
You edit the code with your favorite editor, and the tool reloads the ASM file whenever you save your changes, ready to run.
It doesn't emulate other hardware like graphics or sound. Also, it uses the standard "Zilog" instruction notation, due to its back-end assembler (from the Z88DK package), but it's not that hard to convert the SDCC notation to Zilog with regular expressions. For example, to convert all the index register operands, you can use the following regular expression in Notepad++ for search & replace:
Find:

(\-?[1-9]\d{0,1}) \((i[xy])\)

Replace with:

\(\2+\1\)
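
For anyone who would rather script the conversion than do it in an editor, the same idea can be sketched in Python. This is only a minimal sketch and not part of Z80Runner or Z88DK: the function name `sdcc_to_zilog` is made up for illustration, and the pattern is a variation that assumes plain decimal offsets of up to three digits (including 0) and folds the sign into the operand, so `-5 (ix)` becomes `(ix-5)`.

```python
import re

# Matches SDCC-style indexed operands such as "-5 (ix)" or "4 (iy)".
# Assumption: offsets are plain decimal numbers of up to three digits.
SDCC_INDEXED = re.compile(r"(-?\d{1,3}) ?\((i[xy])\)")


def sdcc_to_zilog(line):
    """Rewrite 'd (ix)' operands as '(ix+d)' or '(ix-d)' (hypothetical helper)."""
    def fix(match):
        offset = int(match.group(1))
        # The '+d' format always emits an explicit sign: +4, -5, +0.
        return f"({match.group(2)}{offset:+d})"
    return SDCC_INDEXED.sub(fix, line)


if __name__ == "__main__":
    for source_line in ("ld a, -5 (ix)", "ld 4 (iy), b", "add hl,de"):
        print(sdcc_to_zilog(source_line))
```

Lines without an indexed operand pass through unchanged, so the function can be applied to a whole file line by line. Hexadecimal or symbolic offsets are not handled here.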

Z80Runner shows you errors and line numbers in a live window, so you can quickly walk through the list in your favorite editor, save, check the error window, rinse and repeat.

This is an example to test a (slow) 16*8 multiplication routine (not recommended in your production code!):

;Mul_16_8.asm
        org     0x100

        ld      de,123
        ld      b,8
        call    Mul_16_8_Slow

        ld      de,8
        ld      b,123
        call    Mul_16_8_Slow

        nop
        nop
        nop

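; Multiplication: HL = DE * B (repeated addition; B = 0 loops 256 times)
Mul_16_8_Slow:
        ld      hl,0            ; clear the result
Loop:
        add     hl,de           ; add DE once per iteration
        djnz    Loop            ; B = B - 1, repeat until B = 0
        ret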

The code window shows you the instruction times (and hex output):

Cycles Opcodes    Command
                  Mul_16_8_Slow:
10     21 00 00           ld	hl,0
                  Loop:
11     19                 add	hl,de
8/13   10 FD              djnz	Loop
10     C9                 ret

When you mark all the lines of the routine, it shows you a summary (here: 7 bytes, 39-44 cycles), but the real power comes when executing the code and stepping over the calls. The first call (DE=123, B=8) reveals an execution time of 207+17 cycles (17 = the cost for the call), and the second call (DE=8, B=123) runs for 2967+17 cycles.
This is a very simple and obvious example, but it gives you an exact idea where the actual cycles are burned.
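
The measured numbers can be cross-checked by hand from the per-instruction cycle counts in the listing above. The following is a small sketch of that arithmetic in Python; the function name `mul_16_8_slow_cycles` is made up for illustration, and the constants are simply the values shown in the code window (DJNZ costs 13 cycles when it jumps and 8 when it falls through).

```python
# Cycle costs as shown in the code window listing.
LD_HL_NN = 10
ADD_HL_DE = 11
DJNZ_TAKEN = 13
DJNZ_NOT_TAKEN = 8
RET = 10


def mul_16_8_slow_cycles(b):
    """Predicted cycles for Mul_16_8_Slow, excluding the 17-cycle CALL."""
    # DJNZ decrements before testing, so B = 0 means 256 iterations.
    iterations = b if b != 0 else 256
    loop = (iterations * ADD_HL_DE
            + (iterations - 1) * DJNZ_TAKEN   # every pass but the last jumps back
            + DJNZ_NOT_TAKEN)                 # the last pass falls through
    return LD_HL_NN + loop + RET


if __name__ == "__main__":
    print(mul_16_8_slow_cycles(8))    # 207
    print(mul_16_8_slow_cycles(123))  # 2967
```

Both results agree with the 207 and 2967 cycles reported when stepping over the two calls.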

For more complex things like decompression routines, you can also load binary data from files to specific memory locations, and whenever you start over after a code change, all the memory is reinitialized with data from these files.

Specifically: if you are planning to develop hand-written assembly routines, tweak the compiler's assembly output, or even compare how much better you can do than the compiler - interactively! - this might be the tool for you.

By santiontanon

Paragon (1732)

21-02-2022, 00:10

Sounds like an interesting tool, mi-chi! Btw, any chance of a Unix build (Linux or Mac)? Or at least a 64-bit Windows one? The current binary is a 32-bit Windows binary, and some modern 64-bit Unix systems will not run it even with Wine unless it's a 64-bit binary.

Also, @aoineko, I improved the function detection heuristics, and now it detects all the functions in your code! The latest development version can be found here: https://github.com/santiontanon/mdlz80optimizer/releases/tag...

with this version, on the example code you shared, it produces this output:

source file (.function name)	self size	total size	accum t-states
../MSX/others/sdcc/aoineko.asm	2392	2392	15753/15598
../MSX/others/sdcc/aoineko.asm._GamePawn_Initialize	146		1029/1019
../MSX/others/sdcc/aoineko.asm._GamePawn_SetPosition	54		349
../MSX/others/sdcc/aoineko.asm._GamePawn_SetAction	59		373/368
../MSX/others/sdcc/aoineko.asm._GamePawn_Update	1713		11208/11088
../MSX/others/sdcc/aoineko.asm._GamePawn_Draw	331		2221/2201
../MSX/others/sdcc/aoineko.asm._GamePawn_SetTargetPosition	34		221
../MSX/others/sdcc/aoineko.asm._GamePawn_InitializePhysics	52		352

And, of course, those times are just the sum of all the assembler instructions (not actual execution time, for which you'll need an emulator-based tool like mi-chi's, or to measure it in openMSX). Also, some functions have two numbers, e.g. "2221/2201", because instructions like conditional jumps can take a different number of t-states depending on the condition. So, you should read them as [upper-bound]/[lower-bound]. When the upper and lower bounds are the same, I just show a single number for simplicity.
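
To make the upper/lower-bound idea concrete, here is a minimal Python sketch of how such static sums can be formed. It is not the optimizer's actual implementation; the helper names `static_bounds` and `format_bounds` are made up for illustration. The first input is mi-chi's Mul_16_8_Slow routine from earlier in this thread, the second is the set of per-function numbers from the table above.

```python
def static_bounds(costs):
    """Sum (lower, upper) t-state pairs and return (upper, lower)."""
    lower = sum(lo for lo, _ in costs)
    upper = sum(hi for _, hi in costs)
    return upper, lower


def format_bounds(upper, lower):
    """Show a single number when both bounds agree, else upper/lower."""
    return str(upper) if upper == lower else f"{upper}/{lower}"


# ld hl,0 / add hl,de / djnz (8 or 13) / ret
MUL_16_8_SLOW = [(10, 10), (11, 11), (8, 13), (10, 10)]

# (lower, upper) per function, copied from the table above.
FUNCTIONS = [(1019, 1029), (349, 349), (368, 373), (11088, 11208),
             (2201, 2221), (221, 221), (352, 352)]

if __name__ == "__main__":
    print(format_bounds(*static_bounds(MUL_16_8_SLOW)))  # 44/39
    print(format_bounds(*static_bounds(FUNCTIONS)))      # 15753/15598
```

The second line reproduces the file-level total in the table, which shows that the per-file figure is simply the sum of the per-function bounds.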

By aoineko

Paladin (803)

21-02-2022, 22:52

This works perfectly.
For an offline tool, this is the best we can get for testing code optimization.
Thank you santiontanon.

By santiontanon

Paragon (1732)

22-02-2022, 04:13

No problem! I am glad it is useful :)

By mi-chi

Scribe (37)

22-02-2022, 23:36

About the Linux and 64 bit question:
Converting the application to Linux will take some time (Qt seems to be a good candidate for porting it, but I will have to spend some time re-learning the designer and its signals and slots), and as I'm busy with another project, I won't be able to convert it any time soon.

That said, I'm not a Linux expert, but I just installed a freshly downloaded Ubuntu 20.x (64 bit) in VirtualBox, updated Wine, and after a bit of fiddling with getting the VC 2010 runtimes installed, I could launch Z80Runner as a 32 bit EXE.

For testing, I built a 64 bit version, but needed to install the VC 2010 runtime (64 bit) as well. Anyway, both versions did run in the end.
About what you said: Are there plans to drop 32 bit support in Wine?

Also, can you give more details on what you tried and what exactly is failing on your side? As I could get it to run on my system, I'm sure there is a way to get it running on yours.

By Grauw

Ascended (10679)

23-02-2022, 00:01

I believe Santi is running macOS, and since macOS 10.15 (Catalina), running 32-bit binaries is no longer supported, especially not on new Macs with M1 ARM processors, which rely on Rosetta to translate x86-64 instructions only. This also extends to binaries run with Wine, since Wine Is Not an Emulator.

By mi-chi

Scribe (37)

23-02-2022, 00:21

Here is a link to a download of the 64 bit version.

Z80Runner_64.zip

By santiontanon

Paragon (1732)

23-02-2022, 05:21

Indeed, I'm using an M1 machine, with 64-bit support only. It's a bit of a pain, but 6x faster build times for projects than my previous Intel machine is definitely worth this little pain ;)

Thanks a lot, mi-chi! I just downloaded it, and now Wine can run this! I'm still missing some .dll file (I think it's some Visual Studio runtime DLL that is present on Windows machines), but that will be easy to get. I'll play with this tomorrow, thanks a lot for the build! :D

By mi-chi

Scribe (37)

23-02-2022, 07:12

Santi, besides the aforementioned VC 2010 runtime (MFC100U.DLL), which can be installed by running "winetricks" with the param "vc2010" (and probably running it once with "--self-update" if that checksum error pops up), you will still need to get the external assembler (Z88DK) working, which the tool uses to assemble the code. I know that its sources are available, but I have never attempted to build it.

That said, porting my project to Qt could be a chance to integrate an assembler. I saw that your tool understands all kinds of ASM dialects, which would be tempting, especially when working with SDCC's really weird index register and literal number syntax, and given that this is a primary target for optimizations. Would that parser make it possible to get the actual address and bytes emitted per assembly line? That's what Z80Runner requires to get started on a source, and that's what the external assembler is required for.
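
To illustrate what "address and bytes per assembly line" means here, below is a minimal Python sketch of such a listing. It does not reflect Z80Runner's or any assembler's real data format; `ListingLine` and `build_listing` are made-up names, and the start address 0x113 is an assumption derived from the instruction sizes of the test program shown earlier (org 0x100, two 8-byte call sequences, three NOPs).

```python
from dataclasses import dataclass


@dataclass
class ListingLine:
    address: int   # where the first byte of this line is placed
    data: bytes    # machine code emitted for this line
    text: str      # the source text of the line


def build_listing(origin, lines):
    """Assign consecutive addresses to (hex_bytes, source_text) pairs."""
    listing = []
    address = origin
    for hex_bytes, text in lines:
        data = bytes.fromhex(hex_bytes)
        listing.append(ListingLine(address, data, text))
        address += len(data)
    return listing


# Opcodes copied from the code window listing of Mul_16_8_Slow.
ROUTINE = [
    ("21 00 00", "ld hl,0"),
    ("19", "add hl,de"),
    ("10 FD", "djnz Loop"),
    ("C9", "ret"),
]

listing = build_listing(0x113, ROUTINE)  # assumed start address, see above

if __name__ == "__main__":
    for line in listing:
        hex_text = " ".join(f"{b:02X}" for b in line.data)
        print(f"{line.address:04X}  {hex_text:<9} {line.text}")
```

With this information per line, a debugger can map the program counter back to a source line and resolve relative jumps, e.g. the DJNZ displacement FD (-3) points back at the `add hl,de` line.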

By Grauw

Ascended (10679)

23-02-2022, 12:51

santiontanon wrote:

Indeed, I'm using an M1 machine, with 64-bit support only. It's a bit of a pain, but 6x faster build times for projects than my previous Intel machine is definitely worth this little pain ;)

I’ve seen those differences in performance comparisons. I think the LLVM ARM compiler is more efficient because ARM has a much more regular instruction set. It would be interesting to see a performance comparison cross-compiling to x86-64. But that won’t give as impressive numbers of course, so YouTubers can’t make a nice clickbait headline for it. Anyway, off-topic :)
