Fastest possible multiplication routine?

ページ 7/7
1 | 2 | 3 | 4 | 5 | 6 |

By santiontanon

Paragon (1833)

santiontanon さんの画像

04-01-2023, 00:32

ah, scrap the "H" is not modified. It's just me not remembering past-Santi decisions. I made MDL only print register modifications if they were modified to a DIFFERENT value than before (I guess that made sense to me when I coded it). Indeed, H, and D are modified (but just set to 0), haha. So, false alarm! Smile

In this case MDL was interpreting "jr nc, $+4" fine. In this case it's fine, but I would avoid the use of "$". Some assemblers have different semantics for "$". For example, in standard z80 notation: "org 0; db $, $" should be the same as "db 0, 0", but in some assemblers, this will translate to "db 0, 1". So, if you want portable code to different assemblers, avoid the use of "$" as much as you can Smile

By turbor

Hero (530)

turbor さんの画像

04-01-2023, 00:41

NYYRIKKI wrote:

Sorry for negroposting, but this subject newer gets out of fashion. Smile

I did have a talk with Proton from C64 scene and after that I came up with this version, that tries to have "good balance" with speed and size:

That looks a lot like the code described here Smile
Multiplication on a Z80

If you create the table slightly different you can use this for signed multiplications also. If you have your two inputs in 2-complements you do not even have to check your carry after adding and subtracting the values.

By Prodatron

Paragon (1857)

Prodatron さんの画像

04-01-2023, 10:43

Fantastic, this is even faster (CPC: 24µS) than NYYRIKKIS/Protons (CPC: 29.5µS) and the one I mentioned from CPCWiki (CPC: 29µS).

By NYYRIKKI

Enlighted (6096)

NYYRIKKI さんの画像

04-01-2023, 13:22

turbor wrote:

That looks a lot like the code described here Smile
Multiplication on a Z80

Oh, I'm even quite sure that I've seen this one before. I just didn't pay much attention, but yes this is indeed a good one! It can be made even a little bit faster using the trick from code that Prodatron posted:

; IN : A and D are to be multiplied
; OUT: HL is result
; CHANGES : AF,BC,E,HL
;
	SUB	D
	LD	H,MULTAB/256
	LD	L,A
	LD	C,(HL)
	INC	H
	LD	B,(HL)
	ADD	A,D
	ADD	A,D
	LD	L,A
	LD	E,(HL)
	DEC	H
	LD	L,(HL)
	LD	H,E
	OR	A
	SBC	HL,BC

Edit: ...and this actually uses only 512-byte tables... How ever the reason for smaller size tables and faster speed is that this is signed multiply and not unsigned, and it does not cover full 7bit+sign * 7bit+sign area. (only 7bit*6bit)

By gdx

Enlighted (6445)

gdx さんの画像

07-01-2023, 08:21

Another routine to multiply two 8 bit integers. It take between double and triple the time than fastest one but it uses no table and has no constraint.

;Effect: HL = D x E (integers)
;Modifies: AF, DE, HL.


DxCtoHL:
	ld	hl,0
	ld	a,d
	rra
	ld	d,h
	jr	nc,bit0
	ld	l,e
bit0:
	or	a
	rl	e
	rl	d
	rra
	jr	nc,bit1
	add	hl,de
bit1:
	or	a
	rl	e
	rl	d
	rra
	jr	nc,bit2
	add	hl,de
bit2:
	or	a
	rl	e
	rl	d
	rra
	jr	nc,bit3
	add	hl,de
bit3:
	or	a
	rl	e
	rl	d
	rra
	jr	nc,bit4
	add	hl,de
bit4:
	or	a
	rl	e
	rl	d
	rra
	jr	nc,bit5
	add	hl,de
bit5:
	or	a
	rl	e
	rl	d
	rra
	jr	nc,bit6
	add	hl,de
bit6:
	or	a
	rl	e
	rl	d
	rra
	ret	nc	; Back if bi7 = 0
	add	hl,de
	ret

I think it can be optimized.

By enribar

Paragon (1224)

enribar さんの画像

07-01-2023, 11:10

Sorry I didn't follow all the thread, but as far as I remember the Russian Peasant Algorithm uses simple and quick shifts (registers) to multiply 2 numbers:
https://www.wikihow.com/Multiply-Using-the-Russian-Peasant-M...

By pgimeno

Champion (328)

pgimeno さんの画像

07-01-2023, 22:44

gdx wrote:

I think it can be optimized.

This sequence:

or a
rl e

can be replaced by a single

sla e

There are probably other chances for optimization but that's an outstanding one.

By gdx

Enlighted (6445)

gdx さんの画像

08-01-2023, 02:03

Oh that's right ! It significantly reduces the size and speeds up the operation. And if we use A as entry instead of D, it saves one more instruction (ld a,d).

;Effect: HL = E x A (integers)
;Modifies: AF, DE, HL.


ExAtoHL:
	ld	hl,0
	ld	d,h
	rra
	jr	nc,bit0
	ld	l,e
bit0:
	sla	e
	rl	d
	rra
	jr	nc,bit1
	add	hl,de
bit1:
	sla	e
	rl	d
	rra
	jr	nc,bit2
	add	hl,de
bit2:
	sla	e
	rl	d
	rra
	jr	nc,bit3
	add	hl,de
bit3:
	sla	e
	rl	d
	rra
	jr	nc,bit4
	add	hl,de
bit4:
	sla	e
	rl	d
	rra
	jr	nc,bit5
	add	hl,de
bit5:
	sla	e
	rl	d
	rra
	jr	nc,bit6
	add	hl,de
bit6:
	sla	e
	rl	d
	rra
	ret	nc	; Back if bit7 = 0
	add	hl,de
	ret

By Grauw

Ascended (10822)

Grauw さんの画像

09-01-2023, 04:21

@gdx That can be made faster by putting the 8-bit values in h and e.

http://map.grauw.nl/articles/mult_div_shifts.php#lrmultr

;
; Multiply 8-bit values
; In:  Multiply H with E
; Out: HL = result
;
Mult8:
    ld d,0
    ld l,d
    ld b,8
Mult8_Loop:
    add hl,hl
    jr nc,Mult8_NoAdd
    add hl,de
Mult8_NoAdd:
    djnz Mult8_Loop
    ret

It combines the sla e, rl d and rra in your code above into a single add hl,hl.

Just unroll the loop eight times.

By gdx

Enlighted (6445)

gdx さんの画像

09-01-2023, 08:34

I didn't use a loop to make it a little faster and to avoid modifying one more register but anyway, your optimization is great! Smile

NYYRIKKI wrote:

Edit: ...and this actually uses only 512-byte tables...

You have forgotten it.

ページ 7/7
1 | 2 | 3 | 4 | 5 | 6 |