util/boot/CopyMemAIO.lha

Short: Speed up your programs & Workbench
Author: Holger.Hippenstiel AT gmx.de
Uploader: Holger Hippenstiel nc-online de
Type: util/boot
Version: 4.3
Replaces: util/boot/CopyMemAIO.lha
Architecture: m68k-amigaos >= 3.0.0
Distribution: Aminet
Kurz (German short description): Speeds up the Workbench/programs
Date: 2020-10-11
Download: http://aminet.net/util/boot/CopyMemAIO.lha
Readme: http://aminet.net/util/boot/CopyMemAIO.readme
Downloads: 1139

CopyMemAIO V4.3
===============

TL;DR: CopyMem() is an essential function of exec.library. The operating
system uses it a lot, so all OS functions and programs benefit from
replacing it with a quicker one.
Install CopyMemAIO in C: and call it in Startup-Sequence after SetPatch, or in
User-Startup. No need to start it with "Run"; it automatically selects the
best code for your CPU.
(Code for 680x0, 68040, 68060, 68080 & native x86)
Or put CopyMemAIO and its icon in the WBStartup folder.

For native CopyMem on WinUAE, which is about 3 times quicker, copy
winuae_dll/CopyMemAIO.alib to the winuae_dll folder inside your WinUAE folder.
You need to enable the option "Miscellaneous" -> "Allow native code".
*** Native code only works on WinUAE_x86, not on x64 !! ***
There is no need to use the 64-bit version anyway.

In the past there were a lot of replacements to speed it up:

Feb.1993 by Arthur Hagen - 68000-68020. Copies the best function to a RAM block,
relies on AllocMem for alignment, mostly uses movem.l (ax)+,dn-dm/an-am.
http://aminet.net/package/util/misc/CopyMemQuicker

Aug.1994 by Arthur Hagen - 68000-68020. Copies the best function to a RAM block,
aligns the code entries to addresses divisible by 16, jump table, multiple movem.l.
http://aminet.net/package/util/boot/COPMQR28

Oct.1996 by Allenbrand Brice - written for 68040, works with 68060.
No copying/detaching, has to be started with "Run >NIL: ...";
code alignment depends purely on hunk loading. Single move16 in a loop.
http://aminet.net/package/util/boot/PCM_1.0

May.1999 by Dirk Busse - written for 68030, works with 68020-68060, but all
use the same function. Aligns the code entries to addresses divisible by 16,
unrolled move.l (an)+,(am)+ loops.
http://aminet.net/package/util/boot/CMQ030

Jul.1999 by Dirk Busse - written for 68060, works with 68040, but both use the
same function. Aligns the code entries to addresses divisible by 16,
unrolled move16 & move.l (an)+,(am)+ loops.
CMQ060 permanently checks for the <$1000000 address range (SAFE mode).
CMQ060Move16 does not check for the <$1000000 address range, so it is a bit
faster.
http://aminet.net/package/util/boot/CMQ060

Nov.2000 by Harry "Piru" Sintonen - written for 68060, works with 68040,
but both use the same function. Aligns the code entries to addresses divisible
by 16, unrolled move16, movem.l & move.l (an)+,(am)+ loops.
Won't install on MorphOS.
http://aminet.net/package/util/boot/NewCMQ060

Aug.2009 by Matt Hey - three versions, written for 680x0, 68040 & 68060.
No copying/detaching, has to be started with "Run >NIL: ...";
code alignment depends purely on hunk loading.
Big unrolled move16 & move.l (an)+,(am)+ loops.
Best use of cache size/burst loading.
But it has to be run, and there are separate files for each processor and for
Safe-Mode.
http://aminet.net/package/util/boot/CopyMem

Aug.2020 by Holger Hippenstiel - mainly based on CopyMem by Matt Hey.
Copies the best function to a RAM block, aligns the code entries to addresses
divisible by 16.
No need to run it.
Code for 680x0, 68040, 68060, 68080 & native x86.
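
Several of the entries above align their code entries to addresses divisible
by 16. As a rough illustration of that arithmetic only (this is not code from
any of the packages, and the name align_up is made up here), rounding an
address up to the next 16-byte boundary could be sketched as:

```python
def align_up(address: int, boundary: int = 16) -> int:
    """Round address up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a power of two")
    # Adding boundary-1 and clearing the low bits rounds up without a branch.
    return (address + boundary - 1) & ~(boundary - 1)


# Hypothetical example addresses, chosen only for illustration.
print(hex(align_up(0x1F3A6)))  # not aligned, moves up to the next boundary
print(hex(align_up(0x1F3B0)))  # already aligned, stays where it is
```

An allocation that is not aligned by itself therefore needs up to 15 spare
bytes in front of the first code entry.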

I wrote a benchmark which tests all functions written so far. It can also
test whether the CopyMem functions work correctly with different sizes.

The memory-layout is the same as CopyMemQuicker, so "TestIt" will believe
CopyMemQuicker is running.

For fast emulation, the most important settings are the Advanced JIT Settings
in WinUAE:
Cache Size: 16MB
Check FPU Support
Check Constant Jump
Uncheck Hard flush
Select Direct
Check No flags
Check Catch unexpected exceptions

If you want to try "TestIt" on WinUAE with a decently fast machine,
note that it will crash with a division by zero. Take a look at my
http://aminet.net/package/util/boot/NoMoreDiv0 to fix that problem.

On real Amigas it can be started with the argument "S" or "SAFE";
then source and destination must be in 24-bit space for move16 operation.
The SAFE option is only needed for controllers which can only do 24-bit DMA,
like the A2091, but there is a driver patch for that:
http://aminet.net/package/driver/media/vbak2091
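
The 24-bit space mentioned above is the address range below $1000000. The
following is only a sketch of that range test with made-up function names;
what CopyMemAIO does with the result is part of its assembler source and is
not reproduced here.

```python
ADDRESS_LIMIT_24BIT = 0x1000000  # first address outside the 24-bit space


def in_24bit_space(address: int, length: int) -> bool:
    """True if the whole block [address, address + length) is below $1000000."""
    if address < 0 or length < 0:
        raise ValueError("address and length must be non-negative")
    return address + length <= ADDRESS_LIMIT_24BIT


def both_in_24bit_space(source: int, dest: int, length: int) -> bool:
    """Combined range test for the source and the destination of a copy."""
    return in_24bit_space(source, length) and in_24bit_space(dest, length)


# Hypothetical addresses: the first pair lies below $1000000, the second
# pair has its destination in 32-bit space.
print(both_in_24bit_space(0x00020000, 0x00040000, 4096))
print(both_in_24bit_space(0x00020000, 0x08000000, 4096))
```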

Starting CopyMemAIO again removes the patches.

*****************************************************************************

Update V4.0:

Major rework; implemented native CopyMem for (Win)UAE.

Test results from BenchCM:

AMD Ryzen 5 3600X, 4.4 GHz, 3466 MHz RAM
--------+----------+----------+----------+---------+---------+---------+
Testsize|      64kb|       8kb|       4kb|      2kb|      1kb|512 bytes|
--------+----------+----------+----------+---------+---------+---------+
CM0x0   |12288 MB/s| 8090 MB/s| 6500 MB/s|4575 MB/s|2883 MB/s|1611 MB/s|
CM040   |11650 MB/s| 6152 MB/s| 4231 MB/s|4670 MB/s|2640 MB/s|1601 MB/s|
CM060   |10834 MB/s| 6575 MB/s| 4868 MB/s| 663 MB/s| 560 MB/s| 518 MB/s|
CMNative|45624 MB/s|18284 MB/s|10674 MB/s|5821 MB/s|2956 MB/s|1591 MB/s|

Intel Core i7-4790k, 4.4 GHz, 2400 MHz RAM
--------+----------+----------+----------+---------+---------+---------+
Testsize|      64kb|       8kb|       4kb|      2kb|      1kb|512 bytes|
--------+----------+----------+----------+---------+---------+---------+
CM0x0   |12862 MB/s|10615 MB/s| 8595 MB/s|6080 MB/s|3832 MB/s|2245 MB/s|
CM040   |10073 MB/s| 8527 MB/s| 5989 MB/s|5827 MB/s|3760 MB/s|2260 MB/s|
CM060   | 9249 MB/s| 9155 MB/s| 6872 MB/s| 982 MB/s| 901 MB/s| 772 MB/s|
CMNative|36806 MB/s|26455 MB/s|15666 MB/s|8474 MB/s|4561 MB/s|2313 MB/s|

Intel Core i5-2500k, 4 GHz, 1600 MHz RAM
--------+----------+----------+----------+---------+---------+---------+
Testsize|      64kb|       8kb|       4kb|      2kb|      1kb|512 bytes|
--------+----------+----------+----------+---------+---------+---------+
CM0x0   | 8454 MB/s| 6448 MB/s| 5369 MB/s|3833 MB/s|2407 MB/s|1365 MB/s|
CM040   | 8641 MB/s| 6186 MB/s| 4297 MB/s|3851 MB/s|2413 MB/s|1368 MB/s|
CM060   | 7790 MB/s| 6556 MB/s| 4859 MB/s| 752 MB/s| 668 MB/s| 552 MB/s|
CMNative|29090 MB/s|15409 MB/s| 8808 MB/s|4671 MB/s|2389 MB/s|1221 MB/s|

Intel Celeron J3355, 2 GHz, 1333 MHz RAM
--------+----------+----------+----------+---------+---------+---------+
Testsize|      64kb|       8kb|       4kb|      2kb|      1kb|512 bytes|
--------+----------+----------+----------+---------+---------+---------+
CM0x0   | 3086 MB/s| 2475 MB/s| 2154 MB/s|1595 MB/s|1023 MB/s| 584 MB/s|
CM040   | 3176 MB/s| 2482 MB/s| 1974 MB/s|1578 MB/s|1030 MB/s| 580 MB/s|
CM060   | 2970 MB/s| 2423 MB/s| 1981 MB/s| 548 MB/s| 459 MB/s| 345 MB/s|
CMNative| 9532 MB/s| 4751 MB/s| 3635 MB/s|1960 MB/s| 978 MB/s| 523 MB/s|
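
To put the four tables above side by side, the small sketch below divides the
CMNative figures by the CM0x0 figures for the largest and the smallest test
size. The numbers are copied from the tables; the dictionary layout and the
function name are just one way to arrange them.

```python
# (64kb MB/s, 512 bytes MB/s) per routine, copied from the tables above.
RESULTS = {
    "Ryzen 5 3600X": {"CM0x0": (12288, 1611), "CMNative": (45624, 1591)},
    "Core i7-4790k": {"CM0x0": (12862, 2245), "CMNative": (36806, 2313)},
    "Core i5-2500k": {"CM0x0": (8454, 1365), "CMNative": (29090, 1221)},
    "Celeron J3355": {"CM0x0": (3086, 584), "CMNative": (9532, 523)},
}


def speedups(results):
    """Return {machine: (factor at 64kb, factor at 512 bytes)}, rounded."""
    factors = {}
    for machine, rows in results.items():
        large = rows["CMNative"][0] / rows["CM0x0"][0]
        small = rows["CMNative"][1] / rows["CM0x0"][1]
        factors[machine] = (round(large, 2), round(small, 2))
    return factors


for machine, (large, small) in speedups(RESULTS).items():
    print(f"{machine:14s} 64kb: {large:.2f}x   512 bytes: {small:.2f}x")
```

With these figures the 64kb factors range from roughly 2.9x to 3.7x, while at
512 bytes the native code lands between about 0.9x and 1.03x of CM0x0.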

As you can see, the native code is about 3 times faster at 64kb, but depending
on the processor the overhead of calling the native code only pays off above
1kb copy size. So when the native code is installed, CM0x0 is still used for
copies below 1024 bytes.
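
The size cut-off described above can be pictured with the following sketch.
It is not taken from the CopyMemAIO source; the function name is made up, the
routine names are the ones used in the tables, and it only covers the case
where the native code has been installed under WinUAE.

```python
NATIVE_THRESHOLD = 1024  # bytes, the cut-off quoted in the text above


def routine_with_native_installed(size: int) -> str:
    """Name of the routine that would handle a copy of `size` bytes."""
    if size < 0:
        raise ValueError("size must be non-negative")
    # Small copies stay on the 680x0 code to avoid the native call overhead.
    return "CM0x0" if size < NATIVE_THRESHOLD else "CMNative"


for size in (512, 1023, 1024, 65536):
    print(size, routine_with_native_installed(size))
```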

The results for 64kb may be a bit higher than what the RAM can really do,
because the large caches on x86 come into effect. Speed maxes out around a 1MB
test size and drops back to around 95% of the 64kb speed at a 4MB test size.
This cache effect applies to the 680x0 code as well, but flushing the cache
all the time or measuring for longer isn't really worth the effort.

A lot of functions will feel smoother now: icon drawing, window dragging
and so on all use CopyMem().

Possible arguments are now:
S=SAFE=SAFEMODE/S,V=VERBOSE/S,NN=NONATIVE/S:

SafeMode is for Amigas with a Zorro II controller that can only do 24-bit DMA.
Verbose prints which code was installed, or when it was removed.
NoNative does not use the native function, just the optimized 680x0 code.
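
The template above follows the usual AmigaDOS convention: "=" separates
aliases of one keyword and "/S" marks a switch. The sketch below is a heavily
simplified model of that convention, not the real ReadArgs() call. It handles
only /S switches, drops the trailing colon of the prompt, and reporting each
switch under its longest (last) spelling is a choice made here for
readability.

```python
TEMPLATE = "S=SAFE=SAFEMODE/S,V=VERBOSE/S,NN=NONATIVE/S"


def parse_template(template):
    """Map every alias (upper-cased) to the last keyword of its group."""
    aliases = {}
    for field in template.split(","):
        names, _, modifier = field.partition("/")
        if modifier.upper() != "S":
            raise ValueError(f"only /S switches are handled here: {field!r}")
        keywords = names.upper().split("=")
        for keyword in keywords:
            aliases[keyword] = keywords[-1]
    return aliases


def parse_switches(template, argument_line):
    """Return {switch name: bool} for a line such as 'S NN'."""
    aliases = parse_template(template)
    switches = {name: False for name in aliases.values()}
    for word in argument_line.split():
        key = word.upper()  # keywords are matched without regard to case
        if key not in aliases:
            raise ValueError(f"unknown argument: {word!r}")
        switches[aliases[key]] = True
    return switches


print(parse_switches(TEMPLATE, "SAFE VERBOSE"))
print(parse_switches(TEMPLATE, "nn"))
```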

Included MemTest from http://aminet.net/package/misc/emu/RaMithlon,
which copies different memory blocks and checks whether the code is working.

On a 4790k with WinUAE 4.4 in 68060 emulation I get:
Size  |  Iter   | No CMQ | 040  | 0x0  | Native
------+---------+--------+------+------+-------
   4kb| 1000000 |   42   |  34  |  24  |   7
  16kb|  250000 |   40   |  25  |  22  |   5
  64kb|   22500 |   14   |   7  |   8  |   2
 256kb|    1125 |    4   |   2  |   2  |   0
1024kb|     350 |    4   |   2  |   2  |   0

Update 4.0b: Vampire machines had a problem with Native-Init; fixed.

Update 4.1:
Removed all old methods to test for UAE (using a fixed address in case
uae.resource wasn't found), because WinUAE will return completely
different addresses anyway.
This should fix crashes on native Amigas and under AROS.

New 68080 code which relies on the 68080's ability to use move16 with any
alignment. This ability is not compatible with real CPUs or WinUAE's
emulation, so it is tested before the new 68080 code is used, because they may
change move16 back to be fully compatible. In that case the 68040 code will be
used (which was faster on Vampire than the 68080 code in 4.0 anyway). I have
no hardware or emulation to test the new 68080 code, but it should be a bit
faster.
Included CMBench & source code from Philippe Carpentier.
Thanks to Gunnar von Boehn from the Apollo Team for explaining details of the
Vampire implementation.
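
The fallback described above (68080 code only if move16 accepts any alignment,
otherwise the 68040 code) can be pictured as a small selection function. This
is a sketch under assumptions: the routine names are borrowed from the
benchmark tables and the source file names, the function name and parameters
are made up, and the mapping for the other CPUs is only a reading of "selects
the best code for your CPU", not a copy of the real logic.

```python
def select_68k_routine(cpu: str, move16_any_alignment: bool = False) -> str:
    """Pick a 680x0 routine name for the given CPU string (illustrative only)."""
    if cpu == "68080":
        # The capability is tested first; without it the 68040 code is used.
        return "CM080" if move16_any_alignment else "CM040"
    if cpu == "68060":
        return "CM060"
    if cpu == "68040":
        return "CM040"
    return "CM0x0"  # assumed generic code for all other 680x0 CPUs


print(select_68k_routine("68080", move16_any_alignment=True))
print(select_68k_routine("68080", move16_any_alignment=False))
print(select_68k_routine("68030"))
```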

Update 4.2:
Oops, the alignment check for a /16 destination in the 68080 code was still in
there; now removed. 68080 CopyMem & CopyMemQuick now go full ham Apollo/Vampire,
no more extra data register for alignment checks, btst #x,an to go .. :)
Removed some additional instructions; a bit quicker due to the 68080's
abilities.

Update BenchCM V1.8:
More accurate measurement of time and copy speed; rolling buffer to prevent
caching.
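
The rolling buffer idea can be sketched as follows. This is not BenchCM's
code, and the names are made up. Instead of copying the same block again and
again, each pass moves on to a fresh source/destination pair inside one large
buffer, so the previous pass has not already pulled the data into the cache.
Timings from Python say nothing about CopyMem() speeds; only the access
pattern is illustrated.

```python
import time


def rolling_offsets(buffer_size, block_size, iterations):
    """Yield (source, dest) offsets that advance through the buffer each pass."""
    pair = 2 * block_size              # one source block plus one dest block
    slots = buffer_size // pair        # number of pairs that fit in the buffer
    if block_size <= 0 or slots < 1:
        raise ValueError("buffer must hold at least one source/dest pair")
    for i in range(iterations):
        base = (i % slots) * pair      # wrap around at the end of the buffer
        yield base, base + block_size


def bench(buffer_size, block_size, iterations):
    """Copy block_size bytes per pass and return the throughput in MB/s."""
    view = memoryview(bytearray(buffer_size))
    start = time.perf_counter()
    for src, dst in rolling_offsets(buffer_size, block_size, iterations):
        view[dst:dst + block_size] = view[src:src + block_size]
    elapsed = time.perf_counter() - start
    megabytes = block_size * iterations / (1024 * 1024)
    return megabytes / elapsed if elapsed > 0 else float("inf")


print(list(rolling_offsets(64, 8, 5)))
print(f"{bench(16 * 1024 * 1024, 4096, 2000):.0f} MB/s (Python, illustrative)")
```

The buffer has to be clearly larger than the caches for the rolling to have
any effect.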

Update 4.3:
This time small memory copies (which are used by the OS all the time) were the
main focus; 4.3 does those 15% quicker than 4.2 on Vampire V2 & V4.
Native CopyMem also uses a better method for small copies.
CMBench was updated to V1.2 and modified the same way as BenchCM: it now uses
a rolling buffer, so that caches, preloading, prefetching & burst won't distort
the real speed. You can now give a loop multiplier as an argument.
The default of 3 is meant for a V4 Vampire; V2 users can use "CMBench 1", on
WinUAE use "CMBench 64".
Many thanks to Renaud Schweingruber & Joshua Dolan for testing.

How to install:
Install CopyMemAIO in C: and call it in Startup-Sequence after SetPatch, or in
User-Startup. No need to start it with "Run"; it automatically selects the
best code for your CPU.
(Code for 680x0, 68040, 68060, 68080 & native x86)
Or put CopyMemAIO and its icon in the WBStartup folder.

For native CopyMem on WinUAE, which is about 3 times quicker, copy
winuae_dll/CopyMemAIO.alib to the winuae_dll folder inside your WinUAE folder.
You need to enable the option "Miscellaneous" -> "Allow native code".
*** Native code only works on WinUAE_x86, not on x64 !! ***
There is no need to use the 64-bit version anyway.

    DISCLAIMER

        This software is subject to the "Standard Amiga FD-Software Copyright
        Note". It is Giftware as defined in paragraph 4g. If you like it and
        use it regularly, please send me a small gift.
        For more information please read "AFD-COPYRIGHT".

        (/pub/aminet/docs/misc/AFD-FilesV-XX.lha V=Version,XX=Languages)

    AUTHOR

        Please send comments, bug-reports or small gifts like a Vampire V4
        or a now "worthless :P" NVidia RTX 2080 Ti, or Paypal me to:

        Holger.Hippenstiel AT gmx.de
        Hauptstr. 38
        71229 Leonberg
        Germany


Contents of util/boot/CopyMemAIO.lha
PERMISSION  UID  GID    PACKED    SIZE  RATIO METHOD CRC     STAMP     NAME
---------- ----------- ------- ------- ------ ---------- ------------ ----------
[unknown]                 2898    7381  39.3% -lh5- d49c Oct 27  1999 afd-copyright
[unknown]                  810    1576  51.4% -lh5- 07e8 Oct 11 21:11 AFD-COPYRIGHT.info
[unknown]                 1335    2256  59.2% -lh5- 449d Oct  7 07:26 benchmarks/BenchCM
[unknown]                 6416   10220  62.8% -lh5- cb11 Oct 11 16:57 benchmarks/CMBench
[unknown]                 1225    2028  60.4% -lh5- b128 Jan  6  2002 benchmarks/MemTest
[unknown]                 3308    6048  54.7% -lh5- 3eb8 Sep  9 10:54 benchmarks/TestIt
[unknown]                 1791    5252  34.1% -lh5- de3d Oct 11 21:05 CopyMemAIO
[unknown]                  889    1220  72.9% -lh5- 4116 Oct 11 21:05 CopyMemAIO.info
[unknown]                 4625   11232  41.2% -lh5- 4546 Oct 11 21:07 CopyMemAIO.txt
[unknown]                  978    1344  72.8% -lh5- a0a6 Oct 11 21:11 CopyMemAIO.txt.info
[unknown]                 2491    7468  33.4% -lh5- be22 Oct  7 07:26 source/BenchCM.s
[unknown]                 1986    7107  27.9% -lh5- d16e Oct 11 16:56 source/CMBench.c
[unknown]                 3138    9282  33.8% -lh5- a610 Oct 11 20:35 source/CopyMemAIO.s
[unknown]                 1017    3035  33.5% -lh5- 4d7f Sep 25 02:22 source/Func_CM040.s
[unknown]                  952    2567  37.1% -lh5- 9950 Sep 25 02:23 source/Func_CM060.s
[unknown]                  531    1303  40.8% -lh5- a46f Oct 11 19:48 source/Func_CM080.s
[unknown]                  674    1703  39.6% -lh5- 13a0 Sep 25 02:29 source/Func_CM0x0.s
[unknown]                 1120    2991  37.4% -lh5- 7720 Oct 11 18:44 source/Func_CMNative.s
[unknown]                  897    2511  35.7% -lh5- db17 Sep 25 02:31 source/Func_CMQ040.s
[unknown]                  807    2016  40.0% -lh5- 0730 Sep 25 02:32 source/Func_CMQ060.s
[unknown]                  506    1133  44.7% -lh5- 46e5 Oct 11 19:48 source/Func_CMQ080.s
[unknown]                  578    1384  41.8% -lh5- 2788 Sep 25 02:34 source/Func_CMQ0x0.s
[unknown]                 1014    3911  25.9% -lh5- 907a Oct 11 18:53 source/Func_SmallCopy.s
[unknown]                  831    3347  24.8% -lh5- 7f8a Oct 11 20:31 source/Func_SmallCopy080.s
[unknown]                 5826   11264  51.7% -lh5- 58a0 Sep 17 15:30 winuae_dll/CopyMemAIO.alib
---------- ----------- ------- ------- ------ ---------- ------------ ----------
 Total        25 files   46643  109579  42.6%            Oct 12 03:31
