Czech translation (translated by Alex Kovalsky)
Many authors recommend, as a matter of good coding style, writing A(:) rather than A to refer to a whole array. So do I, and I became accustomed to this convention so quickly that I now get irritated when reading programs that do not follow it.
Well, most people would say that such conventions can do no harm. Or can they?
To my great surprise, most compilers do not treat the two forms the same way when they appear as arguments in a subroutine call. On all but a few systems, passing A(:) instead of A slows the call down by a factor that can exceed 20. Not 20%: a factor of 20, which practically turns your machine into an abacus!
The test is easy: it simply calls DDOT from a tuned BLAS both ways inside a timing loop. Here are some results, gathered with the help of readers of the comp.lang.fortran newsgroup. In the table, = means that both forms ran at the same speed:
|System||BLAS||A(:)/A slow-down ratio|
|Lahey LF90 2.01g / WinNT, P90|| ||=|
|SGI R8000 / IRIX 6.1||Sgimath||=|
|NagAcef90 / Solaris 2.5||tuned||=|
|Nag f90 v2.2 / Any|| ||= (now corrected)|
|Cray C94 / UNICOS 9||Libsci||2.25|
|V5.2-high enough / Dec Alpha||dxml||=|
|CVF / WinNT 4.0 - Pentium II|| ||=|
|xlf90 6.1 / IBM RS6000||essl||=|
|Nag f90 v2.1 / SunOS 4.1.3||tuned||4.0|
|Sun f90 / Solaris 2.5||sunperf||=|
|Nag f90 v2.1 / Dec Alpha||dxml||11.25|
|HP f90 2.3 / PA 2.0||blas||=|
|IBM SP2 (wide nodes)||essl||23.5|
|MS Powerstation 4.0|| ||infinity|
The figures above are still better than what would have been obtained had no INTERFACE block been provided for DDOT. The slow-down comes from copying the argument arrays into temporaries, as if they were array sub-sections. Because the input arrays are declared with the INTENT(IN) attribute, they are copied only once, into the temporary before the call. If the INTENT(IN) attribute is left out, the data are also copied back after the call, and the slow-down almost doubles!
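As an illustration, an INTERFACE block of the kind described above might look like the following. This is only a sketch: the module name blas_interfaces is hypothetical, the argument list follows the reference BLAS declaration of DDOT, and the declarations used in the original test may have differed.

```fortran
! Sketch of an explicit interface for the BLAS function DDOT.
! The module name blas_interfaces is hypothetical.
module blas_interfaces
  implicit none
  interface
    function ddot(n, dx, incx, dy, incy)
      integer, intent(in) :: n, incx, incy
      ! INTENT(IN): the vectors are only read, so no copy back is needed.
      double precision, intent(in) :: dx(*), dy(*)
      double precision :: ddot
    end function ddot
  end interface
end module blas_interfaces
```

A program would bring this in with a USE blas_interfaces statement and must be linked against a BLAS library that provides DDOT.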
The MS Powerstation 4.0 was not able to complete the test, as it does not seem to deallocate the temporary. The amount of memory used therefore grows as the loop executes, until the available memory is exhausted.
I am not an optimization wizard, yet I believe that it is fairly simple to modify a compiler so that it recognizes A(:) not as a sub-section of array A, but as the whole array. I hope that the many vendors who get it wrong in the above test will swiftly implement this modification, so that good programming practice is rewarded by good execution performance.
It is also unfortunate that, at the standard -O or -fast level of optimization, for a simple program with no aliasing risk and with a clear statement that the array section is not modified in the function, the compiler fails to move the allocation and copying of the temporary out of the loop. Improvement in this sort of "high-level" optimization, as opposed to assembly-language optimization, is sorely needed.
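Written out by hand, the transformation in question might look like the sketch below. It is an illustration under stated assumptions, not the code any particular compiler generates: the temporaries ta and tb and the program name are hypothetical, and the intrinsic DOT_PRODUCT stands in for DDOT so that the example compiles without a BLAS library.

```fortran
! Sketch: per-call copying versus copying hoisted out of the loop.
! ta and tb are hypothetical stand-ins for compiler temporaries.
program hoist_sketch
  implicit none
  integer, parameter :: n = 1000, nrep = 1000
  double precision :: a(n), b(n), s_inside, s_hoisted
  double precision, allocatable :: ta(:), tb(:)
  integer :: i

  a = 1.0d0
  b = 2.0d0

  ! Roughly what a copying compiler does: allocate and copy on every call.
  s_inside = 0.0d0
  do i = 1, nrep
    allocate(ta(n), tb(n))
    ta = a(:)
    tb = b(:)
    s_inside = s_inside + dot_product(ta, tb)
    deallocate(ta, tb)
  end do

  ! Hoisted version: a and b never change inside the loop,
  ! so a single allocation and a single copy are enough.
  allocate(ta(n), tb(n))
  ta = a(:)
  tb = b(:)
  s_hoisted = 0.0d0
  do i = 1, nrep
    s_hoisted = s_hoisted + dot_product(ta, tb)
  end do
  deallocate(ta, tb)

  print *, 'both versions agree:', s_inside == s_hoisted
end program hoist_sketch
```

The second loop does the same arithmetic with one allocation and one copy instead of nrep of each, which is the saving the compiler could make on its own.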
I would also like to point out a common belief: that optimization is a matter of a few percent, and usually not worth a programmer's time except in a few very critical applications such as meteorological forecast models. Fortran 90, and the recent evolution in systems, may have made this belief obsolete. Minor changes that take a matter of seconds to make may now lead to a dramatic improvement in a program's performance. For instance, our most demanding applications spend about 50 to 80% of their time in the BLAS. On the systems that we use, removing the (:) provides a speed-up of 2 to 3!
Thanks to Ian J. Bush (I.J.Bush@dl.ac.uk), Arnaud Desitter (NAG), Mark Dewing (University of Illinois at Urbana, U.S.A.), Juha Haataja (Center for Scientific Computing, Finland), Sune Karlsson (Stockholm School of Economics, Sweden), Jonathan Wheeler (Rutherford Appleton Laboratory, U.K.) for their help in benchmarking, and to Stefano Baroni (Centre Européen de Calcul Atomique et Moléculaire, France / Scuola Internazionale Superiore di Studi Avanzati, Italy) for the initial idea leading to these tests. The tuned BLAS used with SunOS 4.1.3 and NagAce Solaris 2.5 were provided by Hans Olsson (University of Lund, Sweden).
For those who might like to try for themselves, a test program is available.
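The original test program is not reproduced here. A minimal sketch of such a test, under stated assumptions, could look like the following. The function my_ddot is a hypothetical stand-in for the tuned BLAS DDOT, included so that the file compiles on its own, and the module name dot_iface, the array size n, and the repetition count nrep are arbitrary choices. To repeat the measurement for real, replace my_ddot with DDOT and link against a tuned BLAS.

```fortran
! Minimal timing sketch: the same dot product called with A and with A(:).
! my_ddot is a hypothetical stand-in for the tuned BLAS DDOT.
module dot_iface
  implicit none
  interface
    function my_ddot(n, x, incx, y, incy) result(d)
      integer, intent(in) :: n, incx, incy
      double precision, intent(in) :: x(*), y(*)
      double precision :: d
    end function my_ddot
  end interface
end module dot_iface

program section_test
  use dot_iface
  implicit none
  integer, parameter :: n = 1000, nrep = 100000
  double precision, allocatable :: a(:), b(:)
  double precision :: s_whole, s_sect
  integer :: i, c0, c1, c2, rate

  allocate(a(n), b(n))
  a = 1.0d0
  b = 2.0d0

  s_whole = 0.0d0
  call system_clock(c0, rate)
  do i = 1, nrep
    s_whole = s_whole + my_ddot(n, a, 1, b, 1)       ! whole-array names
  end do
  call system_clock(c1)

  s_sect = 0.0d0
  do i = 1, nrep
    s_sect = s_sect + my_ddot(n, a(:), 1, b(:), 1)   ! A(:) form
  end do
  call system_clock(c2)

  print *, 'A    form, seconds:', real(c1 - c0) / real(rate)
  print *, 'A(:) form, seconds:', real(c2 - c1) / real(rate)
  print *, 'results agree:', s_whole == s_sect
end program section_test

! Plain reference loop; assumes positive strides.
function my_ddot(n, x, incx, y, incy) result(d)
  implicit none
  integer, intent(in) :: n, incx, incy
  double precision, intent(in) :: x(*), y(*)
  double precision :: d
  integer :: i
  d = 0.0d0
  do i = 1, n
    d = d + x(1 + (i - 1)*incx) * y(1 + (i - 1)*incy)
  end do
end function my_ddot
```

The ratio of the two printed times corresponds to the slow-down ratio reported in the table. An optimizing compiler may simplify such a short loop, so the array size and repetition count may need adjusting before the timings mean anything.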