M Nakasumi, S Okamoto, M Sowa
INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-IV, PROCEEDINGS, IV 1621-1627, 1998 Peer-reviewed
We previously proposed Program Controlled Cache Level Memory for parallel computers. A data transfer program migrates data between a high-speed memory that is as fast as cache memory and NUMA-type shared memory. Program Controlled Cache Level Memory consists of a word-addressable high-speed memory (Cache Level Memory) and a hardware mechanism that executes instructions to migrate variable-sized data. It can reduce the processor stall time caused by a busy network: transferring data in cache-line-sized units generates unnecessary network traffic and wastes cache space, whereas variable-sized transfers keep network traffic to a minimum, and loading only the indispensable data wastes no Cache Level Memory. For Program Controlled Cache Level Memory to work efficiently, programmers and compiler writers must transform the application so that its memory referencing behavior better exploits the memory hierarchy.
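The traffic argument can be illustrated with a small back-of-the-envelope model. This is only a sketch and is not taken from the paper: the 64-byte line size, the 8-byte word size, the 16x16 row-major matrix, and both function names are assumptions chosen for illustration, and words are assumed to be aligned so that none straddles a line boundary.

```python
def line_transfer_bytes(addresses, line_size=64):
    """Bytes moved when every touched cache line is fetched whole.

    line_size is an assumed value, not one given in the paper.
    """
    touched_lines = {addr // line_size for addr in addresses}
    return len(touched_lines) * line_size


def variable_transfer_bytes(addresses, word_size=8):
    """Bytes moved when only the referenced words are transferred."""
    return len(set(addresses)) * word_size


# Hypothetical access pattern: one column of a 16x16 matrix of 8-byte
# words stored row-major, so consecutive elements are 128 bytes apart.
addresses = [row * 16 * 8 for row in range(16)]

print(line_transfer_bytes(addresses))      # whole-line transfers
print(variable_transfer_bytes(addresses))  # word-granular transfers
```

Under these assumptions every referenced word falls in a different line, so whole-line transfer moves eight times as many bytes as the data actually used.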
This paper describes how programmers can improve memory performance on Program Controlled Cache Level Memory through three well-known communication optimizations: communication pipelining, communication combination, and communication overlapping.
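The abstract does not define the three optimizations, so the following is only a generic cost-model sketch of how such techniques are commonly understood, not the paper's formulation. It assumes a transfer costs a fixed start-up latency plus a per-byte cost; the function names and the default values of `latency` and `per_byte` are hypothetical.

```python
def message_time(nbytes, latency=100.0, per_byte=1.0):
    """Assumed cost of one transfer: fixed start-up latency plus per-byte cost."""
    return latency + per_byte * nbytes


def separate_time(sizes):
    """Each block is sent as its own transfer and pays the latency every time."""
    return sum(message_time(n) for n in sizes)


def combined_time(sizes):
    """Combination: the blocks are merged into one transfer, paying latency once."""
    return message_time(sum(sizes))


def serial_time(comm, compute):
    """No overlap: the processor stalls until the transfer completes."""
    return comm + compute


def overlapped_time(comm, compute):
    """Overlapping: the transfer proceeds while independent computation runs."""
    return max(comm, compute)


def pipelined_time(chunks, comm_per_chunk, compute_per_chunk):
    """Pipelining: chunk i+1 is transferred while chunk i is being processed.

    Modeled as a two-stage pipeline with constant stage times.
    """
    steady = max(comm_per_chunk, compute_per_chunk)
    return comm_per_chunk + (chunks - 1) * steady + compute_per_chunk


sizes = [8, 8, 8, 8]
print(separate_time(sizes), combined_time(sizes))
print(serial_time(132.0, 200.0), overlapped_time(132.0, 200.0))
print(4 * (10 + 20), pipelined_time(4, 10, 20))
```

In this model, combination pays off when the start-up latency dominates small transfers, while overlapping and pipelining pay off when there is enough independent computation to hide the transfer time.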
We also provide an optimization example for a program that has data distribution bugs. The statistics produced by Mem-Spy provide useful information for optimizing programs for Program Controlled Cache Level Memory.