Hikari Oura, Hiroko Midorikawa, Kenji Kitagawa, Munenori Kai
2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, PACRIM 2017 - Proceedings, 2017- 1-8, Nov 27, 2017 Peer-reviewed
A remote memory paging system called the distributed large memory (DLM) has been developed; it uses the memory of remote nodes in a cluster as an extension of a local node's main memory. The DLM supports out-of-core processing, i.e., processing of data that exceeds the main memory capacity of the local node. With the DLM and memory servers, multi-threaded programs written in OpenMP and pthreads can solve large-scale problems on a computation node whose main memory is smaller than the problem data. The page swap protocol and its implementation are significant factors in the performance of remote memory paging systems. The current version of the DLM has a page-swapping bottleneck because all communication management between the memory servers and the local computation node is assigned to a single system thread. This paper proposes two new page swap protocols and their implementations, which introduce an additional system thread to alleviate this bottleneck. They are evaluated with a micro-benchmark, the STREAM benchmark, and a 7-point stencil computation program. The proposed protocol improves the performance degradation ratio, i.e., the performance using the DLM divided by the performance using only the local memory, from 57% with the former protocol to 78% in a stencil computation that processes data four times larger than the local memory capacity.
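
The performance degradation ratio defined in the abstract can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical and not taken from the paper, and the only data used are the two ratios the abstract reports (57% and 78%). It assumes both ratios were measured against the same local-memory-only baseline, in which case their quotient gives the speedup of the proposed protocol over the former one.

```python
def degradation_ratio(perf_with_dlm: float, perf_local_only: float) -> float:
    """Performance using the DLM divided by performance using only local memory."""
    if perf_local_only <= 0:
        raise ValueError("local-only performance must be positive")
    return perf_with_dlm / perf_local_only


def relative_speedup(ratio_new: float, ratio_old: float) -> float:
    """Speedup of one protocol over another, assuming a shared local-only baseline."""
    if ratio_old <= 0:
        raise ValueError("old ratio must be positive")
    return ratio_new / ratio_old


# Ratios reported in the abstract for the stencil computation.
former_ratio = 0.57
proposed_ratio = 0.78

speedup = relative_speedup(proposed_ratio, former_ratio)
print(f"{speedup:.2f}x")  # prints "1.37x"
```

Under that shared-baseline assumption, moving from 57% to 78% corresponds to roughly a 1.37x speedup of the stencil computation relative to the former protocol.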