Technology scaling has enabled an increasing number of cores on a chip in 3D Chip-Multiprocessors (CMPs). As a result, more cache resources are needed to feed all the cores. With the increase in cache capacity, traditional SRAM consumes a large chip area and has high power consumption. The new Non-Volatile Memory (NVM) has the advantages of low static power consumption, high density, and non-volatility, and is expected to replace traditional SRAM [1–4]. However, NVM also suffers from challenges such as limited write endurance and large write power consumption [5,6]. To address these problems, hybrid cache architectures of SRAM and NVM are generally adopted in 3D CMPs [7–11]. Wu et al. [8] proposed a read-write-aware hybrid cache architecture in which the cache is divided into read and write portions, each containing STT-RAM and SRAM, to improve system performance. Lin et al. [11] used hybrid cache partitioning to improve write operations in Non-Uniform Cache Architecture (NUCA) designs. Thus, a hybrid cache architecture can effectively avoid the shortcomings of both NVM and SRAM technology. A hybrid cache design relies on an intelligent data access policy that makes good use of the characteristics of both technologies. Recent work has focused on optimizing the data access policy of hybrid caches, mainly to reduce the write overhead in NVM [12–15]. Ahn et al. [13] proposed a read-write mechanism for the hybrid cache architecture that predicts and bypasses dead write operations, thereby reducing write overhead; however, it does not prevent possibly frequent write operations to NVM. Wang et al. [14] and Khan [15] proposed line placement and migration policies to improve system performance and reduce power consumption. However, they did not consider data migration between different cache levels. In a multi-level hybrid cache, the cost of migration between cache levels is large and should be taken into account.
This paper proposes a Cache Fill and Migration policy (CFM) for multi-level hybrid caches in 3D CMPs. Compared to the conventional writeback access policy, the CFM optimizes data access in three aspects: cache fill, cache eviction, and data migration. First, the CFM effectively reduces unnecessary cache fills and, when a fill is required, the write operations to the NVM. Second, the CFM optimizes victim cache line selection during cache eviction to reduce the data migration overhead of dirty victims. Finally, the CFM analyzes the migration cost in a multi-level hybrid cache architecture and proposes two migration principles to minimize it. These two principles suit the dark silicon era, in which some open cache banks become closed and some closed cache banks become open (we call this cache architecture dynamic reconfiguration). The results show that in a multi-level hybrid cache architecture, the CFM achieves effective performance improvement and power saving. The rest of the paper is organized as follows. Section 2 analyzes the problems of the conventional writeback access policy in hybrid caches. Section 3 presents the proposed cache fill and migration policy. Section 4 shows the experimental results, and finally the conclusion is given in Section 5.
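To make the fill optimization concrete, the decision a policy like the CFM faces on every cache miss can be sketched as follows. This is a simplified illustration under assumed interfaces: the predictor class, its classification rules, and the function names are hypothetical, not the paper's actual CFM algorithm, which is detailed in Section 3.

```python
# Hypothetical predictor and fill-target decision for a hybrid SRAM/NVM
# cache. All names and rules here are illustrative, not the paper's CFM.

class CounterPredictor:
    """Toy predictor that classifies lines by their past access counts."""
    def __init__(self):
        self.writes = {}
        self.reads = {}

    def record(self, addr, is_write):
        table = self.writes if is_write else self.reads
        table[addr] = table.get(addr, 0) + 1

    def is_dead(self, addr):
        # Never seen before: assume no reuse (a deliberately crude toy rule).
        return addr not in self.writes and addr not in self.reads

    def is_write_intensive(self, addr):
        return self.writes.get(addr, 0) > self.reads.get(addr, 0)


def choose_fill_target(addr, predictor, sram_has_space, nvm_has_space):
    """Return 'bypass', 'sram', or 'nvm' for a line fetched on a miss."""
    if predictor.is_dead(addr):
        return "bypass"          # a dead line makes the cache fill redundant
    if predictor.is_write_intensive(addr):
        # Write-heavy lines go to SRAM to avoid NVM's costly writes.
        return "sram" if sram_has_space else "nvm"
    # Read-mostly lines exploit NVM's density and low static power.
    return "nvm" if nvm_has_space else "sram"
```

The key design point is that the placement decision is made at fill time, before any write reaches an NVM bank, rather than correcting a bad placement later by migration.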

In the conventional writeback cache access policy, data accesses are divided into write accesses and read accesses. If a write access misses in the cache, it accesses the main memory. After that, the accessed data line is fetched and written into the cache; we call this a cache fill. If the cache is full, a victim cache line is selected for replacement according to the replacement strategy. If the victim cache line is dirty, the dirty data need to be written back to the lower-level cache or main memory. A read access proceeds basically the same way as a write access. From this access process, it can be seen that in a hybrid cache architecture, the conventional writeback access policy has the following problems:

• After an access miss in the cache, a cache fill is required, and a writeback might also be required (if a cache eviction occurs). If the fetched data line will not be accessed in the future, the cache fill is redundant. If the dirty data in the victim cache line will likewise not be accessed, the writeback is unnecessary.

• In a cache fill, the conventional access policy does not consider the type of the target cache bank in a hybrid cache. If the fetched data line will be frequently written, it is inappropriate to place the line in the NVM, as this would cause a large write overhead.

• As mentioned in Section 1, cache resources in 3D CMPs are increasing steadily. However, not all of them can be used simultaneously within the peak power budget. This phenomenon is called dark silicon [16–18]. In the dark silicon era, the hybrid cache hierarchy might be dynamically reconfigured [19]. In this case, some open cache banks will be closed. The conventional access policy does not support this situation, so it might cause data loss.
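The conventional writeback flow that gives rise to the problems above can be sketched in a few lines. The class names, the single-level structure, and the LRU replacement choice are illustrative assumptions, not specifics from the paper; the sketch only shows where the unconditional fill and the dirty-victim writeback occur.

```python
# Minimal sketch of a conventional writeback cache: every miss triggers a
# fill, and every dirty victim triggers a writeback, regardless of reuse.

class Memory:
    """Stand-in for the lower-level cache or main memory."""
    def __init__(self):
        self.store = {}
        self.writes = 0   # counts writebacks received from the cache above

    def read(self, tag):
        return self.store.get(tag, 0)

    def write(self, tag, data):
        self.store[tag] = data
        self.writes += 1


class WritebackCache:
    def __init__(self, capacity, lower):
        self.capacity = capacity
        self.lower = lower   # next level cache or main memory
        self.lines = {}      # tag -> {"data": ..., "dirty": bool}
        self.lru = []        # tags ordered from LRU to MRU

    def access(self, tag, is_write, data=None):
        if tag not in self.lines:        # miss: fetch from the lower level
            fetched = self.lower.read(tag)
            self._fill(tag, fetched)     # unconditional cache fill
        line = self.lines[tag]
        if is_write:
            line["data"] = data
            line["dirty"] = True         # writeback policy: only mark dirty
        self._touch(tag)
        return line["data"]

    def _fill(self, tag, data):
        if len(self.lines) >= self.capacity:   # eviction needed
            victim = self.lru.pop(0)           # replacement strategy: LRU
            evicted = self.lines.pop(victim)
            if evicted["dirty"]:               # dirty victim: write back
                self.lower.write(victim, evicted["data"])
        self.lines[tag] = {"data": data, "dirty": False}

    def _touch(self, tag):
        if tag in self.lru:
            self.lru.remove(tag)
        self.lru.append(tag)
```

Note that `_fill` runs on every miss and `_fill`'s writeback runs on every dirty eviction, whether or not the fetched line or the victim's data will ever be accessed again; in a hybrid cache it also says nothing about whether the fill lands in an SRAM or an NVM bank.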

Recently, in 3D Chip-Multiprocessors (CMPs), hybrid cache architectures of SRAM and Non-Volatile Memory (NVM) have generally been used to exploit the high density and low leakage power of NVM and the low write overhead of SRAM. The conventional access policy does not consider the hybrid cache and cannot make good use of the characteristics of both NVM and SRAM technology. This paper proposes a Cache Fill and Migration policy (CFM) for multi-level hybrid caches. In the CFM, data access is optimized in three aspects: cache fill, cache eviction, and dirty data migration. The CFM reduces unnecessary cache fills and write operations to NVM, and optimizes victim cache line selection in cache eviction. Experimental results show that the CFM can improve performance by 24.1% and reduce power consumption by 18% compared to the conventional writeback access policy.

We analyzed the problems of the conventional writeback access policy in hybrid cache architectures and proposed a cache fill and migration policy (CFM) for multi-level hybrid caches in 3D CMPs. The CFM reduces unnecessary cache fills and write operations to the NVM, and optimizes victim cache line selection. In addition, the CFM optimizes data migration between different cache levels. Experiments carried out with SPEC 2006 benchmarks showed that the CFM can achieve 16% power savings and a 25.6% performance improvement compared to the conventional writeback cache access policy, which considers neither the hybrid cache architecture nor cache reconfiguration in the dark silicon era. This paper mainly focused on optimizing write operations in several cache access situations. However, there are other situations, such as prefetch write operations and hit write operations in our cache architecture, that we did not consider, and they make up a large part of the write operations. We will address them in future work.

