Technology scaling has enabled an increasing number of cores on a chip in 3D Chip-Multiprocessors (CMPs). As a result, more cache resources are needed to feed all the cores. As cache capacity grows, traditional SRAM consumes a large chip area and has high power consumption. Emerging Non-Volatile Memory (NVM) offers low static power consumption, high density and non-volatility, and is expected to replace traditional SRAM [1–4]. However, NVM also faces challenges, such as limited write endurance and high write power consumption [5,6]. To address these problems, a hybrid cache architecture of SRAM and NVM is generally adopted in 3D CMPs [7–11]. Wu et al. proposed a read-write-aware hybrid cache architecture, in which the cache is divided into read and write portions. Each portion contains STT-RAM and SRAM to improve the system performance. Lin et al. used hybrid cache partitioning to improve write operations in Non-Uniform Cache Architecture (NUCA) designs. These works show that a hybrid cache architecture can effectively avoid the shortcomings of both NVM and SRAM technology.
A hybrid cache design relies on an intelligent data access policy that makes good use of the characteristics of both NVM and SRAM technology. Recent work has focused on optimizing the data access policy of the hybrid cache, mainly to reduce the write overhead in NVM [12–15]. Ahn et al. proposed a read-write mechanism for the hybrid cache architecture. This mechanism can predict and bypass dead write operations, thereby reducing write overhead. However, it does not solve the problem of possible frequent write operations to NVM. Wang et al. and Khan proposed line placement and migration policies to improve system performance and reduce power consumption overhead. However, they did not consider data migration between different cache levels. In a multi-level hybrid cache, the cost of migration between different cache levels is large and should be considered.
This paper proposes a cache fill and migration policy (CFM) for multi-level hybrid cache in 3D CMPs. Compared to the conventional writeback access policy, the CFM optimizes data access in three aspects: cache fill, cache eviction and data migration. Firstly, the CFM can effectively reduce unnecessary cache fills, as well as the write operations to the NVM when a cache fill is required. Secondly, the CFM optimizes the victim cache line selection in cache eviction to reduce the data migration overhead of the dirty victim. Finally, the CFM analyzes the migration cost in a multi-level hybrid cache architecture and proposes two migration principles to minimize the migration cost. These two migration principles are suitable for the dark silicon era, in which some open cache banks become closed and some closed cache banks become open (we call this dynamic reconfiguration of the cache architecture). The results show that in a multi-level hybrid cache architecture, the CFM can effectively improve performance and save power.
The rest of the paper is organized as follows. Section 2 analyzes the problems of the conventional writeback access policy in hybrid cache. Section 3 presents the proposed cache fill and migration policy. Section 4 shows the experimental results and finally the conclusion is given in Section.
In the conventional writeback cache access policy, data access can be divided into write access and read access. For a write access that misses in the cache, the main memory is accessed. After that, the accessed data line is fetched and written back to the cache; we call this a cache fill. If the cache is full, a victim cache line is selected for replacement according to the replacement strategy. If the victim cache line is dirty, the dirty data must be written back to the lower-level cache or main memory. The read access is basically the same as the write access.
From the above access process, it can be seen that in the hybrid cache architecture, the conventional writeback access policy has the following problems:
• After an access misses in the cache, a cache fill is required, and a writeback might also be required (if a cache eviction occurs). If the fetched data line will not be accessed in the future, the cache fill is redundant. If the dirty data in the victim cache line will not be accessed again either, the writeback is unnecessary.
• In a cache fill, for the hybrid cache the conventional access policy does not consider the type of the cache. If the fetched data line will be frequently written, it is inappropriate to place the line in the NVM, otherwise it will cause a large write overhead.
• As mentioned in Section 1, cache resources in 3D CMPs are increasing steadily. However, not all of them can be used simultaneously within the peak power budget. This phenomenon is called dark silicon [16–18]. In the dark silicon era, the hybrid cache hierarchy might be dynamically reconfigured. In this case, some of the open cache banks will be closed. The conventional access policy does not support this situation, so it might cause data loss.
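The first problem above can be made concrete with a small simulation. The following is only a minimal sketch of the conventional writeback access process described in this section, not the simulator used in this paper. It assumes a single fully associative cache with LRU replacement, and the class name `WritebackCache` and its counters are hypothetical names introduced for illustration. It counts cache fills, dirty writebacks, and lines that are evicted without ever being reused after their fill.

```python
from collections import OrderedDict


class WritebackCache:
    """Illustrative write-back, write-allocate cache (assumed: fully associative, LRU)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # address -> [dirty, reused]; least recently used entry comes first
        self.lines = OrderedDict()
        self.hits = 0
        self.fills = 0
        self.writebacks = 0
        self.evicted_without_reuse = 0

    def access(self, addr, is_write):
        if addr in self.lines:
            # Hit: update recency, mark the line as reused, set dirty on a write.
            self.hits += 1
            self.lines.move_to_end(addr)
            state = self.lines[addr]
            state[1] = True
            state[0] = state[0] or is_write
            return
        # Miss: if the cache is full, evict the LRU victim first.
        if len(self.lines) >= self.capacity:
            _victim, (dirty, reused) = self.lines.popitem(last=False)
            if dirty:
                self.writebacks += 1  # dirty victim goes to the lower level
            if not reused:
                self.evicted_without_reuse += 1  # this fill brought no benefit
        # Cache fill: the fetched line is always placed, regardless of future use.
        self.lines[addr] = [is_write, False]
        self.fills += 1


# Small hand-made trace: (address, is_write)
trace = [("A", True), ("A", False), ("B", False), ("C", False), ("D", False)]
cache = WritebackCache(capacity=2)
for addr, is_write in trace:
    cache.access(addr, is_write)

print(cache.fills, cache.hits, cache.writebacks, cache.evicted_without_reuse)
```

In this trace every miss triggers a fill, and line B is filled and later evicted without a single reuse, which is the kind of redundant fill the first problem refers to. The sketch deliberately ignores the SRAM/NVM distinction; a hybrid model would additionally have to track which region each fill writes to.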
Recently, in 3D Chip-Multiprocessors (CMPs), a hybrid cache architecture of SRAM and Non-Volatile Memory (NVM) is generally used to exploit the high density and low leakage power of NVM and the low write overhead of SRAM. The conventional access policy does not consider the hybrid cache and cannot make good use of the characteristics of both NVM and SRAM technology. This paper proposes a Cache Fill and Migration policy (CFM) for multi-level hybrid cache. In CFM, data access is optimized in three aspects: cache fill, cache eviction, and dirty data migration. The CFM reduces unnecessary cache fills and write operations to NVM, and optimizes the victim cache line selection in cache eviction. The results of experiments show that the CFM can improve performance by 24.1% and reduce power consumption by 18% when compared to the conventional writeback access policy.
We analyzed the problems of the conventional writeback access policy for hybrid cache architecture. Then we proposed a cache fill and migration policy (CFM) for multi-level hybrid cache in 3D CMPs. The CFM reduces unnecessary cache fills and write operations to the NVM, and optimizes victim cache line selection. In addition, the CFM optimizes the data migration between different cache levels. The experiments carried out with SPEC2006 benchmarks showed that the CFM can achieve 16% power savings and 25.6% performance improvement, compared to the conventional writeback cache access policy, which considers neither the hybrid cache architecture nor the cache reconfiguration in the dark silicon era. This paper mainly focused on the optimization of write operations in some cache access situations. However, there are other cache access situations in our cache architecture, such as prefetch write operations and hit write operations, that we did not consider, and they make up a large part of the write operations. We will address them in future work.