Linux Notes: The Zero Page [Repost]

Author: from the Internet

Reposted from: https://blog.csdn.net/weixin_42730667/article/details/123121624

zero page
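The zero page is a special physical page whose contents are all zeros. It is an optimization aimed specifically at anonymous pages, and it mainly saves memory and gives some performance benefit. After a region of virtual memory is obtained with malloc or mmap, if the first access to that memory is a read, an anonymous page fault occurs and is handled by do_anonymous_page. Because the first access is a read and nothing has been written yet, the kernel does not allocate a new physical page; it maps the special zero page at that address instead.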

if a process instantiates a new (non-huge) page by trying to read from it, the kernel still will not allocate a new memory page. Instead, it maps a special page, called simply the "zero page," into the process's address space instead. Thus, all unwritten anonymous pages, across all processes in the system, are, in fact, sharing one special page. Needless to say, the zero page is always mapped read-only; it would not do to have some process changing the value of zero for everybody else. Whenever a process attempts to write to the zero page, it will generate a write-protection fault; the kernel will then (finally) get around to allocating a real page of memory and substitute it into the process's address space at the right spot
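
The zero-fill behavior described above can be probed from user space. The following is a minimal sketch, assuming Linux and a private anonymous mmap mapping; the helper name anon_read_is_zero is made up for this illustration. It only reads the mapping, so under the mechanism described here each page would be backed by the shared zero page and every byte should read as 0. Note that it checks the contents only; it cannot tell which physical page backs the mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `npages` of private anonymous memory, read
 * every byte without ever writing, and report what was seen.
 * Returns 1 if all bytes were zero, 0 if not, -1 if the mapping failed.
 */
static int anon_read_is_zero(size_t npages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    size_t len, i;
    int all_zero = 1;
    void *m;

    assert(page_size > 0);
    len = npages * (size_t)page_size;

    m = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    /* Read-only pass: the first touch of each page is a read fault. */
    p = m;
    for (i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(m, len);
    return all_zero;
}
```

Calling anon_read_is_zero(16) from a small main is enough to try it; the pointer is volatile so that the compiler does not optimize the reads away.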

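Advantages of the zero page:

The zero page is a single, globally unique physical page.
The zero page saves a lot of unnecessary physical memory. Real applications often allocate virtual memory that they only ever read and never write. If a read page fault allocated physical memory for all of that virtual range, the memory would be wasted, because nothing is ever written to it. Mapping all memory that has been read but never initialized onto one physical page full of zeros saves a great deal of physical memory.
It improves efficiency. A read page fault does not go to the buddy allocator for a physical page; it uses the zero page, which was already set up at initialization, so the fault is handled much faster.
Using the zero page also prevents stale data left behind in previously allocated and freed physical pages from being exposed.

When an anonymous memory area is created or extended, no actual pages of memory are allocated (whether transparent huge pages are enabled or not). That is because a typical program will never touch many of the pages that are part of its address space; allocating pages before there is a demonstrated need would waste a considerable amount of time and memory. So the kernel will wait until the process tries to access a specific page, generating a page fault, before allocating memory for that page.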

But, even then, there is an optimization that can be made. New anonymous pages must be filled with zeroes; to do anything else would be to risk exposing whatever data was left in the page by its previous user. Programs often depend on the initialization of their memory; since they know that memory starts zero-filled, there is no need to initialize that memory themselves. As it turns out, a lot of those pages may never be written to; they stay zero-filled for the life of the process that owns them. Once that is understood, it does not take long to see that there is an opportunity to save a lot of memory by sharing those zero-filled pages. One zero-filled page looks a lot like another, so there is little value in making too many of them.

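Disadvantages of the zero page:

When memory is read first and written later, two page faults are triggered. The read fault maps the zero page; the later write then triggers another page fault, and only that one allocates the real physical page. Compared with the single page fault that would otherwise occur, the cost may be higher.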
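
The extra fault can be made visible with the minor-fault counter that getrusage reports. This is a rough sketch rather than a benchmark, assuming Linux; the helper names minor_faults and faults_for_pattern are invented for this example. It touches one byte per page of a fresh anonymous mapping, either write-only or read-then-write, and returns how many minor faults that took.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults (no I/O needed) taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);

    assert(rc == 0);
    (void)rc;
    return ru.ru_minflt;
}

/*
 * Hypothetical helper: touch each page of a fresh anonymous mapping and
 * return the number of minor faults taken while doing so. If read_first
 * is non-zero, every page is read before it is written; otherwise it is
 * only written. Returns -1 if the mapping could not be created.
 */
static long faults_for_pattern(int read_first, size_t npages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    unsigned char sink = 0;
    size_t len, off;
    long before, after;
    void *m;

    assert(page_size > 0);
    len = npages * (size_t)page_size;

    m = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    p = m;

    before = minor_faults();
    for (off = 0; off < len; off += (size_t)page_size) {
        if (read_first)
            sink ^= p[off];  /* read fault: zero page mapped read-only */
        p[off] = 1;          /* write fault: a private page is allocated */
    }
    after = minor_faults();

    (void)sink;
    munmap(m, len);
    return after - before;
}
```

Comparing faults_for_pattern(0, n) with faults_for_pattern(1, n) for the same n should show the read-first pattern taking more faults, but the exact counts depend on the kernel version, transparent huge page settings, and whatever else the process is doing, so treat them as indicative only.
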

Anonymous page handling in do_anonymous_page

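For anonymous pages, how the page fault handler treats the zero page depends on the order in which memory is read and written:

If the first access is a read, a read page fault occurs; the zero page is mapped and the mapping is read-only.
If memory is read first and then written, the read fault leaves the mapping read-only, so the write triggers a second page fault, which allocates a physical page and updates the PTE.
If the first access is a write, alloc_zeroed_user_highpage_movable is called directly to allocate a new physical page.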
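
The three cases above can be summarized as a small state machine for a single virtual page. The sketch below is a toy user-space model, not kernel code: the enum, the struct, and model_access are all hypothetical names introduced here, and the model ignores everything except the read/write order.

```c
#include <assert.h>
#include <stddef.h>

/* Model states for one virtual page (illustrative, not kernel types). */
enum page_state {
    PAGE_UNMAPPED,    /* PTE is empty, nothing mapped yet */
    PAGE_ZERO_RO,     /* mapped read-only to the shared zero page */
    PAGE_PRIVATE_RW   /* mapped to a private, writable physical page */
};

struct model_page {
    enum page_state state;
    int faults;       /* page faults taken so far */
    int allocations;  /* private physical pages allocated so far */
};

/* Apply one access (read if is_write == 0, write otherwise). */
static void model_access(struct model_page *pg, int is_write)
{
    assert(pg != NULL);

    switch (pg->state) {
    case PAGE_UNMAPPED:
        pg->faults++;
        if (is_write) {
            /* First access is a write: allocate a zeroed page now. */
            pg->allocations++;
            pg->state = PAGE_PRIVATE_RW;
        } else {
            /* First access is a read: map the zero page read-only. */
            pg->state = PAGE_ZERO_RO;
        }
        break;
    case PAGE_ZERO_RO:
        if (is_write) {
            /* Write after read: second fault, allocate and remap. */
            pg->faults++;
            pg->allocations++;
            pg->state = PAGE_PRIVATE_RW;
        }
        break;
    case PAGE_PRIVATE_RW:
        /* Already backed by a writable page: no further faults. */
        break;
    }
}
```

In this model a read followed by a write costs two faults and one allocation, a write alone costs one fault and one allocation, and a page that is only ever read costs one fault and no allocation.
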

Zero page handling in the page fault path
The anonymous page fault code that handles the zero page looks like this:

static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
    ... ...

    // Read page fault: map the zero page
    /* Use the zero-page for reads */
    if (!(vmf->flags & FAULT_FLAG_WRITE) &&
            !mm_forbids_zeropage(vma->vm_mm)) {
        entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
                        vma->vm_page_prot));
        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
                vmf->address, &vmf->ptl);
        ... ...
        goto setpte;
    }

    ... ...

    // Write page fault: call alloc_zeroed_user_highpage_movable to allocate physical memory
    page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
    if (!page)
        goto oom;

    ... ...
    // Write page fault: build the PTE for the page obtained from the buddy allocator, with its permissions
    entry = mk_pte(page, vma->vm_page_prot);
    entry = pte_sw_mkyoung(entry);
setpte:
    set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

    /* No need to invalidate - it was non-present before */
    update_mmu_cache(vma, vmf->address, vmf->pte);
    ... ...
}
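
vmf->flags records why the fault was triggered. When the access is a read, that is, !(vmf->flags & FAULT_FLAG_WRITE), and the mm does not forbid the zero page (!mm_forbids_zeropage), the zero page path is taken.
my_zero_pfn(vmf->address): returns the PFN of the zero page.
pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address), vma->vm_page_prot)): builds the PTE for the zero page.
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry): writes the PTE into the page table, so that the virtual address now maps to the zero page.
When the page fault is triggered by a write, alloc_zeroed_user_highpage_movable is called to allocate physical memory.
entry = mk_pte(page, vma->vm_page_prot): builds the entry from the newly allocated physical page and its protection bits.
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry): installs that entry in the page table. As the kernel comment notes, the PTE was not present before this point. When the address is already mapped to the zero page, a later write takes the write-protect fault path (do_wp_page) rather than do_anonymous_page, and it is that path which replaces the zero page mapping.
update_mmu_cache: lets the architecture update its MMU state for the new PTE.

my_zero_pfn
my_zero_pfn returns the PFN of the zero page. When __HAVE_COLOR_ZERO_PAGE is not defined, my_zero_pfn is implemented as: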

static inline unsigned long my_zero_pfn(unsigned long addr)
{
    extern unsigned long zero_pfn;
    return zero_pfn;
}

zero_pfn is the PFN of the single, globally unique special physical page whose contents are all zeros. It is set during system initialization:

static int __init init_zero_pfn(void)
{
    zero_pfn = page_to_pfn(ZERO_PAGE(0));
    return 0;
}

ZERO_PAGE
ZERO_PAGE returns the struct page of the physical page that backs the zero page. On x86 it is defined in arch/x86/include/asm/pgtable.h:

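/*
 * ZERO_PAGE is a global shared page that is always zero: used
 * for zero-mapped memory areas etc..
 */
extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
    __visible;
#define ZERO_PAGE(vaddr) ((void)(vaddr),virt_to_page(empty_zero_page))

empty_zero_page is a global array of unsigned long that occupies exactly one physical page; its size is fixed at compile time. When the kernel image is loaded at boot, this global variable is loaded into memory, and the start address of the empty_zero_page array is the virtual address of the zero page. Because the kernel data segment uses a linear (one-to-one) mapping, that virtual address can be converted directly into a physical address.

empty_zero_page is defined in assembly, where it is initialized to all zeros (arch/x86/kernel/head_64.S):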

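    __PAGE_ALIGNED_BSS
SYM_DATA_START_PAGE_ALIGNED(empty_zero_page)
    .skip PAGE_SIZE
SYM_DATA_END(empty_zero_page)
EXPORT_SYMBOL(empty_zero_page)

empty_zero_page lives in the BSS section. The GNU assembler .skip directive reserves PAGE_SIZE bytes for it, filled with zeros.

The .skip directive is equivalent to the .space directive. Its syntax is:

.skip size [, fill]

size is the number of bytes to reserve.

fill is the value to fill them with; if it is omitted, it defaults to 0.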
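
A rough user-space analogue in C may make the idea more concrete. The sketch below assumes GCC or Clang (for the aligned attribute) and a 4 KiB page size; demo_zero_page and demo_zero_page_ok are hypothetical names, and the array is an ordinary program variable, not the kernel's empty_zero_page. An uninitialized static array has static storage duration, so C guarantees that it starts out zero-filled, and toolchains typically place it in .bss, much like the .skip reservation above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096  /* assumption: 4 KiB pages, as on x86 */

/*
 * Uninitialized static storage: guaranteed zero-filled by the C standard
 * and typically placed in .bss. The aligned attribute is a GCC/Clang
 * extension that puts the array on a page boundary.
 */
static unsigned long demo_zero_page[DEMO_PAGE_SIZE / sizeof(unsigned long)]
    __attribute__((aligned(DEMO_PAGE_SIZE)));

/* Returns 1 if the array is page-aligned and entirely zero, else 0. */
static int demo_zero_page_ok(void)
{
    size_t n = sizeof(demo_zero_page) / sizeof(demo_zero_page[0]);
    size_t i;

    assert(sizeof(demo_zero_page) == DEMO_PAGE_SIZE);

    if ((uintptr_t)demo_zero_page % DEMO_PAGE_SIZE != 0)
        return 0;

    for (i = 0; i < n; i++) {
        if (demo_zero_page[i] != 0)
            return 0;
    }
    return 1;
}
```

The array declaration mirrors the shape of the kernel's extern declaration of empty_zero_page: one page worth of unsigned long, sized at compile time.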

For a zero page experiment, see the one in "Introduce huge zero page". The test code is:

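#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

#define MAX_NUM 10

#define BUFFER_SIZE 4096

int main()
{
    char *a[MAX_NUM] = {NULL};

    for (int i = 0; i < MAX_NUM; i++) {
        /* posix_memalign returns 0 on success. */
        if (posix_memalign((void **)&a[i], 4096, BUFFER_SIZE) != 0)
            return 1;

        /* Read-only access: every byte is expected to be zero. */
        for (int j = 0; j < BUFFER_SIZE; j++) {
            assert(a[i][j] == 0);
        }

        /* Wait for Enter so memory usage can be checked each round. */
        getchar();
    }

    return 0;
}

Watching with the free command shows no growth in physical memory after each iteration.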
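
Instead of watching free by hand, the process can read its own resident set size. The following is a sketch under the assumption that /proc/self/statm is available (Linux with procfs mounted); resident_pages and rss_growth are names invented for this example. It reports how much the RSS changes after a read-only pass and again after a write pass over the same anonymous mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Resident set size in pages, from /proc/self/statm, or -1 on error. */
static long resident_pages(void)
{
    long size = 0, resident = 0;
    FILE *f = fopen("/proc/self/statm", "r");

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &size, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/*
 * Hypothetical helper: map npages of anonymous memory, read one byte per
 * page and store the RSS change in *after_read, then write one byte per
 * page and store the total RSS change in *after_write.
 * Returns 0 on success, -1 on error.
 */
static int rss_growth(size_t npages, long *after_read, long *after_write)
{
    long page_size = sysconf(_SC_PAGESIZE);
    volatile unsigned char *p;
    unsigned char sink = 0;
    size_t len, off;
    long base;
    void *m;

    assert(after_read != NULL && after_write != NULL);
    if (page_size <= 0)
        return -1;
    len = npages * (size_t)page_size;

    m = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;
    p = m;

    base = resident_pages();
    for (off = 0; off < len; off += (size_t)page_size)
        sink ^= p[off];                 /* read-only pass */
    *after_read = resident_pages() - base;

    for (off = 0; off < len; off += (size_t)page_size)
        p[off] = 1;                     /* write pass */
    *after_write = resident_pages() - base;

    (void)sink;
    munmap(m, len);
    return base < 0 ? -1 : 0;
}
```

If the zero page is in use, the change after the read pass should stay close to zero while the change after the write pass should be on the order of npages. The kernel's RSS counters are updated in batches, so small discrepancies are normal.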

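alloc_zeroed_user_highpage_movable
When a write triggers the page fault, this function is called to allocate the actual physical memory:

static inline struct page *
alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
                   unsigned long vaddr)
{
    return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
}

It allocates physical memory for the given VMA with the __GFP_MOVABLE flag, which allows the allocation to come from ZONE_MOVABLE or from the movable migrate type: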

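#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
    alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)

The macro then adds the GFP_HIGHUSER and __GFP_ZERO flags:

GFP_HIGHUSER: if ZONE_HIGHMEM exists, the physical memory is allocated from it when possible. It also sets __GFP_HARDWALL, which enforces the cpuset memory allocation policy, and __GFP_RECLAIM, which allows memory reclaim, including direct reclaim, when memory is short.
__GFP_ZERO: the allocated physical memory is initialized to all zeros.

References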
The GNU Assembler

Adding a huge zero page [LWN.net]

Introduce huge zero page [LWN.net]
————————————————
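Copyright notice: This is an original article by CSDN blogger "Huo的藏经阁", licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_42730667/article/details/123121624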

Source: https://www.cnblogs.com/sky-heaven/p/16621085.html