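Part 63: Interpreter and Compiler Adaptation (II)
Author: the Internet
This article walks through the code of the adapters in detail.
1. The routine that switches from interpreted execution to compiled execution
SharedRuntime::gen_i2c_adapter() is called to generate the routine that switches from interpreted execution to compiled execution, as follows:
Note that the generated assembly differs depending on the arguments passed to the function. For example, when the argument passed in is 2, the assembly is as follows: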
0x00007fffe110a1a0: mov (%rsp),%rax   // pick up the return address

// The verification code is not generated when VerifyAdapterCalls is turned off
// ...
// Execution jumps here when the check passes

// Must preserve original SP for loading incoming arguments because
// we need to align the outgoing SP for compiled code.
0x00007fffe110a267: mov %rsp,%r11

// Ensure compiled code always sees stack at proper alignment
0x00007fffe110a26a: and $0xfffffffffffffff0,%rsp   // align %rsp to 16 bytes

// push the return address and misalign the stack that youngest frame always sees
// as far as the placement of the call instruction
0x00007fffe110a26e: push %rax

// Put saved SP in another register
0x00007fffe110a26f: mov %r11,%rax

// Will jump to the compiled code just as if compiled code was doing it.
// Pre-load the register-jump target early, to schedule it better.
0x00007fffe110a272: mov 0x48(%rbx),%r11   // load Method::_from_compiled_entry into %r11

// Now generate the shuffle code. Pick up all register args and move the
// rest through the floating point stack top.
// Load the value at saved_sp+0x8 into %rsi to set up the argument; %rsi is the
// second argument register of the C ABI (c_rarg1), which compiled Java code uses as j_rarg0
0x00007fffe110a276: mov 0x8(%rax),%rsi

// Store %rbx into JavaThread::_callee_target; %rbx should hold the Method*
0x00007fffe110a27a: mov %rbx,0x248(%r15)

// put Method* where a c2i would expect should we end up there
// only needed because of c2 resolve stubs return Method* as a result in rax
0x00007fffe110a281: mov %rbx,%rax
0x00007fffe110a284: jmpq *%r11
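The stack-pointer arithmetic in the listing above can be sketched in C++. This is only an illustrative model: the struct `I2cStackState` and the function `align_and_push` are hypothetical names, not HotSpot code, and the sketch assumes no arguments are passed on the compiled stack, as in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the stack pointer before and after the i2c prologue.
struct I2cStackState {
    std::uint64_t saved_sp;  // original SP, kept in %r11 and later moved to %rax
    std::uint64_t sp;        // SP that the compiled callee sees on entry
};

// Mirrors: mov %rsp,%r11 ; and $-16,%rsp ; push %rax
I2cStackState align_and_push(std::uint64_t sp_on_entry) {
    assert(sp_on_entry % 8 == 0 && "stack slots are 8 bytes wide");
    I2cStackState s;
    s.saved_sp = sp_on_entry;                                   // mov %rsp,%r11
    std::uint64_t aligned = sp_on_entry & ~std::uint64_t{0xF};  // and $-16,%rsp
    s.sp = aligned - 8;                                         // push %rax
    return s;
}
```

Whatever the incoming value, the resulting `sp` is 8 modulo 16, which is the same state a compiled caller would leave behind after executing a `call` from a 16-byte-aligned stack. The incoming argument is still reachable through `saved_sp + 8`, which is why the stub loads it from `0x8(%rax)`.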
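The stack frame just before the jump is shown in the figure below.
2. The routine that switches from compiled execution to interpreted execution
SharedRuntime::gen_c2i_adapter() is called to generate the routine that switches from compiled execution to interpreted execution. This function first calls patch_callers_callsite(), which generates the following routine:
The generated assembly is as follows: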
// %rbx holds the Method*; compare Method::_code with NULL_WORD
0x00007fffe110a2b4: cmpq $0x0,0x50(%rbx)
// If Method::_code is NULL, jump to L
0x00007fffe110a2bc: je 0x00007fffe110a3a4

// The assembly below runs only when Method::_code is not NULL
0x00007fffe110a2c2: mov %rsp,%r13   // save the current stack pointer

// Schedule the branch target address early.
// Call into the VM to patch the caller, then jump to compiled callee
// rax isn't live so capture return address while we easily can
// Save the return address in %rax. A compiled method pushes the return
// address when it calls a method, so (%rsp) yields the return address
0x00007fffe110a2c5: mov (%rsp),%rax
0x00007fffe110a2c9: and $0xfffffffffffffff0,%rsp   // align the stack

// Assembly generated by push_CPU_state() omitted
// ...

// %rbx -> c_rarg0: %rbx holds the Method*, which the callee expects as its 1st argument
0x00007fffe110a334: mov %rbx,%rdi
// %rax -> c_rarg1: %rax holds the return address, which the callee expects as its 2nd argument (caller_pc)
0x00007fffe110a337: mov %rax,%rsi
// Call SharedRuntime::fixup_callers_callsite()
0x00007fffe110a33a: callq 0x00007ffff6a0ec10

// Assembly generated by pop_CPU_state() omitted
// ...

// Restore the rsp register
0x00007fffe110a3a1: mov %r13,%rsp

// **** L ****
// **** skip_fixup ****
As the assembly above shows, if Method::_code is NULL, control jumps directly to L; otherwise SharedRuntime::fixup_callers_callsite() is called.
The implementation of SharedRuntime::fixup_callers_callsite() is as follows:
// We are calling the interpreter via a c2i. Normally this would mean that
// we were called by a compiled method. However we could have lost a race
// where we went int -> i2c -> c2i and so the caller could in fact be interpreted.
// If the caller is compiled we attempt to patch the caller
// so he no longer calls into the interpreter.
IRT_LEAF(void, SharedRuntime::fixup_callers_callsite(Method* method, address caller_pc))
  Method* moop(method);

  address entry_point = moop->from_compiled_entry();

  // It's possible that deoptimization can occur at a call site which hasn't
  // been resolved yet, in which case this function will be called from
  // an nmethod that has been patched for deopt and we can ignore the
  // request for a fixup.
  // Also it is possible that we lost a race in that from_compiled_entry
  // is now back to the i2c in that case we don't need to patch and if
  // we did we'd leap into space because the callsite needs to use
  // "to interpreter" stub in order to load up the Method*. Don't
  // ask me how I know this...
  CodeBlob* cb = CodeCache::find_blob(caller_pc);
  if (!cb->is_nmethod() || entry_point == moop->get_c2i_entry()) {
    return;
  }

  // The check above makes sure this is a nmethod.
  nmethod* nm = cb->as_nmethod_or_null();
  assert(nm, "must be");

  // Get the return PC for the passed caller PC.
  address return_pc = caller_pc + frame::pc_return_offset; // on x86, pc_return_offset is 0

  // There is a benign race here. We could be attempting to patch to a compiled
  // entry point at the same time the callee is being deoptimized. If that is
  // the case then entry_point may in fact point to a c2i and we'd patch the
  // call site with the same old data. clear_code will set code() to NULL
  // at the end of it. If we happen to see that NULL then we can skip trying
  // to patch. If we hit the window where the callee has a c2i in the
  // from_compiled_entry and the NULL isn't present yet then we lose the race
  // and patch the code with the same old data. Asi es la vida. (said with a note of resignation)
  if (moop->code() == NULL) {
    return;
  }

  if (nm->is_in_use()) {
    // Expect to find a native call there (unless it was no-inline cache vtable dispatch)
    MutexLockerEx ml_patch(Patching_lock, Mutex::_no_safepoint_check_flag);
    if (NativeCall::is_call_before(return_pc)) {
      NativeCall *call = nativeCall_before(return_pc);
      //
      // bug 6281185. We might get here after resolving a call site to a vanilla
      // virtual call. Because the resolvee uses the verified entry it may then
      // see compiled code and attempt to patch the site by calling us. This would
      // then incorrectly convert the call site to optimized and its downhill from
      // there. If you're lucky you'll get the assert in the bugid, if not you've
      // just made a call site that could be megamorphic into a monomorphic site
      // for the rest of its life! Just another racing bug in the life of
      // fixup_callers_callsite ...
      //
      RelocIterator iter(nm, call->instruction_address(), call->next_instruction_address());
      iter.next();
      assert(iter.has_current(), "must have a reloc at java call site");
      relocInfo::relocType typ = iter.reloc()->type();
      if ( typ != relocInfo::static_call_type &&
           typ != relocInfo::opt_virtual_call_type &&
           typ != relocInfo::static_stub_type ) {
        return;
      }
      address destination = call->destination();
      if (destination != entry_point) {
        CodeBlob* callee = CodeCache::find_blob(destination);
        // callee == cb seems weird. It means calling interpreter thru stub.
        if (callee == cb || callee->is_adapter_blob()) {
          // static call or optimized virtual
          if (TraceCallFixup) {
            tty->print("fixup callsite at " INTPTR_FORMAT " to compiled code for", caller_pc);
            moop->print_short_name(tty);
            tty->print_cr(" to " INTPTR_FORMAT, entry_point);
          }
          call->set_destination_mt_safe(entry_point);
        }
      } else { // condition on this branch: destination == entry_point
        if (TraceCallFixup) {
          tty->print("already patched callsite at " INTPTR_FORMAT " to compiled code for", caller_pc);
          moop->print_short_name(tty);
          tty->print_cr(" to " INTPTR_FORMAT, entry_point);
        }
      }
    }
  }
IRT_END
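The chain of early returns in fixup_callers_callsite() can be read as a decision table. The following is a minimal sketch of that table, not HotSpot code: `CallSiteFacts`, `RelocType`, `FixupResult` and `classify` are hypothetical names, and each boolean stands in for one of the runtime queries made in the function above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the relocation kinds tested by the function.
enum class RelocType { StaticCall, OptVirtualCall, StaticStub, Other };
enum class FixupResult { Ignored, Patched, AlreadyPatched };

// Each field models the answer to one check made by fixup_callers_callsite().
struct CallSiteFacts {
    bool caller_is_nmethod;              // cb->is_nmethod()
    bool entry_is_c2i;                   // entry_point == moop->get_c2i_entry()
    bool callee_has_code;                // moop->code() != NULL
    bool caller_in_use;                  // nm->is_in_use()
    bool native_call_before_return_pc;   // NativeCall::is_call_before(return_pc)
    RelocType reloc;                     // relocation type at the call site
    bool destination_is_entry;           // call->destination() == entry_point
    bool destination_in_caller_or_adapter; // callee == cb || callee->is_adapter_blob()
};

FixupResult classify(const CallSiteFacts& f) {
    if (!f.caller_is_nmethod || f.entry_is_c2i) return FixupResult::Ignored;
    if (!f.callee_has_code) return FixupResult::Ignored;  // lost the deopt race
    if (!f.caller_in_use || !f.native_call_before_return_pc) return FixupResult::Ignored;
    if (f.reloc != RelocType::StaticCall &&
        f.reloc != RelocType::OptVirtualCall &&
        f.reloc != RelocType::StaticStub) {
        return FixupResult::Ignored;                      // e.g. a plain virtual call
    }
    if (f.destination_is_entry) return FixupResult::AlreadyPatched;
    return f.destination_in_caller_or_adapter ? FixupResult::Patched
                                              : FixupResult::Ignored;
}
```

Only one combination leads to set_destination_mt_safe(): a live compiled caller, a callee that still has compiled code, a statically bound call site, and a current destination that is either a stub inside the caller or an adapter blob.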
The implementation of the set_destination_mt_safe() function called above is as follows:
// Similar to replace_mt_safe, but just changes the destination. The
// important thing is that free-running threads are able to execute this
// call instruction at all times. If the displacement field is aligned
// we can simply rely on atomicity of 32-bit writes to make sure other threads
// will see no intermediate states. Otherwise, the first two bytes of the
// call are guaranteed to be aligned, and can be atomically patched to a
// self-loop to guard the instruction while we change the other bytes.

// We cannot rely on locks here, since the free-running threads must run at
// full speed.
//
// Used in the runtime linkage of calls; see class CompiledIC.
// (Cf. 4506997 and 4479829, where threads witnessed garbage displacements.)
void NativeCall::set_destination_mt_safe(address dest) {
  debug_only(verify());
  // Make sure patching code is locked. No two threads can patch at the same
  // time but one may be executing this code.
  assert(Patching_lock->is_locked() ||
         SafepointSynchronize::is_at_safepoint(), "concurrent code patching");
  // Both C1 and C2 should now be generating code which aligns the patched address
  // to be within a single cache line except that C1 does not do the alignment on
  // uniprocessor systems.
  bool is_aligned = ((uintptr_t)displacement_address() + 0) / cache_line_size ==
                    ((uintptr_t)displacement_address() + 3) / cache_line_size;

  guarantee(!os::is_MP() || is_aligned, "destination must be aligned");

  if (is_aligned) {
    // Simple case: The destination lies within a single cache line.
    set_destination(dest);
  } else if ((uintptr_t)instruction_address() / cache_line_size ==
             ((uintptr_t)instruction_address()+1) / cache_line_size) {
    // Tricky case: The instruction prefix lies within a single cache line.
    intptr_t disp = dest - return_address();
#ifdef AMD64
    guarantee(disp == (intptr_t)(jint)disp, "must be 32-bit offset");
#endif // AMD64

    int call_opcode = instruction_address()[0];

    // First patch dummy jump in place:
    {
      u_char patch_jump[2];
      patch_jump[0] = 0xEB;       // jmp rel8
      patch_jump[1] = 0xFE;       // jmp to self

      assert(sizeof(patch_jump)==sizeof(short), "sanity check");
      *(short*)instruction_address() = *(short*)patch_jump;
    }
    // Invalidate. Opteron requires a flush after every write.
    wrote(0);

    // (Note: We assume any reader which has already started to read
    // the unpatched call will completely read the whole unpatched call
    // without seeing the next writes we are about to make.)

    // Next, patch the last three bytes:
    u_char patch_disp[5];
    patch_disp[0] = call_opcode;
    *(int32_t*)&patch_disp[1] = (int32_t)disp;
    assert(sizeof(patch_disp)==instruction_size, "sanity check");
    for (int i = sizeof(short); i < instruction_size; i++)
      instruction_address()[i] = patch_disp[i];

    // Invalidate. Opteron requires a flush after every write.
    wrote(sizeof(short));

    // (Note: We assume that any reader which reads the opcode we are
    // about to repatch will also read the writes we just made.)

    // Finally, overwrite the jump:
    *(short*)instruction_address() = *(short*)patch_disp;
    // Invalidate. Opteron requires a flush after every write.
    wrote(0);

    debug_only(verify());
    guarantee(destination() == dest, "patch succeeded");
  } else {
    // Impossible: One or the other must be atomically writable.
    ShouldNotReachHere();
  }
}
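The alignment test and the three-step rewrite used by the "tricky case" can be tried out on an ordinary byte buffer. The sketch below is a single-threaded model under stated assumptions: the function names are hypothetical, the cache line size is passed in as a parameter rather than taken from HotSpot, and neither the atomicity of the 16-bit stores nor the instruction-cache flushes (`wrote()`) is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True when the 4-byte displacement starting at disp_addr lies within a
// single cache line of the given size (the is_aligned test in the source).
bool displacement_is_aligned(std::uintptr_t disp_addr, std::uintptr_t line_size) {
    return disp_addr / line_size == (disp_addr + 3) / line_size;
}

// Rewrites the rel32 of a 5-byte `call rel32` (opcode 0xE8) in three steps.
void patch_call_in_three_steps(std::uint8_t* insn, std::int32_t disp) {
    assert(insn[0] == 0xE8 && "expects a call rel32 instruction");
    const std::uint8_t call_opcode = insn[0];

    // Step 1: guard the instruction with a two-byte jump-to-self (EB FE).
    const std::uint8_t patch_jump[2] = {0xEB, 0xFE};
    std::memcpy(insn, patch_jump, sizeof patch_jump);

    // Step 2: build the new instruction and write its last three bytes.
    std::uint8_t patch_disp[5];
    patch_disp[0] = call_opcode;
    std::memcpy(&patch_disp[1], &disp, sizeof disp);
    for (int i = 2; i < 5; ++i) insn[i] = patch_disp[i];

    // Step 3: replace the self-loop with the first two bytes of the new call.
    std::memcpy(insn, patch_disp, 2);
}

// Reads back the rel32 field of the instruction.
std::int32_t read_displacement(const std::uint8_t* insn) {
    std::int32_t disp;
    std::memcpy(&disp, insn + 1, sizeof disp);
    return disp;
}
```

At every intermediate point a thread that fetches the instruction sees either the old call, the self-loop, or the new call, never a call with a half-written displacement. That property is what lets the real function avoid stopping the threads that may be executing the call site.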
Returning to gen_c2i_adapter(), execution continues and the following assembly is generated:
0x00007fffe110a3a4: pop %rax             // save the return address in %rax
0x00007fffe110a3a5: mov %rsp,%r13        // put the sender sp into %r13
0x00007fffe110a3a8: sub $0x10,%rsp       // 0x10 is the space for the argument to pass plus the return address
0x00007fffe110a3ac: mov %rax,(%rsp)      // store the return address in its slot
0x00007fffe110a3b0: mov %rsi,0x8(%rsp)   // write the argument to its slot on the stack
0x00007fffe110a3b5: mov 0x38(%rbx),%rcx  // load Method::_i2i_entry into %rcx
0x00007fffe110a3b9: jmpq *%rcx           // jump to it
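The effect of these instructions on the stack can be traced with a small model. This is an illustrative sketch only: `C2iFrame` and `lay_out_c2i_frame` are hypothetical names, memory is modeled as an address-to-value map, and the single-argument case from the listing is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical snapshot of the state right before `jmpq *%rcx`.
struct C2iFrame {
    std::uint64_t sender_sp;                       // value left in %r13
    std::uint64_t sp;                              // value of %rsp at the jump
    std::map<std::uint64_t, std::uint64_t> slots;  // stack address -> stored value
};

C2iFrame lay_out_c2i_frame(std::uint64_t sp_on_entry,
                           std::uint64_t return_address,
                           std::uint64_t arg_in_rsi) {
    C2iFrame f;
    std::uint64_t sp = sp_on_entry + 8;   // pop %rax (return address leaves the stack)
    f.sender_sp = sp;                     // mov %rsp,%r13
    sp -= 0x10;                           // sub $0x10,%rsp
    f.slots[sp] = return_address;         // mov %rax,(%rsp)
    f.slots[sp + 8] = arg_in_rsi;         // mov %rsi,0x8(%rsp)
    f.sp = sp;
    return f;
}
```

The result is the layout the interpreter expects on entry: the return address on top of the stack, the argument in the slot just above it, and the caller's stack pointer preserved in %r13.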
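3. The unverified routine that switches from compiled execution to interpreted execution
_c2i_unverified_entry is generated in SharedRuntime::generate_i2c2i_adapters(), as follows:
This function first calls gen_i2c_adapter() to generate assembly, which was covered earlier. It then emits its own assembly, and finally calls gen_c2i_adapter() to generate assembly, which was also covered earlier.
(This is also a transition entry point, namely c2i_unverified_entry.)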
// %rsi holds the receiver; load oop.Klass into %ebx
0x00007fffe110a287: mov 0x8(%rsi),%ebx
// LogKlassAlignmentInBytes=0x3
0x00007fffe110a28a: shl $0x3,%rbx
// %rax holds the holder; compare with holder.Klass
0x00007fffe110a28e: cmp 0x10(%rax),%rbx
// Load CompiledICHolder::_holder_method into %rbx
0x00007fffe110a292: mov 0x8(%rax),%rbx
// If equal, the cache hit; jump to ---- ok ----
0x00007fffe110a296: je 0x00007fffe110a2a1
// Otherwise jump to the entry returned by SharedRuntime::get_ic_miss_stub()
0x00007fffe110a29c: jmpq 0x00007fffe1105be0

// **** ok ****
// Execution arrives here when oop.Klass equals holder.Klass

// Method might have been compiled since the call site was patched to
// interpreted if that is the case treat it as a miss so we can get
// the call site corrected.
// %rbx holds the Method*; compare Method::_code with NULL_WORD
0x00007fffe110a2a1: cmpq $0x0,0x50(%rbx)
// If equal, skip the fixup and jump to ---- skip_fixup ----; this label is defined in gen_c2i_adapter()
0x00007fffe110a2a9: je 0x00007fffe110a3a4

// Reaching the code below means the method finished compiling after the call
// site was patched to the interpreter, so it is handled as a cache miss,
// which causes the call site to be corrected
// Jump to the entry returned by SharedRuntime::get_ic_miss_stub()
0x00007fffe110a2af: jmpq 0x00007fffe1105be0
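The two checks made by the unverified entry can be condensed into a few lines of C++. This is a minimal sketch rather than HotSpot code: the names are hypothetical, and the class pointer decoding assumes a zero base with a shift of 3, matching the single `shl $0x3` in the listing.

```cpp
#include <cassert>
#include <cstdint>

enum class UnverifiedEntryResult { IcMiss, SkipFixup };

// Decodes a compressed class pointer, assuming a zero base and a shift of
// LogKlassAlignmentInBytes = 3 as in the listing.
std::uint64_t decode_narrow_klass(std::uint32_t narrow_klass) {
    return std::uint64_t{narrow_klass} << 3;
}

// Models the control flow of c2i_unverified_entry.
UnverifiedEntryResult unverified_entry(std::uint32_t receiver_narrow_klass,
                                       std::uint64_t holder_klass,
                                       bool method_has_code) {
    // Inline cache check: the receiver's class must match the cached class.
    if (decode_narrow_klass(receiver_narrow_klass) != holder_klass) {
        return UnverifiedEntryResult::IcMiss;
    }
    // The method was compiled after the call site was pointed at the
    // interpreter: report a miss so that the call site gets corrected.
    if (method_has_code) {
        return UnverifiedEntryResult::IcMiss;
    }
    // Cache hit and no compiled code: continue at skip_fixup in the c2i adapter.
    return UnverifiedEntryResult::SkipFixup;
}
```

Both miss paths end at the same stub, so a cache hit on a method that has since been compiled is deliberately treated like a wrong receiver class.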
The execution flow is shown in the diagram below:
Source: https://www.cnblogs.com/mazhimazhi/p/15898901.html