Bug Fun: A Memory Misoperation That Triggers an IRQL Error (IRQL_NOT_LESS_OR_EQUAL)

[Author: Zhang Pei] [Original: http://www.yiiyee.cn/Blog/bsod-0xa-1/]

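I received a dump file for BSOD 0xA. Bug check 0xA is IRQL_NOT_LESS_OR_EQUAL, meaning that an operation was attempted at an interrupt request level (IRQL) at which it is not allowed. In this case the cause was a page fault that had to be resolved at IRQL 2, which made the system blue-screen.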

After opening the dump file in windbg, start with the kv and r commands to get the current state:

0: kd> kv;r
ChildEBP RetAddr Args to Child
81256650 815cd41b 0000000a c9495000 00000002 nt!KiBugCheck2
81256650 81558315 0000000a c9495000 00000002 nt!KiTrap0E+0x1b3 (FPO: [0,0] TrapFrame @ 81256670)
812566e4 ad6a47e4 c9494efc 00000000 00000120 nt!memset+0x45 (FPO: [3,0,0])
81256734 ad6a3a5b c35d6f7c 00000010 d5140dfc XXX!initializeIsoUrb+0x74 (FPO: [Non-Fpo]) (CONV: thiscall)
81256770 ad6a3c73 c35d6f7c c346a890 00000001 XXX!completeRequest+0x1cb (FPO: [Non-Fpo]) (CONV: thiscall)
81256794 8190852a 00000000 d51bade0 c35d6f7c XXX!completeFunc+0xe3 (FPO: [Non-Fpo]) (CONV: stdcall)
812567c8 814d5885 00000000 d51bade0 81256890 nt!IovpLocalCompletionRoutine+0x12f (FPO: [3,5,4])
81256864 819090f0 8163a015 8b136708 88441a38 nt!IopfCompleteRequest+0x42e (FPO: [Non-Fpo])
812568c8 8fae1ef7 8fe50028 88441a38 8fe50b88 nt!IovCompleteRequest+0x123 (FPO: [Non-Fpo])
8125697c 8fae268a d51bade0 8fe50b88 00026202 USBPORT!USBPORT_Core_iCompleteDoneTransfer+0x99a (FPO: [3,39,4])
812569a4 8fae8df6 246b5702 8fe50028 8fe50b88 USBPORT!USBPORT_Core_iIrpCsqCompleteDoneTransfer+0x1ff (FPO: [1,4,0])
812569e0 8fae9140 8fe50028 8fe50b88 246b5702 USBPORT!USBPORT_Core_UsbIocDpc_Worker+0x1b7 (FPO: [Non-Fpo])
81256a28 814e0bc3 8fe50b94 8fe50b88 00000000 USBPORT!USBPORT_Xdpc_Worker_IocDpc+0x1c9 (FPO: [Non-Fpo])
81256ae0 814e07fe 816860c0 81256b28 bedc4d40 nt!KiExecuteAllDpcs+0x1f2 (FPO: [Non-Fpo])
81256c00 815cea74 00000000 0000000e 00000000 nt!KiRetireDpcList+0xed (FPO: [0,65,4])
81256c04 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x38 (FPO: [0,0,0])

eax=00000002 ebx=00000001 ecx=00000001 edx=00000000 esi=81558315 edi=c9495000
eip=81553c24 esp=81256654 ebp=81256670 iopl=0 nv up ei pl nz na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
nt!KiBugCheck2:
81553c24 55 push ebp

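This call stack explains the problem well enough that we can continue without running !analyze. The top two functions show that a page fault occurred during execution and that the system immediately invoked the exception handler KiTrap0E to deal with it. The page fault handler, however, could not find a corresponding physical page, so it went on to call KiBugCheck2 and blue-screen the system. When the system invokes an exception handler, it first builds a trap frame that saves the context at the moment of the exception. If the exception is handled successfully, the trap frame is used to return to the interrupted thread and resume normal execution. The saved trap frame is visible in this call stack: the KiTrap0E frame (the second row of the output) carries the keyword TrapFrame, and the trap frame address is 0x81256670. Use the .trap command to switch to the context at the time of the fault: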

0: kd> .trap 81256670
ErrCode = 00000002
eax=00000000 ebx=d51bafdc ecx=00000007 edx=00000000 esi=cb334dfc edi=c9495000
eip=81558315 esp=812566e4 ebp=81256734 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
nt!memset+0x45:
81558315 f3ab rep stos dword ptr es:[edi]

 

The call stack now looks like this, without the exception-handling calls:

0: kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
ChildEBP RetAddr Args to Child
812566e4 ad6a47e4 c9494efc 00000000 00000120 nt!memset+0x45 (FPO: [3,0,0])
81256734 ad6a3a5b c35d6f7c 00000010 d5140dfc XXX!initializeIsoUrb+0x74 (FPO: [Non-Fpo]) (CONV: thiscall)
81256770 ad6a3c73 c35d6f7c c346a890 00000001 XXX!completeRequest+0x1cb (FPO: [Non-Fpo]) (CONV: thiscall)
81256794 8190852a 00000000 d51bade0 c35d6f7c XXX!completeFunc+0xe3 (FPO: [Non-Fpo]) (CONV: stdcall)
812567c8 814d5885 00000000 d51bade0 81256890 nt!IovpLocalCompletionRoutine+0x12f (FPO: [3,5,4])
81256864 819090f0 8163a015 8b136708 88441a38 nt!IopfCompleteRequest+0x42e (FPO: [Non-Fpo])
812568c8 8fae1ef7 8fe50028 88441a38 8fe50b88 nt!IovCompleteRequest+0x123 (FPO: [Non-Fpo])
8125697c 8fae268a d51bade0 8fe50b88 00026202 USBPORT!USBPORT_Core_iCompleteDoneTransfer+0x99a (FPO: [3,39,4])
812569a4 8fae8df6 246b5702 8fe50028 8fe50b88 USBPORT!USBPORT_Core_iIrpCsqCompleteDoneTransfer+0x1ff (FPO: [1,4,0])
812569e0 8fae9140 8fe50028 8fe50b88 246b5702 USBPORT!USBPORT_Core_UsbIocDpc_Worker+0x1b7 (FPO: [Non-Fpo])
81256a28 814e0bc3 8fe50b94 8fe50b88 00000000 USBPORT!USBPORT_Xdpc_Worker_IocDpc+0x1c9 (FPO: [Non-Fpo])
81256ae0 814e07fe 816860c0 81256b28 bedc4d40 nt!KiExecuteAllDpcs+0x1f2 (FPO: [Non-Fpo])
81256c00 815cea74 00000000 0000000e 00000000 nt!KiRetireDpcList+0xed (FPO: [0,65,4])
81256c04 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x38 (FPO: [0,0,0])

The last function to run before the fault is memset, which fills a block of memory. Its prototype is:

void *memset(void *addr, int c, size_t count);

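This function uses the standard C calling convention (cdecl), so all parameters are passed on the stack. The first three arguments are therefore: addr: c9494efc; c: 00000000; count: 00000120.

That is, fill the 0x120 bytes starting at address 0xc9494efc with 0.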
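To make the column layout of a 32-bit kv row concrete, here is a minimal sketch that splits one row into ChildEBP, RetAddr, and the three raw stack dwords that follow. The names KvFrame and parse_kv_frame are hypothetical and exist only for this illustration. Reading the three dwords as memset's parameters assumes a cdecl x86 frame whose stack slots have not been reused.

```c
#include <stdint.h>
#include <stdio.h>

/* One row of 32-bit kv output: ChildEBP, RetAddr, then three raw stack dwords. */
typedef struct {
    uint32_t child_ebp;
    uint32_t ret_addr;
    uint32_t args[3];   /* raw stack contents; parameters only by assumption */
} KvFrame;

/* Parse the five leading hex columns of a kv row.
   Returns 1 on success, 0 if the row does not start with five hex values. */
static int parse_kv_frame(const char *line, KvFrame *out)
{
    unsigned int v[5];

    if (line == NULL || out == NULL)
        return 0;
    if (sscanf(line, "%x %x %x %x %x", &v[0], &v[1], &v[2], &v[3], &v[4]) != 5)
        return 0;

    out->child_ebp = v[0];
    out->ret_addr  = v[1];
    out->args[0]   = v[2];   /* for the memset row: addr  */
    out->args[1]   = v[3];   /* for the memset row: c     */
    out->args[2]   = v[4];   /* for the memset row: count */
    return 1;
}
```

Applied to the memset row above, args[0] is 0xc9494efc, args[1] is 0, and args[2] is 0x120.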

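memset performs no safety checks: it uses the arguments it receives directly and never verifies that they are correct. Even if the address argument is 0, it will press ahead and use it. Note that the kernel runtime routines RtlZeroMemory/RtlFillMemory offer no extra protection here. In the WDK headers they are typically thin wrappers over memset, so the caller is still responsible for validating both the address and the length.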
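One way to guard a fill is to carry the buffer's capacity alongside its pointer and reject any request that exceeds it. The following is a minimal sketch in portable C. checked_zero is a hypothetical helper written for this illustration; it is not part of the C runtime or the WDK, and it only helps if the caller knows the real capacity of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Zero `count` bytes of `buf` only if the request fits within `capacity`.
   Returns 0 on success, -1 if buf is NULL or count exceeds capacity.
   Nothing is written when the request is rejected. */
static int checked_zero(void *buf, size_t capacity, size_t count)
{
    if (buf == NULL || count > capacity)
        return -1;
    memset(buf, 0, count);
    return 0;
}
```

With a 0x104-byte buffer, a request to zero 0x120 bytes is refused instead of running past the end.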

The first suspect is the address, so inspect it with the db command:

0: kd> db c9494efc
c9494efc 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f0c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f1c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f2c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f3c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f4c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f5c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f6c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

The address looks valid. Next, check the length:

0: kd> db c9494efc L120
c9494efc 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f0c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f1c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f2c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f3c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f4c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f5c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f6c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f7c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f8c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494f9c 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494fac 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494fbc 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494fcc 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494fdc 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494fec 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
c9494ffc 00 00 00 00 ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ....????????????
c949500c ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????

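The problem is found. The memory at this address is not 0x120 bytes long: only 0x120-0x1C = 0x104 bytes are valid, up to the page boundary at 0xc9495000. That is why the tail of the last two rows is invalid (windbg shows unreadable bytes as ??). When memset reached the inaccessible part, the system raised a page fault. But the current IRQL is 2: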

 0: kd> !irql
Debugger saved IRQL for processor 0x0 -- 2 (DISPATCH_LEVEL)

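The system can service page faults only at IRQLs below 2, so it reports the IRQL bug check 0xA. That report is one step removed from the real error, which is that the buffer is smaller than the requested size, but diagnosing that is beyond what the system can do.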
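The arithmetic behind the overrun can be checked with a short sketch. It assumes 4 KB pages, as on this 32-bit x86 target, and that the valid memory ends exactly at the page boundary 0xc9495000, which matches both the db output and the faulting address in edi. The function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE_X86 0x1000u  /* assumption: 4 KB pages */

/* Bytes from addr up to, but not including, the start of the next page. */
static uint32_t bytes_to_page_end(uint32_t addr)
{
    return PAGE_SIZE_X86 - (addr & (PAGE_SIZE_X86 - 1u));
}

/* Bytes of the request [addr, addr + count) that fall beyond addr's page;
   0 if the whole request fits inside the page. */
static uint32_t bytes_past_page(uint32_t addr, uint32_t count)
{
    uint32_t room = bytes_to_page_end(addr);
    return count > room ? count - room : 0u;
}
```

With the values from this dump, bytes_to_page_end(0xc9494efc) gives 0x104 and bytes_past_page(0xc9494efc, 0x120) gives 0x1C, the same figures read off the db output.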

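The fix is to go back to the source code, find the memset call (in XXX!initializeIsoUrb, according to the call stack), and check and correct the buffer and the size passed to it.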


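2 thoughts on "Bug Fun: A Memory Misoperation That Triggers an IRQL Error (IRQL_NOT_LESS_OR_EQUAL)"

  1. "This function uses the standard C calling convention (cdecl), so all parameters are passed on the stack. The first three arguments are therefore: addr: c9494efc; c: 00000000; count: 00000120."

    Mr. Zhang, how can one tell what each of the first five values in a stack row means?

    For example, what do these three arguments mean here, and how was that determined?

    1. It cannot be determined for certain. The first few values are usually taken to be the input parameters, but that is wrong fairly often. You can go back to the parent function's disassembly and see which values it pushes onto the stack to confirm their meaning. You also need to check whether the child function has modified those stack slots.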
