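Intel Pin is a binary instrumentation and analysis tool from Intel; the current release is 3.4. Although it has been around for a long time, developing with it still mostly means working from the user manual and the API documentation. I have been reading up on its usage recently, so this post goes over some commonly used functions and features.

Installing Pin

This post uses 64-bit Linux as the example. Pin 3.4 can be downloaded from the official website (download link). Since 3.x, Pin appears to ship its own CRT, so tools built on earlier versions of Pin may run into trouble when moved to 3.x; I have not looked into this in detail yet.

We will not go into Pin's implementation details or architecture here, only introduce the basics of using it.

Pin and Pintools

The 3.4 Manual contains many examples that cover the basic usage of each module, so a good first step is to build and try them.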

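cd source/tools/ManualExamples
make all TARGET=intel64

To build a single example:

cd source/tools/ManualExamples
make obj-intel64/inscount0.so TARGET=intel64

For example, inscount0.cpp is built into the shared library inscount0.so, which is called a pintool. The pin executable uses the code in this pintool to instrument and analyze a program. Running

../../../pin -t obj-intel64/inscount0.so -o inscount0.log -- /bin/ls

analyzes /bin/ls with the inscount0.so pintool and writes the result to inscount0.log. The general form of the pin command line is

pin [OPTION] [-t <tool> [<toolargs>]] -- <command line>

-t is followed by the pintool's .so file and then the arguments passed to the pintool; after -- comes the program to analyze, together with its own arguments.

Basic instrumentation workflow

Instrumentation is the process of inserting analysis code into a program's own code while it runs. The Manual notes that, conceptually, instrumentation consists of two components:

A mechanism that determines which code to instrument
The analysis code that runs at the instrumented points

The most basic example: instruction counting

Let's look at the contents of inscount0.cpp: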

#include <iostream>
#include <fstream>
#include "pin.H"

ofstream OutFile;

// The running count of instructions is kept here
// make it static to help the compiler optimize docount
static UINT64 icount = 0;

// This function is called before every instruction is executed
VOID docount() { icount++; }

// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID *v)
{
    // Insert a call to docount before every instruction, no arguments are passed
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

KNOB<string> KnobOutputFile(KNOB_MODE_WRITEONCE, "pintool",
    "o", "inscount.out", "specify output file name");

// This function is called when the application exits
VOID Fini(INT32 code, VOID *v)
{
    // Write to a file since cout and cerr maybe closed by the application
    OutFile.setf(ios::showbase);
    OutFile << "Count " << icount << endl;
    OutFile.close();
}

/* ===================================================================== */
/* Print Help Message                                                    */
/* ===================================================================== */

INT32 Usage()
{
    cerr << "This tool counts the number of dynamic instructions executed" << endl;
    cerr << endl << KNOB_BASE::StringKnobSummary() << endl;
    return -1;
}

/* ===================================================================== */
/* Main                                                                  */
/* ===================================================================== */
/*   argc, argv are the entire command line: pin -t <toolname> -- ...    */
/* ===================================================================== */

int main(int argc, char * argv[])
{
    // Initialize pin
    if (PIN_Init(argc, argv)) return Usage();

    OutFile.open(KnobOutputFile.Value().c_str());

    // Register Instruction to be called to instrument instructions
    INS_AddInstrumentFunction(Instruction, 0);

    // Register Fini to be called when the application exits
    PIN_AddFiniFunction(Fini, 0);

    // Start the program, never returns
    PIN_StartProgram();

    return 0;
}

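This program shows the basic skeleton of a typical pintool. main first calls PIN_Init to initialize Pin, then uses INS_AddInstrumentFunction to register an instrumentation routine. Pin calls Instruction each time it encounters a new instruction in the original program, before that instruction is executed for the first time; it is not called again on every execution. The second argument of INS_AddInstrumentFunction is an extra value passed through to Instruction as its VOID *v parameter, which is unused here. The first parameter Instruction receives is an INS, which represents one instruction.

Finally, main registers Fini to be called when the program exits, and then starts the program with PIN_StartProgram.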

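The callback model

As shown above, the inscount0.cpp pintool instruments every instruction. Pintools rely heavily on callbacks: Pin calls back into Instruction for each new instruction it encounters, and inside Instruction, INS_InsertCall registers the function docount, meaning that a call to docount is inserted before the instruction executes. Note that INS_InsertCall is variadic. Its first three arguments are the instruction, the insertion point (IPOINT_BEFORE here means before the instruction) and the function pointer (cast to AFUNPTR). After those come the arguments to pass to the function, terminated by IARG_END; here no arguments are given, so docount is called with none. docount increments a global variable, which is how the number of executed instructions is counted.

So the analysis code inserted here simply increments the instruction count by one.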
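
The difference between the two kinds of callback can be modeled without Pin at all. The following is a toy sketch in plain C++, not Pin code: ToyEngine, ToyIns, insertCallBefore and countLoop are hypothetical names invented for illustration. It assumes a simplified engine that calls the instrumentation routine the first time it sees an instruction address and runs the attached analysis calls on every execution. Real Pin works on traces and a code cache, so the details differ.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Toy stand-ins, NOT Pin types: an "instruction" is just its address here.
using ToyIns = std::uint64_t;
using AnalysisFn = std::function<void()>;

struct ToyEngine {
    // Analysis calls attached to each instruction seen so far.
    std::map<ToyIns, std::vector<AnalysisFn>> attached;
    // The registered instrumentation routine (the role Instruction() plays).
    std::function<void(ToyEngine&, ToyIns)> instrument;
    int instrumentCalls = 0;

    // Plays the role of INS_InsertCall(ins, IPOINT_BEFORE, fn, IARG_END).
    void insertCallBefore(ToyIns ins, AnalysisFn fn) {
        attached[ins].push_back(std::move(fn));
    }

    // "Execute" a dynamic stream of instruction addresses.
    void run(const std::vector<ToyIns>& stream) {
        for (ToyIns ins : stream) {
            // First encounter: call the instrumentation routine once.
            if (attached.find(ins) == attached.end()) {
                attached[ins];  // mark as seen even if nothing is attached
                ++instrumentCalls;
                if (instrument) instrument(*this, ins);
            }
            // Every execution: run the attached analysis calls.
            for (const AnalysisFn& fn : attached[ins]) fn();
        }
    }
};

struct ToyResult {
    int instrumentCalls;    // how often the instrumentation routine ran
    std::uint64_t icount;   // how often the analysis routine ran
};

// Runs a three-instruction loop body `iterations` times under the toy engine.
ToyResult countLoop(int iterations) {
    std::uint64_t icount = 0;
    ToyEngine engine;
    engine.instrument = [&icount](ToyEngine& e, ToyIns ins) {
        e.insertCallBefore(ins, [&icount] { ++icount; });  // the docount role
    };
    std::vector<ToyIns> stream;
    for (int i = 0; i < iterations; ++i) {
        stream.push_back(0x1000);
        stream.push_back(0x1004);
        stream.push_back(0x1008);
    }
    engine.run(stream);
    return {engine.instrumentCalls, icount};
}
```

In this model, countLoop(4) invokes the instrumentation routine 3 times (once per distinct instruction) while the counter reaches 12 (once per executed instruction). This is why analysis routines are worth keeping small: they are the part that runs on the hot path.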

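Instruction-level instrumentation

Starting from inscount0, we can gradually build up more complex instrumentation and analysis tools.

Choosing where to instrument

The simplest case is to instrument every instruction. To be more selective, the INS module provides many APIs for checking what kind of instruction is at hand: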

INS_IsMemoryRead (INS ins)
INS_IsMemoryWrite (INS ins)
INS_IsLea (INS ins)
INS_IsNop (INS ins)
INS_IsBranch (INS ins)
INS_IsDirectBranch (INS ins)
INS_IsDirectCall (INS ins)
INS_IsDirectBranchOrCall (INS ins)
INS_IsBranchOrCall (INS ins)
INS_IsCall (INS ins)
INS_IsRet (INS ins)
...

The names are usually self-explanatory; when one is not, look it up in the API reference. There is also a more direct and specific way:

if (INS_Opcode(ins) == XED_ICLASS_MOV &&
    INS_IsMemoryRead(ins) &&
    INS_OperandIsReg(ins, 0) &&
    INS_OperandIsMemory(ins, 1))

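This code comes from safecopy.cpp. It identifies a mov instruction directly by its opcode and additionally requires that the instruction reads memory, that its first operand is a register, and that its second operand is memory. Combining these APIs makes it possible to select exactly the instructions you want to instrument.

Analysis code

The analysis code in inscount0 is minimal. The next example in the Manual is itrace: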

// This function is called before every instruction is executed
// and prints the IP
VOID printip(VOID *ip) { fprintf(trace, "%p\n", ip); }

// Pin calls this function every time a new instruction is encountered
VOID Instruction(INS ins, VOID *v)
{
    // Insert a call to printip before every instruction, and pass it the IP
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
}

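Here printip is passed IARG_INST_PTR, which arrives in the analysis routine as a VOID * holding the address of the current instruction. printip prints it, so itrace outputs the address of every executed instruction.

The Instrumentation arguments section of the API reference lists many more values that can be passed to a callback, including the effective address the current instruction reads from memory and the values of registers. Together they describe the running program's state quite thoroughly and give the callback what it needs for further analysis.

Monitoring and modifying program state

Registers

To read the current value of a register, pass ...IARG_REG_VALUE, REG_RAX... in the argument list; the analysis routine receives the value as an ADDRINT. Alternatively, use INS_OperandReg to extract a register operand from the instruction first and then pass that register with IARG_REG_VALUE.

To modify a register, pass ...IARG_REG_REFERENCE, REG_RAX...; the analysis routine receives a PIN_REGISTER * that points to a union representing the register's value. On 64-bit targets, reg->qword[0] accesses RAX and reg->dword[0] accesses EAX, and writing through them changes the register.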
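
One detail worth keeping in mind is how the 32-bit and 64-bit views overlap. The sketch below is a plain C++ model, not the real PIN_REGISTER: ToyReg64, its accessors and ZeroLow32 are hypothetical names, and the model assumes the little-endian layout used on x86-64, where dword[0] aliases the low 32 bits of qword[0]. It uses shifts and masks instead of a union so that the behavior is well defined in standard C++.

```cpp
#include <cstdint>

// Hypothetical stand-in for the low 64 bits of a register value.
// This is NOT Pin's PIN_REGISTER; it only models how the views overlap.
struct ToyReg64 {
    std::uint64_t bits = 0;

    // Plays the role of reading reg->qword[0] (RAX).
    std::uint64_t qword0() const { return bits; }

    // Plays the role of reading reg->dword[0] (EAX): the low 32 bits.
    std::uint32_t dword0() const { return static_cast<std::uint32_t>(bits); }

    // Plays the role of writing reg->qword[0]: replaces all 64 bits.
    void setQword0(std::uint64_t v) { bits = v; }

    // Plays the role of writing reg->dword[0]: replaces only the low
    // 32 bits and leaves the upper 32 bits of the stored value intact.
    void setDword0(std::uint32_t v) {
        bits = (bits & 0xFFFFFFFF00000000ULL) | v;
    }
};

// Shaped like an analysis routine that receives a register by reference.
void ZeroLow32(ToyReg64* reg) { reg->setDword0(0); }
```

In this model, starting from 0x1122334455667788 and calling setDword0(0xdeadbeef) yields 0x11223344deadbeef. A native write to EAX would instead zero the upper half of RAX, so if the goal is to imitate such an instruction it is safer to write the full qword[0].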

Memory

For reading and writing memory, the safecopy example is a good reference. It uses the PIN_SafeCopy function:

//=======================================================
//  Analysis routines
//=======================================================

// Move from memory to register
ADDRINT DoLoad(REG reg, ADDRINT * addr)
{
    *out << "Emulate loading from addr " << addr << " to " << REG_StringShort(reg) << endl;
    ADDRINT value;
    PIN_SafeCopy(&value, addr, sizeof(ADDRINT));
    return value;
}

//=======================================================
// Instrumentation routines
//=======================================================

VOID EmulateLoad(INS ins, VOID* v)
{
    // Find the instructions that move a value from memory to a register
    if (INS_Opcode(ins) == XED_ICLASS_MOV &&
        INS_IsMemoryRead(ins) &&
        INS_OperandIsReg(ins, 0) &&
        INS_OperandIsMemory(ins, 1))
    {
        // op0 <- *op1
        INS_InsertCall(ins,
                       IPOINT_BEFORE,
                       AFUNPTR(DoLoad),
                       IARG_UINT32,
                       REG(INS_OperandReg(ins, 0)),
                       IARG_MEMORYREAD_EA,
                       IARG_RETURN_REGS,
                       INS_OperandReg(ins, 0),
                       IARG_END);

        // Delete the instruction
        INS_Delete(ins);
    }
}

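safecopy emulates the memory-read form of mov. It passes the destination register and the memory address the instruction reads to the analysis routine DoLoad, and uses IARG_RETURN_REGS to have DoLoad's return value written into the instruction's destination register. The original instruction is then removed with INS_Delete, so the overall semantics of the instruction are unchanged.

Inside DoLoad, the call PIN_SafeCopy(&value, addr, sizeof(ADDRINT)); loads the contents of that address, and the value is returned. This also shows that at run time the pintool and the original program live in the same address space, which is why PIN_SafeCopy can both read data from memory and write data to it.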
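
DoLoad ignores the return value of PIN_SafeCopy. As far as I understand the API reference, PIN_SafeCopy returns the number of bytes it managed to copy, which can be less than requested when part of the range is inaccessible. The sketch below models that contract in plain C++ so the checking pattern can be tried without Pin. ToyMemory, ToySafeCopy and ToyLoad64 are hypothetical names, and the single mapped region is an assumption made to keep the model small; the real function works on the process's actual address space.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Toy address space with one mapped region starting at `base`.
// This is NOT a Pin type, only a model for experimenting.
struct ToyMemory {
    std::uint64_t base;
    std::vector<unsigned char> bytes;
};

// Copies up to `size` bytes starting at toy address `src` into `dst`.
// Stops at the end of the mapped region and returns the number of
// bytes actually copied (possibly 0), instead of faulting.
std::size_t ToySafeCopy(void* dst, const ToyMemory& mem,
                        std::uint64_t src, std::size_t size) {
    if (src < mem.base) return 0;
    std::uint64_t offset = src - mem.base;
    if (offset >= mem.bytes.size()) return 0;
    std::size_t available = mem.bytes.size() - static_cast<std::size_t>(offset);
    std::size_t n = size < available ? size : available;
    std::memcpy(dst, mem.bytes.data() + offset, n);
    return n;
}

// Loads a 64-bit value and reports failure on a short copy rather
// than returning a partially filled value.
bool ToyLoad64(const ToyMemory& mem, std::uint64_t addr, std::uint64_t* out) {
    std::uint64_t value = 0;
    if (ToySafeCopy(&value, mem, addr, sizeof(value)) != sizeof(value)) {
        return false;
    }
    *out = value;
    return true;
}
```

With a 12-byte region mapped at 0x1000, ToyLoad64 succeeds at 0x1000 and fails at 0x1008, where only 4 bytes are available. The same comparison against the requested size is a reasonable guard around a real PIN_SafeCopy call.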

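Coarser-grained instrumentation

Sometimes instruction-level instrumentation is not needed. Pin also supports instrumentation routines at the granularity of a basic block, a routine, or an image. Take malloctrace from the examples: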

...
VOID Image(IMG img, VOID *v)
{
    // Instrument the malloc() and free() functions.  Print the input argument
    // of each malloc() or free(), and the return value of malloc().
    //
    //  Find the malloc() function.
    RTN mallocRtn = RTN_FindByName(img, MALLOC);
    if (RTN_Valid(mallocRtn))
    {
        RTN_Open(mallocRtn);

        // Instrument malloc() to print the input argument value and the return value.
        RTN_InsertCall(mallocRtn, IPOINT_BEFORE, (AFUNPTR)Arg1Before,
                       IARG_ADDRINT, MALLOC,
                       IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                       IARG_END);
        RTN_InsertCall(mallocRtn, IPOINT_AFTER, (AFUNPTR)MallocAfter,
                       IARG_FUNCRET_EXITPOINT_VALUE, IARG_END);

        RTN_Close(mallocRtn);
    }

    // Find the free() function.
    RTN freeRtn = RTN_FindByName(img, FREE);
    if (RTN_Valid(freeRtn))
    {
        RTN_Open(freeRtn);
        // Instrument free() to print the input argument value.
        RTN_InsertCall(freeRtn, IPOINT_BEFORE, (AFUNPTR)Arg1Before,
                       IARG_ADDRINT, FREE,
                       IARG_FUNCARG_ENTRYPOINT_VALUE, 0,
                       IARG_END);
        RTN_Close(freeRtn);
    }
}
...

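IMG_AddInstrumentFunction registers a function that is called to instrument each image as it is loaded. Inside that function, RTN_FindByName looks up the malloc and free symbols in the image. Note that at the start of the pintool, in addition to PIN_Init, PIN_InitSymbols must be called (before PIN_Init) to initialize the symbol manager. Once a routine has been found, RTN_InsertCall inserts the analysis code: Arg1Before receives the function's first argument at entry, and MallocAfter receives malloc's return value at exit. The resulting pintool traces calls to malloc and free and prints their arguments and return values.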
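
The excerpt above leaves out the analysis routines themselves. The sketch below shows roughly what they could look like, written so that it compiles without Pin. The parameter lists are my assumption, inferred from the IARG lists in the RTN_InsertCall calls above; ToyAddrInt stands in for Pin's ADDRINT, and a std::ostringstream stands in for the output file a real tool would open.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>

// Stand-in for Pin's ADDRINT so the sketch builds without pin.H.
using ToyAddrInt = std::uintptr_t;

// A real tool would use a std::ofstream opened in main(); a string
// stream is used here so the output can be inspected directly.
std::ostringstream TraceFile;

// Called before malloc()/free(): `name` is the routine name passed via
// IARG_ADDRINT, `arg` is the first function argument at entry.
void Arg1Before(const char* name, ToyAddrInt arg) {
    TraceFile << std::hex << std::showbase
              << name << "(" << arg << ")" << "\n";
}

// Called after malloc() returns: `ret` is the returned pointer value.
void MallocAfter(ToyAddrInt ret) {
    TraceFile << std::hex << std::showbase
              << "  returns " << ret << "\n";
}
```

Calling Arg1Before("malloc", 0x20) followed by MallocAfter(0x1000) appends the two lines "malloc(0x20)" and "  returns 0x1000" to the trace. Keeping the routines this small matters because they run on every call to malloc and free in the traced program.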

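Summary

Using Pin starts with understanding the process and overall idea of binary instrumentation. After that, writing a pintool is largely a matter of adapting the examples; when a particular feature is needed, look it up in the manual or experiment. There is a lot of Pin I have not explored yet, and I may come back to it later.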

Reference

Intel Pin Home
Pin 3.4 User Guide
API Reference

Pin series index

Intel Pin II - Build and Runtime Environment

Intel Pin Basic Usage