
C++ - How to tidy/fix PyCXX's creation of new-style Python extension classes?

2019-10-12 14:06:22 Author: Internet

I have nearly finished rewriting a C++ Python wrapper (PyCXX).

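The original allows old-style and new-style extension classes, and it also allows deriving a Python class from a new-style extension class: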

import test

# ok
a = test.new_style_class()

# also ok
class Derived( test.new_style_class ):
    def __init__( self ):
        test.new_style_class.__init__( self )

    def derived_func( self ):
        print( 'derived_func' )
        super().func_noargs()

    def func_noargs( self ):
        print( 'derived func_noargs' )

d = Derived()

The code is convoluted and appears to contain bugs (see "Why does PyCXX handle new-style classes in the way it does?").

My question is: what is the rationale/justification for PyCXX's convoluted mechanism? Is there a cleaner alternative?

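I will try to detail below where I have got to. First I will describe what PyCXX is doing at the moment, then I will describe what I think could be improved.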

When the Python runtime encounters d = Derived(), it performs PyObject_Call(ob), where ob is the PyTypeObject for NewStyleClass. I will write ob as NewStyleClass_PyTypeObject.

That PyTypeObject has been constructed in C++ and registered using PyType_Ready.

PyObject_Call will invoke type_call(PyTypeObject* type, PyObject* args, PyObject* kwds), which returns an initialised Derived instance, i.e.

PyObject* derived_instance = type_call(NewStyleClass_PyTypeObject, NULL, NULL)

Something like that.

(All of this comes from http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence by the way, thanks Eli!)

type_call essentially does:

PyObject* obj = type->tp_new(type, args, kwds);
type->tp_init(obj, args, kwds);
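To make the ordering concrete, here is a minimal sketch of that two-step sequence. It deliberately does not use the CPython headers: MockObject, MockType, mock_new, mock_init and mock_type_call are hypothetical stand-ins invented for illustration, and the real type_call does more (for example, it only runs tp_init when tp_new returned an instance of the requested type).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for PyObject / PyTypeObject. These are NOT the
// CPython definitions; they only model the two slots discussed above.
struct MockType;

struct MockObject {
    MockType* type = nullptr;
    bool initialised = false;
};

struct MockType {
    MockObject* (*tp_new)(MockType* type);
    int (*tp_init)(MockObject* self);
};

// Records the order in which the slots run.
inline std::vector<std::string>& call_log() {
    static std::vector<std::string> log;
    return log;
}

// Stage 1: allocate the object and establish only the essential state.
MockObject* mock_new(MockType* type) {
    call_log().push_back("tp_new");
    MockObject* obj = new MockObject{};
    obj->type = type;
    return obj;
}

// Stage 2: perform the remaining initialisation.
int mock_init(MockObject* self) {
    call_log().push_back("tp_init");
    self->initialised = true;
    return 0;
}

// Simplified model of type_call: tp_new first, then tp_init on the result.
MockObject* mock_type_call(MockType* type) {
    MockObject* obj = type->tp_new(type);
    if (obj == nullptr)
        return nullptr;
    if (type->tp_init(obj) < 0) {
        delete obj;
        return nullptr;
    }
    return obj;
}
```

Calling mock_type_call on a MockType whose slots are mock_new and mock_init yields an object that has passed through both stages, in that order.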

Our C++ wrapper has inserted functions into the tp_new and tp_init slots of NewStyleClass_PyTypeObject, as follows:

typeobject.set_tp_new( extension_object_new );
typeobject.set_tp_init( extension_object_init );

:
    static PyObject* extension_object_new( PyTypeObject* subtype, 
                                              PyObject* args, PyObject* kwds )
    {
        PyObject* pyob = subtype->tp_alloc(subtype,0);

        Bridge* o = reinterpret_cast<Bridge *>( pyob );

        o->m_pycxx_object = nullptr;

        return pyob;
    }

    static int extension_object_init( PyObject* _self, 
                                            PyObject* args, PyObject* kwds )
    {
        Bridge* self{ reinterpret_cast<Bridge*>(_self) };

        // NOTE: observe this is where we invoke the constructor, 
        //       but indirectly (i.e. through final)
        self->m_pycxx_object = new FinalClass{ self, args, kwds };

        return 0;
    }

Note that we need to bind the Python Derived instance to its corresponding C++ class instance. (Why? Explained below, see 'X'.) To do this we are using:

struct Bridge
{
    PyObject_HEAD // <-- a PyObject
    ExtObjBase* m_pycxx_object;
};

Now this bridge raises a question. I am very suspicious of this design.

Note how memory is allocated for this new PyObject:

        PyObject* pyob = subtype->tp_alloc(subtype,0);

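Then we typecast this pointer to Bridge*, and use the 4 or 8 (sizeof(void*)) bytes immediately following the PyObject to point to the corresponding C++ class instance (this gets hooked up in extension_object_init, as shown above).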

Now, for this to work we require:

a) subtype->tp_alloc(subtype, 0) must allocate an extra sizeof(void*) bytes
b) the PyObject must not need any memory beyond sizeof(PyObject_HEAD), because if it did, it would collide with the pointer above
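Both conditions can be written down as compile-time checks. The sketch below is only an illustration under assumed names: MockHead stands in for the fields that PyObject_HEAD expands to (it is not the real macro), and ExtObjBaseStub and LayoutBridge are hypothetical substitutes for ExtObjBase and Bridge.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the fields PyObject_HEAD expands to.
struct MockHead {
    std::ptrdiff_t ob_refcnt;
    void*          ob_type;
};

struct ExtObjBaseStub {};  // placeholder for the C++ side of the bridge

struct LayoutBridge {
    MockHead        head;            // the part the runtime knows about
    ExtObjBaseStub* m_pycxx_object;  // the extra pointer
};

// Condition (a): the allocation must cover the whole bridge, not just the head.
constexpr bool allocation_is_big_enough(std::size_t basicsize) {
    return basicsize >= sizeof(LayoutBridge);
}

// Condition (b): the pointer must begin at or after the end of the head.
constexpr bool pointer_is_past_head() {
    return offsetof(LayoutBridge, m_pycxx_object) >= sizeof(MockHead);
}

static_assert(pointer_is_past_head(),
              "extra pointer overlaps the object head");
static_assert(allocation_is_big_enough(sizeof(LayoutBridge)),
              "sizeof(bridge) must be a sufficient basicsize");
static_assert(!allocation_is_big_enough(sizeof(MockHead)),
              "a head-only allocation leaves no room for the pointer");
```

In this model, requesting sizeof(LayoutBridge) bytes satisfies (a), and (b) holds because the pointer is declared after the head in the same struct.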

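One major question I have at this point is:
can we guarantee that the PyObject the Python runtime creates for our Derived instance does not overlap Bridge's ExtObjBase* m_pycxx_object field?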

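I will attempt to answer this: it is we who determine how much memory gets allocated. When we create NewStyleClass_PyTypeObject, we tell it how much memory we want this PyTypeObject to allocate for each new instance of this type: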

template< TEMPLATE_TYPENAME FinalClass >
class ExtObjBase : public FuncMapper<FinalClass> , public ExtObjBase_noTemplate
{
protected:
    static TypeObject& typeobject()
    {
        static TypeObject* t{ nullptr };
        if( ! t )
            t = new TypeObject{ sizeof(FinalClass), typeid(FinalClass).name() };
                   /*           ^^^^^^^^^^^^^^^^^ this is the bug BTW!
                        The C++ Derived class instance never gets deposited
                        in the memory allocated by the Python runtime
                        (controlled by this parameter)

                        This value should be sizeof(Bridge) -- as pointed out
                        in the answer to the question linked above
                   */

        return *t;
    }
:
}

class TypeObject
{
private:
    PyTypeObject* table;

    // these tables fit into the main table via pointers
    PySequenceMethods*       sequence_table;
    PyMappingMethods*        mapping_table;
    PyNumberMethods*         number_table;
    PyBufferProcs*           buffer_table;

public:
    PyTypeObject* type_object() const
    {
        return table;
    }

    // NOTE: if you define one sequence method you must define all of them except the assigns

    TypeObject( size_t size_bytes, const char* default_name )
        : table{ new PyTypeObject{} }  // {} sets to 0
        , sequence_table{}
        , mapping_table{}
        , number_table{}
        , buffer_table{}
    {
        PyObject* table_as_object = reinterpret_cast<PyObject* >( table );

        *table_as_object = PyObject{ _PyObject_EXTRA_INIT  1, NULL }; 
        // ^ py_object_initializer -- NULL because type must be init'd by user

        table_as_object->ob_type = _Type_Type();

        // QQQ table->ob_size = 0;
        table->tp_name              = const_cast<char *>( default_name );
        table->tp_basicsize         = size_bytes;
        table->tp_itemsize          = 0; // sizeof(void*); // so as to store extra pointer

        table->tp_dealloc           = ...

You can see it going in as table->tp_basicsize.

But it now seems clear to me that PyObjects generated from NewStyleClass_PyTypeObject will never need any additional allocated memory.

Which means that this whole Bridge mechanism is unnecessary.

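PyCXX's original technique was to use PyObject as a base class of NewStyleClassCXXClass, and to initialise that base so that the Python runtime's PyObject for d = Derived() is in fact this base. That technique is looking good, because it allows seamless typecasting.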

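Whenever the Python runtime calls a slot from NewStyleClass_PyTypeObject, it passes a pointer to d's PyObject as the first parameter, and we can simply typecast back to NewStyleClassCXXClass. <-- 'X' (referenced above)

So really my question is: why don't we just do this? Is there something special about deriving from NewStyleClass that forces an extra allocation for the PyObject?

I realise I don't understand the creation sequence in the case of a derived class. Eli's post didn't cover that.

I suspect this may be connected with the fact that in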

    static PyObject* extension_object_new( PyTypeObject* subtype, ...

^ this variable is named 'subtype'.
I don't understand this, and I wonder whether it may hold the key.

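EDIT: I thought of one possible explanation for why PyCXX initialises with sizeof(FinalClass). It might be a relic of an idea that was tried and discarded. That is, if Python's tp_new call allocates enough space for the FinalClass (which has the PyObject as its base), maybe a new FinalClass can be generated at that exact location using 'placement new', or some cunning reinterpret_cast business. My guess is that this was tried, found to pose some problem, and worked around, and the relic got left behind.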
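For reference, the 'placement new' idea would look roughly like the sketch below. This is speculation about a discarded design, not PyCXX code: HeadStub, FinalStub, mock_alloc, construct_in_place and destroy_in_place are hypothetical names, and mock_alloc merely plays the role of tp_alloc. One wrinkle it makes visible is that constructing over the storage would clobber the head the allocator already filled in, so the sketch copies the head and hands it back to the constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the object head (not the CPython definition).
struct HeadStub {
    std::ptrdiff_t ob_refcnt;
    void*          ob_type;
};

// Hypothetical final class that uses the head as its base class.
struct FinalStub : HeadStub {
    int payload;
    FinalStub(const HeadStub& head, int p) : HeadStub(head), payload(p) {}
};

// Plays the role of tp_alloc: zeroed storage of basicsize bytes whose
// head has already been filled in by the "runtime".
void* mock_alloc(std::size_t basicsize) {
    void* mem = std::calloc(1, basicsize);
    if (mem == nullptr)
        return nullptr;
    new (mem) HeadStub{1, nullptr};
    return mem;
}

// Construct the C++ object directly inside the storage from mock_alloc,
// preserving the head values that were already there.
FinalStub* construct_in_place(void* mem, int payload) {
    HeadStub saved = *static_cast<HeadStub*>(mem);
    return new (mem) FinalStub(saved, payload);
}

// Run the destructor explicitly, then release the raw storage.
void destroy_in_place(FinalStub* obj) {
    obj->~FinalStub();
    std::free(obj);
}
```

This only works if the allocation is at least sizeof(FinalStub) bytes, which would be consistent with passing sizeof(FinalClass) as the basic size. Treating the object pointer and the head pointer as interchangeable additionally assumes the base subobject sits at offset 0, which common ABIs provide for single non-virtual inheritance but the sketch does not check.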

Answer:

PyCXX is not convoluted. It does have two bugs, but they can be fixed easily without significant changes to the code.

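When creating a C++ wrapper for the Python API, one runs into a problem: the C++ object model and the Python new-style object model are very different. One fundamental difference is that C++ has a single constructor that both creates and initialises the object, while Python has two stages: tp_new creates the object and performs minimal initialisation (or just returns an existing object), and tp_init performs the rest of the initialisation.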

PEP 253, which you should probably read in its entirety, says:

The difference in responsibilities between the tp_new() slot and the tp_init() slot lies in the invariants they ensure. The tp_new() slot should ensure only the most essential invariants, without which the C code that implements the objects would break. The tp_init() slot should be used for overridable user-specific initializations. Take for example the dictionary type. The implementation has an internal pointer to a hash table which should never be NULL. This invariant is taken care of by the tp_new() slot for dictionaries. The dictionary tp_init() slot, on the other hand, could be used to give the dictionary an initial set of keys and values based on the arguments passed in.

…

You may wonder why the tp_new() slot shouldn’t call the tp_init() slot itself. The reason is that in certain circumstances (like support for persistent objects), it is important to be able to create an object of a particular type without initializing it any further than necessary. This may conveniently be done by calling the tp_new() slot without calling tp_init(). It is also possible that tp_init() is not called, or called more than once -- its operation should be robust even in these anomalous cases.

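The entire point of a C++ wrapper is to let you write nice C++ code. Say, for example, that you want your object to have a data member that can only be initialised during construction. If you create the object during tp_new, you cannot reinitialise that data member during tp_init. This will probably force you to hold the data member through some kind of smart pointer and create it during tp_new, which makes the code ugly.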

The approach PyCXX takes is to split object construction in two:

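> tp_new creates a dummy object containing just a pointer to the C++ object that tp_init will create. This pointer is initially null.
> tp_init allocates and constructs the actual C++ object, then updates the pointer in the dummy object created by tp_new to point to it. If tp_init is called more than once, it raises a Python exception.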
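The two steps above can be sketched without the Python headers as follows. All names are hypothetical (TwoPhaseBridge, FinalClassStub, bridge_new, bridge_init, bridge_dealloc), a real bridge would begin with PyObject_HEAD, and a real tp_init slot would set a Python exception where this sketch simply returns -1.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical C++ class with a member that can only be set at construction.
struct FinalClassStub {
    const std::string name;
    explicit FinalClassStub(std::string n) : name(std::move(n)) {}
};

// Hypothetical bridge; a real one would start with PyObject_HEAD.
struct TwoPhaseBridge {
    FinalClassStub* m_pycxx_object;
};

// Models tp_new: create the dummy object with a null pointer.
TwoPhaseBridge* bridge_new() {
    TwoPhaseBridge* b = new TwoPhaseBridge;
    b->m_pycxx_object = nullptr;
    return b;
}

// Models tp_init: construct the real C++ object exactly once.
// Returns 0 on success and -1 on a repeated call (where a real slot
// would also set a Python exception).
int bridge_init(TwoPhaseBridge* self, const std::string& name) {
    if (self->m_pycxx_object != nullptr)
        return -1;
    self->m_pycxx_object = new FinalClassStub(name);
    return 0;
}

// Models tp_dealloc: destroy the C++ object, then the dummy object.
void bridge_dealloc(TwoPhaseBridge* self) {
    delete self->m_pycxx_object;
    delete self;
}
```

Because the C++ object is only constructed inside bridge_init, its const member can be initialised by an ordinary constructor, which is the benefit described above; the cost is the extra allocation and indirection.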

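I personally think the overhead of this approach is too high for my own applications, but it is a legitimate approach. I have my own C++ wrapper around the Python C API that does all the initialisation in tp_new, which is also flawed. There doesn't appear to be a good solution.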

Tags: c++, initialization, python-c-api, new-style-class, pycxx
Source: https://codeday.me/bug/20191012/1900957.html