Scratch the Surface of CPython by Simple Examples

Authors: Abdullah Anati

Motivation

As you may know, Python is an interpreted language, and if you have ever used it, your code was most likely run by CPython. CPython is the original Python implementation, the one you download from Python.org, and it is written in C.

There are other implementations of the Python interpreter (Jython, IronPython, PyPy, etc.) that offer the same Python syntax and functionality but are written in different languages or differ in implementation details. CPython is the reference implementation: new features land there first, and the rest follow.

Why should you even bother to learn about it? Could this ever be helpful?

Yes! It helps you understand how Python's built-in structures work and how the interpreter manages memory. It also helps you debug your code and answer questions about Python internals such as list resizing, key-sharing dictionaries, the unlimited size of int, etc.

Our approach here is to highlight the mandatory knowledge for some topics, followed by some examples to break it down.

Introduction

In this article, we briefly highlight parts of the internal design of Python's built-in data structures, whose source code is written in C, and unveil some interesting ideas and concepts along the way. We go through how to download and compile the source code, followed by an overview of some of the objects defined in CPython and where they are located. Afterwards, we work through 4 examples that modify existing functionality of, or add new functionality to, the predefined object types in Python.

Based on Python 3.11.0
OS platform: Linux.

This article discusses the following:

1. Source Code Download and Compilation.

2. Directories & Data structures You Have to Know.

2.1 Directories.

2.2 CPython Structures You Have to Know.

2.3 Built-in Structures Samples.

3. Examples:

3.1 Change type name.

3.2 Change the dictionary representation.

3.3 Add a function in 'int' type to return its length.

3.4 Add list subtraction feature.


Let's start by downloading and installing CPython source code.

Section 1: Source Code Download and Compilation

* Note: Some dependencies might be needed before compilation [More].

The following section shows the steps needed to download CPython source code then compile it:

  1. Download the CPython 3.11.0 source code (gzipped tarball) from the Python.org downloads page.

2. tar -xzvf Python-3.11.0.tgz

3. cd Python-3.11.0

4. Compile the code to get a Python interpreter binary by running the steps below:

4.1 Run the ./configure script and set the prefix path (where the interpreter will be installed later):

./configure --prefix=/home/anati/Python-3.11.0/out # change the path to match your setup

...
config.status: creating pyconfig.h
creating Modules/Setup.local
creating Makefile

Tip: configure creates a Makefile based on your settings, so you will now see a 'Makefile' in the directory.

To check the prefix recorded in the Makefile, use the grep command below:

egrep '^prefix\s*=' Makefile

prefix= /home/anati/Python-3.11.0/out (in my case)

4.2 Run the 'make' command. It builds the default target; you can find that target in the Makefile, but you don't need to worry about it.

Link for some info about Makefile.

5. Run 'sudo make install' to install the interpreter binaries under the prefix (sudo is only required when your user cannot write to the prefix directory).

Now, you have both CPython source code and compiled interpreter. You can try the compiled version by:

/home/anati/Python-3.11.0/out/bin/python3 (You need to pick your path)        
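Once the new interpreter starts, a quick sanity check is to ask it where it lives. The sketch below uses only standard-library calls; the function name describe_interpreter is an illustrative choice, and the values printed depend on your own prefix.

```python
import platform
import sys


def describe_interpreter():
    """Collect a few facts that identify the running interpreter."""
    return {
        "implementation": platform.python_implementation(),
        "version": "%d.%d.%d" % sys.version_info[:3],
        "executable": sys.executable,  # path of the binary that is running
        "prefix": sys.prefix,          # normally the --prefix given to configure
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the reported prefix is not the directory you passed to configure, you are probably running a different Python from your PATH.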


Section 2: Directories and Data Structures you Have to Know

2.1 Directories:

The directories we need in order to understand Python's built-in data structures are Objects and Include:

[Figure: CPython source tree with the relevant directories highlighted]

1. Objects:

The Objects directory has all "source files for various built-in objects".

The implementations of Python objects (set, list, dict, etc.) are in the 'Objects' directory (Python-3.11.0/Objects).

The file names don't necessarily match the names of the Python built-in objects.

For example, set data type is implemented in setobject.c, dict in dictobject.c, int in longobject.c, etc.


2. Include:

The Include directory (and its subdirectories) has the header files for each object, such as setobject.h, dictobject.h, longobject.h, etc.


2.2 CPython Structures You Have to Know

CPython builds every object on 3 basic structures: PyObject, PyVarObject and PyTypeObject.

[Figure: the 3 basic structures and their relations]


  1. PyObject

All object types are extensions of this type.

Although nothing is actually declared to be a PyObject, this type contains the information Python needs to treat a pointer to an object as an object. Every pointer to a Python object can be cast to a PyObject *.

In Include/pytypedefs.h, a typedef gives struct _object the name PyObject:

typedef struct _object PyObject;        

PyObject declaration:

struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
};

Location: Include/object.h

As shown in the snippet above, PyObject is a structure with 2 members (_PyObject_HEAD_EXTRA expands to nothing unless the interpreter is built with Py_TRACE_REFS):

1.1. Py_ssize_t ob_refcnt: the object's reference count. For statically defined type objects it is initialized to 1 by the PyObject_HEAD_INIT macro.

1.2. PyTypeObject *ob_type: a pointer to the object's type, covered under PyTypeObject below.
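Both members can be observed from Python without touching C. The sketch below is only an illustration: sys.getrefcount reports ob_refcnt (plus any temporary references held during the call), and type() returns the object that ob_type points to. The helper names are hypothetical.

```python
import sys


def refcount_delta():
    """Return how much the reference count grows when one more reference is made."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj]          # the list now stores one extra pointer to obj
    after = sys.getrefcount(obj)
    del holder
    return after - before


def type_pointer_matches(obj, expected_type):
    """type(obj) is the Python-level view of the ob_type pointer."""
    return type(obj) is expected_type


print(refcount_delta())
print(type_pointer_matches([1, 2], list))
```

Only the difference between the two counts is meaningful here; the absolute numbers vary between interpreter versions.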

2. PyVarObject

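It is an extension of PyObject that adds the ob_size field, which the PyObject_VAR_HEAD macro places at the start of a structure. It is only used for objects that have some notion of length stored in the object header, like list, tuple, int, etc. (set and dict keep their own size fields and use the plain PyObject header.)

Location: Include/object.h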

typedef struct {
	    PyObject ob_base;
	    Py_ssize_t ob_size; /* Number of items in variable part */
	} PyVarObject;        
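The effect of the variable part can be seen from Python. The sketch below assumes a CPython build, where a tuple stores its item pointers inline after the header, so each extra item costs exactly one pointer. The helper name tuple_growth is an illustrative choice.

```python
import struct
import sys

POINTER_SIZE = struct.calcsize("P")  # size of a C pointer on this platform


def tuple_growth(n):
    """Bytes added to a tuple's footprint by n items, as reported by sys.getsizeof."""
    return sys.getsizeof(tuple(range(n))) - sys.getsizeof(())


print(tuple.__itemsize__ == POINTER_SIZE)
print(tuple_growth(3) == 3 * tuple.__itemsize__)
print(list.__itemsize__)
```

A list reports an item size of 0 because its items live in a separately allocated array rather than inside the object itself.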

3. PyTypeObject

PyTypeObject is fundamental to how objects behave. It is perhaps one of the most important structures of the Python object system, as it is the structure that defines a new type and its behaviour.

Type objects are fairly large compared to most of the standard types. The reason for the size is that each type object stores a large number of values, mostly C function pointers, each of which implements a small part of the type’s functionality [ref].

In Include/pytypedefs.h, a typedef gives struct _typeobject the name PyTypeObject:

typedef struct _typeobject PyTypeObject;

The definition of _typeobject is in Include/cpython/object.h:

struct _typeobject {
	    PyObject_VAR_HEAD
	    const char *tp_name; /* For printing, in format "<module>.<name>" */
	    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */
	

	    /* Methods to implement standard operations */
	

	    destructor tp_dealloc;
	    Py_ssize_t tp_vectorcall_offset;
	    getattrfunc tp_getattr;
	    setattrfunc tp_setattr;
	    PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
	                                    or tp_reserved (Python 3) */
	    reprfunc tp_repr;
	

	    /* Method suites for standard classes */
	

	    PyNumberMethods *tp_as_number;
	    PySequenceMethods *tp_as_sequence;
	    PyMappingMethods *tp_as_mapping;
	

	    /* More standard operations (here for binary compatibility) */
	

	    hashfunc tp_hash;
	    ternaryfunc tp_call;
	    reprfunc tp_str;
	    getattrofunc tp_getattro;
	    setattrofunc tp_setattro;
	

	    /* Functions to access object as input/output buffer */
	    PyBufferProcs *tp_as_buffer;
	

	    /* Flags to define presence of optional/expanded features */
	    unsigned long tp_flags;
	

	    const char *tp_doc; /* Documentation string */
	

	    /* Assigned meaning in release 2.0 */
	    /* call function for all accessible objects */
	    traverseproc tp_traverse;
	

	    /* delete references to contained objects */
	    inquiry tp_clear;
	

	    /* Assigned meaning in release 2.1 */
	    /* rich comparisons */
	    richcmpfunc tp_richcompare;
	

	    /* weak reference enabler */
	    Py_ssize_t tp_weaklistoffset;
	

	    /* Iterators */
	    getiterfunc tp_iter;
	    iternextfunc tp_iternext;
	

	    /* Attribute descriptor and subclassing stuff */
	    PyMethodDef *tp_methods;
	    PyMemberDef *tp_members;
	    PyGetSetDef *tp_getset;
	    // Strong reference on a heap type, borrowed reference on a static type
	    PyTypeObject *tp_base;
	    PyObject *tp_dict;
	    descrgetfunc tp_descr_get;
	    descrsetfunc tp_descr_set;
	    Py_ssize_t tp_dictoffset;
	    initproc tp_init;
	    allocfunc tp_alloc;
	    newfunc tp_new;
	    freefunc tp_free; /* Low-level free-memory routine */
	    inquiry tp_is_gc; /* For PyObject_IS_GC */
	    PyObject *tp_bases;
	    PyObject *tp_mro; /* method resolution order */
	    PyObject *tp_cache;
	    PyObject *tp_subclasses;
	    PyObject *tp_weaklist;
	    destructor tp_del;
	

	    /* Type attribute cache version tag. Added in version 2.6 */
	    unsigned int tp_version_tag;
	

	    destructor tp_finalize;
	    vectorcallfunc tp_vectorcall;
	};        
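Several fields of the struct above surface in Python as attributes of the type object. The sketch below pairs a few of them; the function name type_summary and the dictionary keys are illustrative choices, and __name__ shows only the part of tp_name after the last dot.

```python
def type_summary(tp):
    """Map a few PyTypeObject fields to the attributes Python exposes for them."""
    return {
        "tp_name": tp.__name__,
        "tp_basicsize": tp.__basicsize__,
        "tp_itemsize": tp.__itemsize__,
        "tp_flags": tp.__flags__,
        "tp_base": tp.__base__,
        "tp_mro": tp.__mro__,
    }


for field, value in type_summary(list).items():
    print(f"{field}: {value!r}")
```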

PyTypeObject holds most of the attribute and method collections that a Python built-in type has, such as:

  • Methods to implement standard operations, like tp_getattr, tp_setattr, etc.
  • Method suites for standard classes, described in 3.1 to 3.3 below.

3.1 PyNumberMethods:

This structure holds pointers to the functions which an object uses to implement the number protocol: addition, subtraction, multiplication, division, abs, etc. Each function is used by the function of similar name documented in the Number Protocol section.

You can see how it is filled in for int in Objects/longobject.c:

static PyNumberMethods long_as_number = {
	    (binaryfunc)long_add,       /*nb_add*/
	    (binaryfunc)long_sub,       /*nb_subtract*/
	    (binaryfunc)long_mul,       /*nb_multiply*/
	    long_mod,                   /*nb_remainder*/
	    long_divmod,                /*nb_divmod*/
	    long_pow,                   /*nb_power*/
	    (unaryfunc)long_neg,        /*nb_negative*/
	    long_long,                  /*tp_positive*/
	    (unaryfunc)long_abs,        /*tp_absolute*/
	    (inquiry)long_bool,         /*tp_bool*/
	    (unaryfunc)long_invert,     /*nb_invert*/
	    long_lshift,                /*nb_lshift*/
	    long_rshift,                /*nb_rshift*/
	    long_and,                   /*nb_and*/
	    long_xor,                   /*nb_xor*/
	    long_or,                    /*nb_or*/
	    long_long,                  /*nb_int*/
	    0,                          /*nb_reserved*/
	    long_float,                 /*nb_float*/
	    0,                          /* nb_inplace_add */
	    0,                          /* nb_inplace_subtract */
	    0,                          /* nb_inplace_multiply */
	    0,                          /* nb_inplace_remainder */
	    0,                          /* nb_inplace_power */
	    0,                          /* nb_inplace_lshift */
	    0,                          /* nb_inplace_rshift */
	    0,                          /* nb_inplace_and */
	    0,                          /* nb_inplace_xor */
	    0,                          /* nb_inplace_or */
	    long_div,                   /* nb_floor_divide */
	    long_true_divide,           /* nb_true_divide */
	    0,                          /* nb_inplace_floor_divide */
	    0,                          /* nb_inplace_true_divide */
	    long_long,                  /* nb_index */
	};
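From Python, a filled-in number slot shows up as a special method on the type. The sketch below checks for __sub__, the Python-level counterpart of nb_subtract; the helper name supports_subtraction is hypothetical.

```python
import operator


def supports_subtraction(tp):
    """True when the type defines the Python-level counterpart of nb_subtract."""
    return hasattr(tp, "__sub__")


print(operator.sub(7, 2) == (7).__sub__(2))  # both routes reach int's subtraction slot
print(supports_subtraction(int))
print(supports_subtraction(set))
print(supports_subtraction(list))
```

On an unmodified interpreter the last check is False, because the list type has no number methods at all.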
        

3.2 PySequenceMethods:

This structure holds pointers to the functions which an object uses to implement the sequence protocol. You can see a real example of its usage for the Python list in Objects/listobject.c:

static PySequenceMethods list_as_sequence = {
	    (lenfunc)list_length,                       /* sq_length */
	    (binaryfunc)list_concat,                    /* sq_concat */
	    (ssizeargfunc)list_repeat,                  /* sq_repeat */
	    (ssizeargfunc)list_item,                    /* sq_item */
	    0,                                          /* sq_slice */
	    (ssizeobjargproc)list_ass_item,             /* sq_ass_item */
	    0,                                          /* sq_ass_slice */
	    (objobjproc)list_contains,                  /* sq_contains */
	    (binaryfunc)list_inplace_concat,            /* sq_inplace_concat */
	    (ssizeargfunc)list_inplace_repeat,          /* sq_inplace_repeat */
	};

3.3 PyMappingMethods:

This structure holds pointers to the functions which an object uses to implement the mapping protocol. You can see a real example of its usage for the Python list in Objects/listobject.c:

static PyMappingMethods list_as_mapping = {
	    (lenfunc)list_length,
	    (binaryfunc)list_subscript,
	    (objobjargproc)list_ass_subscript
	};        
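The list operations that these two tables serve can be exercised from Python. This is a small illustrative sketch; the comments name the C functions from the snippets above that the operation is expected to reach, and the function name is an assumption.

```python
def sequence_and_mapping_demo():
    """Run the list operations backed by list_as_sequence and list_as_mapping."""
    items = [10, 20, 30, 40]
    return {
        "len": len(items),          # list_length
        "concat": items + [50],     # list_concat
        "repeat": [0] * 3,          # list_repeat
        "contains": 20 in items,    # list_contains
        "slice": items[1:3],        # list_subscript, called with a slice object
        "same_slice": items.__getitem__(slice(1, 3)),
    }


for name, result in sequence_and_mapping_demo().items():
    print(name, result)
```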

Returning to PyTypeObject, which every built-in type has to initialize, the snippet below shows the initialization of the list type (called PyList_Type):

PyTypeObject PyList_Type = {
	    PyVarObject_HEAD_INIT(&PyType_Type, 0)
	    "list",
	    sizeof(PyListObject),
	    0,
	    (destructor)list_dealloc,                   /* tp_dealloc */
	    0,                                          /* tp_vectorcall_offset */
	    0,                                          /* tp_getattr */
	    0,                                          /* tp_setattr */
	    0,                                          /* tp_as_async */
	    (reprfunc)list_repr,                        /* tp_repr */
	    0,                                          /* tp_as_number */
	    &list_as_sequence,                          /* tp_as_sequence */
	    &list_as_mapping,                           /* tp_as_mapping */
	    PyObject_HashNotImplemented,                /* tp_hash */
	    0,                                          /* tp_call */
	    0,                                          /* tp_str */
	    PyObject_GenericGetAttr,                    /* tp_getattro */
	    0,                                          /* tp_setattro */
	    0,                                          /* tp_as_buffer */
	    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
	        Py_TPFLAGS_BASETYPE | Py_TPFLAGS_LIST_SUBCLASS |
	        _Py_TPFLAGS_MATCH_SELF | Py_TPFLAGS_SEQUENCE,  /* tp_flags */
	    list___init____doc__,                       /* tp_doc */
	    (traverseproc)list_traverse,                /* tp_traverse */
	    (inquiry)_list_clear,                       /* tp_clear */
	    list_richcompare,                           /* tp_richcompare */
	    0,                                          /* tp_weaklistoffset */
	    list_iter,                                  /* tp_iter */
	    0,                                          /* tp_iternext */
	    list_methods,                               /* tp_methods */
	    0,                                          /* tp_members */
	    0,                                          /* tp_getset */
	    0,                                          /* tp_base */
	    0,                                          /* tp_dict */
	    0,                                          /* tp_descr_get */
	    0,                                          /* tp_descr_set */
	    0,                                          /* tp_dictoffset */
	    (initproc)list___init__,                    /* tp_init */
	    PyType_GenericAlloc,                        /* tp_alloc */
	    PyType_GenericNew,                          /* tp_new */
	    PyObject_GC_Del,                            /* tp_free */
	    .tp_vectorcall = list_vectorcall,
	};
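One slot in this initialization has an effect that is easy to observe: tp_hash is set to PyObject_HashNotImplemented, which is why lists cannot be hashed. A minimal sketch, with is_hashable as a hypothetical helper name:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if the type refuses to be hashed."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable([1, 2]))    # list: tp_hash is PyObject_HashNotImplemented
print(is_hashable((1, 2)))    # tuple provides a real hash function
print(list.__hash__ is None)  # how the missing hash appears on the type
```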
        


2.3 Built-in Structures Samples

  1. int (CPython name is PyLongObject):

In languages like C/C++, the precision of integers is limited by the size of the C type (a C long, for example, is typically 32 or 64 bits), but Python supports a "bignum" integer type which can work with arbitrarily large numbers (arbitrary-precision integers); see PEP 237.

In short, Python already gives us the convenience of unlimited sizes for strings, lists, etc., so it makes sense to extend this convenience to integers.

We will talk about this ability in the next article.

Some info about the PyLongObject structure:

[Figure: PyLongObject structure]

In Include/pytypedefs.h:

typedef struct _longobject PyLongObject;        

In Include/cpython/longintrepr.h:

struct _longobject {
	    PyObject_VAR_HEAD
	    digit ob_digit[1];
	};        
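The ob_digit array holds the absolute value in chunks of a fixed number of bits, and sys.int_info reports that chunk size for the running build. The sketch below estimates how many chunks a value needs; digits_needed is a hypothetical helper, not a CPython API.

```python
import sys

BITS_PER_DIGIT = sys.int_info.bits_per_digit  # 30 on typical 64-bit builds, otherwise 15


def digits_needed(value):
    """Number of internal digits needed to hold abs(value); zero needs none."""
    return -(-abs(value).bit_length() // BITS_PER_DIGIT)  # ceiling division


for value in (0, 1, 2**BITS_PER_DIGIT - 1, 2**BITS_PER_DIGIT, 2**200):
    print(value.bit_length(), digits_needed(value), sys.getsizeof(value))
```

The reported size in bytes grows with the number of digits, which is what lets an int expand far beyond 64 bits.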


2. list (CPython name is PyListObject):

Looking at the declaration below, we can see that a Python list is an array of pointers to objects.

typedef struct {
	    PyObject_VAR_HEAD
	    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
	    PyObject **ob_item;
	

	    /* ob_item contains space for 'allocated' elements.  The number
	     * currently in use is ob_size.
	     * Invariants:
	     *     0 <= ob_size <= allocated
	     *     len(list) == ob_size
	     *     ob_item == NULL implies ob_size == allocated == 0
	     * list.sort() temporarily sets allocated to -1 to detect mutations.
	     *
	     * Items must normally not be NULL, except during construction when
	     * the list is not yet visible outside the function that builds it.
	     */
	    Py_ssize_t allocated;
	} PyListObject;        

location: Include/cpython/listobject.h
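The gap between ob_size and allocated can be seen indirectly from Python: the memory footprint of a list grows in steps rather than on every append. A minimal sketch, assuming CPython; list_sizes is a hypothetical helper and the exact step sizes depend on the interpreter version.

```python
import sys


def list_sizes(n):
    """Record sys.getsizeof after each of n appends to an initially empty list."""
    items = []
    sizes = []
    for i in range(n):
        items.append(i)
        sizes.append(sys.getsizeof(items))
    return sizes


sizes = list_sizes(64)
print(len(set(sizes)), "distinct sizes for", len(sizes), "appends")
```

Each jump in size is a reallocation of ob_item; the appends in between reuse space that was already allocated.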

[Figure: PyListObject structure]

Section 3: Examples

All of the examples below are implemented in this GitHub repo.

Let's start with a very simple example to help us picture how Python's built-in objects are structured.

Example 1

Change the type name for int to something else, like 'integer_name'.

  • Original interpreter output:

type(123) --> <class 'int'>

  • Desired output:

type(123) --> <class 'integer_name'>

Steps:

1. Open Objects/longobject.c and find the PyLong_Type structure.

2. Change tp_name from "int" to "integer_name" (link):

From:

PyTypeObject PyLong_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "int",                                      /* tp_name */
    ...

To:

PyTypeObject PyLong_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "integer_name",                             /* tp_name */
    ...

3. Recompile using the steps below:

make

sudo make install

Run the newly compiled interpreter:

./out/bin/python3

[Figure: interpreter session showing the result of type(123)]

As shown above, type(123) now reports integer_name instead of the original int.
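The same check can be scripted, which is handy when switching between the stock and the patched build. In this sketch describe_type is a hypothetical helper; it relies only on the fact that the text printed for a built-in type is built from its name.

```python
def describe_type(value):
    """Return the type's name and the text printed for type(value)."""
    tp = type(value)
    return tp.__name__, repr(tp)


name, shown = describe_type(123)
print(name)   # "int" on a stock build; the patched build should show "integer_name"
print(shown == f"<class '{name}'>")
```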


Example 2

In this example, we will change how CPython represents a dictionary. The representation is produced by the function stored in the tp_repr slot of the PyTypeObject that we discussed earlier; such functions are usually named *_repr, as in dict_repr.

Change the dictionary representation so that each key-value pair is printed on a separate line:

  • Original interpreter output:

>>> test_var = {str(i):i*i for i in range(10)}
>>> print(test_var)
{'0': 0, '1': 1, '2': 4, '3': 9, '4': 16, '5': 25, '6': 36, '7': 49, '8': 64, '9': 81}

  • Desired output:

>>> test_var = {str(i):i*i for i in range(10)}
>>> print(test_var)
{'0': 0,
 '1': 1,
 '2': 4,
 '3': 9,
 '4': 16,
 '5': 25,
 '6': 36,
 '7': 49,
 '8': 64,
 '9': 81}
        

Steps:

1. Go to the PyDict_Type structure definition in Objects/dictobject.c. Its tp_repr slot points to the function dict_repr, as shown below:

PyTypeObject PyDict_Type = {
	    PyVarObject_HEAD_INIT(&PyType_Type, 0)
	    "dict",
	    sizeof(PyDictObject),
	    0,
	    (destructor)dict_dealloc,                   /* tp_dealloc */
	    0,                                          /* tp_vectorcall_offset */
	    0,                                          /* tp_getattr */
	    0,                                          /* tp_setattr */
	    0,                                          /* tp_as_async */
	    (reprfunc)dict_repr,                        /* tp_repr */
	    &dict_as_number,                            /* tp_as_number */
	    &dict_as_sequence,                          /* tp_as_sequence */
	    &dict_as_mapping,                           /* tp_as_mapping */
	    PyObject_HashNotImplemented,                /* tp_hash */
	    0,                                          /* tp_call */
	    0,                                          /* tp_str */
	    PyObject_GenericGetAttr,                    /* tp_getattro */
	    0,                                          /* tp_setattro */
	    0,                                          /* tp_as_buffer */
	    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
	        Py_TPFLAGS_BASETYPE | Py_TPFLAGS_DICT_SUBCLASS |
	        _Py_TPFLAGS_MATCH_SELF | Py_TPFLAGS_MAPPING,  /* tp_flags */
	    dictionary_doc,                             /* tp_doc */
	    dict_traverse,                              /* tp_traverse */
	    dict_tp_clear,                              /* tp_clear */
	    dict_richcompare,                           /* tp_richcompare */
	    0,                                          /* tp_weaklistoffset */
	    (getiterfunc)dict_iter,                     /* tp_iter */
	    0,                                          /* tp_iternext */
	    mapp_methods,                               /* tp_methods */
	    0,                                          /* tp_members */
	    0,                                          /* tp_getset */
	    0,                                          /* tp_base */
	    0,                                          /* tp_dict */
	    0,                                          /* tp_descr_get */
	    0,                                          /* tp_descr_set */
	    0,                                          /* tp_dictoffset */
	    dict_init,                                  /* tp_init */
	    _PyType_AllocNoTrack,                       /* tp_alloc */
	    dict_new,                                   /* tp_new */
	    PyObject_GC_Del,                            /* tp_free */
	    .tp_vectorcall = dict_vectorcall,
	};        

2. Go to the dict_repr function, which is defined in the same file, dictobject.c (link):

static PyObject* 
	dict_repr(PyDictObject *mp)
	{
	    Py_ssize_t i;
	    PyObject *key = NULL, *value = NULL;
	    _PyUnicodeWriter writer;
	    int first;
	

	    i = Py_ReprEnter((PyObject *)mp);
	    if (i != 0) {
	        return i > 0 ? PyUnicode_FromString("{...}") : NULL;
	    }
	

	    if (mp->ma_used == 0) {
	        Py_ReprLeave((PyObject *)mp);
	        return PyUnicode_FromString("{}");
	    }
	

	    _PyUnicodeWriter_Init(&writer);
	    writer.overallocate = 1;
	    /* "{" + "1: 2" + ", 3: 4" * (len - 1) + "}" */
	    writer.min_length = 1 + 4 + (2 + 4) * (mp->ma_used - 1) + 1;
	

	    if (_PyUnicodeWriter_WriteChar(&writer, '{') < 0)
	        goto error;
	

	    /* Do repr() on each key+value pair, and insert ": " between them.
	       Note that repr may mutate the dict. */
	    i = 0;
	    first = 1;
	    while (PyDict_Next((PyObject *)mp, &i, &key, &value)) {
	        PyObject *s;
	        int res;
	

	        /* Prevent repr from deleting key or value during key format. */
	        Py_INCREF(key);
	        Py_INCREF(value);
	

	        if (!first) {
	            if (_PyUnicodeWriter_WriteASCIIString(&writer, ", ", 2) < 0)
	                goto error;
	        }
	        first = 0;
	

	        s = PyObject_Repr(key);
	        if (s == NULL)
	            goto error;
	        res = _PyUnicodeWriter_WriteStr(&writer, s);
	        Py_DECREF(s);
	        if (res < 0)
	            goto error;
	

	        if (_PyUnicodeWriter_WriteASCIIString(&writer, ": ", 2) < 0)
	            goto error;
	

	        s = PyObject_Repr(value);
	        if (s == NULL)
	            goto error;
	        res = _PyUnicodeWriter_WriteStr(&writer, s);
	        Py_DECREF(s);
	        if (res < 0)
	            goto error;
	

	        Py_CLEAR(key);
	        Py_CLEAR(value);
	    }
	

	    writer.overallocate = 0;
	    if (_PyUnicodeWriter_WriteChar(&writer, '}') < 0)
	        goto error;
	

	    Py_ReprLeave((PyObject *)mp);
	

	    return _PyUnicodeWriter_Finish(&writer);
	

	error:
	    Py_ReprLeave((PyObject *)mp);
	    _PyUnicodeWriter_Dealloc(&writer);
	    Py_XDECREF(key);
	    Py_XDECREF(value);
	    return NULL;
	}        

As you can see, the function loops over the dict's key-value pairs and writes the repr of each key and value into the output string.

3. Inside the while loop, an if statement writes the separator ", " (a comma and a space) between key-value pairs:

        if (!first) {
            if (_PyUnicodeWriter_WriteASCIIString(&writer, ", ", 2) < 0)
                goto error;
        }

Replace the separator ", " with a comma, a newline character and a space, ",\n ", and change the length argument to 3:

        if (!first) {
            if (_PyUnicodeWriter_WriteASCIIString(&writer, ",\n ", 3) < 0)
                goto error;
        }

4. Recompile using the steps below:

make

sudo make install

Run the newly compiled interpreter:

./out/bin/python3

[Figure: interpreter session printing the dictionary with one pair per line]

Congratulations!! You have now changed how dictionary objects are represented when printed.
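To preview the new layout without rebuilding anything, the patched loop can be imitated in pure Python. This is only a sketch of the same idea: multiline_dict_repr is a hypothetical helper, and unlike the C patch it does not affect dictionaries nested inside the values.

```python
def multiline_dict_repr(mapping):
    """Pure-Python imitation of the patched dict_repr: one key-value pair per line."""
    if not mapping:
        return "{}"
    pairs = (f"{key!r}: {value!r}" for key, value in mapping.items())
    return "{" + ",\n ".join(pairs) + "}"


test_var = {str(i): i * i for i in range(10)}
print(multiline_dict_repr(test_var))
```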

Example 3

Add a function to the int type that returns the length of an int, that is, its number of decimal digits.

Desired output:

[Figure: interpreter session showing the length of an int]

Here is the algorithm that we are going to use (written in Python) to find the length of an int:


def int_length(given_number):
    if given_number == 0:
        return 1
    number = abs(given_number)
    counter = 0
    while number > 0:
        counter += 1
        number //= 10
    return counter

How are we going to add such a method in CPython?

The C implementation in this example reads only the first internal digit of the integer (ob_digit[0]), so it works for numbers up to 2^30 - 1, assuming the 30-bit digits that 64-bit builds use by default.

Again, Python 3 has no fixed limit on integer size. More precisely, the size of an integer is not restricted by a number of bits and can grow to the limit of the available memory (this will be discussed further in a separate article).
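The 2^30 - 1 limit can be illustrated in Python by masking off everything except the lowest 30 bits before counting. The sketch below assumes 30-bit internal digits; both function names are hypothetical.

```python
DIGIT_BITS = 30  # assumption: 30-bit internal digits, the default on 64-bit builds


def decimal_length(number):
    """Reference result: count the decimal digits of abs(number)."""
    return len(str(abs(number)))


def first_digit_length(number):
    """Mimic a C routine that only looks at the lowest internal digit."""
    low = abs(number) & ((1 << DIGIT_BITS) - 1)
    if low == 0:
        return 1
    count = 0
    while low > 0:
        low //= 10
        count += 1
    return count


for value in (12345, 2**30 - 1, 2**30):
    print(value, decimal_length(value), first_digit_length(value))
```

The two functions agree up to 2^30 - 1 and diverge at 2^30, where the lowest internal digit becomes zero again.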

Steps:

  1. Go to Objects/longobject.c
  2. In the PyLong_Type structure initialization, set the tp_as_sequence slot to point to a PySequenceMethods structure (we will call it long_as_sequence):

PyTypeObject PyLong_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "integer_name",                             /* tp_name */
    offsetof(PyLongObject, ob_digit),           /* tp_basicsize */
    sizeof(digit),                              /* tp_itemsize */
    0,                                          /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    long_to_decimal_string,                     /* tp_repr */
    &long_as_number,                            /* tp_as_number */
    &long_as_sequence,                          /* tp_as_sequence */
    0,                                          /* tp_as_mapping */
    ...

In the stock source this slot is 0, so we have to define a PySequenceMethods structure for it to point to (link):

static PySequenceMethods long_as_sequence = {
    (lenfunc)long_length,                       /* sq_length */
    0,                                          /* sq_concat */
    0,                                          /* sq_repeat */
    0,                                          /* sq_item */
    0,                                          /* sq_slice */
    0,                                          /* sq_ass_item */
    0,                                          /* sq_ass_slice */
    0,                                          /* sq_contains */
    0,                                          /* sq_inplace_concat */
    0,                                          /* sq_inplace_repeat */
};

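Here is the implementation of long_length (link). It has to appear in the file before long_as_sequence, which refers to it:

static Py_ssize_t
long_length(PyLongObject *dv)
{
    Py_ssize_t len = 0;
    /* Only the lowest internal digit is inspected, so values above
       2^30 - 1 are not handled. The sign is stored separately. */
    digit num = dv->ob_digit[0];
    if (num == 0) {
        return 1;
    }
    while (num > 0) {
        num /= 10;
        len++;
    }
    return len;
}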

  3. Recompile using the steps below:

make

sudo make install

Run the newly compiled interpreter:

./out/bin/python3

The result should match the desired output shown at the start of this example.
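Because long_length is installed in the sq_length slot, the natural way to reach it from Python is len(). The sketch below wraps the call so that the same script runs on both builds; safe_len is a hypothetical helper.

```python
def safe_len(obj):
    """Return len(obj), or None when the object's type has no length slot."""
    try:
        return len(obj)
    except TypeError:
        return None


print(safe_len([1, 2, 3]))
print(safe_len(12345))  # None on a stock build; the patched build should report 5
```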


Example 4

List subtraction, <list> - <list>, is not supported in Python.

First we need to define what subtraction between 2 lists means. Here, each item of the right-hand list cancels one equal item of the left-hand list:

[1, 2, 3, 1] - [0, 1] --> [2, 3, 1]

[0, {1:1}, 2, 1] - [{1:1}, 1] --> [0, 2]

and so on.
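These rules can be written down as a small Python reference before moving to C. This is a sketch of the semantics shown in the two examples above, not the code that goes into CPython; list_difference is a hypothetical name.

```python
def list_difference(left, right):
    """Items of left, in order, where each item of right cancels one equal item."""
    remaining = list(right)  # work on a copy so the caller's list is untouched
    result = []
    for item in left:
        for index, candidate in enumerate(remaining):
            if item == candidate:
                del remaining[index]  # this right-hand item is now used up
                break
        else:
            result.append(item)       # no equal item left on the right
    return result


print(list_difference([1, 2, 3, 1], [0, 1]))
print(list_difference([0, {1: 1}, 2, 1], [{1: 1}, 1]))
```

Comparison uses == rather than hashing, which is why unhashable items such as dictionaries work.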

How are we going to add such a method in CPython?

We know that the 'set' object has a defined subtraction operation. You can look in Objects/setobject.c to see where and how it is implemented: it lives in the nb_subtract slot of the type's PyNumberMethods structure. Notice that PyList_Type has no PyNumberMethods at all (tp_as_number is 0), as shown in the snippet below:

PyTypeObject PyList_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "list",
    sizeof(PyListObject),
    0,
    (destructor)list_dealloc,                   /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    (reprfunc)list_repr,                        /* tp_repr */
    0,                                          /* tp_as_number */
    &list_as_sequence,                          /* tp_as_sequence */
    &list_as_mapping,                           /* tp_as_mapping */
    ...

Steps:

  1. Go to Objects/listobject.c
  2. In the PyList_Type structure initialization, point tp_as_number to a PyNumberMethods structure (we will call it list_as_number):

PyTypeObject PyList_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "list",
    sizeof(PyListObject),
    0,
    (destructor)list_dealloc,                   /* tp_dealloc */
    0,                                          /* tp_vectorcall_offset */
    0,                                          /* tp_getattr */
    0,                                          /* tp_setattr */
    0,                                          /* tp_as_async */
    (reprfunc)list_repr,                        /* tp_repr */
    &list_as_number,                            /* tp_as_number */
    &list_as_sequence,                          /* tp_as_sequence */
    &list_as_mapping,                           /* tp_as_mapping */
    ...

3. Add a structure of type PyNumberMethods and call it list_as_number (link). Only the nb_subtract slot is filled in; define the structure above PyList_Type:

static PyNumberMethods list_as_number = {
    0,                          /* nb_add */
    (binaryfunc)list_sub,       /* nb_subtract */
    0,                          /* nb_multiply */
    0,                          /* nb_remainder */
    0,                          /* nb_divmod */
    0,                          /* nb_power */
    0,                          /* nb_negative */
    0,                          /* nb_positive */
    0,                          /* nb_absolute */
    0,                          /* nb_bool */
    0,                          /* nb_invert */
    0,                          /* nb_lshift */
    0,                          /* nb_rshift */
    0,                          /* nb_and */
    0,                          /* nb_xor */
    0,                          /* nb_or */
    0,                          /* nb_int */
    0,                          /* nb_reserved */
    0,                          /* nb_float */
    0,                          /* nb_inplace_add */
    0,                          /* nb_inplace_subtract */
    0,                          /* nb_inplace_multiply */
    0,                          /* nb_inplace_remainder */
    0,                          /* nb_inplace_power */
    0,                          /* nb_inplace_lshift */
    0,                          /* nb_inplace_rshift */
    0,                          /* nb_inplace_and */
    0,                          /* nb_inplace_xor */
    0,                          /* nb_inplace_or */
};

4. Implement list_sub (link) above list_as_number. The version below works on a copy of the right-hand list so that the caller's list is not modified, checks every call that can fail, and returns NotImplemented for non-list operands so that Python raises the usual TypeError:

static PyObject *
list_sub(PyObject *a, PyObject *bb)
{
    Py_ssize_t i1, i2;
    PyObject *np, *rest;

    /* Either operand may be a non-list (for example 5 - [1]). */
    if (!PyList_Check(a) || !PyList_Check(bb)) {
        Py_RETURN_NOTIMPLEMENTED;
    }

    np = PyList_New(0);
    if (np == NULL) {
        return NULL;
    }
    /* Copy the right-hand list; matched items are removed from the copy. */
    rest = PyList_GetSlice(bb, 0, PyList_GET_SIZE(bb));
    if (rest == NULL) {
        Py_DECREF(np);
        return NULL;
    }

    for (i1 = 0; i1 < PyList_GET_SIZE(a); i1++) {
        int cmp = 0;
        PyObject *v = PyList_GET_ITEM(a, i1);
        Py_INCREF(v);   /* keep v alive: __eq__ may run arbitrary code */

        for (i2 = 0; i2 < PyList_GET_SIZE(rest); i2++) {
            PyObject *u = PyList_GET_ITEM(rest, i2);
            Py_INCREF(u);
            cmp = PyObject_RichCompareBool(v, u, Py_EQ);
            Py_DECREF(u);
            if (cmp != 0) {
                break;  /* match (1) or error (-1) */
            }
        }

        if (cmp > 0) {
            /* Matched: drop the item from the copy so it cancels only once. */
            cmp = PyList_SetSlice(rest, i2, i2 + 1, NULL);
        }
        else if (cmp == 0) {
            cmp = PyList_Append(np, v);
        }
        Py_DECREF(v);
        if (cmp < 0) {
            Py_DECREF(rest);
            Py_DECREF(np);
            return NULL;
        }
    }
    Py_DECREF(rest);
    return np;
}

5. Recompile using the steps below:

make

sudo make install

Then run the newly compiled interpreter.

[Figure: interpreter session showing list subtraction]

Conclusion

In summary, learning CPython internals and implementation details is an advanced topic that requires some knowledge of the C language. However, focusing on the most important parts of its internal structures and recognizing the patterns and styles that CPython adopts is very helpful and time-saving for anyone interested in diving into these internals. That was the approach of this article, in addition to providing some examples that rely on C or on APIs already written in CPython.
