# PyTorch – Internal Architecture Tour

## Introduction

This post is a tour around the PyTorch codebase, and it is meant to be a guide to the architectural design of PyTorch and its internals. My main goal is to provide something useful for those who are interested in understanding what happens beyond the user-facing API, and to show something new beyond what has already been covered in other tutorials.

Note: the PyTorch build system uses code generation extensively, so I won’t repeat here what has already been described by others. If you’re interested in understanding how this works, please read the following tutorials:

## Short intro to Python extension objects in C/C++

```cpp
// Python object that backs torch.autograd.Variable
struct THPVariable {
    PyObject_HEAD
    torch::autograd::Variable cdata;
    PyObject* backward_hooks;
};
```

Fun fact: it is very common in many applications to use small integers for indexing, counters, etc. For efficiency, the official CPython interpreter caches the integers from -5 up to 256. For this reason, the statement `a = 200; b = 200; a is b` will be True, while the statement `a = 300; b = 300; a is b` will be False.
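To see this caching at runtime, here is a small illustrative sketch that constructs the integers dynamically with `int()`, which sidesteps the compile-time constant sharing that can otherwise blur the result:

```python
# CPython caches the small integers -5..256 as singletons, so any
# computation producing one of them returns the same object.
# int("...") forces the integers to be created at runtime, avoiding
# compile-time constant folding within a single code object.
a = int("200")
b = int("200")
print(a is b)   # True: both names refer to the cached 200 object

c = int("300")
d = int("300")
print(c is d)   # False: 300 is outside the cache, so two objects exist
```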

## Zero-copy PyTorch Tensor to Numpy and vice-versa

PyTorch has its own Tensor representation, which decouples PyTorch’s internal representation from external representations. However, it is very common, especially when data is loaded from a variety of sources, to have Numpy arrays everywhere, so we really need to make conversions between Numpy and PyTorch tensors. For that reason, PyTorch provides two methods called `from_numpy()` and `numpy()`, which convert a Numpy array to a PyTorch tensor and vice-versa, respectively. If we look at the code that is called to convert a Numpy array into a PyTorch tensor, we can get more insight into PyTorch’s internal representation:

```cpp
at::Tensor tensor_from_numpy(PyObject* obj) {
  if (!PyArray_Check(obj)) {
    throw TypeError("expected np.ndarray (got %s)", Py_TYPE(obj)->tp_name);
  }
  auto array = (PyArrayObject*)obj;
  int ndim = PyArray_NDIM(array);
  auto sizes = to_aten_shape(ndim, PyArray_DIMS(array));
  auto strides = to_aten_shape(ndim, PyArray_STRIDES(array));
  // NumPy strides use bytes. Torch strides use element counts.
  auto element_size_in_bytes = PyArray_ITEMSIZE(array);
  for (auto& stride : strides) {
    stride /= element_size_in_bytes;
  }
  // (...) - omitted for brevity
  void* data_ptr = PyArray_DATA(array);
  auto& type = CPU(dtype_to_aten(PyArray_TYPE(array)));
  Py_INCREF(obj);
  return type.tensorFromBlob(data_ptr, sizes, strides, [obj](void* data) {
    AutoGIL gil;
    Py_DECREF(obj);
  });
}
```

(code from tensor_numpy.cpp)
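From the Python side, the zero-copy nature of this conversion is easy to verify: mutating the Numpy array is visible through the tensor, since both wrap the same underlying buffer. A small sketch, assuming torch and numpy are installed:

```python
import numpy as np
import torch

np_array = np.ones((2, 2))
torch_tensor = torch.from_numpy(np_array)  # zero-copy: wraps the same buffer

np_array[0, 0] = 42.0
print(torch_tensor[0, 0].item())  # 42.0: the change is visible on the torch side

round_trip = torch_tensor.numpy()  # also zero-copy, back to numpy
round_trip[1, 1] = 7.0
print(np_array[1, 1])  # 7.0: still the same memory
```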

The `tensorFromBlob()` method will create a new Tensor, but only after creating a new “Storage” for this Tensor. The storage is where the actual data pointer will be held (and not in the Tensor structure itself). This takes us to the next section, about Tensor Storages.

## Tensor Storage

The CPUFloatStorage is basically a wrapper with utility functions around the actual storage structure called `THFloatStorage`, which we show below:

```c
typedef struct THStorage
{
    real *data;
    ptrdiff_t size;
    int refcount;
    char flag;
    THAllocator *allocator;
    void *allocatorContext;
    struct THStorage *view;
} THStorage;
```

(code from THStorage.h)

As you can see, the `THStorage` holds a pointer to the raw data, its size, flags, and also an interesting field called `allocator` that we’ll discuss shortly. It is also important to note that there is no metadata about how to interpret the data inside `THStorage`; this is due to the fact that the storage is “dumb” regarding its contents, and it is the Tensor’s responsibility to know how to “view” or interpret this data.

```python
>>> tensor_a = torch.ones((3, 3))
>>> tensor_b = tensor_a.view(9)
>>> tensor_a.storage().data_ptr() == tensor_b.storage().data_ptr()
True
```

Now, as we saw in line 7 of the `THFloatStorage` structure, there is a pointer to a `THAllocator` structure there. This is very important, because it brings flexibility regarding which allocator can be used to allocate the storage data. This structure is represented by the following code:

```c
typedef struct THAllocator {
  void* (*malloc)(void*, ptrdiff_t);
  void* (*realloc)(void*, void*, ptrdiff_t);
  void (*free)(void*, void*);
} THAllocator;
```

(code from THAllocator.h)

```c
static void *THCudaHostAllocator_malloc(void* ctx, ptrdiff_t size) {
  void* ptr;

  if (size < 0) THError("Invalid memory size: %ld", size);

  if (size == 0) return NULL;

  THCudaCheck(cudaMallocHost(&ptr, size));

  return ptr;
}
```

(code from THCAllocator.c)

• TH = TorcH
• THC = TorcH Cuda
• THCS = TorcH Cuda Sparse
• THCUNN = TorcH CUda Neural Network
• THD = TorcH Distributed
• THNN = TorcH Neural Network
• THS = TorcH Sparse

This convention is also present in the function/class names and other objects, so it is important to always keep these patterns in mind. While you can find CPU allocators in the TH code, you’ll find CUDA allocators in the THC code.

```c
typedef struct THTensor
{
    int64_t *size;
    int64_t *stride;
    int nDimension;
    THStorage *storage;
    ptrdiff_t storageOffset;
    int refcount;
    char flag;
} THTensor;
```

(Code from THTensor.h)

And as you can see, the main `THTensor` structure holds the sizes/strides/dimensions/offsets/etc., as well as the storage (`THStorage`) for the Tensor data.
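These fields are all visible from Python, so we can sketch how a view reuses the same storage while changing only the size/stride/offset metadata (a small illustrative example):

```python
import torch

t = torch.ones((3, 4))
s = t[1:, 2:]  # a view: same storage, different size/stride/offset

print(t.stride())           # (4, 1): row-major strides in element counts
print(s.size())             # torch.Size([2, 2])
print(s.stride())           # (4, 1): strides are inherited from t
print(s.storage_offset())   # 6 = 1*4 + 2 elements into the shared storage
print(t.storage().data_ptr() == s.storage().data_ptr())  # True: same storage
```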

Now, once we have requirements such as multi-processing, where we want to share tensor data among multiple different processes, we need a shared-memory approach. Otherwise, every time another process needs a tensor, or when you want to implement a Hogwild training procedure where all the different processes write to the same memory region (where the parameters are), you’d need to make copies between processes, and this is very inefficient. Therefore, we’ll discuss in the next section a special kind of storage for Shared Memory.

## Shared Memory

PyTorch provides wrappers around the Python `multiprocessing` module, which can be imported from `torch.multiprocessing`. The changes they implemented in this wrapper around the official Python multiprocessing module make sure that every time a tensor is put on a queue or shared with another process, PyTorch shares only a handle to the shared memory, instead of an entire new copy of the Tensor.

Now, many people aren’t aware of a Tensor method from PyTorch called `share_memory_()`. However, this function is what triggers an entire rebuild of the storage memory for that particular Tensor. What this method does is create a region of shared memory that can be used among different processes. This function will, in the end, call the following function:

```cpp
static THStorage* THPStorage_(newFilenameStorage)(ptrdiff_t size)
{
  int flags = TH_ALLOCATOR_MAPPED_SHAREDMEM | TH_ALLOCATOR_MAPPED_EXCLUSIVE;
  std::string handle = THPStorage_(__newHandle)();
  auto ctx = libshm_context_new(NULL, handle.c_str(), flags);
  return THStorage_(newWithAllocator)(size, &THManagedSharedAllocator, (void*)ctx);
}
```

(Code from StorageSharing.cpp)

Note: when a PyTorch method name ends with an underscore, such as `share_memory_()`, it means that the method has an in-place effect: it will change the current object instead of creating a new one with the modifications.
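A quick illustration of the convention:

```python
import torch

t = torch.ones(3)
t.add_(1)       # trailing underscore: in-place, mutates t itself
print(t)        # tensor([2., 2., 2.])

u = t.add(1)    # no underscore: returns a new tensor, t is untouched
print(u)        # tensor([3., 3., 3.])
print(t)        # still tensor([2., 2., 2.])
```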

I’ll now show a Python example of one process using the data from a Tensor that was allocated on another process, by manually exchanging the shared memory handle:

This is executed in the process A:

```python
>>> import torch
>>> tensor_a = torch.ones((5, 5))
>>> tensor_a
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 5x5]
>>> tensor_a.is_shared()
False
>>> tensor_a = tensor_a.share_memory_()
>>> tensor_a.is_shared()
True
>>> tensor_a_storage = tensor_a.storage()
>>> tensor_a_storage._share_filename_()
(b'/var/tmp/tmp.0.yowqlr', b'/torch_31258_1218748506', 25)
```

In this code, executed in the process A, we create a new Tensor of 5×5 filled with ones. After that we make it shared and print the tuple with the Unix Domain Socket address as well as the handle. Now we can access this memory region from another process, process B, as shown below:

```python
>>> import torch
>>> tensor_a = torch.Tensor()
>>> tuple_info = (b'/var/tmp/tmp.0.yowqlr', b'/torch_31258_1218748506', 25)
>>> storage = torch.Storage._new_shared_filename(*tuple_info)
>>> tensor_a = torch.Tensor(storage).view((5, 5))
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 5x5]
```

## DLPack: a hope for the Deep Learning frameworks Babel

This will certainly help to overcome the “island model” that we have today between tensor representations in MXNet, PyTorch, etc, and will allow developers to mix framework operations between frameworks and all the benefits that a standardization can bring to the frameworks.

The core of DLPack is a very simple structure called `DLTensor`, as shown below:

```c
/*!
 * \brief Plain C Tensor object, does not manage memory.
 */
typedef struct {
  /*!
   * \brief The opaque data pointer points to the allocated data.
   *  This will be CUDA device pointer or cl_mem handle in OpenCL.
   *  This pointer is always aligned to 256 bytes as in CUDA.
   */
  void* data;
  /*! \brief The device context of the tensor */
  DLContext ctx;
  /*! \brief Number of dimensions */
  int ndim;
  /*! \brief The data type of the pointer*/
  DLDataType dtype;
  /*! \brief The shape of the tensor */
  int64_t* shape;
  /*!
   * \brief strides of the tensor,
   *  can be NULL, indicating tensor is compact.
   */
  int64_t* strides;
  /*! \brief The offset in bytes to the beginning pointer to data */
  uint64_t byte_offset;
} DLTensor;
```

(code from dlpack.h)

There is also a managed version of the tensor that is called`DLManagedTensor`, where the frameworks can provide a context and also a “deleter” function that can be called by the framework who borrowed the Tensor to inform the other framework that the resources are no longer required.

```python
import torch
from torch.utils import dlpack

t = torch.ones((5, 5))
dl = dlpack.to_dlpack(t)
```

This Python function will call the`toDLPack`function from ATen, shown below:

```cpp
DLManagedTensor* toDLPack(const Tensor& src) {
  ATenDLMTensor* atDLMTensor(new ATenDLMTensor);
  atDLMTensor->handle = src;
  atDLMTensor->tensor.manager_ctx = atDLMTensor;
  atDLMTensor->tensor.deleter = &deleter;
  atDLMTensor->tensor.dl_tensor.data = src.data_ptr();
  int64_t device_id = 0;
  if (src.type().is_cuda()) {
    device_id = src.get_device();
  }
  atDLMTensor->tensor.dl_tensor.ctx = getDLContext(src.type(), device_id);
  atDLMTensor->tensor.dl_tensor.ndim = src.dim();
  atDLMTensor->tensor.dl_tensor.dtype = getDLDataType(src.type());
  atDLMTensor->tensor.dl_tensor.shape = const_cast<int64_t*>(src.sizes().data());
  atDLMTensor->tensor.dl_tensor.strides = const_cast<int64_t*>(src.strides().data());
  atDLMTensor->tensor.dl_tensor.byte_offset = 0;
  return &(atDLMTensor->tensor);
}
```

As you can see, it’s a pretty simple conversion, casting the metadata from the PyTorch format to the DLPack format and assigning a pointer to the internal Tensor data representation.
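The round trip also works from Python, and because DLPack only passes pointers, the imported tensor aliases the original data. A short sketch, assuming a torch build that ships `torch.utils.dlpack`:

```python
import torch
from torch.utils import dlpack

t = torch.ones((2, 2))
capsule = dlpack.to_dlpack(t)       # export the tensor as a DLPack capsule
t2 = dlpack.from_dlpack(capsule)    # import it back: zero-copy

t2[0, 0] = 5.0
print(t[0, 0].item())               # 5.0: both tensors share the same data
```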

That’s it, I hope you liked this long post!

- Christian S. Perone

Cite this article as: Christian S. Perone, "PyTorch – Internal Architecture Tour," in Terra Incognita, 12/03/2018, //www.cpetem.com/2018/03/pytorch-internal-architecture-tour/

## 13 thoughts on “PyTorch – Internal Architecture Tour”

1. Thomas says:

Great post! Very interesting to see the details of Pytorch as well as to know that it is well-implemented.

2. Anonymous says:

This is awesome!

3. Minh says:

Awesome write-up! This inspires me to look at PyTorch’s code base more.

4. Anonymous says:

Great post, thanks!

5. Anonymous says:

Nice post! However, I think you had better note the source code version, because the underlying backend is changing rapidly and some links are already broken.

6. h says:

hi Christian, thanks for the insider details on pytorch.

I have a question about the conversion from pytorch to numpy, and hope you can help me understand what is happening and how to fix it.

Simply put, I convert an array to pytorch, perform a process, then convert back to numpy for subsequent processing with OpenCV.

Example:
torch_array = torch.from_numpy(numpy_array) # less than 1 msec
do processing on torch_array # less than 1 msec, GPU @ 99%
numpy_array = np.array(torch_array) # greater than 200 msec

GPU = nvidia on jetson TX1 platform
torch = 0.4.0

Regards, h

1. Vladislav Kurenkov says:

You should use .numpy().

torch_array = torch.from_numpy(numpy_array)
....
....
numpy_array = torch_array.numpy()

7. Insightful. Thanks for sharing.

8. Eric says:

Well written! Now I understand pytorch internals better, and how it represents/stores tensors.

9. Dave Kielpinski says:

I absolutely appreciate this blog post. Thanks!

10. cocoaaa says:

Thanks for a great post! It really helped me understand how tensor storage works. Now I can check that two tensors share the same storage (via `t0.storage().data_ptr() == t1.storage().data_ptr()`), but how can I check whether a numpy array is a view of a tensor? Is there a way to do a similar check between PyTorch and numpy? Thanks in advance for any advice!

1. You can use: n_array.__array_interface__["data"]; however, this is just for illustrative purposes, because comparing raw pointers is not a very good idea.
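For illustration only (comparing raw pointers is fragile, as noted), such a check could look like the sketch below, assuming the array was created as a zero-copy view with zero storage offset:

```python
import torch

t = torch.ones(4)
a = t.numpy()  # zero-copy view over the tensor's storage

# Compare the raw data pointers (illustrative only):
np_ptr = a.__array_interface__["data"][0]
print(np_ptr == t.storage().data_ptr())  # True for this zero-copy view
```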

11. Tiago says:

Great post, it really helped me a lot to understand Pytorch storages.

My understanding is that I can create a pytorch tensor from an STL vector in C++ and expose it to Python through pybind without a copy.

I wonder if I can expose an STL vector from C++ to Python and create a tensor from it without making a copy, even though https://pytorch.org/docs/stable/tensors.html says torch.tensor always copies data.