# PyTorch – Internal Architecture Tour

Update 28 Feb 2019: I added a new blog post with a slide deck containing the presentation I did for PyData Montreal.

## Short intro to Python extension objects in C/C++

```cpp
// Python object that backs torch.autograd.Variable
struct THPVariable {
    PyObject_HEAD
    torch::autograd::Variable cdata;
    PyObject* backward_hooks;
};
```

## Zero-copy PyTorch Tensor to Numpy and vice-versa

PyTorch has its own Tensor representation, which decouples PyTorch internal representation from external representations. However, since it is very common, especially when data is loaded from a variety of sources, to have Numpy arrays everywhere, we really need to make conversions between Numpy and PyTorch tensors possible. For that reason, PyTorch provides two methods called `from_numpy()` and `numpy()`, that converts a Numpy array to a PyTorch array and vice-versa, respectively. If we look at the code that is being called to convert a Numpy array into a PyTorch tensor, we can get more insights on PyTorch's internal representation:

```cpp
at::Tensor tensor_from_numpy(PyObject* obj) {
  if (!PyArray_Check(obj)) {
    throw TypeError("expected np.ndarray (got %s)", Py_TYPE(obj)->tp_name);
  }
  auto array = (PyArrayObject*)obj;
  int ndim = PyArray_NDIM(array);
  auto sizes = to_aten_shape(ndim, PyArray_DIMS(array));
  auto strides = to_aten_shape(ndim, PyArray_STRIDES(array));
  // NumPy strides use bytes. Torch strides use element counts.
  auto element_size_in_bytes = PyArray_ITEMSIZE(array);
  for (auto& stride : strides) {
    stride /= element_size_in_bytes;
  }
  // (...) - omitted for brevity
  void* data_ptr = PyArray_DATA(array);
  auto& type = CPU(dtype_to_aten(PyArray_TYPE(array)));
  Py_INCREF(obj);
  return type.tensorFromBlob(data_ptr, sizes, strides, [obj](void* data) {
    AutoGIL gil;
    Py_DECREF(obj);
  });
}
```

(Code from tensor_numpy.cpp)
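To make the zero-copy behavior concrete, here is a minimal sketch (not from the original post, using only the public `from_numpy()` API described above) showing that the tensor reuses the numpy buffer instead of copying it:

```python
import numpy as np
import torch

# torch.from_numpy() shares memory with the source array instead of
# copying it, so mutating the numpy array is visible in the tensor.
arr = np.ones((2, 2), dtype=np.float32)
t = torch.from_numpy(arr)   # zero-copy: tensor reuses the numpy buffer
arr[0, 0] = 42.0            # mutate the numpy array in place
assert t[0, 0].item() == 42.0
```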

The `tensorFromBlob()` method will create a new Tensor, but only after creating a new "Storage" for this Tensor. The storage is where the actual data pointer will be stored (and not in the Tensor structure itself). This takes us to the next section about Tensor Storages.

## Tensor Storage

As we saw in the previous code from `tensor_from_numpy()`, there is a call to `tensorFromBlob()` that will create a Tensor from the raw data blob. This last function will call another function called `storageFromBlob()` that will, in turn, create a storage for this data according to its type. In the case of a CPU float type, it will return a new `CPUFloatStorage` instance.

```cpp
typedef struct THStorage
{
    real *data;
    ptrdiff_t size;
    int refcount;
    char flag;
    THAllocator *allocator;
    void *allocatorContext;
    struct THStorage *view;
} THStorage;
```

(Code from THStorage.h)

As you can see, the `THStorage` holds a pointer to the raw data, its size, flags, and also an interesting field called `allocator` that we'll discuss soon. It is also important to note that there is no metadata inside `THStorage` about how to interpret the data; this is due to the fact that the storage is "dumb" regarding its contents, and it is the Tensor's responsibility to know how to "view" or interpret this data.

```python
>>> tensor_a = torch.ones((3, 3))
>>> tensor_b = tensor_a.view(9)
>>> tensor_a.storage().data_ptr() == tensor_b.storage().data_ptr()
True
```
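To expand on this, here is a small sketch (using only public APIs) of how two tensors can view the same storage with different shapes and strides, so a write through one view is visible in the other:

```python
import torch

# Two tensors backed by the same storage: same raw buffer, different
# shapes/strides, so mutation through one view shows up in the other.
t = torch.arange(6.)
m = t.view(2, 3)  # same storage, reinterpreted as a 2x3 matrix
assert t.storage().data_ptr() == m.storage().data_ptr()
assert m.stride() == (3, 1)  # row-major strides, in element counts
m[0, 0] = 100.0
assert t[0].item() == 100.0  # visible through both views
```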

```cpp
typedef struct THAllocator
{
  void* (*malloc)(void*, ptrdiff_t);
  void* (*realloc)(void*, void*, ptrdiff_t);
  void (*free)(void*, void*);
} THAllocator;
```

(Code from THAllocator.h)

```c
static void *THCudaHostAllocator_malloc(void* ctx, ptrdiff_t size) {
  void* ptr;

  if (size < 0) THError("Invalid memory size: %ld", size);

  if (size == 0) return NULL;

  THCudaCheck(cudaMallocHost(&ptr, size));

  return ptr;
}
```

(Code from THCAllocator.c)

• TH = TorcH
• THC = TorcH Cuda
• THCS = TorcH Cuda Sparse
• THCUNN = TorcH CUda Neural Network
• THD = TorcH Distributed
• THNN = TorcH Neural Network
• THS = TorcH Sparse

This convention is also present in the function/class names and other objects, so it is important to always keep these patterns in mind. While you can find CPU allocators in the TH code, you'll find CUDA allocators in the THC code.

```cpp
typedef struct THTensor
{
    int64_t *size;
    int64_t *stride;
    int nDimension;
    THStorage *storage;
    ptrdiff_t storageOffset;
    int refcount;
    char flag;
} THTensor;
```

(Code from THTensor.h)

And as you can see, the main `THTensor` structure holds the sizes/strides/dimensions/offsets/etc., as well as the storage (`THStorage`) for the Tensor data.
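These fields are visible from Python; a short sketch (public APIs only) showing how slicing changes the strides and storage offset while reusing the same storage:

```python
import torch

# The sizes/strides/storageOffset of THTensor as seen from Python.
t = torch.ones((3, 3))
s = t[1:, 1:]  # a sub-view into the same underlying storage
assert s.size() == torch.Size([2, 2])
assert s.stride() == (3, 1)      # strides inherited from the parent
assert s.storage_offset() == 4   # skip 1 row (3 elements) + 1 column
assert t.storage().data_ptr() == s.storage().data_ptr()
```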

Now, once we have requirements such as multi-processing, where we want to share tensor data among multiple different processes, we need a shared memory approach to solve it. Otherwise, every time another process needs a tensor, or when you want to implement a Hogwild training procedure where all the different processes write to the same memory region (where the parameters are), you'd need to make copies between processes, and this is very inefficient. Therefore we'll discuss in the next section a special kind of storage for Shared Memory.

## Shared Memory

PyTorch provides wrappers around the Python `multiprocessing` module, which can be imported from `torch.multiprocessing`. The changes they implemented in this wrapper around the official Python multiprocessing were made to ensure that every time a tensor is put on a queue or shared with another process, PyTorch will make sure that only a handle to the shared memory will be shared, instead of a new entire copy of the Tensor.

```cpp
static THStorage* THPStorage_(newFilenameStorage)(ptrdiff_t size)
{
  int flags = TH_ALLOCATOR_MAPPED_SHAREDMEM | TH_ALLOCATOR_MAPPED_EXCLUSIVE;
  std::string handle = THPStorage_(__newHandle)();
  auto ctx = libshm_context_new(NULL, handle.c_str(), flags);
  return THStorage_(newWithAllocator)(size, &THManagedSharedAllocator, (void*)ctx);
}
```

(Code from StorageSharing.cpp)

```python
>>> import torch
>>> tensor_a = torch.ones((5, 5))
>>> tensor_a

 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 5x5]

>>> tensor_a.is_shared()
False
>>> tensor_a = tensor_a.share_memory_()
>>> tensor_a.is_shared()
True
>>> tensor_a_storage = tensor_a.storage()
>>> tensor_a_storage._share_filename_()
(b'/var/tmp/tmp.0.yowqlr', b'/torch_31258_1218748506', 25)
```

In this code, executed in process A, we create a new 5x5 Tensor filled with ones. After that we make it shared and print the tuple with the Unix Domain Socket address as well as the handle. Now we can access this memory region from another process B as shown below:

```python
>>> import torch
>>> tensor_a = torch.Tensor()
>>> tuple_info = (b'/var/tmp/tmp.0.yowqlr', b'/torch_31258_1218748506', 25)
>>> storage = torch.Storage._new_shared_filename(*tuple_info)
>>> tensor_a = torch.Tensor(storage).view((5, 5))

 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 5x5]
```

## DLPack: a hope for a Deep Learning frameworks Babel

This will certainly help to overcome the "island model" that we have today between tensor representations in MXNet, PyTorch, etc., and will allow developers to mix operations between frameworks, with all the benefits that a standardization can bring to the frameworks.

The core of DLPack is a very simple structure called `DLTensor`, as shown below:

```c
/*!
 * \brief Plain C Tensor object, does not manage memory.
 */
typedef struct {
  /*!
   * \brief The opaque data pointer points to the allocated data.
   *  This will be CUDA device pointer or cl_mem handle in OpenCL.
   *  This pointer is always aligned to 256 bytes as in CUDA.
   */
  void* data;
  /*! \brief The device context of the tensor */
  DLContext ctx;
  /*! \brief Number of dimensions */
  int ndim;
  /*! \brief The data type of the pointer*/
  DLDataType dtype;
  /*! \brief The shape of the tensor */
  int64_t* shape;
  /*!
   * \brief strides of the tensor,
   *  can be NULL, indicating tensor is compact.
   */
  int64_t* strides;
  /*! \brief The offset in bytes to the beginning pointer to data */
  uint64_t byte_offset;
} DLTensor;
```

(Code from dlpack.h)

As you can see, there is a data pointer for the raw data as well as shape/stride/offset/GPU vs CPU, and other metadata information about the data that the `DLTensor` is pointing to.

In PyTorch, if you want to convert to or from a DLTensor format, you can find both C/C++ methods for doing that or even in Python you can do that as shown below:

```python
import torch
from torch.utils import dlpack

t = torch.ones((5, 5))
dl = dlpack.to_dlpack(t)
```

This Python function will call the `toDLPack` function from ATen, shown below:

```cpp
DLManagedTensor* toDLPack(const Tensor& src) {
  ATenDLMTensor* atDLMTensor(new ATenDLMTensor);
  atDLMTensor->handle = src;
  atDLMTensor->tensor.manager_ctx = atDLMTensor;
  atDLMTensor->tensor.deleter = &deleter;
  atDLMTensor->tensor.dl_tensor.data = src.data_ptr();
  int64_t device_id = 0;
  if (src.type().is_cuda()) {
    device_id = src.get_device();
  }
  atDLMTensor->tensor.dl_tensor.ctx = getDLContext(src.type(), device_id);
  atDLMTensor->tensor.dl_tensor.ndim = src.dim();
  atDLMTensor->tensor.dl_tensor.dtype = getDLDataType(src.type());
  atDLMTensor->tensor.dl_tensor.shape = const_cast<int64_t*>(src.sizes().data());
  atDLMTensor->tensor.dl_tensor.strides = const_cast<int64_t*>(src.strides().data());
  atDLMTensor->tensor.dl_tensor.byte_offset = 0;
  return &(atDLMTensor->tensor);
}
```

As you can see, it’s a pretty simple conversion, casting the metadata from the PyTorch format to the DLPack format and assigning a pointer to the internal Tensor data representation.
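A round-trip sketch in Python (using the public `torch.utils.dlpack` API; `from_dlpack` is the inverse of `to_dlpack`) confirms that the conversion shares memory rather than copying:

```python
import torch
from torch.utils import dlpack

# Export a tensor as a DLPack capsule, import it back, and verify
# that both tensors share the same underlying memory (no copy).
t = torch.ones((3, 3))
capsule = dlpack.to_dlpack(t)
t2 = dlpack.from_dlpack(capsule)
t2[0, 0] = 7.0
assert t[0, 0].item() == 7.0  # same memory, mutation visible in both
```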

That's it, I hope you liked this long post!

– Christian S. Perone

Cite this article as: Christian S. Perone, "PyTorch – Internal Architecture Tour," in Terra Incognita, 12/03/2018, //www.cpetem.com/2018/03/pytorch-internal-architecture-tour/

## 13 thoughts on "PyTorch – Internal Architecture Tour"

1. Thomas says:

Great post! Very interesting to see the details of PyTorch, and to know it is well implemented.

2. Anonymous says:

This is awesome!

3. Ho Chi Minh says:

Awesome write-up! This motivated me to look more at the PyTorch codebase.

4. Anonymous says:

Great post, thanks!

5. Anonymous says:

Nice post! However, I think you'd better add the source code version, since the underlying backend is changing rapidly and some links are already broken.

6. H says:

Hi Christian, thanks for the inside details on pytorch.

I have a question about the conversion from pytorch to numpy, and hope you can help me understand what is happening and how to fix it.

Simply put, I convert an array to pytorch, perform a process, then convert back to numpy for subsequent processing with OpenCV.

Example:
torch_array = torch.from_numpy(numpy_array) # less than 1 msec
do processing on torch_array # less than 1 msec, applied on GPU @ 99%
numpy_array = np.array(torch_array) # greater than 200 msec

GPU = nvidia on jetson TX1 platform
torch = 0.4.0

Regards, H

1. Vladislav Kurenkov says:

You should use .numpy().

torch_array = torch.from_numpy(numpy_array)
....
....
numpy_array = torch_array.numpy()

7. 埃里克 says:

Well written! Now I know more about pytorch internals and how it represents/stores tensors.

8. dave Kielpinski says:

I definitely appreciate this blog post. Thank you!

9. cocoaaa says:

Thanks for a great post! It really helped me understand how tensor storage works. Now I can check whether two tensors share the same storage (by `t0.storage().data_ptr() == t1.storage().data_ptr()`), but how can I check whether a numpy array is a view of a tensor? Is there a way to do a similar check between PyTorch and numpy? Thanks for your advice in advance!

1. You can use: n_array.__array_interface__['data'], however this is just for illustrative purposes, since comparing raw pointers is not a very good idea.
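A minimal sketch of that check (illustrative only; as noted above, comparing raw pointers is fragile):

```python
import torch

# Compare the numpy array's base pointer with the tensor storage's
# data pointer; equal pointers here indicate a shared buffer.
t = torch.ones(4)
a = t.numpy()  # zero-copy view of the tensor's storage
np_ptr = a.__array_interface__['data'][0]
torch_ptr = t.storage().data_ptr()
assert np_ptr == torch_ptr
```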

10. Thiago says:

Great post, it really helped me a lot to understand PyTorch storage.

My understanding is that I can create a pytorch tensor from a C++ STL vector and expose it to Python via pybind without copies.

I wonder if I could expose a STL vector from C++ into Python and create a tensor from it without making copies, despite the fact that https://pytorch.org/docs/stable/tensors.html says torch.tensor always copies data.