
Update 28 Feb 2019: I added a new blog post with a slide deck covering the presentation I did at PyData Montreal.

Today, at the PyTorch Developer Conference, the PyTorch team announced the plans and the release of the PyTorch 1.0 preview, with many nice features such as a JIT for model graphs (with and without tracing) as well as LibTorch, the PyTorch C++ API, which is, in my opinion, one of the most important release announcements made today.


Introduction

Simply put, libtorch is a library version of PyTorch. It contains the underlying foundation that is used by PyTorch, such as ATen (the tensor library), which contains all the tensor operations and methods. Libtorch also contains autograd, which is the component that adds automatic differentiation to the ATen tensors.

A word of caution for those who are starting now: be careful with the tensors that can be created both from ATen and autograd, and don't mix them. ATen will return plain tensors (when you create them using the at namespace) while the autograd functions (from the torch namespace) will return a Variable, adding its automatic differentiation mechanism.

Libtorch can be downloaded from the PyTorch website, and it is available only as a preview for a while. You can also find the documentation on this site, which is mostly Doxygen-rendered documentation. I found the library pretty stable, and that makes sense because it is actually exposing the stable foundations of PyTorch; however, there are some issues with headers and some minor problems concerning the library organization that you might find while starting to work with it (that will hopefully be fixed soon).

For NodeJS, I'll use the Native Abstractions library (nan), which is the most recommended library (actually it is basically a header-only library) to create NodeJS C++ add-ons, and cmake-js, because libtorch already provides the cmake files that make our building process much easier. However, the focus here will be on the C++ code and not on the building process.

It starts with the development process and tracing being done in PyTorch (Python domain), and then the loading and inference done in the C++ domain (in our case, in a NodeJS add-on).

Wrapping the Tensor

#ifndef TENSOR_H
#define TENSOR_H

#include <nan.h>
#include <torch/torch.h>

namespace torchjs {

class Tensor : public Nan::ObjectWrap {
 public:
  static NAN_MODULE_INIT(Init);

  void setTensor(at::Tensor tensor) {
    this->mTensor = tensor;
  }

  torch::Tensor getTensor() {
    return this->mTensor;
  }

  static v8::Local<v8::Object> NewInstance();

 private:
  explicit Tensor();
  ~Tensor();

  static NAN_METHOD(New);
  static NAN_METHOD(toString);
  static Nan::Persistent<v8::Function> constructor;

 private:
  torch::Tensor mTensor;
};

} // namespace torchjs

#endif

As you can see, most of the code for the definition of our Tensor class is just boilerplate. The key point here is that the torchjs::Tensor will wrap a torch::Tensor, and we added two special public methods (setTensor and getTensor) to set and get this internal torch tensor.

I won’t show all the implementation details because most parts of it are NodeJS boilerplate code to construct the object, etc. I’ll focus on the parts that touch the libtorch API, like in the code below where we are creating a small textual representation of the tensor to show on JavaScript (toStringmethod):

NAN_METHOD(Tensor::toString) {
  Tensor* obj = ObjectWrap::Unwrap<Tensor>(info.Holder());
  std::stringstream ss;
  at::IntList sizes = obj->mTensor.sizes();
  ss << "Tensor[Type=" << obj->mTensor.type() << ", ";
  ss << "Size=" << sizes << std::endl;
  info.GetReturnValue().Set(Nan::New(ss.str()).ToLocalChecked());
}

What we are doing in the code above is just getting the internal tensor object from the wrapped object by unwrapping it. After that, we build a string representation with the tensor sizes (the size of each dimension) and its type (float, etc.).

Wrapping Tensor-creation operations

NAN_METHOD(ones) {
  // Sanity checking of the arguments
  if (info.Length() < 2)
    return Nan::ThrowError(Nan::New("Wrong number of arguments").ToLocalChecked());

  if (!info[0]->IsArray() || !info[1]->IsBoolean())
    return Nan::ThrowError(Nan::New("Wrong argument types").ToLocalChecked());

  // Retrieving parameters (require_grad and tensor shape)
  const bool require_grad = info[1]->BooleanValue();
  const v8::Local<v8::Array> array = info[0].As<v8::Array>();
  const uint32_t length = array->Length();

  // Convert from v8::Array to std::vector
  std::vector<long long> dims;
  for (int i = 0; i < length; i++) {
    int d = array->Get(i)->NumberValue();
    dims.push_back(d);
  }

  // Call the libtorch and create a new torchjs::Tensor object
  // wrapping the new torch::Tensor that was created by torch::ones
  at::Tensor v = torch::ones(dims, torch::requires_grad(require_grad));
  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(v);
  info.GetReturnValue().Set(newinst);
}

So, let's go through this code. We are first checking the arguments of the function. For this function, we're expecting a tuple (a JavaScript array) for the tensor shape and a boolean indicating if we want to compute gradients or not for this tensor node. After that, we're converting the parameters from the V8 JavaScript types into native C++ types. As soon as we have the required parameters, we then call the torch::ones function from libtorch; this function will create a new tensor, which we wrap using the torchjs::Tensor class that we created earlier.

And that's it, we just exposed one torch operation that can be used as a native JavaScript operation.

Intermezzo for the PyTorch JIT

Note that if you have branching decisions in your code that depend on external factors or data, tracing won't work as you expect, because it will record that particular execution of the graph; hence the alternative option to provide the script. However, in most cases, tracing is what we need.

@torch.jit.script
def happy_function_script(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret

def happy_function_trace(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret

traced_fn = torch.jit.trace(happy_function_trace,
                            (torch.tensor(0),),
                            check_trace=False)

In the code above, we're providing two functions: one is using the @torch.jit.script decorator, and it is the scripting way to create a Torch Script, while the second function is being used by the tracing function torch.jit.trace. Note that I intentionally added a "True == True" decision on the functions (which will always be true).

Now, if we inspect the IR generated by these two different approaches, we’ll clearly see the difference between the tracing and scripting approaches:

# 1) Graph from the scripting approach
graph(%x : Dynamic) {
  %16 : int = prim::Constant[value=2]()
  %10 : int = prim::Constant[value=1]()
  %7 : int = prim::Constant[value=1]()
  %8 : int = prim::Constant[value=1]()
  %9 : int = aten::eq(%7, %8)
  %ret : Dynamic = prim::If(%9)
    block0() {
      %11 : int[] = prim::ListConstruct(%10)
      %12 : int = prim::Constant[value=6]()
      %13 : int = prim::Constant[value=0]()
      %14 : int[] = prim::Constant[value=[0, -1]]()
      %ret.2 : Dynamic = aten::rand(%11, %12, %13, %14)
      -> (%ret.2)
    }
    block1() {
      %17 : int[] = prim::ListConstruct(%16)
      %18 : int = prim::Constant[value=6]()
      %19 : int = prim::Constant[value=0]()
      %20 : int[] = prim::Constant[value=[0, -1]]()
      %ret.3 : Dynamic = aten::rand(%17, %18, %19, %20)
      -> (%ret.3)
    }
  return (%ret);
}

# 2) Graph from the tracing approach
graph(%0 : Long()) {
  %7 : int = prim::Constant[value=1]()
  %8 : int[] = prim::ListConstruct(%7)
  %9 : int = prim::Constant[value=6]()
  %10 : int = prim::Constant[value=0]()
  %11 : int[] = prim::Constant[value=[0, -1]]()
  %12 : Float(1) = aten::rand(%8, %9, %10, %11)
  return (%12);
}

The PyTorch JIT has a lot of transformation passes that are used to do loop unrolling, dead code elimination, etc. You can find these passes here. Note that conversion to other formats such as ONNX can be implemented as a pass on top of this intermediate representation (IR), which is quite convenient.

Tracing the ResNet

Now, before implementing the Script Module in NodeJS, let’s first trace a ResNet network using PyTorch (using just Python):

traced_net = torch.jit.trace(torchvision.models.resnet18(),
                             torch.rand(1, 3, 224, 224))
traced_net.save("resnet18_trace.pt")

As you can see from the code above, we just have to provide a tensor example (in this case a batch of a single image with 3 channels and size 224×224). After that, we just save the traced network into a file called resnet18_trace.pt.

Wrapping the Script Module

// Class constructor
ScriptModule::ScriptModule(const std::string filename) {
  // Load the traced network from the file
  this->mModule = torch::jit::load(filename);
}

// JavaScript object creation
NAN_METHOD(ScriptModule::New) {
  if (info.IsConstructCall()) {
    // Get the filename parameter
    v8::String::Utf8Value param_filename(info[0]->ToString());
    const std::string filename = std::string(*param_filename);
    // Create a new script module using that file name
    ScriptModule *obj = new ScriptModule(filename);
    obj->Wrap(info.This());
    info.GetReturnValue().Set(info.This());
  } else {
    v8::Local<v8::Function> cons = Nan::New(constructor);
    info.GetReturnValue().Set(Nan::NewInstance(cons).ToLocalChecked());
  }
}

NAN_METHOD(ScriptModule::forward) {
  ScriptModule* script_module = ObjectWrap::Unwrap<ScriptModule>(info.Holder());
  Nan::MaybeLocal<v8::Object> maybe = Nan::To<v8::Object>(info[0]);
  Tensor *tensor = Nan::ObjectWrap::Unwrap<Tensor>(maybe.ToLocalChecked());
  torch::Tensor torch_tensor = tensor->getTensor();
  torch::Tensor output = script_module->mModule->forward({torch_tensor}).toTensor();
  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(output);
  info.GetReturnValue().Set(newinst);
}

As you can see, in this code we just receive a tensor as an argument, we get the internal torch::Tensor from it, and then call the forward method from the script module; we wrap the output in a new torchjs::Tensor and then return it.

And that’s it, we’re ready to use our built module in native NodeJS as in the example below:

var torchjs = require("./build/Release/torchjs"); var script_module = new torchjs.ScriptModule("resnet18_trace.pt"); var data = torchjs.ones([1, 3, 224, 224], false); var output = script_module.forward(data);

I hope you enjoyed it! Libtorch opens the door for the tight integration of PyTorch in many different languages and frameworks, which is quite exciting and a huge step in the direction of production deployment code.

– Christian S. Perone

Hacking into Python objects internals

[enlighter lang=”c”]
typedef struct _object {
    PyObject_HEAD
} PyObject;
[/enlighter]

and the PyObject_HEAD macro is defined as:

[enlighter lang=”c”]
#define PyObject_HEAD \
    _PyObject_HEAD_EXTRA \
    Py_ssize_t ob_refcnt; \
    struct _typeobject *ob_type;
[/enlighter]

… with two fields (forget the _PyObject_HEAD_EXTRA, it's only used for a tracing debug feature) called ob_refcnt and ob_type, representing the reference counting for the object and the type of the object. I know you can use sys.getrefcount to get the reference counting of an object, but hacking the object memory using ctypes is by far more powerful, since you can get the contents of any field of the object (in cases where you don't have a native API for that). I'll show more examples later, but let's focus on the reference counting field of the object.

Getting the reference count (ob_refcnt)

[enlighter lang=”c”]
static PyObject *
builtin_id(PyObject *self, PyObject *v)
{
return PyLong_FromVoidPtr(v);
}
[/enlighter]

… the function PyLong_FromVoidPtr returns a Python long object from a void pointer. So, in CPython, this value is the address of the object in memory, as shown below:

[enlighter lang=”python”]
>>> value = 666
>>> hex(id(value))
'0x8998e50' # memory address of the 'value' object
[/enlighter]

Now that we have the memory address of the object, we can use the Python ctypes module to get the reference counting by accessing the attributeob_refcnt, here is the code needed to do that:

[enlighter lang=”python”]
>>> value = 666
>>> ob_refcnt = ctypes.c_long.from_address(id(value))
>>> ob_refcnt
c_long(1)
[/enlighter]

What I'm doing here is getting the integer value from the ob_refcnt attribute of the PyObject in memory. Let's add a new reference to the 'value' object we created, and then check the reference count again:

[enlighter lang=”python”]
>>> value_ref = value
>>> id(value_ref) == id(value)
True
>>> ob_refcnt
c_long(2)
[/enlighter]

Note that the reference counting was increased by 1 due to the new reference variable called ‘value_ref’.

Interned strings state (ob_sstate)

Now, getting the reference count wasn't even funny, we already had the sys.getrefcount API for that, but what about the interned state of strings? In order to avoid the creation of different allocations for the same string (and to speed up comparisons), Python uses a dictionary that works like a "cache" for strings; this dictionary is defined in Objects/stringobject.c:

[enlighter lang=”c”]

/* This dictionary holds all interned strings. Note that references to
strings in this dictionary are *not* counted in the string’s ob_refcnt.
When the interned string reaches a refcnt of 0 the string deallocation
function will delete the reference from this dictionary.

Another way to look at this is that to say that the actual reference
count of a string is: s->ob_refcnt + (s->ob_sstate?2:0)
*/
static PyObject *interned;
[/enlighter]

I also copied here the comment about the dictionary, because it is interesting to note that the strings in the dictionary aren't counted in the string's ob_refcnt.

So, the interned state of a string object is held in the attribute ob_sstate of the string object; let's see the definition of the Python string object:

[enlighter lang=”c”]

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

/* Invariants:
* ob_sval contains space for 'ob_size + 1' elements.
* ob_sval[ob_size] == 0.
* ob_shash is the hash of the string or -1 if not computed yet.
* ob_sstate != 0 iff the string object is in stringobject.c’s
* ‘interned’ dictionary; in this case the two references
* from ‘interned’ to this object are *not counted* in ob_refcnt.
*/
} PyStringObject;
[/enlighter]

As you can note, string objects inherit from the PyObject_VAR_HEAD macro, which defines another header attribute; let's see the definition to get the complete idea of the structure:

[enlighter lang=”c”]
#define PyObject_VAR_HEAD \
    PyObject_HEAD \
    Py_ssize_t ob_size; /* Number of items in variable part */
[/enlighter]

The PyObject_VAR_HEAD macro adds another field called ob_size, which is the number of items in the variable part of the Python object (i.e. the number of items in a list object). So, before getting to the ob_sstate field, we need to shift our offset to skip the fields ob_refcnt (long) and ob_type (void*) (from PyObject_HEAD), the field ob_size (long) (from PyObject_VAR_HEAD) and the field ob_shash (long) from PyStringObject. In other words, we need to skip this offset (three fields with the size of a long and one field with the size of a void*) of bytes:

[enlighter lang=”python”]
>>> ob_sstate_offset = ctypes.sizeof(ctypes.c_long)*3 + ctypes.sizeof(ctypes.c_voidp)
>>> ob_sstate_offset
16
[/enlighter]
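The same skip-the-header-fields trick works on any variable-sized object. As a quick sanity check, here is a sketch that reads a list's ob_size directly; it assumes a 64-bit CPython 3 release build, where ob_refcnt and ob_size are Py_ssize_t rather than the plain long used in the Python 2 snippets above:

```python
import ctypes

lst = [10, 20, 30]

# PyObject_VAR_HEAD layout in a release build:
#   ob_refcnt (Py_ssize_t) + ob_type (void*) + ob_size (Py_ssize_t)
# so ob_size lives right after the first two fields.
ob_size_offset = ctypes.sizeof(ctypes.c_ssize_t) + ctypes.sizeof(ctypes.c_voidp)
ob_size = ctypes.c_ssize_t.from_address(id(lst) + ob_size_offset)

print(ob_size.value)  # 3, the same as len(lst)
```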

Now, let's prepare two cases: one that we know isn't interned and another that is surely interned. Then we'll force the interned state of the non-interned string and check the result of the ob_sstate attribute:

[enlighter lang=”python”]
>>> a = "lero"
>>> b = "".join(["l", "e", "r", "o"])
>>> ctypes.c_long.from_address(id(a) + ob_sstate_offset)
c_long(1)
>>> ctypes.c_long.from_address(id(b) + ob_sstate_offset)
c_long(0)
>>> intern(b)
'lero'
>>> ctypes.c_long.from_address(id(b) + ob_sstate_offset)
c_long(1)
[/enlighter]

Note that the interned state for the object “a” is 1 and for the object “b” is 0. After forcing the interned state of the variable “b”, we can see that the fieldob_sstatehas changed to 1.
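On Python 3, where the intern() builtin moved to sys.intern() and PyStringObject no longer exists, the same behavior can still be observed from the outside through object identity; note that this relies on a CPython implementation detail (identifier-like string literals are interned at compile time), not a language guarantee:

```python
import sys

a = "lero"                         # identifier-like literal: interned at compile time
b = "".join(["l", "e", "r", "o"])  # built at runtime: a distinct, non-interned object

print(a is b)      # False: equal values, different objects
b = sys.intern(b)  # sys.intern returns the canonical interned object
print(a is b)      # True: both names now point to the same object
```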

Changing internal states (evil mode)

Now, let’s suppose we want to change some internal state of a Python object through the interpreter. Let’s try to change the value of an int object. Int objects are defined inInclude/intobject.h:

[enlighter lang=”c”]
typedef struct {
    PyObject_HEAD
    long ob_ival;
} PyIntObject;
[/enlighter]

As you can see, the internal value of an int is stored in the field ob_ival; to change it, we just need to skip the ob_refcnt (long) and the ob_type (void*) from the PyObject_HEAD:

[enlighter lang=”python”]
>>> value = 666
>>> ob_ival_offset = ctypes.sizeof(ctypes.c_long) + ctypes.sizeof(ctypes.c_voidp)
>>> ob_ival = ctypes.c_long.from_address(id(value) + ob_ival_offset)
>>> ob_ival
c_long(666)
>>> ob_ival.value = 8
>>> value
8
[/enlighter]

And that is it: we have changed the value of the int object directly in memory.

I hope you liked it, you can play with lots of other Python objects like lists and dicts, note that this method is just intended to show how the Python objects are structured in the memory and how you can change them using the native API, but obviously, you’re not supposed to use this to change the value of ints lol.

Update 11/29/11: you're not supposed to do things like this in your production code or anything similar; in this post I made lazy assumptions about architecture details, like the sizes of primitives, etc. Be warned.

C++11 user-defined literals and some constructions

I was taking a look at the proposal N2765 (user-defined literals), already implemented in the development snapshots of GCC 4.7, and I was thinking about how user-defined literals can be used to create some interesting and sometimes strange constructions.

Introduction to user-defined literals

C++03 has some literals, like the "f" in "12.2f" that converts the double value to float. The problem is that these literals aren't very flexible, since they're pretty fixed, so you can't change them or create new ones. To overcome this situation, C++11 introduced the concept of "user-defined literals", which gives the user the ability to create new custom literal modifiers. The new user-defined literals can create either built-in types (e.g. int) or user-defined types (e.g. classes), and the fact that they can return objects instead of only primitives makes them very useful.

[enlighter lang="C++"]
OutputType operator "" _suffix(unsigned long long literal_value);
OutputType operator "" _suffix(long double literal_value);
OutputType operator "" _suffix(const char *string_values, size_t num_chars);
[/enlighter]

… the last form is used in the case of a string. The output type is anything you want (object or primitive), and "_suffix" is the name of the literal modifier. You aren't required to use the underscore in front of it, but if you don't, you'll get some warnings telling you that suffixes not preceded by the underscore are reserved for future standardization.

Examples

Kmh to Mph converter

[enlighter lang=”C++” escaped=”true” lines=”1000″]
// stupid converter class

class Converter
{
public:
Converter(double kmph) : m_kmph(kmph) {};
~Converter() {};

double to_mph(void)
{ return m_kmph / 1.609344; }

private:
double m_kmph;
};

// user-defined literal
Converter operator "" kmph(long double kmph)
{ return Converter(kmph); }

int main(void)
{
std::cout << "Converter: " << (80.0kmph).to_mph() << std::endl;
// note that I’m using parenthesis in order to
// be able to call the ‘to_mph’ method
return 0;
}
[/enlighter]

Note that the literal operator for numeric types should take either long double (for floating-point literals) or unsigned long long (for integral literals). There is no signed type, because a signed literal is parsed as an expression with the sign as a unary prefix and the unsigned number part.

std::string literal

[enlighter lang=”C++” escaped=”true” lines=”1000″]

std::string operator "" s(const char *p, size_t n)
{ return std::string(p, n); }

int main(void)
{
std::cout << "convert me to a string"s.length() << std::endl; // here you don't need the parenthesis, note that the // c-string was automagically converted to std::string return 0; } [/ccb]

system() call

[enlighter lang=”C++” escaped=”true” lines=”1000″]
int operator "" ex(const char *cmd, size_t num_chars)
{ return system(cmd); }

int main(void)
{
"ls -lah"ex;
return 0;
}
[/enlighter]

Aliases and std::map

[enlighter lang=”C++” escaped=”true” lines=”1000″]
typedef std::map<std::string, int> MyMap;
MyMap create_map()
{
MyMap m;
m["lol"] = 7;
return m;
}
auto m = create_map();

int& operator “” m(const char *key, size_t length)
{ return m[key]; }

int main(void)
{
std::cout << "lol"m << std::endl; // 7 "lol"m = 2; std::cout << "lol"m << std::endl; // 2 return 0; } [/ccb]

References

Wikipedia :: C++11 (User-defined literals)

Proposal N2765

C++0x :: Introduction to some amazing features

I’ve made those slides in March of this year to a training session, I was expecting to get time to update it to cover more features but I wasn’t able to do that yet, so I’m publishing it here for those who are interested in some of the new language features.

The future can be written in RPython now

Following the recent article arguing why PyPy is the future of Python, I must say that PyPy is not the future of Python: it is the present. When I tested it last time (PyPy-c 1.1.0) with Pyevolve on the optimization of a simple Sphere function, it was at least 2x slower than Unladen Swallow Q2, but at that time, PyPy was not able to JIT. Now, with this new release of PyPy and the JIT support, the scenario has changed.

PyPy has evolved a lot (actually, you can see this evolution here); nice work was done on the GC system, saving (when compared to CPython) 8 bytes per object allocated, which is very interesting for applications that make heavy use of object allocation (a GP system is a strong example of this, because when they are implemented in object-oriented languages, each syntax-tree node is an object). Work is also in progress to improve the support for CPython extensions (written in C/C++). One of the approaches is somewhat complex: using RPyC to proxy remote calls to CPython over TCP; but the other seems far more effective, which is the creation of the CPyExt subsystem. By using CPyExt, all you need is to have your CPython API functions implemented in CPyExt; a lot of people are working on this right now and you can do it too. It's a long road to good API coverage, but when you think about the advantages, this road becomes short.

In order to benchmark CPython, Jython, CPython+Psyco, Unladen Swallow and PyPy, I've used the Rastrigin function optimization (an example of that implementation is here in Example 7 of Pyevolve 0.6rc1):

$f(x) = 10n + \sum_{i=1}^{n}\left[x_{i}^{2} - 10\cos(2\pi x_{i})\right]$
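For reference, the Rastrigin function above fits in a few lines of plain Python; this is just a sketch of the benchmark kernel, not Pyevolve's exact implementation:

```python
import math

def rastrigin(xs):
    # f(x) = 10*n + sum_i (x_i^2 - 10*cos(2*pi*x_i))
    n = len(xs)
    return 10.0 * n + sum(x * x - 10.0 * math.cos(2.0 * math.pi * x)
                          for x in xs)

print(rastrigin([0.0, 0.0, 0.0]))  # 0.0: the global minimum at the origin
```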

Here are the information about versions used in this benchmark:

• Operating System: Ubuntu Linux 10.04 LTS (Lucid)
• CPython 2.6.5 (Apr 16 2010)
• Jython 2.5.1 (Sun JVM 1.6.0_20, Server Mode)
• CPython 2.6.5 + PsycoV2 trunk (r74587)
• CPython 2.6.5 + Psyco 1.6.0 (default Lucid package)
• PyPy trunk (r74537)

No warm-up was performed in the JVM or in PyPy. The PyPy translator was executed using the "-Ojit" option in order to get the JIT version of the Python interpreter. The JVM was executed using the server mode; I've tested the client and server modes for Sun JVM and IcedTea6, and the best results were observed from the server mode using Sun JVM. However, when I compared the client mode of IcedTea6 with the client mode of Sun JVM, the best results observed were from IcedTea6 (the same as using server mode in IcedTea6). Unladen Swallow was compiled using the project wiki instructions for building an optimized binary.

PyPy is not only the future of Python, but it is becoming the present right now. PyPy will not bring us only an implementation of Python in Python (which in itself is the valuable result of an effort), but will also bring the performance back (which many doubted at the beginning: how could an implementation of Python in Python be faster than the implementation in C? Here is where the translator and the JIT magic enter). When the Python interpreter can be entirely written in a high-level language (actually almost the same language, which is quite funny), the Python community can focus on improving the language itself instead of spending time solving the complexities of a lower-level language; isn't that the great point of these efforts?

By the way, just to note, PyPy isn’t only a translator for the Python interpreter written in RPython, it’s a translator of RPython, what means that PyPy isn’t only the future of Python, but probably, the future of many interpreters.

A method for JIT’ing algorithms and data structures with LLVM

Hello folks, I always post about Python and EvoComp (Pyevolve), but this time it's about C, LLVM, search algorithms and data structures. This post describes the efforts to implement an idea: to JIT (verb) algorithms and the data structures used by them, together.

AVL Tree Intro

Here is a short intro to AVL Trees from Wikipedia:

In computer science, an AVL tree is a self-balancing binary search tree, and it was the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.