## Introduction

This is the first post of 2020, so Happy New Year to you all!

I’ve been a huge fan of LLVM since 11 years ago, when I started playing with it to JIT data structures such as AVLs, then later to JIT restricted AST trees, and to JIT native code from TensorFlow graphs. Since then, LLVM has evolved into one of the most important compiler framework ecosystems and is nowadays used by many important open-source projects.

The image below gives an overview of Gandiva:

## Building a simple expression with Gandiva

### Using Gandiva Python bindings to JIT an expression

Before building our parser and expression builder, let’s manually build a simple expression with Gandiva. First, we will create a simple Pandas DataFrame with numbers from 0.0 to 9.0:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.gandiva as gandiva

# Create a simple Pandas DataFrame
df = pd.DataFrame({"x": [1.0 * i for i in range(10)]})
table = pa.Table.from_pandas(df)
schema = pa.Schema.from_pandas(df)
```

The expression we want to build and JIT is: `(x > 2.0) and (x < 6.0)`.

```python
builder = gandiva.TreeExprBuilder()

# Reference the column "x"
node_x = builder.make_field(table.schema.field("x"))

# Make two literals: 2.0 and 6.0
two = builder.make_literal(2.0, pa.float64())
six = builder.make_literal(6.0, pa.float64())

# Create a function for "x > 2.0"
gt_five_node = builder.make_function("greater_than",
                                     [node_x, two],
                                     pa.bool_())

# Create a function for "x < 6.0"
lt_ten_node = builder.make_function("less_than",
                                    [node_x, six],
                                    pa.bool_())

# Create an "and" node for "(x > 2.0) and (x < 6.0)"
and_node = builder.make_and([gt_five_node, lt_ten_node])

# Make the expression a condition and create a filter
condition = builder.make_condition(and_node)
filter_ = gandiva.make_filter(table.schema, condition)
```

This code now looks a little more complex but it is easy to understand. We are basically creating the nodes of a tree that will represent the expression we showed earlier. Here is a graphical representation of what it looks like:
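Before inspecting the generated code, we can already run this filter. A minimal sketch of the evaluation (reusing the `table` and `filter_` objects built above, and assuming the table fits in a single record batch):

```python
# Evaluate the JIT'ed filter on the first record batch of the table;
# the result is a selection vector with the indices of the matching rows.
result = filter_.evaluate(table.to_batches()[0], pa.default_memory_pool())
print(result.to_array())  # rows where (x > 2.0) and (x < 6.0)
```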

### Inspecting the generated LLVM IR

```cpp
auto field_x = field("x", float32());
auto schema = arrow::schema({field_x});
auto node_x = TreeExprBuilder::MakeField(field_x);

auto two = TreeExprBuilder::MakeLiteral((float_t)2.0);
auto six = TreeExprBuilder::MakeLiteral((float_t)6.0);

auto gt_five_node = TreeExprBuilder::MakeFunction("greater_than",
                                                  {node_x, two},
                                                  arrow::boolean());
auto lt_ten_node = TreeExprBuilder::MakeFunction("less_than",
                                                 {node_x, six},
                                                 arrow::boolean());

auto and_node = TreeExprBuilder::MakeAnd({gt_five_node, lt_ten_node});
auto condition = TreeExprBuilder::MakeCondition(and_node);

std::shared_ptr<Filter> filter;
auto status = Filter::Make(schema, condition, TestConfiguration(), &filter);
```

The code above is the same as the Python code, but using the C++ Gandiva API. Now that we have built the tree in C++, we can get the LLVM Module and dump its IR code. The generated IR is full of boilerplate code and of the JIT’ed functions from the Gandiva registry; however, the important parts are shown below:

```llvm
; Function Attrs: alwaysinline norecurse nounwind readnone ssp uwtable
define internal zeroext i1 @less_than_float32_float32(float, float) local_unnamed_addr #0 {
  %3 = fcmp olt float %0, %1
  ret i1 %3
}

; Function Attrs: alwaysinline norecurse nounwind readnone ssp uwtable
define internal zeroext i1 @greater_than_float32_float32(float, float) local_unnamed_addr #0 {
  %3 = fcmp ogt float %0, %1
  ret i1 %3
}

(...)
%x = load float, float* %11
%greater_than_float32_float32 = call i1 @greater_than_float32_float32(float %x, float 2.000000e+00)
(...)
%x11 = load float, float* %15
%less_than_float32_float32 = call i1 @less_than_float32_float32(float %x11, float 6.000000e+00)
```

As you can see, in the IR there are calls to the functions `less_than_float32_float32` and `greater_than_float32_float32`, which are the (in this case, very simple) Gandiva functions that do the float comparisons. Note the specialization of the functions by looking at the type suffixes in their names. Since these functions are marked `alwaysinline`, the optimization passes inline them, and the final IR reduces to the fused comparison below:

```llvm
%x.us = load float, float* %10, align 4
%11 = fcmp ogt float %x.us, 2.000000e+00
%12 = fcmp olt float %x.us, 6.000000e+00
%not.or.cond = and i1 %12, %11
```

## Building a Pandas filter expression JIT with Gandiva

Now we want to implement something similar to the Pandas `DataFrame.query()` function using Gandiva. The first problem we will face is that we need to parse a string such as `(x > 2.0) and (x < 6.0)`; later we will have to build the Gandiva expression tree using the Gandiva tree builder, and then evaluate that expression on the Arrow data.

Now, instead of implementing a full parser for the expression string, I’ll use the Python `ast` module to parse valid Python code and build an Abstract Syntax Tree (AST) of that expression, which I’ll later use to emit the Gandiva/LLVM nodes.
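To get a feel for what the Python AST of such an expression looks like, here is a quick illustrative snippet (not part of the original pipeline) using the standard `ast` module; note that `&` parses as a `BitAnd` binary operation, which is why the visitor below maps `BitAnd`/`BitOr` to Gandiva’s `make_and`/`make_or`:

```python
import ast

# Parse the filter expression and print its AST. The tree contains a
# BinOp(BitAnd) node whose children are two Compare nodes (Gt and Lt).
tree = ast.parse("(x > 2.0) & (x < 6.0)")
print(ast.dump(tree))
```

The `LLVMGandivaVisitor` below walks exactly this kind of tree and emits the corresponding Gandiva nodes: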

```python
class LLVMGandivaVisitor(ast.NodeVisitor):
    def __init__(self, df_table):
        self.table = df_table
        self.builder = gandiva.TreeExprBuilder()
        self.columns = {f.name: self.builder.make_field(f)
                        for f in self.table.schema}
        self.compare_ops = {
            "Gt": "greater_than",
            "Lt": "less_than",
        }
        self.bin_ops = {
            "BitAnd": self.builder.make_and,
            "BitOr": self.builder.make_or,
        }

    def visit_Module(self, node):
        return self.visit(node.body[0])

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        op_name = node.op.__class__.__name__
        gandiva_bin_op = self.bin_ops[op_name]
        return gandiva_bin_op([left, right])

    def visit_Compare(self, node):
        op = node.ops[0]
        op_name = op.__class__.__name__
        gandiva_comp_op = self.compare_ops[op_name]
        comparators = self.visit(node.comparators[0])
        left = self.visit(node.left)
        return self.builder.make_function(gandiva_comp_op,
                                          [left, comparators],
                                          pa.bool_())

    def visit_Num(self, node):
        return self.builder.make_literal(node.n, pa.float64())

    def visit_Expr(self, node):
        return self.visit(node.value)

    def visit_Name(self, node):
        return self.columns[node.id]

    def generic_visit(self, node):
        return node

    def evaluate_filter(self, llvm_mod):
        condition = self.builder.make_condition(llvm_mod)
        filter_ = gandiva.make_filter(self.table.schema, condition)
        result = filter_.evaluate(self.table.to_batches()[0],
                                  pa.default_memory_pool())
        arr = result.to_array()
        pd_result = arr.to_numpy()
        return pd_result

    @staticmethod
    def gandiva_query(df, query):
        df_table = pa.Table.from_pandas(df)
        llvm_gandiva_visitor = LLVMGandivaVisitor(df_table)
        mod_f = ast.parse(query)
        llvm_mod = llvm_gandiva_visitor.visit(mod_f)
        results = llvm_gandiva_visitor.evaluate_filter(llvm_mod)
        return results
```

### Registering it as a Pandas extension

```python
@pd.api.extensions.register_dataframe_accessor("gandiva")
class GandivaAcessor:
    def __init__(self, pandas_obj):
        self.pandas_obj = pandas_obj

    def query(self, query):
        return LLVMGandivaVisitor.gandiva_query(self.pandas_obj, query)

df = pd.DataFrame({"a": [1.0 * i for i in range(nsize)]})
results = df.gandiva.query("a > 10.0")
```

As we have registered a Pandas extension called `gandiva`, it is now a first-class citizen of Pandas DataFrames.

```python
df = pd.DataFrame({"a": [1.0 * i for i in range(50000000)]})
df.gandiva.query("a < 4.0")
# This will output:
# array([0, 1, 2, 3], dtype=uint32)
```

That’s it! I hope you liked this post as much as I enjoyed exploring Gandiva. It seems that we will probably have more and more tools coming up with Gandiva acceleration, especially for SQL parsing/projection/JITing. Gandiva is much more than what I just showed, but you can get started now to understand more of its architecture and how to build the expression trees.

– Christian S. Perone

## Listening to the neural network gradient norms during training

Training neural networks is often done by measuring many different metrics such as accuracy, loss, gradients, etc. This is most of the time done by aggregating these metrics and plotting visualizations on TensorBoard.

There are, however, other senses that we can use to monitor the training of neural networks, such as sound. Sound is one of the perspectives that is currently very poorly explored in the training of neural networks. Human hearing can be very good at distinguishing very small perturbations in characteristics such as rhythm and pitch, even when these perturbations are very short in time or subtle.

### Training sound with Adam using LR 0.01

This is using Adam in the same setting as the SGD.
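For reference, assuming the training script shown in the next section (which uses SGD), producing the Adam clip should only require swapping the optimizer line, something like:

```python
optimizer = optim.Adam(model.parameters(), lr=0.01)
```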

## Source code

For those who are interested, here is the entire source code I used to make the sound clips:

```python
import pyaudio
import numpy as np
import wave

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)
        self.ordered_layers = [self.conv1,
                               self.conv2,
                               self.fc1,
                               self.fc2]

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def open_stream(fs):
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32,
                    channels=1, rate=fs, output=True)
    return p, stream


def generate_tone(fs, freq, duration):
    npsin = np.sin(2 * np.pi * np.arange(fs * duration) * freq / fs)
    samples = npsin.astype(np.float32)
    return 0.1 * samples


def train(model, device, train_loader, optimizer, epoch):
    model.train()
    fs = 44100
    duration = 0.01
    f = 200.0
    p, stream = open_stream(fs)
    frames = []

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()

        norms = []
        for layer in model.ordered_layers:
            norm_grad = layer.weight.grad.norm()
            norms.append(norm_grad)

            tone = f + ((norm_grad.numpy()) * 100.0)
            tone = tone.astype(np.float32)
            samples = generate_tone(fs, tone, duration)
            frames.append(samples)

        silence = np.zeros(samples.shape[0] * 2,
                           dtype=np.float32)
        frames.append(silence)

        optimizer.step()

        # Just 200 steps per epoch
        if batch_idx == 200:
            break

    wf = wave.open("sgd_lr_1_0_bs256.wav", 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paFloat32))
    wf.setframerate(fs)
    wf.writeframes(b''.join(frames))
    wf.close()

    stream.stop_stream()
    stream.close()
    p.terminate()


def run_main():
    device = torch.device("cpu")
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=256, shuffle=True)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    for epoch in range(1, 2):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == "__main__":
    run_main()
```

## Randomized prior functions in PyTorch

The main idea of the method is to use the bootstrap to provide a non-parametric data perturbation together with randomized priors, which are nothing more than randomly initialized networks.

$$Q_{\theta_k}(x) = f_{\theta_k}(x) + p_k(x)$$

The final model $$Q_{\theta_k}(x)$$ will be the k-th model of the ensemble, which fits the function $$f_{\theta_k}(x)$$ together with an untrained prior $$p_k(x)$$.

Let’s go to the code. The first class is a simple MLP with 2 hidden layers and Glorot initialization:

```python
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(1, 20)
        self.l2 = nn.Linear(20, 20)
        self.l3 = nn.Linear(20, 1)

        nn.init.xavier_uniform_(self.l1.weight)
        nn.init.xavier_uniform_(self.l2.weight)
        nn.init.xavier_uniform_(self.l3.weight)

    def forward(self, inputs):
        x = self.l1(inputs)
        x = nn.functional.selu(x)
        x = self.l2(x)
        x = nn.functional.selu(x)
        x = self.l3(x)
        return x
```

Then later we define a class that will take the model and the prior to produce the final model result:

```python
class ModelWithPrior(nn.Module):
    def __init__(self,
                 base_model: nn.Module,
                 prior_model: nn.Module,
                 prior_scale: float = 1.0):
        super().__init__()
        self.base_model = base_model
        self.prior_model = prior_model
        self.prior_scale = prior_scale

    def forward(self, inputs):
        with torch.no_grad():
            prior_out = self.prior_model(inputs)
            prior_out = prior_out.detach()
        model_out = self.base_model(inputs)
        return model_out + (self.prior_scale * prior_out)
```

And that’s basically it! As you can see, it’s a very simple method; in the second part we just created a custom `forward()` to avoid computing/accumulating gradients for the prior network and then summed it (after scaling) with the model prediction.

```python
def train_model(x_train, y_train, base_model, prior_model):
    model = ModelWithPrior(base_model, prior_model, 1.0)
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

    for epoch in range(100):
        model.train()
        preds = model(x_train)
        loss = loss_fn(preds, y_train)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return model
```

and using a sampler with replacement (bootstrap) as in:

```python
dataset = TensorDataset(...)
bootstrap_sampler = RandomSampler(dataset, True, len(dataset))
train_dataloader = DataLoader(dataset,
                              batch_size=len(dataset),
                              sampler=bootstrap_sampler)
```
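Putting the pieces together, here is a minimal sketch of how the ensemble could be assembled (assuming the `MLP`, `ModelWithPrior`, and `train_model` definitions above; `x_train`, `y_train`, and `x_test` are hypothetical tensors, and for brevity each member trains on the full data instead of consuming a bootstrapped DataLoader like the one above):

```python
import torch

n_models = 5
ensemble = []
for k in range(n_models):
    base_model = MLP()
    prior_model = MLP()  # randomly initialized and never trained: the prior
    ensemble.append(train_model(x_train, y_train, base_model, prior_model))

# The disagreement between ensemble members gives the uncertainty estimate
with torch.no_grad():
    preds = torch.stack([model(x_test) for model in ensemble])
mean, std = preds.mean(dim=0), preds.std(dim=0)
```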

In this case, I used the same small dataset used in the original paper:

If we look at just the priors, we will see the variation of the untrained networks:

We can also visualize the individual model predictions showing their variability due to different initializations as well as the bootstrap noise:

Now, what is also quite interesting, is that we can change the prior to let’s say a fixed sine:

```python
class SinPrior(nn.Module):
    def forward(self, inputs):
        return torch.sin(3 * inputs)
```

Then, when we train the same MLP model but this time using the sine prior, we can see how it affects the final prediction and uncertainty bounds:

If we show each individual model, we can see the effect of the prior contribution to each individual model:

Cite this article as: Christian S. Perone, "Randomized prior functions in PyTorch," in Terra Incognita, 24/03/2019, //www.cpetem.com/2019/03/randomized-prior-functions-in-pytorch/

## Bayesian analysis of the ENEM microdata

*This post is in Portuguese.* It’s a Bayesian analysis of a Brazilian national exam (ENEM). The main focus of the analysis is to understand the underlying factors impacting the participants’ performance on the ENEM.

This tutorial presents a brief analysis of the ENEM microdata for Rio Grande do Sul in 2017. The main goal is to understand the factors that impact participants’ performance on the ENEM, given factors such as family income and school type. Two models are presented in the tutorial: linear regression and hierarchical linear regression.


## PyTorch 1.0 tracing JIT and LibTorch C++ API to integrate PyTorch into NodeJS

Update 28 Feb 2019: I added a new blog post with a slide deck covering the presentation I did for PyData Montreal.

Today, at the PyTorch Developer Conference, the PyTorch team announced the plans and the release of the PyTorch 1.0 preview, with many nice features such as a JIT for model graphs (with and without tracing), as well as LibTorch, the PyTorch C++ API, one of the most important release announcements made today in my opinion.

### Introduction

Simply put, libtorch is a library version of PyTorch. It contains the underlying foundation that is used by PyTorch, such as ATen (the tensor library), which contains all the tensor operations and methods. Libtorch also contains the autograd, which is the component that adds the automatic differentiation to the ATen tensors.

A word of caution for those who are starting now: be careful with the use of the tensors that can be created both from ATen and autograd, and do not mix them. ATen will return plain tensors (when you create them using the `at` namespace), while the autograd functions (from the `torch` namespace) will return `Variable`s, adding their automatic differentiation mechanism.
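A small sketch of the difference (assuming the libtorch headers are available; the variable names are just for illustration):

```cpp
#include <torch/torch.h>

int main() {
  // Plain ATen tensor from the at:: namespace: no autograd machinery.
  at::Tensor a = at::ones({2, 2});

  // Factory from the torch:: namespace: wired into autograd
  // when requires_grad is set.
  torch::Tensor b = torch::ones({2, 2}, torch::requires_grad());

  auto loss = (b * b).sum();
  loss.backward();  // gradients are accumulated into b.grad()

  return 0;
}
```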

Libtorch can be downloaded from the PyTorch website, and it is available only as a preview for a while. You can also find the documentation on this site, which is mostly a Doxygen-rendered documentation. I found the library quite stable, which makes sense given that it is actually exposing the stable foundations of PyTorch; however, there are some issues with headers and some minor problems concerning the library organization that you might find while starting to use it (which will hopefully be fixed soon).

For NodeJS, I’ll use the Native Abstractions library (nan), which is the most recommended library (actually basically a header-only library) to create NodeJS C++ add-ons, together with cmake-js, because libtorch already provides the cmake files that make our building process much easier. However, the focus here will be on the C++ code and not on the building process.

The flow for the development, tracing, serializing and loading the model can be seen in the figure on the left side.

It starts with the development process and tracing being done in PyTorch (Python domain) and then the loading and inference on the C++ domain (in our case in NodeJS add-on).

### Wrapping the Tensor

```cpp
#ifndef TENSOR_H
#define TENSOR_H

#include <nan.h>
#include <torch/torch.h>

namespace torchjs {

class Tensor : public Nan::ObjectWrap {
 public:
  static NAN_MODULE_INIT(Init);

  void setTensor(at::Tensor tensor) {
    this->mTensor = tensor;
  }

  torch::Tensor getTensor() {
    return this->mTensor;
  }

  static v8::Local<v8::Object> NewInstance();

 private:
  explicit Tensor();
  ~Tensor();

  static NAN_METHOD(New);
  static NAN_METHOD(toString);
  static Nan::Persistent<v8::Function> constructor;

 private:
  torch::Tensor mTensor;
};

}  // namespace torchjs

#endif
```

I won’t show all the implementation details because most parts of it are NodeJS boilerplate code to construct the object, etc. I’ll focus on the parts that touch the libtorch API, like in the code below where we are creating a small textual representation of the tensor to show on JavaScript (the `toString` method):

```cpp
NAN_METHOD(Tensor::toString) {
  Tensor* obj = ObjectWrap::Unwrap<Tensor>(info.Holder());
  std::stringstream ss;

  at::IntList sizes = obj->mTensor.sizes();
  ss << "Tensor[Type=" << obj->mTensor.type() << ", ";
  ss << "Size=" << sizes << std::endl;

  info.GetReturnValue().Set(Nan::New(ss.str()).ToLocalChecked());
}
```

### Wrapping tensor creation operations

```cpp
NAN_METHOD(ones) {
  // Sanity checking of the arguments
  if (info.Length() < 2)
    return Nan::ThrowError(Nan::New("Wrong number of arguments").ToLocalChecked());

  if (!info[0]->IsArray() || !info[1]->IsBoolean())
    return Nan::ThrowError(Nan::New("Wrong argument types").ToLocalChecked());

  // Retrieve the parameters (require_grad and the tensor shape)
  const bool require_grad = info[1]->BooleanValue();
  const v8::Local<v8::Array> array = info[0].As<v8::Array>();
  const uint32_t length = array->Length();

  // Convert from v8::Array to std::vector
  std::vector<long long> dims;
  for (int i = 0; i < length; i++) {
    v8::Local<v8::Value> v;
    int d = array->Get(i)->NumberValue();
    dims.push_back(d);
  }

  // Call libtorch and create a new torchjs::Tensor object
  // wrapping the new torch::Tensor created by torch::ones
  at::Tensor v = torch::ones(dims, torch::requires_grad(require_grad));
  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(v);
  info.GetReturnValue().Set(newinst);
}
```

And that’s it, we just exposed one torch operation that can be used as native JavaScript operation.

### Intermezzo for the PyTorch JIT

The introduced PyTorch JIT revolves around the concept of the Torch Script. A Torch Script is a restricted subset of the Python language and comes with its own compiler and transform passes (optimizations, etc).

Note that if you have branching decisions on your code that depends on external factors or data, tracing won’t work as you expect because it will record that particular execution of the graph, hence the alternative option to provide the script. However, in most of the cases, the tracing is what we need.

To understand the differences, let’s take a look at the Intermediate Representation (IR) from the script module generated both by tracing and by scripting.

```python
@torch.jit.script
def happy_function_script(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret


def happy_function_trace(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret


traced_fn = torch.jit.trace(happy_function_trace,
                            (torch.tensor(0),),
                            check_trace=False)
```

```
# 1) Graph from the scripting approach
graph(%x : Dynamic) {
  %16 : int = prim::Constant[value=2]()
  %10 : int = prim::Constant[value=1]()
  %7 : int = prim::Constant[value=1]()
  %8 : int = prim::Constant[value=1]()
  %9 : int = aten::eq(%7, %8)
  %ret : Dynamic = prim::If(%9)
    block0() {
      %11 : int[] = prim::ListConstruct(%10)
      %12 : int = prim::Constant[value=6]()
      %13 : int = prim::Constant[value=0]()
      %14 : int[] = prim::Constant[value=[0, -1]]()
      %ret.2 : Dynamic = aten::rand(%11, %12, %13, %14)
      -> (%ret.2)
    }
    block1() {
      %17 : int[] = prim::ListConstruct(%16)
      %18 : int = prim::Constant[value=6]()
      %19 : int = prim::Constant[value=0]()
      %20 : int[] = prim::Constant[value=[0, -1]]()
      %ret.3 : Dynamic = aten::rand(%17, %18, %19, %20)
      -> (%ret.3)
    }
  return (%ret);
}

# 2) Graph from the tracing approach
graph(%0 : Long()) {
  %7 : int = prim::Constant[value=1]()
  %8 : int[] = prim::ListConstruct(%7)
  %9 : int = prim::Constant[value=6]()
  %10 : int = prim::Constant[value=0]()
  %11 : int[] = prim::Constant[value=[0, -1]]()
  %12 : Float(1) = aten::rand(%8, %9, %10, %11)
  return (%12);
}
```

The PyTorch JIT has a lot of transformation passes that are used to do loop unrolling, dead code elimination, etc. You can find these passes here. A conversion to other formats such as ONNX can be implemented as a pass on top of this Intermediate Representation (IR), which is quite convenient.

### Tracing the ResNet

Now, before implementing the Script Module in NodeJS, let’s first trace a ResNet network using PyTorch (using just Python):

```python
import torch
import torchvision

traced_net = torch.jit.trace(torchvision.models.resnet18(),
                             torch.rand(1, 3, 224, 224))
traced_net.save("resnet18_trace.pt")
```

As you can see from the code above, we just have to provide a tensor example (in this case, a batch of a single image with 3 channels and size 224×224). After that, we just save the traced network into a file called `resnet18_trace.pt`.

### Wrapping the Script Module

```cpp
// Class constructor
ScriptModule::ScriptModule(const std::string filename) {
  // Load the traced network from the file
  this->mModule = torch::jit::load(filename);
}

// JavaScript object creation
NAN_METHOD(ScriptModule::New) {
  if (info.IsConstructCall()) {
    // Get the filename parameter
    v8::String::Utf8Value param_filename(info[0]->ToString());
    const std::string filename = std::string(*param_filename);

    // Create a new script module using that file name
    ScriptModule* obj = new ScriptModule(filename);
    obj->Wrap(info.This());
    info.GetReturnValue().Set(info.This());
  } else {
    v8::Local<v8::Function> cons = Nan::New(constructor);
    info.GetReturnValue().Set(Nan::NewInstance(cons).ToLocalChecked());
  }
}
```

The wrapping of the forward pass is also quite straightforward:

```cpp
NAN_METHOD(ScriptModule::forward) {
  ScriptModule* script_module = ObjectWrap::Unwrap<ScriptModule>(info.Holder());

  Nan::MaybeLocal<v8::Object> maybe = Nan::To<v8::Object>(info[0]);
  Tensor* tensor = Nan::ObjectWrap::Unwrap<Tensor>(maybe.ToLocalChecked());
  torch::Tensor torch_tensor = tensor->getTensor();

  torch::Tensor output = script_module->mModule->forward({torch_tensor}).toTensor();

  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(output);
  info.GetReturnValue().Set(newinst);
}
```

As you can see, in this code we just receive a tensor as an argument, get the internal `torch::Tensor` from it, and then call the forward method of the script module; we then wrap the output in a new `torchjs::Tensor` and return it.

```javascript
var torchjs = require("./build/Release/torchjs");
var script_module = new torchjs.ScriptModule("resnet18_trace.pt");
var data = torchjs.ones([1, 3, 224, 224], false);
var output = script_module.forward(data);
```

– Christian S. Perone

## PyTorch – Internal Architecture Tour

Update 28 Feb 2019: I added a new blog post with a slide deck covering the presentation I did for PyData Montreal.

## Short intro to Python extension objects in C/C++

As you probably know, you can extend Python using C and C++ and develop what is called an “extension”. All the PyTorch heavy work is implemented in C/C++ instead of pure Python. To define a new Python object type in C/C++, you define a structure like the example below (which is the base for the autograd `Variable` class):

```cpp
// Python object that backs torch.autograd.Variable
struct THPVariable {
  PyObject_HEAD
  torch::autograd::Variable cdata;
  PyObject* backward_hooks;
};
```

Funny fact: it is very common in many applications to use small integers as indexes, counters, etc. For efficiency, the official CPython interpreter caches the integers from -5 up to 256. For that reason, the statement `a = 200; b = 200; a is b` will be `True`, while the statement `a = 300; b = 300; a is b` will be `False`.
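A quick interactive session illustrates this (each statement entered separately in the CPython REPL, so the constants are compiled independently):

```python
>>> a = 200
>>> b = 200
>>> a is b  # 200 is within the cached range [-5, 256]
True
>>> a = 300
>>> b = 300
>>> a is b  # 300 is outside the cache, so these are distinct objects
False
```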

## Zero-copy PyTorch Tensor to Numpy and vice-versa

PyTorch has its own Tensor representation, which decouples PyTorch’s internal representation from external representations. However, as it is very common, especially when data is loaded from a variety of sources, to have Numpy arrays everywhere, we really need to make conversions between Numpy and PyTorch tensors. For that reason, PyTorch provides two methods called `from_numpy()` and `numpy()`, which convert a Numpy array to a PyTorch array and vice-versa, respectively. If we look at the code that is called to convert a Numpy array into a PyTorch tensor, we can get more insights into PyTorch’s internal representation:

```cpp
at::Tensor tensor_from_numpy(PyObject* obj) {
  if (!PyArray_Check(obj)) {
    throw TypeError("expected np.ndarray (got %s)", Py_TYPE(obj)->tp_name);
  }

  auto array = (PyArrayObject*)obj;
  int ndim = PyArray_NDIM(array);
  auto sizes = to_aten_shape(ndim, PyArray_DIMS(array));
  auto strides = to_aten_shape(ndim, PyArray_STRIDES(array));

  // NumPy strides use bytes. Torch strides use element counts.
  auto element_size_in_bytes = PyArray_ITEMSIZE(array);
  for (auto& stride : strides) {
    stride /= element_size_in_bytes;
  }

  // (...) - omitted for brevity

  void* data_ptr = PyArray_DATA(array);
  auto& type = CPU(dtype_to_aten(PyArray_TYPE(array)));
  Py_INCREF(obj);
  return type.tensorFromBlob(data_ptr, sizes, strides, [obj](void* data) {
    AutoGIL gil;
    Py_DECREF(obj);
  });
}
```

(code from tensor_numpy.cpp)

After this, PyTorch will create a new Tensor object from this Numpy data blob, and in the creation of this new Tensor it passes the borrowed memory data pointer, together with the memory size and strides as well as a function that will be used later by the Tensor Storage (we’ll discuss this in the next section) to release the data by decrementing the reference counting to the Numpy array object and let Python take care of this object life cycle.
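We can observe this borrowing from the Python side with a small illustrative experiment: since `from_numpy()` keeps the same underlying data blob, an in-place mutation of the Numpy array is visible through the Tensor:

```python
import numpy as np
import torch

np_array = np.ones((2, 2))
torch_tensor = torch.from_numpy(np_array)

# Mutate the Numpy array in place; the Tensor sees the change
# because no copy was made.
np_array[0, 0] = 42.0
print(torch_tensor[0, 0].item())  # 42.0
```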

The `tensorFromBlob()` method will create a new Tensor, but only after creating a new “Storage” for this Tensor. The Storage is where the actual data pointer will be kept (and not in the Tensor structure itself). This takes us to the next section, about Tensor Storages.

## The Tensor Storage

The actual raw data of the Tensor is not directly kept in the Tensor structure, but on another structure called Storage, which in turn is part of the Tensor structure.

As we saw in the previous code from `tensor_from_numpy()`, there is a call to `tensorFromBlob()` that will create a Tensor from the raw data blob. This last function will call another function called `storageFromBlob()` that will, in turn, create a storage for this data according to its type. In the case of a CPU float type, it will return a new `CPUFloatStorage` instance.

```cpp
typedef struct THStorage {
    real *data;
    ptrdiff_t size;
    int refcount;
    char flag;
    THAllocator *allocator;
    void *allocatorContext;
    struct THStorage *view;
} THStorage;
```

(code from THStorage.h)

```python
>>> tensor_a = torch.ones((3, 3))
>>> tensor_b = tensor_a.view(9)
>>> tensor_a.storage().data_ptr() == tensor_b.storage().data_ptr()
True
```

As we can see in the example above, the data pointer on the storage of both Tensors is the same, but the Tensors represent different interpretations of the storage data.

```cpp
typedef struct THAllocator {
  void* (*malloc)(void*, ptrdiff_t);
  void* (*realloc)(void*, void*, ptrdiff_t);
  void (*free)(void*, void*);
} THAllocator;
```

(code from THAllocator.h)

```cpp
static void *THCudaHostAllocator_malloc(void* ctx, ptrdiff_t size) {
  void* ptr;

  if (size < 0) THError("Invalid memory size: %ld", size);
  if (size == 0) return NULL;

  THCudaCheck(cudaMallocHost(&ptr, size));
  return ptr;
}
```

(code from THCAllocator.c)

You probably noticed a pattern in the repository organization, and it is important to keep these conventions in mind when navigating the repository, as summarized here (taken from the PyTorch lib README):

• TH = TorcH
• THC = TorcH Cuda
• THCS = TorcH Cuda Sparse
• THCUNN = TorcH CUda Neural Network
• THD = TorcH Distributed
• THNN = TorcH Neural Network
• THS = TorcH Sparse

Finally, we can see the composition of the main Tensor, the `THTensor` structure:

```cpp
typedef struct THTensor {
    int64_t *size;
    int64_t *stride;
    int nDimension;
    THStorage *storage;
    ptrdiff_t storageOffset;
    int refcount;
    char flag;
} THTensor;
```

(code from THTensor.h)

We can summarize all this structure that we saw in the diagram below:

## Shared Memory

PyTorch provides a wrapper around the Python multiprocessing module, which can be imported from `torch.multiprocessing`. The changes implemented in this wrapper around the official Python multiprocessing were done to make sure that every time a tensor is put on a queue or shared with another process, PyTorch will make sure that only a handle to the shared memory will be shared, instead of a new complete copy of the Tensor.

```cpp
static THStorage* THPStorage_(newFilenameStorage)(ptrdiff_t size) {
  int flags = TH_ALLOCATOR_MAPPED_SHAREDMEM | TH_ALLOCATOR_MAPPED_EXCLUSIVE;
  std::string handle = THPStorage_(__newHandle)();
  auto ctx = libshm_context_new(NULL, handle.c_str(), flags);
  return THStorage_(newWithAllocator)(size, &THManagedSharedAllocator, (void*)ctx);
}
```

(code from StorageSharing.cpp)

Note: when a method ends with an underscore in PyTorch, such as the method called `share_memory_()`, it means that the method has an in-place effect: it will change the current object instead of creating a new one with the modifications.

I’ll now show a Python example of one processing using the data from a Tensor that was allocated on another process by manually exchanging the shared memory handle:

This is executed in the process A:

```python
>>> import torch
>>> tensor_a = torch.ones((5, 5))
>>> tensor_a

 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 5x5]

>>> tensor_a.is_shared()
False
>>> tensor_a = tensor_a.share_memory_()
>>> tensor_a.is_shared()
True
>>> tensor_a_storage = tensor_a.storage()
>>> tensor_a_storage._share_filename_()
(b'/var/tmp/tmp.0.yowqlr', b'/torch_31258_1218748506', 25)
```

Code executed in the process B:

```python
>>> import torch
>>> tensor_a = torch.Tensor()
>>> tuple_info = (b'/var/tmp/tmp.0.yowqlr', b'/torch_31258_1218748506', 25)
>>> storage = torch.Storage._new_shared_filename(*tuple_info)
>>> tensor_a = torch.Tensor(storage).view((5, 5))

 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
 1  1  1  1  1
[torch.FloatTensor of size 5x5]
```

## DLPack: a hope for the Deep Learning frameworks Babel

Now I would like to talk about something recent in the PyTorch code base, which is called DLPack. DLPack is an open standardized in-memory tensor structure that will allow exchanging tensor data between frameworks, and what is quite interesting is that since this memory representation is standardized and very similar to the memory representation already in use by many frameworks, it will allow a zero-copy data sharing between frameworks, which is a quite amazing initiative given the variety of frameworks we have today without inter-communication among them.

The core of DLPack is a very simple structure called `DLTensor`, as shown below:

```c
/*!
 * \brief Plain C Tensor object, does not manage memory.
 */
typedef struct {
  /*!
   * \brief The opaque data pointer points to the allocated data.
   *  This will be CUDA device pointer or cl_mem handle in OpenCL.
   *  This pointer is always aligned to 256 bytes as in CUDA.
   */
  void* data;
  /*! \brief The device context of the tensor */
  DLContext ctx;
  /*! \brief Number of dimensions */
  int ndim;
  /*! \brief The data type of the pointer*/
  DLDataType dtype;
  /*! \brief The shape of the tensor */
  int64_t* shape;
  /*!
   * \brief strides of the tensor,
   *  can be NULL, indicating tensor is compact.
   */
  int64_t* strides;
  /*! \brief The offset in bytes to the beginning pointer to data */
  uint64_t byte_offset;
} DLTensor;
```

(code from dlpack.h)

There is also a managed version of the tensor, called `DLManagedTensor`, where frameworks can provide a context, as well as a “deleter” function that can be called by the framework that borrowed the tensor to inform the lending framework that the resources are no longer required.

In PyTorch, if you want to convert to or from a DLTensor format, you can find both C/C++ methods for doing that or even in Python you can do that as shown below:

```python
import torch
from torch.utils import dlpack

t = torch.ones((5, 5))
dl = dlpack.to_dlpack(t)
```

This Python function will call the `toDLPack` function from ATen, shown below:

```cpp
DLManagedTensor* toDLPack(const Tensor& src) {
  ATenDLMTensor* atDLMTensor(new ATenDLMTensor);
  atDLMTensor->handle = src;
  atDLMTensor->tensor.manager_ctx = atDLMTensor;
  atDLMTensor->tensor.deleter = &deleter;
  atDLMTensor->tensor.dl_tensor.data = src.data_ptr();

  int64_t device_id = 0;
  if (src.type().is_cuda()) {
    device_id = src.get_device();
  }

  atDLMTensor->tensor.dl_tensor.ctx = getDLContext(src.type(), device_id);
  atDLMTensor->tensor.dl_tensor.ndim = src.dim();
  atDLMTensor->tensor.dl_tensor.dtype = getDLDataType(src.type());
  atDLMTensor->tensor.dl_tensor.shape = const_cast<int64_t*>(src.sizes().data());
  atDLMTensor->tensor.dl_tensor.strides = const_cast<int64_t*>(src.strides().data());
  atDLMTensor->tensor.dl_tensor.byte_offset = 0;
  return &(atDLMTensor->tensor);
}
```
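And going in the other direction, here is a sketch of the round-trip back into PyTorch using `from_dlpack()`, which consumes the capsule produced above; since the representation is shared zero-copy, mutations through the new tensor are visible in the original:

```python
import torch
from torch.utils import dlpack

t = torch.ones((5, 5))
capsule = dlpack.to_dlpack(t)

# from_dlpack() consumes the capsule and returns a tensor that
# shares the same memory (zero-copy), so mutations are visible.
t2 = dlpack.from_dlpack(capsule)
t2[0, 0] = 7.0
print(t[0, 0].item())  # 7.0
```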

– Christian S. Perone

## Privacy-preserving sentence semantic similarity using InferSent embeddings and secure two-party computation

### Privacy-preserving Computation

Formally what we want is to jointly evaluate the following function:

$r = f(a, b)$

such that the private values $a$ and $b$ are held private to their sole owners, and where the result $r$ will be known to just one or both of the parties.

It seems very counterintuitive that a problem like that could ever be solved, but to the surprise of many people, it is possible to solve it under some security requirements. Thanks to recent developments in techniques such as FHE (Fully Homomorphic Encryption), Oblivious Transfer, and Garbled Circuits, problems like that started to become practical for real-life usage, and they are nowadays being used by many companies in applications such as information exchange, secure location, advertisement, satellite orbit collision avoidance, etc.

### Sentence similarity comparison

Another approach for this problem (the approach that we’ll be using) is to compare the sentences in the sentence embeddings space. We just need to create sentence embeddings using a Machine Learning model (we’ll use InferSent later), and then compare the embeddings of the sentences. However, this approach raises another concern: what if Bob or Alice trains a Seq2Seq model that could go from the embeddings of the other party back to an approximate description of the sentence?

### Generating sentence embeddings with InferSent

They used a bi-directional LSTM with attention that consistently surpassed many unsupervised training methods such as the SkipThought vectors. They also provide a PyTorch implementation that we’ll use to generate the sentence embeddings.

```python
import numpy as np
import torch

# Trained model from: https://github.com/facebookresearch/InferSent
GLOVE_EMBS = '../dataset/GloVe/glove.840B.300d.txt'
INFERSENT_MODEL = 'infersent.allnli.pickle'

# Load the trained InferSent model
model = torch.load(INFERSENT_MODEL,
                   map_location=lambda storage, loc: storage)

model.set_glove_path(GLOVE_EMBS)
model.build_vocab_k_words(K=100000)
```

$cos(\pmb x, \pmb y) = \frac {\pmb x \cdot \pmb y}{||\pmb x|| \cdot ||\pmb y||}$

Since we will normalize the embeddings to unit vectors, the denominator becomes 1 and the cosine similarity reduces to a simple dot product:

$cos(\hat{x}, \hat{y}) = \hat{x} \cdot \hat{y}$

```python
# This function will forward the text into the model and
# get the embeddings. After that, it will normalize it
# to a unit vector.
def encode(model, text):
    embedding = model.encode([text])[0]
    embedding /= np.linalg.norm(embedding)
    return embedding
```

```python
# This function will scale the embedding in order to
# remove the decimal point.
def scale(embedding):
    SCALE = 1 << 14
    scale_embedding = np.clip(embedding, 0.0, 1.0) * SCALE
    return scale_embedding.astype(np.int32)
```

Now we just need to create some sentence samples that we’ll be using:

```python
# Alice sentences
alice_sentences = [
    'my cat loves to walk over my keyboard',
    'I like to pet my cat',
]

# Bob sentences
bob_sentences = [
    'the cat is always walking over my keyboard',
]
```

```python
# Alice sentences
alice_sentence1 = encode(model, alice_sentences[0])
alice_sentence2 = encode(model, alice_sentences[1])

# Bob sentences
bob_sentence1 = encode(model, bob_sentences[0])
```

```python
>>> np.dot(bob_sentence1, alice_sentence1)
0.8798542

>>> np.dot(bob_sentence1, alice_sentence2)
0.62976325
```

Since we have now the embeddings, we just need to convert them to scaled integers:

```python
# Scale the Alice sentence embeddings
alice_sentence1_scaled = scale(alice_sentence1)
alice_sentence2_scaled = scale(alice_sentence2)

# Scale the Bob sentence embeddings
bob_sentence1_scaled = scale(bob_sentence1)

# This is the unit vector embedding for the sentence
>>> alice_sentence1
array([ 0.01698913, -0.0014404 ,  0.0010993 , ...,  0.00252409,
        0.00828147,  0.00466533], dtype=float32)

# This is the scaled vector as integers
>>> alice_sentence1_scaled
array([278,   0,  18, ...,  41, 135,  76], dtype=int32)
```

### Two-party secure computation

In order to perform secure computation between the two parties (Alice and Bob), we’ll use the ABY framework. ABY implements many different secure computation schemes and allows you to describe your computation as a circuit like the picture below, where Yao’s Millionaires’ problem is depicted:

As you can see, we have two inputs entering one GT GATE (greater-than gate) and then an output. This circuit has a bit length of 3 for each input and will compute whether the Alice input is greater than (GT GATE) the Bob input. The computing parties then secret-share their private data, and can use arithmetic sharing, boolean sharing, or Yao sharing to securely evaluate these gates.

ABY is quite easy to use, because you can simply describe your inputs, shares, and gates, and it will do the rest for you, such as creating the socket communication channel, exchanging data when needed, etc. However, the implementation is entirely written in C++, and I’m not aware of any Python bindings for it (a great contribution opportunity).

After that, we just need to execute the application on two different machines (or by emulating locally like below):

```bash
# This will execute the server part; -r 0 specifies the role (server)
# and -n 4096 defines the dimension of the vector (InferSent generates
# 4096-dimensional embeddings).
~# ./innerproduct -r 0 -n 4096

# The same on another process (or another machine; however, for an
# execution on another machine you obviously have to specify the IP).
~# ./innerproduct -r 1 -n 4096
```

```
Inner product of alice_sentence1 and bob_sentence1 = 226691917
Inner product of alice_sentence2 and bob_sentence1 = 171746521
```

Even in the integer representation, you can see that the inner product of Alice’s first sentence and Bob’s sentence is higher, meaning that the similarity is also higher. But let’s now convert this value back to float:

```python
>>> SCALE = 1 << 14

# This is the dot product we should get
>>> np.dot(alice_sentence1, bob_sentence1)
0.8798542

# This is the inner product we got in the secure computation
>>> 226691917 / SCALE**2.0
0.8444931

# This is the dot product we should get
>>> np.dot(alice_sentence2, bob_sentence1)
0.6297632

# This is the inner product we got in the secure computation
>>> 171746521 / SCALE**2.0
0.6398056
```

– Christian S. Perone

Cite this article as: Christian S. Perone, "Privacy-preserving sentence semantic similarity using InferSent embeddings and secure two-party computation," in Terra Incognita, 22/01/2018, //www.cpetem.com/2018/01/privacy-preserving-infersent/