Privacy-preserving sentence semantic similarity using InferSent embeddings and secure two-party computation

Privacy-preserving Computation

Privacy-preserving computation, or secure computation, is a sub-field of cryptography where two (two-party, or 2PC) or multiple (multi-party, or MPC) parties can evaluate a function together without revealing information about each party’s private input data to the others. The problem and the first solution to it were introduced in 1982 by Andrew Yao, in a breakthrough that later became known as “Yao’s Millionaires’ problem”.

To make the problem concrete: Alice has an amount A, such as $10, and Bob has an amount B, such as $50, and what they want to know is which one is larger, without Bob revealing the amount B to Alice or Alice revealing the amount A to Bob. It is also important to note that we don’t want to rely on a trusted third party; otherwise the problem would reduce to a simple protocol of information exchange with that party.

Formally what we want is to jointly evaluate the following function:

$r = f(A, B)$
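For the Millionaires’ problem, a minimal sketch of the function being evaluated could look like the snippet below. The name `millionaires` is just an illustrative choice, and this is only the plain function that a trusted third party would compute; the point of secure computation is to obtain the same result without anyone seeing both inputs.

```python
def millionaires(a: int, b: int) -> bool:
    """Plain (non-secure) version of f(A, B): True when A is larger than B."""
    return a > b


# Alice holds A = 10 and Bob holds B = 50, as in the example above.
r = millionaires(10, 50)
print(r)
```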

Sentence Similarity Comparison

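Another approach for this problem (this is the approach that we’ll be using) is to compare the sentences in the sentence embeddings space. We just need to create sentence embeddings using a Machine Learning model (we’ll use InferSent later) and then compare the embeddings of the sentences. However, this approach also raises another concern: what if Bob or Alice trains a Seq2Seq model that goes from the embeddings of the other party back to an approximate description of the project?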

It isn’t unreasonable to think that one can recover an approximate description of a sentence given its embedding. That’s why we’ll use two-party secure computation for computing the embeddings similarity, in a way that Bob and Alice will compute the similarity of the embeddings without revealing their embeddings, keeping their project ideas safe.

Generating sentence embeddings with InferSent

    import numpy as np
    import torch

    # Trained model from: https://github.com/facebookresearch/InferSent
    GLOVE_EMBS = '../dataset/GloVe/glove.840B.300d.txt'
    INFERSENT_MODEL = 'infersent.allnli.pickle'

    # Load trained InferSent model
    model = torch.load(INFERSENT_MODEL,
                       map_location=lambda storage, loc: storage)

    model.set_glove_path(GLOVE_EMBS)
    model.build_vocab_k_words(K=100000)

Now we need to define a similarity measure to compare two vectors, and for that goal, I’ll use the cosine similarity, since it’s pretty straightforward:

$cos(\pmb x, \pmb y) = \frac{\pmb x \cdot \pmb y}{||\pmb x|| \cdot ||\pmb y||}$

$cos(\hat{x}, \hat{y}) = \hat{x} \cdot \hat{y}$

So, if we normalize our vectors to have a unit norm (that’s why the vectors are wearing hats in the equation above), we can make the computation of the cosine similarity become just a simple dot product. That will help us a lot in computing the similarity distance later when we’ll use a framework to do the secure computation of this dot product.
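To see this reduction in action, here is a small sketch in plain Python (no NumPy, and with made-up 2-dimensional vectors rather than real embeddings). The helper names `dot`, `norm`, `cosine` and `normalize` are illustrative and not part of InferSent.

```python
import math


def dot(x, y):
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(x, x))


def cosine(x, y):
    """Cosine similarity using the full formula."""
    return dot(x, y) / (norm(x) * norm(y))


def normalize(x):
    """Scale a vector so that it has unit norm."""
    n = norm(x)
    return [a / n for a in x]


x = [3.0, 4.0]
y = [4.0, 3.0]

full = cosine(x, y)
# After normalization, the plain dot product gives the same value.
shortcut = dot(normalize(x), normalize(y))
print(full, shortcut)
```

Both values agree up to floating point rounding, which is why only a dot product has to be computed securely later on.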

So, the next step is to define a function that will take some sentence text and forward it to the model to generate the embeddings and then normalize them to unit vectors:

    # This function will forward the text into the model and
    # get the embeddings. After that, it will normalize it
    # to a unit vector.
    def encode(model, text):
        embedding = model.encode([text])[0]
        embedding /= np.linalg.norm(embedding)
        return embedding

Now, for practical reasons, I’ll be using integer computation later for computing the similarity; however, the embeddings generated by InferSent are of course real values. For that reason, you’ll see in the code below that we create another function to scale the float values and remove the radix point, converting them to integers. There is also another important issue: the framework that we’ll be using later for secure computation doesn’t allow signed integers, so we also need to clip the embedding values between 0.0 and 1.0. This will of course cause some approximation errors; however, we can still get very good approximations after clipping and scaling with limited precision (I’m using 14 bits for scaling to avoid overflow issues later during dot product computations):

    # This function will scale the embedding in order to
    # remove the radix point.
    def scale(embedding):
        SCALE = 1 << 14
        scale_embedding = np.clip(embedding, 0.0, 1.0) * SCALE
        return scale_embedding.astype(np.int32)
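To get a feeling for the two sources of approximation error (truncation and clipping), here is a rough pure-Python sketch of the same idea on tiny made-up unit vectors. The helpers `scale_list`, `int_dot` and `unscale` are hypothetical stand-ins for the NumPy version above, not code from the original pipeline.

```python
SCALE = 1 << 14  # same 14-bit scaling factor used above


def scale_list(values):
    """Clip each value to [0.0, 1.0], multiply by SCALE and truncate to int."""
    return [int(min(max(v, 0.0), 1.0) * SCALE) for v in values]


def int_dot(xs, ys):
    """Integer dot product, the only operation done on scaled values."""
    return sum(x * y for x, y in zip(xs, ys))


def unscale(value):
    """Both factors were scaled, so a dot product is divided by SCALE squared."""
    return value / SCALE ** 2


# Only positive components: the error comes from truncation alone.
a = [0.6, 0.8]
b = [0.8, 0.6]
approx = unscale(int_dot(scale_list(a), scale_list(b)))   # exact value is 0.96

# A negative component is clipped to zero, so its contribution is lost.
c = [0.6, -0.8]
clipped = unscale(int_dot(scale_list(c), scale_list(c)))  # exact value is 1.0

print(approx, clipped)
```

With only positive components the result stays very close to the exact dot product, while clipped negative components can shift it noticeably. Note also that every scaled component is at most 2^14, so a single product is at most 2^28, and for unit vectors the whole sum stays within that bound as well, which is presumably why 14 bits keep the result inside a 32-bit integer.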

Now we just need to create some sentence samples that we’ll be using:

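    # The list of Alice sentences
    alice_sentences = [
        'my cat loves to walk over my keyboard',
        'I like to pet my cat',
    ]

    # The list of Bob sentences
    bob_sentences = [
        'the cat is always walking over my keyboard',
    ]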

    # Alice sentences
    alice_sentence1 = encode(model, alice_sentences[0])
    alice_sentence2 = encode(model, alice_sentences[1])

    # Bob sentences
    bob_sentence1 = encode(model, bob_sentences[0])

    >>> np.dot(bob_sentence1, alice_sentence1)
    0.8798542
    >>> np.dot(bob_sentence1, alice_sentence2)
    0.62976325

As we can see, Bob’s first sentence is more similar to Alice’s first sentence (~0.88) than to Alice’s second sentence (~0.63).

Since we have now the embeddings, we just need to convert them to scaled integers:

    # Scale the Alice sentence embeddings
    alice_sentence1_scaled = scale(alice_sentence1)
    alice_sentence2_scaled = scale(alice_sentence2)

    # Scale the Bob sentence embeddings
    bob_sentence1_scaled = scale(bob_sentence1)

    # This is the unit vector embedding for the sentence
    >>> alice_sentence1
    array([ 0.01698913, -0.0014404 , 0.0010993 , ..., 0.00252409, 0.00828147, 0.00466533], dtype=float32)

    # This is the scaled vector as integers
    >>> alice_sentence1_scaled
    array([278, 0, 18, ..., 41, 135, 76], dtype=int32)

Now with these embeddings as scaled integers, we can proceed to the second part, where we’ll be doing the secure computation between two parties.

Two-party Secure Computation

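In order to perform secure computation between the two parties (Alice and Bob), we’ll use the ABY framework. ABY implements many different secure computation schemes and allows you to describe your computation as a circuit like the one pictured in the image below, where Yao’s Millionaires’ problem is depicted: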

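ABY is really easy to use because you can just describe your inputs, shares and gates, and it will do the rest for you, such as creating the socket communication channel, exchanging data when needed, etc. However, the implementation is entirely written in C++ and I’m not aware of any Python bindings for it (a great contribution opportunity).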

After that, we just need to execute the application on two different machines (or by emulating locally like below):

    # This will execute the server part, the -r 0 specifies the role (server)
    # and the -n 4096 defines the dimension of the vector (InferSent generates
    # 4096-dimensional embeddings).
    ~# ./innerproduct -r 0 -n 4096

    # And the same on another process (or another machine, however for another
    # machine execution you'll have to obviously specify the IP).
    ~# ./innerproduct -r 1 -n 4096

    Inner Product of alice_sentence1 and bob_sentence1 = 226691917
    Inner Product of alice_sentence2 and bob_sentence1 = 171746521

    >>> SCALE = 1 << 14

    # This is the dot product we should get
    >>> np.dot(alice_sentence1, bob_sentence1)
    0.8798542

    # This is the inner product we got on secure computation
    >>> 226691917 / SCALE**2.0
    0.8444931

    # This is the dot product we should get
    >>> np.dot(alice_sentence2, bob_sentence1)
    0.6297632

    # This is the inner product we got on secure computation
    >>> 171746521 / SCALE**2.0
    0.6398056

- Christian S. Perone

Nanopipe: connecting the modern babel

Hello everyone, I just released the Nanopipe project. Nanopipe is a library that allows you to connect different message queue systems (among other things) together. Nanopipe was built to avoid the glue code between different types of communication protocols/channels that is very common nowadays. An example of this: you have an application that is listening for messages on an AMQP broker (e.g. RabbitMQ), but you also have a Redis pub/sub source of messages and an MQTT source from a weird IoT device you may have. Using Nanopipe, you can connect both MQTT and Redis to RabbitMQ without writing any glue code for that. You can also build any kind of complex connection scheme using Nanopipe.

Simple and effective coin segmentation using Python and OpenCV

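In this example, I’ll show how to segment coins present in images or even in real-time video capture with a simple approach using thresholding, morphological operators, and contour approximation. This approach is a lot simpler than the approach using Otsu’s thresholding and Watershed segmentation described in the OpenCV Python tutorials, which I highly recommend you read due to its robustness. Unfortunately, the approach using Otsu’s thresholding is highly dependent on illumination normalization. One could extract small patches of the image to implement something similar to an adaptive Otsu’s binarization (like the one implemented in Leptonica, the framework used by Tesseract OCR) to overcome this problem, but let’s see another approach. For reference, see the output of Otsu’s thresholding on an image taken with my webcam under non-normalized illumination: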

1. Setting the Video Capture configuration

    import numpy as np
    import cv2

    cap = cv2.VideoCapture(0)
    cap.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 720)

In newer versions (unreleased yet), constants such as CV_CAP_PROP_FRAME_WIDTH are now in the cv2 module; for now, let’s just use the cv2.cv module.

    while True:
        ret, frame = cap.read()
        roi = frame[0:500, 0:500]
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)

3. Applying Adaptive Thresholding

    gray_blur = cv2.GaussianBlur(gray, (15, 15), 0)
    thresh = cv2.adaptiveThreshold(gray_blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 11, 1)

See the effect of the Gaussian Kernel in the image:

And now the effect of the Adaptive Thresholding with the blurry image:

Note that at this point we already have the coins segmented, except for the small noise inside the center of the coins and also in some places around them.

4. Morphology

Morphological Operators are used to dilate, erode and apply other operations on the pixels of the image. Here, since the camera can sometimes present some artifacts, we will use the Morphological Operation of Closing to make sure that the borders of the coins are always closed; otherwise, we may find a coin with a semi-circle or something like that. To understand the effect of the Closing operation (which is a dilation followed by an erosion of the dilated pixels) see the image below:

    kernel = np.ones((3, 3), np.uint8)
    closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=4)
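If you want to see what Closing does without OpenCV, here is a toy pure-Python sketch of a single 3x3 closing pass (dilation followed by erosion) on a tiny binary image. The `dilate` and `erode` helpers are simplified illustrations written for this example, not what `cv2.morphologyEx` does internally; pixels outside the image are simply ignored, and only one iteration is applied instead of the four used above.

```python
def _neighbourhood(img, y, x):
    """Values in the 3x3 window around (y, x), ignoring out-of-image pixels."""
    h, w = len(img), len(img[0])
    return [
        img[j][i]
        for j in range(max(0, y - 1), min(h, y + 2))
        for i in range(max(0, x - 1), min(w, x + 2))
    ]


def dilate(img):
    """A pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return [[max(_neighbourhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img):
    """A pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return [[min(_neighbourhood(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


# A "coin border" with one missing pixel on its right side.
rows = [
    ".........",
    ".........",
    "..#####..",
    "..#...#..",
    "..#......",
    "..#...#..",
    "..#####..",
    ".........",
    ".........",
]
img = [[1 if c == "#" else 0 for c in row] for row in rows]

closed = erode(dilate(img))
closed_rows = ["".join("#" if v else "." for v in row) for row in closed]
print("\n".join(closed_rows))
```

The gap in the right border gets filled, while the rest of the border keeps its original thickness.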

See now the effect of the Closing operation on our coins:

5. Contour detection and filtering

    cont_img = closing.copy()
    contours, hierarchy = cv2.findContours(cont_img, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)

After finding the contours, we need to iterate over each one and check its area, filtering out the contours whose area is greater or smaller than the area of a coin. We also need to fit an ellipse to each contour found. We could have done this using the minimum enclosing circle, but since my camera isn’t perfectly above the coins, the coins appear with a small inclination, describing an ellipse.

    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area < 2000 or area > 4000:
            continue

        if len(cnt) < 5:
            continue

        ellipse = cv2.fitEllipse(cnt)
        cv2.ellipse(roi, ellipse, (0, 255, 0), 2)
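The filtering logic itself can be tried out without a camera. The sketch below is a rough pure-Python stand-in: `polygon_area` uses the shoelace formula in place of `cv2.contourArea`, and `looks_like_coin` and `regular_polygon` are hypothetical helpers created only for this illustration. The thresholds are the same ones used in the loop above.

```python
import math


def polygon_area(points):
    """Area of a closed polygon given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def looks_like_coin(points, min_area=2000, max_area=4000, min_points=5):
    """Apply the same two filters as the loop above: point count and area."""
    if len(points) < min_points:
        return False
    return min_area <= polygon_area(points) <= max_area


def regular_polygon(radius, sides=8, center=(100.0, 100.0)):
    """Rough circular contour made of `sides` vertices."""
    cx, cy = center
    return [(cx + radius * math.cos(2 * math.pi * k / sides),
             cy + radius * math.sin(2 * math.pi * k / sides))
            for k in range(sides)]


coin = regular_polygon(30)    # area around 2545 pixels
noise = regular_polygon(5)    # far too small
blob = regular_polygon(60)    # far too large
print([looks_like_coin(c) for c in (coin, noise, blob)])
```

The area range obviously depends on the camera distance and resolution, so the values would need to be adjusted for a different setup.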

To show the final image with the contours we just use the imshow function to show a new window with the image:

    cv2.imshow('Final Result', roi)

    import numpy as np
    import cv2

    def run_main():
        cap = cv2.VideoCapture(0)
        cap.set(cv2.cv.CV_CAP_PROP_FRAME_WIDTH, 1280)
        cap.set(cv2.cv.CV_CAP_PROP_FRAME_HEIGHT, 720)

        while(True):
            ret, frame = cap.read()
            roi = frame[0:500, 0:500]
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)

            gray_blur = cv2.GaussianBlur(gray, (15, 15), 0)
            thresh = cv2.adaptiveThreshold(gray_blur, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                           cv2.THRESH_BINARY_INV, 11, 1)

            kernel = np.ones((3, 3), np.uint8)
            closing = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE,
                                       kernel, iterations=4)

            cont_img = closing.copy()
            contours, hierarchy = cv2.findContours(cont_img, cv2.RETR_EXTERNAL,
                                                   cv2.CHAIN_APPROX_SIMPLE)

            for cnt in contours:
                area = cv2.contourArea(cnt)
                if area < 2000 or area > 4000:
                    continue

                if len(cnt) < 5:
                    continue

                ellipse = cv2.fitEllipse(cnt)
                cv2.ellipse(roi, ellipse, (0, 255, 0), 2)

            cv2.imshow("Morphological Closing", closing)
            cv2.imshow("Adaptive Thresholding", thresh)
            cv2.imshow('Contours', roi)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()

    if __name__ == "__main__":
        run_main()

Accessing HP Cloud OpenStack Nova using Python and Requests

Here is a screenshot of the instance size set:

Since they are using OpenStack, I really think that they should have imported the vocabulary of OpenStack into the user interface: instead of calling it “Size”, it would be more sensible to use “Flavour”.

Let’s dig into the OpenStack API now.

OpenStack API

To access the OpenStack API you’ll need the credentials for the authentication. HP Cloud services provide these keys on the Manage interface for each zone/service you have; see the screenshot below (with keys anonymized, of course):

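Now, OpenStack authentication can be done using different schemes; the scheme that I know HP supports is token authentication. I know that there are a lot of clients already supporting the OpenStack API (some without documentation, some with weird API design, etc.), but the aim of this post is to show how easy it is to create a simple interface to access the OpenStack API using Python and Requests (HTTP for Humans!).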

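[enlighter lang="python"]
import requests
from requests.auth import AuthBase

class OpenStackAuth(AuthBase):
    # The class name, the constructor signature and the header names below
    # are reconstructed (assumed); those lines are missing from the source.
    def __init__(self, auth_user, auth_key):
        self.auth_key = auth_key
        self.auth_user = auth_user

    def __call__(self, r):
        r.headers['X-Auth-User'] = self.auth_user
        r.headers['X-Auth-Key'] = self.auth_key
        return r
[/enlighter]

[enlighter lang="python"]
ENDPOINT_URL = "https://az-1.region-a.geo-1.compute.hpcloudsvc.com/v1.1/"
ACCESS_KEY = "Your Access Key"
SECRET_KEY = "Your Secret Key"  # assumed: not present in the source
# Assumed reconstruction of the missing authentication request
response = requests.get(ENDPOINT_URL, auth=OpenStackAuth(ACCESS_KEY, SECRET_KEY))
[/enlighter]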

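[enlighter lang="python"]
class OpenStackAuthToken(AuthBase):
    def __init__(self, request):
        # Copy the token returned by the authentication response
        self.auth_token = request.headers['x-auth-token']

    def __call__(self, r):
        # Set the token on every outgoing request
        r.headers['X-Auth-Token'] = self.auth_token
        return r
[/enlighter]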

Note that the OpenStackAuthToken now receives the authentication response as a parameter, copying its X-Auth-Token and setting it on each new request.
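The mechanism is simple enough to try without any network access. In the sketch below, `TokenAuth` is a hypothetical stand-in for the token scheme, and `SimpleNamespace` objects play the role of the response and of the outgoing request; the token value is made up. Real Requests headers are case-insensitive, while the plain dictionaries used here are not.

```python
from types import SimpleNamespace


class TokenAuth:
    """Stand-in for the token scheme: copy a token, then set it on requests."""

    def __init__(self, response):
        self.auth_token = response.headers["x-auth-token"]

    def __call__(self, r):
        # The HTTP library calls the auth object with the outgoing request.
        r.headers["X-Auth-Token"] = self.auth_token
        return r


# Fake authentication response carrying a made-up token.
auth_response = SimpleNamespace(headers={"x-auth-token": "example-token"})

# Fake outgoing request, before authentication is applied.
outgoing = SimpleNamespace(headers={})

prepared = TokenAuth(auth_response)(outgoing)
print(prepared.headers)
```

Because the auth object is just a callable that mutates the request, the same instance can be reused for every subsequent API call.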

Let’s consume a service from the OpenStack API v1.1. I’m going to call the List Servers API function, parse the results using JSON and then show them on the screen:

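[enlighter lang="python"]
import json

# Get the management URL from the response header
# (header name assumed; the original line is missing from the source)
mgmt_url = response.headers['x-server-management-url']

# Create a new request to the management URL using the /servers path
# and the OpenStackAuthToken scheme we created
r_server = requests.get(mgmt_url + '/servers', auth=OpenStackAuthToken(response))

# Parse the response and show it to the screen
json_parse = json.loads(r_server.text)
print(json.dumps(json_parse, indent=4))
[/enlighter]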

[enlighter]
{
    "servers": [
        {
            "id": 22378,
            "uuid": "e2964d51-fe98-48f3-9428-f3083aa0318e",
            "links": [
                {
                    "href": "https://az-1.region-a.geo-1.compute.hpcloudsvc.com/v1.1/20817201684751/servers/22378",
                    "rel": "self"
                },
                {
                    "href": "https://az-1.region-a.geo-1.compute.hpcloudsvc.com/20817201684751/servers/22378",
                    "rel": "bookmark"
                }
            ],
            "name": "Server 22378"
        },
        {
            "id": 11921,
            "uuid": "312ff473-3d5d-433e-b7ee-e46e4efa0e5e",
            "links": [
                {
                    "href": "https://az-1.region-a.geo-1.compute.hpcloudsvc.com/v1.1/20817201684751/servers/11921",
                    "rel": "self"
                },
                {
                    "href": "https://az-1.region-a.geo-1.compute.hpcloudsvc.com/20817201684751/servers/11921",
                    "rel": "bookmark"
                }
            ],
            "name": "Server 11921"
        }
    ]
}
[/enlighter]

And that is it, now you know how to use Requests and Python to consume the OpenStack API. If you wish to read more about the API and how it works, you can read the documentation here.

- Christian S. Perone

C++11 user-defined literals and some constructions

Introduction to user-defined literals

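C++03 has some literals, like the “f” in “12.2f”, which converts the double value to float. The problem is that these literals aren’t very flexible since they’re fixed, so you can’t change them or create new ones. To overcome this situation, C++11 introduced the concept of “user-defined literals”, which gives the user the ability to create new custom literal modifiers. The new user-defined literals can create either built-in types (e.g. int) or user-defined types (e.g. classes), and what makes them really useful is that they can return objects instead of only primitives.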

[enlighter lang="C++"]
OutputType operator "" _suffix(const char *literal_string);
[/enlighter]

... in the case of a literal string. The OutputType is anything you want (object or primitive) and “_suffix” is the name of the literal modifier. It isn’t required to use the underscore in front of it, but if you don’t, you’ll get some warnings telling you that suffixes not preceded by an underscore are reserved for future standardization.

Examples

Kmh to Mph converter

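[enlighter lang="C++" escaped="true" lines="1000"]
#include <iostream>

// stupid converter class
class Converter
{
public:
    Converter(double kmph) : m_kmph(kmph) {};
    ~Converter() {};

    // method name taken from the comment in main(); signature assumed
    double to_mph(void)
    { return m_kmph / 1.609344; }

private:
    double m_kmph;
};

// user-defined literal
// (suffix name "_kmph" is assumed; the original declaration line is missing)
Converter operator "" _kmph(long double kmph)
{ return Converter(kmph); }

int main(void)
{
    std::cout << (80.0_kmph).to_mph() << std::endl;
    // note that I'm using parentheses in order to
    // be able to call the "to_mph" method
    return 0;
}
[/enlighter]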

std::string literal

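[enlighter lang="C++" escaped="true" lines="1000"]
#include <string>

std::string operator "" s (const char* p, size_t n)
{ return std::string(p,n); }

int main(void)
{
    // illustrative usage; the original body is missing from the source
    std::string str = "convert me to a string"s;
}
[/enlighter]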

system() call

[enlighter lang="C++" escaped="true" lines="1000"]
#include <cstdlib>

int operator "" ex(const char *cmd, size_t num_chars)
{ return system(cmd); }

int main(void)
{
    "ls -lah"ex;
    return 0;
}
[/enlighter]

alias and std::map

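[enlighter lang="C++" escaped="true" lines="1000"]
#include <iostream>
#include <map>
#include <string>

// alias and global map: reconstructed (assumed), missing from the source
typedef std::map<std::string, int> MyMap;

MyMap create_map()
{
    MyMap m;
    m["lol"] = 7;  // hypothetical entry
    return m;
}

MyMap m = create_map();

int& operator "" m(const char *key, size_t length)
{ return m[key]; }

int main(void)
{
    // illustrative usage; the original body is missing from the source
    std::cout << "lol"m << std::endl;
}
[/enlighter]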

References

Wikipedia :: C++11 (User-defined literals)

Pyevolve on Sony PSP ! (genetic algorithms on PSP)

Well, I’ve tested the Pyevolve GA framework on Stackless Python for PSP and, to my surprise, it worked without changing a single line of code in the framework, because Pyevolve is written in pure Python (except for platform-specific features like the Interactive Mode, but that feature is automatically disabled on unsupported platforms).

So now, we have Genetic Algorithms on PSP.

PSP Stackless Python Installation

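1) First, create a directory called “python” on your PSP under “/PSP/GAME” and the directory structure “/python/site-packages/” in the root of your Memory Stick (this last directory will be used later to put Pyevolve).
2) Copy the EBOOT.PBP and python.zip files to this created directory;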

Pyevolve Installation

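3) Download the Pyevolve source and copy the directory called “pyevolve” to the created directory “/python/site-packages/”; the final directory structure will be: “/python/site-packages/pyevolve”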

Ready! Now you can import Pyevolve modules inside scripts on your PSP. Of course you can’t use the graphical plotting tool or some of the DB Adapters of Pyevolve, but the GA Core is working very well.