
### Markov's Inequality

$$\underbrace{P(X \geq \alpha)}_{\text{probability of being greater than the constant } \alpha} \leq \underbrace{\frac{\mathbb{E}\left[X\right]}{\alpha}}_{\text{bounded above by the expectation over the constant } \alpha}$$

Example: a grocery store sells on average 40 beers per day (it's summer!). What is the probability that it will sell 80 or more beers tomorrow?

$$\begin{align} P(X \geq \alpha) & \leq \frac{\mathbb{E}\left[X\right]}{\alpha} \\\\ P(X \geq 80) & \leq \frac{40}{80} = 0.5 = 50\% \end{align}$$
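
As a quick sanity check, the bound can be verified by simulation. Here I assume (hypothetically) that daily sales follow an exponential distribution with mean 40; Markov's inequality only requires the variable to be non-negative, so any such distribution would do:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: daily beer sales as an exponential variable with
# mean 40 (Markov's inequality only requires X to be non-negative).
sales = rng.exponential(scale=40.0, size=100_000)

empirical = np.mean(sales >= 80)   # simulated P(X >= 80)
markov_bound = sales.mean() / 80   # E[X] / alpha

print(empirical, markov_bound)     # the bound holds, but is quite loose
assert empirical <= markov_bound
```

For this particular distribution the true tail probability is around $$e^{-2} \approx 0.135$$, well below the 0.5 that Markov's inequality guarantees, which illustrates how loose the bound can be.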

### Chebyshev's Inequality

When we have information about the underlying distribution of a random variable, we can take advantage of properties of this distribution to know more about the concentration of this variable. Let’s take for example a normal distribution with mean $$\mu = 0$$ and unit standard deviation $$\sigma = 1$$ given by the probability density function (PDF) below:

$$f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$$

$$P( \mid X - \mu \mid \geq k\sigma) \leq \frac{1}{k^2}$$

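
A quick simulation with the standard normal distribution above shows the bound holding (and being quite conservative for this distribution):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # mu = 0, sigma = 1

# Chebyshev's bound 1/k^2 holds for any distribution with finite variance;
# for the normal distribution it is very conservative.
for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x) >= k)
    print(k, empirical, 1.0 / k**2)
    assert empirical <= 1.0 / k**2
```

At $$k = 2$$, for example, Chebyshev guarantees at most 25% of the mass beyond two standard deviations, while the normal distribution actually places only about 4.6% there.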

### Chebyshev's Inequality and the Weak Law of Large Numbers

Chebyshev's inequality can also be used to prove the weak law of large numbers, which says that the sample mean converges in probability to the true mean.

That can be done as follows:

• Consider a sequence of i.i.d. (independent and identically distributed) random variables $$X_1, X_2, X_3, \ldots$$ with mean $$\mu$$ and variance $$\sigma^2$$;
• The sample mean is $$M_n = \frac{X_1 + \ldots + X_n}{n}$$ and the true mean is $$\mu$$;
• For the expectation of the sample mean we have: $$\mathbb{E}\left[M_n\right] = \frac{\mathbb{E}\left[X_1\right] + \ldots + \mathbb{E}\left[X_n\right]}{n} = \frac{n\mu}{n} = \mu$$
• For the variance of the sample mean we have: $$Var\left[M_n\right] = \frac{Var\left[X_1\right] + \ldots + Var\left[X_n\right]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
• By the application of Chebyshev's inequality we have: $$P(\mid M_n - \mu \mid \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}$$ for any (fixed) $$\epsilon > 0$$. As $$n$$ increases, the right side of the inequality goes to zero. Intuitively, this means that for a large $$n$$ the distribution of $$M_n$$ will be concentrated around $$\mu$$.
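
The steps above can be sketched with a simulation. Here I assume (arbitrarily) normally distributed variables with $$\mu = 40$$ and $$\sigma = 10$$, and estimate the deviation probability for increasing $$n$$:

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, eps = 40.0, 10.0, 1.0

# As n grows, the probability that the sample mean deviates from mu by
# more than eps is bounded by sigma^2 / (n * eps^2), which goes to zero.
for n in (10, 100, 1000):
    samples = rng.normal(mu, sigma, size=(20_000, n))
    m_n = samples.mean(axis=1)                   # 20000 sample means of size n
    empirical = np.mean(np.abs(m_n - mu) >= eps)
    bound = sigma**2 / (n * eps**2)
    print(n, empirical, bound)
    assert empirical <= bound
```

By $$n = 1000$$ the bound is already 0.1, and the simulated deviation probability is far smaller still.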

### Improving on Markov’s and Chebyshev’s with Chernoff Bounds

$$P(A \cap B) = P(A)P(B) \\ P(A \cap C) = P(A)P(C) \\ P(B \cap C) = P(B)P(C)$$

$$P(A \cap B \cap C) = P(A)P(B)P(C)$$
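
The distinction the two sets of equations above draw is between pairwise and mutual independence: the three pairwise products can all hold while the joint one fails. A classic counterexample can be checked exhaustively with two fair coins, where $$C$$ is the event that the two coins agree:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin flips; every outcome has probability 1/4.
omega = set(product([0, 1], repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 1}          # first coin is heads
B = {w for w in omega if w[1] == 1}          # second coin is heads
C = {w for w in omega if w[0] == w[1]}       # the two coins agree

# Pairwise independent...
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)

# ...but not mutually independent: P(A n B n C) = 1/4, not 1/8.
assert P(A & B & C) != P(A) * P(B) * P(C)
```

This is why Chernoff-style arguments, which multiply expectations across variables, require full (mutual) independence rather than just pairwise independence.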

## Privacy-preserving sentence semantic similarity using InferSent embeddings and secure two-party computation

### Privacy-preserving computation

$r = f(A, B)$

It seems very counterintuitive that a problem like that could ever be solved, but to the surprise of many people, it is possible to solve it under some security requirements. Thanks to recent developments in techniques such as FHE (Fully Homomorphic Encryption), Oblivious Transfer, and Garbled Circuits, such problems are starting to become practical for real-life use, and they are nowadays being employed by many companies in applications such as information exchange, secure location, advertising, satellite orbit collision avoidance, etc.

I'm not going to enter into details of these techniques, but if you're interested in the intuition behind OT (Oblivious Transfer), you should definitely read the amazing explanation done by Craig Gidney here. There are also, of course, many different protocols for doing 2PC or MPC, where each one of them assumes some security requirements (semi-honest, malicious, etc.). I'm not going to enter into the details, to keep the post focused on the goal, but you should be aware of that.

### Sentence similarity comparison

Now, how can we exchange information about Bob's and Alice's project sentences without disclosing information about the project descriptions?

One naive way to do that would be to just compute the hashes of the sentences and then compare only the hashes to check if they match. However, this would assume that the descriptions are exactly the same, and besides that, if the entropy of the sentences is small (like small sentences), someone with reasonable computation power can try to recover the sentence.
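
A minimal sketch of that naive hashing approach (using SHA-256 from Python's standard library) makes the limitation obvious: the two sentences below are semantically close, but their digests are unrelated:

```python
import hashlib

def sentence_digest(sentence):
    # Hash a (trivially normalized) sentence; only exact matches collide.
    return hashlib.sha256(sentence.strip().lower().encode('utf-8')).hexdigest()

a = sentence_digest('my cat loves to walk over my keyboard')
b = sentence_digest('the cat is always walking over my keyboard')

print(a == b)  # equality of hashes only captures exact equality
```

What we want instead is a comparison that degrades gracefully with semantic distance, which is where sentence embeddings come in.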

### Generating sentence embeddings with InferSent

InferSent is an NLP technique for universal sentence representation developed by Facebook that uses supervised training to produce highly transferable representations.

Note: even if you don’t have GPU, you can have reasonable performance doing embeddings for a few sentences.

```python
import numpy as np
import torch

# Trained model from: https://github.com/facebookresearch/InferSent
GLOVE_EMBS = '../dataset/GloVe/glove.840B.300d.txt'
INFERSENT_MODEL = 'infersent.allnli.pickle'

# Load trained InferSent model
model = torch.load(INFERSENT_MODEL,
                   map_location=lambda storage, loc: storage)
model.set_glove_path(GLOVE_EMBS)
model.build_vocab_k_words(K=100000)
```

$Cos(\pmb x, \pmb y) = \frac {\pmb x \cdot \pmb y}{||\pmb x|| \cdot ||\pmb y||}$

$cos(\hat{x}, \hat{y}) = \hat{x} \cdot \hat{y}$
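
Since the embeddings are normalized to unit norm, the cosine similarity reduces to a plain dot product, which is exactly what the secure inner product below will compute. A quick numpy check of that equivalence:

```python
import numpy as np

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 1.0, 0.5])

# Normalize to unit vectors, as encode() will do below:
x_hat = x / np.linalg.norm(x)
y_hat = y / np.linalg.norm(y)

# cos(x, y) == x_hat . y_hat
assert np.isclose(cosine(x, y), np.dot(x_hat, y_hat))
```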

```python
# This function will forward the text into the model and
# get the embeddings. After that, it will normalize it
# to a unit vector.
def encode(model, text):
    embedding = model.encode([text])[0]
    embedding /= np.linalg.norm(embedding)
    return embedding
```

```python
# This function will scale the embedding in order to
# remove the radix point.
def scale(embedding):
    SCALE = 1 << 14
    scale_embedding = np.clip(embedding, 0.0, 1.0) * SCALE
    return scale_embedding.astype(np.int32)
```

```python
# The list of Alice sentences
alice_sentences = [
    'my cat loves to walk over my keyboard',
    'I like to pet my cat',
]

# The list of Bob sentences
bob_sentences = [
    'the cat is always walking over my keyboard',
]
```

```python
# Alice sentences
alice_sentence1 = encode(model, alice_sentences[0])
alice_sentence2 = encode(model, alice_sentences[1])

# Bob sentences
bob_sentence1 = encode(model, bob_sentences[0])
```

```python
>>> np.dot(bob_sentence1, alice_sentence1)
0.8798542

>>> np.dot(bob_sentence1, alice_sentence2)
0.62976325
```

```python
# Scale the Alice sentence embeddings
alice_sentence1_scaled = scale(alice_sentence1)
alice_sentence2_scaled = scale(alice_sentence2)

# Scale the Bob sentence embeddings
bob_sentence1_scaled = scale(bob_sentence1)

# This is the unit vector embedding for the sentence
>>> alice_sentence1
array([ 0.01698913, -0.0014404 ,  0.0010993 , ...,  0.00252409,
        0.00828147,  0.00466533], dtype=float32)

# This is the scaled vector as integers
>>> alice_sentence1_scaled
array([278,   0,  18, ...,  41, 135,  76], dtype=int32)
```

### Two-party secure computation

ABY is very easy to use because you can describe your inputs, shares, and gates, and it will do the rest for you, such as creating the socket communication channel and exchanging data when needed. However, the implementation is entirely written in C++, and I'm not aware of any Python bindings for it (a great contribution opportunity).

Fortunately, there is an implemented example for ABY that can do dot product calculation for us; the example is here. I won't replicate the example here, but the only part we have to change is to read the embedding vectors that we created before instead of generating random vectors, and to increase the bit length to 32 bits.

```
# This will execute the server part; the -r 0 specifies the role (server)
# and the -n 4096 defines the dimension of the vector (InferSent generates
# 4096-dimensional embeddings).
~# ./innerproduct -r 0 -n 4096

# And the same on another process (or another machine; however, for another
# machine execution you'll obviously have to specify the IP).
~# ./innerproduct -r 1 -n 4096
```

```
Inner product of alice_sentence1 and bob_sentence1 = 226691917
Inner product of alice_sentence2 and bob_sentence1 = 171746521
```

```python
>>> SCALE = 1 << 14

# This is the dot product we should get
>>> np.dot(alice_sentence1, bob_sentence1)
0.8798542

# This is the inner product we got in the secure computation
>>> 226691917 / SCALE**2.0
0.8444931

# This is the dot product we should get
>>> np.dot(alice_sentence2, bob_sentence1)
0.6297632

# This is the inner product we got in the secure computation
>>> 171746521 / SCALE**2.0
0.6398056
```

– Christian S. Perone

## New prime on the block

GIMPS (the Great Internet Mersenne Prime Search) confirmed yesterday the new largest known prime: $2^{77232917} - 1$. This new largest known prime has 23,249,425 digits and is, of course, a Mersenne prime, a prime expressed in the form $2^n - 1$, whose primality can be computed efficiently using the Lucas-Lehmer primality test.
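
As a side note, the Lucas-Lehmer recurrence itself is remarkably short; a minimal sketch in Python (real GIMPS runs use FFT-based multiplication to make each squaring feasible at this size):

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2^p - 1 is prime (p an odd prime).

    M_p is prime iff s_{p-2} == 0, where s_0 = 4 and
    s_i = s_{i-1}^2 - 2 (mod M_p).
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# 2^13 - 1 = 8191 is a Mersenne prime, while 2^11 - 1 = 2047 = 23 * 89 is not.
assert lucas_lehmer(13)
assert not lucas_lehmer(11)
```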

```python
>>> import numpy as np
>>> a = 2
>>> b = 77232917
>>> num_digits = int(1 + b * np.log10(a))
>>> print(num_digits)
23249425
```

$2^1 \equiv 2 \pmod{10}$
$2^2 \equiv 4 \pmod{10}$
$2^3 \equiv 8 \pmod{10}$
$2^4 \equiv 6 \pmod{10}$
$2^5 \equiv 2 \pmod{10}$
$2^6 \equiv 4 \pmod{10}$
(... repeats)

Which means that the powers of 2 mod 10 repeat every 4 numbers; thus we just need to compute 77,232,917 mod 4, which is 1. Given that $2^{77232917} \equiv 2^1 \pmod{10}$, the number $2^{77232917}$ ends in 2, and when you subtract 1 you end up with 1 as the last digit, as you can confirm by looking at the entire number (~10 MB zip file).
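
The cycle argument can also be confirmed directly with Python's three-argument pow(), which performs fast modular exponentiation and handles an exponent of this size instantly:

```python
# Powers of 2 mod 10 cycle with period 4, and 77232917 mod 4 == 1:
assert 77232917 % 4 == 1

# So the last digit of 2^77232917 is 2:
assert pow(2, 77232917, 10) == 2

# And subtracting 1 leaves 1 as the last digit of the prime:
assert (pow(2, 77232917, 10) - 1) % 10 == 1
```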

– Christian S. Perone

## Benford's Law - Index

Since Benford's law got some attention in the past years, I decided to make a list of the previous posts I made on the subject in the context of elections, fraud, corruption, universality and prime numbers:

Despesas de Custeio e a Lei de Benford (June 2014 - in Portuguese)

Prime Numbers and the Benford's Law (May 2009)

Delicious.com, checking user numbers against Benford's Law (April 2009)

– Christian S. Perone

## Deep learning - Convolutional neural networks and feature extraction with Python

Convolutional neural networks (or ConvNets) are biologically-inspired variants of MLPs; they have different kinds of layers, and each layer works differently than the usual MLP layers. If you are interested in learning more about ConvNets, a good course is CS231n – Convolutional Neural Networks for Visual Recognition. The architecture of the CNNs is shown in the image below:

### Loading the MNIST dataset

The MNIST dataset is one of the most traditional datasets for digits classification. We will use a pickled version of it for Python, but first, let's import the packages that we will need:

```python
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from urllib import urlretrieve
import cPickle as pickle
import os
import gzip
import numpy as np
import theano
import lasagne
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from nolearn.lasagne import visualize
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
```

```python
def load_dataset():
    url = 'http://deeplearning.net/data/mnist/mnist.pkl.gz'
    filename = 'mnist.pkl.gz'
    if not os.path.exists(filename):
        print("Downloading MNIST dataset...")
        urlretrieve(url, filename)

    with gzip.open(filename, 'rb') as f:
        data = pickle.load(f)

    X_train, y_train = data[0]
    X_val, y_val = data[1]
    X_test, y_test = data[2]

    X_train = X_train.reshape((-1, 1, 28, 28))
    X_val = X_val.reshape((-1, 1, 28, 28))
    X_test = X_test.reshape((-1, 1, 28, 28))

    y_train = y_train.astype(np.uint8)
    y_val = y_val.astype(np.uint8)
    y_test = y_test.astype(np.uint8)

    return X_train, y_train, X_val, y_val, X_test, y_test
```

```python
X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
plt.imshow(X_train[0][0], cmap=cm.binary)
```

### ConvNet architecture and training

```python
net1 = NeuralNet(
    layers=[('input', layers.InputLayer),
            ('conv2d1', layers.Conv2DLayer),
            ('maxpool1', layers.MaxPool2DLayer),
            ('conv2d2', layers.Conv2DLayer),
            ('maxpool2', layers.MaxPool2DLayer),
            ('dropout1', layers.DropoutLayer),
            ('dense', layers.DenseLayer),
            ('dropout2', layers.DropoutLayer),
            ('output', layers.DenseLayer),
            ],
    # input layer
    input_shape=(None, 1, 28, 28),
    # layer conv2d1
    conv2d1_num_filters=32,
    conv2d1_filter_size=(5, 5),
    conv2d1_nonlinearity=lasagne.nonlinearities.rectify,
    conv2d1_W=lasagne.init.GlorotUniform(),
    # layer maxpool1
    maxpool1_pool_size=(2, 2),
    # layer conv2d2
    conv2d2_num_filters=32,
    conv2d2_filter_size=(5, 5),
    conv2d2_nonlinearity=lasagne.nonlinearities.rectify,
    # layer maxpool2
    maxpool2_pool_size=(2, 2),
    # dropout1
    dropout1_p=0.5,
    # dense
    dense_num_units=256,
    dense_nonlinearity=lasagne.nonlinearities.rectify,
    # dropout2
    dropout2_p=0.5,
    # output
    output_nonlinearity=lasagne.nonlinearities.softmax,
    output_num_units=10,
    # optimization method params
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,
    max_epochs=10,
    verbose=1,
    )

# Train the network
nn = net1.fit(X_train, y_train)
```

```
# Neural Network with 160362 learnable parameters

## Layer information

#    name      size
---  --------  --------
  0  input     1x28x28
  1  conv2d1   32x24x24
  2  maxpool1  32x12x12
  3  conv2d2   32x8x8
  4  maxpool2  32x4x4
  5  dropout1  32x4x4
  6  dense     256
  7  dropout2  256
  8  output    10

epoch  train loss  valid loss  train/val  valid acc  dur
-----  ----------  ----------  ---------  ---------  ---
    1     0.85204     0.16707    5.09977    0.95174  33.71s
    2     0.27571     0.10732    2.56896    0.96825  33.34s
    3     0.20262     0.08567    2.36524    0.97488  33.51s
    4     0.16551     0.07695    2.15081    0.97705  33.50s
    5     0.14173     0.06803    2.08322    0.98061  34.38s
    6     0.12519     0.06067    2.06352    0.98239  34.02s
    7     0.11077     0.05532    2.00254    0.98427  33.78s
    8     0.10497     0.05771    1.81898    0.98248  34.17s
    9     0.09881     0.05159    1.91509    0.98407  33.80s
   10     0.09264     0.04958    1.86864    0.98526  33.40s
```

### Prediction and confusion matrix

Now we can use the model to predict the entire testing dataset:

```python
preds = net1.predict(X_test)
```

```python
cm = confusion_matrix(y_test, preds)
plt.matshow(cm)
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()
```

### Visualization of the filters

We can also visualize the 32 filters from the first convolutional layer:

```python
visualize.plot_conv_weights(net1.layers_['conv2d1'])
```

### Theano layer functions and feature extraction

Now it is time to create theano-compiled functions that will feed-forward the input data into the architecture up to the layer you're interested in. I'm going to get the functions for the output layer and also for the dense layer before the output layer:

```python
dense_layer = layers.get_output(net1.layers_['dense'], deterministic=True)
output_layer = layers.get_output(net1.layers_['output'], deterministic=True)
input_var = net1.layers_['input'].input_var

f_output = theano.function([input_var], output_layer)
f_dense = theano.function([input_var], dense_layer)
```

```python
instance = X_test[0][None, :, :]

%timeit -n 500 f_output(instance)
500 loops, best of 3: 858 µs per loop
```

```python
pred = f_output(instance)
N = pred.shape[1]
plt.bar(range(N), pred.ravel())
```

```python
pred = f_dense(instance)
N = pred.shape[1]
plt.bar(range(N), pred.ravel())
```

I hope you enjoyed the tutorial !

## Google's S2, geometry on the sphere, cells and Hilbert curve

### The cells

• They are compact (represented by 64-bit integers)
• They have resolution for geographical features
• They are hierarchical (they have levels, and similar levels have similar areas)
• Containment queries for arbitrary regions are really fast

### Hilbert Curve

In the image below, the point at the very beginning of the Hilbert curve (the string) is also located at the very beginning along the curve (the curve is represented by a long string at the bottom of the image):

Now in the image below, where we have more points, it is easy to see how the Hilbert curve preserves spatial locality. You can note that points close to each other on the curve (in the 1D representation, the line at the bottom) are also close in 2D space (in the x,y plane). However, note that the opposite isn't quite true: you can have 2D points that are close to each other in the x,y plane but not close on the Hilbert curve.
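
That locality property can be checked with the standard iterative index-to-coordinate mapping (the d2xy algorithm popularized by Wikipedia's Hilbert curve article, sketched here for illustration): consecutive positions along the 1D string always land on adjacent 2D cells:

```python
def d2xy(n, d):
    """Map a distance d along the Hilbert curve to (x, y) on an n x n grid
    (n a power of two). Standard iterative algorithm."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                 # rotate the quadrant as needed
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Locality along the string: consecutive indices are always adjacent cells.
pts = [d2xy(8, d) for d in range(64)]
for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
    assert abs(x0 - x1) + abs(y0 - y1) == 1
```

The converse fails, as the text notes: two cells that touch in the plane can sit far apart along the string.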

### Example

* In this tutorial I used the Python 2.7 bindings from the following repository. The instructions to compile and install it are present in the README of the repository, so I won't repeat them here.

```python
>>> import s2
>>> latlng = s2.S2LatLng.FromDegrees(-30.043800, -51.140220)
>>> cell = s2.S2CellId.FromLatLng(latlng)
>>> cell.level()
30
>>> cell.id()
10743750136202470315
>>> cell.ToToken()
951977d377e723ab
```

You can also get the parent cell of that cell (one level above it) and use containment methods to check if a cell is contained by another cell:

```python
>>> parent = cell.parent()
>>> print parent.level()
29
>>> parent.id()
10743750136202470316
>>> parent.ToToken()
951977d377e723ac
>>> cell.contains(parent)
False
>>> parent.contains(cell)
True
```
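
Under the hood, the hierarchy is pure bit arithmetic on the 64-bit id: the position of the lowest set bit encodes the level, and the parent is obtained by widening that bit by two positions. A sketch of that trick (my reading of how S2CellId works, checked against the ids from the session above):

```python
def lsb(cell_id):
    # Lowest set bit of the 64-bit cell id; its position encodes the level.
    return cell_id & -cell_id

def level(cell_id):
    # Level 30 is a leaf cell (lsb at bit 0); each level up shifts it by 2.
    return 30 - (lsb(cell_id).bit_length() - 1) // 2

def parent_id(cell_id):
    # Widen the lsb by two bits and clear everything below it.
    new_lsb = lsb(cell_id) << 2
    return (cell_id & -new_lsb) | new_lsb

cell = 10743750136202470315     # the leaf cell from the session above
assert level(cell) == 30
assert parent_id(cell) == 10743750136202470316
assert level(parent_id(cell)) == 29
```

This is why containment checks and parent lookups are so cheap: no geometry is involved, only integer masking.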

```python
>>> region_rect = S2LatLngRect(
        S2LatLng.FromDegrees(-51.264871, -30.241701),
        S2LatLng.FromDegrees(-51.04618, -30.000003))
>>> coverer = S2RegionCoverer()
>>> coverer.set_min_level(8)
>>> coverer.set_max_level(15)
>>> coverer.set_max_cells(500)
>>> covering = coverer.GetCovering(region_rect)
```

```python
import matplotlib.pyplot as plt
from s2 import *
from shapely.geometry import Polygon
import cartopy.crs as ccrs
import cartopy.io.img_tiles as cimgt

proj = cimgt.MapQuestOSM()
plt.figure(figsize=(20, 20), dpi=200)
ax = plt.axes(projection=proj.crs)
ax.add_image(proj, 12)
ax.set_extent([-51.411886, -50.922470, -30.301314, -29.94364])

region_rect = S2LatLngRect(
    S2LatLng.FromDegrees(-51.264871, -30.241701),
    S2LatLng.FromDegrees(-51.04618, -30.000003))

coverer = S2RegionCoverer()
coverer.set_min_level(8)
coverer.set_max_level(15)
coverer.set_max_cells(500)
covering = coverer.GetCovering(region_rect)

geoms = []
for cellid in covering:
    new_cell = S2Cell(cellid)
    vertices = []
    for i in xrange(0, 4):
        vertex = new_cell.GetVertex(i)
        latlng = S2LatLng(vertex)
        vertices.append((latlng.lat().degrees(),
                         latlng.lng().degrees()))
    geo = Polygon(vertices)
    geoms.append(geo)

print "Total Geometries: {}".format(len(geoms))

ax.add_geometries(geoms, ccrs.PlateCarree(),
                  facecolor='coral', edgecolor='black', alpha=0.4)
plt.show()
```

– Christian S. Perone

## Universality, primes and space communication

– Christian S. Perone