Listening to the neural network gradient norms during training

You’ll need to install PyAudio and PyTorch to run the code (at the end of this post).

Training sound with SGD using LR 0.01

This segment represents a training session with gradients from 4 layers during the first 200 steps of the first epoch, using a batch size of 10. The higher the pitch, the higher the norm for a layer; a short silence indicates the boundary between batches. Note the gradients increasing over time.

Training sound with SGD using LR 0.1

Same as above, but with higher learning rate.

Training sound with SGD using LR 1.0 and BS 256

Same setting but with a high learning rate of 1.0 and a batch size of 256. Note how the gradients explode and then turn into NaNs, which cause the final sound.

Source code

import pyaudio
import numpy as np
import wave
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

        self.ordered_layers = [self.conv1,
                               self.conv2,
                               self.fc1,
                               self.fc2]

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def open_stream(fs):
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32,
                    channels=1,
                    rate=fs,
                    output=True)
    return p, stream


def generate_tone(fs, freq, duration):
    npsin = np.sin(2 * np.pi * np.arange(fs*duration) * freq / fs)
    samples = npsin.astype(np.float32)
    return 0.1 * samples


def train(model, device, train_loader, optimizer, epoch):
    model.train()

    fs = 44100
    duration = 0.01
    f = 200.0
    p, stream = open_stream(fs)

    frames = []

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()

        norms = []
        for layer in model.ordered_layers:
            norm_grad = layer.weight.grad.norm()
            norms.append(norm_grad)

            tone = f + ((norm_grad.numpy()) * 100.0)
            tone = tone.astype(np.float32)
            samples = generate_tone(fs, tone, duration)

            frames.append(samples)

        silence = np.zeros(samples.shape[0] * 2,
                           dtype=np.float32)
        frames.append(silence)

        optimizer.step()

        # Just 200 steps per epoch
        if batch_idx == 200:
            break

    wf = wave.open("sgd_lr_1_0_bs256.wav", 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paFloat32))
    wf.setframerate(fs)
    wf.writeframes(b''.join(frames))
    wf.close()

    stream.stop_stream()
    stream.close()
    p.terminate()


def run_main():
    device = torch.device("cpu")

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=256, shuffle=True)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    for epoch in range(1, 2):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == "__main__":
    run_main()

Benford law on GPT-2 language model

Cite this article as: Christian S. Perone, "Benford law on GPT-2 language model," in Terra Incognita, 14/06/2019, //www.cpetem.com/2019/06/benford-law-on-gpt-2-language-model/

EuclidesDB – Machine Learning Feature Database

Last week I released the first public version of EuclidesDB. EuclidesDB is a multi-model machine learning feature database that is tightly coupled with PyTorch and provides a backend for including and querying data on the model feature space.

Some of EuclidesDB’s features are listed below:

• Written in C++ for performance;
• Uses protobuf for data serialization;
• Uses gRPC for communication;
• LevelDB integration for database serialization;
• Many indexing methods implemented (Annoy, Faiss, etc.);
• Tight PyTorch integration through libtorch;
• Easy integration of new custom fine-tuned models;
• Easy client language binding generation;
• Free and open-source with a permissive license;

And here is a diagram of the overall architecture:

Benford’s law emerges from a deep language model

I was experimenting with the digit distribution from a pre-trained Transformer language model (LM) (weights from the OpenAI repository) and I found a very interesting correlation between Benford’s law and the digit distribution of the language model after conditioning it with some particular phrases.
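Benford’s law itself is easy to check numerically. The sketch below is not the GPT-2 experiment from the post; it just compares the expected leading-digit distribution $$P(d) = \log_{10}(1 + 1/d)$$ against the leading digits of powers of 2, a classic Benford-conforming sequence:

```python
import math
from collections import Counter

# Benford's law: P(d) = log10(1 + 1/d) for the leading digit d in 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Powers of 2 are a well-known sequence whose leading digits follow Benford's law
leading = [int(str(2 ** n)[0]) for n in range(1, 1001)]
counts = Counter(leading)
empirical = {d: counts[d] / len(leading) for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: benford={benford[d]:.3f}  empirical={empirical[d]:.3f}")
```

Roughly 30% of the leading digits come out as 1, matching $$\log_{10} 2 \approx 0.301$$, which is the same shape of distribution the post found in the conditioned language model digits.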

PyTorch 1.0 tracing JIT and LibTorch C++ API to integrate PyTorch into NodeJS

Given the huge interest in understanding how this new API works, I decided to write this article showing an example of the many opportunities that are now open after the release of the PyTorch C++ API. In this post, I’ll integrate PyTorch inference into native NodeJS using NodeJS C++ add-ons, just as an example of the integration between different frameworks/languages that is now possible using the C++ API.

Introduction

For a more extensive tutorial on how PyTorch internals work, please take a look at my previous tutorial on the PyTorch internal architecture.

Libtorch can be downloaded from the PyTorch website, and it is only available as a preview for a while. You can also find the documentation in this site, which is mostly a Doxygen-rendered documentation. I found the library quite stable, which makes sense given that it is actually exposing the stable foundations of PyTorch; however, there are some issues with headers and some minor problems concerning the organization of the library that you might find while starting to use it (which will hopefully be fixed soon).

Wrapping the Tensor

In NodeJS, to create an object as a first-class citizen of the JavaScript world, you need to inherit from the ObjectWrap class, which will be responsible for wrapping the C++ component.

#ifndef TENSOR_H
#define TENSOR_H

#include <nan.h>
#include <torch/torch.h>

namespace torchjs {

class Tensor : public Nan::ObjectWrap {
 public:
  static NAN_MODULE_INIT(Init);

  void setTensor(at::Tensor tensor) {
    this->mTensor = tensor;
  }

  torch::Tensor getTensor() {
    return this->mTensor;
  }

  static v8::Local<v8::Object> NewInstance();

 private:
  explicit Tensor();
  ~Tensor();

  static NAN_METHOD(New);
  static NAN_METHOD(toString);
  static Nan::Persistent<v8::Function> constructor;

 private:
  torch::Tensor mTensor;
};

}  // namespace torchjs

#endif

As you can see, most of the code for the definition of our Tensor class is just boilerplate. The key point here is that the torchjs::Tensor will wrap a torch::Tensor, and we added two special public methods (setTensor and getTensor) to set and get this internal torch tensor.

I won’t show all the implementation details because most parts of it are NodeJS boilerplate code to construct the object, etc. I’ll focus on the parts that touch the libtorch API, like in the code below where we are creating a small textual representation of the tensor to show on JavaScript (the toString method):

NAN_METHOD(Tensor::toString) {
  Tensor* obj = ObjectWrap::Unwrap<Tensor>(info.Holder());
  std::stringstream ss;

  at::IntList sizes = obj->mTensor.sizes();
  ss << "Tensor[Type=" << obj->mTensor.type() << ", ";
  ss << "Size=" << sizes << std::endl;

  info.GetReturnValue().Set(Nan::New(ss.str()).ToLocalChecked());
}

Wrapping Tensor-creation operations

Let’s now create wrapper code for the torch::ones function, which is responsible for creating a tensor of any defined shape filled with constant 1’s.

NAN_METHOD(ones) {
  // Sanity checking of the arguments
  if (info.Length() < 2)
    return Nan::ThrowError(Nan::New("Wrong number of arguments").ToLocalChecked());

  if (!info[0]->IsArray() || !info[1]->IsBoolean())
    return Nan::ThrowError(Nan::New("Wrong argument types").ToLocalChecked());

  // Retrieving parameters (require_grad and the tensor shape)
  const bool require_grad = info[1]->BooleanValue();
  const v8::Local<v8::Array> array = info[0].As<v8::Array>();
  const uint32_t length = array->Length();

  // Convert from v8::Array to std::vector
  std::vector<long long> dims;
  for (int i = 0; i < length; i++) {
    v8::Local<v8::Value> v;
    int d = array->Get(i)->NumberValue();
    dims.push_back(d);
  }

  // Call libtorch and create a new torchjs::Tensor object
  // wrapping the new torch::Tensor that was created by torch::ones
  at::Tensor v = torch::ones(dims, torch::requires_grad(require_grad));
  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(v);
  info.GetReturnValue().Set(newinst);
}

Intermezzo for the PyTorch JIT

This script can be created in two different ways: by using a tracing JIT or by providing the script itself. In the tracing mode, your computational graph nodes will be visited and operations recorded to produce the final script, while scripting is the mode where you provide this description of your model yourself, taking into account the restrictions of the Torch Script.

import torch


@torch.jit.script
def happy_function_script(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret


def happy_function_trace(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret


traced_fn = torch.jit.trace(happy_function_trace,
                            (torch.tensor(0),),
                            check_trace=False)

Now, if we inspect the IR generated by these two different approaches, we’ll clearly see the difference between the tracing and scripting approaches:

# 1) Graph from the scripting approach
graph(%x : Dynamic) {
  %16 : int = prim::Constant[value=2]()
  %10 : int = prim::Constant[value=1]()
  %7 : int = prim::Constant[value=1]()
  %8 : int = prim::Constant[value=1]()
  %9 : int = aten::eq(%7, %8)
  %ret : Dynamic = prim::If(%9)
    block0() {
      %11 : int[] = prim::ListConstruct(%10)
      %12 : int = prim::Constant[value=6]()
      %13 : int = prim::Constant[value=0]()
      %14 : int[] = prim::Constant[value=[0, -1]]()
      %ret.2 : Dynamic = aten::rand(%11, %12, %13, %14)
      -> (%ret.2)
    }
    block1() {
      %17 : int[] = prim::ListConstruct(%16)
      %18 : int = prim::Constant[value=6]()
      %19 : int = prim::Constant[value=0]()
      %20 : int[] = prim::Constant[value=[0, -1]]()
      %ret.3 : Dynamic = aten::rand(%17, %18, %19, %20)
      -> (%ret.3)
    }
  return (%ret);
}

# 2) Graph from the tracing approach
graph(%0 : Long()) {
  %7 : int = prim::Constant[value=1]()
  %8 : int[] = prim::ListConstruct(%7)
  %9 : int = prim::Constant[value=6]()
  %10 : int = prim::Constant[value=0]()
  %11 : int[] = prim::Constant[value=[0, -1]]()
  %12 : Float(1) = aten::rand(%8, %9, %10, %11)
  return (%12);
}

As we can see, the IR is very similar to the LLVM IR. Note that in the tracing approach, the trace recorded only one path of the code, the true path, while in scripting we have both branching alternatives. However, even in scripting, the always-false branch can be optimized away with a dead code elimination transformation pass.

The PyTorch JIT has a lot of transformation passes that are used to do loop unrolling, dead code elimination, etc. You can find these passes here. Conversion to other formats such as ONNX can be implemented as a pass on top of this intermediate representation (IR), which is quite convenient.

Tracing a ResNet

import torch
import torchvision

traced_net = torch.jit.trace(torchvision.models.resnet18(),
                             torch.rand(1, 3, 224, 224))
traced_net.save("resnet18_trace.pt")

Wrapping the Script Module

// Class constructor
ScriptModule::ScriptModule(const std::string filename) {
  // Load the traced network from the file
  this->mModule = torch::jit::load(filename);
}

// JavaScript object creation
NAN_METHOD(ScriptModule::New) {
  if (info.IsConstructCall()) {
    // Get the filename parameter
    v8::String::Utf8Value param_filename(info[0]->ToString());
    const std::string filename = std::string(*param_filename);

    // Create a new script module using that file name
    ScriptModule* obj = new ScriptModule(filename);
    obj->Wrap(info.This());
    info.GetReturnValue().Set(info.This());
  } else {
    v8::Local<v8::Function> cons = Nan::New(constructor);
    info.GetReturnValue().Set(Nan::NewInstance(cons).ToLocalChecked());
  }
}

NAN_METHOD(ScriptModule::forward) {
  ScriptModule* script_module = ObjectWrap::Unwrap<ScriptModule>(info.Holder());

  Nan::MaybeLocal<v8::Object> maybe = Nan::To<v8::Object>(info[0]);
  Tensor* tensor = Nan::ObjectWrap::Unwrap<Tensor>(maybe.ToLocalChecked());

  torch::Tensor torch_tensor = tensor->getTensor();
  torch::Tensor output = script_module->mModule->forward({torch_tensor}).toTensor();

  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(output);
  info.GetReturnValue().Set(newinst);
}

As you can see, in this code we just receive a tensor as an argument, get the internal torch::Tensor from it, and then call the forward method of the script module; we wrap the output in a new torchjs::Tensor and then return it.

And that’s it, we’re ready to use our built module in native NodeJS as in the example below:

var torchjs = require("./build/Release/torchjs");
var script_module = new torchjs.ScriptModule("resnet18_trace.pt");
var data = torchjs.ones([1, 3, 224, 224], false);
var output = script_module.forward(data);

– Christian S. Perone

Concentration inequalities

介绍

Concentration inequalities, or probability bounds, are very important tools for the analysis of Machine Learning algorithms or randomized algorithms. In statistical learning theory, we often want to show that random variables, given some assumptions, are close to their expectations with high probability. This article provides an overview of the most basic inequalities in the analysis of these concentration measures.

Markov’s Inequality

Markov’s inequality is one of the most basic bounds, and it assumes almost nothing about the random variable. The only assumptions it makes are that the random variable $$X$$ is non-negative $$X \geq 0$$ and has a finite expectation $$\mathbb{E}\left[X\right] < \infty$$. Markov’s inequality is given by:

$$\underbrace{P(X \geq \alpha)}_{\text{probability of being greater than the constant } \alpha} \leq \underbrace{\frac{\mathbb{E}\left[X\right]}{\alpha}}_{\text{bounded above by the expectation over the constant } \alpha}$$

Example: A grocery store sells an average of 40 beers per day (it’s summer!). What is the probability that it will sell 80 or more beers tomorrow?

\begin{align} P(X \geq \alpha) &\leq \frac{\mathbb{E}\left[X\right]}{\alpha} \\\\ P(X \geq 80) &\leq \frac{40}{80} = 0.5 = 50\% \end{align}
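We can sanity-check the bound with a small Monte Carlo sketch. The true sales distribution is unknown; purely as an assumption for illustration, the snippet below draws daily sales from an exponential distribution with mean 40 and verifies that the empirical tail probability never exceeds the Markov bound of 0.5:

```python
import random

random.seed(0)

# Assumed sales distribution: exponential with mean 40 (just for illustration;
# Markov's inequality itself needs nothing beyond non-negativity and a finite mean).
n = 100_000
samples = [random.expovariate(1 / 40) for _ in range(n)]

mean = sum(samples) / n
p_geq_80 = sum(s >= 80 for s in samples) / n
markov_bound = mean / 80

print(f"empirical P(X >= 80) = {p_geq_80:.4f}")
print(f"Markov bound         = {markov_bound:.4f}")
assert p_geq_80 <= markov_bound  # the bound must hold
```

The bound is loose here (the empirical tail is far below 0.5), which is expected: Markov trades precision for generality.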

Chebyshev’s Inequality

When we have information about the underlying distribution of a random variable, we can take advantage of properties of this distribution to know more about the concentration of this variable. Let’s take for example a normal distribution with mean $$\mu = 0$$ and unit standard deviation $$\sigma = 1$$ given by the probability density function (PDF) below:

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$

$$P(\mid X - \mu \mid \geq k\sigma) \leq \frac{1}{k^2}$$

$$P(\mid X - \mu \mid < k\sigma) \geq 1 - \frac{1}{k^2}$$
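Chebyshev’s bound is also easy to verify empirically. The sketch below samples from the standard normal above (mean $$\mu = 0$$, $$\sigma = 1$$) and checks that the tail mass beyond $$k$$ standard deviations never exceeds $$1/k^2$$:

```python
import random

random.seed(0)

# Empirical check of Chebyshev's bound P(|X - mu| >= k*sigma) <= 1/k^2
# for a standard normal (mu = 0, sigma = 1).
n = 100_000
samples = [random.gauss(0, 1) for _ in range(n)]

for k in (2, 3):
    tail = sum(abs(x) >= k for x in samples) / n
    bound = 1 / k**2
    print(f"k={k}: empirical={tail:.4f}  Chebyshev bound={bound:.4f}")
    assert tail <= bound
```

As with Markov, the bound is very loose for the normal distribution (the true two-sided tail at $$k=2$$ is about 0.046 versus the bound of 0.25), because Chebyshev only uses the variance, not the full shape of the distribution.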

Chebyshev’s Inequality and the Weak Law of Large Numbers

That can be done as follows:

• Consider a sequence of i.i.d. (independent and identically distributed) random variables \(X_1, X_2, X_3, \ldots\) with mean \(\mu\) and variance \(\sigma^2\);
• The sample mean is $$M_n = \frac{X_1 + \ldots + X_n}{n}$$ and the true mean is $$\mu$$;
• For the expectation of the sample mean we have: $$\mathbb{E}\left[M_n\right] = \frac{\mathbb{E}\left[X_1\right] + \ldots + \mathbb{E}\left[X_n\right]}{n} = \frac{n\mu}{n} = \mu$$
• For the variance of the sample mean we have: $$Var\left[M_n\right] = \frac{Var\left[X_1\right] + \ldots + Var\left[X_n\right]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
• By the application of the Chebyshev’s inequality we have: $$P(\mid M_n - \mu \mid \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2}$$ for any (fixed) $$\epsilon > 0$$, as $$n$$ increases, the right side of the inequality goes to zero. Intuitively, this means that for a large $$n$$ the concentration of the distribution of $$M_n$$ will be around $$\mu$$.
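The steps above can be observed numerically. As a small sketch, the snippet below draws uniform(0, 1) variables (so $$\mu = 0.5$$) and shows the sample mean $$M_n$$ concentrating around $$\mu$$ as $$n$$ grows:

```python
import random

random.seed(0)

# Weak law of large numbers: the sample mean M_n of uniform(0, 1) draws
# (true mean mu = 0.5) concentrates around mu as n grows.
def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

for n in (10, 1000, 100_000):
    m_n = sample_mean(n)
    print(f"n={n:>6}: M_n={m_n:.4f}  |M_n - mu|={abs(m_n - 0.5):.4f}")
```

Since $$Var[M_n] = \sigma^2 / n$$, the typical deviation shrinks like $$1/\sqrt{n}$$, which is exactly the pattern the printed deviations follow.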

Improving on Markov’s and Chebyshev’s with Chernoff Bounds

Before getting into the Chernoff bound, let’s understand the motivation behind it and how one can improve on Chebyshev’s bound. To understand it, we first need to understand the difference between pairwise independence and mutual independence. For pairwise independence, we have the following for A, B, and C:

$$P(A \cap B) = P(A)P(B) \\ P(A \cap C) = P(A)P(C) \\ P(B \cap C) = P(B)P(C)$$

Which means that any pair (any two events) are independent, but not necessarily that:

$$P(A \cap B \cap C) = P(A)P(B)P(C)$$

which is called “mutual independence”, and it is a stronger independence. By definition, mutual independence implies pairwise independence, but the opposite isn’t always true. And this is the case where we can improve on Chebyshev’s bound, as it is not possible without making these further assumptions (stronger assumptions lead to stronger bounds).
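A classic counterexample (not from the original post, added here for illustration) makes the gap concrete: flip two fair coins and let A = “first coin is heads”, B = “second coin is heads”, C = “the two coins differ” (XOR). Any pair of these events is independent, yet the three together are not:

```python
from itertools import product

# Sample space: two fair coin flips, each of the 4 outcomes has probability 1/4.
outcomes = list(product([0, 1], repeat=2))

def prob(event):
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = lambda o: o[0] == 1          # first coin heads
B = lambda o: o[1] == 1          # second coin heads
C = lambda o: o[0] != o[1]       # the coins differ (XOR)

both = lambda e1, e2: (lambda o: e1(o) and e2(o))
all3 = lambda o: A(o) and B(o) and C(o)

# Pairwise independence holds: each pairwise probability factorizes.
assert prob(both(A, B)) == prob(A) * prob(B)   # 1/4 == 1/2 * 1/2
assert prob(both(A, C)) == prob(A) * prob(C)
assert prob(both(B, C)) == prob(B) * prob(C)

# Mutual independence fails: two heads forces the coins to be equal,
# so P(A and B and C) = 0, while P(A)P(B)P(C) = 1/8.
assert prob(all3) == 0.0
assert prob(A) * prob(B) * prob(C) == 0.125
```

This is why Chernoff-style bounds, which exploit the full (mutual) independence of the summands, can be much tighter than Chebyshev’s, which only uses second-moment information.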

NLP word representations and the Wittgenstein philosophy of language

I made an introductory talk on word embeddings in the past, and this writing is an expanded version of the part of the talk about the philosophical ideas behind word vectors. The aim of this post is to provide an introduction to the main ideas of Wittgenstein that are closely related to techniques in computational linguistics that are called distributional (I’ll discuss what this means later), such as word2vec [Mikolov et al., 2013], GloVe [Pennington et al., 2014], skip-thought vectors [Kiros et al., 2015], among others.

One of the most interesting aspects of Wittgenstein is perhaps the fact that he developed two very different philosophies during his life, each of which had great influence. Something quite rare for someone who spent so much time working on these ideas, retreating from them even after the major influence they exerted, especially in the Vienna Circle. A true lesson in intellectual honesty, and in my opinion, one important legacy.

Wittgenstein was an avid reader of Schopenhauer’s philosophy, and in the same way that Schopenhauer inherited his philosophy from Kant, especially regarding the division between what can be experienced (the phenomena) and what cannot (the noumena), contrasting things as they seem to us with things as they are in themselves, Wittgenstein concluded that Schopenhauer’s philosophy was fundamentally right. He believed that in the noumena realm we have no conceptual understanding, and therefore we will never be able to say anything about it (without it becoming nonsense), in contrast to the phenomena realm of our experience, which we can indeed talk about and try to understand. By adding secure foundations, such as logic, to the phenomenal world, he was able to reason about how the world is describable by language and thus map the limits of how and what can be expressed in language or in conceptual thought.

The first main theory of language from Wittgenstein, described in his Tractatus Logico-Philosophicus, is known as the “picture theory of language” (aka picture theory of meaning). This theory is based on an analogy with painting: Wittgenstein realized that a painting is something very different from a natural landscape, and yet a skilled painter can still represent the real landscape by placing patches or strokes corresponding to the reality of the natural landscape. Wittgenstein gave the name “logical form” to this set of relationships between the painting and the natural landscape. This logical form, the set of internal relationships common to both representations, is why the painter was able to represent reality: the logical form was the same in both representations (here I call both “representations” to be consistent with the terminology of Schopenhauer and Kant, since reality is also a representation for us, distinguishing it from the thing-in-itself).

The meaning of a word is its use in the language.

(…)

– Wittgenstein, Philosophical Investigations

Our language can be seen as an ancient city: a maze of little streets and squares, of old and new houses, and of houses with additions from various periods (…)

– Wittgenstein, Philosophical Investigations

The placing of a text as a constituent in a context of situation contributes to the statement of meaning since situations are set up to recognize use. As Wittgenstein says, ‘the meaning of words lies in their use.’ (Phil. Investigations, 80, 109). The day-to-day practice of playing language games recognizes customs and rules. It follows that a text in such established usage may contain sentences such as ‘Don’t be such an ass!’, ‘You silly ass!’, ‘What an ass he is!’ In these examples, the word ass is in familiar and habitual company, commonly collocated with you silly–, he is a silly–, don’t be such an–. You shall know a word by the company it keeps! One of the meanings of ass is its habitual collocation with those other words quoted above. Though Wittgenstein was dealing with another problem, he also recognizes the plain face-value, the physiognomy of words. They look at us! ‘The sentence is composed of the words, and that is enough.’

– John R. Firth

– Christian S. Perone

Cite this article as: Christian S. Perone, "NLP word representations and the Wittgenstein philosophy of language," in Terra Incognita, 23/05/2018, //www.cpetem.com/2018/05/nlp-word-representations-and-the-wittgenstein-philosophy-of-language/

References

Mikolov, Tomas et al. Efficient Estimation of Word Representations in Vector Space. 2013. https://arxiv.org/abs/1301.3781

Neelakantan, Arvind et al. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. 2015. https://arxiv.org/abs/1504.06654