Listening to the neural network gradient norms during training

Training neural networks is often done by measuring many different metrics such as accuracy, loss, gradients, etc. Most of the time this is done by aggregating these metrics and plotting visualizations on TensorBoard.

There are, however, other senses that we can use to monitor the training of neural networks, such as sound. Sound is one of the perspectives that is currently very poorly explored for neural network training. Human hearing can be very good at distinguishing very small perturbations in characteristics such as rhythm and pitch, even when these perturbations are very short in time or subtle.

For this experiment, I made a very simple example of a synthesized sound that was produced using the gradient norm of each layer, over the training steps of a convolutional neural network training on MNIST, using different settings such as different learning rates, optimizers, momentum, etc.
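Concretely, each layer's gradient norm is mapped linearly to the pitch of a short sine tone, with higher norms producing higher pitches. The snippet below is a minimal sketch of that mapping (the constants mirror the ones used in the full script at the end of the post):

import numpy as np

fs = 44100          # audio sampling rate in Hz
base_freq = 200.0   # base pitch corresponding to a gradient norm of zero
duration = 0.01     # each layer gets a 10 ms tone

def tone_for_norm(norm_grad):
    # Higher gradient norm -> higher frequency -> higher pitch
    freq = base_freq + norm_grad * 100.0
    t = np.arange(fs * duration)
    return 0.1 * np.sin(2 * np.pi * t * freq / fs).astype(np.float32)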

You’ll need to install PyAudio and PyTorch to run the code (which is at the end of this post).

Training sound with SGD using LR 0.01

This segment represents a training session with gradients from 4 layers during the first 200 steps of the first epoch, using a batch size of 10. The higher the pitch, the higher the norm for a layer; a short silence indicates a new batch. Note the gradient norms increasing over time.

Training sound with SGD using LR 0.1

Same as above, but with higher learning rate.

Training sound with SGD using LR 1.0

Same as above, but with a learning rate so high that it makes the network diverge; note the high pitch when the norms explode, followed by the divergence.

Training sound with SGD using LR 1.0 and BS 256

Same setting but with a high learning rate of 1.0 and a batch size of 256. Note how the gradients explode and then there are NaNs causing the final sound.

Training sound with Adam using LR 0.01

This is the same setting as the SGD one, but using Adam.

Source code

For those who are interested, here is the entire source code I used to make the sound clips:

import pyaudio
import numpy as np
import wave

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

        self.ordered_layers = [self.conv1,
                               self.conv2,
                               self.fc1,
                               self.fc2]

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def open_stream(fs):
    p = pyaudio.PyAudio()
    stream = p.open(format=pyaudio.paFloat32,
                    channels=1,
                    rate=fs,
                    output=True)
    return p, stream


def generate_tone(fs, freq, duration):
    npsin = np.sin(2 * np.pi * np.arange(fs*duration) * freq / fs)
    samples = npsin.astype(np.float32)
    return 0.1 * samples


def train(model, device, train_loader, optimizer, epoch):
    model.train()

    fs = 44100
    duration = 0.01
    f = 200.0
    p, stream = open_stream(fs)

    frames = []

    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()

        norms = []
        for layer in model.ordered_layers:
            norm_grad = layer.weight.grad.norm()
            norms.append(norm_grad)

            tone = f + ((norm_grad.numpy()) * 100.0)
            tone = tone.astype(np.float32)
            samples = generate_tone(fs, tone, duration)

            frames.append(samples)

        silence = np.zeros(samples.shape[0] * 2,
                           dtype=np.float32)
        frames.append(silence)

        optimizer.step()

        # Just 200 steps per epoch
        if batch_idx == 200:
            break

    wf = wave.open("sgd_lr_1_0_bs256.wav", 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(p.get_sample_size(pyaudio.paFloat32))
    wf.setframerate(fs)
    wf.writeframes(b''.join(frames))
    wf.close()

    stream.stop_stream()
    stream.close()
    p.terminate()


def run_main():
    device = torch.device("cpu")

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=256, shuffle=True)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    for epoch in range(1, 2):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == "__main__":
    run_main()
Cite this article as: Christian S. Perone, "Listening to the neural network gradient norms during training," in Terra Incognita, 04/08/2019, //www.cpetem.com/2019/08/listening-to-the-neural-network-gradient-norms-during-training/

Benford law on GPT-2 language model

I wrote a few months ago about how the Benford law emerges from language models, and today I decided to evaluate the same method to check how GPT-2 would behave with some sentences. It turns out that it also seems to be capturing these power laws. You can find some plots with the examples below; the plots show the probability of each digit given a particular sentence, such as "with a population size of", i.e. the distribution: $$P(\{1, 2, \ldots, 9\} \vert \text{"with a population size of"})$$ for the GPT-2 medium model (345M):
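The original post doesn't show the code used for these plots, but a minimal sketch of how such a digit distribution could be estimated with the Hugging Face transformers library would look like the following (the model name "gpt2-medium" and the exact context string are assumptions here, not the original code):

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
model.eval()

context = "with a population size of"
input_ids = torch.tensor([tokenizer.encode(context)])

with torch.no_grad():
    logits = model(input_ids)[0]              # shape: (1, seq_len, vocab_size)
probs = torch.softmax(logits[0, -1], dim=-1)  # next-token distribution

# Probability of each leading digit 1..9 as the next token (note the leading space)
digit_ids = [tokenizer.encode(" %d" % d)[0] for d in range(1, 10)]
digit_probs = probs[digit_ids]
digit_probs = digit_probs / digit_probs.sum()  # renormalize over the nine digits
print(digit_probs)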

Cite this article as: Christian S. Perone, "Benford law on GPT-2 language model," in Terra Incognita, 14/06/2019, //www.cpetem.com/2019/06/benford-law-on-gpt-2-language-model/

EuclidesDB – A multi-model machine learning feature database

Last week I released the first public version of EuclidesDB. EuclidesDB is a multi-model machine learning feature database that is tightly coupled with PyTorch and provides a backend for including and querying data on the model feature space.

For more information, please see the GitHub repository or the documentation.

Some of the features of EuclidesDB are listed below:

  • Written in C++ for performance;
  • Uses protobuf for data serialization;
  • Uses gRPC for communication;
  • LevelDB integration for database serialization;
  • Many indexing methods implemented (Annoy, Faiss, etc.);
  • Tight PyTorch integration through libtorch;
  • Easy integration of new custom fine-tuned models;
  • Easy client language binding generation;
  • Free and open source with a permissive license;

And here is a diagram of the overall architecture:

PyTorch 1.0 tracing JIT and LibTorch C++ API to integrate PyTorch into NodeJS

Update 28 Feb 2019: I added a new blog post with a slide deck covering the presentation I did at PyData Montreal.

Today, at the PyTorch Developer Conference, the PyTorch team announced the plans and the release of the PyTorch 1.0 preview with many nice features such as a JIT for model graphs (with and without tracing) as well as LibTorch, the PyTorch C++ API, one of the most important release announcements made today in my opinion.

Given the huge interest in understanding how this new API works, I decided to write this article showing an example of many opportunities that are now open after the release of the PyTorch C++ API. In this post, I’ll integrate PyTorch inference into native NodeJS using NodeJS C++ add-ons, just as an example of integration between different frameworks/languages that are now possible using the C++ API.

Below you can see the final result:

As you can see, the integration was seamless and I could use a traced ResNet as the computational graph model and feed any tensor to it to get the output predictions.

Introduction

Simply put, libtorch is a library version of PyTorch. It contains the underlying foundation that is used by PyTorch, such as ATen (the tensor library), which contains all the tensor operations and methods. Libtorch also contains the autograd, which is the component that adds the automatic differentiation to the ATen tensors.

A word of caution for those who are starting now is to be careful with the tensors that can be created both from ATen and from autograd: do not mix them. ATen will return plain tensors (when you create them using the at namespace) while the autograd functions (from the torch namespace) will return Variable, by adding its automatic differentiation mechanism.

For a more extensive tutorial on how PyTorch internals work, please take a look at my previous tutorial on the PyTorch internal architecture.

Libtorch can be downloaded from the PyTorch website and it is only available as a preview for a while. You can also find the documentation in this site, which is mostly a Doxygen-rendered documentation. I found the library pretty stable, and it makes sense because it is actually exposing the stable foundations of PyTorch; however, there are some issues with headers and some minor problems concerning the library organization that you might find while starting to use it (which will hopefully be fixed soon).

For NodeJS, I'll use the Native Abstractions library (nan), which is the most recommended library (actually it's basically a header-only library) to create NodeJS C++ add-ons, together with cmake-js, because libtorch already provides the CMake files that make our build process much easier. However, the focus here will be on the C++ code and not on the build process.

The flow for development, tracing, serializing, and loading the model can be seen in the figure on the left.

It starts with the development process and the tracing being done in PyTorch (the Python domain), and then the loading and inference in the C++ domain (in our case, the NodeJS add-on).

Wrapping the Tensor

In NodeJS, to create an object as a first-class citizen of the JavaScript world, you need to inherit from the ObjectWrap class, which will be responsible for wrapping the C++ component.

#ifndef TENSOR_H
#define TENSOR_H

#include <nan.h>
#include <torch/torch.h>

namespace torchjs {

class Tensor : public Nan::ObjectWrap {
 public:
  static NAN_MODULE_INIT(Init);

  void setTensor(at::Tensor tensor) {
    this->mTensor = tensor;
  }

  torch::Tensor getTensor() {
    return this->mTensor;
  }

  static v8::Local<v8::Object> NewInstance();

 private:
  explicit Tensor();
  ~Tensor();

  static NAN_METHOD(New);
  static NAN_METHOD(toString);
  static Nan::Persistent<v8::Function> constructor;

 private:
  torch::Tensor mTensor;
};

} // namespace torchjs

#endif

As you can see, most of the code for the definition of our Tensor class is just boilerplate. The key point here is that the torchjs::Tensor will wrap a torch::Tensor, and we added two special public methods (setTensor and getTensor) to set and get this internal torch tensor.

I won’t show all the implementation details because most parts of it are NodeJS boilerplate code to construct the object, etc. I’ll focus on the parts that touch the libtorch API, like in the code below where we are creating a small textual representation of the tensor to show on JavaScript (the toString method):

NAN_METHOD(Tensor::toString) {
  Tensor* obj = ObjectWrap::Unwrap<Tensor>(info.Holder());
  std::stringstream ss;

  at::IntList sizes = obj->mTensor.sizes();
  ss << "Tensor[Type=" << obj->mTensor.type() << ", ";
  ss << "Size=" << sizes << std::endl;

  info.GetReturnValue().Set(Nan::New(ss.str()).ToLocalChecked());
}

What we are doing in the code above is just getting the internal tensor object from the wrapped object by unwrapping it. After that, we build a string representation with the tensor sizes (each dimension size) and its type (float, etc.).

Wrapping Tensor-creation operations

Let’s now create the wrapper code for the torch::ones function, which is responsible for creating a tensor of any defined shape filled with constant 1's.

NAN_METHOD(ones) {
  // Sanity checking of the arguments
  if (info.Length() < 2)
    return Nan::ThrowError(Nan::New("Wrong number of arguments").ToLocalChecked());

  if (!info[0]->IsArray() || !info[1]->IsBoolean())
    return Nan::ThrowError(Nan::New("Wrong argument types").ToLocalChecked());

  // Retrieving parameters (require_grad and tensor shape)
  const bool require_grad = info[1]->BooleanValue();
  const v8::Local<v8::Array> array = info[0].As<v8::Array>();
  const uint32_t length = array->Length();

  // Convert from v8::Array to std::vector
  std::vector<long long> dims;
  for (int i = 0; i < length; i++) {
    v8::Local<v8::Value> v;
    int d = array->Get(i)->NumberValue();
    dims.push_back(d);
  }

  // Call libtorch and create a new torchjs::Tensor object
  // wrapping the new torch::Tensor that was created by torch::ones
  at::Tensor v = torch::ones(dims, torch::requires_grad(require_grad));
  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(v);
  info.GetReturnValue().Set(newinst);
}

So, let’s go through this code. We are first checking the arguments of the function. For this function, we’re expecting a tuple (a JavaScript array) for the tensor shape and a boolean indicating if we want to compute gradients or not for this tensor node. After that, we’re converting the parameters from the V8 JavaScript types into native C++ types. As soon as we have the required parameters, we then call the torch::ones function from libtorch; this function will create a new tensor, which we wrap using the torchjs::Tensor class that we created earlier.

And that's it, we have just exposed one torch operation that can be used as native JavaScript.

Intermezzo for the PyTorch JIT

The introduction of the PyTorch JIT revolves around the concept of the Torch Script. A Torch Script is a restricted subset of the Python language and comes with its own compiler and transformation passes (optimizations, etc.).

This script can be created in two different ways: by using a tracing JIT or by providing the script itself. In the tracing mode, your computational graph nodes will be visited and operations recorded to produce the final script, while the scripting is the mode where you provide this description of your model taking into account the restrictions of the Torch Script.

Note that if you have branching decisions in your code that depend on external factors or data, tracing won't work as you expect, because it will record that one particular execution of the graph, hence the alternative of providing the script for the branching decisions. However, in most cases, tracing is what we need.
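A small illustrative sketch of that failure mode (not from the original post): the branch below depends on the values of the input, so the traced function keeps replaying whichever path was taken for the example input:

import torch

def data_dependent(x):
    # The branch depends on the *values* in x, not only on its shape
    if x.sum() > 0:
        return x + 1
    return x - 1

# Tracing with a positive example bakes the "x + 1" path into the graph
traced = torch.jit.trace(data_dependent, torch.ones(3), check_trace=False)

print(data_dependent(-torch.ones(3)))  # eager mode follows "x - 1": tensor([-2., -2., -2.])
print(traced(-torch.ones(3)))          # traced graph still applies "x + 1": tensor([0., 0., 0.])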

To understand these differences, let's take a look at the intermediate representation (IR) of the script modules generated both by tracing and by scripting.

@torch.jit.script
def happy_function_script(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret

def happy_function_trace(x):
    ret = torch.rand(0)
    if True == True:
        ret = torch.rand(1)
    else:
        ret = torch.rand(2)
    return ret

traced_fn = torch.jit.trace(happy_function_trace,
                            (torch.tensor(0),),
                            check_trace=False)

In the code above, we're providing two functions: one is using the @torch.jit.script decorator, which is the scripting way to create a Torch Script, while the second function is being used by the tracing function torch.jit.trace. Note that I intentionally added a "True == True" decision in the functions (which will always be true).
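If you want to inspect these IRs yourself, both versions expose a graph attribute that can simply be printed, something along these lines (a minimal sketch using the two functions defined above):

# IR produced by the scripting approach
print(happy_function_script.graph)

# IR recorded by the tracing approach
print(traced_fn.graph)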

Now, if we inspect the IR generated by these two different approaches, we’ll clearly see the difference between the tracing and scripting approaches:

# 1) Graph from the scripting approach
graph(%x : Dynamic) {
  %16 : int = prim::Constant[value=2]()
  %10 : int = prim::Constant[value=1]()
  %7 : int = prim::Constant[value=1]()
  %8 : int = prim::Constant[value=1]()
  %9 : int = aten::eq(%7, %8)
  %ret : Dynamic = prim::If(%9)
    block0() {
      %11 : int[] = prim::ListConstruct(%10)
      %12 : int = prim::Constant[value=6]()
      %13 : int = prim::Constant[value=0]()
      %14 : int[] = prim::Constant[value=[0, -1]]()
      %ret.2 : Dynamic = aten::rand(%11, %12, %13, %14)
      -> (%ret.2)
    }
    block1() {
      %17 : int[] = prim::ListConstruct(%16)
      %18 : int = prim::Constant[value=6]()
      %19 : int = prim::Constant[value=0]()
      %20 : int[] = prim::Constant[value=[0, -1]]()
      %ret.3 : Dynamic = aten::rand(%17, %18, %19, %20)
      -> (%ret.3)
    }
  return (%ret);
}

# 2) Graph from the tracing approach
graph(%0 : Long()) {
  %7 : int = prim::Constant[value=1]()
  %8 : int[] = prim::ListConstruct(%7)
  %9 : int = prim::Constant[value=6]()
  %10 : int = prim::Constant[value=0]()
  %11 : int[] = prim::Constant[value=[0, -1]]()
  %12 : Float(1) = aten::rand(%8, %9, %10, %11)
  return (%12);
}

As we can see, the IR is very similar to the LLVM IR. Note that in the tracing approach, the recorded trace contains only one path from the code, the path that was actually taken, while in the scripting approach we have both branching alternatives. However, even in the scripting approach, the always-false branch can be optimized away with a dead code elimination transformation pass.

The PyTorch JIT has a lot of transformation passes that are used to do loop unrolling, dead code elimination, etc. You can find these passes here. A conversion to other formats such as ONNX can be implemented as a pass on top of this intermediate representation (IR), which is quite convenient.

Tracing a ResNet

Now, before implementing the Script Module in NodeJS, let's first trace a ResNet network using PyTorch (using just Python):

traced_net = torch.jit.trace(torchvision.models.resnet18(),
                             torch.rand(1, 3, 224, 224))
traced_net.save("resnet18_trace.pt")

As you can see from the code above, we just have to provide a tensor example (in this case, a batch of a single image with 3 channels and size 224×224). After that we just save the traced network into a file called resnet18_trace.pt.

Now we're ready to implement the Script Module in NodeJS in order to load this file that was traced.

Wrapping the Script Module

This is now the implementation of the Script Module in NodeJS:

// Class constructor
ScriptModule::ScriptModule(const std::string filename) {
  // Load the traced network from the file
  this->mModule = torch::jit::load(filename);
}

// JavaScript object creation
NAN_METHOD(ScriptModule::New) {
  if (info.IsConstructCall()) {
    // Get the filename parameter
    v8::String::Utf8Value param_filename(info[0]->ToString());
    const std::string filename = std::string(*param_filename);

    // Create a new script module using that file name
    ScriptModule* obj = new ScriptModule(filename);
    obj->Wrap(info.This());
    info.GetReturnValue().Set(info.This());
  } else {
    v8::Local<v8::Function> cons = Nan::New(constructor);
    info.GetReturnValue().Set(Nan::NewInstance(cons).ToLocalChecked());
  }
}

As you can see from the code above, we're just creating a class that will call the torch::jit::load function, passing it the file name of the traced network. We also have the implementation of the JavaScript object creation, where we convert the parameters to C++ types and then create a new instance of torchjs::ScriptModule.

The wrapping of the forward pass is also quite straightforward:

NAN_METHOD(ScriptModule::forward) {
  ScriptModule* script_module = ObjectWrap::Unwrap<ScriptModule>(info.Holder());

  Nan::MaybeLocal<v8::Object> maybe = Nan::To<v8::Object>(info[0]);
  Tensor* tensor = Nan::ObjectWrap::Unwrap<Tensor>(maybe.ToLocalChecked());

  torch::Tensor torch_tensor = tensor->getTensor();
  torch::Tensor output = script_module->mModule->forward({torch_tensor}).toTensor();

  auto newinst = Tensor::NewInstance();
  Tensor* obj = Nan::ObjectWrap::Unwrap<Tensor>(newinst);
  obj->setTensor(output);
  info.GetReturnValue().Set(newinst);
}

As you can see, in this code we just receive a tensor as an argument, we get the internal torch::Tensor from it, and then call the forward method of the script module; we wrap the output in a new torchjs::Tensor and then return it.

And that’s it, we’re ready to use our built module in native NodeJS as in the example below:

var torchjs = require("./build/Release/torchjs");
var script_module = new torchjs.ScriptModule("resnet18_trace.pt");
var data = torchjs.ones([1, 3, 224, 224], false);
var output = script_module.forward(data);

I hope you liked it! Libtorch opens the door to tight integration of PyTorch into many different languages and frameworks, which is quite exciting and a huge step towards the direction of production deployment code.

– Christian S. Perone

Cite this article as: Christian S. Perone, "PyTorch 1.0 tracing JIT and LibTorch C++ API to integrate PyTorch into NodeJS," in Terra Incognita, 02/10/2018.

Concentration inequalities – Part I

Introduction

Concentration inequalities, or probability bounds, are very important tools for the analysis of Machine Learning algorithms or randomized algorithms. In statistical learning theory, we often want to show that random variables, given some assumptions, are close to their expectation with high probability. This article provides an overview of the most basic inequalities in the analysis of these concentration measures.

Markov’s Inequality

Markov’s inequality is one of the most basic bounds, and it assumes almost nothing about the random variable. The only assumptions that Markov’s inequality makes are that the random variable \(X\) is non-negative \(X \geq 0\) and has a finite expectation \(\mathbb{E}\left[X\right] < \infty\). Markov’s inequality is given by:

$$ \underbrace{P(X \geq \alpha)}_{\text{probability of being greater than the constant } \alpha} \leq \underbrace{\frac{\mathbb{E}\left[X\right]}{\alpha}}_{\text{bounded above by the expectation over the constant } \alpha} $$

This means that the probability that the random variable \(X\) is larger than the constant \(\alpha\) is bounded by the expectation of \(X\) divided by the constant \(\alpha\). What is remarkable about this bound is that it holds for any distribution with positive values; it doesn't depend on any feature of the probability distribution, it only requires some weak assumptions and its first moment, the expectation.

Example: A grocery store sells an average of 40 beers per day (it’s summer !). What is the probability that it will sell 80 or more beers tomorrow ?

$$
\begin{align}
P(X \geq \alpha) &\leq \frac{\mathbb{E}\left[X\right]}{\alpha} \\\\
P(X \geq 80) &\leq \frac{40}{80} = 0.5 = 50\%
\end{align}
$$
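As a quick sanity check of the bound (not part of the original post), we can simulate the example with a hypothetical distribution for the daily sales, here an exponential with mean 40, and verify that the empirical probability indeed stays below the Markov bound:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily sales: exponential with mean 40 (the bound must hold
# for any non-negative distribution with this expectation)
sales = rng.exponential(scale=40.0, size=1_000_000)

empirical = (sales >= 80).mean()      # true P(X >= 80) for this distribution (~0.135)
markov_bound = sales.mean() / 80.0    # E[X] / alpha (~0.5)
print(empirical, markov_bound)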

Markov's inequality doesn't depend on any property of the probability distribution of the random variable, so it's obvious that there are better bounds to use if information about the probability distribution is available.

Chebyshev’s Inequality

When we have information about the underlying distribution of a random variable, we can take advantage of properties of this distribution to know more about the concentration of this variable. Let’s take for example a normal distribution with mean \(\mu = 0\) and unit standard deviation \(\sigma = 1\) given by the probability density function (PDF) below:

$$ f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} $$

Integrating from -1 to 1: \(\int_{-1}^{1} \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\), we know that 68% of the data is within \(1\sigma\) (one standard deviation) of the mean \(\mu\), and 95% is within \(2\sigma\) of the mean. However, when it's not possible to assume normality, any other amount of the data can be concentrated within \(1\sigma\) or \(2\sigma\).
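These two percentages are easy to check numerically, for instance with SciPy (a small sketch, assuming SciPy is available):

from scipy.stats import norm

# Mass of a standard normal within 1 and 2 standard deviations of the mean
print(norm.cdf(1) - norm.cdf(-1))   # ~0.6827
print(norm.cdf(2) - norm.cdf(-2))   # ~0.9545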

Chebyshev's inequality provides a way to get a bound on the concentration for any distribution, without assuming any underlying property except a finite mean and variance. Chebyshev's also holds for any random variable, not only for non-negative variables as in Markov's inequality.

Chebyshev's inequality is given by the following relation:

$$
P(\mid X - \mu \mid \geq k\sigma) \leq \frac{1}{k^2}
$$

which can also be rewritten as:

$$
P(\mid X - \mu \mid < k\sigma) \geq 1 - \frac{1}{k^2}
$$

For the concrete case of \(k = 2\), Chebyshev tells us that at least 75% of the data is concentrated within 2 standard deviations of the mean. And this applies to any distribution.
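A small simulation (not from the original post) makes it easy to see that the \(1 - 1/k^2\) lower bound holds for very different distributions, even though it is usually far from tight:

import numpy as np

rng = np.random.default_rng(0)
k = 2
samples = {
    "normal": rng.normal(size=1_000_000),
    "exponential": rng.exponential(size=1_000_000),
    "uniform": rng.uniform(size=1_000_000),
}

for name, x in samples.items():
    mu, sigma = x.mean(), x.std()
    inside = (np.abs(x - mu) < k * sigma).mean()
    print(f"{name:12s} P(|X - mu| < {k} sigma) = {inside:.3f} >= {1 - 1 / k**2:.2f}")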

Now, when we compare this result for \(k = 2\) with the 95% concentration within \(2\sigma\) of the normal distribution, we can see how conservative Chebyshev's bound is. However, we cannot forget that it holds for any distribution and not only for a normally distributed random variable, and all that Chebyshev's needs is the first and second moments of the data. Something important to note is that, in the absence of more information about the random variable, this cannot be improved.

Chebyshev’s Inequality and the Weak Law of Large Numbers

Chebyshev's inequality can also be used to prove the weak law of large numbers, which says that the sample mean converges in probability towards the true mean.

That can be done as follows:

  • Consider a sequence of i.i.d. (independent and identically distributed) random variables \(X_1, X_2, X_3, \ldots\) with mean \(\mu\) and variance \(\sigma^2\);
  • The sample mean is \(M_n = \frac{X_1 + \ldots + X_n}{n}\) and the true mean is \(\mu\);
  • For the expectation of the sample mean we have: $$ \mathbb{E}\left[M_n\right] = \frac{\mathbb{E}\left[X_1\right] + \ldots + \mathbb{E}\left[X_n\right]}{n} = \frac{n\mu}{n} = \mu $$
  • For the variance of the sample mean we have: $$ Var\left[M_n\right] = \frac{Var\left[X_1\right] + \ldots + Var\left[X_n\right]}{n^2} = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} $$
  • By the application of Chebyshev's inequality we have: $$ P(\mid M_n - \mu \mid \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2} $$ for any (fixed) \(\epsilon > 0\); as \(n\) increases, the right side of the inequality goes to zero. Intuitively, this means that for a large \(n\) the distribution of \(M_n\) will be concentrated around \(\mu\).
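This convergence is easy to visualize with a small simulation (an illustrative sketch; normally distributed samples are used only for convenience, any finite-variance distribution would behave similarly):

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, eps = 5.0, 2.0, 0.1
trials = 1_000  # independent realizations of the sample mean M_n

for n in [10, 100, 1_000, 10_000]:
    sample_means = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    prob = (np.abs(sample_means - mu) >= eps).mean()
    bound = min(sigma**2 / (n * eps**2), 1.0)
    print(f"n={n:>6}  P(|M_n - mu| >= {eps}) = {prob:.3f}   Chebyshev bound: {bound:.3f}")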

Improving on Markov’s and Chebyshev’s with Chernoff Bounds

Before getting into the Chernoff bound, let’s understand the motivation behind it and how one can improve on Chebyshev’s bound. To understand it, we first need to understand the difference between a pairwise independence and mutual independence. For the pairwise independence, we have the following for A, B, and C:

$$
P(A \cap B) = P(A)P(B) \\
P(A \cap C) = P(A)P(C) \\
P(B \cap C) = P(B)P(C)
$$

Which means that any pair (any two events) are independent, but not necessarily that:

$$
P(A \cap B \cap C) = P(A)P(B)P(C)
$$

which is called “mutual independence” and is a stronger form of independence. By definition, mutual independence implies pairwise independence, but the opposite isn’t always true. And this is the case where we can improve on Chebyshev’s bound, as it is not possible to do so without making these further assumptions (stronger assumptions lead to stronger bounds).
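To make this distinction concrete, here is a classic textbook example (not from the original post): flip two fair coins and let A = "the first coin is heads", B = "the second coin is heads", and C = "both coins show the same face". Every pair of these events is independent, but the three together are not mutually independent:

$$
P(A) = P(B) = P(C) = \frac{1}{2}, \quad P(A \cap B) = P(A \cap C) = P(B \cap C) = \frac{1}{4} \\
P(A \cap B \cap C) = \frac{1}{4} \neq \frac{1}{8} = P(A)P(B)P(C)
$$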

We will talk about the Chernoff bounds in the second part of this tutorial!

Cite this article as: Christian S. Perone, "Concentration inequalities – Part I," in Terra Incognita, 23/08/2018.

NLP word representations and the Wittgenstein philosophy of language

I made an introductory talk on word embeddings in the past, and this post is an extended version of the part of that talk about the philosophical ideas behind word vectors. The aim of this article is to provide an introduction to Ludwig Wittgenstein's main ideas on linguistics that are closely related to techniques that are distributional by design (I'll discuss what this means later), such as word2vec [Mikolov et al., 2013], GloVe [Pennington et al., 2014], skip-thought vectors [Kiros et al., 2015], among others.

One of the most interesting aspects of Wittgenstein is perhaps the fact that he developed two very different philosophies during his life, each of which had great influence. Something quite rare for someone who spent so much time working on these ideas, and who retreated from them even after the major influence they exerted, especially in the Vienna Circle. A true lesson of intellectual honesty and, in my opinion, one important legacy.

Wittgenstein was an avid reader of Schopenhauer's philosophy, and in the same way that Schopenhauer inherited his philosophy from Kant, especially regarding the division between what can be experienced (the phenomena) and what cannot (the noumena), contrasting things as they appear to us with things as they are in themselves, Wittgenstein concluded that Schopenhauer's philosophy was fundamentally right. He believed that in the noumena realm we have no conceptual understanding and therefore we will never be able to say anything about it (without it becoming nonsense), in contrast to the phenomena realm of our experience, which we can indeed talk about and try to understand. By adding secure foundations, such as logic, to the phenomenal world, he was able to reason about how the world is describable by language and thus map the limits of how and of what can be expressed in language or in conceptual thought.

The first main theory of language from Wittgenstein, described in his Tractatus Logico-Philosophicus, is known as the "picture theory of language" (aka picture theory of meaning). This theory is based on an analogy with painting: Wittgenstein realized that a painting is something very different from the natural landscape it depicts, and yet a skilled painter can still represent the real landscape by placing patches or strokes that correspond to the natural landscape reality. Wittgenstein gave the name "logical form" to this set of relationships between the painting and the natural landscape. This logical form, the set of internal relationships common to both representations, is why the painter was able to represent reality, because the logical form was the same in both representations (here I call both "representations" to be consistent with the Schopenhauer and Kant terminology, since reality is also a representation for us, which distinguishes it from the thing-in-itself).

This theory was very important, especially in our context (NLP), because Wittgenstein realized that the same thing happens with language. We are able to assemble words into sentences to match the same logical form of what we want to describe. The logical form was the core idea that makes us able to talk about the world. However, later on, Wittgenstein realized that he had just picked one task, out of the vast number of tasks that language can perform, and had built a whole theory of meaning around it.

The fact is that language can do many other things besides representing (picturing) reality. With language, Wittgenstein noticed, we can give orders, and we cannot say that an order is a picture of something. Soon after realizing these counter-examples, Wittgenstein abandoned the picture theory of language and adopted a much more powerful metaphor: that of a tool. And here we are approaching the modern view of meaning in language, as well as the main foundational idea behind many modern Machine Learning techniques for word/sentence representations that work quite well. Once you realize that language works as a tool, if you want to understand its meaning, you just need to understand all the possible things you can do with it. And if you take a word or concept in isolation, its meaning is the sum of all of its uses, and this meaning is fluid and can have many different faces. This important thought can be summarized in the well-known quote below:

The meaning of a word is its use in the language.

(…)

One cannot guess how a word functions. One has to look at its use, and learn from that.

– Ludwig Wittgenstein, Philosophical Investigations

And indeed it makes complete sense, because once you exhaust all the uses of a word, there is nothing left of it. Reality is also far more fluid than is usually believed, because:

Our language can be seen as an ancient city: a maze of little streets and squares, of old and new houses, and of houses with additions from various periods (…)

– Ludwig Wittgenstein, Philosophical Investigations

John R. Firth was a linguist also known for popularizing this context-dependent nature of meaning, who also used Wittgenstein's Philosophical Investigations as a recourse to emphasize the importance of context for meaning, which I quote below:

The placing of a text as a constituent in a context of situation contributes to the statement of meaning since situations are set up to recognize use. As Wittgenstein says, ‘the meaning of words lies in their use.’ (Phil. Investigations, 80, 109). The day-to-day practice of playing language games recognizes customs and rules. It follows that a text in such established usage may contain sentences such as ‘Don’t be such an ass !’, ‘You silly ass !’, ‘What an ass he is !’ In these examples, the word ass is in familiar and habitual company, commonly collocated with you silly-, he is a silly-, don’t be such an-. You shall know a word by the company it keeps ! One of the meanings of ass is its habitual collocation with such other words as those quoted above. Though Wittgenstein was dealing with another problem, he also recognizes the plain face-value, the physiognomy of words. They look at us ! ‘The sentence is composed of the words, and that is enough.’

– John R. Firth

This idea of learning the meaning of a word by the company it keeps is exactly what word2vec (and other count-based methods based on co-occurrence as well) is doing by means of data, learning in an unsupervised fashion with a supervised task that was by design built to predict context (or vice-versa, depending on whether you use skip-gram or cbow), and it was also a source of inspiration for the skip-thought vectors. Nowadays, this idea is also known as the "distributional hypothesis", which is also being used in fields other than linguistics.
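As a toy illustration of the "company it keeps" idea (this is not word2vec itself, just raw co-occurrence counts over a tiny made-up corpus), words that appear in similar contexts end up with similar vectors:

from collections import Counter, defaultdict
import numpy as np

corpus = [
    "the silly ass brays loudly",
    "the silly donkey brays loudly",
    "the physicist presents evidence",
    "the lawyer presents evidence",
]

window = 2
cooc = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooc[w][tokens[j]] += 1

vocab = sorted({w for s in corpus for w in s.split()})
vectors = {w: np.array([cooc[w][c] for c in vocab], dtype=float) for w in vocab}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

# Words used in similar contexts get similar co-occurrence vectors
print(cosine(vectors["ass"], vectors["donkey"]))     # high similarity
print(cosine(vectors["ass"], vectors["evidence"]))   # low similarity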

Now, it is quite amazing that if we look at the work of Neelakantan et al., 2015, called "Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space", where they mention an important deficiency of word2vec in which each word type has only one vector representation, you'll see that this has deep philosophical motivations if we relate it to the Wittgenstein and Firth ideas, because, as Wittgenstein noticed, the meaning of a word is unlikely to wear a single face, and word2vec seems to be converging to an approximation of the average meaning of a word instead of capturing the polysemy inherent in language.

A concrete example of the multi-faceted nature of words can be seen in the word "evidence", where the meaning can be completely different for a historian, a lawyer, and a physicist. Hearsay cannot count as evidence in a court, while it is many times the only evidence a historian has, and hearsay doesn't even appear in physics. Recent works such as ELMo [Peters, Matthew E. et al. 2018], which used different levels of features from an LSTM trained with a language model objective, are also a very interesting direction with excellent results towards incorporating context-dependent semantics into the word representations and breaking with the tradition of shallow representations as seen in word2vec.

We're living in exciting times where it is really amazing to see how many deep philosophical foundations are actually hidden in Machine Learning techniques. It is also very interesting that we're learning a lot of linguistic lessons from Machine Learning experimentation, and an amazing virtuous circle is forming that we can use as an important means of obtaining new discoveries. I think that we have never been so self-aware of, and concerned with, language as in the past few years.

I really hope you enjoyed reading this!

– Christian S. Perone

Cite this article as: Christian S. Perone, "NLP word representations and the Wittgenstein philosophy of language," in Terra Incognita, 23/05/2018, //www.cpetem.com/2018/05/nlp-word-representations-and-the-wittgenstein-philosophy-of-language/

References

Magee, Bryan. The Story of Philosophy. 1998.

Mikolov, Tomas et al. Efficient Estimation of Word Representations in Vector Space. 2013. https://arxiv.org/abs/1301.3781

Pennington, Jeffrey et al. GloVe: Global Vectors for Word Representation. 2014. https://nlp.stanford.edu/projects/glove/

Kiros, Ryan et al. Skip-Thought Vectors. 2015. https://arxiv.org/abs/1506.06726

Neelakantan, Arvind et al. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. 2015. https://arxiv.org/abs/1504.06654

Léon, Jacqueline. Meaning by collocation. The Firthian filiation of Corpus Linguistics. 2007.