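I gave an introductory talk on word embeddings in the past, and this write-up is an extended version of the part about the philosophical ideas behind word vectors. The aim of this article is to provide an introduction to Wittgenstein's main ideas on linguistics that are closely related to techniques that are distributional by design (I'll discuss what this means later), such as word2vec [Mikolov et al., 2013], GloVe [Pennington et al., 2014], Skip-Thought Vectors [Kiros et al., 2015], among others.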
One of the most interesting aspects of Wittgenstein is perhaps the fact that he developed two very different philosophies during his life, each of which had great influence. It is quite rare for someone who spent so much time working on these ideas to retreat from them even after the major influence they exerted, especially on the Vienna Circle. A true lesson of intellectual honesty and, in my opinion, one important legacy.
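Wittgenstein was an avid reader of Schopenhauer’s philosophy, and in the same way that Schopenhauer inherited his philosophy from Kant, especially regarding the division between what can be experienced (phenomena) and what cannot (noumena), contrasting things as they appear to us with things as they are in themselves, Wittgenstein concluded that Schopenhauer’s philosophy was fundamentally right. He believed that in the noumenal realm we have no conceptual understanding, and therefore we will never be able to say anything about it (without lapsing into nonsense), in contrast to the phenomenal realm of our experience, which we can indeed talk about and try to understand. By adding secure foundations, such as logic, to the phenomenal world, he was able to reason about how the world is describable by language and thus map the limits of how and what can be expressed in language or in conceptual thought.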
The first main theory of language from Wittgenstein, described in his Tractatus Logico-Philosophicus, is known as the “Picture theory of language” (aka Picture theory of meaning). This theory is based on an analogy with painting: Wittgenstein realized that a painting is something very different from a natural landscape, yet a skilled painter can still represent the real landscape by placing patches or strokes that correspond to the reality of the natural landscape. Wittgenstein gave the name “logical form” to this set of relationships between the painting and the natural landscape. This logical form, the set of internal relationships common to both representations, is why the painter was able to represent reality: the logical form was the same in both representations (here I call both “representations” to be consistent with Schopenhauer’s and Kant’s terms, because reality is also a representation for us, as distinguished from the thing-in-itself).
This theory was important, especially in our context (NLP), because Wittgenstein realized that the same thing happens with language. We are able to assemble words in sentences to match the same logical form as what we want to describe. The logical form was the core idea that made us able to talk about the world. However, Wittgenstein later realized that he had just picked a single task out of the vast number of tasks that language can perform, and had created a whole theory of meaning around it.
The fact is, language can do many other tasks besides representing (picturing) reality. With language, as Wittgenstein noticed, we can give orders, and we cannot say that an order is a picture of something. As soon as he realized these counterexamples, Wittgenstein abandoned the picture theory of language and adopted a much more powerful metaphor: that of a tool. Here we are approaching the modern view of meaning in language, as well as the main foundational idea behind many modern Machine Learning techniques for word/sentence representations that work quite well. Once you realize that language works as a tool, if you want to understand its meaning, you just need to understand all the possible things you can do with it. And if you take, for instance, a word or concept in isolation, its meaning is the sum of all its uses, and this meaning is fluid and can have many different faces. This important thought can be summarized in the well-known quote below:
The meaning of a word is its use in the language.
One cannot guess how a word functions. One has to look at its use and learn from that.
John Firth was a linguist also known for popularizing this context-dependent nature of meaning. He also used Wittgenstein’s Philosophical Investigations as a resource to emphasize the importance of context in meaning, as in the passage I quote below:
The placing of a text as a constituent in a context of situation contributes to the statement of meaning since situations are set up to recognize use. As Wittgenstein says, ‘the meaning of words lies in their use.’ (Phil. Investigations, 80, 109). The day-to-day practice of playing language games recognizes customs and rules. It follows that a text in such established usage may contain sentences such as ‘Don’t be such an ass !’, ‘You silly ass !’, ‘What an ass he is !’ In these examples, the word ass is in familiar and habitual company, commonly collocated with you silly-, he is a silly-, don’t be such an-. You shall know a word by the company it keeps ! One of the meanings of ass is its habitual collocation with such other words as those above quoted. Though Wittgenstein was dealing with another problem, he also recognizes the plain face-value, the physiognomy of words. They look at us ! ‘The sentence is composed of the words and that is enough’.
– John R. Firth
This idea of learning the meaning of a word by the company it keeps is exactly what word2vec (as well as count-based methods built on co-occurrence) is doing by means of data, learning in an unsupervised fashion with a supervised task that was built by design to predict context (or vice-versa, depending on whether you use skip-gram or CBOW), which was also a source of inspiration for Skip-Thought Vectors. Nowadays, this idea is also known as the “Distributional Hypothesis”, which is also being used in fields other than linguistics.
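To make "the company a word keeps" concrete, here is a minimal sketch in plain Python of how (center, context) pairs can be extracted with a sliding window, using Firth's three example sentences as a toy corpus. The function name `skipgram_pairs`, the window size, and the whitespace tokenization are assumptions made for illustration; this is not the word2vec implementation, only the kind of data that a skip-gram style objective or a co-occurrence count is built from.

```python
from collections import Counter


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Toy corpus: Firth's example sentences, lowercased, punctuation removed.
corpus = [
    "don't be such an ass",
    "you silly ass",
    "what an ass he is",
]

# Co-occurrence counts over all (center, context) pairs.
cooc = Counter()
for sentence in corpus:
    cooc.update(skipgram_pairs(sentence.split(), window=2))

# The "company" kept by the word "ass": its context words and their counts.
company_of_ass = Counter(
    {context: n for (center, context), n in cooc.items() if center == "ass"}
)

for context, n in company_of_ass.most_common():
    print(context, n)
```

A skip-gram model would be trained to predict the context word from the center word in each pair, while CBOW would predict the center word from its surrounding context; count-based methods work from the aggregated `cooc` table instead.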
Now, it is quite amazing that if we look at the work by Neelakantan et al., 2015, called “Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space”, where they mention an important deficiency of word2vec, namely that each word type has only one vector representation, we see that this has deep philosophical motivations when related to the ideas of Wittgenstein and Firth. As Wittgenstein noticed, the meaning of a word is unlikely to wear a single face, and word2vec seems to be converging to an approximation of the average meaning of a word instead of capturing the polysemy inherent in language.
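As a toy illustration of this "average meaning" intuition, the sketch below blends two made-up sense vectors for a polysemous word into a single vector, weighted by how often each sense is assumed to occur. The sense labels, the two-dimensional vectors, and the frequencies are all hypothetical values chosen for the example; real embeddings are learned, not averaged by hand, so this only shows why a single vector tends to sit near the dominant sense and far from the rare one.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical sense vectors for the word "bank" (illustrative values only).
sense_vectors = {
    "bank/finance": [1.0, 0.0],
    "bank/river": [0.0, 1.0],
}

# Hypothetical relative frequency of each sense in a corpus.
sense_freq = {"bank/finance": 0.8, "bank/river": 0.2}

# A single vector per word type, modeled here as the frequency-weighted average.
single_vector = [
    sum(sense_freq[s] * sense_vectors[s][d] for s in sense_vectors)
    for d in range(2)
]

for sense, vec in sense_vectors.items():
    print(sense, round(cosine(single_vector, vec), 3))
```

Under these assumptions the single vector is close to the frequent sense and only weakly related to the rare one, which is the deficiency that multiple embeddings per word are meant to address.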
A concrete example of the multi-faceted nature of words can be seen in the word “evidence”, whose meaning can be quite different to a historian, a lawyer and a physicist. Hearsay cannot count as evidence in a court, while it is often the only evidence that a historian has, whereas the question of hearsay doesn’t even arise in physics. Recent works such as ELMo [Peters, Matthew E. et al. 2018], which uses different levels of features from an LSTM trained with a language model objective, are also a very interesting direction, with excellent results, towards incorporating context-dependent semantics into word representations and breaking the tradition of shallow representations as seen in word2vec.
- Christian S. Perone
Neelakantan, Arvind et al. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. 2015. https://arxiv.org/abs/1504.06654