How computers learn languages

We Need to Talk

Article from Issue 264/2022

Whether through voice assistants, chatbots, or the automatic analysis of documents, rapid developments in AI are helping speech technologies make inroads. But how does AI manage to understand the subtleties of human language?

Language is the medium through which people communicate and express their thoughts. Being able to converse with a machine is a long-standing dream of humankind (for example, just watch 2001: A Space Odyssey).

Science has since come a bit closer to this vision. The box entitled "Sample Dialogue with LaMDA" contains a conversation with the Language Model for Dialogue Applications (LaMDA) [1]. In the first line, the model was assigned the identity of a Weddell seal. As you can see, LaMDA can give grammatically and contextually correct answers and even play with the meanings of words. But how does a computer system manage to achieve language fluency?

To understand language, the system needs to know the meaning of words. For this purpose, each word is represented by a long vector of 100 to 1,000 real numbers. This vector is known as an embedding. Because "sofa" and "couch" both refer to upholstered seating furniture for several people, their embeddings should be similar: The two vectors should hold very similar numbers at the same positions. A word such as "dive" has a very different meaning, so its embedding should differ clearly from those of "sofa" and "couch."
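The idea of similar and dissimilar embeddings can be illustrated with a minimal Python sketch. The four-dimensional vectors below are invented for illustration and do not come from any trained model, and cosine similarity is just one common way to compare two vectors, assumed here because the text does not name a specific measure.

```python
import math

# Toy embeddings with made-up values; real embeddings have
# 100 to 1,000 dimensions and are learned from text.
embeddings = {
    "sofa":  [0.81, 0.10, 0.65, 0.02],
    "couch": [0.79, 0.12, 0.68, 0.01],
    "dive":  [0.05, 0.92, 0.03, 0.70],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Compare a pair of near-synonyms and a pair of unrelated words.
sim_sofa_couch = cosine_similarity(embeddings["sofa"], embeddings["couch"])
sim_sofa_dive = cosine_similarity(embeddings["sofa"], embeddings["dive"])

print(f"sofa vs. couch: {sim_sofa_couch:.3f}")
print(f"sofa vs. dive:  {sim_sofa_dive:.3f}")
```

With these hypothetical values, the score for "sofa" and "couch" comes out close to 1, whereas the score for "sofa" and "dive" is far lower, which mirrors the intuition that words with similar meanings have similar numbers at the same vector positions.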

