bram-pramono/talee

Talee

Text Automated Learning by Experience & Empathy

Motivation

Current automated text learning in Natural Language Processing (NLP), a field of Artificial Intelligence, is based on text occurrence statistics. TODO: missing citation. This way of analyzing text needs a lot of high-quality data, and it does not reflect how humans learn languages. Humans learn languages from small sets of repeating texts that are then processed to create meaning. TODO: missing citation. The human learning process starts by associating words with the sensors in the body parts and with changes in the environment. TODO: missing citation.

The purpose of Talee is to learn languages the way humans do. This means that Talee will need a virtual body with virtual sensors on different body parts, with which applicable words can be associated.

I believe that two things play a big role in the process of creating meaning, namely experience and empathy. Experience here does not mean bodily experience, but rather that every text that has been learned helps with the analysis of new texts. It also means that the meaning of learned texts can change when new insights are gained. Empathy means that, while texts are being understood, connections should be made between the texts and body parts whenever possible.

Ideas on different parts of language learning

Basics of word creation in the human mind

  • Texts are representations of our perceptions, based on what the sensors in our body receive after processing the different things that happen to our body.
  • Every sensor in our body registers different things at different times. Together, these registrations eventually build events. Events can eventually be expressed as words, or parts of words, that we use in our daily conversations. Turning perceived events into words is a long process that requires agreement with the other people (agents) around us. Not every event our body perceives can be translated into specific words. If an event cannot be understood by others, it becomes difficult to agree on words for it. In a sense, word creation is the sharing and aligning of ideas between agents, which requires a common basic cognitive function. I believe this basic cognitive function is comparison.
  • Along with what the sensors in our body register, other things around those sensors need to be taken into account in every event registration, namely the changes within a certain time and space.
  • The choice of words humans have at the beginning of their life is limited to what their parents (or guardians / caretakers) use (Hilpert, 2019, p. 159). This choice of words may evolve when they learn new things outside the environment they grew up in or meet new people. The evolution / growth of this vocabulary needs the support of words that are already known. This means that people learn only a few new words in each new context while reusing many existing words to maintain the structures of meaning. TODO: missing citation. With this idea in mind, the questions are: which structures are necessary for us to learn new words? What can be considered a structure of meaning for texts?
  • All texts that humans learn are checked and verified with other humans. Every text and the relations between texts are tested and verified before they become trusted truth. TODO: missing citation.

Grouping of words

  • Once texts and the relations between them are verified, the next step is to group the texts. Every grouping of texts is the result of a probabilistic calculation. TODO: The term probabilistic used here is not the same as how it is understood scientifically. Rename this to score of patterns.
  • The meaning of a word in a text is determined by the surrounding words. Meaning also arises only once some context has been created. Without context, a word is merely an indication of meaning probabilities.
  • To build contexts, humans use temporary models in their mind that keep changing until they are verified in conversations. In Talee these temporary models are called thought models.
  • Every word in texts will eventually be categorized. To be able to apply the categories, Talee needs to be "taught" what the word categories are in different sentences and conversations.
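
The idea that surrounding words determine the meaning of a word can be sketched as a simple neighbour count. This is only an illustration, not Talee's implementation: the function names `context_counts` and `pattern_scores`, the `window` parameter, and the choice to normalize counts into scores are all assumptions.

```python
from collections import Counter, defaultdict


def context_counts(sentences, window=1):
    """Count, for each word, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][words[j]] += 1
    return counts


def pattern_scores(counts, word):
    """Turn the neighbour counts of `word` into scores that sum to 1.

    The normalization is an assumption; the README only says that
    groupings should be expressed as a "score of patterns".
    """
    neighbours = counts.get(word, Counter())
    total = sum(neighbours.values())
    if total == 0:
        return {}
    return {n: c / total for n, c in neighbours.items()}


sentences = ["the bank of the river", "the bank gave a loan"]
scores = pattern_scores(context_counts(sentences), "bank")
print(scores)  # {'the': 0.5, 'of': 0.25, 'gave': 0.25}
```

Without further context, the scores for "bank" only indicate which surroundings are likely. They do not yet pick one meaning.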

Approach

For humans, multiple sensors register simultaneously in every event that happens. Since Talee is an automated text learner, it has only a single kind of registration: the texts. The relations between texts and the virtual body parts will need to be created partly by hand. Hopefully, once an advanced Talee bot has been created, it can learn more words that are connected to different body parts by itself.

Before relations between texts and body parts can be made, Talee needs to have enough understanding of texts. This understanding will be extracted through a simulated learning process. TODO: describe the high level approach of session & inputs. Note 29-12-2021: I don't remember what I meant by this TODO. However, I think that text understanding can be seen as knowing or obtaining the language structure used in communication between different people.

Goals

Talee should be able to:

  1. identify common words / terms in a language.
  2. identify domain-specific words / terms in a language and in manually taught domains. (Or maybe use text clustering to determine / discover domains.)
  3. group words in sentences. (Identify entities, verbs, adjectives, place, time, etc.)
  4. relate words to body parts.
  5. create thought models during a conversation / session.
  6. build a meaning model by context. Contexts are built from the words used during a session.
  7. summarize conversations / sessions.

Abilities that Talee might need to help discover meaning are:

  • Pattern recognition. How sentences form, how questions form, what a quote is, etc. Examples: Einstein said that "Time is relative"; What do you mean shit happens?; He just went to shop; etc.
  • Similar word discovery. Some words differ only syntactically but mean the same thing. This is similar to lemmatization or stemming, but Talee will solve it through its own mapping of words / terms with pattern scores.
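
One possible way to approach similar word discovery without a stemmer is to compare the surroundings in which two words occur. The sketch below is an assumption, not Talee's method. It uses Jaccard overlap of (left neighbour, right neighbour) pairs as a stand-in for pattern scores, and the names `neighbour_sets` and `similarity` are hypothetical.

```python
from collections import defaultdict


def neighbour_sets(sentences):
    """Map each word to the set of (left, right) neighbour pairs it occurs between."""
    contexts = defaultdict(set)
    for sentence in sentences:
        # Sentence boundary markers so first and last words also get a context.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(words) - 1):
            contexts[words[i]].add((words[i - 1], words[i + 1]))
    return contexts


def similarity(contexts, a, b):
    """Jaccard overlap of the contexts of words `a` and `b`, in [0, 1]."""
    ca, cb = contexts.get(a, set()), contexts.get(b, set())
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / len(ca | cb)


sentences = [
    "he went to the shop",
    "he walked to the shop",
    "she went to the park",
]
contexts = neighbour_sets(sentences)
print(similarity(contexts, "went", "walked"))  # 0.5
print(similarity(contexts, "went", "shop"))    # 0.0
```

Words that share surroundings score high even when they are not related in meaning, so in Talee's terms such a score would still need verification.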

Simulating learning process

  • At the start of the learning process, Talee needs to process simple inputs.
  • While Talee does not yet understand many words (that is, it has few pattern scores), it needs to start registering different words and the relations between them.
  • Every new word / term in the inputs will be registered.
  • Every word / term that keeps reoccurring in different sessions will become a familiar word / term.
  • Once a word / term becomes familiar, it needs to be verified whether the word / term and its surroundings are correct.
  • After the verification step, these words / terms become rules / beliefs. Whenever the surroundings of these words / terms do not match, bias possibilities are created.
  • Bias possibilities need to be clarified.
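
The steps above can be sketched as a small state machine. Everything in this sketch is an assumption made for illustration: the `Learner` and `TermRecord` classes, the `FAMILIAR_AFTER` threshold of two sessions, treating the surroundings of a term as its immediate left and right neighbours, and modelling verification as an explicit `verify` call made by a teacher.

```python
from dataclasses import dataclass, field

FAMILIAR_AFTER = 2  # hypothetical: sessions needed before a term counts as familiar


@dataclass
class TermRecord:
    sessions: set = field(default_factory=set)      # sessions the term was seen in
    surroundings: set = field(default_factory=set)  # accepted (left, right) neighbour pairs
    verified: bool = False                          # True once it is a rule / belief


class Learner:
    def __init__(self):
        self.terms = {}
        self.bias_possibilities = []  # (term, surrounding) pairs awaiting clarification

    def process(self, session_id, sentence):
        """Register every term of a sentence together with its surroundings."""
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for i in range(1, len(words) - 1):
            term, surrounding = words[i], (words[i - 1], words[i + 1])
            record = self.terms.setdefault(term, TermRecord())
            if record.verified and surrounding not in record.surroundings:
                # A belief met an unexpected surrounding: flag it, do not absorb it.
                self.bias_possibilities.append((term, surrounding))
            else:
                record.surroundings.add(surrounding)
            record.sessions.add(session_id)

    def familiar_terms(self):
        """Terms seen in enough sessions that still await verification."""
        return sorted(
            term for term, record in self.terms.items()
            if len(record.sessions) >= FAMILIAR_AFTER and not record.verified
        )

    def verify(self, term):
        """Promote a familiar term and its current surroundings to a belief."""
        self.terms[term].verified = True


learner = Learner()
learner.process(1, "the cat sleeps")
learner.process(2, "the cat eats")
print(learner.familiar_terms())    # ['cat', 'the']
learner.verify("cat")
learner.process(3, "a cat runs")
print(learner.bias_possibilities)  # [('cat', ('a', 'runs'))]
```

How a bias possibility is clarified is left open here, as it is in the list above.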

Evaluation

  • The choice of term meanings (is it the same as constructicons?) should improve after several sessions of learning.
  • The event groups should capture different ambiguous constructions and score them properly based on syntactic commonalities.

Ideas in progress

  • All words used in sentences carry meanings behind them. These meanings depend on the perspectives of the receiver and the sender at the time the words are passed on, and they are often abstract.
  • To decipher these meanings and reduce their level of abstraction, some building blocks will be used:
    • Event parts in sentences. TODO: explain
    • Event part subgroups. TODO: explain
    • Virtual body parts. TODO: explain
    • Sentence classifications. TODO: explain
    • Dictionary/index items. TODO: explain
    • Knowledge cells. TODO: explain
    • Thought models. TODO: explain
  • Talee will register two kinds of meaning. The first one is word / concept meaning and the other one is "in the context" meaning.

References

Hilpert, M. (2019). Construction Grammar and its Application to English (2nd ed.). Edinburgh University Press. https://www.jstor.org/stable/10.3366/j.ctvsf1p6c
