Data Driven Ontology Trees – A Speculative Idea


This topic contains 4 replies, has 1 voice, and was last updated by josh January 6, 2021 at 12:02 am.

  • Author
    Posts
  • #78815

    josh

    Declarative NLP communication normally tries to efficiently add new/missing information based on beliefs about what the audience already knows. In information-theoretic coding, information is a specification relative to a probability measure over a set of possibilities. Language is more universal in the sense that it is used to describe new arrangements & open new questions. Normally those new arrangements are of a type the audience is familiar with; if a new arrangement is of a completely new/unfamiliar type, then that type must be described as well.
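    The coding-theory view above can be made concrete with a toy surprisal calculation (the messages and probabilities here are invented for illustration): a message's information content is −log2 of its probability under the audience's prior, so efficient communication spends its bits on the unexpected parts.

    ```python
    import math

    def surprisal_bits(p):
        """Bits of information in observing an event of probability p."""
        return -math.log2(p)

    # Hypothetical audience prior over two messages: the expected one is
    # nearly free to communicate; the surprising one carries many bits.
    audience_prior = {"the dog barked": 0.5, "the dog spoke": 0.001}

    for msg, p in audience_prior.items():
        print(f"{msg!r}: {surprisal_bits(p):.2f} bits")
    # 'the dog barked': 1.00 bits
    # 'the dog spoke': 9.97 bits
    ```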

    Computational goals for ontology include:

    terms that efficiently structure information about type of situation & other properties – useful for communication & for internal representation/storage

    lack of redundancy

    collect types of things/events which are likely to appear together in similar observational and/or causal frames – i.e. we can assume that the amoeba didn’t participate in the dinner conversation, and that the exciting thing Suzy found browsing at the bookstore was not the result of Kelly’s tax audit, even though physically both are possible.

    Good ontological terms provide a kind of code for efficiently structuring info & communications.

  • #78835

    josh

    Of course the data can be huge, & lots of pre-processing steps can be considered.

    Example: for each word, consider the types of syntactic roles/phrasal structures it can appear in. For each of those, represent its relations to other words – co-occurrence in definition bodies, & the chance that one word is used in the definition of the other vs. vice versa. Using these measures on word/syntactic-category pairs, agglomerative clustering is computationally “ez”. So the output of agglomerative clustering could be a starting point for further refinement or problem decomposition.
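    A minimal sketch of that clustering step, with made-up feature vectors standing in for the definition-body co-occurrence counts (a real pipeline would use a library routine and real corpus statistics; everything here is toy data):

    ```python
    from itertools import combinations

    # Hypothetical feature vectors per word: co-occurrence counts with a few
    # probe words in dictionary definition bodies (toy numbers, not real data).
    features = {
        "amoeba":   [9, 8, 0, 0],
        "bacteria": [8, 9, 1, 0],
        "dinner":   [0, 1, 9, 7],
        "banquet":  [0, 0, 8, 9],
    }

    def dist(u, v):
        """Euclidean distance between two feature vectors."""
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    # Naive single-linkage agglomerative clustering: repeatedly merge the two
    # closest clusters until a single tree remains, recording each merge.
    clusters = [(w,) for w in features]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: min(dist(features[a], features[b])
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append(merged)

    print(merges)  # biological things merge first, then the dinner things
    ```

    The merge order is the tree: words with similar definition-body profiles join early, giving a candidate ontology to refine.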

    • #79266

      josh

      These sorts of representations can also be useful for improving the accuracy of speech recognition – e.g. boosting the likelihood of candidate words that belong to tree neighborhoods that fit the trailing semantic context and the grammar.

      • #79291

        josh

        Pose Speech Word Recognition as something like this:

        Pick the max-likely word (text) *here* (at the utterance location) – chosen from, and integrated over, all considered combinations of word text, word meaning, linguistic context, speaker accent, speaker context, and noise context.

        Linguistic interpretation, in general, tries to identify the word + word meaning + discourse context, including any added speaker accents. Putting a tree/cluster structure on word meanings helps with practical plans to identify meaning/discourse context. The smallest neighborhood of the tree which contains the trailing text & discourse context tends to dominate the likelihood estimates, and it can be revised when the initial guess is wrong. So we work out

        Time flies like an arrow
        vs.
        Time flies like a juice spill
        vs.
        Time flies, like roaches, should be banned from the indoor dining area.
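        One way to sketch that formulation (with a made-up tree, made-up acoustic scores, and an arbitrary boost factor – none of these numbers come from the post): weight each acoustically plausible candidate by a prior that is boosted when the candidate lies in the same tree neighborhood as the trailing context.

        ```python
        # Toy posterior for one utterance slot:
        #   P(word | audio, context) ∝ P(audio | word) * P(word | context)
        # where the semantic-tree neighborhood containing the trailing
        # context boosts the prior of words in that same neighborhood.

        tree = {
            "kitchen": {"fries", "juice", "spill"},
            "insects": {"flies", "roaches"},
        }

        def neighborhood(context_words):
            """Pick the cluster that overlaps the context the most."""
            return max(tree, key=lambda c: len(tree[c] & set(context_words)))

        def posterior(candidates, acoustic, context, boost=5.0):
            """Normalized scores combining acoustic fit and the tree boost."""
            hood = tree[neighborhood(context)]
            scores = {w: acoustic[w] * (boost if w in hood else 1.0)
                      for w in candidates}
            z = sum(scores.values())
            return {w: s / z for w, s in scores.items()}

        # Acoustically ambiguous "flies" vs "fries": the "roaches" context
        # lands in the insects neighborhood, so "flies" wins.
        print(posterior(["flies", "fries"], {"flies": 0.5, "fries": 0.5},
                        ["roaches"]))
        ```

        Revising a wrong initial neighborhood guess would just mean re-running the scoring with the corrected context, as the post suggests.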

        The problem of working out semantic context is also important for speech recognition because all clues help. Vice versa, noticing that my English friend Jesse is putting on a Russian accent here is part of the meaning interpretation for whatever she says.
