Data Driven Ontology Trees – A Speculative Idea


This topic contains 4 replies, has 1 voice, and was last updated by josh January 6, 2021 at 12:02 am.

  • Author
    Posts
  • #78815

    josh

    Declarative NLP communication normally tries to efficiently add new/missing information based on beliefs about what the audience already knows. In information-theoretic coding, information is a specification relative to a probability measure over a set of possibilities. Language is more universal in the sense that it is used to describe new arrangements & open new questions. Normally those new arrangements are of a type the audience is familiar with; if a new arrangement is of a completely new/unfamiliar type, then that type must be described as well.
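    The coding-theory view above can be made concrete with a toy surprisal calculation (the messages and probabilities here are invented for illustration): a message's information content is −log2 of its probability under the audience's prior, so efficient communication spends its bits on the unexpected parts.

    ```python
    import math

    def surprisal_bits(p):
        """Bits of information in observing an event of probability p."""
        return -math.log2(p)

    # Hypothetical audience prior over two messages: the expected one is
    # nearly free to communicate; the surprising one carries many bits.
    audience_prior = {"the dog barked": 0.5, "the dog spoke": 0.001}

    for msg, p in audience_prior.items():
        print(f"{msg!r}: {surprisal_bits(p):.2f} bits")
    # 'the dog barked': 1.00 bits
    # 'the dog spoke': 9.97 bits
    ```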

    Computational goals for ontology include:

    terms that efficiently structure information about type of situation & other properties – useful for communication & for internal representation/storage

    lack of redundancy

    collect types of things/events which are likely to appear together in similar observational and/or causal frames – i.e. we can assume that the amoeba didn’t participate in the dinner conversation, and that the exciting thing Suzy found browsing at the bookstore was not the result of Kelly’s tax audit, even though physically both are possible.

    Good ontological terms provide a kind of code for efficiently structuring info & communications.

  • #78835

    josh

    Of course the data can be huge, & lots of pre-processing steps can be considered.

    Example: for each word, consider the types of syntactic roles/phrasal structures it can appear in. For each of those, represent its relations to other words – co-occurrence in definition bodies, & the chance that one word is used in the definition of the other vs. vice versa. Using these measures on word/syntactic-category pairs, agglomerative clustering is computationally “ez”. So the output of agglomerative clustering could be a starting point for further refinement or problem decomposition.
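    A minimal sketch of that clustering step, with made-up feature vectors standing in for the definition-body co-occurrence counts (a real pipeline would use a library routine and real corpus statistics; everything here is toy data):

    ```python
    from itertools import combinations

    # Hypothetical feature vectors per word: co-occurrence counts with a few
    # probe words in dictionary definition bodies (toy numbers, not real data).
    features = {
        "amoeba":   [9, 8, 0, 0],
        "bacteria": [8, 9, 1, 0],
        "dinner":   [0, 1, 9, 7],
        "banquet":  [0, 0, 8, 9],
    }

    def dist(u, v):
        """Euclidean distance between two feature vectors."""
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    # Naive single-linkage agglomerative clustering: repeatedly merge the two
    # closest clusters until a single tree remains, recording each merge.
    clusters = [(w,) for w in features]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: min(dist(features[a], features[b])
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append(merged)

    print(merges)  # biological things merge first, then the dinner things
    ```

    The merge order is the tree: words with similar definition-body profiles join early, giving a candidate ontology to refine.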

    • #79266

      josh

      These sorts of representations can also be useful for improving the accuracy of speech recognition – e.g. boosting the likelihood of candidate words that belong to tree neighborhoods that fit the trailing semantic context and the grammar.

      • #79291

        josh

        Pose Speech Word Recognition as something like this:

        Pick the max-likely word (text) *here* (at the utterance location) – chosen from, and integrated over, all considered combinations of word text, word meaning, linguistic context, speaker accent, speaker context, and noise context.

        Linguistic interpretation, in general, tries to identify the word + word meaning + discourse context, including any added speaker accents. Putting a tree/cluster structure on word meanings helps with practical plans to identify meaning/discourse context. The smallest neighborhood of the tree which contains the trailing text & discourse context tends to dominate the likelihood estimates, and it can be revised when the initial guess is wrong. So we work out

        Time flies like an arrow
        vs.
        Time flies like a juice spill
        vs.
        Time flies, like roaches, should be banned from the indoor dining area.
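        One way to sketch that formulation (with a made-up tree, made-up acoustic scores, and an arbitrary boost factor – none of these numbers come from the post): weight each acoustically plausible candidate by a prior that is boosted when the candidate lies in the same tree neighborhood as the trailing context.

        ```python
        # Toy posterior for one utterance slot:
        #   P(word | audio, context) ∝ P(audio | word) * P(word | context)
        # where the semantic-tree neighborhood containing the trailing
        # context boosts the prior of words in that same neighborhood.

        tree = {
            "kitchen": {"fries", "juice", "spill"},
            "insects": {"flies", "roaches"},
        }

        def neighborhood(context_words):
            """Pick the cluster that overlaps the context the most."""
            return max(tree, key=lambda c: len(tree[c] & set(context_words)))

        def posterior(candidates, acoustic, context, boost=5.0):
            """Normalized scores combining acoustic fit and the tree boost."""
            hood = tree[neighborhood(context)]
            scores = {w: acoustic[w] * (boost if w in hood else 1.0)
                      for w in candidates}
            z = sum(scores.values())
            return {w: s / z for w, s in scores.items()}

        # Acoustically ambiguous "flies" vs "fries": the "roaches" context
        # lands in the insects neighborhood, so "flies" wins.
        print(posterior(["flies", "fries"], {"flies": 0.5, "fries": 0.5},
                        ["roaches"]))
        ```

        Revising a wrong initial neighborhood guess would just mean re-running the scoring with the corrected context, as the post suggests.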

        The problem of working out semantic context is also important for speech recognition because all clues help. Vice versa, noticing that my English friend Jesse is putting on a Russian accent here is part of the meaning interpretation for whatever she says.
