Data Analysis Train of Thought For Evidence Combines
December 11, 2021 at 8:55 am #107853

josh
Note: the algorithmic definitions of agglomerative clustering don’t explicitly yield partitions of the data space. Some other kind of local-regression or nearest-cluster-center inference is required to turn the cluster labels into a partition.
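A minimal sketch of that extra inference step, assuming scikit-learn and a nearest-cluster-center rule (one of several reasonable choices): the agglomerative fit labels only the observed points, and a centroid classifier extends those labels to a partition of the whole space.

```python
# Sketch: agglomerative clustering labels only the training points, so a
# nearest-cluster-center rule (sklearn's NearestCentroid here, one reasonable
# choice among several) is used to assign any new point to a cluster.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))          # toy data; any feature matrix works

# Step 1: agglomerative clustering labels the observed points only.
labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)

# Step 2: fit a nearest-cluster-center rule so *new* points can be assigned,
# which is what actually partitions the data space.
partition = NearestCentroid().fit(X, labels)

X_new = rng.normal(size=(10, 2))
print(partition.predict(X_new))        # cluster index for each new point
```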
December 11, 2021 at 12:27 pm #107861

josh
Agglomerative clustering is a greedy strategy, so it carries no special guarantee of finding a global optimum of whatever pseudo-normative clustering criterion you have in mind. So don’t be shy about trying parallel algorithms that yield good solutions (e.g. estimate N clusters, rank distance to cluster center, look for swaps, and arrive at local optima in parallel). Finding global optima would depend on the precise optimality criterion & doesn’t seem likely to yield big practical payoffs. But the search could be based on choosing the number of clusters & then, within each set, doing a kind of Monte Carlo cooling optimization that tries moving points across the boundaries between touching clusters.
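A rough sketch of what that Monte Carlo cooling step could look like, in plain NumPy; the function names, the whole-assignment SSE objective, and the cooling schedule are illustrative choices, not a published method.

```python
# Sketch of "Monte Carlo cooling" on cluster boundaries: start from some
# K-cluster assignment (e.g. from agglomerative clustering), repeatedly propose
# moving one point to another cluster, and accept worsening moves with a
# probability that shrinks as the temperature cools.
import numpy as np

def within_cluster_sse(X, labels, k):
    """Sum of squared distances to the centroid of each non-empty cluster."""
    return sum(((X[labels == j] - X[labels == j].mean(axis=0)) ** 2).sum()
               for j in range(k) if np.any(labels == j))

def anneal_boundaries(X, labels, k, n_steps=5000, t0=1.0, cooling=0.999, seed=0):
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    cost = within_cluster_sse(X, labels, k)   # naive full recompute; fine for a sketch
    t = t0
    for _ in range(n_steps):
        i = rng.integers(len(X))              # pick a point
        new = rng.integers(k)                 # propose a new cluster for it
        if new == labels[i]:
            continue
        old = labels[i]
        labels[i] = new
        new_cost = within_cluster_sse(X, labels, k)
        # Always accept improvements; accept worsenings with cooled probability.
        if new_cost > cost and rng.random() >= np.exp((cost - new_cost) / t):
            labels[i] = old                   # reject the move
        else:
            cost = new_cost
        t *= cooling
    return labels, cost
```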
December 11, 2021 at 1:08 pm #107869

josh
If one thinks about kernel probability density estimators in standard multi-dimensional problems, it makes sense that the kernel sizes would be locally adaptive. But if you look in theory journals, proofs using locally adaptive kernels are “messy” & don’t always seem relevant to statements about mathematical behavior in the far-off tails of the limit (here we mean as sample size increases).
Agglomerative clustering is just a way to get a good head start on adaptive kernels. There are lots of specific ways to define the clusters – some in papers, & some you can imagine for yourself. Why would one way be best for finite samples in any domain & any sort of multi-data-set setting? Silly question – of course there is no such claim. But intuitively, many practical problems can be represented as local regression, so using shapes that make sense in terms of local regression is likely to be practically strong if the computation is smooth enough.
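A hedged sketch of "clustering as a head start on adaptive kernels": each point's Gaussian bandwidth is set from the spread of its own cluster, so dense clusters get narrow kernels and diffuse ones get wide kernels. The bandwidth rule here is one simple choice, not a claim about the best one.

```python
# Adaptive Gaussian KDE where per-point bandwidths are seeded by an
# agglomerative clustering of the data. Illustrative only.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_adaptive_kde(X, n_clusters=5):
    """Return a function evaluating an adaptive Gaussian KDE on new points."""
    n, d = X.shape
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)

    # Per-point bandwidth: the RMS spread of that point's cluster,
    # with a small floor so tiny clusters don't collapse to zero width.
    bw = np.empty(n)
    for j in range(n_clusters):
        members = labels == j
        spread = np.sqrt(((X[members] - X[members].mean(axis=0)) ** 2).mean())
        bw[members] = max(spread, 1e-3)

    def density(query):
        q = np.atleast_2d(query)
        # Squared distances between every query point and every data point.
        d2 = ((q[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        kernels = np.exp(-0.5 * d2 / bw**2) / ((2 * np.pi) ** (d / 2) * bw**d)
        return kernels.mean(axis=1)

    return density
```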
December 12, 2021 at 1:20 am #107872

josh
Also interesting is the optimization of clusters, during & after growing, by doing *local* leave-one-out prediction scoring (a small sketch follows at the end of this post). In general, density estimation & regression can be accomplished & mixed in the same framework, where density estimation is prediction of log-likelihood – here the local group contains everything in the current model that contributes to that prediction.
If the optimization involves a set of decision choices, then the framework for Monte Carlo optimization is similar to the decision-making algorithm in DeepMind's AlphaGo.
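A compact sketch of the *local* leave-one-out scoring mentioned above, assuming a Gaussian KDE: each point is scored by the log-likelihood the model assigns to it with that point left out, and scores are averaged per cluster so the local groups can be compared or re-optimized. The bandwidth and grouping are illustrative.

```python
# Local leave-one-out log-likelihood scoring of a Gaussian KDE density model.
import numpy as np

def loo_log_likelihoods(X, bandwidth):
    """Leave-one-out log density of each point under a Gaussian KDE."""
    n, d = X.shape
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    k = np.exp(-0.5 * d2 / bandwidth**2) / ((2 * np.pi) ** (d / 2) * bandwidth**d)
    np.fill_diagonal(k, 0.0)                  # leave the point itself out
    return np.log(k.sum(axis=1) / (n - 1))

def local_scores(X, labels, bandwidth):
    """Mean LOO log-likelihood inside each cluster (the local group)."""
    ll = loo_log_likelihoods(X, bandwidth)
    return {j: ll[labels == j].mean() for j in np.unique(labels)}
```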