Notes On Dimensionality Reduction Using Spectral Graph Analysis



  • #126323

    Josh Stern
    Moderator

    We believe that using Spectral Graph Analysis for modeling high-dimensional probability density goes well with an efficient “Blocks on a Grid” style of computing, because the diagonal qualities of the density are largely set by the definition of the neighborhood kernels. Those kernels can be an informed mix of spherical and axis-aligned components, with weights and assumptions of isotropy based on both statistical and domain-specific or scientific criteria – i.e., crow-flies distance does not make sense across most mixes of dimensions and categories.

    There are different benefits to grid alignment that are worth distinguishing for different uses:

    a) Cognitive simplicity & interpretability
    b) Low-dimensional visualization & the image processing behind it – respecting psychological and psychometric preferences
    c) Numerical approximation of continuous densities
    d) Probability/Measure definition using finite sets of bins.

    Our numerical search in high-dimensional space is aided by:
    i) picking functions over support sets that combine multiple lower-dimensional spaces in a simple way – via products, expanding tree hierarchies, etc.
    ii) defining all parameterized kernels as having measure 1 on some space
    iii) defining all probability-density-normalized spectral graph vectors as measure-1 mixtures of measure-1 kernels
    iv) defining all mixtures of multiple spectral graph vectors and other background models (e.g. GLMs) as mixtures of measure-1 models.

    So grids help with d), i), and ii) while we search for and analyze models, and then with a), b), and c) when we present exploratory results (points ii) and iii) are sketched in code below). Getting all of this without losing the diagonal weighting is a nice property.
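
    To make ii) and iii) concrete, here is a minimal numpy sketch, assuming a 1-D grid and Gaussian kernels purely for illustration; the names (grid, grid_kernel, mixture) and parameters are hypothetical, not our actual implementation. Each kernel is discretized onto the grid and renormalized to measure 1, and the mixture weights sum to 1, so the mixture is again a measure-1 model on the grid:

        import numpy as np

        # Hypothetical 1-D grid of bin centers; the same construction extends
        # to products of low-dimensional grids as in i).
        grid = np.linspace(-5.0, 5.0, 201)

        def grid_kernel(center, bandwidth):
            """Gaussian kernel discretized onto the grid and renormalized so
            its total weight on the grid (its 'measure') is exactly 1."""
            k = np.exp(-0.5 * ((grid - center) / bandwidth) ** 2)
            return k / k.sum()

        # One measure-1 kernel per (possibly agglomerated) data point.
        points = np.array([-1.3, 0.2, 0.4, 2.1])
        kernels = np.stack([grid_kernel(p, bandwidth=0.5) for p in points])

        # Mixture weights that themselves sum to 1, so the mixture of
        # measure-1 kernels is again a measure-1 density on the grid.
        weights = np.full(len(points), 1.0 / len(points))
        mixture = weights @ kernels

        assert np.allclose(kernels.sum(axis=1), 1.0)
        assert np.isclose(mixture.sum(), 1.0)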

    Edit: Note also, on the Rademacher topic as it applies here: I pre-compute the assignment of kernel weight to grid neighborhoods. This helps with parameter search, but also, in a model with one kernel per point, it lets me easily see the effect of not including that kernel's vector of weights; and if I simplify the data by agglomerating nearby points, then the effect is confined to the weight of the combined kernel they went into (a second sketch follows below).
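
    A second self-contained sketch of that pre-computation, with hypothetical names (K, density_without) and random rows standing in for real kernels: each row of the matrix is one kernel's pre-computed weight over the grid bins, and dropping a row shows the effect of excluding that kernel.

        import numpy as np

        rng = np.random.default_rng(0)

        # Hypothetical pre-computed assignment of kernel weight to grid bins:
        # row i holds kernel i's weights over the bins, and each row sums to 1
        # (random rows stand in for real kernels here).
        n_kernels, n_bins = 4, 50
        K = rng.random((n_kernels, n_bins))
        K /= K.sum(axis=1, keepdims=True)

        w = np.full(n_kernels, 1.0 / n_kernels)  # mixture weights, summing to 1
        density = w @ K                          # measure-1 density on the grid

        def density_without(i):
            """Mixture with kernel i excluded and the remaining weights
            re-normalized, i.e. the effect of not including that kernel."""
            keep = np.arange(n_kernels) != i
            w_keep = w[keep] / w[keep].sum()
            return w_keep @ K[keep]

        # Per-bin effect of leaving out (or agglomerating away) kernel 1.
        effect = density - density_without(1)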

  • #126325

    Josh Stern
    Moderator

    Math & technical literature on “kernel density estimation” focuses on the problem of modeling projections of densities to low dimensional geometric spaces and visualizing those projections in 1,2, or 2.5 dimensions. Some form of smoothness assumption is assumed. We emphasize that the most interesting practical problems for industry and data science involve high-dimensional distributions of mixed data. Most of this space is neither geometric nor smooth nor densely populated within conventional sample sizes. That said, we have opinions about how to model and display kernel based estimates in low dimensional geometric spaces.

    In digital signal processing on rectangular grids, the sinc interpolation function is often an optimal choice for both down-sampling and up-sampling because of its precise and psychometrically pleasing control of spatial frequencies and anti-aliasing. It can be used to sample any type of signal, and it is computationally fast. Care must be taken at boundaries to design the widths and the content of the border input. Care must also be taken to chop values that exceed the upper or lower limits of the medium – e.g. in 8-bit grey scale, you chop back to [0, 255]. The FFT is one way to describe and implement such a signal-processing transformation/sampling of the input, and it can be used with many types of kernels, including sinc, Gaussian, and Epanechnikov. In the field of kernel density estimation, some modern authors have proposed using the sinc itself to directly implement spatial-frequency smoothing for “objective” density sampling of the input data. Our view is that this direct application creates a biased likelihood estimate: it overweights individual data points (lack of variance regularization), puts too much prior emphasis on higher-derivative smoothness at the observed frequencies, and lacks a theoretical correction for values that leave the allowed function range (e.g. the actual transformation value can be negative at a given point prior to chopping). Our recommendation is to first create a smoothed estimate at the data points or on a given grid, and then apply sinc smoothing to enhance the psychometric properties of the original estimate in display contexts.
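
    As one possible reading of that recommendation, here is a minimal numpy sketch, assuming a 1-D grid, an ideal FFT low-pass standing in for the sinc smoothing step, and an 8-bit grey-scale display range; the names (sinc_lowpass, keep_fraction) and the stand-in smoothed estimate are hypothetical.

        import numpy as np

        def sinc_lowpass(values, keep_fraction=0.25):
            """Ideal (sinc-kernel) low-pass smoothing via the FFT: keep only
            the lowest fraction of spatial frequencies and zero out the rest."""
            spectrum = np.fft.rfft(values)
            cutoff = max(1, int(len(spectrum) * keep_fraction))
            spectrum[cutoff:] = 0.0
            return np.fft.irfft(spectrum, n=len(values))

        # 1) A kernel-smoothed density estimate already evaluated on a grid
        #    (a moving average of noise stands in for the real estimate here).
        rng = np.random.default_rng(0)
        estimate = np.convolve(rng.random(256), np.ones(8) / 8, mode="same")

        # 2) Sinc smoothing applied only as a display-stage enhancement,
        #    scaled up to the 8-bit display range.
        display = sinc_lowpass(estimate * 255.0, keep_fraction=0.25)

        # 3) Chop values back into the allowed range of the medium, [0, 255].
        display = np.clip(np.round(display), 0, 255).astype(np.uint8)

    The chop in step 3 is where the range correction discussed above happens: ringing from the ideal low-pass can push values slightly outside [0, 255] before clipping.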
