My Recent Notes On Compiler Metaprogramming
This topic contains 22 replies, has 2 voices, and was last updated by
josh July 18, 2022 at 1:26 pm.
-
January 2, 2022 at 11:33 pm #108630

josh
Suggesting a hard but tightly focused & relevant test problem:
Specify metaprogramming logic for specifying structural implementations of signals & slots which a) avoid infinite looping and b) permit desirable degrees of parallelization.
It seems like a) has some abstract condition like: for each given program state (one can make a list), there is a partial order on the transmission of info via the signals/slots mechanism, & the implementation implicitly respects that order in deciding whether to hook/propagate a given signal.
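A minimal sketch of condition a), assuming a hypothetical `Signal` class in which each signal carries an integer rank standing in for the partial order; a slot triggered at rank r may only propagate signals of strictly lower rank, so every emission chain must terminate:

```python
# Hypothetical sketch: signals carry a rank from a fixed partial order
# (here just an integer); a slot triggered at rank r may only propagate
# signals of strictly lower rank, so emission chains cannot loop forever.

class Signal:
    def __init__(self, name, rank):
        self.name, self.rank = name, rank
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, payload, _ceiling=None):
        # Refuse to propagate unless we sit strictly below the ceiling
        # imposed by the signal that triggered us.
        if _ceiling is not None and self.rank >= _ceiling:
            raise RuntimeError(f"{self.name}: would violate partial order")
        for slot in self._slots:
            slot(payload, _ceiling=self.rank)

log = []
low = Signal("low", rank=0)
high = Signal("high", rank=1)
low.connect(lambda p, _ceiling: log.append(p))
# The high-rank slot re-emits on the lower-rank signal -- allowed.
high.connect(lambda p, _ceiling: low.emit(p + 1, _ceiling=_ceiling))
high.emit(10)
print(log)  # [11]
```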
-
January 3, 2022 at 7:25 am #108637

josh
We think of classical software development environments as mandating the use of an ASCII/UTF-8 text editor/viewer and directories of source files. This setup supported uncountable numbers of variants, but no standardization of “IDE” in the sense that we are using it. We propose that there is value in extending the floor of the editor etc. in ways that support many alternate implementations of the advanced IDE without limiting other forms of creativity, competition, & backwards/forwards compatibility.
Desirable additions:
a) Database-like (i.e. ACID) operations to simultaneously modify multiple sets of related source files.
b) A selection interface to a WordProcessor-like choice of which view/feature set to include in a given screen representation of a given source-language text. The reality we imagine is much more complicated & hierarchical than “show formatting”, but “show formatting” gives the flavor of hiding extra detail & clutter. The analogy to formatting varies over which aspects of the source code are in the display:
i. which level of refinement (in the sense of formal methods),
ii. which unifications (in the sense of relational program solution search), &
iii. which instrumentations (in the sense of compiler optimization & executable creation) are shown. We imagine this can be graphically supported by HTML/web technologies which support “show formatting” & drill down/drill up. The demand for remote sysadmin using an ASCII terminal continues to fall by the wayside in the sysadmin world, but menu switching to different ASCII views is theoretically possible & could be spec’d if desired.
The need, in the earlier note, is to permit expressive refinement control of many programming aspects without introducing unwanted clutter into the construction of functional logic, which should be the activity that occupies the application programmer’s source development time. In our ideal world, functional logic could be pretty, like Standard ML or Ruby, while simultaneously allowing all sorts of concurrency/metal specs/constraints/promises.
c) Standardize functional access to info needed for storage & manipulation of debug versions of executable code – here & below we mean that “functional access” could be reading a file in a given format but doesn’t have to be that.
d) Standardize functional access to dynamic linking & versioning capabilities of executable code and sub-parts (i.e. callable functions), where that is present (or note that it is not).
e) Standardize functional access to info about staleness/live dependencies of derived program implementation artifacts where those are intended to be long lived & involve significant creational work (e.g. allow things analogous to your C++ template repository to be cataloged)
f) Support exportable/importable standards for descriptions of machine architecture & other specs which affect instrumentation and are expected to come in many flavors.
-
January 3, 2022 at 7:45 am #108638

josh
We believe that metaprogramming of input/output formats is an area that could bring huge economic wins. Programmers have traditionally wasted large percentages of their time programming & debugging input/output formats, repetitively duplicating effort. Most of this waste could be eliminated with successful metaprogramming tools that integrate into the overall metaprogramming framework. How & why? Here are the features of the better solutions:
a) Allow tight coupling of the specs for reading & writing so that coordination of both, through future changes to major & minor format versions, is automatic & not error prone.
b) Support functional object concepts like C++ streams that allow the same functional interfaces to operate over files/strings/network pipes/shared memory constructs/etc.
c) Support all relevant forms of concurrency control/safety as aspects that can be specified prior to code instrumentation & optimized by the implementation. Reading and writing to message buffers that *could be* specified to be multi-threaded can be like checking a few (expensive) boxes in some menus.
d) Support modular specification of application logic that is threaded through reading/writing while being freed from the details of intermediate level object construction – i.e. hierarchical systems of application objects & constructions like “when first constructed do…”
We emphasize that from a Computer Science POV, there is no theoretical innovation described or required here. We are simply saying that lex/yacc/antlr/regexp etc. fail as application programmer interfaces, for important reasons.
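Feature a) can be sketched as follows; `make_codec`, `SPEC_V1`, and `SPEC_V2` are hypothetical names, and the binary layout is illustrative only. One declarative spec drives both the reader & the writer, so a format change (adding a field, bumping a version) automatically stays coordinated on both sides:

```python
# Hypothetical sketch of a): one declarative spec drives both reader &
# writer, so the two can never drift apart across format versions.
import struct

SPEC_V1 = [("x", "i"), ("y", "i")]           # field name -> struct code
SPEC_V2 = SPEC_V1 + [("label_len", "i")]     # a minor version adds a field

def make_codec(spec):
    fmt = "<" + "".join(code for _, code in spec)
    names = [name for name, _ in spec]
    def write(record):
        return struct.pack(fmt, *(record[n] for n in names))
    def read(data):
        return dict(zip(names, struct.unpack(fmt, data)))
    return write, read

write_v1, read_v1 = make_codec(SPEC_V1)
blob = write_v1({"x": 3, "y": 4})
print(read_v1(blob))  # {'x': 3, 'y': 4}
```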
-
January 3, 2022 at 6:38 pm #108677

josh
A good sub-project here is about improving description, visualization & directed linking of timelines & partial orders that may start out as unconnected. Consider the nature of different timelines that can be generically involved in programming & program generation:
I. Absolute time & calendar time
II. Metaprogram time -> compile time -> runtime
III. Instruction Sequences as seen by distinct threads of control
IV. UML type descriptions of Global Program State
V. UML type descriptions of various domain processes that are either modeled by or interact with program state
VI. Program User interaction state
VII. Exceptional Event Responses – can be either synchronous or asynchronous interrupts from software POV
VIII. Distributed states of computation at other nodes
IX. Operating system events not controlled by the program or allied with it.
A lot of programming involves creating plans & refining commitments for how these timelines & looping constructs within them are related.
The goal is to create nice syntax, semantics, & visualizations for describing the planning of commitment relations/edits over sets of partially ordered events that may be pinned to partially independent timelines like the ones described above.
-
January 4, 2022 at 1:44 am #108694

josh
Thoughts on Aspect Fields/Controls for Metaprogramming of Reference Types:
Point 1 – We should draw a fundamental distinction between restrictive claims about how a reference *cannot* be used and positive claims about how it *can* be used. The cost of restrictive claims is checking that they are not violated wherever that is relevant in metaprogramming land. That’s doable & can be helpful, so the more the merrier if the cost is not too high & there are use cases. Positive claims require something to be constructed & maintained. Seemingly we should follow a paradigm analogous to C++ with module delegation of authorization for construction & destruction of the reference with a given positive claim (i.e. not empty or dangling, truly frozen in all constness, in a given type of memory, etc.) Prior to worrying about security of metaprogramming, the “module” can be thought of as some assigned integer token for each variety. Positive claims need to be protected by restrictive claims that void usages which would invalidate the positive claims.
Point 2 – What are some useful claim types?
Totally Frozen
Not Cached as physical address – i.e. hash index type
Not Being Modified – not locally modified, & not passed to a function unless that function promises not to modify the type & applies the same conditions to any arguments drawn from the set of references it references that are described as “not being modified”
Not asynchronously volatile – to the program or the thread of control?
Only known to the current thread of control
Can be resized…
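A toy illustration of Point 1’s restrictive claims (the `Ref` class and the claim-token names are hypothetical): a reference carries a set of claim tokens, restrictive claims are checked at each relevant use site, & any use that would invalidate a standing claim is voided.

```python
# Hypothetical sketch of Point 1: claims are tokens attached to a
# reference; writes are checked against the restrictive claims so that
# a positive claim (e.g. "totally frozen") cannot be invalidated.

class Ref:
    def __init__(self, value, claims=frozenset()):
        self._value, self.claims = value, frozenset(claims)

    def read(self):
        return self._value

    def write(self, value):
        if self.claims & {"totally_frozen", "not_being_modified"}:
            raise PermissionError("write would void a standing claim")
        self._value = value

r = Ref([1, 2], claims={"totally_frozen"})
print(r.read())  # [1, 2]
try:
    r.write([3])
except PermissionError as e:
    print("blocked:", e)
```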
-
January 6, 2022 at 5:23 pm #108781

josh
Implementations of message queues vary tremendously depending on a lot of factors:
a) are the sender & receiver synchronous or asynchronous at the level of atomic queuing operations?
b) are any hard or soft real-time constraints consulted to affect operations?
c) are any resource limitations imposed on queue length?
d) is a non-blocking algorithm available for asynchronous actors?
e) If locks are to be used, what locking primitives are available?
f) What sort of observability is required?
The combination of generic meta-programming & aspect oriented descriptions offers a new opportunity to try & pick “best approach” in each case as a function of needs, availability, & costs, while avoiding unsafe practices that introduce flaws.
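As a rough sketch of that idea, a hypothetical `make_queue` factory could pick an implementation from declared aspects (here mapping only onto Python’s stock `collections.deque` and `queue.Queue`; a real system would have many more cases):

```python
# Hypothetical sketch: pick the queue implementation from declared
# aspects (questions a)-e) above) instead of hand-coding it each time.
import queue
from collections import deque

def make_queue(multi_threaded=False, bounded=None):
    # Single-threaded with no resource limit: a plain deque is cheapest.
    if not multi_threaded and bounded is None:
        return deque()
    # Thread-safe, with an optional resource limit on queue length (c).
    return queue.Queue(maxsize=bounded or 0)

q = make_queue(multi_threaded=True, bounded=2)
q.put("a"); q.put("b")
print(q.get())  # a
```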
-
January 12, 2022 at 8:44 am #109040

josh
Goal unification in Prolog can be structurally similar to sentence derivations in CF (context-free) and CS (context-sensitive) grammars – symbols expand into other symbols & terms. Metaprogramming as plans for program synthesis can be described in this form. From a computational & conceptual POV, it’s convenient to think about “adding a new feature” in terms of accessing the derivation of the synthesized program at the level of grammar rules that allow the inclusion of additional items. In terms of Hoare logic, additional program steps can be accommodated provided they fit within the given constraint framework. Intuitively, one feels that the synthesis & editing of programs is more efficient when sub-parts can be manipulated as hunks of derivation trees which fit patterns of parametric constraints – i.e. language, libraries used, hardware dependencies needed, resources consumed, namespaces, etc. Semantic concepts of software hunks may be described with reference to ASM machines & linguistic names/predicates describing the behavior of entities in the system. Rules for unification may also need to reference some unifications of these semantic frameworks & expected behavior.
-
January 12, 2022 at 7:53 pm #109057

josh
Side point about this – Prolog texts & some implementations are written as if set variables are implemented as linked lists rather than hash tables with string or atom keys. That’s clearly misguided from the POV of large-scale design & implementation. It should be emphasized explicitly that fields/values can be added to partly variable objects as a step in unification attempts (which are searches for an existential proof or “failure” – need it be exhaustive failure? In Prolog yes, but not in all practical searches).
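A minimal sketch of the hash-table view, with bindings kept in a dict keyed by variable name (the `unify` helper and the `?`-prefix variable convention are hypothetical):

```python
# Hypothetical sketch: variable bindings live in a hash table (a dict
# keyed by variable name), & values are bound to partly variable
# objects as a step of the unification attempt, rather than walking
# linked lists.

def unify(pattern, obj, bindings):
    # pattern values starting with '?' are variables
    for key, val in pattern.items():
        if key not in obj:
            return None
        if isinstance(val, str) and val.startswith("?"):
            if val in bindings and bindings[val] != obj[key]:
                return None
            bindings[val] = obj[key]          # O(1) hash-table update
        elif obj[key] != val:
            return None
    return bindings

fact = {"species": "dog", "name": "Fido"}
result = unify({"species": "dog", "name": "?N"}, fact, {})
print(result)  # {'?N': 'Fido'}
```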
-
January 12, 2022 at 7:24 pm #109054

josh
Some types of important conditions – like temperature, atmospheric pressure, cloud cover, etc. – may be important to scenarios either as surrounding background assumptions or permeating factors. We could view the teeter-totter example with gravitational force equal to that at the surface of the Earth’s moon, for example. Intuitively, I think of these factors as attaching to surrounding boxes, which can also change in more limited sets of ways.
-
January 13, 2022 at 3:23 am #109094

josh
Putting on my Pep Organizer Hat here.
The topic: We want to scale up the use of automatic unit testing for most code branches. As this happens, we encounter greater distress from a small number of tests that may go into infinite loops or crash or seem to yield non-deterministic results where determinism is expected. What to do about it?
I advise these steps:
a) Appoint a small group of wise volunteers to conference & try to make a first pass at a diagnostic list of reasons why the unacceptable conditions happen in practice – bugs in system libs, bugs in 3rd party libs, unexpected consequences of side effects in code, bugs in metaprogramming constructs used, etc. For each case, propose a set of potential methodological recipes that seem like best bets for practice. These might include time-outs, instrumenting of special hooks for 3rd party code calls, probation periods for new meta-routines to become vested, etc.
b) Have a broad publication & then teleconference to get feedback on the case by case details of the suggestions, other missing cases, & move the effort forward.
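One of the recipes in a), the time-out, can be sketched like this (the `run_with_timeout` harness is hypothetical; a real harness would prefer process isolation, since Python threads can’t be force-killed):

```python
# Hypothetical sketch: wrap each generated unit test in a watchdog so a
# looping test is reported as "timed out" instead of hanging the run.
import threading

def run_with_timeout(test_fn, seconds):
    result = {"status": "timed out"}
    def worker():
        try:
            test_fn()
            result["status"] = "passed"
        except Exception as e:
            result["status"] = f"crashed: {e}"
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(seconds)          # give up waiting after the deadline
    return result["status"]

print(run_with_timeout(lambda: None, 1.0))   # passed
print(run_with_timeout(lambda: 1 / 0, 1.0))  # crashed: division by zero
```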
-
January 24, 2022 at 4:24 am #109687

josh
The form of library interfaces is worth thinking about for early refinement & stability.
At the meta-programming level, a library should be constructed in a way that makes it easy to verify whether or not later library versions still implement the constraints of the “public interface” in earlier versions.
Where the library is used to create software that maps onto executables, then the interface should make it easy to check & verify demands about the stability, shared usage, security, linkage stability, & other properties of the derived executable components.
-
January 24, 2022 at 4:38 am #109688

josh
It helps mental visualization to think about a souped-up model of relational logic unification working over various different types of clauses & primitive “database fact atoms”.
In toy Prolog texts, the programs are mostly written over linked list data structures with atoms like dog(Fido). The unification (except for some bugs in the system that we fix) yields existence proofs of what was given in a database by the combinations of atoms & relational logic clauses describing their relational properties.
We extend the programming apparatus to objects that start at JSON/JavaScript objects & sets of objects, described in various ways, including input streams (the financial news…). In particular, for program synthesis, some of the atoms are actually recipes for how to implement interesting big hunks of Hoare logic in some executable language. Stringing them together is an existence proof of an executable program that satisfies the given constraints. Some of the techiness of assurance is pushed down to the level of notes about the atoms: “The system C library function really worked correctly in all the cases we tested…”
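A toy version of the recipe-atom idea: each atom names a precondition & postcondition, and chaining them so each postcondition feeds the next precondition is the existence proof (all names are hypothetical, and a real search would backtrack rather than chain greedily):

```python
# Hypothetical sketch: each "atom" is a recipe hunk with a pre- &
# postcondition; chaining them end to end is an existence proof of a
# program meeting the overall constraint.

ATOMS = [
    {"name": "read_csv",  "pre": "path",    "post": "rows"},
    {"name": "summarize", "pre": "rows",    "post": "summary"},
    {"name": "render",    "pre": "summary", "post": "report"},
]

def synthesize(start, goal):
    state, plan = start, []
    for atom in ATOMS:                  # greedy forward chaining
        if atom["pre"] == state:
            plan.append(atom["name"])
            state = atom["post"]
        if state == goal:
            return plan
    return None

print(synthesize("path", "report"))  # ['read_csv', 'summarize', 'render']
```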
-
January 24, 2022 at 4:39 am #109689

josh
Other kinds of interesting claims are an existence proof of homomorphisms between particular variables of the software implementation & state variables of ASM domain or program models given as specifications in a precise format.
-
January 26, 2022 at 7:08 am #109825

josh
An Important Question: Given a metaprogramming library template that can instantiate different types/constants, how should the development system express & coordinate various ranges of correctness assurance for different combinations of instantiation arguments?
Contrast with the more familiar C++ template system – the basic test guard is “Does the instantiation compile?” If it does compile, the meta-programming system offers no claims about correctness. The responsibility for correctness is distributed over
a) the template writer, who could use the clumsy facilities to try & compile fail some other cases or specialize template arguments;
b) the template instantiator who should understand how to use the code
c) general development tests.
For greater automaticity, we can imagine trying to link the template meta code to systems of database claims about assurance – e.g. this set of patterns has received full unit testing while this other set is regarded as merely worth trying.
It would be a good sub-project to try & further develop the syntax and implementation of that concept. This sort of hairy refinement of patterns is often best implemented as a decision-list predicate form.
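The decision-list predicate form might look like this sketch (the rule contents are invented; the first matching rule wins, with a required default):

```python
# Hypothetical sketch: a decision list mapping instantiation-argument
# patterns to assurance classes; first matching rule wins, and a
# catch-all default is required at the end.

ASSURANCE_RULES = [
    (lambda t: t in ("int", "float"),    "fully unit tested"),
    (lambda t: t.startswith("vector<"),  "integration tested"),
    (lambda t: True,                     "worth trying"),  # required default
]

def assurance(type_arg):
    for pred, klass in ASSURANCE_RULES:
        if pred(type_arg):
            return klass

print(assurance("int"))          # fully unit tested
print(assurance("vector<int>"))  # integration tested
print(assurance("MyWidget"))     # worth trying
```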
-
January 26, 2022 at 7:14 am #109826

josh
Side note: a lot of proper-name nouns/singletons are sometimes thought of as “constants” & expressed as functions of no arguments – but they are not constant here in the static sense of fixed properties, like mathematical/binary constants that would always map to the same correctness class. For clarity, the system syntax should view them as functions with variable return or something similar.
-
January 27, 2022 at 6:13 pm #109859

josh
For IDE/devel environments where unification of sub-goals is an important programming step, the environment can speed up identification of missing parts by making it easy to diagnose unification failure for particular sub-goals (point at what you were expecting to succeed & learn more, iterate…).
-
January 28, 2022 at 12:17 am #109871

josh
This sub-comment drew surprising interest from military IT hoi polloi, seemingly enchanted with the idea that I had said something gauche, but not really getting the framework.
By analogy, if you were working in C++ and your compile fails, it’s much nicer to have a msg “This stub not implemented” rather than a deep stack of hard-to-interpret errors. The reality in proof land with an IDE is different though, because we can’t say whether the issue is that you have failed to prove the Goldbach conjecture or if you planned to pick up the cats by their tails and forgot about the Manx case. Either way, you have work to put in.
-
April 9, 2022 at 11:56 am #113052

pers_d7pyza
Keymaster
An extension to the concept of an “optimizing compiler” might include sophisticated capabilities to optimize aspects of an application that is allowed or required to be distributed across multiple components. Some of the optimizations involved might parallel those involved in a compiler that could optimize over a threaded or micro-threaded application. But if the optimization is happening at the design level, there will be extra types of “costs” to consider with multi-component distribution – e.g. physical space, variable latency, mfg. cost, etc.
Considering the large IoT space & related embedded applications, there is potentially lots of value to machine optimization at the design level. Could be a bit of a bog to start with (though for all I know, this is already implemented!). But my thought on a starting approach is to figure out how distributed features & costs should be represented as aspects at the meta-programming level & then consider the distributed design as a first/pre-compiler pass on optimizing that.
-
April 9, 2022 at 8:15 pm #113088

josh
Q: Explain that intuition.
A: The intuition is that global optimization of a specific distributed architecture & roster is a) a big jump in computational cost compared to the pre-fixed version, b) not easily parameterized in the general case (IoT???), c) such that changes in these parameters add a lot of extra cost to the inner loops of an optimization, and d) on top of global optimization of code, which is already hard in the general case – in practice, most compiler optimization is in the form of hill climbing, & there’s no reason to believe it finds global maxima at the design-of-architecture stage.
So I thought that it made sense to use coarser heuristics, fix an architectural format, generate other stuff, and then go back to hill climbing. That was my gut intuition.
-
April 9, 2022 at 8:21 pm #113089

josh
Fix a feasible region for the architecture given the constraints.
Fix the dominant sub-region of the feasible region – if it’s a set of discrete choices, they can each be optimized and compared; otherwise, maybe there is some parameterization to introduce? If not, then use heuristics to pick the best version and then see if there is a parameterization to re-open it at the end.
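That two-phase procedure can be sketched on a toy cost model (the architecture names & cost surface are invented): enumerate the discrete architecture choices, run the cheaper inner hill climb for each, & compare the results.

```python
# Hypothetical sketch: fix each discrete architecture choice, hill-climb
# the remaining parameter within it, then compare across architectures.

def inner_cost(arch, p):
    # toy cost surface; each architecture shifts the optimum
    base = {"two_node": 10.0, "three_node": 8.0}[arch]
    best_p = {"two_node": 3, "three_node": 5}[arch]
    return base + (p - best_p) ** 2

def hill_climb(arch, p=0, step=1):
    # climb until no single step improves the cost
    while inner_cost(arch, p + step) < inner_cost(arch, p):
        p += step
    return p, inner_cost(arch, p)

best = min(
    ((arch,) + hill_climb(arch) for arch in ("two_node", "three_node")),
    key=lambda t: t[2],
)
print(best)  # ('three_node', 5, 8.0)
```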
-
April 9, 2022 at 11:33 pm #113094

josh
A side point/idea about compiler optimization –
Many interpreted languages are energy efficient because – well, one possibility is that code memory is relatively compact and relatively static, with a hash code indexing a decent-sized chunk of byte code instructions. Taking that idea as a hypothesis, optimizing compilers for other languages can look at run-length encodings of instructions (like LZ) and fix them in memory with hash code lookup, only in the binary space. If that idea is good, then metaprogramming offers a way to make longer runs.
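A toy illustration of the hypothesis (the instruction names & fixed run length are invented): repeated runs of instructions collapse into a hash-keyed chunk table, so longer repeated runs mean a smaller resident table.

```python
# Hypothetical sketch: replace repeated fixed-length runs of
# instructions with hash-keyed indices into a chunk table, so hot code
# stays compact & lookups hit a small static region.

def compress(instructions, run=4):
    table, out = {}, []
    for i in range(0, len(instructions), run):
        chunk = tuple(instructions[i:i + run])
        key = table.setdefault(chunk, len(table))  # hash-indexed chunk
        out.append(key)
    return table, out

prog = ["load", "add", "store", "jmp"] * 3 + ["load", "add", "halt", "nop"]
table, out = compress(prog)
print(len(table), out)  # 2 [0, 0, 0, 1]
```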
-
July 18, 2022 at 1:26 pm #118508

josh
Some things I would do differently compared to the C++ template system:
a) a named function & its specializations should be gathered in 1 place, with a format that makes it hard to miss cases: e.g. like a case statement with a required default condition. With template specialization, the default condition is given as the non-specialized form, but specializations can be added anywhere. Better is to require new function names when specialization is used to open the definition in another module. However…
b) Changing the function name doesn’t allow the new name to factor into an existing function used in some implementation or algorithm, where the broader function uses the template to be specialized internally. Is it still correct with the new specialization added? In an all-metaprogramming system, it should be easy to inherit the basic algorithm from any function. Some syntax can be provided which says that NewFunction(…) inherits from OldFunction with EditSet={….} All the functions we want to customize can be stated within EditSet – error if they are not found, etc.
The overall concept is that metaprogramming with high-level parameters can be used to helpfully & efficiently factor out all of the common work investment in similar but slightly different implementations/algorithms, so we don’t waste time & energy constantly re-writing almost the same thing in new settings. There is language work to do in order to make it good for human factors, low rates of accidental error, & efficiency (development time, compiler time, active run time, & filling-in-the-footprint time).
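A rough sketch of the EditSet idea in b), with `old_function`, `inherit`, and the step names all hypothetical: the new function inherits the old algorithm but overrides named internal steps, and naming a step that doesn’t exist is an error.

```python
# Hypothetical sketch: NewFunction inherits from OldFunction with
# EditSet={...}; unknown names in the EditSet are rejected up front.

def old_function(data, steps=None):
    # the algorithm's customizable internal steps, with defaults
    steps = {"scale": lambda x: x * 2, "shift": lambda x: x + 1, **(steps or {})}
    return [steps["shift"](steps["scale"](x)) for x in data]

def inherit(base, edit_set):
    known = {"scale", "shift"}
    unknown = set(edit_set) - known
    if unknown:
        raise KeyError(f"EditSet names not found: {unknown}")
    return lambda data: base(data, steps=edit_set)

new_function = inherit(old_function, {"scale": lambda x: x * 10})
print(old_function([1, 2]))  # [3, 5]
print(new_function([1, 2]))  # [11, 21]
```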