Serialization/DeSerialization Generators


This topic contains 9 replies, has 1 voice, and was last updated by  josh October 17, 2022 at 6:10 pm.

  • #119002

    josh

    Metaprogramming offers the possibility of compiling a given language format + work description into various alternative parsing strategies & source language forms. For example, the Ragel generator for regular-expression languages lets the output language & the parsing strategy be controlled by parameters. Conceivably, the latter could also be chosen by automatic profiling of performance on testbed cases. In practice, nearly all computer/artificial languages fall, theoretically, within the class of Context Free languages, but it is not always desirable to describe a language plus its work purely in that way, or to handle arbitrary variants. One of the notorious problems with the context-free form is the ease of accidentally writing ambiguous language formalisms. The software engineering project will fare better if it encourages formalisms that prevent ambiguity or clearly diagnose the precise, narrow cause where it occurs during development. And it will fare better if the spec is easy for humans to understand & check – e.g. it can include positive & rejected cases as supplementary data. The generated deserialization code may also be more efficient if it spends most of its time in states where recursion is not possible, entering stacking cases only where it is.
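    As a small illustration of the “positive & rejected cases as supplementary data” point, here is a rough Python sketch. The RuleSpec container and its check() method are hypothetical names of my own, not any existing library – the point is only that a rule definition can carry its own human-checkable examples, and a generator can re-run them whenever the spec changes, diagnosing a bad edit at the rule rather than downstream.

```python
# Hypothetical sketch: a rule spec that bundles positive and negative example
# strings with the pattern, so the spec itself can be checked mechanically.
import re
from dataclasses import dataclass, field

@dataclass
class RuleSpec:
    name: str
    pattern: str                                   # regular-expression form of the rule
    accepts: list = field(default_factory=list)    # strings that must match
    rejects: list = field(default_factory=list)    # strings that must not match

    def check(self):
        """Verify the supplementary examples against the pattern."""
        rx = re.compile(self.pattern)
        for s in self.accepts:
            assert rx.fullmatch(s), f"{self.name}: expected to accept {s!r}"
        for s in self.rejects:
            assert not rx.fullmatch(s), f"{self.name}: expected to reject {s!r}"

number = RuleSpec(
    name="number",
    pattern=r"-?\d+(\.\d+)?",
    accepts=["0", "-12", "3.14"],
    rejects=["--1", "3.", ".5"],
)
number.check()   # the spec doubles as its own test fixture
```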

    • #119008

      josh

      One idea worth considering: CFG specs are usually written in a top-down way – a derivation starting from the “Sentence Root”. Consider instead writing a spec in a bottom-up way that discourages the most expensive forms of recursion and encourages early resolution of ambiguities. For example, by default, each non-terminal is defined down to the lexical level, based on prior definitions, and sequence in the file can be used to determine default precedence. Assume that a metaprogramming compiler will be able to optimize the code, so adding extra clarity/symbols has no real cost. Full realization of all CFG possibilities requires the ability to write rules that contain left recursion, right recursion, & circular recursion. These possibilities add different levels of complexity and potential inefficiency to parsing. Using such forms in preference to regular-expression forms should only be done on an as-needed basis, so it is right for the definition style to discourage unnecessary recursive usage, and to take advantage of more efficient parsing where such recursion is not actually present.
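      A rough Python sketch of that bottom-up definition style, with hypothetical names (Spec, define): each rule may only reference rules defined earlier in the file, and recursion has to be requested explicitly, so the common case stays in efficient, regular-operator territory.

```python
# Hypothetical sketch of a bottom-up spec: rules refer only to earlier rules,
# and recursion is opt-in rather than the default.
class Spec:
    def __init__(self):
        self.rules = {}          # name -> definition, in file order

    def define(self, name, *parts, recursive=False):
        for p in parts:
            if p.isupper():
                continue                              # literal token class, e.g. "DIGIT"
            if p not in self.rules and not (recursive and p == name):
                raise ValueError(f"{p!r} used before it was defined")
        self.rules[name] = parts

spec = Spec()
spec.define("digit", "DIGIT")
spec.define("number", "digit", "digit")                        # refers only to earlier rules
spec.define("sum", "number", "PLUS", "sum", recursive=True)    # recursion must be opted into
# spec.define("oops", "product")   # would raise: 'product' used before it was defined
```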

      Consider, for example, Natural Language NLP: it is grammatically possible to have infinite recursion of the “sentence” construct through either quotations or propositional/modal attitudes. The same is true for “noun phrases”, which contain modifiers that use other noun phrases. At the same time, we are clear that more than a small number of levels of such recursion is unintelligible to human language processors & terrible style. Infinite recursion could therefore be rejected by a practical grammar or handled in a very limited way for purposes of formality.
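      One way a practical grammar could enforce that limit, sketched as a toy recursive-descent rule with a depth cap (the tiny grammar and the MAX_DEPTH value are invented purely for illustration):

```python
# Toy illustration of capping recursion depth for a construct like quoted
# sentences: beyond MAX_DEPTH the parser rejects, or could fall back to a
# flat, lightly processed form.
MAX_DEPTH = 3

def parse_sentence(tokens, pos=0, depth=0):
    """sentence := WORD+ | '"' sentence '"'   (depth-limited)."""
    if depth > MAX_DEPTH:
        raise ValueError("quotation nesting deeper than a human reader would follow")
    if pos < len(tokens) and tokens[pos] == '"':
        inner, pos = parse_sentence(tokens, pos + 1, depth + 1)
        if pos >= len(tokens) or tokens[pos] != '"':
            raise ValueError("unclosed quotation")
        return ("quote", inner), pos + 1
    words = []
    while pos < len(tokens) and tokens[pos] != '"':
        words.append(tokens[pos])
        pos += 1
    return ("words", words), pos

tree, _ = parse_sentence(['"', 'she', 'said', 'hello', '"'])
print(tree)   # ('quote', ('words', ['she', 'said', 'hello']))
```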

      Another sort of example involves reacting to “online” input that streams continuously over long periods of time. The appropriate sort of grammar should allow quick semantic responses that do not need to wait for long-tailed events, which may turn out to include errors of form.
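      A minimal sketch of that online idea, assuming a simple record-per-line format of my own invention: semantic responses are produced per record as the input arrives, and a malformed record is reported locally rather than making the whole stream wait or fail.

```python
# Incremental responses per record; errors of form are reported where they
# occur and processing continues with the rest of the stream.
import io

def stream_records(lines):
    """Yield (ok, payload) per line as input arrives."""
    for lineno, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split(",")
        if len(fields) != 3:
            yield (False, f"line {lineno}: expected 3 fields, got {len(fields)}")
            continue
        yield (True, tuple(fields))

for ok, payload in stream_records(io.StringIO("a,b,c\noops\nd,e,f\n")):
    print("OK " if ok else "ERR", payload)
```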

      • #119009

        josh

        At the metaprogramming level, it is easy to customize the finalization step for concluding each recognized, correct-in-context grammar term. This allows a lot of flexibility for using parts of the grammar in different places & also for special handling of lightly processed recursion such as “block quotations”, as indicated above.
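        For instance, the finalization hook might be registered per non-terminal, something like the following (the on_complete decorator and the rule names are hypothetical):

```python
# Per-nonterminal finalization hooks: when a term is accepted in context, its
# hook runs, so a "block_quote" rule can keep a raw slice for light processing
# instead of descending into full recursive parsing.
finalizers = {}

def on_complete(nonterminal):
    """Register a finalization action for one grammar term."""
    def register(fn):
        finalizers[nonterminal] = fn
        return fn
    return register

@on_complete("block_quote")
def keep_raw(text, span):
    # Lightly processed recursion: remember the raw slice rather than parse it.
    return {"kind": "block_quote", "raw": text[span[0]:span[1]]}

@on_complete("number")
def to_int(text, span):
    return int(text[span[0]:span[1]])

# The generated parser would call finalizers[name](input_text, (start, end))
# at the moment each term is accepted in context.
print(finalizers["number"]("x = 42", (4, 6)))   # -> 42
```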

  • #119099

    josh

    The metaprogramming theme goes well with the aspect oriented theme of cross cutting concerns.

    Let’s make an example here: something like a browser is reading & parsing from multiple different network pipes. Any of them may fail with a signal or simply become too slow for comfort. That set of issues leads to extra code for handling interrupts/failures & for monitoring the performance & progress of concurrent workloads. It’s valuable to keep the specs/design/work of parsing (& maybe storing what was parsed) in a separate module that is shared with other serialization/deserialization, while the generated code is configured to include the special handling that the browser-like application needs. Optimizing compilers can be given the task of figuring out which sets of generated and shared routines give the best overall time/space performance on a given set of test cases, which can also change over time.
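    To sketch how the cross-cutting part could stay separate from the shared parsing core (all names here – GENERATION_CONFIG, with_monitoring, parse_headers – are illustrative, not a real API): the deadline/monitoring wrapper is woven in only when the host application’s generation config asks for it, while the deserialization core stays reusable elsewhere.

```python
# The shared deserialization core knows nothing about timing or failure
# handling; those cross-cutting concerns are added at generation time.
import time

GENERATION_CONFIG = {"deadline_s": 0.5, "collect_timings": True}

def with_monitoring(parse_fn):
    """Cross-cutting wrapper, included only when the config requests it."""
    def wrapped(data):
        start = time.monotonic()
        try:
            result = parse_fn(data)
        except Exception as exc:
            return {"ok": False, "error": str(exc)}
        elapsed = time.monotonic() - start
        if elapsed > GENERATION_CONFIG["deadline_s"]:
            return {"ok": False, "error": f"too slow: {elapsed:.3f}s"}
        return {"ok": True, "value": result, "elapsed": elapsed}
    return wrapped

def parse_headers(data):          # the shared, reusable deserialization core
    return dict(line.split(": ", 1) for line in data.splitlines() if ": " in line)

parse_headers_for_browser = with_monitoring(parse_headers)
print(parse_headers_for_browser("Host: example.com\nAccept: text/html"))
```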

  • #119105

    josh

    Q: Can you state, for the overall development project, what is to be optimized?

    A: Roughly this:

    Simultaneously maximize the set of coding applications where the metaprogramming development & output is a good choice, by minimizing development & redevelopment time for 99.99%-correct code & maximizing the suitability of the runtime created, weighted by the frequency of the various types of coding needs. For a person skilled with the tools, it should be quick to get things correct & to update them correctly over a very wide range of applications, & the runtime performance on tests – including the active memory footprint – should be attractive. So lots of people want to use this framework as their main tool & they save lots & lots of time doing that.

    Within that larger project there are various technical optimizations & niceties that help in this way or that, & I have listed & discussed a number of those. There’s nothing here about creating new CS theory – it’s about making better tools & practice by a good match of tools to our actual software engineering needs in practice. There are potentially a lot of win-win improvements there.

    • #119106

      josh

      The remarks above do strongly contrast with the de facto CS optimization. The usual teaching is that the best tool is the implementation of the best algorithm – one that is first correct for the broadest theoretical category of grammars & secondly gives the best theoretical time performance in the limit as the length of a grammar sentence stretches out to infinity.

      I stress above that we won’t give up the ability to use our general tool for any CFG & application arising in practice, but those are not our optimization goals. We have a lot of other concerns that the CS tradition isn’t bothered about.

  • #119130

    josh

    Q: Alt Grammars, minor errors, etc. Is it a PITA?

    A: No. At the metaprogramming level it’s easy to include alternative rules, deprecated forms, etc., & then use actions to set flags about what was seen – and this can be done for each non-terminal. Conceptually it’s somewhat like the familiar levels of error severity found in logging libs. In the metaprogramming finalization – recognition/rejection etc. of each non-terminal – actions can respond to whatever form of alternative was seen.
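    As a small illustration, with invented names: each non-terminal lists its preferred form plus alternatives, each tagged with a severity, and the finalization action records which one was actually seen.

```python
# Alternative and deprecated forms per non-terminal, with logging-style
# severities; the finalization action flags what was matched.
import re
from enum import Enum

class Severity(Enum):
    OK = 0
    DEPRECATED = 1
    ERROR = 2

DATE_FORMS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}"), Severity.OK),          # preferred form
    (re.compile(r"\d{2}/\d{2}/\d{4}"), Severity.DEPRECATED),  # legacy form
]

def finalize_date(text):
    for rx, sev in DATE_FORMS:
        if rx.fullmatch(text):
            if sev is Severity.DEPRECATED:
                print(f"warning: deprecated date form {text!r}")
            return text, sev
    return text, Severity.ERROR

print(finalize_date("2022-10-17"))   # ('2022-10-17', <Severity.OK: 0>)
print(finalize_date("10/17/2022"))   # warns, then returns the DEPRECATED tag
```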

  • #119131

    josh

    The actions that go with acceptance/rejection of non-terminals & terminals should have some kind of inheritance structure that encourages re-use. For example, the machinery for recording at what point in a file the text going with a term started & ended should be available as a common option that can be included wherever it applies. For some linguistic applications, users might want to allow ambiguous grammars & return a multi-tree structure that highlights the alternative branchings – this shouldn’t need to be reprogrammed on a per-grammar basis. Optimizing out unused parts or including extra machinery happens at code-generation time.
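    Sketching that inheritance idea in Python (the class names are my own): a base action provides the common source-span option, and a subclass opts into keeping every alternative branch instead of forcing a single tree.

```python
# Reusable action machinery via inheritance: span recording is a shared
# option, ambiguity preservation is an opt-in subclass.
class BaseAction:
    def __init__(self, record_span=True):
        self.record_span = record_span

    def accept(self, name, text, start, end):
        node = {"rule": name, "text": text[start:end]}
        if self.record_span:
            node["span"] = (start, end)      # common, reusable option
        return node

class AmbiguityPreservingAction(BaseAction):
    def accept_all(self, name, text, spans):
        # Return a multi-branch node instead of forcing one choice.
        return {"rule": name,
                "alternatives": [self.accept(name, text, s, e) for s, e in spans]}

act = AmbiguityPreservingAction()
print(act.accept_all("np", "old men and women", [(0, 7), (0, 17)]))
```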

  • #123795

    josh

    For PL design, in both metaprogramming & regular code, we would like a good partial type system for describing the sequence generators at each coordinate and the overall vector that is being filtered.

    Issues:

    Deterministic, pseudo-random with a deterministic seed, unknown external with slow change, or unknown random, for each sequence – noting whether the randomness applies to values in the sequence, delays in arrival time, or both.

    What, if any, computational strategy controlled by the system is being used to create the sequence from sources. Are they in a fixed database that we control? Does it have some sort of transactional locking in place? What is the ordering strategy? Do the results of filtering or the times of arrival affect the remaining tails of the sequences (future arrivals)? Is the effect on the ultimate contents considered as a set, or merely on the order of arrivals?

    Whatever we might want to vary for design or testing, or might need to know about for larger program reasoning/analysis, is something we would like to express with types. What are attractive and understandable ways of doing that?
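    One possible way to surface these distinctions as types, sketched with Python’s dataclass/enum machinery – the enum and field names below are made up for illustration, not a proposal for the final notation:

```python
# Properties of a sequence generator expressed as a (partial) type.
from dataclasses import dataclass
from enum import Enum, auto

class Determinism(Enum):
    DETERMINISTIC = auto()
    SEEDED_PSEUDORANDOM = auto()
    EXTERNAL_SLOW_CHANGE = auto()
    UNKNOWN_RANDOM = auto()

class RandomnessIn(Enum):
    VALUES = auto()
    ARRIVAL_TIMES = auto()
    BOTH = auto()

@dataclass
class SequenceType:
    element: type               # type of each value in the sequence
    determinism: Determinism
    randomness_in: RandomnessIn
    feedback: bool              # does filtering / arrival affect future arrivals?

    def describe(self) -> str:
        return (f"{self.element.__name__} stream, {self.determinism.name}, "
                f"randomness in {self.randomness_in.name}, feedback={self.feedback}")

sensor_readings = SequenceType(float, Determinism.UNKNOWN_RANDOM, RandomnessIn.BOTH, False)
print(sensor_readings.describe())
```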
