Creating Our SMIL-like Software Platform
January 24, 2024 at 6:45 pm

Josh Stern
My Grammarly edit of this from roughly the same dates was somehow a lot more complete:
The W3C focused on the goal of a “Synchronized Multimedia Integration Language” (SMIL) before the rise of YouTube & Netflix. The final version of the standard appeared in 2008: https://www.w3.org/TR/SMIL3/
It was verbose and early. Only a few implementations were created. Fast forward to today, where GT is working to create user-focused Internet TV/Radio/Interactive using bits of YouTube-like video carefully strung together in directed graphs, with controls & branching points. Now we see the need for systematic integration with synchronized interactivity, alternative presentations (comments), material for NLP, and versions that work on alternative output devices – e.g. to support resumption between working on a desktop/laptop and walking about wirelessly.
So we need something similar to SMIL 3.0 in the sense that lots of good thought went into that project & many useful types of content dimensions were considered & defined. But we need to avoid these problems:
a) Verbosity
b) One big mass of definitions that is a long road to implement and an even longer road to conformance testing
c) Streaming rather than “the file” should be the basic mode of comprehension. Updates to “the integrated multimedia presentation & control state” can come as deltas. General efforts should be made to allow for resumption after REST – i.e. not every delta is REST, but they should all exist within well-defined REST epochs, modulo security/authorization concerns. Relevant methods of adaptive data compression can be considered at the protocol level – i.e. “use profile X7234 for Dimensions [Device, ProgramDictionary]”.
d) Design should assume that “natural language” and spoken interfaces will eventually be a conventional alternative rather than a re-design.
e) Initial user acceptability & learning curve are often related to user experience & familiarity with the system of controls that is available & the methods for accessing them. How does the novice learn a given system? How does the interface change as the user moves from novice to expert? How can different systems of controls be implemented & configured on the same platform? Consider allowing some aspects of the presentation to be parameterized by choice of control UI and user experience level.
f) The concept of “profiles” for data compression can & should be extended to “profiles” for content authoring & profiles for conformance testing. For conformance testing of “playback”, the profile can be used to establish the limitations/promises of a given implementation – it only claims to implement a subset of the overall playback universe. In the case of content authoring, there is a set of profiles for the sets of content types that can be (easily) authored by a given software program, and there is a set/universe of playback devices & user audiences that they target, with possibly different alternative modalities for different subsets of that universe. Interface elements akin to skins/stylesheets, and widely distributed (e.g. CDN) JavaScript libraries, are an example of this sort of cached “profile” element. Domain models of what is visually/auditorily represented and user<==>UI interaction models are additional examples that may be reusable parts of hypermedia presentations/programs.
g) Generality of application & software reuse across different user & device contexts can benefit from high-level models of user<==>UI interaction. From the software POV, it may be helpful to model the abstract user as possessing a small/abstracted version of a given UI presentation – i.e. some info has been seen already & some has not; some authorization/acknowledgment/branching/consent choices or gestures have been made, some were presented but not yet selected/confirmed, and others have not been presented yet. A simple but generalizable protocol for abstract user<==>UI interaction would allow some elements of the program & its constraints to be expressed at an abstract level which can be reified in different ways over various devices & multi-session wanderings (see the sketch after this list).
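As a minimal sketch of what items f) and g) might look like in practice – every name below is a hypothetical illustration of the shape, not part of any existing spec:

```typescript
// Hypothetical sketch of the profile & abstract-user ideas in items f) and g).
// All names here are invented for illustration only.

// A profile declares which subset of the playback/authoring universe an
// implementation claims to support (item f).
interface Profile {
  id: string;                      // e.g. "X7234"
  dimensions: string[];            // e.g. ["Device", "ProgramDictionary"]
  supportedContentTypes: string[]; // the subset this implementation promises to handle
}

// The abstract user's state relative to a presentation (item g): what has
// been seen, and which choices are pending vs. confirmed.
type ChoiceStatus = "not-presented" | "presented" | "confirmed";

interface AbstractUserState {
  seenElementIds: Set<string>;                            // info already shown
  choices: Map<string, ChoiceStatus>;                     // consent/branching/ack gestures
  experienceLevel: "novice" | "intermediate" | "expert";  // item e) parameter
}

// A device-specific renderer reifies the same abstract state in its own way;
// here we just compute which choices still await the user's confirmation.
function pendingChoices(state: AbstractUserState): string[] {
  return [...state.choices]
    .filter(([, status]) => status === "presented")
    .map(([id]) => id);
}
```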
The text above calls for a protocol & standards definition that is more streaming-oriented and less verbose than SMIL 3.0, while being adapted to modalities of data compression & partial implementation of formats. This raises the question: moving away from a file-oriented “language” definition, what type of definitions would be best for both standards & future open development? We recommend the following division of specs:
I. Require a messaging protocol with standard features – cf. this review: https://getstream.io/blog/messaging-protocols/ . We believe that most of the abstract features of AMQP, including P2P encryption & multi-way publish/subscribe fan-out, are useful & relevant. However, we would not restrict our protocol to a specific layer like TCP. We would rather allow the definition to work over many different forms of transport, including HTTP tunneling & layers adapted over IoT. Note that where TCP duplex is available, it may be possible to gain efficiencies over TCP by implementing packet & message control/integrity at the protocol level over possibly non-duplex channels. The protocol should support streams of data messages with a defined sequence that may be completed after some finite count or extended indefinitely.
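To make spec I concrete, here is a minimal sketch of the kind of message envelope such a protocol might carry – the field names are assumptions, not proposed final syntax. The delta payloads from item c) above would ride inside `payload`:

```typescript
// Hypothetical envelope for the stream-of-messages protocol in spec I.
// Field names are illustrative assumptions, not part of any standard.
interface StreamMessage {
  streamId: string; // identifies one logical stream of data messages
  seq: number;      // defined sequence for virtual reassembly: 0, 1, 2, ...
  final: boolean;   // true when a finite stream completes; streams may also extend indefinitely
  epoch: number;    // REST epoch from item c): deltas apply within one epoch
  payload: unknown; // delta to the presentation & control state, or app data
}

// Reassemble a defined sequence from messages that may arrive out of order
// over a non-duplex or lossy transport; early arrivals are buffered.
class StreamAssembler {
  private nextSeq = 0;
  private buffered = new Map<number, StreamMessage>();

  // Returns whatever messages are now deliverable in sequence order.
  accept(msg: StreamMessage): StreamMessage[] {
    this.buffered.set(msg.seq, msg);
    const ready: StreamMessage[] = [];
    while (this.buffered.has(this.nextSeq)) {
      ready.push(this.buffered.get(this.nextSeq)!);
      this.buffered.delete(this.nextSeq);
      this.nextSeq++;
    }
    return ready;
  }
}
```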
II. The concepts of user authorization, identification, session boundaries, & virtual message sequence assembly are useful & familiar. However, we should recognize different types of practical issues that require a multi-level definition of these concepts. We need some forms of authorization & identification at the messaging protocol level to implement effective protections against spamming/denial-of-service attacks & other misuse of network resources. On the other hand, different applications may implement different concepts of session boundaries/continuation, resumption, & REST that span gaps in time, network re-authorizations, changes in output device, etc.
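A rough sketch of that two-level split, under the assumption (mine, not the spec's) that the two identity concepts live in separate data structures:

```typescript
// Sketch of the two-level split in spec II; all names are assumptions.
// The messaging-protocol level knows only enough to throttle abuse; the
// application level owns session continuity across devices & gaps in time.
interface TransportIdentity {
  peerId: string;    // stable enough to rate-limit & block spam/DoS
  authToken: string; // network-level authorization
}

interface ApplicationSession {
  sessionId: string;
  userId: string;
  lastEpoch: number;    // where to resume after REST
  activeDevice: string; // may change: desktop -> mobile, etc.
}

// Resumption keeps the application session alive while the transport identity
// is re-established, e.g. after a network re-authorization or device switch.
function resumeOn(session: ApplicationSession, device: string): ApplicationSession {
  return { ...session, activeDevice: device };
}
```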
III. The form of our definitions should provide maximum convenience & flexibility for building higher levels of application protocol on top of the lower levels while also extracting the relevant information about what was determined by those lower levels. One way to do this is as follows:
a. Name each protocol level & the lower level that it depends on.
b. Assume that output from each level can be encoded as something like a stream of JSON messages with certain reserved keyword names/fields that apply to the lower level & new names/fields that are reserved by the new protocol level. Non-reserved names/data are to be bundled & passed along to other unknown application layers. In earlier writing, I described an efficient extension of JSON for distributed database applications that I called DatabaseJSON – cf. https://pers.clarityfirst.xyz/forums/topic/databasejson-an-extension-of-json-adding-some-extra-opl-features/
JSON itself is designed to describe serialized forms of any sort of PL object or array efficiently, and it has become a favorite serialization format in many programming languages. DatabaseJSON added syntactic conventions to support the following useful features in a standard, efficient way:
i) References – References are often implemented in PL using pointers as links from one location to data at another location. The use of references can often avoid redundant copying of data, allowing one version of the data to be referenced from two or many locations.
ii) Variable References – Variable references allow for the possibility of dynamic mutations to a particular data site that are instantly seen by all referring links. In the JavaScript language, the values connected to the keys of an Object may change dynamically, but there is no native support for describing the data associated with a key as another dynamic variable. Additional levels of indirection can be important to languages like C++ that can also compile to WASM and be serialized to persistent storage in a DBMS. In our parent proposal for enhancing RDBMS, we propose to add native support for allowing dynamic variables to version, branch, and merge in a manner inspired by git.
iii) BLOBs – BLOB, an abbreviation for “binary large object”, is an RDBMS term indicating that the type of a particular table cell is some kind of link to a hunk of bits that is meaningful to some user application, and merits a secure place in persistent RDBMS storage/retrieval schemes, but is generally not for printing in its entirety, even in text-safe format. Enhancements to RDBMS allow developers to associate PL scripts designed to access a BLOB natively, assess the contained format, and extract featural data. Features, for example, could be based on the analysis of a JPEG image or MP4 video.
iv) Bignums – Special-purpose Bignum libraries implement comparisons and arithmetic on integers and floats with arbitrary levels of precision, so they are resistant to most types of overflow and roundoff error. Eliminating those error possibilities can be a substantial benefit to ease of coding and/or debugging in some application situations that may require persistent database storage. The implementation requirement is meant to standardize database support for Bignums.
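Since the concrete DatabaseJSON syntax lives in the linked post rather than here, the following is only an invented illustration of how conventions for these four features might look while remaining parseable as plain JSON. The “$”-prefixed reserved keys are my assumption, not the actual DatabaseJSON convention:

```typescript
// Invented illustration only: "$"-prefixed keys are an assumed convention for
// encoding features i)-iv) in a form that ordinary JSON parsers still accept.
const example = {
  sharedImage: { $blob: "sha256:9f8a...", mimeType: "image/jpeg" }, // iii) BLOB named by hash
  thumbnail:   { $ref: "#/sharedImage" },                           // i) reference: no redundant copy
  viewCount:   { $var: "counter42" },                               // ii) dynamic variable seen by all referrers
  balance:     { $bignum: "123456789012345678901234567890" },       // iv) arbitrary precision
};
```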
Parsers for JSON are already implemented in most modern programming languages. DatabaseJSON uses JSON syntax and is correctly parsed by JSON parsers. DatabaseJSON can be used to efficiently serialize the internal data structures of modern programming languages. A key point about standardizing references and blobs is to avoid the wasteful overhead that would otherwise be introduced when the cyclic binary structure of internal data objects was forced into an acyclic printable format for JSON. Another key point is about providing efficient support for saying whether or not two or more large strings/blobs are the same or different, along with efficient support for compactly naming them. Cryptography provides secure hash algorithms for converting arbitrary blobs into standard binary strings with lengths like 128, 256, or 512 bits (divide by 8 to convert to bytes). Secure hashes have the property that it is extraordinarily unlikely for two different strings to yield the same hash. If we are only worried about random chances of that, then the 128-bit MD5 algorithm is already sufficient to make the chances of collision too small to worry about. Other secure hashes provide increased security against attempts of an adversary to create collisions by dedicated search. The extra security comes at the cost of greater computational effort and data storage.
Secure hashes provide one sort of route to general object ids. We can parameterize a scheme for a given application by fixing which secure hash algorithm it is using. Then we say that any pair of objects that yield the same secure hash are exact digital copies of one another. In WAN applications with trusted peers, the simplest way to provide an object id is to use a combination [localNodeName, currentIncCntrValue]. This method will generally be the fastest in terms of time to compute and the amount of storage used for the average object. The goal for DatabaseJSON is to support the interleaving of those two approaches, using each where it makes the most sense, while allowing conversion where that is appropriate.
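A minimal sketch of the two id schemes, assuming Node.js's built-in crypto module; the names contentId/localId are mine:

```typescript
import { createHash } from "node:crypto";

// Content-addressed id: equal hashes imply (with overwhelming probability)
// exact digital copies. SHA-256 here, but the scheme is parameterized by
// the choice of hash algorithm for a given application.
function contentId(blob: Buffer): string {
  return createHash("sha256").update(blob).digest("hex");
}

// Trusted-peer id: [localNodeName, currentIncCntrValue]. Cheap to mint; no
// hashing of the object's bytes at all.
let counter = 0;
function localId(localNodeName: string): string {
  return `${localNodeName}:${counter++}`;
}

// The same bytes always get the same contentId, while each localId is unique
// per mint regardless of content.
const a = contentId(Buffer.from("hello"));
const b = contentId(Buffer.from("hello"));
console.assert(a === b); // exact copies share a content id
```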
Designating which handles/data fields are to be used as variables with dynamic updating is another important point of clarification. Handling copy-on-write with automated versioning updates of variables is also a convenient form of platform support that would benefit most distributed database systems. DatabaseJSON aims to provide the appropriate level of syntactic sugar/support for those needs as well. For example, most types of database locking for transactions can be avoided if the software explicitly declares which copy-on-write processes are updating the HEAD/current versions of which dynamic variables.
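A sketch of that copy-on-write versioning idea, with an assumed API rather than DatabaseJSON's actual one:

```typescript
// Sketch (assumed API) of a dynamic variable with git-inspired versioning:
// writes never mutate old versions in place, so readers pinned to an older
// version need no locks.
interface Version<T> { value: T; parent: Version<T> | null; }

class DynamicVar<T> {
  private head: Version<T>;
  constructor(initial: T) { this.head = { value: initial, parent: null }; }

  // Copy-on-write: a new version is linked in front of the old HEAD. Only the
  // single declared writer process may call this, which is what removes the
  // need for transactional locking.
  update(next: T): void {
    this.head = { value: next, parent: this.head };
  }

  current(): T { return this.head.value; }

  // Older versions remain reachable, newest first.
  history(): T[] {
    const out: T[] = [];
    for (let v: Version<T> | null = this.head; v; v = v.parent) out.push(v.value);
    return out;
  }
}
```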
DatabaseJSON exemplifies the use of syntactic sugaring with application-relevant semantics layered over the JSON syntax platform. Similar strategies can be used to define multiple layers of network application processing built on top of lower layers, each transforming SugaredVersionJSON_I => SugaredVersionJSON_J (modified & re-written by application needs). Allowing each layer to specify its reserved keywords, and to pass along sub-trees tagged with unknown words, allows for a modular, incremental approach to the creation of network application standards (sketched below). Where SMIL 3.0 presented implementers of conformant players & authoring tools with an unwieldy, impractical monolith in the form of a file format, our suggested approach of sub-profiles and messaging allows for the incremental development of “Synchronized Multimedia Languages” for describing the playback and authoring of streaming network content.
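A closing sketch of the layered reserved-keyword scheme from spec III; all names are assumptions. Each layer consumes the keys it reserves and bundles everything else, untouched, for the layers above it:

```typescript
// Sketch of spec III's layering: each layer owns some reserved keys and
// forwards all unknown names/data unmodified. All names are illustrative.
type Json = { [key: string]: unknown };

interface Layer {
  name: string;
  reserved: string[];            // the keys this layer owns
  handle(ownFields: Json): void; // act on the reserved fields
}

function process(layers: Layer[], message: Json): Json {
  const remaining = { ...message };
  for (const layer of layers) {
    const own: Json = {};
    for (const key of layer.reserved) {
      if (key in remaining) {
        own[key] = remaining[key];
        delete remaining[key]; // consumed by this layer
      }
    }
    layer.handle(own); // unknown keys simply flow past this layer
  }
  return remaining;    // bundle passed along to higher/unknown layers
}

// Usage: a transport layer reserves seq/epoch, a sync layer reserves a cue
// point; application data neither layer recognizes is forwarded unmodified.
const leftover = process(
  [
    { name: "transport", reserved: ["seq", "epoch"], handle: f => console.log("transport:", f) },
    { name: "sync", reserved: ["cueAt"], handle: f => console.log("sync:", f) },
  ],
  { seq: 7, epoch: 1, cueAt: 12.5, appData: { caption: "hello" } },
);
console.log("passed along:", leftover); // { appData: { caption: "hello" } }
```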