DatabaseJSON: An Extension of JSON Adding Some Extra OPL Features

This topic contains 2 replies, has 1 voice, and was last updated by Josh Stern February 14, 2024 at 8:14 am.

February 7, 2024 at 1:59 am #126415

Josh Stern
Moderator

JSON has emerged in recent decades as a particularly useful and popular format for storing data records and programming-language-related materials in a readable text format. The name, which stands for JavaScript Object Notation, reflects its use of familiar syntax from the JavaScript language, where it is built in as a means of serializing and de-serializing program objects and data. Other languages, including PHP, Python, and Ruby, have found similar uses for the JSON syntax.

Our proposal for DatabaseJSON is a stand-alone part of a larger proposal for updating DBMSs in ways that provide better native support for application programming. As part of that proposal, we discuss restrictions to WASM that would allow an RDBMS to host many of the languages that compile to WASM as native DB scripting engines for stored procedures, query definition, and online workflows. Native support for JSON queries is part of that proposal, along with automated support for serialization, de-serialization, and database storage of data and objects in any of the hosted languages. The JSON format is general enough to allow any language’s data to be serialized to JSON using some set of encoding and decoding conventions. Our note here describes an approach to such conventions that is easy to understand and use, while also being space- and time-efficient for applications.

The following features deserve special syntactic support in DatabaseJSON:

a) References – References are often implemented in programming languages (PLs) using pointers as links from one location to data at another location. References can avoid redundant copying of data by allowing one version of the data to be referenced from two or more locations.

b) Variable References – Variable references allow dynamic mutations of a particular data site to be seen instantly by all referring links. In JavaScript, the values connected to the keys of an Object may change dynamically, but there is no native support for describing the data associated with a key as another dynamic variable. Additional levels of indirection can be important to languages like C++, which can also compile to WASM and be serialized to persistent storage in a DBMS. In our parent proposal for enhancing RDBMSs, we propose adding native support for letting dynamic variables version, branch, and merge in a manner inspired by git.

c) BLOBs – BLOB, an abbreviation for “binary large object”, is an RDBMS term indicating that a particular table cell holds some kind of link to a hunk of bits that is meaningful to some user application and merits a secure place in persistent RDBMS storage/retrieval schemes, but is generally not meant to be printed in its entirety, even in a text-safe format. Our proposed RDBMS enhancements would let developers associate PL scripts that access a BLOB natively, assess its contained format, and extract feature data. Features could, for example, be based on the analysis of a JPEG image or MP4 video.

d) Bignums – Special-purpose Bignum libraries implement comparisons and arithmetic on integers and floats with arbitrary levels of precision, so they are resistant to most types of overflow and roundoff error. Eliminating those errors can substantially ease coding and/or debugging in application situations that require persistent database storage. Requiring implementations to recognize Bignums is meant to standardize database support for them.

In the context of the larger DB+ proposal, we advocate standard support for application-specified pretty-printing of DB types. Pretty-printing falls squarely within the traditional mission of RDBMS systems – maximizing the clarity, comprehension, and display of all viewable data. In many applications, the mapping from semantic or source-code data types to DB types will be many-to-one. Pretty-printing should therefore be defined so as to create a 1-1 mapping between each [database, workflow, table, cell] location and a viewable textual representation of that cell’s content, maximizing both the aesthetics and the comprehensibility of the application’s view of a given cell, table, or table row.

The text below informally defines the syntax of DatabaseJSON, showing how standard JSON can efficiently support and express the presence of the types described above.

DatabaseJSON is indicated by an outer enclosing Object context – a pair of curly braces {} – whose first listed field has the key ‘DJSON’. The value associated with the DJSON key begins with a string containing a subset of the 8 letters {‘T’, ‘L’, ‘O’, ‘I’, ‘C’, ‘V’, ‘X’, ‘S’}, which have the following interpretations:

‘T’ – Tabulated reference content. The presence of ‘T’ means that any references which appear in the document will be associated with a unique non-negative integer in the value range of a 64-bit unsigned long. In the absence of ‘T’, any reference links that appear are configured using symbolic names and in-line content.

‘L’ – List tabulated references internally. The presence of ‘L’ implies ‘T’. Additionally, ‘L’ means that the DatabaseJSON object catalogs the meaning of its 64-bit link references within the document itself, rather than by reference to an external source such as a database table (which would take up no space in the document).

‘O’ – Indicates that BLOBs are contained in the Object.

‘I’ – Indicates that Bignums are contained in the Object.

‘C’ – Indicates that constant/immutable style reference links are contained in the object.

‘V’ – Indicates that variable/mutable style reference links are contained in the object.

‘X’ – Indicates that variable names – which can be long, unique strings – are tabulated in a manner that parallels what is done for reference links.

‘S’ – Indicates that some references are implemented using unique, symbolic names with inline content.

The optional second half of the DJSON field is a space ‘ ’ followed by one of the following secure hash function names: [“MD5”, “SHA-1”, “SHA-256”, “SHA-512”, “SHA3-512”].
These standard algorithms are arranged roughly from least to most expensive, and the difficulty of deliberately creating collisions generally increases with the expense; practical collision attacks are known for MD5 and SHA-1, so those two are suitable only where no adversary is expected. Apart from deliberate attacks, for most real-world purposes each distinct hunk of data bits maps to a different secure hash value under any of the given algorithms. So, for example, a network app receiving a file that associates secure hash values with data records can check whether the same hunk of data is already in its local catalog by comparing and matching the secure hash values.
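A minimal Python sketch of reading the DJSON header value: leading flag letters, then an optional space and hash-function name. The mapping from DJSON hash names to Python `hashlib` names, the error handling, and the function names are our assumptions; beyond the letter list, the only rule taken from the text is that ‘L’ implies ‘T’. The sketch recognizes the eight letters defined in this post.

```python
from __future__ import annotations

import base64
import hashlib

# Flag letters defined in this post.
VALID_FLAGS = set("TLOICVXS")

# DJSON hash names mapped to hashlib algorithm names (our assumption).
HASHES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
    "SHA-512": "sha512",
    "SHA3-512": "sha3_512",
}


def parse_djson_header(value: str) -> tuple[set[str], str | None]:
    """Split a header such as "TLO SHA-256" into (flags, hash name or None)."""
    letters, _, hash_name = value.partition(" ")
    flags = set(letters)
    unknown = flags - VALID_FLAGS
    if unknown:
        raise ValueError(f"unknown DJSON flags: {sorted(unknown)}")
    if "L" in flags:
        flags.add("T")  # 'L' implies 'T'
    if hash_name and hash_name not in HASHES:
        raise ValueError(f"unsupported secure hash: {hash_name!r}")
    return flags, hash_name or None


def secure_hash_b64(data: bytes, hash_name: str) -> str:
    """Digest data with the header's algorithm; return raw Base64 (BC64)."""
    digest = hashlib.new(HASHES[hash_name], data).digest()
    return base64.b64encode(digest).decode("ascii")
```

For example, `parse_djson_header("LO SHA-256")` yields the flags {‘L’, ‘O’, ‘T’} and the name "SHA-256", which `secure_hash_b64` can then use for catalog lookups.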

DatabaseJSON uses secure hashes for three purposes: i. allowing large objects that are already stored and understood in an application-specific manner to be omitted, ii. describing the addition of new BLOBs to the database, and iii. providing an extra integrity check for large binary objects embedded in the format itself.

A JSON file may contain multiple top-level data artifacts. In this case, only the top-level objects that contain the DJSON key field at their top level are interpreted as DatabaseJSON.
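A sketch of picking DatabaseJSON artifacts out of a file holding several concatenated top-level JSON values. It uses the standard library's `json.JSONDecoder.raw_decode`, and it assumes, following the header definition earlier in this post, that an artifact qualifies only when ‘DJSON’ is its first key. The function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any, Iterator


def iter_top_level(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value from a concatenated stream."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value


def database_json_artifacts(text: str) -> list[dict]:
    """Keep only top-level objects whose first key is "DJSON"."""
    return [
        value
        for value in iter_top_level(text)
        if isinstance(value, dict) and value and next(iter(value)) == "DJSON"
    ]
```

Python's `json` module preserves key order when building dicts, which is what makes the "first key" check possible here.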

Base64 text encoding is a common format for representing binary content as standard, visible, printable ASCII text. An alternative text encoding of binaries, yEnc, is more size-efficient for very large binary blobs – cf. https://www.tenminutetutor.com/data-formats/binary-encoding/yenc-encoding/
For our purposes, yEnc content must include a header line beginning with “=ybegin” and a trailing line beginning with “=yend”, while the size, name, and crc32 fields are optional. Note that in either case, decoding the data should skip over unescaped CR+LF.
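A minimal sketch of the yEnc framing described above: only bare “=ybegin” and “=yend” lines, no optional header fields. It follows the usual yEnc rule (add 42 to each byte modulo 256, and escape NUL, LF, CR and ‘=’ as ‘=’ followed by the value plus 64). The 128-byte line length and the function names are our assumptions. The decoder skips CR/LF line breaks, which is safe because the encoder always escapes literal CR and LF bytes.

```python
from __future__ import annotations

CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR and '=' must be escaped


def yenc_encode(data: bytes, line_len: int = 128) -> bytes:
    """Encode bytes as yEnc with minimal =ybegin / =yend framing."""
    out = bytearray(b"=ybegin\r\n")
    col = 0
    for byte in data:
        enc = (byte + 42) % 256
        if enc in CRITICAL:
            out += bytes((0x3D, (enc + 64) % 256))  # escape pair stays on one line
            col += 2
        else:
            out.append(enc)
            col += 1
        if col >= line_len:
            out += b"\r\n"
            col = 0
    out += b"\r\n=yend\r\n"
    return bytes(out)


def yenc_decode(framed: bytes) -> bytes:
    """Decode the payload between =ybegin and =yend, skipping CR/LF."""
    lines = framed.splitlines()  # splits on CR, LF and CR+LF only
    if not lines or not lines[0].startswith(b"=ybegin"):
        raise ValueError("missing =ybegin header")
    end = next((i for i in range(1, len(lines)) if lines[i].startswith(b"=yend")), None)
    if end is None:
        raise ValueError("missing =yend trailer")
    out = bytearray()
    for line in lines[1:end]:
        i = 0
        while i < len(line):
            c = line[i]
            if c == 0x3D:  # '=' escape: the next byte was shifted by 64
                if i + 1 == len(line):
                    raise ValueError("dangling escape at end of line")
                i += 1
                c = (line[i] - 64) % 256
            out.append((c - 42) % 256)
            i += 1
    return bytes(out)
```

Escaped payload bytes are always one of ‘@’, ‘J’, ‘M’ or ‘}’ after the ‘=’, so a payload line can never be mistaken for the “=yend” trailer.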

Our definitions below use Base64 in the sense of the raw conversion, without any MIME-style prefix or suffix. When binary content for a given field type is tabulated, 64-bit compatible unsigned numbers can be printed in the programmer’s choice of decimal, hexadecimal, or Base64; for the remainder of this document, we call that NUMB64. Binary content that is not tabulated requires the Base64 form or our version of the yEnc form (see above); here we call that binary content BC64 and yEnc252, respectively.
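A sketch of printing and reading NUMB64 values. The text does not say how a reader tells the three forms apart, so this sketch takes the form as an explicit argument. It also assumes, hypothetically, a "0x" prefix for hex, and 8 big-endian bytes with Base64 padding stripped for the Base64 form. One practical reason to prefer the string forms is that JavaScript numbers represent integers exactly only up to 2^53.

```python
from __future__ import annotations

import base64

MAX_U64 = 2**64 - 1


def format_numb64(n: int, form: str) -> int | str:
    """Print a 64-bit unsigned id as decimal (JSON number), hex, or raw Base64."""
    if not 0 <= n <= MAX_U64:
        raise ValueError("NUMB64 values must fit in 64 unsigned bits")
    if form == "dec":
        return n
    if form == "hex":
        return hex(n)  # e.g. "0xff"
    if form == "b64":
        # 8 big-endian bytes, '=' padding stripped (both are assumptions)
        return base64.b64encode(n.to_bytes(8, "big")).decode("ascii").rstrip("=")
    raise ValueError(f"unknown NUMB64 form {form!r}")


def parse_numb64(value: int | str, form: str) -> int:
    """Inverse of format_numb64; the caller must know which form was used."""
    if form == "dec":
        n = int(value)
    elif form == "hex":
        s = str(value)
        if not s.startswith("0x"):
            raise ValueError("hex NUMB64 values use a 0x prefix in this sketch")
        n = int(s[2:], 16)
    elif form == "b64":
        s = str(value)
        raw = base64.b64decode(s + "=" * (-len(s) % 4), validate=True)
        if len(raw) != 8:
            raise ValueError("Base64 NUMB64 values must decode to 8 bytes")
        n = int.from_bytes(raw, "big")
    else:
        raise ValueError(f"unknown NUMB64 form {form!r}")
    if not 0 <= n <= MAX_U64:
        raise ValueError("NUMB64 values must fit in 64 unsigned bits")
    return n
```

With these choices, the largest 64-bit value prints as the 11-character Base64 string "//////////8".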

In addition to the prefix DJSON field described above, the top level of a DatabaseJSON object may optionally include a trailing suffix field with the key {“DJSON_share”: } If present, there is a single DJSON_share object serving the top-level object and all of its child objects. Thus, DJSON and DJSON_share are reserved key names for top-level objects in the DatabaseJSON format. DatabaseJSON interprets child JSON Objects as child objects in the JSON syntax, with form constraints and data semantics that inherit the specifications of the DJSON and DJSON_share objects at the top level.
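A small sketch of checking the top-level layout just described: ‘DJSON’ must be the first key and, if present, ‘DJSON_share’ the last. Splitting the object into (header, body, share) is our own hypothetical convenience; the text only fixes the key positions.

```python
from __future__ import annotations

from typing import Any


def split_top_level(obj: dict) -> tuple[Any, dict, Any]:
    """Return (DJSON header, body fields, DJSON_share or None)."""
    keys = list(obj)
    if not keys or keys[0] != "DJSON":
        raise ValueError("DJSON must be the first key")
    if "DJSON_share" in obj and keys[-1] != "DJSON_share":
        raise ValueError("DJSON_share must be the trailing key")
    body = {k: v for k, v in obj.items() if k not in ("DJSON", "DJSON_share")}
    return obj["DJSON"], body, obj.get("DJSON_share")
```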

Reference Links and Pointer Links: Reference links are similar to textual references of a pointer. The form of a reference link depends on whether the enclosing DatabaseJSON object uses tabulated references (‘T’) and whether symbolic references (‘S’) are also available. The form for a tabulated reference is {r: NUMB64}, and the form for a symbolic reference is {“r”: , UniqueSymbol: CONTENT}, where UniqueSymbol is unique as a reference name within the enclosing object. CONTENT can be any well-formed JSON value.

The form of a pointer link is {h: NUMB64 | UniqueSymbol}.

A well-formed DatabaseJSON object ensures that each pointer link refers to a unique reference ({r:}) location in the same top-level object. A reference link object can also contain an h field that points to another reference link. This flexibility could, for example, support the storage of C++ objects with multiple levels of indirection, e.g. ObjectType **ptrType. The DatabaseJSON format doesn’t define particular strategies for serializing languages other than JavaScript; it does provide a jump-start on a readable, backward-compatible extension that helps organize some of the thornier parts of binary object serialization in a space-efficient manner. Deserialization of JavaScript objects can also benefit, by building in-memory structures that keep fewer copies of repeated data definitions.
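A sketch of the well-formedness check for the tabulated form only. It walks the whole top-level object, collects every {r: …} id and every {h: …} target, and reports duplicate reference ids and dangling pointers. It assumes ids are hashable JSON scalars (numbers or strings). The symbolic form is not handled, and `check_links` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def check_links(doc: Any) -> tuple[list, list]:
    """Return (duplicate reference ids, dangling pointer targets)."""
    ref_ids: Counter = Counter()
    pointer_targets: list = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            if "r" in node:
                ref_ids[node["r"]] += 1
            if "h" in node:  # may sit beside "r" for pointer-to-pointer chains
                pointer_targets.append(node["h"])
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(doc)
    duplicates = [ref for ref, count in ref_ids.items() if count > 1]
    dangling = [target for target in pointer_targets if target not in ref_ids]
    return duplicates, dangling
```

An object that carries both r and h (like "chain" in the example below) models one extra level of indirection, in the spirit of ObjectType **ptrType.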

BLOB Objects – The format for a basic BLOB object is {blob64: BC64}. In this case, the variable-length data content of the BLOB is encoded inline as a BC64 or yEnc252 string. The format for a tabulated BLOB object is one of two choices: {“blobLx”: NUMB64} or {“blobIdx”: NUMB64}. In the blobLx case, the encoding of the variable-length data is moved into the trailing refLinks section of the DatabaseJSON object. In the blobIdx case, the index maps to a secure hash by way of an operationally defined database table, and this secure hash is in turn linked to a data location in the distributed application network. For a blob newly introduced to the network, the actual data definition is also introduced within the top-level DatabaseJSON object context, by way of an Object that defines deltas to the virtual mapping table. In summary: blob64 is inline data with no secure hash; blobLx is internally tabulated data, giving the blob’s data in a trailing table; blobIdx uses the (virtual) index table to map the NUMB64 index to a secure hash.
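A sketch of how a reader might resolve the three BLOB forms to bytes. Several assumptions are labeled in the code. `ref_data` stands for the id-to-data pairing from the trailing refLinks section. `index_table` stands for the virtual id-to-secure-hash table. `fetch_by_hash` is a hypothetical callback into whatever store or network holds the data. Only the Base64 form of BC64 is decoded; yEnc252 content would need its own decoder.

```python
from __future__ import annotations

import base64
from typing import Callable


def resolve_blob(
    obj: dict,
    ref_data: dict,                           # refLinks id -> BC64 string (assumed)
    index_table: dict,                        # blobIdx id -> secure hash (assumed)
    fetch_by_hash: Callable[[str], bytes],    # hypothetical store/network lookup
) -> bytes:
    if "blob64" in obj:                       # inline data, no secure hash
        return base64.b64decode(obj["blob64"])
    if "blobLx" in obj:                       # data lives in the trailing table
        return base64.b64decode(ref_data[obj["blobLx"]])
    if "blobIdx" in obj:                      # id -> secure hash -> data location
        return fetch_by_hash(index_table[obj["blobIdx"]])
    raise ValueError("not a BLOB object")
```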

The presence of Bignums, indicated by the ‘I’ flag, simply requires code that processes the numbers in the given object to be prepared to recognize any of them as a Bignum.

The interior of the DJSON_share object may contain any subset of the following 3 sub-objects:

The refLinks object defines arrays of reference links in the following format:

{“refLinkCnt”: Number // the length of the arrays
“refLinks”: [NUMB64…NUMB64] // refLinkCnt length array of NUMB64
“refLinkData”: [JSON…JSON] // refLinkCnt length array of JSON, including DatabaseJSON children
}
Cells at the same index of each array are paired. BLOB objects in the blobLx format require a BC64/yEnc252 string encoding of the data. BLOB objects in the blobIdx format require a BC64 encoding of a secure hash value from the algorithm specified in the DJSON field.

Another object may tabulate unique variable names in a similar fashion.

{“varLinkCnt”: Number // the length of the arrays
“varLinks”: [NUMB64…NUMB64] //varLinkCnt length array of NUMB64
“varLinkNames”: [string…string] // varLinkCnt length array of strings
}

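The refLinks and varLinks tabulations share the same shape: a count plus two parallel arrays paired by index. This sketch turns either one into a lookup table and checks the lengths and id uniqueness. The generic `paired_table` helper and its parameters are hypothetical.

```python
from __future__ import annotations

from typing import Any


def paired_table(section: dict, count_key: str, ids_key: str, values_key: str) -> dict[Any, Any]:
    """Pair two parallel arrays by index, checking them against the count field."""
    count = section[count_key]
    ids, values = section[ids_key], section[values_key]
    if not len(ids) == len(values) == count:
        raise ValueError(f"{ids_key}/{values_key} lengths must equal {count_key}")
    table = dict(zip(ids, values))
    if len(table) != count:
        raise ValueError(f"duplicate ids in {ids_key}")
    return table
```

Usage: `paired_table(refs, "refLinkCnt", "refLinks", "refLinkData")` for reference links, and `paired_table(names, "varLinkCnt", "varLinks", "varLinkNames")` for variable names.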
Another object may describe additions to and deletions from a virtual table:

{“IndexBlobTab”: string //virtual table name
“idCol”: string //virtual table key col that holds NUMB64 indexes
“shCol”: string // virtual table key col that holds secure hash values
“version”: int.int //OPTIONAL field – our belief about which version of the table we are updating to with our deltas
“numAdd”: number // number of new additions
“addID”: [NUMB64…NUMB64] //numAdd length array of indexes
“addSH”: [BC64…BC64] //numAdd length array of secure hash values
“addData”: [BC64…BC64] //numAdd length array of Base64 encoded blobs
“addData_yEnc252”: [yEnc252…yEnc252] // alternative to addData: numAdd length array of yEnc252 encoded blobs
“numDel”: number // number of new deletions
“delID”: [NUMB64…NUMB64] //numDel length array of old ids to forget in this version of IndexBlobTab
“delSH”: [BC64…BC64] //numDel length array of old secure hashes to forget in this version of IndexBlobTab
“failForBadIDMerge”: boolean // if true, the entire operation fails when an added id refers to a different secure hash in the given namespace
}
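A sketch of applying one IndexBlobTab delta to an in-memory {id: secure hash} table. Two behaviors below are our assumptions, not part of the text. First, deletions are applied before additions, and a deletion whose hash does not match is ignored. Second, when failForBadIDMerge is false, a conflicting addition simply overwrites the old hash. The table names, version field, and blob payloads are left out to keep the example short.

```python
from __future__ import annotations


def apply_index_blob_delta(table: dict, delta: dict) -> dict:
    """Return a new {id: secure_hash} table with one delta applied."""
    add_ids, add_sh = delta.get("addID", []), delta.get("addSH", [])
    del_ids, del_sh = delta.get("delID", []), delta.get("delSH", [])
    if not len(add_ids) == len(add_sh) == delta.get("numAdd", 0):
        raise ValueError("addID/addSH lengths must equal numAdd")
    if not len(del_ids) == len(del_sh) == delta.get("numDel", 0):
        raise ValueError("delID/delSH lengths must equal numDel")

    result = dict(table)
    # Assumption: forget an id only when its recorded hash matches delSH.
    for ident, sh in zip(del_ids, del_sh):
        if result.get(ident) == sh:
            del result[ident]

    strict = delta.get("failForBadIDMerge", False)
    for ident, sh in zip(add_ids, add_sh):
        old = result.get(ident)
        if old is not None and old != sh and strict:
            raise ValueError(f"id {ident!r} already maps to a different secure hash")
        result[ident] = sh  # assumption: non-strict merges let the new hash win
    return result
```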


Config Sections – The JSON format benefits from wide software support, and is also sometimes used as a readable format for config/ini files and structured document models. Compared to XML, JSON is more readable for humans. However, the TOML format is better yet in that specific regard, and TOML could be used as a pretty-printed format for JSON/DatabaseJSON. This connection, along with some programmatic uses, is enhanced by specifying a default mechanism to identify headers for sections and sub-sections. Let the tag {“LetSection”: “SectionName”} indicate that “SectionName” is the designated name for the section containing the contents of that object. And the tag functions in a similar way for sub-sections of other objects.
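A sketch of the TOML connection: nested objects become TOML tables named by their LetSection tag, falling back to the JSON key when no tag is present. This is a hypothetical helper, not a full TOML writer. It handles only scalars and flat arrays (rendered through `json.dumps`, whose output coincides with TOML syntax for simple strings, numbers, booleans and arrays), assumes section names and keys are valid bare TOML keys, and rejects null, which TOML cannot express.

```python
from __future__ import annotations

import json


def to_toml(obj: dict) -> str:
    """Render nested objects as TOML tables named by their LetSection tags."""
    lines: list[str] = []

    def emit(node: dict, path: tuple[str, ...]) -> None:
        scalars = [(k, v) for k, v in node.items()
                   if k != "LetSection" and not isinstance(v, dict)]
        tables = [(k, v) for k, v in node.items() if isinstance(v, dict)]
        if path and (scalars or not tables):
            lines.extend(["", "[" + ".".join(path) + "]"])
        for key, value in scalars:
            if value is None:
                raise ValueError(f"TOML has no null (key {key!r})")
            lines.append(f"{key} = {json.dumps(value)}")
        for key, child in tables:
            emit(child, path + (child.get("LetSection", key),))

    emit(obj, ())
    return "\n".join(lines).lstrip("\n")
```

For instance, an object {"db": {"LetSection": "database", "port": 5432}} renders as a `[database]` table containing `port = 5432`.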

Secure hash codes support merging and collation of data in distributed apps that did not previously synchronize. Index-generated ids do not share that property and may be inconsistent between multiple documents, or between documents and tables. If the index was only intended for internal consistency within the present document, such an inconsistency is not an issue; otherwise, it indicates an application failure. The operation may also fail due to version conflicts where versioning is maintained and updated by multiple parties. The version number can be set to -1.0 to indicate no opinion about versioning.

DatabaseJSON is motivated by a desire to support a maximally flexible, application-oriented, possibly distributed, common platform for relational data, program data, and program data structure/object persistence. We imagine ways in which database programming, embedded storage, and network apps can become easier through the sharing and reuse of more advanced basic library routines. For example, if it were easy for apps to specify and save multiple “init” states where they commonly begin work, and to load those states directly from a secure database, substantial savings in start-up time – programmatic and manual – might be achieved on a broad scale. The infrastructure for such a facility could be introduced as library routines, motivating widespread application adoption.

Note: edit 2/7/2024 – added yEnc252 as an alternative to Base64 for blob encodings.
February 14, 2024 at 8:14 am #126416

Josh Stern
Moderator

JSON has emerged in recent decades as a particularly useful and popular format for storing data records and programming-language-related materials in a readable text format. The name, which stands for JavaScript Object Notation, reflects its use of familiar syntax from the JavaScript language, where it is built in as a means of serializing and de-serializing program objects and data. Other languages, including PHP, Python, and Ruby, have found similar uses for the JSON syntax.

We especially like the JSON5 variant that does not require double-quoting for names of object fields.

Our proposal for DatabaseJSON is a stand-alone part of a larger proposal for updating DBMSs in ways that provide better native support for application programming. As part of that proposal, we discuss restrictions to WASM that would allow an RDBMS to host many of the languages that compile to WASM as native DB scripting engines for stored procedures, query definition, and online workflows. Native support for JSON queries is part of that proposal, along with automated support for serialization, de-serialization, and database storage of data and objects in any of the hosted languages. The JSON format is general enough to allow any language’s data to be serialized to JSON using some set of encoding and decoding conventions. Our note here describes an approach to such conventions that is easy to understand and use, while also being space- and time-efficient for applications.

The following features deserve special syntactic support in DatabaseJSON:

a) References – References are often implemented in programming languages (PLs) using pointers as links from one location to data at another location. References can avoid redundant copying of data by allowing one version of the data to be referenced from two or more locations.

b) Variable References – Variable references allow dynamic mutations of a particular data site to be seen instantly by all referring links. In JavaScript, the values connected to the keys of an Object may change dynamically, but there is no native support for describing the data associated with a key as another dynamic variable. Additional levels of indirection can be important to languages like C++, which can also compile to WASM and be serialized to persistent storage in a DBMS. In our parent proposal for enhancing RDBMSs, we propose adding native support for letting dynamic variables version, branch, and merge in a manner inspired by git.

c) BLOBs – BLOB, an abbreviation for “binary large object”, is an RDBMS term indicating that a particular table cell holds some kind of link to a hunk of bits that is meaningful to some user application and merits a secure place in persistent RDBMS storage/retrieval schemes, but is generally not meant to be printed in its entirety, even in a text-safe format. Our proposed RDBMS enhancements would let developers associate PL scripts that access a BLOB natively, assess its contained format, and extract feature data. Features could, for example, be based on the analysis of a JPEG image or MP4 video.

d) Bignums – Special-purpose Bignum libraries implement comparisons and arithmetic on integers and floats with arbitrary levels of precision, so they are resistant to most types of overflow and roundoff error. Eliminating those errors can substantially ease coding and/or debugging in application situations that require persistent database storage. Requiring implementations to recognize Bignums is meant to standardize database support for them.

In the context of the larger DB+ proposal, we advocate standard support for application-specified pretty-printing of DB types. Pretty-printing falls squarely within the traditional mission of RDBMS systems – maximizing the clarity, comprehension, and display of all viewable data. In many applications, the mapping from semantic or source-code data types to DB types will be many-to-one. Pretty-printing should therefore be defined so as to create a 1-1 mapping between each [database, workflow, table, cell] location and a viewable textual representation of that cell’s content, maximizing both the aesthetics and the comprehensibility of the application’s view of a given cell, table, or table row.

The text below informally defines the syntax of DatabaseJSON, showing how standard JSON can efficiently support and express the presence of the types described above.

DatabaseJSON is indicated by an outer enclosing Object context – a pair of curly braces {} – whose first listed field has the key ‘DJSON’. The value associated with the DJSON key begins with a string containing a subset of the 9 letters {‘T’, ‘L’, ‘O’, ‘I’, ‘C’, ‘V’, ‘X’, ‘S’, ‘G’}, which have the following interpretations:

‘T’ – Tabulated reference content. The presence of ‘T’ means that any references which appear in the document will be associated with a unique non-negative integer in the value range of a 64-bit unsigned long. In the absence of ‘T’, any reference links that appear are configured using symbolic names and in-line content.

‘L’ – List tabulated references internally. The presence of ‘L’ implies ‘T’. Additionally, ‘L’ means that the DatabaseJSON object catalogs the meaning of its 64-bit link references within the document itself, rather than by reference to an external source such as a database table (which would take up no space in the document).

‘O’ – Indicates that BLOBs are contained in the Object.

‘I’ – Indicates that Bignums are contained in the Object.

‘C’ – Indicates that constant/immutable style reference links are contained in the object.

‘V’ – Indicates that variable/mutable style reference links are contained in the object.

‘X’ – Indicates that variable names – which can be long, unique strings – are tabulated in a manner that parallels what is done for reference links.

‘S’ – Indicates that some references are implemented using unique, symbolic names with inline content.

‘G’ – Indicates that Base96G is used as the preferred alternative to Base64 in NUMB64 contexts, and anywhere else a raw number or binary is textually encoded without a special header indicating yEnc252.

The optional second half of the DJSON field is a space ‘ ’ followed by one of the following secure hash function names: [“MD5”, “SHA-1”, “SHA-256”, “SHA-512”, “SHA3-512”].
These standard algorithms are arranged roughly from least to most expensive, and the difficulty of deliberately creating collisions generally increases with the expense; practical collision attacks are known for MD5 and SHA-1, so those two are suitable only where no adversary is expected. Apart from deliberate attacks, for most real-world purposes each distinct hunk of data bits maps to a different secure hash value under any of the given algorithms. So, for example, a network app receiving a file that associates secure hash values with data records can check whether the same hunk of data is already in its local catalog by comparing and matching the secure hash values.

DatabaseJSON uses secure hashes for three purposes: i. allowing large objects that are already stored and understood in an application-specific manner to be omitted, ii. describing the addition of new BLOBs to the database, and iii. providing an extra integrity check for large binary objects embedded in the format itself.

A JSON file may contain multiple top-level data artifacts. In this case, only the top-level objects that contain the DJSON key field at their top level are interpreted as DatabaseJSON.

Base64 text encoding is a common format for representing binary content as standard, visible, printable ASCII text. An alternative text encoding of binaries, yEnc, is more size-efficient for very large binary blobs – cf. https://www.tenminutetutor.com/data-formats/binary-encoding/yenc-encoding/
For our purposes, yEnc content must include a header line beginning with “=ybegin” and a trailing line beginning with “=yend”, while the size, name, and crc32 fields are optional. Note that in either case, decoding the data should skip over unescaped CR+LF.

Here we define an additional, simpler encoding scheme that we call Base96G. Base96G is simple to define, compute, and use as an inline form, while offering large-file efficiency between that of Base64 and yEnc252. Base96G builds on the observation that the character set we like best for readable network encoding is the subset of ASCII called “printable”. This set has a fixed definition in C, C++, and other languages and is easily included in any application as a table – cf. https://en.cppreference.com/w/c/string/byte/isprint
The set of printable, non-whitespace characters has 94 members, including single and double quotes. For application programming reasons, we remove the double quote character ("), leaving 93 characters numbered in table order. So, for example, the first character in the table, ‘!’, encodes the number 0x0, and the last, ‘~’, represents the number 92, whether we count in Base 93 or Base 96. Extending the set to 96 tokens via an escape character is intended to improve space efficiency when encoding 8-byte unsigned numbers. To create the 96 tokens, we make the following alterations to the table:
write \[ instead of [ (representing 57)
write \\ instead of \ (representing 58)
write \] instead of ] (representing 59)
write \^ instead of ^ (representing 60)
write [ without a preceding \ to represent 93
write ] without a preceding \ to represent 94
write ^ without a preceding \ to represent 95

With this set of 96 tokens representing unsigned integers, we can encode every sequence of 8 unsigned bytes using 10 printable tokens: 96^9 ≈ 6.9 × 10^17 is smaller than 2^64 ≈ 1.8 × 10^19, while 96^10 ≈ 6.6 × 10^19 is larger. Roughly 4/96 (about 4%) of random tokens are escaped and cost an extra character. The overall blowup going from binary to printable, non-whitespace strings using the Base96G method is therefore roughly 30% (10 tokens per 8 bytes, plus escapes), a modest improvement over Base64’s 33%. The alternative yEnc252 is more efficient for large files, but it uses many non-printable and whitespace characters and requires additional space for line breaks and for header and footer lines. Note that the encoding algorithm does not require 64-bit CPUs: the simplest version transcodes up to 8 bytes at a time by general base conversion carried out on smaller word-sized pieces – cf. https://cs.stackexchange.com/questions/10318/the-math-behind-converting-from-any-base-to-any-base-without-going-through-base

The encoding and decoding concept for Base96G is to treat every sequence of up to 8 bytes as a 64-bit unsigned number expressed in 8 bytes of Base 256, which can be recorded as a Base 96 number of at most 10 tokens (each token 1 or 2 bytes), written using the corresponding graphic ASCII tokens. Our key motivation: Base96G is particularly useful for cryptographic network applications that commonly need to encode 16, 32, 64, and 128 byte binary numbers as text (their textual Base96G versions are, respectively, 20+, 40+, 80+, and 160+ graphic 1-byte characters, which could also be double-quoted inline).

The avoidance of whitespace and non-printable characters in Base96G strings supports the possibility of computing unique message digests for formatted network data in a way that simply ignores whitespace and non-printable characters, and their insignificant variations, for purposes of digest computation. This property may be particularly useful for various operations that involve authentication.

One application is the ability to place message digests, signed or unsigned, within the same digital object. This is easily accomplished by leaving blank the required amount of whitespace to hold the digest. If the process that produces the digest understands that it is to ignore free whitespace, then the presence of the whitespace will have no effect on the algorithm result. In a formatted document with fields, it is easy to temporarily add/erase a value. So the convention can be adopted that the value of the digest itself is to be erased when the digest value is being checked.
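A sketch of the self-contained digest idea. The document reserves a blank slot of whitespace. The digest is computed over only the printable, non-whitespace characters. To check it, the slot is erased back to whitespace and the digest recomputed. The "digest" field name, the 64-character slot, and the use of hex-encoded SHA-256 are all hypothetical choices for illustration; a signed variant would sign the same canonical bytes.

```python
import hashlib
import re

# Hypothetical layout: "digest": "<64 hex chars, or 64 spaces before sealing>"
DIGEST_FIELD = re.compile(r'("digest":\s*")([0-9a-f ]{64})(")')


def canonical(text: str) -> bytes:
    """Keep only printable, non-whitespace ASCII ('!'..'~')."""
    return "".join(ch for ch in text if "!" <= ch <= "~").encode("ascii")


def seal(doc: str) -> str:
    """Fill a blank 64-character digest slot with SHA-256 of the canonical text."""
    m = DIGEST_FIELD.search(doc)
    if m is None or m.group(2).strip():
        raise ValueError("document needs a blank 64-character digest slot")
    digest = hashlib.sha256(canonical(doc)).hexdigest()
    return doc[:m.start(2)] + digest + doc[m.end(2):]


def verify(doc: str) -> bool:
    """Erase the digest back to whitespace, recompute, and compare."""
    m = DIGEST_FIELD.search(doc)
    if m is None:
        return False
    claimed = m.group(2)
    erased = doc[:m.start(2)] + " " * len(claimed) + doc[m.end(2):]
    return hashlib.sha256(canonical(erased)).hexdigest() == claimed
```

Because `canonical` drops every whitespace character, re-indenting a sealed document does not invalidate it. The same rule also ignores spaces inside string values, which is part of the trade-off this convention accepts.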

One potentially valuable application of the idea above: build containers with APIs that allow users to query for the signed message digest of the running container itself.

As another example, consider applying these techniques to the production of families of digital forms that facilitate various types of authoritative stamping, including official timestamps and digital notarization.
For example, we can define Sig96(_docX_, SecureHashY (e.g. SHA-512), PKI_Z) to mean that we apply Base96G to IgnoringWhitespaceAndUnprintable(_docX_), apply SecureHashY to that result, and generate a signature for the hashed result using a public/private key pair from PKI_Z. This description of Sig96 is a kind of template for defining different schemes that can vary in PKI algorithms but also in the nature of _docX_ and its requirements. For example, we could say that _docX_ is in DatabaseJSON form including these required fields/value_types (,….). Further validation based on the semantic meaning of v1, v2, etc. may include both digital and human-centric protocols. This form of definition allows inheritance and reuse by additional schemes that include (,…) along with additional requirements.

For example, consider an application with
i) Confidential digital contents to be conveyed by party A to party B,
ii) A requirement for an authority C playing the role of a digital notary to authenticate the connection of the sender, the contents, and an absolute timestamp.
We may devise a protocol in which sending party A securely submits the digital contents to party C along with the address and PKI info for B. Party C is able to read the unencrypted contents, A’s secure signature of the unencrypted contents, and a recent timestamp. In some situations, additional protocols require A to submit advanced identification credentials, such as biometrics recently matched by the PKI system to A’s PKI and canonical ID coordinates (e.g. legal name and address). Party C may then digitally sign a formatted digital data object describing a secure hash of the underlying digital object; a secure hash of A’s credentials, public PKI, and timestamp; and a check that ID authentication protocols were observed. Party C wraps this in a secure, protected transmission that is sent to party B. In this scenario, the role of notarizing documents is fully carried out without requiring the participants to travel. Many such scenarios may play out with slightly different needs for signed, formatted digital forms. We offer a path toward meeting them: a) define a generally useful way of formatting digital information containing required fields, and b) find a way of unambiguously adding secure hashes and digital signatures to those documents – the combination of using DatabaseJSON for a) and

Our definitions below may use Base64 and Base96G in the sense of the raw conversion, without any MIME-style prefix or suffix. When binary content for a given field type is tabulated, 64-bit compatible unsigned numbers can be printed in the programmer’s choice of decimal, hexadecimal, Base96G, or Base64; for the remainder of this document, we call that NUMB64. Binary content that is not tabulated requires the Base64 form, Base96G, or our version of the yEnc form (see above); here we call that binary content BC64 and yEnc252, respectively.

In addition to the prefix DJSON field described above, the top level of a DatabaseJSON object may optionally include a trailing suffix field with the key {“DJSON_share”: } If present, there is a single DJSON_share object serving the top-level object and all of its child objects. Thus, DJSON and DJSON_share are reserved key names for top-level objects in the DatabaseJSON format. DatabaseJSON interprets child JSON Objects as child objects in the JSON syntax, with form constraints and data semantics that inherit the specifications of the DJSON and DJSON_share objects at the top level.

Reference Links and Pointer Links: Reference links are textual counterparts of the locations that pointers refer to. The form of a Reference link depends on whether the enclosing DatabaseJSON object uses Tabulated references (‘T’) and whether symbolic references (‘S’) are also available. The form for a Tabulated reference is {r: NUMB64}, and the form for a symbolic reference is {“r”: , UniqueSymbol: CONTENT}, where UniqueSymbol is unique within the enclosing object as a reference name. CONTENT can be any content that is well-formed as a JSON value.

The form of a pointer link is {h: NUMB64 | UniqueSymbol}.

A well-formed DatabaseJSON object guarantees that each pointer link refers to a unique Reference ({r:}) location in the same top-level object. A Reference Link object can also contain an h field that points to another Reference Link. This flexibility could, for example, support the storage of C++ objects with multiple levels of indirection, e.g. ObjectType **ptrType. The DatabaseJSON format doesn't define particular strategies for serializing languages other than Javascript; it does provide a jump-start toward a readable, backward-compatible extension that helps organize some of the thornier parts of binary object serialization in a space-efficient manner. Deserialization of Javascript objects can also benefit, by building in-memory structures that hold fewer copies of repetitive data definitions.
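The well-formedness rule and the multi-level indirection can be checked with a short tree walk. The sketch below covers only the Tabulated form: it assumes the value of "r" is the reference id, that pointer values ("h") are compared against those ids directly, and that the keys "r" and "h" are not used for unrelated data. The "v" payload key in the usage example is hypothetical.

```python
def collect_refs(node, table=None) -> dict:
    """Map every reference id ({"r": id}) to the object carrying it; ids must be unique."""
    table = {} if table is None else table
    if isinstance(node, dict):
        if "r" in node:
            rid = node["r"]
            if rid in table:
                raise ValueError(f"duplicate reference id {rid!r}")
            table[rid] = node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return table
    for child in children:
        collect_refs(child, table)
    return table


def check_pointers(node, table: dict) -> None:
    """Every pointer link ({"h": id}) must target a reference in the same top-level object."""
    if isinstance(node, dict):
        if "h" in node and node["h"] not in table:
            raise ValueError(f"dangling pointer {node['h']!r}")
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return
    for child in children:
        check_pointers(child, table)


def pointer_chain(ptr: dict, table: dict, max_hops: int = 64) -> list:
    """Follow h links hop by hop; a C++ T** value would yield two hops."""
    chain, current = [], ptr
    while "h" in current:
        if len(chain) >= max_hops:
            raise ValueError("pointer cycle or chain too long")
        current = table[current["h"]]
        chain.append(current)
    return chain
```

For example, with {"a": {"r": 1, "h": 2}, "b": {"r": 2, "v": "payload"}, "p": {"h": 1}}, following "p" visits reference 1 and then reference 2, mirroring a pointer to a pointer. Returning the whole chain, rather than only the final target, keeps each level of indirection visible to a deserializer.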

BLOB Objects – The format for a basic BLOB object is {blob64: BC64}. In this case, the variable-length data content of the BLOB is encoded inline as a Base96G, BC64, or yEnc252 string. The format for a tabulated BLOB object is one of two choices: {“blobLx”: NUMB64} or {“blobIdx”: NUMB64}. In the blobLx case, the encoding of the variable-length data is moved to, and included within, the trailing refLinks section of the DatabaseJSON object. In the blobIdx case, the trailing index maps to a secure hash by way of an operationally defined database table, and this secure hash is in turn linked to a data location in the distributed application network. For a blob newly introduced to the network, the actual data definition is also introduced within the top-level DatabaseJSON object context, by way of an object definition that defines deltas to the virtual mapping table. Summarizing: the blob64 format is inline data with no secure hash; blobLx is internal tabulated data that supplies the blob's data in a trailing table; blobIdx uses the tabulated entry to map the NUMB64 index to a secure hash.
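The three BLOB forms differ only in where the bytes come from, so a reader can dispatch on the key. In the sketch below, plain dicts stand in for the trailing refLinks table, the virtual mapping table, and a hypothetical content-addressed store. It assumes Base64 text (not Base96G or yEnc252) and SHA-256 digests encoded in Base64; in the format itself, the DJSON object would name the actual hash algorithm.

```python
import base64
import hashlib


def sh_b64(data: bytes) -> str:
    """ASSUMPTION: SHA-256 digest, Base64-encoded (the DJSON object would name the real algorithm)."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def decode_blob(blob: dict, ref_links: dict, hash_index: dict, store: dict) -> bytes:
    """Return raw bytes for a blob64, blobLx, or blobIdx object.

    ref_links:  NUMB64 id -> Base64 text from the trailing refLinks table
    hash_index: NUMB64 id -> secure hash from the virtual mapping table
    store:      secure hash -> bytes (hypothetical content-addressed store)
    """
    if "blob64" in blob:                      # inline data, no secure hash
        return base64.b64decode(blob["blob64"])
    if "blobLx" in blob:                      # data lives in the trailing table
        return base64.b64decode(ref_links[blob["blobLx"]])
    if "blobIdx" in blob:                     # index -> secure hash -> data
        digest = hash_index[blob["blobIdx"]]
        data = store[digest]
        if sh_b64(data) != digest:
            raise ValueError("stored data does not match its secure hash")
        return data
    raise ValueError("not a BLOB object")
```

Only the blobIdx path can verify integrity, because it is the only form that carries a secure hash.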

The presence of BigNum, indicated by the ‘I’ flag, simply requires code that processes numbers in the given object to be prepared to recognize any of them as a BigNum.

The interior of the DJSON_share object may contain any subset of the following three sub-objects.

The refLinks object defines arrays of reference links in the following format:

{“refLinkCnt”: Number, // the length of the arrays
“refLinks”: [NUMB64…NUMB64], // refLinkCnt length array of NUMB64
“refLinkData”: [JSON…JSON] // refLinkCnt length array of JSON, including DatabaseJSON children
}
Cells at the same index of each array are paired. BLOB objects in the blobLx format require a BC64 or yEnc252 string encoding of the data. BLOB objects in the blobIdx format require a BC64 encoding of a secure hash value computed with the algorithm specified by the DJSON object.
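Because the arrays are paired by position, turning the refLinks object into a lookup table amounts to a zip plus a few consistency checks. The sketch below assumes that NUMB64 ids have already been parsed to integers, and it treats duplicate ids as an error, which the text implies but does not state outright.

```python
def ref_links_table(share: dict) -> dict:
    """Pair refLinks ids with refLinkData cells by array position."""
    count = share["refLinkCnt"]
    ids, cells = share["refLinks"], share["refLinkData"]
    if len(ids) != count or len(cells) != count:
        raise ValueError("refLinks and refLinkData must both have refLinkCnt entries")
    table = dict(zip(ids, cells))
    if len(table) != count:
        raise ValueError("refLinks ids must be unique")
    return table
```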

A second object may tabulate unique variable names in a similar fashion:

{“varLinkCnt”: Number, // the length of the arrays
“varLinks”: [NUMB64…NUMB64], // varLinkCnt length array of NUMB64
“varLinkNames”: [string…string] // varLinkCnt length array of strings
}
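Since variable names are unique, the varLinks table can be indexed in both directions. The sketch below assumes that a serializer looks up ids by name and a deserializer looks up names by id. It rejects duplicates on either side.

```python
def var_links_maps(share: dict) -> tuple:
    """Build id -> name and name -> id maps from varLinks/varLinkNames; both sides must be unique."""
    count = share["varLinkCnt"]
    ids, names = share["varLinks"], share["varLinkNames"]
    if len(ids) != count or len(names) != count:
        raise ValueError("varLinks and varLinkNames must both have varLinkCnt entries")
    by_id = dict(zip(ids, names))
    by_name = dict(zip(names, ids))
    if len(by_id) != count or len(by_name) != count:
        raise ValueError("variable ids and names must be unique")
    return by_id, by_name
```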

A third object may describe data additions to, and deletions from, a virtual table:

{“IndexBlobTab”: string, // virtual table name
“idCol”: string, // virtual table key column that holds NUMB64 indexes
“shCol”: string, // virtual table key column that holds secure hash values
“version”: int.int, // OPTIONAL field – our belief about which version of the table we are updating to with our deltas
“numAdd”: number, // number of new additions
“addID”: [NUMB64…NUMB64], // numAdd length array of indexes
“addSH”: [BC64…BC64], // numAdd length array of secure hash values
“addData”: [BC64…BC64], // numAdd length array of Base64-encoded blobs
// alternative to addData: “addData_yEnc252”: [yEnc252…yEnc252]
“numDel”: number, // number of new deletions
“delID”: [NUMB64…NUMB64], // numDel length array of old ids to forget in this version of IndexBlobTab
“delSH”: [BC64…BC64], // numDel length array of old secure hashes to forget in this version of IndexBlobTab
“failForBadIDMerge”: boolean // if true, the entire operation fails when an added id refers to a different secure hash in the given namespace
}
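Applying such a delta to an in-memory id to secure-hash mapping could look like the sketch below. Several choices are assumptions, not rules from the text: deletions run before additions, a deletion applies only when both the id and the hash match, and when failForBadIDMerge is false a conflicting addition leaves the existing mapping in place. The function works on a copy, so a failed merge leaves the original table untouched, which is one way to make the operation "fail entirely". Blob payloads (addData) and versioning are not handled.

```python
def apply_index_blob_delta(table: dict, delta: dict) -> dict:
    """Return a new id -> secure-hash mapping after applying one IndexBlobTab delta."""
    if not (len(delta["addID"]) == len(delta["addSH"]) == delta["numAdd"]):
        raise ValueError("addID/addSH lengths must equal numAdd")
    if not (len(delta["delID"]) == len(delta["delSH"]) == delta["numDel"]):
        raise ValueError("delID/delSH lengths must equal numDel")
    result = dict(table)
    # ASSUMPTION: deletions are applied first, and only when id and hash both match.
    for old_id, old_sh in zip(delta["delID"], delta["delSH"]):
        if result.get(old_id) == old_sh:
            del result[old_id]
    for new_id, new_sh in zip(delta["addID"], delta["addSH"]):
        existing = result.get(new_id)
        if existing is not None and existing != new_sh:
            if delta.get("failForBadIDMerge", False):
                raise ValueError(f"id {new_id!r} already maps to a different secure hash")
            continue  # HYPOTHETICAL policy: keep the existing mapping
        result[new_id] = new_sh
    return result
```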


Config Sections – The JSON format benefits from wide software support and is also sometimes used as a readable format for config/ini files and structured document models. Compared to XML, JSON is more readable for humans. However, the TOML format is better still in that specific regard, and TOML could be used as a pretty-printed format for JSON/DatabaseJSON. This connection, along with some programmatic uses, is enhanced by specifying a default mechanism to identify headers for sections and sub-sections. Let the tag {“LetSection”: “SectionName”} indicate that “SectionName” is the designated name for the section containing the contents of that object. The same tag, placed in a nested object, names a sub-section in the same way.
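To show how LetSection could drive a TOML rendering, the sketch below turns nested objects into TOML tables and uses the LetSection value as the table name, falling back to the JSON key when the tag is absent. That fallback is an assumption. The sketch is deliberately simplified: it writes only string, integer, and boolean values, skips lists, ignores a LetSection on the root object, and assumes all keys and section names are valid TOML bare keys.

```python
import json


def to_toml_lines(obj: dict, path: tuple = (), out=None) -> list:
    """Emit TOML lines, naming each nested table by its LetSection tag (else its JSON key)."""
    out = [] if out is None else out
    if path:
        out.append("[" + ".".join(path) + "]")
    for key, value in obj.items():
        if key != "LetSection" and isinstance(value, (str, int, bool)):
            # json.dumps quoting is a close, but not exact, match for TOML basic strings
            out.append(f"{key} = {json.dumps(value, ensure_ascii=False)}")
    for key, value in obj.items():
        if isinstance(value, dict):
            name = value.get("LetSection", key)
            to_toml_lines(value, path + (name,), out)
    return out
```

For example, {"title": "demo", "db": {"LetSection": "database", "port": 5432}} renders as title = "demo", then a [database] header followed by port = 5432. Scalars are written before nested tables so that every key lands under the correct header.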

Secure hash codes support automatic merging and collation in distributed apps that did not previously synchronize, because identical content always yields the same hash. Index-generated ids do not share that property and may be inconsistent between multiple documents, or between documents and tables. If the index was only intended to be consistent within the present document, that is not an issue. Otherwise, such an inconsistency indicates an application failure. The operation may also fail due to version conflicts where versioning is maintained and updated by multiple parties. The version number can be set to -1.0 to indicate no opinion about versioning.

DatabaseJSON is motivated by a desire to support a maximally flexible, application-oriented, possibly distributed, common platform for relational data, program data, and program data structure/object persistence. We imagine ways in which database programming, embedded storage, and network apps can become easier through the sharing and reuse of more advanced basic library routines. For example, if it were easy for apps to specify and save multiple “init” states where they commonly begin work, and to load those states directly from a secure database, substantial savings in start-up time – programmatic and manual – might be achieved on a broad scale. The infrastructure for such a facility could be introduced as library routines, motivating widespread application adoption.

Note: edit 2/7/2024 – added yEnc252 as an alternative to Base64 for blob encodings.

Note: edit 2/9/2024 – defined Base96G as an alternative to Base64 and yEnc252. Base96G is superior for encoding short binaries in most applications. Also added a section on the use of JSON5 & efficient transcoding in digital contracts & notarization.