Design Features for DB+ Programming

This topic contains 12 replies, has 1 voice, and was last updated by Josh Stern November 5, 2022 at 11:08 pm.

Author

Posts
November 4, 2022 at 2:25 am #124418

Josh Stern
Moderator

In other posts, we advocated the utility of relating metaprogramming in source code to domain models with well-defined formal properties that live on VR state machine models that admit formal reasoning & proofs.

Note here that various transactional models, query models, & database management/provisioning models can be readily described using flow diagrams in the VR state machine language. Defining types of flow diagrams as types of data repositories might be a good way to go.
November 4, 2022 at 2:43 am #124419

Josh Stern
Moderator

Also food for thought:

Consider YouTube as a kind of unusual dynamic database that you don’t control. It constantly changes. Some application might build an abstract repository representing the slices it is interested in and another repository representing a bridge to some of its own video postings.
November 4, 2022 at 5:56 am #124420

Josh Stern
Moderator

Q: Do these features provide any benefits for a lowly sole proprietor running a database on his own computer & operating a website?

A: A platform that makes many related types of programming easy brings benefits even for users who don’t need all of those features. This is due to code sharing for the platform among larger classes of developers, some of whom are working in application neighborhoods that are near to unmet needs.

Let’s flesh out a YouTube example. Joe has a biz scouting HS football players & making recs to college recruiters and advertisers. As part of that biz, he wants to search through many thousands of YouTube clips of performances and highlight interesting things, showing small embedded clips on his website along with other info, stats, etc. The website will be frequently updated, and perhaps there is even some feature that updates in an unattended fashion. However, most of the features are things Joe prepares as an editor, picking out snippets of interest with the help of special AI/video software/editing tools. It’s not unreasonable that part of his database is on his home office computer & part is remotely connected to his website. In either case, he doesn’t want family members to be able to accidentally corrupt or read unreleased files, and he doesn’t want website visitors to get in there either. Searching YouTube is functionality that he wants to run as a daemon, either from his home machine or from a remote server. As part of his workflow, he wants to navigate through database views, selected by different query features, & do some video browsing/editing work. Even if embedded SQLite3 was okay as a database for him as a single worker, he is still interested in support for continuous daemons that are adding to the files & custom support for video browsing/editing on the database cursors of his queries.

The example can be embellished in other ways, but it isn’t meant to be extreme. The reality is that the existing tools are too unwieldy for Joe even if he is an IT pro. They are too much messy work with too many places for problems & poor performance. An ecosystem which makes these sorts of things routine is valuable to Joe’s single pro biz, whether he does the work himself, uses a follow-up “Ez wizard” CASE tool to help, or pays a developer.

If code allows virtual nodes & containers & OS-based RBAC control, then there may be bits of code that branch differently for the case of a database that simply runs as a different daemon user, implements the RBAC itself on top of that, and connects via sockets. But in the ecosystem of modern gear, the “costs” for the common setup are not meaningful or noticed by Joe, so long as contributors, commercial or open source, take care to document his use cases.
November 4, 2022 at 5:19 pm #124421

Josh Stern
Moderator

Q: What would be the ideal form of a stored procedure that does complex things & AI?

A: A big decision is whether the procedure will run as a DB inline, a DB thread, or an independent process. The decision is weighty enough that the design should plan for typing all 3 & let applications decide which fits their needs. For inline, I think something based on portable WebAssembly compiled from guarded inputs could be a good choice. The thread & the OS procedure should not be binary portable, but should support portability at the container level.

Q: Guarded means what?

A: For running inside the db process, on each row, with strong performance likely, we would like to restrict WebAssembly or something else to an in-memory, single-threaded subset. If the guard is not satisfactory, then use the threaded or independent process model. The language implementation should define a “best available” transport with a common interface to each version – e.g. use non-blocking shared memory queues if they are available, or sockets to another process if that is the best available, etc. Make it easiest to move source code between the different versions, while allowing a proprietary AI process to run in another container too. For each, there may be a similar abstraction of “init/persist/call/sleep/…deconstruct”.
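
To make the “common interface” idea a bit more concrete, here is a rough Python sketch. Everything in it is hypothetical: the class names `Transport`, `InMemoryQueueTransport`, `StoredProcedure`, and `RowLength` are made up for illustration, and the in-memory deque only stands in for a shared memory queue or a socket. The point is just that the procedure source talks to one interface and follows one lifecycle, whichever version it is deployed as.

```python
from abc import ABC, abstractmethod
from collections import deque


class Transport(ABC):
    """Hypothetical common interface over the 'best available' channel."""

    @abstractmethod
    def send(self, msg): ...

    @abstractmethod
    def receive(self): ...


class InMemoryQueueTransport(Transport):
    """Stand-in for a shared memory queue; a socket version would expose the same methods."""

    def __init__(self):
        self._queue = deque()

    def send(self, msg):
        self._queue.append(msg)

    def receive(self):
        # Non-blocking: return None when nothing is waiting.
        return self._queue.popleft() if self._queue else None


class StoredProcedure(ABC):
    """Assumed lifecycle shared by the inline, thread, and process versions."""

    def __init__(self, transport):
        self.transport = transport
        self.state = {}

    def init(self):
        self.state = {"calls": 0}

    def persist(self):
        # Return a snapshot that a host could store while the procedure sleeps.
        return dict(self.state)

    @abstractmethod
    def call(self, row): ...

    def deconstruct(self):
        self.state.clear()


class RowLength(StoredProcedure):
    """Toy per-row procedure: reports the length of each row's text."""

    def call(self, row):
        self.state["calls"] += 1
        result = len(row)
        self.transport.send(result)
        return result


proc = RowLength(InMemoryQueueTransport())
proc.init()
proc.call("abc")
```

Swapping `InMemoryQueueTransport` for a socket-backed class would not touch `RowLength`, which is the portability property being asked for.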
- November 4, 2022 at 5:28 pm #124422
  
  Josh Stern
  Moderator
  
  Interesting side Q is which containers do or readily could support moving the persistent state of a sleeping thread as a binary to the same container binary, different instance, on same or remote machine? Not all threads, but some special ones?
November 5, 2022 at 5:37 am #124430

Josh Stern
Moderator

One of the best features of C++ is the ability to create scope for arbitrary size blocks of code inside of a function using { for begin scope and } for end scope. When the block is a try block, exception handling specialized for the scope (catch clauses) can be listed just after the end. The scope governs the handling of variables/functions introduced inside of it.

For DB+ it would be nice to see something similar that could set regional default conditions for sets of db code inside, including things related to what is an exception & how it is handled, how NULL is handled, local AS definitions, and similar features.
November 5, 2022 at 6:00 am #124434

Josh Stern
Moderator

Thinking about Daemon ports, smart filtering & push, we can imagine user/commercial partnerships for curated push (flash offers, etc) that get stored on a kind of temp db/box & only create a cell phone note if they pass further criteria set by the user. The point is to ignore it unless it’s something likely to be interesting. Adverts should be motivated to behave or be blocked/ignored.
November 5, 2022 at 6:09 am #124435

Josh Stern
Moderator

If I want to associate tags/features with a particular relation/table or view, it’s awkward & messy to require the inefficient clutter of another table to hold that data. JSON-like tags for the table & defined view objects should be standard.
November 5, 2022 at 8:40 am #124458

Josh Stern
Moderator

A common set of primitives for progressive persistence of relations is a benefit for developers & language designers.

Name Reservation – can be inside or outside of a transaction – if it’s inside, then the DB checks that it’s ok at that moment. Why break that off? Programmatically it’s a lot better to deal with a conflict up front that might require interactivity and loss of session work or resources to hold a session up.
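
A toy Python sketch of the reservation step on its own, separate from creating or populating anything. `NameRegistry` and `NameConflict` are hypothetical names, and a real system would of course need locking and persistence that are left out here.

```python
class NameConflict(Exception):
    """Raised when a table name is already held by another session."""


class NameRegistry:
    def __init__(self):
        self._owners = {}  # table name -> session holding the reservation

    def reserve(self, name, session):
        # setdefault records the reservation only if the name is still free.
        owner = self._owners.setdefault(name, session)
        if owner != session:
            raise NameConflict(f"{name!r} is reserved by {owner}")

    def release(self, name, session):
        # Only the holder can release its own reservation.
        if self._owners.get(name) == session:
            del self._owners[name]


registry = NameRegistry()
registry.reserve("clips_2022", session="joe")
try:
    registry.reserve("clips_2022", session="daemon")
    conflict = False
except NameConflict:
    conflict = True  # found out up front, before any session work was done
```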

Memory Caching can be specified as a performance optimization that’s useful even for some large temporary tables that will soon be discarded. We may want to specify lifetimes for the cache within a session, relative to activity, or for some time period after table creation commit – could be permanent as a property even if not literally permanent in active memory.

Table Type Definitions – should be things we can construct & pass as variables & edit as data structures in their own right. Should have properties that are directly attachable (see note above).
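
One way to picture a table type as an ordinary value with attached tags is the Python sketch below. `Column`, `TableType`, `with_column`, and `to_json` are made-up names for illustration, not an existing API.

```python
import json
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Column:
    name: str
    type: str


@dataclass
class TableType:
    columns: tuple
    tags: dict = field(default_factory=dict)  # JSON-like properties on the type itself

    def with_column(self, column):
        """Return an edited copy, leaving the original definition untouched."""
        return replace(self, columns=self.columns + (column,), tags=dict(self.tags))

    def to_json(self):
        return json.dumps(
            {"columns": [[c.name, c.type] for c in self.columns], "tags": self.tags},
            sort_keys=True,
        )


clips = TableType(columns=(Column("url", "text"),), tags={"owner": "joe"})
scored = clips.with_column(Column("score", "real"))
scored.tags["reviewed"] = True
```

Here `clips` and `scored` are plain variables that can be passed around, compared, or serialized before any table exists in the database.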

The Table can be created & populated inside or outside of a transaction. Outside means immediate commit of the empty table. For some tables – e.g. like views – it may be desirable to specify that they will be deleted after a certain period of time.

A view can be anonymous or have a name. In the programming context, we can give a view a variable name that is not a permanent db storage name. The system may offer a feature that lets views be saved in some temp edit stack based on the variable name – e.g. supporting undo.

The database system may choose to put some kind of a save lock on tables supporting named or active views, or it may throw exceptions if any of the underlying tables change definition or become deleted.
November 5, 2022 at 9:06 am #124459

Josh Stern
Moderator

Tangentially, on the db topic, not visible to developers, I think this idea is potentially interesting:

We can imagine our raw disk space (pick your definition of raw to suit) as a set of pages of a fixed size (may be more than 1 variety per db) and we can describe each string/blob/whatever as Page I, Starting at J, Length K. The set of triples that doesn’t fit in 64 bits is a small minority of use cases. So in the header of any table the hidden implementation can say whether it is using some version of that system or a more traditional model. One of the advantages of this concept is that the numbers are invisible outside, so the db is free to optimize locations according to common access patterns of data that is requested at the same time – that is potentially a big win for some use cases. In general, it should lead to less disk fragmentation as well, even prior to moving things around.
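
A quick Python sketch of packing the (Page I, Start J, Length K) triple into one 64-bit word. The 32/16/16 bit split is purely an assumption for illustration (it would suit pages of up to 64 KiB); the function names are made up too.

```python
# Assumed split of the 64 bits: 32 for the page, 16 for the start, 16 for the length.
PAGE_BITS, OFFSET_BITS, LENGTH_BITS = 32, 16, 16


def pack_locator(page, offset, length):
    """Pack (page, offset, length) into a single 64-bit integer."""
    fits = (
        0 <= page < (1 << PAGE_BITS)
        and 0 <= offset < (1 << OFFSET_BITS)
        and 0 <= length < (1 << LENGTH_BITS)
    )
    if not fits:
        # The minority case: fall back to a more traditional layout.
        raise ValueError("triple does not fit in 64 bits")
    return (page << (OFFSET_BITS + LENGTH_BITS)) | (offset << LENGTH_BITS) | length


def unpack_locator(word):
    """Recover (page, offset, length) from a packed 64-bit integer."""
    length = word & ((1 << LENGTH_BITS) - 1)
    offset = (word >> LENGTH_BITS) & ((1 << OFFSET_BITS) - 1)
    page = word >> (OFFSET_BITS + LENGTH_BITS)
    return page, offset, length


word = pack_locator(page=7, offset=128, length=300)
triple = unpack_locator(word)  # (7, 128, 300)
```

Because callers only ever hold the opaque word, the db could rewrite it when it relocates the data.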
November 5, 2022 at 2:41 pm #124460

Josh Stern
Moderator

I’m in the camp that believes UTF-8 should be the default encoding of most names & text strings.
November 5, 2022 at 11:08 pm #124463

Josh Stern
Moderator

Thoughts about Daemons:

You can list advantages & disadvantages for external daemons & ones which expect to run from within the Db+ host server environment. So you probably want to create both with similar syntax/concepts.

Daemons can have a similar list of states: off, working, sleeping. Off can be divided into “might wake” & “off until further config changes”.
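
The state list could be written down as a Python enum along these lines. The member names and the `can_wake_on_signal` rule are my own assumptions for the sketch, not a settled design.

```python
from enum import Enum


class DaemonState(Enum):
    WORKING = "working"
    SLEEPING = "sleeping"
    OFF_MIGHT_WAKE = "off, might wake"
    OFF_UNTIL_RECONFIGURED = "off until further config changes"

    @property
    def is_off(self):
        return self in (DaemonState.OFF_MIGHT_WAKE, DaemonState.OFF_UNTIL_RECONFIGURED)


def can_wake_on_signal(state):
    """Assumed rule: only a sleeping or 'might wake' daemon reacts to a signal."""
    return state in (DaemonState.SLEEPING, DaemonState.OFF_MIGHT_WAKE)


wakeable = [s.name for s in DaemonState if can_wake_on_signal(s)]
```

External & in-host daemons could share this one enum even if the code that moves between states differs.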

Signals/slots is a powerful messaging paradigm that should be supported for daemons, but the actual work of messaging should be managed within the overall time-slicing mgmt. of the system.

Conceptually, a deluxe set of priorities would allow different priorities to be specified for checking for state change and for running/working. The priority might be an integer that is returned by a logically complex function applied to a given absolute time slice. The scheduling system can then add up all the positive votes for a given time slice & apportion effort.

The effort for checking for state change is applied to checking if there are any messages in the signaling queue for the given daemon.
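
Here is a small Python sketch of the vote-and-apportion step. It assumes effort is split in proportion to the positive votes, which is only one possible reading; the function `apportion`, the daemon names, and the budget of 100 units are invented for the example.

```python
def apportion(slice_index, priority_fns, budget=100):
    """Share `budget` effort units among daemons in proportion to positive votes."""
    votes = {name: fn(slice_index) for name, fn in priority_fns.items()}
    positive = {name: v for name, v in votes.items() if v > 0}
    total = sum(positive.values())
    if total == 0:
        return {}  # nobody wants this slice
    return {name: budget * v / total for name, v in positive.items()}


# Each daemon supplies a function from absolute time slice to an integer priority.
working_priorities = {
    "youtube_search": lambda t: 3 if t % 2 == 0 else 0,  # works on even slices only
    "push_filter": lambda t: 1,                          # steady low priority
    "archiver": lambda t: -1,                            # never votes positive
}

even_slice = apportion(0, working_priorities)
odd_slice = apportion(1, working_priorities)
```

A second dictionary of priority functions could be passed to the same function to apportion the effort spent checking each daemon's signaling queue for state changes.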