Fair digital red tape

2025-07-03

This is a follow-up on fair digital interaction. Selective sharing is a really attractive idea to me. The main question is: what do these structured queries on end-user data look like from the user's perspective?

Forms

Yes, like the paper forms one has to fill in when dealing with the government. Except digital. The concept is really powerful if you think about it: the form contains only the information necessary to make a decision. Exactly what we need for selective sharing.

The flow may look like this: the requestor constructs a query and signs it with their identity key. The receiving party, which has access to the cleartext data, receives the request and verifies the signature against a public registry of keys, perhaps something similar to certificate transparency logs. If the signature checks out... what do they see?
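A minimal sketch of the sign-and-verify step, assuming the query is a JSON-like dict and the registry is a plain mapping from requestor id to public key. To stay dependency-free it uses textbook RSA over two Mersenne primes, which is insecure and only a stand-in for a real signature scheme such as Ed25519 from a proper crypto library. The registry name `REGISTRY`, the requestor id `acme-analytics` and the query fields are all hypothetical.

```python
import hashlib
import json

# Toy textbook-RSA key pair from the Mersenne primes 2**61 - 1 and 2**89 - 1.
# INSECURE, illustration only; a real system would use e.g. Ed25519.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # requestor's private exponent (Python 3.8+)

# Hypothetical public key registry (think: an append-only, CT-style log).
REGISTRY = {"acme-analytics": (N, E)}


def digest(payload: dict, n: int) -> int:
    """SHA-256 of a canonical JSON encoding, reduced modulo n."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return int.from_bytes(hashlib.sha256(canonical.encode()).digest(), "big") % n


def make_request(requestor: str, query: dict, d: int, n: int) -> dict:
    """Requestor side: sign the query together with the requestor's identity."""
    payload = {"requestor": requestor, "query": query}
    return {**payload, "signature": pow(digest(payload, n), d, n)}


def verify_request(request: dict, registry: dict) -> bool:
    """Data-owner side: look up the requestor's key and check the signature."""
    key = registry.get(request["requestor"])
    if key is None:
        return False  # unknown requestor: reject outright
    n, e = key
    payload = {"requestor": request["requestor"], "query": request["query"]}
    return pow(request["signature"], e, n) == digest(payload, n)


req = make_request("acme-analytics", {"select": "count", "event": "feature_x"}, D, N)
print(verify_request(req, REGISTRY))  # True
tampered = {**req, "query": {"select": "raw", "event": "*"}}
print(verify_request(tampered, REGISTRY))  # False
```

Signing the requestor id together with the query means a valid signature cannot be replayed under a different identity.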

The trick is that the data owner has to see the result of the query, or of a sequence of queries. That can be presented as a pre-filled form! For instance, say a company wants to run a survey to collect information about its users to improve their experience (yes, the real improvement, not the "we improve user experience by doing overreaching behavioral analysis in the background" kind). The pre-filled form could show something like "feature X accessed N times" or "median response time of UI components is Y milliseconds". If the results look good, the user clicks the "approve" button and signs the query result, perhaps encrypting it for the requestor so that only they can read it.
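A minimal sketch of turning a query into a pre-filled form on the user's device, assuming local usage events are stored as simple dicts. The event fields, the query shape and the helper names are hypothetical; signing and encrypting the approved result are left out.

```python
from statistics import median

# Hypothetical usage events stored locally on the user's device.
EVENTS = [
    {"feature": "X", "ui_latency_ms": 120},
    {"feature": "Y", "ui_latency_ms": 80},
    {"feature": "X", "ui_latency_ms": 100},
]


def evaluate(query, events):
    """Run the aggregate query locally; only aggregates end up in the form."""
    uses = sum(1 for ev in events if ev["feature"] == query["feature"])
    latency = median([ev["ui_latency_ms"] for ev in events])
    return {
        f"Feature {query['feature']} accessed (times)": uses,
        "Median UI response time (ms)": latency,
    }


def render_form(fields):
    """Show the pre-filled form to the user, one field per line."""
    return "\n".join(f"{label}: {value}" for label, value in fields.items())


def respond(fields, approved):
    """Nothing is released unless the user explicitly approves the form."""
    return dict(fields) if approved else None


form = evaluate({"feature": "X"}, EVENTS)
print(render_form(form))
# Feature X accessed (times): 2
# Median UI response time (ms): 100
```

The raw events never leave `evaluate`; the user only ever approves or rejects the two numbers they can see.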

What I like about this approach is that the response is as correct as the data stored on the end-user device. The only caveat is tampering, which is unlikely, since it turns into a complex reverse-engineering exercise.

Placeholders

Certain queries may require knowing some information in advance, and in an end-to-end encrypted context that becomes hard. For example, if a financial advisor wants to check the net worth and asset distribution of an individual, they may construct a query with placeholders that reflect the conceptual meaning of the data. The query could look like "show me the balances of all current accounts", where "current accounts" is a placeholder of type account[]. The user first sees another form asking them to pick the matching accounts, and only then is the computed result displayed as a form for approval.
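A minimal sketch of placeholder resolution, assuming a query is a dict whose placeholders are typed names the data owner binds on their own device. The `$current_accounts` name, the `account[]` type string and the account data are hypothetical illustrations.

```python
# Hypothetical query: "$current_accounts" is a typed placeholder, because the
# requestor does not know the owner's account identifiers.
QUERY = {
    "select": "balance",
    "from": "$current_accounts",
    "placeholders": {"$current_accounts": "account[]"},
}

# Hypothetical local data, only ever visible on the owner's device.
LOCAL_ACCOUNTS = {
    "chk-001": {"type": "current", "balance": 1200.0},
    "chk-002": {"type": "current", "balance": 300.5},
    "sav-001": {"type": "savings", "balance": 5000.0},
}


def unresolved(query):
    """Placeholders the data owner still has to fill in."""
    bindings = query.get("bindings", {})
    return [name for name in query["placeholders"] if name not in bindings]


def bind(query, name, value, accounts):
    """Owner fills a placeholder through a form; returns a new query dict."""
    if query["placeholders"].get(name) != "account[]":
        raise KeyError(name)
    if not isinstance(value, list) or any(v not in accounts for v in value):
        raise ValueError("account[] expects a list of local account ids")
    return {**query, "bindings": {**query.get("bindings", {}), name: value}}


def run(query, accounts):
    """Compute the result form; refuses to run with unfilled placeholders."""
    missing = unresolved(query)
    if missing:
        raise ValueError(f"unfilled placeholders: {missing}")
    ids = query["bindings"][query["from"]]
    return {acc: accounts[acc][query["select"]] for acc in ids}


filled = bind(QUERY, "$current_accounts", ["chk-001", "chk-002"], LOCAL_ACCOUNTS)
print(run(filled, LOCAL_ACCOUNTS))  # {'chk-001': 1200.0, 'chk-002': 300.5}
```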

In the financial advisor example, individual transactions don't have to be accessed at all; advisors don't really need that information. They need the aggregates, and the aggregation happens on the data owner's device before anything is shared.
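A minimal sketch of on-device aggregation, assuming each transaction carries a hypothetical `asset_class` tag and that net worth is positive. Only the net worth and the percentage distribution would be placed in the form; the transaction list itself stays local.

```python
from collections import defaultdict

# Hypothetical raw transactions; these never leave the owner's device.
TRANSACTIONS = [
    {"account": "chk-001", "asset_class": "cash", "amount": 2000.0},
    {"account": "chk-001", "asset_class": "cash", "amount": -500.0},
    {"account": "brk-001", "asset_class": "equities", "amount": 4500.0},
    {"account": "brk-002", "asset_class": "bonds", "amount": 1500.0},
]


def summarize(transactions):
    """Reduce transactions to net worth and asset distribution (in percent)."""
    totals = defaultdict(float)
    for tx in transactions:
        totals[tx["asset_class"]] += tx["amount"]
    net_worth = sum(totals.values())
    if net_worth <= 0:
        raise ValueError("distribution assumes a positive net worth")
    distribution = {cls: round(100 * t / net_worth, 1) for cls, t in totals.items()}
    return {"net_worth": net_worth, "distribution_pct": distribution}


print(summarize(TRANSACTIONS))
# {'net_worth': 7500.0, 'distribution_pct': {'cash': 20.0, 'equities': 60.0, 'bonds': 20.0}}
```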

Of course, the requestor also needs to know the data schema. This is quite easy in a purpose-built app, but a more generic infrastructure would need some kind of public schema registry.
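A minimal sketch of checking a query against a public schema registry before it is ever shown to the user, assuming the registry maps a versioned schema id to its field names and types. The id `finance/account@1` and the field set are hypothetical.

```python
# Hypothetical public schema registry: schema id -> field name -> type.
SCHEMA_REGISTRY = {
    "finance/account@1": {"id": "str", "type": "str", "balance": "float"},
}


def validate_query(query, registry):
    """Return a list of problems; an empty list means the query fits the schema."""
    schema = registry.get(query["schema"])
    if schema is None:
        return [f"unknown schema {query['schema']!r}"]
    return [f"unknown field {f!r}" for f in query["fields"] if f not in schema]


print(validate_query({"schema": "finance/account@1", "fields": ["balance"]}, SCHEMA_REGISTRY))
# []
```

A requestor can run the same check before sending, so malformed queries never reach the data owner.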

Simplifying the UX

As with physical forms, the requestor can just pick from a set of predefined digital forms, say form F-1042, which maps to a query with a specific fingerprint. That way, some forms can be pre-approved by the service provider, which keeps a registry of commonly used forms. This has the nice side benefit of reducing the need for query analysis to prevent abusive information extraction. A "canned" query, packaged as a form and coming from a verified requestor, goes through a simpler approval process. Fun?
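A minimal sketch of the form registry, assuming a fingerprint is the SHA-256 of a canonical JSON encoding of the query. The registry contents, the query behind F-1042 and the set of verified requestors are hypothetical.

```python
import hashlib
import json


def fingerprint(query):
    """Stable fingerprint: SHA-256 over canonical (sorted, compact) JSON."""
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


# Hypothetical canned query behind form F-1042.
F1042_QUERY = {"select": ["count"], "from": "app_events", "where": {"feature": "X"}}

# Hypothetical provider-maintained registry: form id -> query fingerprint.
FORM_REGISTRY = {"F-1042": fingerprint(F1042_QUERY)}
VERIFIED_REQUESTORS = {"acme-analytics"}


def approval_path(requestor, form_id, query, registry, verified):
    """Pick the review path for an incoming request."""
    expected = registry.get(form_id)
    if expected == fingerprint(query) and requestor in verified:
        return "simplified"  # known form, unmodified query, verified requestor
    return "full-review"  # ad-hoc or altered query: needs query analysis


print(approval_path("acme-analytics", "F-1042", F1042_QUERY, FORM_REGISTRY, VERIFIED_REQUESTORS))
# simplified
```

Any change to the query, even one extra field, changes the fingerprint and drops the request back to full review.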

The data owner, in turn, could approve certain forms for regular data collection, perhaps with an expiration time or a request limit, after which explicit approval is required again. The data owner keeps full control over the data and how it is shared.
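A minimal sketch of a standing approval, assuming it is bounded by both an expiration timestamp and a request count, whichever runs out first. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StandingApproval:
    form_id: str
    expires_at: float  # Unix timestamp after which the grant is void
    remaining: int     # how many more requests are auto-approved

    def try_consume(self, form_id, now):
        """Auto-approve if the grant still applies; otherwise ask the owner."""
        if form_id != self.form_id or now >= self.expires_at or self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


grant = StandingApproval("F-1042", expires_at=1000.0, remaining=2)
print([grant.try_consume("F-1042", now=t) for t in (10.0, 20.0, 30.0)])
# [True, True, False]
```

Returning `False` here does not reject the request; it just routes it back to an explicit approval by the owner.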

It's no more complex than an email exchange. The requestor can use canned requests, and the recipient can respond asynchronously. The response can be a simple "sounds good" for a yes/no form, or it can require filling in placeholders, akin to a more detailed reply.

This simplified interaction model addresses the user experience challenges, but broader questions about implementing the infrastructure remain.

Is it even viable?

I think so. The control over information is really good, but such a system may be quite complicated to implement and scale. Some data collection requires connecting multiple data sources, which may call for shared infrastructure, and that infrastructure becomes complex because of this and other features. The user may need to log in to multiple apps to approve the queries before the compiled result is sent to the requestor. To stretch the analogy further, this could look like a checklist of approvals, with the completed steps shown to both parties.
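A minimal sketch of the checklist idea as a shared data structure, assuming each app holding part of the data contributes a partial result as a dict and the compiled result is a simple merge. The app names and results are hypothetical, and how the checklist is synchronized between apps is out of scope.

```python
from dataclasses import dataclass


@dataclass
class ApprovalChecklist:
    steps: dict  # app name -> approved partial result, or None if pending

    @classmethod
    def for_apps(cls, apps):
        return cls({app: None for app in apps})

    def approve(self, app, partial_result):
        """Record the owner's approval in one app."""
        if app not in self.steps:
            raise KeyError(app)
        self.steps[app] = partial_result

    def progress(self):
        """What both parties see: one line per step, checked when done."""
        return [f"[{'x' if r is not None else ' '}] {app}" for app, r in self.steps.items()]

    def compiled(self):
        """The combined result, available only once every step is approved."""
        if any(r is None for r in self.steps.values()):
            return None
        merged = {}
        for r in self.steps.values():
            merged.update(r)
        return merged


checklist = ApprovalChecklist.for_apps(["bank-app", "broker-app"])
checklist.approve("bank-app", {"cash": 1500.0})
print(checklist.progress())  # ['[x] bank-app', '[ ] broker-app']
```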

Technical complexity can be addressed; the user-facing side is harder. In my model, I think the user has to be educated on the basics of data ownership anyway. It's like owning a car: one has to know the basics to operate it, and the same applies here. The UX can still be simplified as much as possible, with the complex functionality hidden behind forms and buttons.

I never thought I would think of forms as a good thing. But hey, they definitely have properties that fit well into a fair system of digital interaction.