antonta's space
Following patterns wherever they emerge.

A database of the paranoid

Ever wondered what a database of a reasonably paranoid engineer looks like?

Database leaks are real

No need to cover that one, I hope. There are sites on the Internet that clearly show that data breaches are real. Leaked usernames, emails, password hashes, and even public keys all give information to an attacker. Yes, public keys too: when a key pair is derived from a password, the public key acts as a verification oracle, something to match password guesses against. An exact match means the password is cracked. I'm not going to spell out why leaking emails is a bad thing.

So, being a little paranoid, and to protect against potential data breaches or leaks, both external and internal, I decided to go a step further and encrypt the relevant fields in the database, specifically the ones listed above. Username lookups become a little tricky, but you can use a keyed hash for that. Where relevant, don't forget to pad values before encrypting so that field lengths can't be inferred.
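A minimal sketch of the keyed hash lookup and the padding, using only the Python standard library. The names `LOOKUP_KEY`, `BUCKET`, `username_index`, `pad` and `unpad` are made up for illustration, the normalization step is my assumption, and the encryption itself is left out because it needs a third-party library.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical values: a real key would come from the server's secret store.
LOOKUP_KEY = bytes.fromhex("00" * 32)
BUCKET = 64  # pad every value to a multiple of this many bytes


def username_index(username: str, key: bytes = LOOKUP_KEY) -> str:
    """Keyed hash (HMAC-SHA-256) of a normalized username, stored in an indexed column."""
    # Normalizing first is an assumption: it makes "Alice" and "alice" the same account.
    normalized = unicodedata.normalize("NFKC", username).casefold()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def pad(value: str, bucket: int = BUCKET) -> bytes:
    """Append a 0x80 marker, then zero bytes up to the next multiple of `bucket`."""
    raw = value.encode("utf-8") + b"\x80"
    return raw + b"\x00" * (-len(raw) % bucket)


def unpad(padded: bytes) -> str:
    """Reverse pad(): drop the trailing zeros, then the 0x80 marker."""
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("invalid padding")
    return stripped[:-1].decode("utf-8")
```

A lookup then becomes an equality match on the stored `username_index(...)` value, and what gets encrypted is `pad(value)`, so short and long usernames produce ciphertexts of the same size within a bucket.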

Service provider as a guardian

Yes, you might think that strong, truly random passwords are hard to crack anyway, so why go the extra mile and introduce operational overhead? I think the basic assumption should be that not everyone uses a password manager or tries to protect their identity online by using random usernames. Reuse of usernames, passwords and emails is quite common, as it is easier for the user. The service provider should work under this assumption and protect these users by doing what it already knows how to do: operate the service.

Insider threats are also real, especially when dealing with sensitive data. Protecting the database with field encryption adds an extra access control mechanism. You can't just scan the database for usernames; you need some sort of tool to access them, and that tool may as well have access control built in.

Moreover, if you are using a third-party service for the database, such as serverless offerings, you open a door to another set of actors who can peek into your data. Plus it's exposed to the Internet typically, which is fun too. While building software for protecting users' data, I should definitely protect the data about the users too, right?

It's interesting that these practices are applied to storing credit card information, but somehow ignored for handling online identity information which, if exposed, can give access to potentially sensitive information, including financial data. Guess regulations are the driving force here, not common sense.

The real overhead

First, you don't need to worry about encrypting your database backups. The backups already contain encrypted entries. So that one is a clear win for field encryption. Then, you need tools to work with that data, but those are actually good to have anyway, to avoid running raw SQL queries on a production database.

The overhead comes from batch queries being harder to process. You can also cover that with a tool. What you can't avoid, however, is key rotation. Oh, that's going to be a pain. It's feasible to do once a year or so, and perhaps after a data breach. You can write your code and SQL queries to support multiple keys simultaneously, and then the hardest part of the rotation is re-encrypting the entries, which can happen asynchronously.
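Here is a rough sketch of what "multiple keys simultaneously" could look like for the keyed hash lookup column, using sqlite3 as a stand-in database. The table layout, the `KEYS` ring and the function names are all hypothetical. Re-encrypting the field itself is omitted, so `rotate_row` takes the plaintext username as an argument; in a real system it would come from decrypting the row.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical key ring: key id -> lookup key. Key 2 is the current one.
KEYS = {
    1: b"old-key-for-illustration-only!!!",
    2: b"new-key-for-illustration-only!!!",
}
CURRENT_KEY_ID = 2


def index_for(username: str, key_id: int) -> str:
    """Keyed hash of the username under the given key."""
    return hmac.new(KEYS[key_id], username.encode("utf-8"), hashlib.sha256).hexdigest()


def find_user(db: sqlite3.Connection, username: str):
    """Match the row under any active key, so rotated and unrotated rows are both found."""
    clauses = " OR ".join("(key_id = ? AND username_idx = ?)" for _ in KEYS)
    params = [p for kid in KEYS for p in (kid, index_for(username, kid))]
    return db.execute(
        f"SELECT id, key_id FROM users WHERE {clauses}", params
    ).fetchone()


def rotate_row(db: sqlite3.Connection, user_id: int, username: str) -> None:
    """Re-index one row under the current key; this can run in a background job."""
    db.execute(
        "UPDATE users SET key_id = ?, username_idx = ? WHERE id = ?",
        (CURRENT_KEY_ID, index_for(username, CURRENT_KEY_ID), user_id),
    )


# Usage: one row still indexed under the old key.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, key_id INTEGER, username_idx TEXT)"
)
db.execute(
    "INSERT INTO users (key_id, username_idx) VALUES (?, ?)",
    (1, index_for("alice", 1)),
)
before = find_user(db, "alice")  # found via the old key
rotate_row(db, before[0], "alice")
after = find_user(db, "alice")   # same row, now under the current key
```

Once no row references key 1 anymore, it can be dropped from the ring and the queries go back to a single key.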

Storing keys can be done in the same database in fact, just make sure they are encrypted by a key that is known to the server only. Should be good enough.

I'm yet to see how this experiment turns out in production.

On true paranoia

Now, this approach does not protect against the service provider itself trying to crack some user's credentials: the fields can be decrypted, after all. I think that a service provider who bothers to encrypt credential-related information is unlikely to even attempt cracking it. And if both passkeys and passwords are used for key derivation on the client side, and the server does not store which authentication mechanism was used, the ambiguity alone will discourage such attempts.

I guess I'm paranoid, but reasonably so.