SaveState

pip install savestate

Documentation: https://mrthearman.github.io/savestate/

Source Code: https://github.com/MrThearMan/savestate/

Contributing: https://github.com/MrThearMan/savestate/blob/main/CONTRIBUTING.md

SaveState is a fast, cross-platform file storage for arbitrary Python objects. It's similar to Python's built-in shelve module, but aims to be more performant on Windows while staying cross-platform compatible.

SaveState is inspired by semidbm2, with a more modern interface: mapping-like functions, a context manager, and support for arbitrary Python objects.

Implementation details

Pure python
No requirements or dependencies
A dict-like interface (without the dict union operators)
Same, single file on Windows and Linux (unlike shelve)
Key and value integrity can be evaluated with a checksum, which will detect data corruption on key access.
Recovery from missing bytes at the end of the file, or small amounts of corrupted data in the middle
Both values AND keys put in savestate must support pickling. Note the security implications of this!
This means that you can use arbitrary objects as keys if they support pickle (unlike shelve)
All the keys of the savestate are kept in memory, which limits the savestate size (not a problem for most applications)
NOT thread-safe, and cannot be accessed by multiple processes at the same time
File is append-only, so the more non-read operations you do, the more the file size is going to balloon
However, you can compact the savestate, usually on savestate.close(), which replaces the file with a new one containing only the current, non-deleted data. Compaction has a small performance cost.
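
The points above (pickled keys and values, an in-memory key index, a checksum verified on access, append-only writes, and compaction) can be illustrated with a small standard-library sketch. This is NOT SaveState's actual file format or API: the AppendOnlyStore class, its record layout, and the tombstone marker are all hypothetical and exist only to show why the file grows and what compaction buys back.

```python
import os
import pickle
import struct
import tempfile
import zlib

# Hypothetical record layout: key length, value length, CRC32 of key + value.
HEADER = struct.Struct("<III")
TOMBSTONE = 0xFFFFFFFF  # value-length marker meaning "this key was deleted"


class AppendOnlyStore:
    """Toy append-only key-value file; not SaveState's real format."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # pickled key -> (value offset, value length, crc)
        open(path, "ab").close()  # create the file if it does not exist
        self._load()

    def _load(self):
        # Replay the whole log; later records override earlier ones.
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break  # end of file (or an incomplete trailing header)
                klen, vlen, crc = HEADER.unpack(header)
                key = f.read(klen)
                if len(key) < klen:
                    break  # incomplete trailing record
                if vlen == TOMBSTONE:
                    self.index.pop(key, None)
                    continue
                self.index[key] = (f.tell(), vlen, crc)
                f.seek(vlen, os.SEEK_CUR)

    def _append(self, key, vlen, payload, crc):
        with open(self.path, "ab") as f:
            f.write(HEADER.pack(len(key), vlen, crc) + key)
            offset = f.tell()  # where the value bytes start
            f.write(payload)
        return offset

    def __setitem__(self, key, value):
        k, v = pickle.dumps(key), pickle.dumps(value)
        crc = zlib.crc32(k + v)
        self.index[k] = (self._append(k, len(v), v, crc), len(v), crc)

    def __getitem__(self, key):
        k = pickle.dumps(key)
        offset, vlen, crc = self.index[k]
        with open(self.path, "rb") as f:
            f.seek(offset)
            v = f.read(vlen)
        if zlib.crc32(k + v) != crc:
            raise ValueError("checksum mismatch: record is corrupted")
        # Unpickling runs arbitrary code: only open files you trust.
        return pickle.loads(v)

    def __delitem__(self, key):
        k = pickle.dumps(key)
        del self.index[k]
        self._append(k, TOMBSTONE, b"", 0)  # deletes also grow the file

    def compact(self):
        # Copy only the live records into a new file, then swap it in.
        tmp = self.path + ".compact"
        if os.path.exists(tmp):
            os.remove(tmp)
        fresh = AppendOnlyStore(tmp)
        for k in list(self.index):
            key = pickle.loads(k)
            fresh[key] = self[key]
        os.replace(tmp, self.path)
        self.index = fresh.index


path = os.path.join(tempfile.mkdtemp(), "demo.store")
store = AppendOnlyStore(path)
store[("user", 42)] = {"name": "Ada"}  # any picklable object can be a key
profile = store[("user", 42)]
for i in range(100):
    store["counter"] = i  # every overwrite appends a new record
del store[("user", 42)]

size_before = os.path.getsize(path)
store.compact()
size_after = os.path.getsize(path)
print(size_before, "->", size_after, "bytes after compaction")
```

Using the pickled key bytes as the index key keeps the sketch short, but it relies on equal keys pickling to identical bytes, which holds for simple values such as strings and tuples of numbers and is not guaranteed for every object.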

Performance

About 50-60% of the performance of shelve with gdbm (Linux), but more than 5000% of the performance of shelve with dumbdbm (Windows), and more than 20000% for deletes!
Performance is more favorable with large keys and values when compared to gdbm, but gdbm is still faster on subsequent reads/writes thanks to its caching
A dbm-mode that runs at about double the speed of regular mode, but supports only string-type keys and values
This is about 25-30% of the performance of gdbm on its own.
Note: values are returned as bytes!
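
The bytes caveat matches how Python's own dbm modules behave. The sketch below uses the standard-library dbm.dumb module as a stand-in, not SaveState's dbm-mode (whose exact call signature is not shown here), to illustrate the decode step you would need when you store strings and want strings back.

```python
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo")

with dbm.dumb.open(path, "c") as db:
    db["greeting"] = "hello"   # str keys and values are encoded on write
    raw = db["greeting"]       # the stored value comes back as bytes
    text = raw.decode("utf-8")  # decode explicitly to get a str again

print(type(raw).__name__, type(text).__name__)
```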

The source code includes a benchmark that you can run to get more accurate performance figures for your specific machine.
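
If you only want a rough shelve baseline to compare against, a timing loop along these lines is enough. This is a minimal sketch using the standard library only; it is not the benchmark shipped with the source code, and the time_shelve function and its parameters are made up for illustration. Which dbm backend shelve picks depends on your platform, so the numbers are only meaningful on the machine that produced them.

```python
import os
import shelve
import tempfile
import time


def time_shelve(n=200):
    """Time n writes, reads, and deletes against a fresh shelve file."""
    path = os.path.join(tempfile.mkdtemp(), "bench")
    timings = {}
    with shelve.open(path, "c") as db:
        start = time.perf_counter()
        for i in range(n):
            db[str(i)] = {"value": i}  # shelve keys must be strings
        timings["write"] = time.perf_counter() - start

        start = time.perf_counter()
        for i in range(n):
            _ = db[str(i)]
        timings["read"] = time.perf_counter() - start

        start = time.perf_counter()
        for i in range(n):
            del db[str(i)]
        timings["delete"] = time.perf_counter() - start
    return timings


results = time_shelve()
for operation, seconds in results.items():
    print(f"{operation}: {seconds:.4f} s")
```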