SaveState
pip install savestate
Documentation: https://mrthearman.github.io/savestate/
Source Code: https://github.com/MrThearMan/savestate/
Contributing: https://github.com/MrThearMan/savestate/blob/main/CONTRIBUTING.md
SaveState is a fast, cross-platform file storage for arbitrary Python objects. It's similar to Python's built-in shelve module, but aims to be more performant on Windows while remaining cross-platform compatible.
SaveState is inspired by semidbm2, with a more modern interface: mapping-like functions, a context manager, and support for arbitrary Python objects.
Implementation details
- Pure Python
- No requirements or dependencies
- A dict-like interface (dict union operators are not supported)
- Same, single file on Windows and Linux (unlike shelve)
- Key and value integrity can be evaluated with a checksum, which will detect data corruption on key access.
- Recovery from missing bytes at the end of the file, or small amounts of corrupted data in the middle
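- Both values AND keys put in savestate must support pickling. Note the security implications of this: unpickling data from an untrusted source can execute arbitrary code, so only open savestate files you trust!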
- This means that you can use arbitrary objects as keys if they support pickle (unlike shelve)
- All the keys of the savestate are kept in memory, which limits the savestate size (not a problem for most applications)
- NOT thread safe, and cannot be accessed by multiple processes at the same time
- File is append-only, so every write or delete makes the file grow, even when it replaces or removes existing data
- However, you can compact the savestate, usually on savestate.close(), which replaces the file with a new one containing only the current, non-deleted data. This costs a little performance, but not much
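To make the points above concrete, here is a minimal sketch of an append-only store with pickled keys, an in-memory index, a checksum verified on access, and compaction. It is not SaveState's actual implementation or file format: the `ToyStore` class, its record layout, and the `demo` helper are all assumptions made up for illustration, using only the standard library.

```python
import os
import pickle
import struct
import tempfile
import zlib

# Hypothetical record layout: key length, value length, CRC32, then key and value bytes.
HEADER = struct.Struct("<III")


class ToyStore:
    """Toy append-only key-value store (illustration only)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # pickled key -> (value offset, value length, checksum)
        open(path, "ab").close()  # create the file if it does not exist yet
        self._load()

    def _load(self):
        """Rebuild the in-memory index by replaying every record in order."""
        size = os.path.getsize(self.path)
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break  # end of file, or a truncated header
                key_len, val_len, crc = HEADER.unpack(header)
                key = f.read(key_len)
                offset = f.tell()
                if len(key) < key_len or offset + val_len > size:
                    break  # bytes missing at the end: ignore the last record
                if val_len == 0:
                    self.index.pop(key, None)  # zero-length value marks a delete
                else:
                    self.index[key] = (offset, val_len, crc)
                f.seek(offset + val_len)

    def _append(self, key, value):
        """Append one record and return its index entry."""
        crc = zlib.crc32(key + value)
        with open(self.path, "ab") as f:
            start = f.seek(0, os.SEEK_END)
            f.write(HEADER.pack(len(key), len(value), crc) + key + value)
        return start + HEADER.size + len(key), len(value), crc

    def _read_raw(self, key):
        """Read the pickled value for a pickled key, verifying the checksum."""
        offset, val_len, crc = self.index[key]  # KeyError if the key is unknown
        with open(self.path, "rb") as f:
            f.seek(offset)
            value = f.read(val_len)
        if zlib.crc32(key + value) != crc:
            raise ValueError("checksum mismatch: record is corrupted")
        return value

    def __setitem__(self, key, value):
        # Keys are matched by their pickled bytes, a simplification for this sketch.
        k = pickle.dumps(key)
        self.index[k] = self._append(k, pickle.dumps(value))

    def __getitem__(self, key):
        return pickle.loads(self._read_raw(pickle.dumps(key)))

    def __delitem__(self, key):
        k = pickle.dumps(key)
        del self.index[k]
        self._append(k, b"")  # deletes are appended too, so the file still grows

    def compact(self):
        """Rewrite only the live records into a new file and swap it in."""
        tmp_path = self.path + ".compact"
        fresh = ToyStore(tmp_path)
        for k in self.index:
            fresh.index[k] = fresh._append(k, self._read_raw(k))
        os.replace(tmp_path, self.path)
        self.index = fresh.index


def demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "toy.db")
        store = ToyStore(path)
        store[("user", 42)] = {"name": "Ada"}  # a tuple works as a key
        for i in range(100):
            store["counter"] = i  # each overwrite appends a new record
        del store[("user", 42)]
        size_before = os.path.getsize(path)
        store.compact()
        size_after = os.path.getsize(path)
        reopened = ToyStore(path)
        return reopened["counter"], len(reopened.index), size_after < size_before
```

Calling `demo()` overwrites one key 100 times, deletes another, then compacts; the reopened store keeps only the latest value, and the file is smaller than before compaction.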
Performance
- About 50-60% of the performance of shelve with gdbm (Linux), but over 5000% of the performance of shelve with dumbdbm (Windows), and over 20000% for deletes
- Performance is more favorable with large keys and values when compared to gdbm, but gdbm is still faster on subsequent reads/writes thanks to its caching
- A dbm-mode offers about double the speed of regular mode, but only supports string-type keys and values
- In dbm-mode, performance is about 25-30% of that of gdbm on its own.
- Note: in dbm-mode, values are returned as bytes!
The source code includes a benchmark that you can run to get more accurate performance figures for your specific machine.
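For a quick baseline on your own machine, the sketch below times writes, reads and deletes against the standard library's shelve. It is not the benchmark shipped with the source code; the `time_shelve` function and its parameter `n` are hypothetical names used only for this illustration, and the same loop could be pointed at any other mapping-like store for comparison.

```python
import os
import shelve
import tempfile
import time


def time_shelve(n=1000):
    """Return the seconds taken for n writes, reads and deletes with shelve."""
    timings = {}
    with tempfile.TemporaryDirectory() as folder:
        # shelve picks whichever dbm backend is available on this platform.
        with shelve.open(os.path.join(folder, "bench")) as db:
            start = time.perf_counter()
            for i in range(n):
                db[str(i)] = {"value": i}
            timings["write"] = time.perf_counter() - start

            start = time.perf_counter()
            for i in range(n):
                db[str(i)]
            timings["read"] = time.perf_counter() - start

            start = time.perf_counter()
            for i in range(n):
                del db[str(i)]
            timings["delete"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    for operation, seconds in time_shelve().items():
        print(f"{operation}: {seconds:.4f} s")
```

Results vary widely with the dbm backend and the disk, so treat the numbers as a rough local reference only.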