SaveState


pip install savestate

Documentation: https://mrthearman.github.io/savestate/

Source Code: https://github.com/MrThearMan/savestate/

Contributing: https://github.com/MrThearMan/savestate/blob/main/CONTRIBUTING.md


SaveState is a fast, cross-platform file storage for arbitrary Python objects. It is similar to Python's built-in shelve module, but aims to be more performant on Windows while remaining cross-platform compatible.

SaveState is inspired by semidbm2, with a more modern interface: mapping-like functions, a context manager, and support for arbitrary Python objects.

Implementation details

  • Pure Python
  • No requirements or dependencies
  • A dict-like interface (no unions)
  • Same, single file on Windows and Linux (unlike shelve)
  • Key and value integrity can be evaluated with a checksum, which will detect data corruption on key access.
  • Recovery from missing bytes at the end of the file, or small amounts of corrupted data in the middle
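  • Both values AND keys put in a savestate must support pickling. Note the security implications of this: unpickling can execute arbitrary code, so never open a savestate file from an untrusted source!
  • This also means that you can use arbitrary objects as keys, as long as they support pickle (unlike shelve)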
  • All the keys of the savestate are kept in memory, which limits the savestate size (not a problem for most applications)
  • NOT thread-safe or process-safe, so a savestate must not be accessed concurrently by multiple threads or processes
  • The file is append-only, so every non-read operation (write, update, delete) makes the file grow
  • However, you can compact the savestate, usually on savestate.close(), which replaces the file with a new one containing only the current, non-deleted data. Compaction costs a little performance, but not much
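
To make the append-only, checksum and compaction points above concrete, here is a minimal toy sketch built only on the standard library. It is an illustration of the general technique, not SaveState's actual file format or API: the `ToyStore` class, its record layout and the CRC32 checksum are all assumptions made for this example.

```python
import os
import pickle
import struct
import tempfile
import zlib

# Hypothetical record layout: key length, value length, CRC32 of key + value.
HEADER = struct.Struct("<III")


class ToyStore:
    """Toy append-only key-value file. NOT SaveState's real format."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # pickled key -> (value offset, value length, crc)
        open(path, "ab").close()  # create the file if it does not exist
        self._load()

    def _load(self):
        # Replay the file: later records override earlier ones.
        with open(self.path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if len(header) < HEADER.size:
                    break  # end of file (or a truncated trailing header)
                key_len, val_len, crc = HEADER.unpack(header)
                key = f.read(key_len)
                offset = f.tell()
                f.seek(val_len, os.SEEK_CUR)
                if val_len == 0:  # empty value marks a deletion
                    self.index.pop(key, None)
                else:
                    self.index[key] = (offset, val_len, crc)

    def _append(self, key, value):
        start = os.path.getsize(self.path)
        crc = zlib.crc32(key + value)
        with open(self.path, "ab") as f:
            f.write(HEADER.pack(len(key), len(value), crc) + key + value)
        return start + HEADER.size + len(key), len(value), crc

    def _read(self, key):
        offset, length, crc = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            data = f.read(length)
        if zlib.crc32(key + data) != crc:
            raise ValueError("checksum mismatch: data is corrupted")
        return data

    def __setitem__(self, key, value):
        k = pickle.dumps(key)  # keys are pickled, so tuples etc. work
        self.index[k] = self._append(k, pickle.dumps(value))

    def __getitem__(self, key):
        return pickle.loads(self._read(pickle.dumps(key)))

    def __delitem__(self, key):
        k = pickle.dumps(key)
        del self.index[k]
        self._append(k, b"")  # deletions are appended too

    def compact(self):
        # Rewrite only live records into a new file, then swap it in.
        live = [(k, self._read(k)) for k in self.index]
        original, self.path = self.path, self.path + ".compact"
        open(self.path, "wb").close()
        self.index = {k: self._append(k, v) for k, v in live}
        os.replace(self.path, original)
        self.path = original


path = os.path.join(tempfile.mkdtemp(), "toy.db")
store = ToyStore(path)
store[("user", 1)] = {"name": "Ada"}
store[("user", 1)] = {"name": "Grace"}  # old record stays in the file
store["temp"] = [1, 2, 3]
del store["temp"]

size_before = os.path.getsize(path)
store.compact()
size_after = os.path.getsize(path)
reloaded = ToyStore(path)  # rebuilds the in-memory index from the file
```

In this toy version the file keeps growing until `compact()` is called, only the pickled keys and value offsets are held in memory, and a value is verified against its checksum when it is read.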

Performance

  • About 50-60% of the performance of shelve with gdbm (Linux), but more than 5000% of the performance of shelve with dumbdbm (Windows), and more than 20000% for deletes
  • Performance compares more favorably to gdbm with large keys and values, but gdbm is still faster on subsequent reads and writes thanks to its caching
  • A dbm-mode that is about twice as fast as regular mode, but only supports string-type keys and values
  • This is about 25-30% of the performance of gdbm on its own.
  • Note: Values will be returned in bytes form!
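
The bytes-return behavior noted above is also how the standard library's dbm modules treat string values, so a short sketch with `dbm.dumb` can show what to expect. This example uses only the standard library and is not SaveState's dbm-mode; it assumes the values were stored as UTF-8 text.

```python
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example")

with dbm.dumb.open(path, "c") as db:
    db["greeting"] = "hello"  # str is encoded to bytes on write
    raw = db["greeting"]      # the value comes back as bytes

# Decode explicitly if you need a str again.
text = raw.decode("utf-8")
```

Code that reads from a dbm-style store therefore usually decodes at the boundary rather than assuming the original type is restored.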

The source code includes a benchmark that you can run to measure performance on your own machine.
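
For a rough baseline to compare against, a timing loop along the following lines can measure shelve on your machine. This is a hypothetical sketch, not the benchmark shipped with the source code: the `time_shelve` helper and the record shape are made up for illustration, and the dbm backend that shelve picks depends on your platform.

```python
import os
import shelve
import tempfile
import time


def time_shelve(n=1000):
    """Time n writes and n reads against a fresh shelve file."""
    path = os.path.join(tempfile.mkdtemp(), "bench")
    with shelve.open(path) as db:
        start = time.perf_counter()
        for i in range(n):
            db[str(i)] = {"value": i}
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        total = sum(db[str(i)]["value"] for i in range(n))
        read_seconds = time.perf_counter() - start
    return write_seconds, read_seconds, total


write_seconds, read_seconds, total = time_shelve(200)
print(f"writes: {write_seconds:.4f}s, reads: {read_seconds:.4f}s")
```

Single runs are noisy, so repeat the measurement a few times and compare medians rather than individual numbers.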