CacheIO
Handles high-performance data persistence using Apache Parquet.
This class provides methods to save and load complex data structures like image hash maps or pandas DataFrames. It optimizes I/O performance and ensures data integrity across different operations.
Attributes:
| Name | Type | Description |
|---|---|---|
| `SUFFIX` | `str` | The standard file extension for cache files (.parquet). |
| `settings` | `AppSettings` | Global configuration instance. |
| `logger` | `Logger` | Logger instance for tracking I/O operations. |
Source code in tools/cache.py
__init__(settings)
Initializes CacheIO with the provided application settings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `settings` | `AppSettings` | Application configuration for paths and logging. | required |
Source code in tools/cache.py
generate_cache_filename(source_path, cache_name, **kwargs)
classmethod
Generates a unique, versioned filename for the cache.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source_path` | `Path` | The directory path being processed. | required |
| `cache_name` | `Optional[Union[str, Path]]` | A custom name for the file. | required |
| `**kwargs` | `dict` | Key-value pairs to include in the versioning (e.g., core_size=16). | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | A stable and unique filename string. |
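
One plausible way to obtain a stable, versioned name is to hash the source path together with the keyword arguments in sorted order. The sketch below illustrates that idea only; it is not the code in tools/cache.py. The function name `sketch_cache_filename`, the 12-character SHA-256 digest, and the `<stem>_<digest>.parquet` layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional, Union

SUFFIX = ".parquet"  # mirrors the documented SUFFIX attribute


def sketch_cache_filename(
    source_path: Path,
    cache_name: Optional[Union[str, Path]] = None,
    **kwargs,
) -> str:
    """Hypothetical helper: build a deterministic, versioned cache filename."""
    # Use the custom name when given, otherwise fall back to the directory name.
    stem = Path(cache_name).stem if cache_name else (source_path.name or "cache")
    # Sort the kwargs so the same settings always produce the same digest.
    version = "&".join(f"{key}={kwargs[key]!r}" for key in sorted(kwargs))
    payload = f"{source_path.as_posix()}|{version}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return f"{stem}_{digest}{SUFFIX}"
```

Under these assumptions, calling the helper twice with the same path and settings yields the same name, while changing any setting (for example `core_size=16` to `core_size=32`) yields a different one, so stale caches are not picked up by accident.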
Source code in tools/cache.py
load(cache_file)
Loads data from a parquet cache file into a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cache_file` | `Path` | The path to the .parquet file. | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | The loaded data, or an empty DataFrame if the file is missing or corrupted. |
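
The fallback behavior described above (an empty DataFrame instead of an exception) can be sketched as follows. This is a minimal illustration, not the actual `load` implementation: `sketch_load` and the logger name are hypothetical, and catching every `Exception` is an assumption about how "corrupted" is handled.

```python
import logging
from pathlib import Path

import pandas as pd

logger = logging.getLogger("cache_sketch")  # hypothetical logger name


def sketch_load(cache_file: Path) -> pd.DataFrame:
    """Return the cached DataFrame, or an empty one if it cannot be read."""
    if not cache_file.is_file():
        logger.warning("Cache file not found: %s", cache_file)
        return pd.DataFrame()
    try:
        return pd.read_parquet(cache_file)
    except Exception as exc:  # broad on purpose: any read failure means "no cache"
        logger.warning("Could not read cache %s: %s", cache_file, exc)
        return pd.DataFrame()
```

Callers can then test `result.empty` to decide whether to rebuild the cache, rather than wrapping every call in their own `try`/`except`.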
Source code in tools/cache.py
save(data_map, cache_file)
Saves a dictionary of hashes or a pandas DataFrame to a parquet file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data_map` | `Union[Dict[Path, ndarray], DataFrame]` | Data to store. | required |
| `cache_file` | `Path` | Target path for the cache file. | required |
Raises:
| Type | Description |
|---|---|
| `TypeError` | If the data_map is not a dictionary or a DataFrame. |
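
The type check and the dictionary-to-DataFrame conversion might look roughly like the sketch below. It is an assumption-laden illustration rather than the real `save`: the helper names `sketch_to_frame` and `sketch_save`, the `path` and `hash` column names, and the choice to flatten each array into a list are all hypothetical. Writing the file requires a Parquet engine such as pyarrow or fastparquet.

```python
from pathlib import Path
from typing import Dict, Union

import numpy as np
import pandas as pd


def sketch_to_frame(
    data_map: Union[Dict[Path, np.ndarray], pd.DataFrame],
) -> pd.DataFrame:
    """Normalize the two accepted input types into a DataFrame."""
    if isinstance(data_map, pd.DataFrame):
        return data_map
    if isinstance(data_map, dict):
        # Assumed layout: one row per file, path as text, hash as a flat list.
        return pd.DataFrame(
            {
                "path": [str(p) for p in data_map],
                "hash": [np.asarray(h).ravel().tolist() for h in data_map.values()],
            }
        )
    raise TypeError(f"Expected dict or DataFrame, got {type(data_map).__name__}")


def sketch_save(data_map, cache_file: Path) -> None:
    """Validate the input, then write it as Parquet via pandas."""
    frame = sketch_to_frame(data_map)  # raises TypeError before touching disk
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    frame.to_parquet(cache_file, index=False)
```

Validating before any file-system work means an unsupported input fails fast and leaves no partial cache file behind.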
Source code in tools/cache.py