File System
File system abstractions for data synchronization.
BaseFileSystem (dataclass)
BaseFileSystem()
partition
partition(
size_bytes_limit=None,
object_count_limit=None,
raise_error_if_criteria_not_met=False,
)
Partitions the root tree folder structure into a list of nodes.
Partitioning is constrained by optional limits on total size and object count.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `size_bytes_limit` | `Optional[int]` | If specified, the size in bytes of each partition must be less than this value. | `None` |
| `object_count_limit` | `Optional[int]` | If specified, each partition must contain fewer objects than this value. | `None` |
| `raise_error_if_criteria_not_met` | `bool` | If True, raises an error if nodes cannot meet the criteria. In practice, this mainly matters for the size limit, when a single object is larger than the limit. | `False` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | Raised if `raise_error_if_criteria_not_met` is True and the criteria are not met. |
Returns:
| Type | Description |
|---|---|
| `List[Node]` | List of nodes representing the partition. |
Source code in src/aibs_informatics_aws_utils/data_sync/file_system.py
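The partitioning behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: `ToyNode` and `partition_sketch` are hypothetical names, and the top-down strategy (accept a node if it meets both limits, otherwise descend into its children) is an assumption based only on the parameter descriptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for Node, reduced to the fields used here."""

    path_part: str
    size_bytes: int = 0
    object_count: int = 0
    children: Dict[str, "ToyNode"] = field(default_factory=dict)


def partition_sketch(
    root: ToyNode,
    size_bytes_limit: Optional[int] = None,
    object_count_limit: Optional[int] = None,
    raise_error_if_criteria_not_met: bool = False,
) -> List[ToyNode]:
    """Assumed strategy: keep a node if it fits, otherwise split into children."""

    def fits(node: ToyNode) -> bool:
        # Both limits are treated as strict upper bounds ("less than", "fewer than").
        if size_bytes_limit is not None and node.size_bytes >= size_bytes_limit:
            return False
        if object_count_limit is not None and node.object_count >= object_count_limit:
            return False
        return True

    partitions: List[ToyNode] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if fits(node):
            partitions.append(node)
        elif node.children:
            # Push children in reverse so they are visited in insertion order.
            stack.extend(reversed(list(node.children.values())))
        elif raise_error_if_criteria_not_met:
            raise ValueError(f"{node.path_part!r} cannot meet the partition criteria")
        else:
            # A single object that exceeds the limit is emitted as its own partition.
            partitions.append(node)
    return partitions


# Example tree: root (300 bytes) -> a (100), b (200) -> x (150), y (50)
b = ToyNode("b", 200, 2, {"x": ToyNode("x", 150, 1), "y": ToyNode("y", 50, 1)})
root = ToyNode("", 300, 3, {"a": ToyNode("a", 100, 1), "b": b})

parts = partition_sketch(root, size_bytes_limit=160)
print([p.path_part for p in parts])  # ['a', 'x', 'y']
```

With `size_bytes_limit=120` the leaf `x` (150 bytes) cannot be split further, so the sketch either returns it as an oversized partition or raises `ValueError` when `raise_error_if_criteria_not_met=True`.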
Node (dataclass)
Node(
path_part,
parent=None,
children=dict(),
size_bytes=0,
object_count=0,
last_modified=BEGINNING_OF_TIME,
is_path_part_prefix=False,
is_path_part_suffix=False,
)
Represents an object or folder in a file system path.
Attributes:
| Name | Type | Description |
|---|---|---|
| `path_part` | `str` | The key part of the file system path (an edge) leading to this node. |
| `parent` | `Optional['Node']` | The parent node to which this node is connected, if any. Defaults to None. |
| `children` | `Dict[str, 'Node']` | Child nodes that exist under this path prefix. |
| `size_bytes` | `int` | The total size (in bytes) of all objects under this path prefix. |
| `object_count` | `int` | The number of objects under this path prefix. |
| `last_modified` | `datetime` | The most recent time at which any object under this prefix was modified. |
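To make the aggregate attributes concrete, here is a minimal sketch of a tree whose nodes accumulate `size_bytes`, `object_count`, and `last_modified` for everything beneath them. `SketchNode` and its `add_object` helper are hypothetical and not part of the library, and the value chosen for `BEGINNING_OF_TIME` is an assumption; the real constant may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

# Assumption: a sentinel "earliest" timestamp; the library's actual value may differ.
BEGINNING_OF_TIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass
class SketchNode:
    """Hypothetical simplified node; mirrors the documented attributes only."""

    path_part: str
    # Excluded from repr/compare to avoid walking the parent/child cycle.
    parent: Optional["SketchNode"] = field(default=None, repr=False, compare=False)
    children: Dict[str, "SketchNode"] = field(default_factory=dict)
    size_bytes: int = 0
    object_count: int = 0
    last_modified: datetime = BEGINNING_OF_TIME

    def _record(self, size_bytes: int, last_modified: datetime) -> None:
        # Fold one object's stats into this node's aggregates.
        self.size_bytes += size_bytes
        self.object_count += 1
        self.last_modified = max(self.last_modified, last_modified)

    def add_object(self, key: str, size_bytes: int, last_modified: datetime) -> None:
        """Insert one object and update every node along its path."""
        node = self
        node._record(size_bytes, last_modified)
        for part in key.strip("/").split("/"):
            child = node.children.get(part)
            if child is None:
                child = SketchNode(path_part=part, parent=node)
                node.children[part] = child
            node = child
            node._record(size_bytes, last_modified)


tree = SketchNode(path_part="")
tree.add_object("a/b.txt", 10, datetime(2024, 1, 1, tzinfo=timezone.utc))
tree.add_object("a/c.txt", 5, datetime(2024, 3, 1, tzinfo=timezone.utc))
tree.add_object("d.txt", 1, datetime(2023, 6, 1, tzinfo=timezone.utc))

print(tree.size_bytes, tree.object_count)  # 16 3
print(tree.children["a"].size_bytes)       # 15
```

Each folder node ends up holding the totals for its whole subtree, which is what lets a partitioning step decide whether a prefix fits within a limit without re-listing its contents.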
S3FileSystem (dataclass)
S3FileSystem(bucket, key)
Bases: BaseFileSystem
Generates a file system tree structure for an S3 path, with size and object count statistics.
Attributes:
| Name | Type | Description |
|---|---|---|
| `bucket` | `str` | The S3 bucket to describe. |
| `key` | `str` | The S3 key to describe. |
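As a rough illustration of where the size and object count statistics could come from, the sketch below summarizes S3 listing pages by the first path part under a key prefix. It does not show how `S3FileSystem` works internally: `summarize_pages` is a hypothetical helper, and the page shape (`Contents`, `Key`, `Size`) follows the response format of the S3 `list_objects_v2` call. The pages are hard-coded so the example runs without AWS access.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_pages(pages: Iterable[dict], key_prefix: str) -> Dict[str, Tuple[int, int]]:
    """Return {first path part under key_prefix: (size_bytes, object_count)}.

    Hypothetical helper; assumes every listed key starts with key_prefix.
    """
    stats = defaultdict(lambda: [0, 0])
    for page in pages:
        # An empty listing page has no "Contents" entry.
        for obj in page.get("Contents", []):
            relative = obj["Key"][len(key_prefix):].lstrip("/")
            first_part = relative.split("/", 1)[0]
            stats[first_part][0] += obj["Size"]
            stats[first_part][1] += 1
    return {part: (size, count) for part, (size, count) in stats.items()}


# With boto3 installed and credentials configured, pages could be obtained via:
#   boto3.client("s3").get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=key)
fake_pages = [
    {"Contents": [{"Key": "data/a/1.bin", "Size": 10}, {"Key": "data/a/2.bin", "Size": 20}]},
    {"Contents": [{"Key": "data/b.bin", "Size": 5}]},
    {},
]

summary = summarize_pages(fake_pages, "data/")
print(summary)  # {'a': (30, 2), 'b.bin': (5, 1)}
```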