Bug: Resource Leak in CSVHandler (Unclosed File Handles) #53
Open
Description
In csv_handler.py, the _read_streaming and _header methods call PyArrow's pa_csv.open_csv with a string path. When count_rows=False (the default), the reader is used only to retrieve the schema and is then abandoned without being exhausted or explicitly closed, so the underlying file handle can remain open. In batch-processing environments where many CSVs are analyzed sequentially within a single long-running process, this can lead to file descriptor exhaustion.
Affected Location:
src/croissant_maker/handlers/csv_handler.py
@staticmethod
def _read_streaming(file_path: Path, convert_options, count_rows: bool = False):
    try:
        reader = pa_csv.open_csv(str(file_path), convert_options=convert_options)
    except UnicodeDecodeError as exc:
        # ...
    # If count_rows is False, we return without exhausting the reader!
    if count_rows:
        # exhausted
    else:
        num_rows = None  # Leaves file open in memory