BUG: read_csv pyarrow engine dtype handling with index_col/defaultdict #65930

Open
jbrockmendel wants to merge 2 commits into pandas-dev:main from jbrockmendel:tst-pyarrow-dtypes

@jbrockmendel

Member

Splits the two dtype-handling fixes out of GH-65859 into a focused PR. With engine="pyarrow", read_csv:

  • raised AttributeError: type object 'str' has no attribute 'get' when a non-dict dtype was passed together with index_col. The per-index-column dtype handling now only runs for a dict dtype; a scalar dtype is applied to the whole frame as before.
  • ignored a defaultdict dtype's default factory for columns not explicitly listed, so those columns kept pyarrow's inferred dtype (GH-41574, "ENH: support defaultdict in read_csv dtype parameter"). Membership tests and iteration over a defaultdict do not trigger the default; only item lookup does. The defaultdict is now materialized over all columns before conversion.
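
The defaultdict behaviour behind the second fix can be reproduced with the standard library alone. The sketch below is illustrative only: `materialize_dtype` is a hypothetical helper, not the function changed in this PR, and the column names and dtype strings are made up.

```python
from collections import defaultdict


def materialize_dtype(dtype, columns):
    """Expand a defaultdict dtype into a plain dict covering every column.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    if isinstance(dtype, defaultdict):
        # Item lookup (dtype[col]) calls default_factory for missing keys;
        # `col in dtype` and iteration never do.
        return {col: dtype[col] for col in columns}
    return dtype


columns = ["a", "b", "c"]
dtype = defaultdict(lambda: "float64", {"a": "int64"})

# A membership check only sees the explicitly listed key.
listed_only = [col for col in columns if col in dtype]
print(listed_only)  # ['a']

# After materializing, every column has an explicit dtype.
full = materialize_dtype(dtype, columns)
print(full)  # {'a': 'int64', 'b': 'float64', 'c': 'float64'}
```

Building a plain dict up front means any later code that checks membership or iterates over the mapping sees the defaults too.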

Un-xfails test_dtypes_defaultdict and test_dtypes_defaultdict_invalid, and adds test_dtype_scalar_with_index_col. The column-name/dedup fixes from GH-65859 (and the tests that depend on them, including the dup-column defaultdict case) stay in that PR.
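
For the scalar-dtype case, the following is a rough sketch of the kind of guard described in the first bullet, assuming the per-index-column step looks dtypes up by column name. `split_index_dtypes` and its return shape are hypothetical and are not taken from the pandas source.

```python
def split_index_dtypes(dtype, index_col):
    """Separate per-index-column dtypes from the rest (hypothetical helper)."""
    if not isinstance(dtype, dict):
        # Scalar dtype (e.g. str or "float64"): there is nothing per-column
        # to pull out, so it is passed through to apply to the whole frame.
        return dtype, {}
    index_dtypes = {col: dtype[col] for col in index_col if col in dtype}
    rest = {col: dt for col, dt in dtype.items() if col not in index_dtypes}
    return rest, index_dtypes


# Without the isinstance guard, treating a scalar dtype like a dict fails:
try:
    str.get("a")
except AttributeError as err:
    message = str(err)
print(message)  # type object 'str' has no attribute 'get'

print(split_index_dtypes(str, ["a"]))
print(split_index_dtypes({"a": "int64", "b": "float64"}, ["a"]))
```

The second call returns the non-index dtypes and the index dtypes as two separate dicts, while the first returns the scalar dtype untouched.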

  • Tests added and passed
  • All code checks passed
  • Added a whatsnew entry

🤖 Generated with Claude Code

@jbrockmendel added the Bug, IO CSV, Arrow, and Dtype Conversions labels Jun 21, 2026
Split the dtype fixes out of pandas-dev#65859: a non-dict ``dtype`` passed with
``index_col`` no longer raises AttributeError, and a ``defaultdict``
``dtype`` now applies its default to columns not explicitly listed
(GH#41574).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>