
Commit

Updated documentation indicating default behaviour is to strip whitespace, and how to override. Enhances GH-issue-16950 pandas-dev#16950
RonaldBarnes committed Nov 22, 2022
1 parent 025fbd0 commit 2bfa90a
Showing 2 changed files with 11 additions and 4 deletions.
doc/source/user_guide/io.rst: 9 changes (6 additions & 3 deletions)
@@ -1366,8 +1366,10 @@ a different usage of the ``delimiter`` parameter:
* ``widths``: A list of field widths which can be used instead of 'colspecs'
if the intervals are contiguous.
* ``delimiter``: Characters to consider as filler characters in the fixed-width file.
-Can be used to specify the filler character of the fields
-if it is not spaces (e.g., '~').
+Default is "`` \t``" (space and tab).
+Used to specify the character(s) to strip from start and end of every field.
+To preserve whitespace, set to a character that does not exist in the data,
+i.e. "\0".

Consider a typical fixed-width data file:

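The new `delimiter` text says the parameter lists the characters stripped from both ends of every field, and that whitespace can be kept by passing a character absent from the data. A minimal sketch of both cases, using made-up data and the `"\0"` value mentioned above; the sample string and variable names are hypothetical, and the output comments assume the fixed-width reader strips fields as the text describes:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: two contiguous 3-character fields per line,
# with spaces used as padding inside the fields.
data = " a bbb\n ccdd \n"

# Default delimiter: spaces and tabs are stripped from both ends of each field.
stripped = pd.read_fwf(StringIO(data), widths=[3, 3], header=None)

# "\0" never occurs in the data, so the spaces padding each field are kept.
preserved = pd.read_fwf(StringIO(data), widths=[3, 3], header=None, delimiter="\0")

print(stripped.values.tolist())   # [['a', 'bbb'], ['cc', 'dd']]
print(preserved.values.tolist())  # [[' a ', 'bbb'], [' cc', 'dd ']]
```

Any character that never appears in the file should behave the same way; `"\0"` is simply a safe choice for ordinary text data.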
@@ -1404,8 +1406,9 @@ column widths for contiguous columns:
df = pd.read_fwf("bar.csv", widths=widths, header=None)
df
-The parser will take care of extra white spaces around the columns
+The parser will take care of extra whitespace around the columns,
so it's ok to have extra separation between the columns in the file.
+To preserve whitespace around the columns, see ``delimiter``.

By default, ``read_fwf`` will try to infer the file's ``colspecs`` by using the
first 100 rows of the file. It can do it only in cases when the columns are
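The second hunk relies on two behaviours: `widths` is shorthand for back-to-back `colspecs` intervals when the columns are contiguous, and extra padding between columns is harmless because the default `delimiter` strips it. A short sketch with a hypothetical three-column file shows both; the layout and names are illustrative only:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: three contiguous columns, 4, 6 and 5 characters wide.
# Values are left-aligned and padded with extra spaces inside their fields.
data = (
    "id  name  score\n"
    "1   alpha 9.5  \n"
    "2   beta  10.25\n"
)

# widths describes contiguous fields by their lengths...
by_widths = pd.read_fwf(StringIO(data), widths=[4, 6, 5])
# ...which is equivalent to these back-to-back half-open intervals.
by_colspecs = pd.read_fwf(StringIO(data), colspecs=[(0, 4), (4, 10), (10, 15)])

print(by_widths.equals(by_colspecs))  # True
# The padding was stripped, so names are clean and scores parse as floats.
print(by_widths["name"].tolist())   # ['alpha', 'beta']
print(by_widths["score"].tolist())  # [9.5, 10.25]
```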
pandas/io/parsers/readers.py: 6 changes (5 additions & 1 deletion)
@@ -1231,6 +1231,7 @@ def read_fwf(
*,
colspecs: Sequence[tuple[int, int]] | str | None = "infer",
widths: Sequence[int] | None = None,
+delimiter: str | None = " \t",
infer_nrows: int = 100,
**kwds,
) -> DataFrame | TextFileReader:
@@ -1251,7 +1252,7 @@ def read_fwf(
Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is
expected. A local file could be:
``file://localhost/path/to/table.csv``.
-colspecs : list of tuple (int, int) or 'infer'. optional
+colspecs : list of tuple (int, int) or 'infer', optional
A list of tuples giving the extents of the fixed-width
fields of each line as half-open intervals (i.e., [from, to[ ).
String value 'infer' can be used to instruct the parser to try
@@ -1260,6 +1261,9 @@ def read_fwf(
widths : list of int, optional
A list of field widths which can be used instead of 'colspecs' if
the intervals are contiguous.
+delimiter : str, default " \t" (space and tab), optional
+Character(s) to strip from start and end of each field. To
+preserve whitespace, must be non-default value (i.e. delimiter="\0").
infer_nrows : int, default 100
The number of rows to consider when letting the parser determine the
`colspecs`.
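The docstring describes `colspecs` as half-open intervals and `infer_nrows` as the number of rows used to infer them. The sketch below, on hypothetical data, first reads two non-adjacent fields with explicit intervals (something `widths` cannot express), then shows how a too-small `infer_nrows` can miss a wider value that only appears later in the file. The truncation follows from inferring boundaries from the first rows only; treat it as an illustration, not a guarantee for every pandas version:

```python
from io import StringIO

import pandas as pd

# Hypothetical record layout: chars 0-2 hold a code, chars 3-5 are filler
# to skip, chars 6-9 hold a number. Intervals are half-open, so (0, 3)
# covers positions 0, 1 and 2, and (6, 10) covers positions 6 through 9.
records = "ABC---0012\nXYZ---0345\n"
df = pd.read_fwf(StringIO(records), colspecs=[(0, 3), (6, 10)], header=None)
print(df.values.tolist())  # [['ABC', 12], ['XYZ', 345]]

# With colspecs="infer" (the default), column boundaries come from the first
# infer_nrows lines only. The wide value "345" appears on the third line.
late_wide = "a 1\nb 2\nc 345\n"
short_look = pd.read_fwf(StringIO(late_wide), header=None, infer_nrows=2)
full_look = pd.read_fwf(StringIO(late_wide), header=None)  # infer_nrows=100

print(short_look[1].tolist())  # [1, 2, 3]   (second field cut to 1 char)
print(full_look[1].tolist())   # [1, 2, 345]
```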
