diff --git a/doc/data/fx_prices b/doc/data/fx_prices
deleted file mode 100644
index 38cadf26909a3..0000000000000
Binary files a/doc/data/fx_prices and /dev/null differ
diff --git a/doc/data/mindex_ex.csv b/doc/data/mindex_ex.csv
deleted file mode 100644
index 935ff936cd842..0000000000000
--- a/doc/data/mindex_ex.csv
+++ /dev/null
@@ -1,16 +0,0 @@
-year,indiv,zit,xit
-1977,"A",1.2,.6
-1977,"B",1.5,.5
-1977,"C",1.7,.8
-1978,"A",.2,.06
-1978,"B",.7,.2
-1978,"C",.8,.3
-1978,"D",.9,.5
-1978,"E",1.4,.9
-1979,"C",.2,.15
-1979,"D",.14,.05
-1979,"E",.5,.15
-1979,"F",1.2,.5
-1979,"G",3.4,1.9
-1979,"H",5.4,2.7
-1979,"I",6.4,1.2
diff --git a/doc/data/test.xls b/doc/data/test.xls
deleted file mode 100644
index db0f9dec7d5e4..0000000000000
Binary files a/doc/data/test.xls and /dev/null differ
diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index be761bb97f320..705861a3aa568 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -837,14 +837,11 @@ input text data into ``datetime`` objects.
 The simplest case is to just pass in ``parse_dates=True``:
 
 .. ipython:: python
-   :suppress:
 
    f = open("foo.csv", "w")
    f.write("date,A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5")
    f.close()
 
-.. ipython:: python
-
    # Use a column as an index, and parse it as dates.
    df = pd.read_csv("foo.csv", index_col=0, parse_dates=True)
    df
@@ -862,7 +859,6 @@ order) and the new column names will be the concatenation of the component
 column names:
 
 .. ipython:: python
-   :suppress:
 
    data = (
        "KORD,19990127, 19:00:00, 18:56:00, 0.8100\n"
@@ -876,9 +872,6 @@ column names:
    with open("tmp.csv", "w") as fh:
        fh.write(data)
 
-.. ipython:: python
-
-   print(open("tmp.csv").read())
 
    df = pd.read_csv("tmp.csv", header=None, parse_dates=[[1, 2], [1, 3]])
    df
@@ -1058,19 +1051,20 @@ While US date formats tend to be MM/DD/YYYY, many international formats use
 DD/MM/YYYY instead. For convenience, a ``dayfirst`` keyword is provided:
 
 .. ipython:: python
-   :suppress:
 
    data = "date,value,cat\n1/6/2000,5,a\n2/6/2000,10,b\n3/6/2000,15,c"
+   print(data)
    with open("tmp.csv", "w") as fh:
        fh.write(data)
 
-.. ipython:: python
-
-   print(open("tmp.csv").read())
-
    pd.read_csv("tmp.csv", parse_dates=[0])
    pd.read_csv("tmp.csv", dayfirst=True, parse_dates=[0])
 
+.. ipython:: python
+   :suppress:
+
+   os.remove("tmp.csv")
+
 Writing CSVs to binary file objects
 +++++++++++++++++++++++++++++++++++
 
@@ -1133,8 +1127,9 @@ For large numbers that have been written with a thousands separator, you can
 set the ``thousands`` keyword to a string of length 1 so that integers will be
 parsed correctly:
 
+By default, numbers with a thousands separator will be parsed as strings:
+
 .. ipython:: python
-   :suppress:
 
    data = (
        "ID|level|category\n"
@@ -1146,11 +1141,6 @@ correctly:
    with open("tmp.csv", "w") as fh:
        fh.write(data)
 
-By default, numbers with a thousands separator will be parsed as strings:
-
-.. ipython:: python
-
-   print(open("tmp.csv").read())
 
    df = pd.read_csv("tmp.csv", sep="|")
    df
@@ -1160,7 +1150,6 @@ The ``thousands`` keyword allows integers to be parsed correctly:
 
 .. ipython:: python
 
-   print(open("tmp.csv").read())
 
    df = pd.read_csv("tmp.csv", sep="|", thousands=",")
    df
@@ -1239,16 +1228,13 @@ as a ``Series``:
    ``read_csv`` instead.
 
 .. ipython:: python
-   :suppress:
+   :okwarning:
 
    data = "level\nPatient1,123000\nPatient2,23000\nPatient3,1234018"
 
    with open("tmp.csv", "w") as fh:
       fh.write(data)
 
-.. ipython:: python
-   :okwarning:
 
-   print(open("tmp.csv").read())
 
    output = pd.read_csv("tmp.csv", squeeze=True)
@@ -1365,15 +1351,11 @@ The ``dialect`` keyword gives greater flexibility in specifying the file format.
 By default it uses the Excel dialect but you can specify either the dialect
 name or a :class:`python:csv.Dialect` instance.
 
-.. ipython:: python
-   :suppress:
-
-   data = "label1,label2,label3\n" 'index1,"a,c,e\n' "index2,b,d,f"
-
 Suppose you had data with unenclosed quotes:
 
 .. ipython:: python
 
+   data = "label1,label2,label3\n" 'index1,"a,c,e\n' "index2,b,d,f"
    print(data)
 
 By default, ``read_csv`` uses the Excel dialect and treats the double quote as
@@ -1449,8 +1431,9 @@ a different usage of the ``delimiter`` parameter:
   Can be used to specify the filler character of the fields
   if it is not spaces (e.g., '~').
 
+Consider a typical fixed-width data file:
+
 .. ipython:: python
-   :suppress:
 
    f = open("bar.csv", "w")
    data1 = (
@@ -1463,12 +1446,6 @@ a different usage of the ``delimiter`` parameter:
    f.write(data1)
    f.close()
 
-Consider a typical fixed-width data file:
-
-.. ipython:: python
-
-   print(open("bar.csv").read())
-
 In order to parse this file into a ``DataFrame``, we simply need to supply the
 column specifications to the ``read_fwf`` function along with the file name:
 
@@ -1523,19 +1500,15 @@ Indexes
 Files with an "implicit" index column
 +++++++++++++++++++++++++++++++++++++
 
-.. ipython:: python
-   :suppress:
-
-   f = open("foo.csv", "w")
-   f.write("A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5")
-   f.close()
-
 Consider a file with one less entry in the header than the number of data
 column:
 
 .. ipython:: python
 
-   print(open("foo.csv").read())
+   data = "A,B,C\n20090101,a,1,2\n20090102,b,3,4\n20090103,c,4,5"
+   print(data)
+   with open("foo.csv", "w") as f:
+       f.write(data)
 
 In this special case, ``read_csv`` assumes that the first column is to be used
 as the index of the ``DataFrame``:
@@ -1567,7 +1540,10 @@ Suppose you have data indexed by two columns:
 
 .. ipython:: python
 
-   print(open("data/mindex_ex.csv").read())
+   data = 'year,indiv,zit,xit\n1977,"A",1.2,.6\n1977,"B",1.5,.5'
+   print(data)
+   with open("mindex_ex.csv", mode="w") as f:
+       f.write(data)
 
 The ``index_col`` argument to ``read_csv`` can take a list of column numbers
 to turn multiple columns into a ``MultiIndex`` for the index of the
@@ -1575,9 +1551,14 @@ returned object:
 
 .. ipython:: python
 
-   df = pd.read_csv("data/mindex_ex.csv", index_col=[0, 1])
+   df = pd.read_csv("mindex_ex.csv", index_col=[0, 1])
    df
-   df.loc[1978]
+   df.loc[1977]
+
+.. ipython:: python
+   :suppress:
+
+   os.remove("mindex_ex.csv")
 
 .. _io.multi_index_columns:
 
@@ -1601,16 +1582,12 @@ rows will skip the intervening rows.
    of multi-columns indices.
 
 .. ipython:: python
-   :suppress:
 
    data = ",a,a,a,b,c,c\n,q,r,s,t,u,v\none,1,2,3,4,5,6\ntwo,7,8,9,10,11,12"
-   fh = open("mi2.csv", "w")
-   fh.write(data)
-   fh.close()
-
-.. ipython:: python
+   print(data)
+   with open("mi2.csv", "w") as fh:
+       fh.write(data)
 
-   print(open("mi2.csv").read())
    pd.read_csv("mi2.csv", header=[0, 1], index_col=0)
 
 Note: If an ``index_col`` is not specified (e.g. you don't have an index, or wrote it
@@ -1632,16 +1609,16 @@ comma-separated) files, as pandas uses the :class:`python:csv.Sniffer` class
 of the csv module. For this, you have to specify ``sep=None``.
 
 .. ipython:: python
-   :suppress:
 
    df = pd.DataFrame(np.random.randn(10, 4))
-   df.to_csv("tmp.sv", sep="|")
-   df.to_csv("tmp2.sv", sep=":")
+   df.to_csv("tmp.csv", sep="|")
+   df.to_csv("tmp2.csv", sep=":")
+   pd.read_csv("tmp2.csv", sep=None, engine="python")
 
 .. ipython:: python
+   :suppress:
 
-   print(open("tmp2.sv").read())
-   pd.read_csv("tmp2.sv", sep=None, engine="python")
+   os.remove("tmp2.csv")
 
 .. _io.multiple_files:
 
@@ -1662,8 +1639,9 @@ rather than reading the entire file into memory, such as the following:
 
 .. ipython:: python
 
-   print(open("tmp.sv").read())
-   table = pd.read_csv("tmp.sv", sep="|")
+   df = pd.DataFrame(np.random.randn(10, 4))
+   df.to_csv("tmp.csv", sep="|")
+   table = pd.read_csv("tmp.csv", sep="|")
    table
 
 
@@ -1672,7 +1650,7 @@ value will be an iterable object of type ``TextFileReader``:
 
 .. ipython:: python
 
-   with pd.read_csv("tmp.sv", sep="|", chunksize=4) as reader:
+   with pd.read_csv("tmp.csv", sep="|", chunksize=4) as reader:
        reader
        for chunk in reader:
            print(chunk)
@@ -1685,14 +1663,13 @@ Specifying ``iterator=True`` will also return the ``TextFileReader`` object:
 
 .. ipython:: python
 
-   with pd.read_csv("tmp.sv", sep="|", iterator=True) as reader:
+   with pd.read_csv("tmp.csv", sep="|", iterator=True) as reader:
        reader.get_chunk(5)
 
 .. ipython:: python
    :suppress:
 
-   os.remove("tmp.sv")
-   os.remove("tmp2.sv")
+   os.remove("tmp.csv")
 
 Specifying the parser engine
 ''''''''''''''''''''''''''''
@@ -2594,27 +2571,38 @@ Read in the content of the file from the above URL and pass it to ``read_html``
 as a string:
 
 .. ipython:: python
-   :suppress:
 
-   rel_path = os.path.join("..", "pandas", "tests", "io", "data", "html",
-                           "banklist.html")
-   file_path = os.path.abspath(rel_path)
 
+   html_str = """
+            <table>
+              <tr>
+                <th>A</th>
+                <th>B</th>
+                <th>C</th>
+              </tr>
+              <tr>
+                <td>a</td>
+                <td>b</td>
+                <td>c</td>
+              </tr>
+            </table>
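
For readers who want to try the pattern the last hunk moves toward, here is a
minimal sketch (not part of the diff) of parsing such an inlined HTML string
with ``pandas.read_html``. It assumes an HTML parser is installed (lxml, or
BeautifulSoup4 plus html5lib), and the table markup below is an approximation
of the fixture added in the hunk above, not a verbatim copy::

    from io import StringIO

    import pandas as pd

    # Approximate inline HTML fixture: the A/B/C header row and the a/b/c data
    # row are taken from the diff; the exact markup in the PR may differ.
    html_str = """
    <table>
      <tr>
        <th>A</th>
        <th>B</th>
        <th>C</th>
      </tr>
      <tr>
        <td>a</td>
        <td>b</td>
        <td>c</td>
      </tr>
    </table>
    """

    # read_html returns a list with one DataFrame per <table> element it finds;
    # the <th> cells in the first row become the column labels.
    dfs = pd.read_html(StringIO(html_str))
    print(dfs[0])

Wrapping the literal string in ``StringIO`` keeps the call compatible with
recent pandas versions, where passing a raw HTML string directly to
``read_html`` is deprecated in favor of a file-like object.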