DataFrame to_dict method should also provide orient parameter (like to_json) #7840

Closed
femtotrader opened this issue Jul 25, 2014 · 8 comments
Labels
API Design Enhancement Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode

@femtotrader

Hello,

It would be nice if the to_dict method provided the same orient parameter as to_json.
For example, outtype='split' currently gives the same result as outtype='series'.

I also noticed that df.to_dict(outtype='split1234') is understood as df.to_dict(outtype='series'), which is quite strange, whereas df.to_dict(outtype='a1234') raises ValueError: outtype a1234 not understood, which is the correct behavior.
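
The behavior above is what one would expect if outtype were matched on its first letter only. A minimal sketch of that idea follows; the function resolve_outtype and its lookup table are hypothetical illustrations, not pandas code:

```python
def resolve_outtype(outtype):
    """Hypothetical stand-in: choose an output type from the first letter only."""
    first_letter_map = {"d": "dict", "l": "list", "s": "series", "r": "records"}
    key = outtype.lower()[:1]
    if key not in first_letter_map:
        raise ValueError("outtype %s not understood" % outtype)
    return first_letter_map[key]


# Only the leading "s" is inspected, so the rest of the string is ignored.
print(resolve_outtype("split1234"))  # series

try:
    resolve_outtype("a1234")
except ValueError as exc:
    print(exc)  # outtype a1234 not understood
```

Under this assumption any string starting with "s" silently falls through to 'series', which is why a dedicated 'split' option would have to be checked before the "s" prefix.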

Kind regards

Femto

@femtotrader femtotrader changed the title DataFrame to_dict method should also provide "orient" parameter (like to_json) DataFrame to_dict method should also provide orient parameter (like to_json) Jul 25, 2014
@jreback
Contributor

jreback commented Jul 25, 2014

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html?highlight=to_dict#pandas.DataFrame.to_dict

can you show what kind of mapping you would expect here?

care to submit a pull-request for this?

@jreback jreback added this to the 0.15.0 milestone Jul 25, 2014
@jreback jreback changed the title DataFrame to_dict method should also provide orient parameter (like to_json) DataFrame to_dict method should also provide orient parameter (like to_json) Jul 25, 2014
@femtotrader
Author

I'm looking for orient='split'.
I'm not sure my code would be clean enough for a PR.
I also wonder whether to_json and to_dict should be refactored to share code.

@jreback
Contributor

jreback commented Jul 25, 2014

not sure what you are asking? isn't `outtype='series'` what you are asking for?

maybe show an example?

I suppose the API is different between to_json and to_dict because they have different purposes. What are you proposing?

@femtotrader
Author

See

In [44]: df
Out[44]:
   c1
0   1
1   2
2   4

In [45]: df.to_json(orient="split")
Out[45]: '{"columns":["c1"],"index":[0,1,2],"data":[[1],[2],[4]]}'

In [46]: df.to_dict(outtype="series")
Out[46]:
{'c1': 0    1
 1    2
 2    4
 Name: c1, dtype: int64}

I'm just proposing to use the same option name, "orient", for both methods (the parameter names could differ, but a shared name is more convenient).
I'm also proposing to add orient="split":

In [47]: json.loads(df.to_json(orient="split"))
Out[47]: {u'columns': [u'c1'], u'data': [[1], [2], [4]], u'index': [0, 1, 2]}

so it would be nice if we could have:

In [48]: df.to_dict(orient="split")
Out[48]: {u'columns': [u'c1'], u'data': [[1], [2], [4]], u'index': [0, 1, 2]}
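
To make the split layout concrete, here is a small standard-library sketch that parses the JSON string shown above and rebuilds row-wise data from it. The helper split_to_rows is a hypothetical name used only for illustration; it is not part of pandas:

```python
import json

# JSON string copied from the to_json(orient="split") output in this comment.
split_json = '{"columns":["c1"],"index":[0,1,2],"data":[[1],[2],[4]]}'
split = json.loads(split_json)


def split_to_rows(split):
    """Hypothetical helper: map each index label to a {column: value} dict."""
    return {
        idx: dict(zip(split["columns"], row))
        for idx, row in zip(split["index"], split["data"])
    }


rows = split_to_rows(split)
print(rows[2]["c1"])  # 4
```

The split form keeps column labels, index labels and values in three separate lists, so nothing is lost even when labels are not valid dictionary keys for a nested layout.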

@jreback
Contributor

jreback commented Jul 25, 2014

ic.

ok, would then take a pull-request for this:

  • deprecate outtype and rename it to orient, which is pretty straightforward: warn only if outtype is explicitly specified, and accept both until the deprecated name is removed in a future version. there are some decorators that do this in other parts of core/frame.py
  • add split option
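
As a rough illustration of the deprecation step, a keyword-renaming decorator could look like the sketch below. The name rename_kwarg and the toy function describe are hypothetical; this is not the decorator pandas ships, only a minimal version of the same idea using the standard library:

```python
import functools
import warnings


def rename_kwarg(old_name, new_name):
    """Hypothetical decorator: accept old_name as a deprecated alias of new_name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_name in kwargs:
                if new_name in kwargs:
                    raise TypeError(
                        "pass only one of %r and %r" % (old_name, new_name))
                # Warn only when the old keyword is explicitly used.
                warnings.warn(
                    "the %r keyword is deprecated, use %r instead"
                    % (old_name, new_name),
                    FutureWarning, stacklevel=2)
                kwargs[new_name] = kwargs.pop(old_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rename_kwarg("outtype", "orient")
def describe(orient="dict"):
    # Toy function standing in for to_dict; it just echoes the chosen orient.
    return orient


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = describe(outtype="split")  # old keyword still works, but warns

print(result, len(caught))  # split 1
```

Callers using the new keyword see no warning, so existing code keeps working while the old name is phased out.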

@femtotrader
Author

my problem is that I don't know how I can have my forked version and also stable version on the same computer.

@deprecate_kwarg(old_arg_name='outtype', new_arg_name='orient')
def to_dict(self, orient='dict'):
    """
    Convert DataFrame to dictionary.

    Parameters
    ----------
    orient : str {'dict', 'list', 'series', 'records', 'split'}
        Determines the type of the values of the dictionary. The
        default `dict` is a nested dictionary {column -> {index -> value}}.
        `list` returns {column -> list(values)}. `series` returns
        {column -> Series(values)}. `records` returns [{columns -> value}].
        `split` returns dict like {index -> [index], columns -> [columns], data -> [values]}.


    Returns
    -------
    result : dict like {column -> {index -> value}}
    """
    if not self.columns.is_unique:
        warnings.warn("DataFrame columns are not unique, some "
                      "columns will be omitted.", UserWarning)
    # deprecate_kwarg maps a passed outtype onto orient, so only orient
    # exists inside the function body.
    orient = orient.lower()
    if orient.startswith('d'):
        return dict((k, v.to_dict()) for k, v in compat.iteritems(self))
    elif orient.startswith('l'):
        return dict((k, v.tolist()) for k, v in compat.iteritems(self))
    elif orient.startswith('sp'):
        # 'split' must be tested before the 's' prefix used for 'series'.
        return {'index': self.index.tolist(),
                'columns': self.columns.tolist(),
                'data': self.values.tolist()}
    elif orient.startswith('s'):
        return dict((k, v) for k, v in compat.iteritems(self))
    elif orient.startswith('r'):
        return [dict((k, v) for k, v in zip(self.columns, row))
                for row in self.values]
    else:  # pragma: no cover
        raise ValueError("orient %s not understood" % orient)

@jreback
Contributor

jreback commented Jul 25, 2014

easy enough, simply clone the repo; complete instructions are here: https://github.com/pydata/pandas/wiki

tests first, then a code change.

@jreback
Contributor

jreback commented Oct 6, 2014

closed by #8486
