namedtuple row factory for sqlite3 #57508

ncoghlan · 2011-10-31T00:32:25Z

BPO	13299
Nosy	@rhettinger, @ncoghlan, @merwok, @akheron, @serhiy-storchaka, @dlenski, @YoSTEALTH, @erlend-aasland
Files	issue_13299.patch issue_13299.1.patch issue_13299.2.patch sqlite_namedtuplerow.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2011-10-31.00:32:25.001>
labels = ['type-feature']
title = 'namedtuple row factory for sqlite3'
updated_at = <Date 2021-08-05.07:15:53.393>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2021-08-05.07:15:53.393>
actor = 'intellimath'
assignee = 'ghaering'
closed = False
closed_date = None
closer = None
components = []
creation = <Date 2011-10-31.00:32:25.001>
creator = 'ncoghlan'
dependencies = []
files = ['26909', '26945', '26946', '37673']
hgrepos = []
issue_num = 13299
keywords = ['patch']
message_count = 19.0
messages = ['146670', '146738', '147474', '168617', '168748', '168761', '168762', '168763', '218657', '218668', '223704', '223725', '224702', '224707', '233839', '234424', '394912', '395065', '398970']
nosy_count = 11.0
nosy_names = ['rhettinger', 'ghaering', 'ncoghlan', 'eric.araujo', 'petri.lehtinen', 'serhiy.storchaka', 'dlenski', 'Russell.Sim', 'YoSTEALTH', 'erlendaasland', 'intellimath']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue13299'
versions = ['Python 3.5']

ncoghlan · 2011-10-31T00:32:25Z

Currently, sqlite3 allows rows to be easily returned as ordinary tuples (default) or sqlite3.Row objects (which allow dict-style access).

collections.namedtuple provides a much nicer interface than sqlite3.Row for accessing ordered data which uses valid Python identifiers for field names, and can also tolerate field names which are *not* valid identifiers.

It would be convenient if sqlite3 provided a row factory along the lines of the one posted here:
http://peter-hoffmann.com/2010/python-sqlite-namedtuple-factory.html

(except with smarter caching on the named tuples)

rhettinger · 2011-10-31T19:44:37Z

+1

merwok · 2011-11-12T11:20:06Z

collections.namedtuple provides a much nicer interface than sqlite3.Row

Definitely!

RussellSim · 2012-08-20T03:20:31Z

Hi,

Here is an implementation using lru_cache to prevent regeneration of the named tuple each time.

Cheers,
Russell

rhettinger · 2012-08-21T07:20:32Z

Caching based on the cursor going to be problematic because a single cursor can be used multiple times with different descriptions:

   c = conn.cursor()
   c.execute('select symbol from stocks')
   print c.description
   c.execute('select price from stocks')
   print c.description     # same cursor, different layout, needs a new named tuple

It might make more sense to cache the namedtuple() factory itself:

   sql_namedtuple = lru_cache(maxsize=20)(namedtuple)

Also, the example in the docs is too lengthy and indirect. Cut-out the step for creating an populating the database -- just assume db created in the example at the top of the page:

For example::

    >>> conn.row_factory = sqlite3.NamedTupleRow
    >>> c = conn.cursor()
    >>> for record in c.execute('select * from stocks'):
            print record

    Row(date='2006-01-05', trans='BUY', symbol='RHAT', qty=100.0, price=35.14)
    Row(date='2006-01-05', trans='BUY', symbol='RHAT', qty=100, price=35.14)
    Row(date='2006-03-28', trans='BUY', symbol='IBM', qty=1000, price-45.0)

No need to go into a further lesson on how to use named tuples.

Also, the patch uses star-unpacking: _namedtuple_row(cursor)(*row)

Instead, it should use _make: _namedtuple_row(cursor)._make(row)

(u'2006-04-05', u'BUY', u'MSFT', 1000, 72.0)

RussellSim · 2012-08-21T12:16:24Z

Raymond, Thanks for the comprehensive feedback! It's fantastic! I have updated the patch with most of you feedback... but there was one part that I couldn't follow entirely. I am now using the _make method but I have had to use star unpacking to allow the method to be cached, lru_cache won't allow a key to be a list because they aren't hash-able.

ncoghlan · 2012-08-21T12:35:55Z

You should be able to just use "tuple(col[0] for col in cursor.description)" instead of the current list comprehension in order to make the argument hashable.

RussellSim · 2012-08-21T12:49:56Z

Nick, Thanks for the tip. I have removed the star unpacking.

glangford · 2014-05-16T12:32:42Z

In abstract, I like the namedtuple interface for sqlite3 as well. One caution is that the approach suggested at

http://peter-hoffmann.com/2010/python-sqlite-namedtuple-factory.html

can have a dramatic impact on performance. For one DB-intensive application, I experienced 20+ seconds run time with the row factory (under 3.4), versus sub second without (identified with cProfile). Many thousands of calls to namedtuple_factory were not good. :)

glangford · 2014-05-16T15:19:15Z

...if I understand the proposed caching scheme, then repeated executions of the query

SELECT a,b,c FROM table

would result in cache hits, since the column names remain the same. I'm guessing this would resolve the performance problem in the app I saw, but it would be good to verify that performance is broadly similar with/without named tuples.

BreamoreBoy · 2014-07-22T22:11:39Z

I'd like to see this in 3.5 as I often use sqlite so what needs doing here?

serhiy-storchaka · 2014-07-23T05:51:22Z

There is significant overhead. Microbenchmark results:

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:')"  "con.execute('select 1 as a, 2 as b').fetchall()"
10000 loops, best of 3: 35.8 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.Row"  "con.execute('select 1 as a, 2 as b').fetchall()"
10000 loops, best of 3: 37.3 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow"  "con.execute('select 1 as a, 2 as b').fetchall()"
10000 loops, best of 3: 92.1 usec per loop

It would be easier to add __getattr__ to sqlite3.Row.

dlenski · 2014-08-04T09:28:05Z

Serhiy,
52 usec/loop doesn't seem like much overhead. This is not 52 usec per row fetched, but just 52 usec per cursor.execute(). An example where >1 row is fetched for each cursor would show this more clearly.

The advantage of namedtuple is that it's a very well-known interface to most Python programmers. Other db-api modules have taken a similar approach; psycopg2 has a dict-like cursor similar to Row, but has added NameTupleCursor in recent versions. (http://initd.org/psycopg/docs/extras.html#namedtuple-cursor)

serhiy-storchaka · 2014-08-04T10:37:41Z

Yes, above microbenchmarks measure the time of execute() + the time of fetching one row. Here is more precise microbenchmarks.

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
1000 loops, best of 3: 624 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.Row; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
1000 loops, best of 3: 915 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
100 loops, best of 3: 6.21 msec per loop

Here sqlite3.Row is about 1.5 times slower than tuple, but sqlite3.NamedTupleRow is about 7 times slower than sqlite3.Row.

With C implementation of lru_cache() (bpo-14373) the result is much better:

100 loops, best of 3: 3.16 msec per loop

And it will be even more faster (up to 1.7x) when add to the Cursor class a method which returns a tuple of field names.

serhiy-storchaka · 2015-01-11T08:05:10Z

Here is faster implementation.

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
100 loops, best of 3: 2.74 msec per loop

But it is still 3 times slower than sqlite3.Row.

YoSTEALTH · 2015-01-21T04:43:50Z

note: sqlite_namedtuplerow.patch _cache method conflicts with attached database with say common table.column name like "id"

Using namedtuple method over sqlite3.Row was a terrible idea for me. I thought namedtuple is like tuple so should be faster then dict! wrong. I wasted 2 days change my work to namedtuple and back to sqlite3.Row, the speed difference on my working project was:

namedtuple 0.035s/result
sqlite3.Rows 0.0019s/result

for(speed test) range: 10000
namedtuple 17.3s
sqlite3.Rows 0.4s

My solution was to use sqlite3.Row (for speed) but to get named like usage by convert dict keys() with setattr names:

class dict2named(dict):
    def __init__(self, *args, **kwargs):
        super(dict2named, self).__init__(*args, **kwargs)
        self.__dict__ = self

Usage:

for i in con.execute('SELECT * FROM table'):
    yield dict2named(i)

Now i can use:

print(i.title)

and handy dict methods for dash column names:

print(i['my-title'])
print(i.get('my-title', 'boo'))

Now working project speed:
sqlite3.Rows 0.0020s/result

for(speed test) range: 10000
sqlite3.Rows 0.8s with dict2named converting

This i can work with, tiny compromise in speed with better usage.

erlend-aasland · 2021-06-02T14:18:26Z

See also bpo-39170

rhettinger · 2021-06-04T04:33:57Z

FWIW, namedtuple speed improved considerably since these posts were made. When I last checked, their lookup speed was about the same as a dict lookup.

See: https://docs.python.org/3/whatsnew/3.9.html#optimizations

intellimath · 2021-08-05T07:15:53Z

Instead of using cache, maybe better to use mutable default argument?

For example:

def make_row_factory(cls_factory, **kw):
    def row_factory(cursor, row, cls=[None]):
        rf = cls[0]
        if rf is None:
            fields = [col[0] for col in cursor.description]
            cls[0] = cls_factory("Row", fields, **kw)
            return cls[0](*row)
        return rf(*row)
    return row_factory

namedtuple_row_factory = make_row_factory(namedtuple)

Seem it should add less overhead.

erlend-aasland · 2022-07-17T21:03:40Z

FTR, here's a benchmark using @serhiy-storchaka's approach w/LRU cache:

Lib/sqlite3/dbapi2.py patch

diff --git a/Lib/sqlite3/dbapi2.py b/Lib/sqlite3/dbapi2.py
index 3b6d2f7ba2..216895c53a 100644
--- a/Lib/sqlite3/dbapi2.py
+++ b/Lib/sqlite3/dbapi2.py
@@ -89,6 +89,19 @@ def convert_timestamp(val):
 
 del(register_adapters_and_converters)
 
+
+import functools
+@functools.lru_cache(maxsize=128)
+def _make_class(fields):
+    return collections.namedtuple("Row", fields)
+
+def NamedTupleRow(cursor, row):
+    """Return a namedtuple row factory for connection objects."""
+    fields = tuple([c[0] for c in cursor.description])
+    cls = _make_class(fields)
+    return cls._make(row)
+
+
 def __getattr__(name):
     if name in _deprecated_names:
         from warnings import warn

bench-row.py

import pyperf

from textwrap import dedent


SETUP_DEF=dedent("""
    import sqlite3
    cx = sqlite3.connect(":memory:")
""")
SETUP_ROW=dedent("""
    import sqlite3
    cx = sqlite3.connect(":memory:")
    cx.row_factory = sqlite3.Row
""")
SETUP_NAMED=dedent("""
    import sqlite3
    cx = sqlite3.connect(":memory:")
    cx.row_factory = sqlite3.NamedTupleRow
""")
STMT=dedent("""
    cx.execute("select 1 as 'a', 2 as 'b'").fetchall()
""")

runner = pyperf.Runner()
runner.timeit(name="default", stmt=STMT, setup=SETUP_DEF)
runner.timeit(name="row", stmt=STMT, setup=SETUP_ROW)
runner.timeit(name="named", stmt=STMT, setup=SETUP_NAMED)

$ python ../cpython-stuff/bench-row.py
.....................
default: Mean +- std dev: 2.48 us +- 0.02 us
.....................
row: Mean +- std dev: 2.60 us +- 0.02 us
.....................
named: Mean +- std dev: 3.71 us +- 0.03 us

Following up Serhiy's suggestion to add a cursor.fieldnames() method shows a small improvement:

$ python ../cpython-stuff/bench-row.py
.....................
default: Mean +- std dev: 2.48 us +- 0.02 us
.....................
row: Mean +- std dev: 2.60 us +- 0.02 us
.....................
named: Mean +- std dev: 3.26 us +- 0.05 us

merwok · 2022-07-17T22:46:45Z

These years, I would prefer a dataclass conversion rather than a named tuple.

erlend-aasland · 2022-07-18T18:52:42Z

These years, I would prefer a dataclass conversion rather than a named tuple.

I'm curious why?

IMO, named tuples are a better fit. A dataclass factory ~~would be considerably slower than using named tuples~~¹, and it does not offer subscripting nor slicing out of the box.

benchmark

$ python ../cpython-stuff/bench-row.py    
.....................
default: Mean +- std dev: 1.35 us +- 0.02 us
.....................
row: Mean +- std dev: 1.44 us +- 0.03 us
.....................
namedtuple: Mean +- std dev: 2.00 us +- 0.04 us
.....................
dataclass: Mean +- std dev: 1.92 us +- 0.08 us

I re-ran the benchmark with an optimised build, and now it performed exactly like the namedtuple factory. ↩

merwok · 2022-07-18T22:52:14Z

I don’t like using named tuples after all! I prefer having attribute access only, it’s more explicit than tuple positions.
Nowadays I see named tuples as a great thing to add attributes to some data structure that used to be a tuple only, but as a tool to define new lightweight data classes I prefer the dataclasses module.

erlend-aasland · 2022-07-19T22:48:48Z

More importantly; is it worth it to pursue this issue further? Is the performance acceptable?

merwok · 2022-07-19T23:17:28Z

I think that named tuples still make sense for a row factory. But I am the wrong person to evaluate performance numbers.

erlend-aasland · 2022-07-20T18:37:38Z

An alternative approach is simply to add attribute access to sqlite3.Row.

erlend-aasland · 2022-11-15T11:58:20Z

FYI, I've added a namedtuple row factory as an example in the proposed "row factory how-to" in gh-99507. Perhaps it is suffictient to keep this as an example/recipe in the docs instead of adding it as a new feature.

erlend-aasland · 2022-11-25T13:48:17Z

The sqlite3 docs now include a "row factory how-to" that includes a recipe for a simple namedtuple row factory (though without the class creation cache optimisation). I'm not sure it is worth it to add the namedtuple row factory as a feature, thought; some users want a named tuple, some want a dataclass, some want something else.

In the light of the docs update, I now suggest to close this feature request.

ncoghlan added the type-feature A feature request or enhancement label Oct 31, 2011

ghaering mannequin self-assigned this Jan 11, 2015

ezio-melotti transferred this issue from another repository Apr 10, 2022

erlend-aasland added the topic-sqlite3 label May 16, 2022

erlend-aasland mentioned this issue Jul 17, 2022

Sqlite3 row_factory for attribute access: NamedRow #83351

Closed

erlend-aasland mentioned this issue Aug 19, 2022

gh-96121: Merge sqlite3.Row examples #96122

Merged

erlend-aasland added the pending The issue will be closed if no feedback is provided label Nov 25, 2022

erlend-aasland closed this as completed Nov 27, 2022

erlend-aasland removed the pending The issue will be closed if no feedback is provided label Nov 27, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

namedtuple row factory for sqlite3 #57508

namedtuple row factory for sqlite3 #57508

ncoghlan commented Oct 31, 2011

ncoghlan commented Oct 31, 2011

rhettinger commented Oct 31, 2011

merwok commented Nov 12, 2011 •

edited

Loading

RussellSim mannequin commented Aug 20, 2012

rhettinger commented Aug 21, 2012

RussellSim mannequin commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

RussellSim mannequin commented Aug 21, 2012

glangford mannequin commented May 16, 2014

glangford mannequin commented May 16, 2014

BreamoreBoy mannequin commented Jul 22, 2014

serhiy-storchaka commented Jul 23, 2014

dlenski mannequin commented Aug 4, 2014

serhiy-storchaka commented Aug 4, 2014

serhiy-storchaka commented Jan 11, 2015

YoSTEALTH mannequin commented Jan 21, 2015

erlend-aasland commented Jun 2, 2021

rhettinger commented Jun 4, 2021

intellimath mannequin commented Aug 5, 2021

erlend-aasland commented Jul 17, 2022 •

edited

Loading

merwok commented Jul 17, 2022

erlend-aasland commented Jul 18, 2022 •

edited

Loading

merwok commented Jul 18, 2022 •

edited

Loading

erlend-aasland commented Jul 19, 2022

merwok commented Jul 19, 2022

erlend-aasland commented Jul 20, 2022

erlend-aasland commented Nov 15, 2022

erlend-aasland commented Nov 25, 2022

namedtuple row factory for sqlite3 #57508

namedtuple row factory for sqlite3 #57508

Comments

ncoghlan commented Oct 31, 2011

ncoghlan commented Oct 31, 2011

rhettinger commented Oct 31, 2011

merwok commented Nov 12, 2011 • edited Loading

RussellSim mannequin commented Aug 20, 2012

rhettinger commented Aug 21, 2012

RussellSim mannequin commented Aug 21, 2012

ncoghlan commented Aug 21, 2012

RussellSim mannequin commented Aug 21, 2012

glangford mannequin commented May 16, 2014

glangford mannequin commented May 16, 2014

BreamoreBoy mannequin commented Jul 22, 2014

serhiy-storchaka commented Jul 23, 2014

dlenski mannequin commented Aug 4, 2014

serhiy-storchaka commented Aug 4, 2014

serhiy-storchaka commented Jan 11, 2015

YoSTEALTH mannequin commented Jan 21, 2015

erlend-aasland commented Jun 2, 2021

rhettinger commented Jun 4, 2021

intellimath mannequin commented Aug 5, 2021

erlend-aasland commented Jul 17, 2022 • edited Loading

merwok commented Jul 17, 2022

erlend-aasland commented Jul 18, 2022 • edited Loading

Footnotes

merwok commented Jul 18, 2022 • edited Loading

erlend-aasland commented Jul 19, 2022

merwok commented Jul 19, 2022

erlend-aasland commented Jul 20, 2022

erlend-aasland commented Nov 15, 2022

erlend-aasland commented Nov 25, 2022

merwok commented Nov 12, 2011 •

edited

Loading

erlend-aasland commented Jul 17, 2022 •

edited

Loading

erlend-aasland commented Jul 18, 2022 •

edited

Loading

merwok commented Jul 18, 2022 •

edited

Loading