
Make p-adic numbers unhashable #11895

Closed
mmasdeu opened this issue Oct 4, 2011 · 62 comments

Comments

@mmasdeu

mmasdeu commented Oct 4, 2011

Only fixed modulus p-adic numbers should be hashable; all others cannot implement a reasonable hash function. Currently some of them do, which can cause horrible bugs:

sage: @cached_function
....: def is_one(x):
....:     return x==1

sage: R = Zp(3, 70)
sage: is_one(R(1,1))
True
sage: is_one(R(2^64))
True
sage: R(2^64)
1 + 2*3 + 2*3^2 + 3^3 + 3^4 + 2*3^5 + 3^7 + 2*3^8 + 3^10 + 2*3^11 + 2*3^13 + 3^14 + 2*3^16 + 3^18 + 3^19 + 2*3^20 + 3^21 + 3^23 + 2*3^25 + 3^26 + 2*3^27 + 2*3^28 + 3^29 + 2*3^30 + 2*3^31 + 2*3^34 + 2*3^35 + 2*3^36 + 3^37 + 3^38 + 3^39 + 3^40 + O(3^70)

The problem here is that == has been changed so that two numbers are equal if they are equal to the least common precision. In this example, the two elements are equal to precision one and they have the same hash value (at least on my machine).

The proposal of this ticket is to make p-adics unhashable (but cacheable through #16316). The modifications required for this to work are almost always related to functions which implemented manual caching through dictionaries. The long list of dependencies is mostly about rewriting constructor functions to go through UniqueFactory or CachedRepresentation which gets unhashable elements right as of #16317. (I believe that these changes are valuable refactorings anyway.)

I can imagine that some people will not like such a change. What are the alternatives?

  • Keep it the way it is. (I do not think that this is a good idea. If you do intensive p-adic computations then the birthday paradox will eventually hit you.)
  • Mark p-adics in a special way so their == is not used by cached_method and friends. (Not an option, because p-adics may be wrapped inside other objects which will still use their operator ==.)
  • Change == for p-adics to say whether two numbers are "really" equal. (Bad idea. Say you are in a p-adic field. Some algorithm wants to know whether x is invertible and checks x==0. This will now be false for most p-adic zeros.)
  • Make __hash__ return a constant. (This would work but would give horrible performance.)
  • …?
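The underlying contract violation can be sketched in plain Python (ToyPadic is a hypothetical toy, not Sage's implementation): Python requires that x == y implies hash(x) == hash(y), and comparing only up to the lowest common precision makes that impossible for any hash computed from the stored digits.

```python
# Toy model (not Sage code) of "equal to the lowest common precision".
class ToyPadic:
    """A number known modulo p**prec; hypothetical illustration only."""
    def __init__(self, value, prec, p=3):
        self.p, self.prec = p, prec
        self.value = value % p**prec

    def __eq__(self, other):
        # Compare only up to the lowest common precision, as Sage's == does.
        prec = min(self.prec, other.prec)
        return self.value % self.p**prec == other.value % self.p**prec

    def __hash__(self):
        # Hashing the stored digits breaks the dict contract:
        # x == y no longer implies hash(x) == hash(y).
        return hash((self.p, self.value))

a = ToyPadic(1, prec=1)
b = ToyPadic(2**64, prec=70)
assert a == b                # equal to the common precision 1
assert hash(a) != hash(b)    # ...but the hashes disagree
```

Whether a dict lookup of b finds the entry stored under a then depends on whether the two hashes happen to land in the same bucket, which is exactly the non-deterministic behaviour described above.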

Depends on #11670
Depends on #15897
Depends on #15898
Depends on #15956
Depends on #16122
Depends on #16124
Depends on #16129
Depends on #16250
Depends on #16251
Depends on #16316
Depends on #16317
Depends on #16318
Depends on #16321
Depends on #16339
Depends on #16341
Depends on #16342

CC: @roed314 @simon-king-jena

Component: padics

Keywords: hash, days71

Stopgaps: todo

Work Issues: one pickle from the pickle jar does not unpickle correctly

Branch/Commit: u/saraedum/ticket/11895 @ 42afa4d

Issue created by migration from https://trac.sagemath.org/ticket/11895

@mmasdeu mmasdeu added this to the sage-5.11 milestone Oct 4, 2011
@roed314
Contributor

roed314 commented Nov 8, 2011

comment:1

Apply 11895.patch

@roed314
Contributor

roed314 commented Nov 9, 2011

Attachment: 11895.patch.gz

Creates a hash function for extension elements

@loefflerd
Mannequin

loefflerd mannequin commented Mar 10, 2012

comment:3

Can you clean up the formatting of the docstrings a bit? There's a general convention that long lines in docstrings (except doctests!) should be wrapped at 80 columns and trailing whitespace removed. Other than that this looks fine to me.

@loefflerd
Mannequin

loefflerd mannequin commented Mar 26, 2012

Work Issues: whitespace, line wrapping in docstrings

@loefflerd
Mannequin

loefflerd mannequin commented Mar 26, 2012

Author: David Roe

@loefflerd
Mannequin

loefflerd mannequin commented Mar 26, 2012

Reviewer: David Loeffler

@loefflerd loefflerd mannequin added s: needs work and removed s: needs review labels Mar 26, 2012
@mmasdeu
Author

mmasdeu commented Sep 13, 2012

comment:5

When trying the input:

sage: R.<x> = PolynomialRing(QQ)
sage: Cp.<g> = Qp(7,100).extension(x^2-17)
sage: ECp = EllipticCurve('14a1').change_ring(Cp)
sage: xx = (6*g + 6) + 5*7 + (g + 3)*7^2 + (4*g + 1)*7^3 + (g + 5)*7^4 + (3*g + 3)*7^5 + 3*7^6 + (5*g + 2)*7^7 + (5*g + 6)*7^8 + 2*7^9 + 7^10 + 7^11 + 6*g*7^12 + (3*g + 2)*7^13 + (3*g + 2)*7^14 + 3*g*7^15 + 5*7^16 + (6*g + 3)*7^17 + 2*g*7^18 + (g + 6)*7^19 + 6*7^20 + (g + 6)*7^21 + 4*7^22 + (6*g + 6)*7^23 + (2*g + 4)*7^24 + (2*g + 6)*7^25 + 5*g*7^26 + (4*g + 1)*7^27 + (3*g + 1)*7^28 + (6*g + 4)*7^29 + (5*g + 5)*7^30 + (3*g + 6)*7^31 + 7^32 + (2*g + 3)*7^33 + (3*g + 5)*7^34 + (3*g + 2)*7^35 + (2*g + 1)*7^36 + g*7^37 + 6*g*7^38 + (5*g + 1)*7^39 + (2*g + 1)*7^40 + (3*g + 6)*7^41 + (4*g + 1)*7^42 + (3*g + 5)*7^43 + (5*g + 4)*7^44 + (4*g + 1)*7^45 + 3*7^46 + (2*g + 3)*7^47 + (5*g + 6)*7^48 + (5*g + 2)*7^49 + (g + 1)*7^50 + 5*g*7^51 + 3*7^52 + 2*7^53 + (5*g + 4)*7^54 + (3*g + 2)*7^55 + (4*g + 1)*7^56 + (6*g + 2)*7^57 + (6*g + 3)*7^58 + (3*g + 4)*7^60 + (4*g + 1)*7^61 + (2*g + 6)*7^63 + (3*g + 2)*7^64 + (4*g + 3)*7^65 + (g + 1)*7^66 + (2*g + 4)*7^68 + (2*g + 4)*7^69 + (5*g + 5)*7^70 + (5*g + 5)*7^71 + (3*g + 1)*7^72 + (3*g + 6)*7^73 + (2*g + 6)*7^74 + (4*g + 1)*7^75 + 5*7^76 + 2*g*7^77 + 3*g*7^78 + (5*g + 2)*7^79 + (5*g + 3)*7^80 + (2*g + 6)*7^81 + (3*g + 4)*7^82 + (5*g + 5)*7^83 + (2*g + 2)*7^84 + (g + 2)*7^85 + 6*7^86 + (2*g + 6)*7^87 + (2*g + 6)*7^88 + 3*g*7^89 + (5*g + 2)*7^90 + 3*g*7^91 + (2*g + 6)*7^92 + 7^93 + (2*g + 6)*7^94 + (4*g + 4)*7^95 + (4*g + 3)*7^96 + 3*7^97 + (g + 5)*7^98 + (5*g + 5)*7^99 + O(7^100)
sage: yy = (5*g + 2) + (3*g + 1)*7 + (6*g + 1)*7^2 + 2*g*7^3 + 7^4 + (3*g + 6)*7^5 + 6*g*7^6 + (3*g + 4)*7^7 + (4*g + 4)*7^8 + 2*g*7^9 + (g + 1)*7^10 + (4*g + 2)*7^11 + (g + 4)*7^12 + (5*g + 2)*7^13 + (5*g + 3)*7^14 + (3*g + 6)*7^15 + (g + 1)*7^16 + (4*g + 3)*7^17 + (4*g + 2)*7^19 + (4*g + 5)*7^20 + (6*g + 4)*7^21 + (6*g + 1)*7^22 + (3*g + 1)*7^23 + 7^24 + (4*g + 2)*7^25 + (g + 6)*7^26 + (3*g + 3)*7^27 + 5*7^28 + (6*g + 6)*7^30 + (4*g + 4)*7^31 + (g + 4)*7^32 + (2*g + 2)*7^33 + (3*g + 6)*7^34 + (5*g + 4)*7^35 + (3*g + 5)*7^36 + (2*g + 6)*7^37 + (4*g + 5)*7^38 + 4*g*7^40 + (g + 5)*7^41 + (5*g + 2)*7^42 + (3*g + 1)*7^43 + 5*g*7^44 + (4*g + 6)*7^45 + (2*g + 2)*7^46 + (2*g + 3)*7^47 + (3*g + 5)*7^48 + 5*7^49 + (4*g + 2)*7^50 + (g + 6)*7^51 + (4*g + 6)*7^52 + (4*g + 1)*7^53 + 2*g*7^54 + (3*g + 1)*7^56 + (5*g + 6)*7^58 + (2*g + 2)*7^59 + 2*7^60 + 3*7^61 + (6*g + 6)*7^62 + (6*g + 4)*7^63 + (4*g + 3)*7^64 + (6*g + 3)*7^65 + (2*g + 6)*7^66 + (g + 2)*7^67 + 4*7^68 + (3*g + 2)*7^69 + (2*g + 3)*7^70 + (2*g + 2)*7^71 + (5*g + 5)*7^72 + (6*g + 3)*7^73 + 3*g*7^74 + 4*7^75 + (g + 5)*7^76 + (g + 3)*7^77 + (5*g + 4)*7^78 + 5*7^79 + 2*g*7^80 + 3*7^81 + 6*g*7^82 + (4*g + 2)*7^83 + (3*g + 5)*7^84 + (2*g + 4)*7^85 + 2*g*7^86 + (g + 2)*7^87 + (3*g + 4)*7^88 + (g + 4)*7^89 + 6*g*7^90 + 2*g*7^92 + (2*g + 1)*7^93 + (5*g + 3)*7^94 + (4*g + 3)*7^95 + (5*g + 3)*7^96 + (3*g + 1)*7^97 + (4*g + 4)*7^98 + 2*7^99 + O(7^100)
sage: P = ECp([xx,yy])
sage: print 2*P

I get the following error:

/data/sage-5.3.rc1/local/lib/python2.7/site-packages/sage/rings/padics/padic_ZZ_pX_CR_element.so in sage.rings.padics.padic_ZZ_pX_CR_element.pAdicZZpXCRElement.__hash__ (sage/rings/padics/padic_ZZ_pX_CR_element.cpp:16654)()

OverflowError: Python int too large to convert to C long

@nbruin
Contributor

nbruin commented Sep 13, 2012

comment:6

Three remarks:

  • It looks like the hash contains at most about 3 decimal digits of information. That's way too small! Hash table lookups are based on having O(1) elements in the bins. To give you an idea of how small the constant in the O(1) should be: Python's dictionaries keep the average bin size at 0.7 (smaller than 1). You're fighting the birthday paradox here, so don't underestimate the likelihood of collisions. Anyway, with more than about 1000 elements in a dictionary, the proposed hash will let lookups degrade to a linear search, but with far larger memory overhead.
  • It may seem that having hashes depend on precision (which isn't taken into account in equality testing!) is a mild condition, but it's not. If you have a large matrix or other aggregate element, you can easily end up with some small-precision elements in there. This will lead to very confusing situations where a dict lookup doesn't find your key even though you know you have a key that's equal to a stored one. Of course, if you're relying on equality in p-adic computations you've already lost ...
  • Caching functions on equality of arguments is a very bad idea with the p-adics. If I construct a known invertible matrix to too low a precision, I'll get determinant 0. If I subsequently construct the same matrix to higher precision, a cached determinant function would recognize the argument as equal to one it has seen before (because equality holds only to the lowest common precision) and would just give me the 0 back.

Properties 2 and 3 conspire to hide the problem a bit:

sage: @cached_function
....: def DISC(f):
....:         return discriminant(f)
....: 
sage: K=Qp(2,500)
sage: P.<x>=K[]
sage: f=(1+O(2^200))*x^2-(2^200+O(2^200))
sage: DISC(f)
0
sage: g=(1+O(2^300))*x^2-(2^200+O(2^300))
sage: DISC(g)
2^202 + O(2^302)
sage: 
sage: hash(f), hash(g)
(15360174650385708, 15377766836429868)
sage: f == g
True

so the entry under f isn't found when looking up g: they end up stored in different bins, so the difference in hash is enough to distinguish them. As soon as the dictionary gets resized in a way that lets those two hashes end up in the same bin, it'll be a toss-up which answer you get back from DISC(f) or DISC(g). Of course, by then it will be very painful to track down what happened.

To show that functions that depend on more of their arguments than just equality should not be cached:

sage: a=3
sage: b=GF(5)(3)
sage: @cached_function
....: def test(a):
....:         return a^2
....: 
sage: [test(a),test(b)]
[9, 9]
sage: test.clear_cache()
sage: [test(b),test(a)]
[4, 4]
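The same failure mode can be reproduced in plain Python, where 1 == 1.0 plays the role of 3 == GF(5)(3). A hedged sketch (cached_naive and cached_by_type are hypothetical helpers, not Sage's API): a cache keyed on == alone confuses equal elements of different types, while also keying on the type (for Sage elements: the parent) keeps them apart.

```python
def cached_naive(f):
    # Cache keyed on the argument itself, i.e. on == and hash alone.
    cache = {}
    def wrapper(x):
        if x not in cache:
            cache[x] = f(x)
        return cache[x]
    return wrapper

def cached_by_type(f):
    # Hypothetical fix: also key on type(x) (in Sage: the parent).
    cache = {}
    def wrapper(x):
        key = (type(x), x)
        if key not in cache:
            cache[key] = f(x)
        return cache[key]
    return wrapper

@cached_naive
def double_naive(x):
    return x + x

@cached_by_type
def double_typed(x):
    return x + x

double_naive(1)                          # caches the int 2 under the key 1
assert type(double_naive(1.0)) is int    # 1.0 == 1, so the stale int comes back
double_typed(1)
assert type(double_typed(1.0)) is float  # separate entry, correct float 2.0
```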

@roed314
Contributor

roed314 commented Oct 15, 2012

comment:7

Nils, I agree that there are some horrible compromises to be had on this issue. I would like to make p-adic elements unhashable, but there are some nasty consequences of doing so. We can't add points on elliptic curves over p-adic fields. We can't print matrices. Out of curiosity I tried changing the hash methods on elements of Zp and Qp to raise a TypeError. Hundreds of doctest failures. There's an assumption throughout Sage that elements are hashable.

I don't know what the answer is. If we can come to a consensus on this I'm happy to fix the whitespace issues. :-)

@saraedum
Member

comment:8

I had a look into this. Except for one place (which could certainly be worked around somehow), the doctests seem to fail because caching doesn't work anymore. Would there be something wrong with adding a _cache_key() to SageObject which by default returns self? Only if the element is not hashable (but should be cacheable) would it return something that is actually a unique key for this element; something like self.dumps(), maybe.

I gave it a try (see attached patch). The only doctests that fail seem to be a few places where self.dumps() doesn't work and one problem in sage/modular/overconvergent/weightspace.py.
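The proposed fallback can be sketched in plain Python (a guess at the mechanism, not Sage's actual cached_function or the attached patch; ToyUnhashable is a hypothetical stand-in for an unhashable p-adic): use the argument itself as the key when it is hashable, otherwise ask it for a hashable surrogate via _cache_key().

```python
def cached_function(f):
    # Sketch only: fall back to _cache_key() for unhashable arguments.
    cache = {}
    def wrapper(x):
        try:
            key = x
            hash(key)
        except TypeError:
            key = x._cache_key()
        if key not in cache:
            cache[key] = f(x)
        return cache[key]
    return wrapper

class ToyUnhashable:
    # Stands in for an unhashable p-adic: == is fuzzy, so we refuse to
    # hash, but we expose a unique key that includes the precision.
    __hash__ = None
    def __init__(self, digits, prec):
        self.digits, self.prec = digits, prec
    def _cache_key(self):
        return ("toy-padic", self.digits, self.prec)

calls = []

@cached_function
def expensive(x):
    calls.append(x)
    return x

u = ToyUnhashable((1, 0, 2), prec=3)
assert expensive(u) is expensive(u)  # second call is a cache hit
assert len(calls) == 1               # f ran only once
```

Because the key carries the precision, two elements that merely agree to the lowest common precision would get distinct cache entries, avoiding the DISC(f)/DISC(g) confusion above.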

@saraedum
Member

experimental patch to disable caching for padics

@jdemeyer

comment:9

Attachment: cache.patch.gz

@jdemeyer jdemeyer modified the milestones: sage-5.11, sage-5.12 Aug 13, 2013
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.1, sage-6.2 Jan 30, 2014
@saraedum
Member

saraedum commented Mar 6, 2014

Dependencies: #15897

@saraedum
Member

saraedum commented Mar 6, 2014

Branch: u/saraedum/ticket/11895

@saraedum
Member

saraedum commented Mar 6, 2014

Changed dependencies from #15897 to #15897, #15898

@saraedum
Member

saraedum commented Mar 6, 2014

Changed branch from u/saraedum/ticket/11895 to none

@saraedum
Member

saraedum commented Mar 7, 2014

Branch: u/saraedum/ticket/11895

@saraedum
Member

Changed reviewer from David Loeffler to none

@saraedum

This comment has been minimized.

@saraedum
Member

Changed author from David Roe to none

@saraedum
Member

Changed work issues from whitespace, line wrapping in docstrings to one pickle from the pickle jar does not unpickle correctly

@saraedum saraedum changed the title padic_ZZ_pX_CR_element is unhashable Make p-adic numbers unhashable May 13, 2014
@saraedum
Member

comment:41

Added Simon since we talked about this in Bad Boll a while ago.

@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.3, sage-6.4 Aug 10, 2014
@vbraun
Member

vbraun commented Sep 25, 2014

comment:43

The ticket description mentions a problem when comparing x==0, but that is incorrect. Comparison first coerces both sides to a common parent, and if the rhs is a Python int then this will be the same padic ring as x. So if == were to compare both sides including the precision we'd still be fine. Of course when comparing two padics you coerce to the lower precision, so the main issue remains.

Just making elements not hashable is also terrible, many algorithms use sets and dicts. E.g. if the coefficient field is not hashable then you can't compute groebner bases in polynomial rings, rendering it essentially useless.

@saraedum
Member

comment:44

Replying to @vbraun:

The ticket description mentions a problem when comparing x==0, but that is incorrect. Comparison first coerces both sides to a common parent, and if the rhs is a Python int then this will be the same padic ring as x. So if == were to compare both sides including the precision we'd still be fine. Of course when comparing two padics you coerce to the lower precision, so the main issue remains.

I'm not sure I understand. Say you are in Qp with 2 digits of precision. Then 0 coerces to O(p^2). Setting x=O(p) gives x != 0 when including precisions in the comparison.

Just making elements not hashable is also terrible, many algorithms use sets and dicts. E.g. if the coefficient field is not hashable then you can't compute groebner bases in polynomial rings, rendering it essentially useless.

I agree that disabling hashing has its drawbacks but I do not see an alternative. Judging from the doctests, not that many things break, actually. But I guess that some algorithms have to be modified to support unhashable elements if we want to use them with p-adic numbers.

@vbraun
Member

vbraun commented Sep 25, 2014

comment:45

Replying to @saraedum:

I'm not sure I understand. Say you are in Qp with 2 digits of precision. Then 0 coerces to O(p^2). Setting x=O(p) gives x != 0 when including precisions in the comparison.

Ok, I hadn't thought about that. But IMHO that's more of an issue with having two separate and competing ways of specifying precision, one in the parent and one in the element:

sage: O(3).parent()(3)
3 + O(3^21)

I agree that disabling hashing has its drawbacks but I do not see an alternative.

I don't have a good answer right now either, but it once again shows the importance of this issue. We need to fix our failure to adhere to Python's convention about comparison and hashes.

@sagetrac-jakobkroeker
Mannequin

sagetrac-jakobkroeker mannequin commented Aug 25, 2015

Stopgaps: todo

@saraedum
Member

comment:47

Replying to @vbraun:

But IMHO thats more of an issue with having two separate and competing ways of specifying precision, one in the parent and one in the element

Is it really about competing ways? I guess this happens because it is possible to specify precision in the element at all.

Replying to @saraedum:

I agree that disabling hashing has its drawbacks but I do not see an alternative.

I don't have a good answer right now either, but it once again shows the importance of this issue.
We need to fix our failure to adhere to Python's convention about comparison and hashes.

The problem is that sometimes you want a==b to mean "a is essentially the same as b", say in an algorithm that does while(x != 0), and sometimes you want it to mean "a is indistinguishable from b". The latter is maybe what you want in dictionary lookups. But not always: for example, if you have a dict which maps powers of x, the variable in a polynomial ring, to whatever, then you probably do not care about the precision of the 1 coefficient.
So what I'm trying to say is that precision is tricky. Sure, many algorithms might just throw an exception if we disable hashing. But would they really work correctly otherwise? Whenever we make a dictionary lookup with elements that carry precision, it seems that we need to decide on a case-by-case basis which version of == would produce the right result. Throwing an exception and requiring people to explicitly specify what they want to happen seems to be the better solution IMO.
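The two meanings could be spelled out as two separate predicates rather than one overloaded ==. A hypothetical plain-Python sketch (neither function exists in Sage under these names; elements are modelled as (value, prec) pairs):

```python
# Two explicit notions of equality for a toy p-adic (value, prec) pair.
def eq_common_prec(a, b, p=3):
    """Coarse equality: agree modulo p**min(prec) -- what == does now."""
    prec = min(a[1], b[1])
    return a[0] % p**prec == b[0] % p**prec

def indistinguishable(a, b, p=3):
    """Exact equality: same digits *and* same precision -- what a safe
    dictionary lookup would need."""
    return a[1] == b[1] and a[0] % p**a[1] == b[0] % p**b[1]

x = (1, 1)    # 1 + O(3)
y = (1, 70)   # 1 + O(3**70)
assert eq_common_prec(x, y)           # equal in the coarse, algorithmic sense
assert not indistinguishable(x, y)    # but not safe to identify in a cache
```

Which predicate is "right" depends on the call site, which is exactly why a case-by-case decision (and an exception when none was made) seems preferable.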

@saraedum
Member

Changed keywords from hash to hash, days71

@saraedum
Member

comment:48

We are abandoning this in favor of #20246.
