Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite write locks aren't respected in WSL #2395

Closed
tdsmith opened this issue Aug 4, 2017 · 38 comments
Closed

sqlite write locks aren't respected in WSL #2395

tdsmith opened this issue Aug 4, 2017 · 38 comments

Comments

@tdsmith
Copy link

tdsmith commented Aug 4, 2017

  • Your Windows build number: (Type ver at a Windows Command Prompt)

10.0.15063

  • What you're doing and what's happening: (Copy&paste specific commands and their output, or include screen shots)

sqlite's write locking seems to be broken under WSL both inside and outside the /mnt hierarchy; attempting to write to the same database from distinct processes using normal sqlite locking throws I/O errors from sqlite and leads to database corruption.

  • What's wrong / what should be happening instead:

Only one process should get an exclusive database lock at a time.

Sorry I don't have a minimal reproducer or a deep understanding of how sqlite's locking works. This may or may not have the same root cause as #1927 and #1712.

@sunilmut
Copy link
Member

sunilmut commented Aug 4, 2017

@tdsmith - Thanks for your post. Can you share any repro? Or at least some source code reference? Without that it's difficult to say what's going on.

@tdsmith
Copy link
Author

tdsmith commented Aug 4, 2017

Here's a Python script that tries to open connections on a sqlite database from separate processes:

#!/usr/bin/env python3

from itertools import product
import multiprocessing
import sqlite3

DBNAME = "test.sqlite"

def init_db():
    conn = sqlite3.connect(DBNAME)
    conn.execute("DROP TABLE IF EXISTS myvalues")
    conn.execute("CREATE TABLE myvalues(i, j)")
    conn.commit()
    conn.close()

def do_work(i):
    conn = sqlite3.connect(DBNAME)
    my_id = multiprocessing.current_process()._identity[0]
    values = product([my_id], range(i, i+1000))
    conn.executemany("INSERT INTO myvalues VALUES(?, ?)", values) 
    conn.commit()
    conn.close()

def main():
    init_db()
    p = multiprocessing.Pool()
    p.map(do_work, range(0, 50000, 1000))
    p.close()
    p.join()

if __name__ == "__main__":
    main()

On Linux, it succeeds.

tim@neptune:~$ sqlite3 test.sqlite 'select count(*) from myvalues;'
50000

On WSL, the script fails, with the exception:

Traceback (most recent call last):
  File "/home/tim/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/tim/.pyenv/versions/3.6.1/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "concurrent_sqlite_test.py", line 21, in do_work
    conn.commit()
sqlite3.OperationalError: disk I/O error

Running the sqlite3 CLI over the file afterwards gives unusual results (this number varies):

tim@tds:~/foo$ sqlite3 test.sqlite 'select count(*) from myvalues;'
4047

If you disable sqlite's journaling by running conn.execute("PRAGMA journal=OFF"), the file on disk ends up being corrupted.

tim@tds:~/foo$ sqlite3 test.sqlite 'select count(*) from myvalues;'
Error: disk I/O error

@tdsmith
Copy link
Author

tdsmith commented Aug 4, 2017

Weirdly, my home directory is now in a state where I can't even create the database anymore in the parent process -- there aren't any visible lock or journal files, so I don't know what state is interfering:

tim@tds:~$ rm -f test.sqlite; python concurrent_sqlite_test.py
Traceback (most recent call last):
  File "concurrent_sqlite_test.py", line 30, in <module>
    main()
  File "concurrent_sqlite_test.py", line 23, in main
    init_db()
  File "concurrent_sqlite_test.py", line 11, in init_db
    conn.execute("DROP TABLE IF EXISTS myvalues")
sqlite3.OperationalError: disk I/O error

But I can mkdir foo; cd foo; python ../concurrent_sqlite_test.py and it resumes failing in the usual way (i.e. in commit() in a child process).

@redbaron
Copy link

redbaron commented Aug 7, 2017

I can confirm it still happens on 16251 build on both LxFs and DrvFS

@tdsmith
Copy link
Author

tdsmith commented Aug 9, 2017

Looking at sqlite's pager.c, it looks like sqlite is setting POSIX advisory byte range locks with fcntl(..., F_SETLK, ...), which indeed makes this a duplicate of #1927.

@tdsmith
Copy link
Author

tdsmith commented Aug 9, 2017

A workaround should be to pass an alternative "VFS" name to sqlite3_open_v2().

These are defined in pager.c and include unix-dotfile and unix-flock. There is a brief description in the docs.

The python3 sqlite module does not seem to support setting an alternative VFS but the apsw module does.

@ThomasMader
Copy link

NixOS seems to suffer from the same problem which can be worked around by disabling WAL mode. NixOS/nix#1203

@tdsmith
Copy link
Author

tdsmith commented Oct 27, 2017

This issue should affect both WAL and regular journals (I was testing with regular journals), though it's possible disabling WAL makes it harder to trigger.

@bbigras
Copy link

bbigras commented May 4, 2018

Any progress on this?

@steveroot
Copy link

I think I have stumbled upon this issue and have a small ruby script that creates a sqlite database, imports logs data and fails like this:

sroot@sroot-msb:/mnt/c/Users/steve/Desktop/motion_sensor_logs$ ruby analyse_importtosqlite.rb                                                           
Anaylse-- import using Ruby                                                                                                                             
motion_counter_20171130090000.log                                                                                                                       
/home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:108:in `step': disk I/O error (SQLite3::IOException) 
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:108:in `block in each'                  
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:107:in `loop'                           
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/statement.rb:107:in `each'                           
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:156:in `to_a'                            
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:156:in `block in execute'                
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:95:in `prepare'                          
        from /home/sroot/.rbenv/versions/2.4.4/lib/ruby/gems/2.4.0/gems/sqlite3-1.3.13/lib/sqlite3/database.rb:137:in `execute'                         
  • The error appears at different times when run (ie, it will fail at different lines of the log so I ruled out the data format causing the error).
  • I updated sqlite from source on my laptop, this didn't solve it.
  • I ran the script on an ubuntu server and it ran perfectly.

My googling suggests the problem is associated with how WSL gets to talk with the sqlite file which lead me here, but I'm rapidly moving out of my depth.

If there's interest, I can provide the ruby script and some sample logs so that others can recreate error. I'll watch this issue for replies

@spacekitteh
Copy link

Any update?

@ludgerheide
Copy link

ludgerheide commented Mar 7, 2019

This issue is driving me crazy right now as well, causing all sorts of random errors.
Windows Version: Microsoft Windows [Version 10.0.17763.253]
Simple steps to reproduce:

  1. Install sqlite3 command line tool
  2. Open two WSL windows
  3. Acess the database in both windows
  4. Issue two BEGIN EXCLUSIVE TRASACTION statements - they both succeed 😮
  5. Do contradictory stuff
  6. Issue END TRANSACTION statements - get a disk I/O error in one window.

What should happen:

  • One of the windows should error out with database is locked

Screenshot

@rgemulla
Copy link

rgemulla commented Mar 7, 2019

I also feel that this bug is a major pain. I've had destroyed subversion repositories (problems with sqllite) and e-mail databases (problem with multiple writers on Xapian indexes) due to this.

@XerTheSquirrel
Copy link

This also affects Fossil (http://fossil-scm.org/) in that it can easily corrupt repository databases. You can reproduce by doing the following:

Running commands in two difference terminal processes (A$ and B$):

A$ export VISUAL=nano
A$ fossil init boop.fossil
A$ mkdir boop
A$ cd boop
A$ fossil open ../boop.fossil
A$ touch test
A$ fossil uv add test
A$ fossil uv edit test

B$ export VISUAL=nano
B$ cd boop
B$ fossil uv edit test
# This command should fail, but instead it works.

Normally B should fail with a message like Database error: database is locked: {INSERT INTO rcvfrom(uid, mtime, nonce, ipaddr)VALUES(1, julianday('now'), NULL, '#!fossil unversioned edit')}, however on WSL it continues normally and allow you to edit the file.

@earonesty
Copy link

earonesty commented Mar 28, 2019

This issue is blocked by: #1927 ... lxfs doesn't support fcntl locks of any kind, which many programs rely on.

@lambdaheart
Copy link

I've found a workaround to pause for a 0.1 seconds before writing to the database (at least when I'm writing multiple entries at once). This seems to allow SQLite to "recover" and properly handle IO or what ever is happening.

@CyrusNajmabadi
Copy link

CyrusNajmabadi commented Nov 1, 2019

Running into this as well. This affects visual studio which stores information it needs to process in sqlite. If it operates on a project on the shared FS between wsl2 and Windows, then this completely fails (in this case, it just hangs trying to even acquire the proper fs locks).

Something isn't working properly with this file system which breaks the entire scenario of having shared projects that can be edited by either the windows or linux side of things.

@JEM-Mosig
Copy link

It also breaks rasa x

@lcerman
Copy link

lcerman commented Dec 21, 2019

Hit this issue too while trying the parallel optimization with Optuna (https://optuna.readthedocs.io/en/latest/tutorial/distributed.html), it uses sqlite to share optimization progress between the multiple processes, which is broken under WSL...

BTW, could this be resolved by upgrading to WSL2? Currently I am using WSL1 with Ubuntu 18.04.

@earonesty
Copy link

earonesty commented Dec 23, 2019 via email

@aaronsteers
Copy link

Any users able to confirm this is unblocked in WSL2? If yes, any other blockers preventing SQLite functioning properly?

@HenkPoley
Copy link

HenkPoley commented Jun 30, 2020

Any users able to confirm this is unblocked in WSL2? If yes, any other blockers preventing SQLite functioning properly?

A while ago it was still broken across the OS boundary (e.g. /mnt/c/ or Plan 9 Filesystem Protocol). #2395 (comment)

@willgozlan
Copy link

The problem seems to be fixed in WSL2. When in WSL1, I was having database lock issues, but after upgrading to WSL 2 the problems all went away, and it worked as expected. My guess is something to do with the lower level kernel code, of Linux vs Windows kernel.

@RahulDey12
Copy link

RahulDey12 commented Jul 29, 2020

@willgozlan it is not

image

@DanielGoldfarb
Copy link

DanielGoldfarb commented Jul 30, 2020

@RahulDey12
It appears to me that you are not running sqlite in WSL2, but rather in Windows PowerShell (where it is apparently also broken). What happens if you start an Ubuntu Linux terminal window and run sqlite there?

@RahulDey12
Copy link

RahulDey12 commented Jul 30, 2020

@RahulDey12
It appears to me that you are not running sqlite in WSL2, but rather in Windows PowerShell (where it is apparently also broken). What happens if you start an Ubuntu Linux terminal window and run sqlite there?

I use sqlite client called TablePlus on windows So I wanna use it from windows side. In Wsl side it is working fine.

@therealkenc
Copy link
Collaborator

That screencap above on 9p is new landing zone #5762. Otherwise this was effectively /dupe #1927 from the get-go.

@ghost
Copy link

ghost commented Aug 19, 2020

Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.

Thanks for your report!

@Radivarig
Copy link

Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.

Thanks for your report!

So what is the original issue number....

@steveroot
Copy link

Hi! We've identified this issue as a duplicate of another one that already exists in this repository. This specific instance is being closed in favor of tracking the concern over on the referenced thread.
Thanks for your report!

So what is the original issue number....

Bad bot! Let's feed it carrots to encourage it to learn to cite duplicates :-)
I think it means #1927 is an earlier report of the same underlying problem. That report is also closed as fixed in WSL2 but I'm no longer in a position to test

@suravarapusuryakiran
Copy link

It also breaks rasa x

i'm trying to run rasa x on wsl2 and it's breaking.! were you able to resolved it .?

@suravarapusuryakiran
Copy link

It also breaks rasa x

on wsl2 it worked for me

@nonrevlb
Copy link

nonrevlb commented Aug 30, 2021

Still having problems with subversion on WSL2:
`
svn: E200030: sqlite[S10]: disk I/O error
svn: E200042: Additional errors:
svn: E200030: sqlite[S10]: disk I/O error

$ uname -a
Linux 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
`

@HenkPoley
Copy link

Under WSL2, anything under /mnt/*/ is like a remote filesystem, that goes over the Plan 9 Filesystem Protocol (9P).

There sqlite can't lock it's database files. Everywhere else, outside the shared filesystem, sqlite will work.

Under WSL1 it's a bit different, but other breakages means sqlite database files will not work anywhere.

@lycandroid
Copy link

A workaround I use is to install windows tortoisesvn including the command line utilities, which makes it include the windows version of svn.exe. Then in linux set svn to be aliased to svn.exe via .bashrc:

alias svn="/c/Program\ Files/TortoiseSVN/bin/svn.exe"

Bonus is that, it being a windows exe, filesystem performance is better for stuff under /mnt/*.

@zhouwg
Copy link

zhouwg commented Jun 3, 2022

The problem seems to be fixed in WSL2. When in WSL1, I was having database lock issues, but after upgrading to WSL 2 the problems all went away, and it worked as expected. My guess is something to do with the lower level kernel code, of Linux vs Windows kernel.

thanks.
having a same issue with WSL1, and the problems went away after upgrading to WSL2 by

wsl --set-version Ubuntu-18.04 2

wsl -l -v

  NAME            STATE           VERSION
* Ubuntu-18.04    Running         2

~$ cat test.py

#!/usr/bin/env python3

from itertools import product
import multiprocessing
import sqlite3

DBNAME = "test.sqlite"

def init_db():
    conn = sqlite3.connect(DBNAME)
    conn.execute("DROP TABLE IF EXISTS myvalues")
    conn.execute("CREATE TABLE myvalues(i, j)")
    conn.commit()
    conn.close()

def do_work(i):
    conn = sqlite3.connect(DBNAME)
    my_id = multiprocessing.current_process()._identity[0]
    values = product([my_id], range(i, i+1000))
    conn.executemany("INSERT INTO myvalues VALUES(?, ?)", values)
    conn.commit()
    conn.close()

def main():
    init_db()
    p = multiprocessing.Pool()
    p.map(do_work, range(0, 50000, 1000))
    p.close()
    p.join()

if __name__ == "__main__":
    main()

~$ ./test.py
~$ ls
test.py test.sqlite

nobitagamer added a commit to nobitagamer/dotfiles that referenced this issue Oct 9, 2024
* zdharma/fast-syntax-highlighting MUST load before zsh-users/zsh-autosuggestions or auto-complate will NOT work
+ Add `duf` for better `df`
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests