Weird problem when getting files ~25MB #469

Closed
gkyildirim opened this issue Mar 27, 2016 · 11 comments

Comments

@gkyildirim

This is my first attempt to work with LeoFS. Please let me know if this is a well-known issue.

I am running leofs-1.2.20 on a single Ubuntu 14.04 server. The configuration is all default. I created a new bucket and made it public-read-write.

I am observing a weird problem. I put a 25MB file (with DragonDisk). I can get it successfully on my first attempt, but after that I cannot get the same file anymore. For example, curl exits with "curl: (18) transfer closed with 26246026 bytes remaining to read". The same is true for DragonDisk.

I ran a short test and observed the same issue with files between 3MB and 40MB. Files below 3MB and above 40MB have no problem.

@yosukehara
Member

Thank you for your report. I quickly tried to confirm this situation on my laptop, but I could not reproduce it, as shown below. To investigate this problem, it would be helpful if you could share the error logs of the storage node(s) and the gateway node.

Case-1: Put a 25MB object via s3cmd

$ ./leofs-adm status
 [System Configuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.2.20
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 2
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 3923d007
                previous ring-hash | 3923d007
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@127.0.0.1      | running      | 3923d007       | 3923d007       | 2016-03-27 22:07:10 +0900
  S    | storage_1@127.0.0.1      | running      | 3923d007       | 3923d007       | 2016-03-27 22:07:10 +0900
  S    | storage_2@127.0.0.1      | running      | 3923d007       | 3923d007       | 2016-03-27 22:07:09 +0900
  S    | storage_3@127.0.0.1      | running      | 3923d007       | 3923d007       | 2016-03-27 22:07:10 +0900
  G    | gateway_0@127.0.0.1      | running      | 3923d007       | 3923d007       | 2016-03-27 22:07:19 +0900
-------+--------------------------+--------------+----------------+----------------+----------------------------

$ dd if=/dev/zero of=25M.file bs=25600 count=1024
$ s3cmd mb s3://test/
$ s3cmd put ./25M.file s3://test/

$ leofs-adm whereis test/25M.file
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  # of chunks   |     clock      |             when
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
       | storage_0@127.0.0.1      | 784626dc30fc8147c4fa7edad8d8109      |     25600K |   b7bbfe5965 |              2 | 52f0782b26577  | 2016-03-27 22:09:47 +0900
       | storage_1@127.0.0.1      | 784626dc30fc8147c4fa7edad8d8109      |     25600K |   b7bbfe5965 |              2 | 52f0782b26577  | 2016-03-27 22:09:47 +0900

$ leofs-adm update-acl test 05236 public-read-write

$ curl -v -X GET http://test.localhost:8080/25M.file > 25M.file.1
...
$ curl -v -X GET http://test.localhost:8080/25M.file > 25M.file.7
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1...
* Connected to test.localhost (127.0.0.1) port 8080 (#0)
> GET /25M.file HTTP/1.1
> Host: test.localhost:8080
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Sun, 27 Mar 2016 13:17:35 GMT
< Content-Length: 26214400
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "b7bbfe5965698d52826c529d34425a1d"
< Last-Modified: Sun, 27 Mar 2016 13:09:47 GMT
<

$ ls -la | grep 25M.file
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:09 25M.file
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.1
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.2
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.3
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.4
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.5
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.6
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.7
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:17 25M.file.8

Case-2: Put a 25MB object via DragonDisk

$ leofs-adm whereis test/25M-2.file
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  # of chunks   |     clock      |             when
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
       | storage_1@127.0.0.1      | 9fd3c2a0917d698ed50d2edba32ab2e      |     25600K |   bed3c0a4a1 |              5 | 52f07b2451c85  | 2016-03-27 22:23:05 +0900
       | storage_2@127.0.0.1      | 9fd3c2a0917d698ed50d2edba32ab2e      |     25600K |   bed3c0a4a1 |              5 | 52f07b2451c85  | 2016-03-27 22:23:05 +0900


$ curl -v -X GET http://test.localhost:8080/25M-2.file > 25M-2.file.5
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 127.0.0.1...
* Connected to test.localhost (127.0.0.1) port 8080 (#0)
> GET /25M-2.file HTTP/1.1
> Host: test.localhost:8080
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Sun, 27 Mar 2016 13:24:41 GMT
< Content-Length: 26214400
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "bed3c0a4a1407f584989b4009e9ce33f"
< Last-Modified: Sun, 27 Mar 2016 13:23:05 GMT
<
{ [16384 bytes data]
100 25.0M  100 25.0M    0     0  89.6M      0 --:--:-- --:--:-- --:--:-- 89.9M

$ ls -l | grep 25M-2.file
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:22 25M-2.file
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:24 25M-2.file.1
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:24 25M-2.file.2
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:24 25M-2.file.3
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:24 25M-2.file.4
-rw-r--r--   1 yosukehara  staff  26214400  3 27 22:24 25M-2.file.5

[screenshot: 2016-03-27 22:27]

@gkyildirim
Author

Thanks for your prompt response!

I can easily reproduce my case. Please take a look at the attached log files as well as the case below.
My setup is pretty simple. I just fired up a new Ubuntu server, downloaded leofs_1.2.20-1_ubuntu-14.04_amd64.deb, changed the gateway port to 80, and created a user & bucket with leofs-adm. That's all.
I see that my setup differs from yours in that

  • it is all on a single node
  • the bucket was created via leofs-adm

On the other hand, I am happy to provide SSH access if needed.

leofs-adm status
 [System Configuration]
-----------------------------------+----------
 Item                              | Value
-----------------------------------+----------
 Basic/Consistency level
-----------------------------------+----------
                    system version | 1.2.20
                        cluster Id | leofs_1
                             DC Id | dc_1
                    Total replicas | 1
          number of successes of R | 1
          number of successes of W | 1
          number of successes of D | 1
 number of rack-awareness replicas | 0
                         ring size | 2^128
-----------------------------------+----------
 Multi DC replication settings
-----------------------------------+----------
        max number of joinable DCs | 2
           number of replicas a DC | 1
-----------------------------------+----------
 Manager RING hash
-----------------------------------+----------
                 current ring-hash | 433fe365
                previous ring-hash | 433fe365
-----------------------------------+----------

 [State of Node(s)]
-------+--------------------------+--------------+----------------+----------------+----------------------------
 type  |           node           |    state     |  current ring  |   prev ring    |          updated at
-------+--------------------------+--------------+----------------+----------------+----------------------------
  S    | storage_0@127.0.0.1      | running      | 433fe365       | 433fe365       | 2016-03-28 07:51:58 +0300
  G    | gateway_0@127.0.0.1      | running      | 433fe365       | 433fe365       | 2016-03-28 07:52:09 +0300
-------+--------------------------+--------------+----------------+----------------+----------------------------


leofs-adm whereis bhps/25M.file
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
 del?  |           node           |             ring address             |    size    |   checksum   |  # of chunks   |     clock      |             when
-------+--------------------------+--------------------------------------+------------+--------------+----------------+----------------+----------------------------
       | storage_0@127.0.0.1      | f8a2a844b117d190fa01d8aecaef3875     |     25600K |   b7bbfe5965 |              2 | 52f14e08568a5  | 2016-03-28 08:06:36 +0300

s3cmd put ./25M.file s3://bhps
WARNING: Module python-magic is not available. Guessing MIME types based on file extensions.
./25M.file -> s3://bhps/25M.file  [part 1 of 2, 15MB]
 15728640 of 15728640   100% in    0s    15.95 MB/s  done
./25M.file -> s3://bhps/25M.file  [part 2 of 2, 10MB]
 10485760 of 10485760   100% in    0s    18.66 MB/s  done
curl -v -X GET http://localhost/bhps/25M.file > 25M.file.1
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* connect to ::1 port 80 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /bhps/25M.file HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Mon, 28 Mar 2016 05:08:24 GMT
< Content-Length: 26214400
* Server LeoFS is not blacklisted
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "b7bbfe5965698d52826c529d34425a1d"
< Last-Modified: Mon, 28 Mar 2016 05:06:36 GMT
<

curl -v -X GET http://localhost/bhps/25M.file > 25M.file.2
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* connect to ::1 port 80 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 80 (#0)
> GET /bhps/25M.file HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Mon, 28 Mar 2016 05:08:51 GMT
< Content-Length: 26214400
* Server LeoFS is not blacklisted
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "b7bbfe5965698d52826c529d34425a1d"
< Last-Modified: Mon, 28 Mar 2016 05:06:36 GMT
< x-from-cache: True/via disk
<
  0 25.0M    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0{ [data not shown]
* transfer closed with 26214400 bytes remaining to read
  0 25.0M    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
* Closing connection 0
curl: (18) transfer closed with 26214400 bytes remaining to read



ls -la
total 25608
drwxrwxr-x 2 usishi usishi     4096 Mar 28 08:08 .
drwxrwxr-x 3 usishi usishi     4096 Mar 28 08:06 ..
-rw-rw-r-- 1 usishi usishi 26214400 Mar 28 08:08 25M.file.1
-rw-rw-r-- 1 usishi usishi        0 Mar 28 08:08 25M.file.2

Here are the log files.
https://www.dropbox.com/sh/1zi7cp2kofgdj11/AAAtFU3kot2GnJNqZ4HUOybEa?dl=0

@windkit
Contributor

windkit commented Mar 28, 2016

@gkyildirim Thank you for reporting the issue. I can now reproduce it on my Ubuntu 14.04 machine, and I will track it down now.

$ curl -v -X GET http://localhost:8080/test/25M.file1 > 25M.file.2
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /test/25M.file1 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Mon, 28 Mar 2016 05:52:21 GMT
< Content-Length: 26214400
* Server LeoFS is not blacklisted
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "f4e4750fbe4ad8dbfb0e7f4d6a83ef56"
< Last-Modified: Mon, 28 Mar 2016 05:52:17 GMT
<
{ [data not shown]
100 25.0M  100 25.0M    0     0   253M      0 --:--:-- --:--:-- --:--:--  255M
* Connection #0 to host localhost left intact

$ curl -v -X GET http://localhost:8080/test/25M.file1 > 25M.file.3
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /test/25M.file1 HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Mon, 28 Mar 2016 05:52:29 GMT
< Content-Length: 26214400
* Server LeoFS is not blacklisted
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "f4e4750fbe4ad8dbfb0e7f4d6a83ef56"
< Last-Modified: Mon, 28 Mar 2016 05:52:17 GMT
< x-from-cache: True/via disk
<
  0 25.0M    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0{ [data not shown]
  0 25.0M    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0* transfer closed with 26214400 bytes remaining to read
  0 25.0M    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
* Closing connection 0
curl: (18) transfer closed with 26214400 bytes remaining to read

@windkit
Contributor

windkit commented Mar 28, 2016

This is because the body function that reads from the disk cache is incorrect:
https://github.com/leo-project/leo_gateway/blob/develop/src/leo_gateway_http_commons.erl#L336

...
file:sendfile(CacheObj#cache.file_path, Socket, 0, 0, [{chunk_size, SendChunkLen}]),
...

file:sendfile/5 expects a raw file descriptor, whereas file:sendfile/2 accepts a file name.
(This bug was introduced when chunked_send was added.)
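
For illustration, here is a minimal sketch of the corrected call (not the actual PR diff; CacheObj, Socket, and SendChunkLen are the variables from the snippet above): the cache file is first opened in raw mode, and the resulting descriptor is passed to file:sendfile/5.

case file:open(CacheObj#cache.file_path, [read, raw, binary]) of
    {ok, Fd} ->
        %% file:sendfile/5 takes a raw file descriptor, not a path
        Ret = file:sendfile(Fd, Socket, 0, 0, [{chunk_size, SendChunkLen}]),
        _ = file:close(Fd),
        Ret;
    {error, Cause} ->
        {error, Cause}
end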

@yosukehara
Member

@gkyildirim I've also reproduced this in my environment. If you want to avoid this situation for now, you can disable the disk cache in leo_gateway's configuration as below:

# https://github.com/leo-project/leo_gateway/blob/develop/priv/leo_gateway.conf#L138
cache.cache_disc_capacity = 0
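
For reference, applying it might look like this (a sketch assuming the default .deb install layout under /usr/local/leofs; adjust the version path to match your installation):

$ sudo vi /usr/local/leofs/1.2.20/leo_gateway/etc/leo_gateway.conf   # set cache.cache_disc_capacity = 0
$ sudo /usr/local/leofs/1.2.20/leo_gateway/bin/leo_gateway stop
$ sudo /usr/local/leofs/1.2.20/leo_gateway/bin/leo_gateway start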

We're going to fix this issue soon.

@windkit
Contributor

windkit commented Mar 28, 2016

I have created a PR for this:
leo-project/leo_gateway#37

With the fix applied, I no longer see the issue:

$ curl -v -X GET http://localhost:8080/test/25M.file > 25.file.1
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /test/25M.file HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Mon, 28 Mar 2016 06:35:32 GMT
< Content-Length: 26214400
* Server LeoFS is not blacklisted
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "f4e4750fbe4ad8dbfb0e7f4d6a83ef56"
< Last-Modified: Mon, 28 Mar 2016 06:35:29 GMT
<
{ [data not shown]
100 25.0M  100 25.0M    0     0   353M      0 --:--:-- --:--:-- --:--:--  357M
* Connection #0 to host localhost left intact
$ curl -v -X GET http://localhost:8080/test/25M.file > 25.file.2
* Hostname was NOT found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1...
* connect to ::1 port 8080 failed: Connection refused
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /test/25M.file HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
< connection: keep-alive
< date: Mon, 28 Mar 2016 06:35:32 GMT
< Content-Length: 26214400
* Server LeoFS is not blacklisted
< server: LeoFS
< Content-Type: application/octet-stream
< ETag: "f4e4750fbe4ad8dbfb0e7f4d6a83ef56"
< Last-Modified: Mon, 28 Mar 2016 06:35:29 GMT
< x-from-cache: True/via disk
<
{ [data not shown]
100 25.0M  100 25.0M    0     0   626M      0 --:--:-- --:--:-- --:--:--  641M
* Connection #0 to host localhost left intact

@yosukehara
Member

Note: LeoFS v1.2.18 and v1.2.20 are adversely affected by this bug.

yosukehara added this to the 1.2.21 milestone Mar 28, 2016
@yosukehara
Member

@windkit Thanks, I'll review your pull request now.

@yosukehara
Member

To cover this situation, we're going to add test cases to leofs_client_test and leofs_test.

@yosukehara
Member

We've checked this issue with integration tests and stress tests, and confirmed the bug is fixed.

@yosukehara
Member

@gkyildirim We fixed this issue, and LeoFS v1.2.21 was released yesterday. You should no longer hit this issue with v1.2.21, so I've closed this. Thanks.
