signer: [feature] Implement streaming signature support in minio-go #607

Closed
xxorde opened this issue Feb 7, 2017 · 14 comments

Comments

@xxorde

xxorde commented Feb 7, 2017

If you write a data stream to an S3 backend with PutObject(), the data is held in a memory buffer until the stream is finished. Only after the stream has finished does Minio start writing it to the backend.

This has two main disadvantages:

  1. The whole process takes a long time, because Minio does not start writing to the backend until it has read the full stream into memory.
  2. Minio consumes as much memory as the data you want to store, so it is not possible to store streams bigger than the available memory.

@harshavardhana told me that a fix is already in progress, but I would like to open this issue to track the progress and to know when I can test it :)

@hexadecy
Contributor

hexadecy commented Feb 7, 2017

Regarding 2., I thought that when the size is bigger than 5 MiB the data is split into parts.

@xxorde
Author

xxorde commented Feb 7, 2017

I believe it's 64 MB, but Minio needs to know the full size of the data before it starts to upload.
So if it cannot determine the full size, it writes everything into a buffer until EOF and then starts.

@harshavardhana harshavardhana changed the title Start transmitting data whlie streaming Start transmitting data while streaming Mar 10, 2017
@harshavardhana harshavardhana changed the title Start transmitting data while streaming Implement streaming signature support in minio-go Mar 10, 2017
@harshavardhana harshavardhana changed the title Implement streaming signature support in minio-go signer: [feature] Implement streaming signature support in minio-go Mar 10, 2017
@xxorde
Author

xxorde commented Mar 14, 2017

Hi,

is there any news here? Is this still work in progress?
Or will it be done later because it was tagged "enhancement"?

Kind regards.

@krisis
Member

krisis commented Mar 14, 2017

@xxorde I am working on this. #609 is work in progress. The implementation in the PR can be improved to consume less memory while computing chunk signatures. I will be getting back to this next.

@xxorde
Author

xxorde commented Mar 14, 2017

@krisis thank you for your fast response. I am really looking forward to using minio for handling my backup streams. :)

@xxorde
Author

xxorde commented Apr 10, 2017

@krisis I do not want to annoy you, but do you have any idea when you will be able to fix this?
If you are not able to get back to this issue, please let me know so I can try to find another solution.

Update: I did not notice that you are working on #609 again. I checked out your feat/streaming-sig from Feb. 02. By the way, your changes there help me a lot, thanks!

@krisis
Member

krisis commented Apr 11, 2017

@xxorde Apologies for not sharing updates on where this PR is. I have not had time to commit fully to this feature; I am making changes when I find time. In a day or two I shall update this PR with when we expect to complete this feature. Thanks for being patient.

@deekoder deekoder modified the milestones: Current, Future Apr 11, 2017
@krisis
Member

krisis commented Apr 13, 2017

@xxorde I have made some more changes to the PR. The memory consumption we discussed on Slack might be due to the MD5 computations performed for multipart upload. Could you share a snippet of your code which uses this streaming signature implementation? It would help me understand the memory usage pattern.

@xxorde
Author

xxorde commented Apr 13, 2017

@krisis yes, of course, here is an example.

package main

import (
	"flag"
	"log"
	"os"

	minio "github.com/minio/minio-go"
)

func main() {
	var object, bucket, accessKey, secretKey string

	location := "us-east-1"

	flag.StringVar(&bucket, "b", "stream-test", "Bucket name")
	flag.StringVar(&object, "o", "stream-test", "Object key name")
	flag.StringVar(&accessKey, "a", "", "accessKey")
	flag.StringVar(&secretKey, "s", "", "secretKey")
	flag.Parse()

	client, err := minio.New("127.0.0.1:9000", accessKey, secretKey, false)
	if err != nil {
		log.Fatal(err)
	}

	// Create the bucket if it does not exist yet.
	exists, err := client.BucketExists(bucket)
	if err != nil {
		log.Fatal(err)
	}
	if !exists {
		if err = client.MakeBucket(bucket, location); err != nil {
			log.Fatal(err)
		}
	}

	// Stream stdin into the object; the size is not known in advance.
	n, err := client.PutObject(bucket, object, os.Stdin, "stream")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("Written %d bytes to %s in bucket %s.", n, object, bucket)
}

The program takes a stream on os.Stdin and writes it to an object in a bucket.

The following command generates a 5GB stream of data and writes it with the tool.

base64 /dev/urandom | head -c 5000000000 | ./minio-minexample -a accessKey -s secretKey
2017/04/13 15:18:03 Written 5000000000 bytes to stream-test in bucket stream-test.

This uses ~1.7GB of memory.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
sosna    17462 34.3 14.1 2405788 1701148 pts/2 Sl+  15:23   0:39 ./minio-minexample -a accessKey -s secretKey
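
As a complement to reading RSS from ps, a test program can also report the Go runtime's own view of its memory. This is a minimal sketch using `runtime.ReadMemStats`; `sysMiB` is a hypothetical helper name, and the `Sys` figure it reports is not identical to the RSS shown by ps, so the two numbers should only be compared roughly.

```go
package main

import (
	"fmt"
	"runtime"
)

// sysMiB returns the total memory the Go runtime has obtained from the
// operating system, in MiB. This is related to, but not the same as, RSS.
// (Hypothetical helper for illustration.)
func sysMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Sys) / (1 << 20)
}

func main() {
	fmt.Printf("memory obtained from OS: %.1f MiB\n", sysMiB())
}
```

Calling such a helper once before and once after the upload gives a quick indication of whether the client buffered the stream.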

@krisis
Member

krisis commented Apr 19, 2017

@xxorde These are the results with PR #609 and test program found here.

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND 
15946 kp        20   0  209524  12580   5384 S  10.3  0.2   0:08.69 minio-minexampl

@xxorde
Author

xxorde commented Apr 22, 2017

Memory consumption and speed are great!

n, err := client.PutObjectStreaming(bucket, object, os.Stdin, 5000000000)

I do not understand why the size is hard coded in your example. How do I use client.PutObjectStreaming without knowing the size of the stream?

@harshavardhana
Member

I do not understand why the size is hard coded in your example. How do I use client.PutObjectStreaming without knowing the size of the stream?

There is a change in the API in #657. Let us know how this works for you. You don't need to specify the size.

@xxorde
Author

xxorde commented Apr 23, 2017

@harshavardhana I checked out your branch multipart-streaming and tested my little example with it.

When I use client.PutObjectStreaming(bucket, object, os.Stdin), I get
Your proposed upload size '-1' is below the minimum allowed object size '0B' for single PUT operation.

If I follow the function calls and look into PutObjectStreamingWithProgress I find

    // If size cannot be found on a stream, it is not possible
    // to upload using streaming signature.
    if size < 0 {
        return 0, ErrEntityTooSmall(size, bucketName, objectName)
    }

Am I doing it wrong or is there something missing?

@harshavardhana
Member

Streaming signature has been implemented and published. I will send you an example separately, @xxorde. Thanks for your patience on this.
