
Basic support for siva files #6

Merged
merged 64 commits into from
Mar 20, 2019

jfontan (Contributor) commented Feb 25, 2019

In this storage each location is a siva file that can contain multiple virtual repositories. These virtual repositories are actually remotes, and they can be looked up by name or by URL.

This storage model implements transactions at the siva file level. When a repository is opened in read-write mode in a library with transactions enabled, a checkpoint file is created that contains the size of the siva file before the transaction. This file is used to restore the siva file to the last known good state if the process dies. On Commit the siva file is synced (its index is written to it) and the checkpoint file is closed. Close (rollback) truncates the siva file to the saved size.

While a transaction is in progress, read-only repositories can still be used in the same location (siva file), but opening repositories in RW mode is blocked.

It does not support addressing repositories by their location, for example:

repo, err := lib.Init("98192f31b2f@github.com/src-d/go-borges")

This does not feel natural. Instead, there is a method to get the location, and with it we can open the desired repository:

location, err := lib.Location("98192f31b2f")
repo, err := location.Init("github.com/src-d/go-borges")

Unimplemented features:

  • transactioner should be renamed and its implementation made pluggable. The current one works for a single process, but we need other implementations that can lock using different methods.
  • Walk directories to find siva files. It may not be needed for borges.
  • Change go-siva and go-billy-siva so they can use the checkpoint to open repositories in the middle of a transaction, using checkpoint data to find the correct index. (The go-siva implementation is done; changes to go-billy-siva are in progress.)
  • Add a mode that uses a new storer returning only the objects owned by the virtual repository, for rooted repos.
  • Support multiple libraries. This also comes with basic support for scheduling, for example returning repositories that belong to different libraries (disks) to maximize disk usage, or packing repos in the same location to improve cache efficiency.
  • Smarter iterators. Right now they use a slice, which may not scale.
  • Bucket configuration.

Package documentation:

Package siva implements a go-borges library that uses siva files as its
storage backend.

More information about siva files: https://github.com/src-d/go-siva

Basics

In this storage each location contains a single bare git repository that can
contain the objects from several other logical repositories. These logical
repositories are stored as remotes in the configuration file. A repository's ID
is its remote name, but it can also be looked up by its URLs. All repositories
returned from a location share the same objects and references. The function
AddLocation creates an empty location that is initialized when using
Location.Init. Its initialization consists of initializing the repository if it
has not been created yet and adding a remote whose name and URLs are set to the
provided ID.
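To make the remote-based layout concrete, here is a small in-memory model of a location, assuming a remote is just a name plus a list of URLs as in a git config. The location and remote types, the initRepo helper and its duplicate-ID error are hypothetical; the real library keeps these remotes in the bare repository's configuration file.

```go
package main

import "fmt"

// remote mirrors a git remote entry: a name plus its URLs. Hypothetical type.
type remote struct {
	Name string
	URLs []string
}

// location models one siva location: a single bare repository whose config
// holds one remote per logical repository. Hypothetical type.
type location struct {
	initialized bool
	remotes     []remote
}

// find returns the remote whose name, or any of whose URLs, equals id.
func (l *location) find(id string) (remote, bool) {
	for _, r := range l.remotes {
		if r.Name == id {
			return r, true
		}
		for _, u := range r.URLs {
			if u == id {
				return r, true
			}
		}
	}
	return remote{}, false
}

// initRepo mimics Location.Init as documented: make sure the repository is
// initialized, then add a remote whose name and URL are the given ID.
// Rejecting duplicate IDs is an assumption of this sketch.
func (l *location) initRepo(id string) error {
	if _, ok := l.find(id); ok {
		return fmt.Errorf("repository %q already exists", id)
	}
	l.initialized = true // stands in for creating the bare git repository on first use
	l.remotes = append(l.remotes, remote{Name: id, URLs: []string{id}})
	return nil
}

func main() {
	loc := &location{}
	_ = loc.initRepo("repo1") // initializes the repository and adds a remote
	_ = loc.initRepo("repo2") // only adds a remote

	// A remote whose name differs from its URL, as in the library.Get example.
	loc.remotes = append(loc.remotes, remote{
		Name: "0168e2c7-eedc-7358-0a09-39ba833bdd54",
		URLs: []string{"github.com/src-d/go-borges"},
	})

	r, ok := loc.find("github.com/src-d/go-borges")
	fmt.Println(r.Name, ok) // 0168e2c7-eedc-7358-0a09-39ba833bdd54 true
	_, ok = loc.find("repo2")
	fmt.Println(ok) // true
}
```

Because every logical repository is only a remote, all of them see the same object database, which is why repositories returned from one location share objects and references.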

For example:

	r1, _ := library.Get("github.com/src-d/go-borges")
	println(r1.Name()) // "0168e2c7-eedc-7358-0a09-39ba833bdd54"
	r2, _ := library.Get("0168e2c7-eedc-7358-0a09-39ba833bdd54")
	println(r2.Name()) // "0168e2c7-eedc-7358-0a09-39ba833bdd54"

	loc, _ := library.AddLocation("test")
	r3, _ := loc.Init("repo1") // the first repo initializes the git repository
	r3.Commit()
	r4, _ := loc.Init("repo2") // the second just adds a new remote
	r4.Commit()
	loc.Has("repo1") // true
	loc.Has("repo2") // true

Repositories should be closed after use. When the library is transactional, a
repository can be closed with Commit (read-write mode only), which saves the
changes, or with Close, which rolls them back. When the library is
non-transactional, it must be closed with Close, which saves the changes. In
both cases the repository should not be used again after closing it, and a
double Close returns an error.

Transactions

The storage supports transactions and locks a location during a transaction
when the same library is used. Transactional writes are appended directly to
the siva file, and a checkpoint file is created with the size of the file
before the transaction starts. This file is used to recover broken siva files
to the last known good state. Locations can be accessed in read-only mode while
a repository is performing a transaction, and their content remains stable.

Committing a transaction finishes the writes to the siva file, closes it and
deletes the checkpoint file. Rollback truncates the siva file to the last good
size and deletes the checkpoint file.

Only one repository can be opened in read-write mode in the same location when
the library is transactional. When a second repository is opened in RW mode in
the same location, the library waits for a grace period for the previous
repository to close. By default this period is 1 minute, but it can be
configured when creating the library.

For example:

	loc, _ := library.Location("foo")
	r1, _ := loc.Get("github.com/src-d/go-borges", borges.ReadOnlyMode)
	r2, _ := loc.Get("github.com/src-d/go-borges", borges.RWMode)
	r2.R().CreateTag("tag", plumbing.ZeroHash, nil)

	r1.R().Tag("tag") // not found
	r3, _ := loc.Get("github.com/src-d/go-borges", borges.ReadOnlyMode)
	r3.R().Tag("tag") // not found

	// errors after the configured timeout, as the r2 transaction is not completed
	r4, _ := loc.Get("github.com/src-d/go-borges", borges.RWMode)

	r2.Commit()
	r1.R().Tag("tag") // not found
	r5, _ := loc.Get("github.com/src-d/go-borges", borges.ReadOnlyMode)
	r5.R().Tag("tag") // found

Note: When using repositories in non-transactional mode you must call Close
after finishing; otherwise the siva file will be corrupted.

jfontan and others added 30 commits February 12, 2019 15:22
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
siva: implement method Get for Library and other minor changes.
Signed-off-by: Javi Fontan <jfontan@gmail.com>
If a *.siva.checkpoint file is found, it tries to fix the siva file by
truncating it to the size written in the checkpoint.

Signed-off-by: Javi Fontan <jfontan@gmail.com>
add fixing code using checkpoint files
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
siva: move repositoryIterator to its own file and other minor changes
Now the library has a flag to make it transactional, and it is used with all
locations and repositories below it. Transactions are managed from the
location with the Commit and Rollback methods. While a transaction is in
progress no other transaction can start.

When a transaction starts, a new siva filesystem is created for it and the
previous cachedFS is kept. This way repositories previously created from
that location should work the same as before. It should also be possible
to create new repositories.

Signed-off-by: Javi Fontan <jfontan@gmail.com>
add transactions to repositories
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
siva: add checkpoint as its own type.
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
Its cache size is hardcoded for now until library options are
implemented.

Transactions should call startTransaction and endTransaction to make
sure that subsequent Location calls return the correct location.

Signed-off-by: Javi Fontan <jfontan@gmail.com>
Now transactions and registry cache size can be configured with
LibraryOptions struct. Also added tests for location registry.

Signed-off-by: Javi Fontan <jfontan@gmail.com>
…ransactional and write operations.

Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
siva: location registry + transactioner
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
siva: manage fs caching and repositories idempotent operations
Signed-off-by: Javi Fontan <jfontan@gmail.com>
Also deleted unused Location functions

Signed-off-by: Javi Fontan <jfontan@gmail.com>
implement AddLocation in siva Library
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
jfontan and others added 17 commits February 22, 2019 10:55
Signed-off-by: Javi Fontan <jfontan@gmail.com>
siva testing and bug fixes
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
Signed-off-by: Javi Fontan <jfontan@gmail.com>
siva: return error in repository commit if not transactional
Signed-off-by: Javi Fontan <jfontan@gmail.com>
siva: add package documentation
Signed-off-by: Manuel Carmona <manu.carmona90@gmail.com>
jfontan (Contributor Author) commented Mar 5, 2019

Writing file by file, as is done now, can cause slowdowns and a higher number of IOPS. To improve writes to the siva files, all changes will be written locally and copied as a single block when they are committed.

  • The library will continue to use checkpoint files, since updating the siva file can take some time and could fail before finishing. The checkpoint is created as soon as the repository is opened in transaction mode.
  • A temporary filesystem will be configured at the library level.
  • Create a new TransactionalStorer embedding go-git transactional.Storage. It will use a siva file as a temporary filesystem; on Commit the temporary siva file will be closed and appended to the main siva file.
        func NewTransactionalStorer(base billy.Filesystem, siva string) *TransactionalStorer
  • newRepository will receive a storer instead of a filesystem:
        // newRepository creates a new siva backed Repository.
        func newRepository(
            id borges.RepositoryID,
            sto storage.Storer,
            m borges.Mode,
            transactional bool,
            l *Location,
        ) (*Repository, error) {
        ...
        }
  • Creation of the storer will be moved to location.repository.
  • Most of the tests should be valid for these changes.

jfontan self-assigned this Mar 5, 2019
Now the siva files are not modified in-place. Checkpoint is still used
so the final step of appending the temporary siva can be recovered.

Signed-off-by: Javi Fontan <jfontan@gmail.com>
Also do not clean cached fs on failed commit.

Signed-off-by: Javi Fontan <jfontan@gmail.com>
siva: do transactions to a tmp siva file and then append
@@ -0,0 +1,4 @@
[[constraint]]
Contributor

Since this is a library, we shouldn't use Gopkg.

Contributor Author

It was needed before transactions in go-git were in a release. I believe we should at least add a go.mod for these cases.

I'm deleting Gopkg.testing.toml and dep calls in .travis.yml.

borges "github.com/src-d/go-borges"
billy "gopkg.in/src-d/go-billy.v4"
"gopkg.in/src-d/go-billy.v4/util"
errors "gopkg.in/src-d/go-errors.v1"
Contributor

Remove the useless package alias.

Contributor Author

These aliases are now added automatically by goimports:

golang/go#28428

siva/doc.go Outdated
loc, _ := library.AddLocation("test")
r1, _ := loc.Init("repo1") # the first repo initializes the git repository
r1.Commit()
r2, := loc.Init("repos2) # the second just adds a new remote
Contributor

Missing quote at the end of repos2

}

// TODO: find if we have to use ".git" suffix for repository ids
func toRepoID(endpoint string) borges.RepositoryID {
Contributor

siva/location.go Outdated
}

// Get implements the borges.Location interface.
func (l *Location) Get(
Contributor

This can be one line

// for the siva filesystem that can be later deleted with Cleanup.
func NewStorage(
base billy.Filesystem,
path string,
Contributor

What do you need a path for?

Contributor Author

It needs to know where the siva file is to open and update it.

Signed-off-by: Javi Fontan <jfontan@gmail.com>

3 participants