This repository used to contain an R wrapper for an old version of dat. Dat has since changed substantially, so the wrapper no longer works.
The software is in alpha stage and not yet ready for use with real-world data.
The rdat package provides an R wrapper to the Dat project. Dat ("git for data") is a framework for data versioning, replication and synchronisation; see dat-data.com.
Prerequisites: the instructions below require R, git and Node.js (npm).
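Before installing anything, you can check from R whether the command-line tools are on your PATH. This is a small sketch using base R's `Sys.which()`; the executable names `git`, `node` and `npm` are assumptions (some Linux distributions install Node.js as `nodejs` instead).

```r
# Command-line tools assumed to be needed by the installation steps
tools <- c("git", "node", "npm")

# Sys.which() returns the full path of each tool, or "" if it is not on the PATH
found <- Sys.which(tools)
not_found <- tools[!nzchar(found)]

if (length(not_found) > 0) {
  message("Missing prerequisites: ", paste(not_found, collapse = ", "))
} else {
  message("All prerequisites found")
}
```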
Install the latest stable version of dat from npm:
sudo npm install -g dat
See the dat installation instructions for more details.
If you have not already installed dat, grab it from GitHub:
git clone https://github.com/maxogden/dat ~/dat
cd ~/dat
npm install .
sudo npm link
To update an existing copy of dat:
cd ~/dat
git pull
rm -Rf node_modules
npm install .
Then install the R package:
library(devtools)
install_github("ropensci/rdat")
Run through the examples to verify that everything works:
library(rdat)
example(dat)
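If `example(dat)` fails, it can help to check the two halves of the setup separately. The sketch below is an assumption rather than part of rdat: it uses `requireNamespace()` to test whether the rdat package is installed and `Sys.which()` to test whether a `dat` executable is on the PATH, and only runs the examples when both are present.

```r
# Is the rdat package installed? (checks without attaching it)
rdat_installed <- requireNamespace("rdat", quietly = TRUE)

# Is a `dat` executable on the PATH? Sys.which() returns "" if not
dat_cli_found <- nzchar(Sys.which("dat"))

if (rdat_installed && dat_cli_found) {
  library(rdat)
  example(dat)
} else {
  message("rdat installed: ", rdat_installed, "; dat CLI found: ", dat_cli_found)
}
```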
This API is experimental and has not been finalized or implemented. Stay tuned for updates.
When no remote is specified, dat() will initialize a new repository:
repo <- dat("cars", path = getwd())
Insert data from a data frame and get the dat version key:
# insert some data
repo$insert(cars[1:20,])
v1 <- repo$status()$version
v1
Insert more data and get a new version key:
# insert more data
repo$insert(cars[21:25,])
v2 <- repo$status()$version
v2
Retrieve particular versions of the dataset by their version keys:
data1 <- repo$get(v1)
data2 <- repo$get(v2)
List the changes between versions:
diff <- repo$diff(v1, v2)
diff$key
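The shape of the object returned by `repo$diff()` is not documented here beyond its `key` field. As a rough local cross-check, the hypothetical sketch below compares two data frame versions by row content in plain base R. `data1` and `data2` are stand-ins built from `cars`, assuming v2 holds rows 1 to 25 after the two inserts; this is not how dat itself computes diffs.

```r
# Hypothetical stand-ins for repo$get(v1) and repo$get(v2)
data1 <- cars[1:20, ]
data2 <- cars[1:25, ]

# Build one string key per row from all columns, so rows are compared by content
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Rows present in one version but not the other
# (duplicate rows are matched by content, not counted separately)
added   <- data2[!row_key(data2) %in% row_key(data1), ]
removed <- data1[!row_key(data1) %in% row_key(data2), ]

nrow(added)
nrow(removed)
```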
Fork a dataset from a particular version into a new branch.
# create fork
repo$checkout(v1)
repo$insert(cars[40:42,])
repo$forks()
v3 <- repo$status()$version
Check out the data at a particular version.
# go back to v2
repo$checkout(v2)
repo$get()
Save binary data (files) as attachments to the dataset.
# store binary attachments
repo$write(serialize(iris, NULL), "iris")
unserialize(repo$read("iris"))
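In the attachment example, `repo$write()` receives a raw vector produced by base R's `serialize()` with `connection = NULL`, and `unserialize()` turns the bytes back into an R object. The sketch below shows the same round trip without dat, using a temporary file as a hypothetical stand-in for the attachment store.

```r
# serialize() with connection = NULL returns the object as a raw vector
bytes <- serialize(iris, NULL)

# Hypothetical stand-in for repo$write(): store the bytes in a temporary file
path <- tempfile(fileext = ".bin")
writeBin(bytes, path)

# Hypothetical stand-in for repo$read(): read exactly file.size(path) bytes back
stored <- readBin(path, what = "raw", n = file.size(path))

# unserialize() rebuilds the original data frame from the bytes
restored <- unserialize(stored)
identical(restored, iris)
```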
Specifying a remote (path or URL) clones an existing repo. In this case we clone the previous repo into a new location:
# Create another repo
dir.create(newdir <- tempfile())
repo2 <- dat("cars", path = newdir, remote = repo$path())
repo2$forks()
repo2$get()
Let's make yet another clone of our original repository:
# Create a third repo
dir.create(newdir <- tempfile())
repo3 <- dat("cars", path = newdir, remote = repo$path())
Add data in repo2 and then push it back to the original repo:
# Add some data and push to origin
repo2$insert(cars[31:40,])
repo2$push()
Then pull the data back into repo3:
# sync data with origin
repo3$pull()
# Verify that repositories are in sync
mydata2 <- repo2$get()
mydata3 <- repo3$get()
all.equal(mydata2, mydata3)
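Note that `all.equal()` returns `TRUE` when the objects match but a character vector describing the differences otherwise, so in scripts it is usually wrapped in `isTRUE()`. The helper below, `assert_in_sync()`, is hypothetical and not part of rdat; it is demonstrated on plain `cars` subsets so that it runs without dat.

```r
# Hypothetical helper: stop with the reported differences unless a and b are equal
assert_in_sync <- function(a, b) {
  result <- all.equal(a, b)
  if (!isTRUE(result)) {
    stop("Repositories differ: ", paste(result, collapse = "; "))
  }
  invisible(TRUE)
}

# Identical subsets pass silently
assert_in_sync(cars[1:5, ], cars[1:5, ])

# Shifted subsets make all.equal() return a character vector of differences
mismatch <- all.equal(cars[1:5, ], cars[2:6, ])
is.character(mismatch)
```

With the repositories above, the check would be written as `assert_in_sync(mydata2, mydata3)`.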