Create cid #21

Merged: 29 commits, Jan 30, 2019
Commits
1fb3965
require jason and @simonlabs version of ex_cid #11 #7
RobStallion Jan 28, 2019
1e4e2aa
create same CID values as IPFS. works with string, map and struct #7 #11
RobStallion Jan 28, 2019
74f3cf4
rename file, remove unused code, add docs and doc tests #7 #11
RobStallion Jan 28, 2019
b17d84b
adds tests #7 #11
RobStallion Jan 28, 2019
0e4f639
update name of application #22
RobStallion Jan 28, 2019
011b284
rename module to CID #22
RobStallion Jan 28, 2019
597b817
remove cid and and base 58 module #11 #7
RobStallion Jan 28, 2019
3b96a0b
update create cid function so it no longer uses ex_cid module
RobStallion Jan 28, 2019
2412717
rename files as per comment #22
RobStallion Jan 28, 2019
d2a7671
update mix lock
RobStallion Jan 28, 2019
1ae6128
add .travis.yml file to run tests on branch https://github.com/dwyl/c…
nelsonic Jan 29, 2019
8a21996
update documentation on functions #11
RobStallion Jan 29, 2019
ce987d2
adds examples for invalid data types #11
RobStallion Jan 29, 2019
3369a12
test more complex data and show that different (but similar) data ret…
RobStallion Jan 29, 2019
1c142e2
adds tests for empty strings and maps
RobStallion Jan 29, 2019
74c58d2
update documentation to include more info and relevant links
RobStallion Jan 29, 2019
ec620ec
adds code coverage
RobStallion Jan 29, 2019
aa4f2c6
adds a spec for the cid function #11
RobStallion Jan 29, 2019
d15e914
improve docs to explain what a multihash is
RobStallion Jan 29, 2019
3ec0d4d
fix poor grammar
RobStallion Jan 29, 2019
53ccfaf
update travis yml with ipfs install script
RobStallion Jan 29, 2019
58e1c09
add sudo to install command #24
RobStallion Jan 29, 2019
89f3806
cd back out over go-ipfs dir #24
RobStallion Jan 29, 2019
8f97c9a
update travis yml so it runs all tests 24
RobStallion Jan 29, 2019
89dbdc1
adds property tests #24
RobStallion Jan 29, 2019
217d684
remove new line addition. text editors were adding new line, not IPFS…
RobStallion Jan 29, 2019
fc2409c
runs ipfs init on travis, #24
Danwhy Jan 29, 2019
b287876
adds property tests for random maps, #24
Danwhy Jan 29, 2019
0325986
adds docs for property based tests, #24
Danwhy Jan 29, 2019
16 changes: 16 additions & 0 deletions .travis.yml
@@ -0,0 +1,16 @@
language: elixir
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 👍

Good addition @nelsonic

elixir:
- 1.8
env:
- MIX_ENV=test
before_install:
- curl https://dist.ipfs.io/go-ipfs/v0.4.18/go-ipfs_v0.4.18_linux-386.tar.gz --output go-ipfs.tar.gz
- tar xvfz go-ipfs.tar.gz
- cd go-ipfs
- sudo ./install.sh
- cd ..
- ipfs init
script:
- mix all_tests
after_success:
- bash <(curl -s https://codecov.io/bash)
22 changes: 15 additions & 7 deletions README.md
@@ -237,9 +237,9 @@ We can then create a URLs table
in our URL shortening app/service
with the following entry:

| `inserted_at ` | **`URL`** (PK) | `cid` | `short` |
| ----------- | ----------- | ----------- | ----------- |
| 1541609554 | https://github.com/dwyl/phoenix-ecto-append-only-log-example | gVSTedHFGBetxyYib9mBQsjtZj4dJjQe | gV |
| `inserted_at ` | **`URL`** (PK) | `cid` | `short` |
| -------------- | ------------------------------------------------------------ | -------------------------------- | ------- |
| 1541609554 | https://github.com/dwyl/phoenix-ecto-append-only-log-example | gVSTedHFGBetxyYib9mBQsjtZj4dJjQe | gV |

So the "short" url would be
[dwyl.co/gV](https://github.com/dwyl/phoenix-ecto-append-only-log-example)
@@ -273,6 +273,14 @@ be found at [https://hexdocs.pm/rid](https://hexdocs.pm/rid)

-->

## Tests

The tests for this module are a combination of doctests, unit tests and property-based tests.

To run the property-based tests you will need [IPFS](https://ipfs.io/) installed.
See https://github.com/dwyl/learn-ipfs#how for details.

Then run `mix all_tests`, which runs the `Cid.cid` function on 100 randomly generated strings and maps (50 of each) and compares each result to the CID generated by IPFS, to check that our implementation matches.

# Research, Background & Relevant Reading
+ Real World examples of services that use Strings as IDs instead of Integers. [Real World Examples](https://github.com/dwyl/cid/blob/master/read_world_examples.md)
@@ -355,10 +363,10 @@ be **familiar** to people_)

**`prev: previous_cid`** address _example_:

| `inserted ` | **`cid`**(PK)<sup>1</sup> | **`name`** | **`address`** | **`prev`** |
| ----------- | ----------- | ----------- | ----------- |-----|
| 1541609554 | **gVSTedHFGBetxy** | Bruce Wane | 1007 Mountain Drive, Gotham | null |
| 1541618643 | smnELuCmEaX42 | Bruce Wane | [Rua Goncalo Afonso, Vila Madalena, Sao Paulo, 05436-100, Brazil](https://www.tripadvisor.co.uk/ShowUserReviews-g303631-d2349935-r341872180-Batman_Alley-Sao_Paulo_State_of_Sao_Paulo.html "Batman Alley ;-)") | **gVSTedHFGBetxy** |
| `inserted ` | **`cid`**(PK)<sup>1</sup> | **`name`** | **`address`** | **`prev`** |
| ----------- | ------------------------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ |
| 1541609554 | **gVSTedHFGBetxy** | Bruce Wane | 1007 Mountain Drive, Gotham | null |
| 1541618643 | smnELuCmEaX42 | Bruce Wane | [Rua Goncalo Afonso, Vila Madalena, Sao Paulo, 05436-100, Brazil](https://www.tripadvisor.co.uk/ShowUserReviews-g303631-d2349935-r341872180-Batman_Alley-Sao_Paulo_State_of_Sao_Paulo.html "Batman Alley ;-)") | **gVSTedHFGBetxy** |

When a row does _not_ have a **`prev`** value then we know it is the _first_
time that content has been inserted into the database. When a **`prev`** value
113 changes: 91 additions & 22 deletions lib/cid.ex
@@ -1,38 +1,107 @@
defmodule Cid do
@moduledoc """
Returns a SHA512 transformed to Base64, remove ambiguous chars then sub-string
Provides a way for a user to turn a String, Map or Struct into a CID that
is identical to one that will be returned from IPFS if the same data is
added.

Currently only produces a default v1 CID.
Currently only uses the "raw" codec
Review comment (Member): Maybe add a little more info on what a v1 CID or "raw" codec is. Just a link would probably be fine, even if it's to elsewhere in this repo.

Data provided must be under 256KB in order for the CID to match the one
returned by IPFS.

For more info on CIDs and IPFS see the following...
https://ipfs.io/
https://pascalprecht.github.io/posts/content-identifiers-in-ipfs/
https://github.com/dwyl/learn-ipfs/issues
"""

@doc """
make/2 create a SHA512 hash from the given input and return the require length
note: we remove "ambiguous" characters so _humans_ can type the hash without
getting "confused" this might not be required, but is to match the original
"Hits" implementation.
Returns a CID that is identical to one returned by IPFS if given the same data.
Can take a String, Map or Struct as an argument.

## Examples

## Parameters
iex> Cid.cid("hello")
"zb2rhZfjRh2FHHB2RkHVEvL2vJnCTcu7kwRqgVsf9gpkLgteo"

- input: String the string to be hashed.
- length: Number the length of string required
iex> Cid.cid(%{key: "value"})
"zb2rhn1C6ZDoX6rdgiqkqsaeK7RPKTBgEi8scchkf3xdsi8Bj"

Returns String hash of desired length.
iex> Cid.cid(1234)
"invalid data type"

iex> Cid.cid([1,2,3,"four"])
"invalid data type"
"""
def make(input) when is_map(input) do
input |> stringify_map_values |> make
@spec cid(String.t | map() | struct()) :: String.t
Review comment (Member): 👍

def cid(value) do
value
|> create_multihash()
|> create_cid()
end

# create_multihash returns a multihash. A multihash is a self describing hash.
# for more info on multihashes see this blog post...
# https://pascalprecht.github.io/posts/future-proofed-hashes-with-multihash/
# if create_multihash is called with a struct, the struct is converted into a
# map and then create_multihash is called again
# The %_{} syntax works like regular pattern matching. The underscore, _,
# simply matches any Struct/Module name.
defp create_multihash(%_{} = struct) do
Review comment (Member): A brief explanation of how this pattern matching works would be useful for beginners

struct
|> Map.from_struct()
|> create_multihash()
end

def make(input, length \\ 32) do
hash1 = :crypto.hash(:sha512, input)
{:ok, <<_multihash_code, _length, hash2::binary>>} = Multihash.encode(:sha2_512, hash1)
# if create_multihash is called with a map the map is converted into a JSON
# string and then create_multihash is called again
defp create_multihash(map) when is_map(map) do
map
|> Jason.encode!()
|> create_multihash()
end

# if create_multihash is called with a string then the string is converted
# into a multihash. This uses the erlang crypto hash function. For more
# information on using erlang functions in elixir see...
# https://stackoverflow.com/questions/35283888/how-to-call-an-erlang-function-in-elixir
defp create_multihash(str) when is_binary(str) do
digest = :crypto.hash(:sha256, str)
Review comment (Member): Explaining in the comments that this is an erlang function would be good for beginners

{:ok, multihash} = Multihash.encode(:sha2_256, digest)

hash2
|> Base.encode64()
|> String.replace(~r/[Il0oO=\/\+]/, "", global: true)
|> String.slice(0..(length - 1))
multihash
end

def stringify_map_values(input_map) do
Enum.sort(Map.keys(input_map)) # sort map keys for consistent ordering
|> Enum.map(fn (x) -> Map.get(input_map, x) end)
|> Enum.join("")
# if create_multihash is called with something that is not a string, map or struct
# then it returns an error.
defp create_multihash(_), do: {:error, "invalid data type"}

# if an error is passed in return error message
defp create_cid({:error, msg}), do: msg

# takes a multihash and returns a CID
# B58.encode58 takes the binary returned from create_cid_suffix and converts
# it into a base58 string. For more info on base58 strings see
# https://en.wikipedia.org/wiki/Base58
defp create_cid(multihash) when is_binary(multihash) do
multihash
|> create_cid_suffix()
|> B58.encode58()
Review comment (Member): Could do with a brief explanation of B58

Reply (Member, Author): Just added this in one of the latest commits. Thanks for pointing that out

|> add_multibase_prefix()
end

# takes a multihash and returns the suffix
# currently version is hardcoded to 1
# (currently IPFS only has 2 versions, 0 or 1. 0 is deprecated)
# and multicodec-packed-content-type is hardcoded to "raw" ("U" == <<85>>)
# more info on multicodec can be found https://github.com/multiformats/multicodec
# <version><multicodec-packed-content-type><multihash>
# the syntax on this line is concatenating strings and binary values together.
# Strings in elixir are binaries and that is how this works. Learn more here...
# https://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html
defp create_cid_suffix(multihash), do: <<1>> <> "U" <> multihash
Review comment (Member): A small comment on the binary syntax might be useful to people unfamiliar with it

# adds the multibase prefix (multibase-prefix) to the suffix (<version><mc><mh>)
# for more info on multibase, see https://github.com/multiformats/multibase
defp add_multibase_prefix(suffix), do: "z" <> suffix
end
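
For readers following the comments in `lib/cid.ex`, the byte layout they describe (`<multibase-prefix><version><multicodec><multihash>`) can be sketched using only the Erlang/Elixir standard library. This is a rough illustration for string input, not the code in this PR: the module name `CidSketch` and its hand-rolled `base58/1` are hypothetical stand-ins for the `Multihash` and `B58` dependencies, and the multihash header bytes (`0x12` for sha2-256, followed by the digest length) are written out by hand on the assumption that they match what `Multihash.encode/2` produces.

```elixir
defmodule CidSketch do
  # Hypothetical module, for illustration only (not part of this PR).
  # Bitcoin-style base58 alphabet: no 0, O, I or l.
  @alphabet ~c"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

  # Builds "z" <> base58(<<version, codec>> <> multihash) for a string.
  def cid(data) when is_binary(data) do
    digest = :crypto.hash(:sha256, data)

    # multihash = <hash function code><digest length><digest>
    multihash = <<0x12, byte_size(digest)>> <> digest

    # 0x01 is CID version 1, 0x55 is the "raw" multicodec ("U" == <<85>>)
    "z" <> base58(<<0x01, 0x55>> <> multihash)
  end

  # Encodes a binary as base58; each leading zero byte becomes a "1".
  def base58(bin) when is_binary(bin) do
    zeros = count_leading_zeros(bin, 0)
    digits = bin |> :binary.decode_unsigned() |> encode_int([])
    String.duplicate("1", zeros) <> List.to_string(digits)
  end

  defp count_leading_zeros(<<0, rest::binary>>, n), do: count_leading_zeros(rest, n + 1)
  defp count_leading_zeros(_bin, n), do: n

  # Repeated division by 58, collecting digits most significant first.
  defp encode_int(0, acc), do: acc

  defp encode_int(n, acc) do
    encode_int(div(n, 58), [Enum.at(@alphabet, rem(n, 58)) | acc])
  end
end

IO.puts(CidSketch.cid("hello"))
```

If the assumptions above hold, the printed value should agree with the `Cid.cid("hello")` doctest in this file; comparing the two is a quick way to check the sketch.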
22 changes: 18 additions & 4 deletions mix.exs
@@ -1,13 +1,16 @@
defmodule Rid.MixProject do
defmodule Cid.MixProject do
use Mix.Project

def project do
[
app: :rid,
app: :cid,
version: "0.1.0",
elixir: "~> 1.7",
start_permanent: Mix.env() == :prod,
deps: deps()
deps: deps(),
aliases: aliases(),
test_coverage: [tool: ExCoveralls],
preferred_cli_env: [coveralls: :test, "coveralls.detail": :test, "coveralls.post": :test, "coveralls.html": :test, all_tests: :test]
]
end

@@ -21,7 +24,18 @@ defmodule Rid.MixProject do
# Run "mix help deps" to learn about dependencies.
defp deps do
[
{:ex_multihash, "~> 2.0"}
{:ex_multihash, "~> 2.0"},
{:jason, "~> 1.1"},
{:basefiftyeight, "~> 0.1.0"}, # Currently building our own version of this here https://git.io/fhPaK. Can replace when it is ready
{:excoveralls, "~> 0.10", only: :test},
{:stream_data, "~> 0.4.2", only: :test}
]
end

defp aliases do
[
test: ["coveralls --exclude ipfs"],
all_tests: ["coveralls.detail --include ipfs"]
]
end
end
12 changes: 12 additions & 0 deletions mix.lock
@@ -1,3 +1,15 @@
%{
"basefiftyeight": {:hex, :basefiftyeight, "0.1.0", "3d48544743bf9aab7ab02aed803ac42af77acf268c7d8c71d4f39e7fa85ee8d3", [:mix], [], "hexpm"},
"certifi": {:hex, :certifi, "2.4.2", "75424ff0f3baaccfd34b1214184b6ef616d89e420b258bb0a5ea7d7bc628f7f0", [:rebar3], [{:parse_trans, "~>3.3", [hex: :parse_trans, repo: "hexpm", optional: false]}], "hexpm"},
"ex_multihash": {:hex, :ex_multihash, "2.0.0", "7fb36f842a2ec1c6bbba550f28fcd16d3c62981781b9466c9c1975c43d7db43c", [:mix], [], "hexpm"},
"excoveralls": {:hex, :excoveralls, "0.10.4", "b86230f0978bbc630c139af5066af7cd74fd16536f71bc047d1037091f9f63a9", [:mix], [{:hackney, "~> 1.13", [hex: :hackney, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm"},
"hackney": {:hex, :hackney, "1.15.0", "287a5d2304d516f63e56c469511c42b016423bcb167e61b611f6bad47e3ca60e", [:rebar3], [{:certifi, "2.4.2", [hex: :certifi, repo: "hexpm", optional: false]}, {:idna, "6.0.0", [hex: :idna, repo: "hexpm", optional: false]}, {:metrics, "1.0.1", [hex: :metrics, repo: "hexpm", optional: false]}, {:mimerl, "1.0.2", [hex: :mimerl, repo: "hexpm", optional: false]}, {:ssl_verify_fun, "1.1.4", [hex: :ssl_verify_fun, repo: "hexpm", optional: false]}], "hexpm"},
"idna": {:hex, :idna, "6.0.0", "689c46cbcdf3524c44d5f3dde8001f364cd7608a99556d8fbd8239a5798d4c10", [:rebar3], [{:unicode_util_compat, "0.4.1", [hex: :unicode_util_compat, repo: "hexpm", optional: false]}], "hexpm"},
"jason": {:hex, :jason, "1.1.2", "b03dedea67a99223a2eaf9f1264ce37154564de899fd3d8b9a21b1a6fd64afe7", [:mix], [{:decimal, "~> 1.0", [hex: :decimal, repo: "hexpm", optional: true]}], "hexpm"},
"metrics": {:hex, :metrics, "1.0.1", "25f094dea2cda98213cecc3aeff09e940299d950904393b2a29d191c346a8486", [:rebar3], [], "hexpm"},
"mimerl": {:hex, :mimerl, "1.0.2", "993f9b0e084083405ed8252b99460c4f0563e41729ab42d9074fd5e52439be88", [:rebar3], [], "hexpm"},
"parse_trans": {:hex, :parse_trans, "3.3.0", "09765507a3c7590a784615cfd421d101aec25098d50b89d7aa1d66646bc571c1", [:rebar3], [], "hexpm"},
"ssl_verify_fun": {:hex, :ssl_verify_fun, "1.1.4", "f0eafff810d2041e93f915ef59899c923f4568f4585904d010387ed74988e77b", [:make, :mix, :rebar3], [], "hexpm"},
"stream_data": {:hex, :stream_data, "0.4.2", "fa86b78c88ec4eaa482c0891350fcc23f19a79059a687760ddcf8680aac2799b", [:mix], [], "hexpm"},
"unicode_util_compat": {:hex, :unicode_util_compat, "0.4.1", "d869e4c68901dd9531385bb0c8c40444ebf624e60b6962d95952775cac5e90cd", [:rebar3], [], "hexpm"},
}
115 changes: 108 additions & 7 deletions test/cid_test.exs
@@ -1,18 +1,119 @@
defmodule DummyStruct do
defstruct [:name, :username, :age]
end

defmodule CidTest do
use ExUnit.Case
use ExUnitProperties

doctest Cid

test "Creates a deterministic Content ID from Elixir String" do
assert Cid.make("Elixir") == "NSqJspBr2u1F6z1DhcR2cnQAxLdQZBLk"
defstruct [:a]
@filename "random.txt"
@ipfs_args ["add", @filename, "-n", "--cid-version=1"]
@dummy_map %{
name: "Batman",
username: "The Batman",
age: 80
}

describe "Testing ex_cid cid function" do
test "returns the same CID as IPFS when given a string" do
assert "zb2rhhnbH6zTaAj948YVsYxW4c5AY6TfJURC9EGhQum3Kq7b3" == Cid.cid("Hello World")
end

test "returns the same CID as IPFS when given a map" do
assert "zb2rhdeaHh2UHghBcwxeFP1GRUYETDH96DkV6oppiz5Gk1xGN" == Cid.cid(%{a: "a"})
end

test "returns the same CID as IPFS when given a struct" do
assert "zb2rhdeaHh2UHghBcwxeFP1GRUYETDH96DkV6oppiz5Gk1xGN" == Cid.cid(%__MODULE__{a: "a"})
end

test "returns an error if given invalid data type" do
assert Cid.cid(2) == "invalid data type"
end

test "returns the same CID regardless of order of items in map" do
map = %{
age: 80,
name: "Batman",
username: "The Batman"
}

assert Cid.cid(@dummy_map) == Cid.cid(map)
end

test "A struct with the same keys and values as a map creates the same CID" do
struct = %DummyStruct{
age: 80,
name: "Batman",
username: "The Batman"
}

assert Cid.cid(struct) == Cid.cid(@dummy_map)
end

test "returns a different CID when the value given differs (CIDs are all unique)" do
refute Cid.cid("") == Cid.cid(" ")
refute Cid.cid("\n") == Cid.cid("")
refute Cid.cid("Hello World") == Cid.cid("salve mundi")
refute Cid.cid("Hello World") == Cid.cid("hello world")
refute Cid.cid(%{a: "a"}) == Cid.cid(%{a: "b"})
refute Cid.cid(%__MODULE__{a: "a"}) == Cid.cid(%DummyStruct{})
end

test "empty values also work" do
assert Cid.cid("") == "zb2rhmy65F3REf8SZp7De11gxtECBGgUKaLdiDj7MCGCHxbDW"
assert Cid.cid(%{}) == "zb2rhbE2775XANjTsRTV9sxfFMWxrGuMWYgshDn9xvjG69fZ3"
end

# Property based tests that generate random strings and
# use them in our compare_ipfs_cid function
# Tagged to allow you to ignore these tests if you don't have ipfs installed
@tag :ipfs
property "test with 50 random strings" do
check all str <- StreamData.string(:ascii), max_runs: 50 do
compare_ipfs_cid(str)
end
end

# Property based tests that generate random maps and
# use them in our compare_ipfs_cid function
# Tagged to allow you to ignore these tests if you don't have ipfs installed
@tag :ipfs
property "test with 50 random maps" do
check all map <- random_map(), max_runs: 50 do
map
|> Jason.encode!()
|> compare_ipfs_cid()
end
end
end

test "Create a CID from a Map" do
map = %{cat: "Meow", dog: "Woof", fox: "What Does The Fox Say?"}
assert Cid.make(map) == "GdrVnsLSdxRphXgQgNsmq1FDyRXAySXT"
# Calls IPFS `add` function to generate cid
# then compares result to result of our `Cid.cid` function
# see: https://docs.ipfs.io/introduction/usage/
def compare_ipfs_cid(val) do
File.write(@filename, val)

{added_val, 0} = System.cmd("ipfs", @ipfs_args)

<<"added ", cid::bytes-size(49), _::binary>> = added_val

assert cid == Cid.cid(val)

File.rm!(@filename)
end

test "Cid.make(\"hello world\")" do
assert Cid.make("hello world") == "MJ7MSJwS1utMxA9QyQLytNDtd5RGnx6m"
def random_map do
keys = StreamData.atom(:alphanumeric)
values = StreamData.one_of([random_value(), StreamData.list_of(random_value())])

StreamData.map_of(keys, values)
end

def random_value do
StreamData.one_of([StreamData.string(:ascii), StreamData.integer()])
end
end
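
The binary pattern used in `compare_ipfs_cid` can be tried on its own, without IPFS installed. The sketch below assumes the `ipfs add` output line has the shape `added <cid> <filename>`, which is what the pattern in the test implies. The module name `IpfsOutputSketch` and the sample line are made up for illustration; the CID in the sample is borrowed from the `Cid.cid("hello")` doctest.

```elixir
defmodule IpfsOutputSketch do
  # Hypothetical helper, for illustration only (not part of this PR).
  # Matches the literal prefix "added ", then binds the next 49 bytes to cid
  # and ignores whatever follows (the file name and trailing newline).
  def extract_cid(<<"added ", cid::bytes-size(49), _rest::binary>>), do: {:ok, cid}

  # Anything shorter, or without the "added " prefix, is rejected.
  def extract_cid(_other), do: :error
end

sample = "added zb2rhZfjRh2FHHB2RkHVEvL2vJnCTcu7kwRqgVsf9gpkLgteo random.txt\n"
IO.inspect(IpfsOutputSketch.extract_cid(sample))
```

Returning `:error` instead of raising a `MatchError` is a choice made for this sketch; the test helper in the PR matches directly, so an unexpected output line fails the test straight away.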