Skip to content

Latest commit

 

History

History
145 lines (103 loc) · 4.89 KB

README.md

File metadata and controls

145 lines (103 loc) · 4.89 KB

yalinka - Erlang interface to k-dimensional trees

== The project has been relocated to a new home ==

Overview

This is an Erlang library for operating on k-d trees, implemented as a set of erlang NIF (native implemented functions).

For general information about k-d trees please see wikipedia article.

In short, k-d tree object is a set of points in k-dimensional space, organized (indexed) in memory so that the search operation for nearest point next to a given would be as fast as possible. This library is specially optimized for querying large dataset (millions of points).

Requirements

  • Erlang R15 or later (last tested on R20)

  • GCC (tested on 6.3.0) or clang (tested on 3.8.1)

  • libc with -lm (should be available on any linux distribution by default)

  • GNU make

    The C sources are compiled using options --std=C99 --Wall --Wextra --pedantic so I believe it should be easy to adopt those sources for other environments.

Limitations

  • Point coordinates are limited by IEEE 754 double-precission (64bit) boundaries.
  • Node index is limited to uint64_t (erlang positive integer up to 2^64-1

Installation

Just 'cd' to project directory and type 'make'. This should build the erlang application in 'ebin' directory and shared object file in 'priv/lib'.

Then set the 'DESTDIR' environment variable and type 'make install', like this:

cd yalinka
make
DESTDIR=/usr/lib/erlang/lib make install

Don't forget to use 'gmake' if you're *bsd follower.

It doesn't need to be installed if you want to play with it first,

$ cd yalinka && make && erl -pa ebin
1> yalinka:new([{0, 1.1,2.2}]).
{ok,#Ref<0.1550450785.8781826.35212>}

Configuration

Currently there are no any configuration options.

Tests

Just to ensure everything's work as expected, try make test

Documentation

Autogenerated module documentation available online here.

Usage

The usage is actually as easy as

{ok, Reference} = yalinka:new([{0, 0.0,0.0,0.0}, {1, 1.0,1.0,1.0}]),
{ok, [{Idx, Distance}]} = yalinka:search(Reference, {0.7,0.7,0.7}, 1),

...here we're created a tree object with two points, index 0 with location (0,0,0) and index 1 with location (1,1,1). Then we ask yalinka to find the nearest point to the specified point (0.7,0.7,0.7). As a result, we should get a response pointing to the index of nearest point (1) and the relative distance (0.5).

erl -pa ebin
Erlang/OTP 20 [erts-9.3.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]

Eshell V9.3.1  (abort with ^G)
1> {ok, Reference} = yalinka:new([{0, 0.0,0.0,0.0}, {1, 1.0,1.0,1.0}]),
1> {ok, [{Idx, Distance}]} = yalinka:search(Reference, {0.7,0.7,0.7}, 1).
{ok,[{1,0.5196152422706632}]}
2>

more details

First, we need to create a tree object. This can be done by calling yalinka:new/1 with the only one parameter - a list of node specification, in form {Idx, {X, Y, Z}}, where Idx is limited to positive integer and X, Y, Z is a float(). This call will swallow all the nodes, then build a tree in internal structure and return the pointer to that structure.

At this point we can store created object with yalinka:store/2 if we need to reuse it later and we want it fast. Function yalinka:load/1 was made exactly for this purpose.

Next, we can query the engine to find the nearest node in the given tree to the given point. This is done by calling yalinka:search(Tree, {X, Y, Z}, Count), where Tree is the reference returned by yalinka:new/1 or yalinka:load/1, Count is the number of first nearest Count nodes to find and {X, Y, Z} is a point specification - a tuple with floats inside. The function call returns {ok, List}, where List is the list of tuples {Idx, Distance}. Idx is an index from node specification while the Distance is the distance between specified point and found node.

For extremely large databases it is strongly advised to split data preparation and usage stages; at the first stage the data prepared should be stored to a file using yalinka:store/2 and then, at the second stage, loaded by yalinka:load/1 into memory at the start of your application:

1> {ok, R} = yalinka:load("/home/fisher/erl/strikead/geoid/test/db/xperian").
{ok, _}
2> yalinka:size(R).
{ok,34244707}

On my local notebook it takes about 28 seconds to load 34 million points in 3d space. The beam process suddenly grows by 1.5 Gb in residential set memory =)

Naming issues

Since 'ekdtree' project already exists here on a github I choose 'yalinka' as a name for this project. Feel free to de-cypher it using 'yet another something' pattern, but actually 'yalinka' is a ukrainian word meaning 'christmas tree'. And now it is X-mas here in real world =)