This project is a collection of UDFs used and/or created by Lovely Systems.
<T> ArrayItemUDF(array<T> arr, int idx)
: Returns the item of arr at position idx. If idx is negative, the length of arr is added to it, so that, e.g., func(x, -1) selects the last item of x.
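The negative-index rule can be sketched in plain Java as follows. This is only an illustration, not the project's implementation: the class name ArrayItemSketch and the method name item are hypothetical, and returning null for a null list or an out-of-range index is an assumption, since the description above does not say what happens in those cases.

```java
import java.util.Arrays;
import java.util.List;

final class ArrayItemSketch {
    // Returns the element at idx; a negative idx counts from the end of the list.
    // Assumption: null lists and out-of-range indexes yield null.
    static <T> T item(List<T> arr, int idx) {
        if (arr == null) {
            return null;
        }
        // Add the length to a negative index, as described above.
        int pos = idx < 0 ? idx + arr.size() : idx;
        if (pos < 0 || pos >= arr.size()) {
            return null;
        }
        return arr.get(pos);
    }

    public static void main(String[] args) {
        List<String> x = Arrays.asList("a", "b", "c");
        System.out.println(item(x, 0));  // first item
        System.out.println(item(x, -1)); // last item
    }
}
```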
int ESHashUDF(string k)
: Computes the hash Elasticsearch uses for shard allocation for a given key.
int MemcachedUDF(string servers, string key, string value)
: Stores value under the given key on the memcached instances defined in servers.
array<string> RegexExtractAllUDF(string haystack, string pattern, int group)
: Extracts all matches of pattern from haystack and returns, for each match, the regex group at index group.
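As a rough illustration of what "all matches of a group" means, here is a minimal sketch built on java.util.regex. It is not the project's code: the class name RegexExtractAllSketch and the method name extractAll are hypothetical, and returning an empty list for null input is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexExtractAllSketch {
    // Collects the given capture group from every match of pattern in haystack.
    // Assumption: null input yields an empty list.
    static List<String> extractAll(String haystack, String pattern, int group) {
        List<String> result = new ArrayList<>();
        if (haystack == null || pattern == null) {
            return result;
        }
        Matcher m = Pattern.compile(pattern).matcher(haystack);
        while (m.find()) {
            // group 0 is the whole match, group 1 the first parenthesised group, ...
            result.add(m.group(group));
        }
        return result;
    }
}
```

With this sketch, extractAll("a=1, b=22", "(\\w)=(\\d+)", 2) yields the two values "1" and "22", while group 1 yields "a" and "b".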
int RowNumberUDF(key1, key2, ...)
: Returns the row number, starting at 1. Whenever the value of any key changes, the numbering is reset to 1.
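The reset-on-key-change behaviour can be pictured as a small stateful counter. The following is a sketch under assumptions, not the UDF itself: the class name RowNumberSketch and the method name next are hypothetical, and keys are compared with Arrays.equals.

```java
import java.util.Arrays;

final class RowNumberSketch {
    private Object[] lastKeys; // keys seen on the previous row, null before the first row
    private int counter;       // row number within the current run of equal keys

    // Returns 1 for the first row of each run of equal keys, then 2, 3, ...
    int next(Object... keys) {
        if (lastKeys == null || !Arrays.equals(lastKeys, keys)) {
            counter = 0;              // a key changed: restart the numbering
            lastKeys = keys.clone();
        }
        return ++counter;
    }
}
```

Because the counter only compares each row with the previous one, a counter like this gives a meaningful per-key numbering only when rows with equal keys arrive next to each other, for example after sorting by those keys.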
string UnescapeXMLUDF(string src)
: Unescapes the basic XML entities in src.
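For orientation, the five predefined XML entities are &lt;, &gt;, &amp;, &quot; and &apos;. The sketch below unescapes exactly those with plain string replacement. It assumes that "basic" refers to these five, and it is not the project's implementation; the class name UnescapeXmlSketch and the method name unescape are hypothetical.

```java
final class UnescapeXmlSketch {
    // Replaces the five predefined XML entities with their literal characters.
    // Assumption: null input is returned unchanged.
    static String unescape(String src) {
        if (src == null) {
            return null;
        }
        return src
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&apos;", "'")
                // &amp; goes last so that "&amp;lt;" becomes "&lt;" and not "<".
                .replace("&amp;", "&");
    }
}
```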
Long SequenceIdUDF(Long existing)
: Generates a unique sequence id per row; returns existing if it is not null. The generated sequences are also unique across multiple task trackers, because the MapReduce task id is used.
ArraySumUDF(ArrayList<ArrayList<Integer>>)
: Returns the sum of a two-dimensional ArrayList. Empty ArrayLists and null values are ignored.
ArrayMaxUDF(ArrayList<ArrayList<Integer>>)
: Returns the maximum of a two-dimensional ArrayList. Empty ArrayLists and null values are ignored.
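How ArraySumUDF and ArrayMaxUDF skip empty lists and null values can be sketched in plain Java as below. This is an illustration only: the class name NestedArraySketch and both method names are hypothetical, and the return types, the sum of 0 for input without values, and the null maximum for input without values are assumptions, since the descriptions above do not specify them.

```java
import java.util.List;

final class NestedArraySketch {
    // Sums all non-null integers of a two-dimensional list.
    // Assumption: the result is a long, and input without values sums to 0.
    static long sum(List<List<Integer>> rows) {
        long total = 0;
        if (rows == null) {
            return total;
        }
        for (List<Integer> row : rows) {
            if (row == null) {
                continue; // null inner lists are ignored
            }
            for (Integer v : row) {
                if (v != null) {
                    total += v;
                }
            }
        }
        return total;
    }

    // Returns the largest non-null integer of a two-dimensional list.
    // Assumption: null is returned when there is no value at all.
    static Integer max(List<List<Integer>> rows) {
        Integer best = null;
        if (rows == null) {
            return best;
        }
        for (List<Integer> row : rows) {
            if (row == null) {
                continue;
            }
            for (Integer v : row) {
                if (v != null && (best == null || v > best)) {
                    best = v;
                }
            }
        }
        return best;
    }
}
```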
To use this project with Maven, follow the steps described at https://github.com/lovelysystems/maven
The distributionManagement section in the pom contains the actual repository URLs on GitHub. Deploying to those URLs fails, because they are not Maven API endpoints to which Maven could upload the artifacts.
So to deploy to the Lovely Systems Maven repository, first clone https://github.com/lovelysystems/maven to your local machine and set the deployment target location on the command line like this:
mvn -DaltDeploymentRepository=snapshot-repo::default::file:../maven/snapshots clean deploy
After deployment, simply commit the changes in the maven repository project and push.
This approach was taken from the very useful blog entry at http://cemerick.com/2010/08/24/hosting-maven-repos-on-github/