- Type: Standard Library API proposal
- Author: Thomas Nield
- Contributors: Thomas Nield, Burkhard Mittelbach
- Status: Submitted
- Prototype: Not started
After doing some substantial exploration using Kotlin for statistics and stochastic optimization, I think there are opportunties to take advantage of a better implementation for ranges, and be able to support an until
infix operator implementation for Double
and Float
.
Kotlin's stdlib has an implementation for ClosedRange
, but not OpenRange
, OpenClosedRange
, and ClosedOpenRange
. I believe the latter items need to be implemented at least for continuous Double
and Float
ranges, where the exclusive start/end point cannot be achieved discretely with a ClosedRange
.
The ClosedRange
, OpenRange
, OpenClosedRange
, and ClosedOpenRange
should also share a common Range
parent, so they all can be mixed together in a List<Range>
(i.e. histograms or probability density functions).
Also proposed is deprecating the start
property in ClosedRange
for startInclusive
. The Range
interface would also benefit from having upperBound
and lowerBound
properties that do not define a specific inclusivity/exclusivity behavior.
Here are some other comparable API's on other platforms:
Google Guava- This is probably the best source for inspiration, and has several interesting ideas for further range functionalities down the road.
Python - Surprisingly, Python does not support float ranges but makes it easy to implement yourself via generators.
R - Supports ranges as sequencies in a similiar manner as Kotlin progressions, but with float/double support.
Discretization of continuous features is a common mathematical operation. This task comes up in mathematical modeling, basic statistics, probability, and machine learning.
For instance, I may bin Sale
objects on their price
into interval buckets of size 20.0
.
import java.time.LocalDate
fun main(args: Array<String>) {
data class Sale(val accountId: Int, val date: LocalDate, val price: Double)
val sales = listOf(
Sale(1, LocalDate.of(2016,12,3), 180.0),
Sale(2, LocalDate.of(2016, 7, 4), 140.2),
Sale(3, LocalDate.of(2016, 6, 3), 111.4),
Sale(4, LocalDate.of(2016, 1, 5), 192.7),
Sale(5, LocalDate.of(2016, 5, 4), 137.9),
Sale(6, LocalDate.of(2016, 3, 6), 125.6),
Sale(7, LocalDate.of(2016, 12,4), 164.3),
Sale(8, LocalDate.of(2016, 7,11), 144.2)
)
//bin by double ranges
val binned = sales.binByDouble(
valueSelector = { it.price },
binSize = 20.0,
rangeStart = 100.0
)
val ranges = binned.ranges // should return endExclusive ranges
ranges.forEach(::println)
/*
OUTPUT:
100..<120
120..<140
140..<160
160..<180
180..<200
*/
}
I can also define my own ranges for a continuous histogram of values, and putting an until
between two Double
or Float
values will conveniently return an ClosedOpenRange
. I should have the option of putting in a ClosedRange
so the last bin can capture the final end boundary.
val histogramBins = listOf(
0.0 until 0.2,
0.2 until 0.4,
0.4 until 0.6,
0.6 until 0.9,
0.9..1.0
)
To support a collection having both ClosedRange
and ClosedOpenRange
types, extracting a common Range
interface might be necessary (with contains()
, isEmpty()
, lowerBound
, and upperBound
functions and properties).
Another use case is random sampling with a probability density function in some form. While it is unlikely the end/start of each continuous range will be selected in a random sampling, it is still not kosher for those points to be inclusive and overlap on each other.
Below is an implementation of a WeightedDice
that takes T
sides with an associated probability. It uses an OpenEndDoubleRange
with an endExclusive
. While it might be probabilistically negligible, there is no chance a random Double
will belong to two ranges because it falls on a border. Also, even if a ClosedRange
is not doing damage to fair sampling, it is still misleading especially if those ranges are exposed via the API.
/**
* Assigns a probabilty to each distinct `T` item, and randomly selects `T` values given those probabilities.
*
* In other words, this is a Probability Density Function (PDF) for discrete `T` values
*/
class WeightedDice<T>(val probabilities: Map<T,Double>) {
constructor(vararg values: Pair<T, Double>): this(
values.toMap()
)
private val sum = probabilities.values.sum()
val rangedDistribution = probabilities.let {
var binStart = 0.0
it.asSequence().sortedBy { it.value }
.map { it.key to (binStart until (it.value + binStart)) }
.onEach { binStart = it.second.endExclusive }
.toMap()
}
/**
* Randomly selects a `T` value with probability
*/
fun roll() = ThreadLocalRandom.current().nextDouble(0.0, sum).let {
rangedDistribution.asIterable().first { rng -> it in rng.value }.key
}
}
The alternative would be for the Kotlin-Statistics library to continue making its own Range
implementations. This is not desirable because this would make its ranges incompatible with stdlib's, and it does not get the language support with operators like ..
and until
.
What are the dependencies of the proposed API:
None
package kotlin.ranges
You can find implementations here in Kotlin-Statistics.
If we were to extract a Range
parent for ClosedRange
, OpenRange
, OpenClosedRange
, and ClosedOpenRange
, what should it contain?
Here is one proposed implementation: a lowerBound
and upperBound
should be defined to generalize the start and end, but not indicate whether they are inclusive or exclusive. This allows a List<Range>
to still have access to the start and end values, regardless if they are inclusive or exclusive.
/**
* A `Range` is an abstract interface common to all ranges, regardless of their inclusive or exclusive nature
*/
public interface Range<T: Comparable<T>> {
/**
* The minimum value in the range, regardless if it is inclusive or exclusive
*/
public val lowerBound: T
/**
* The maximum value in the range, regardless if it is inclusive or exclusive
*/
public val upperBound: T
/**
* Checks whether the specified [value] belongs to the range.
*/
public operator fun contains(value: T): Boolean
/**
* Checks whether the range is empty.
*/
public fun isEmpty(): Boolean
}
The child implementations can still have their own properties such as startInclusive
endExclusive
, endInclusive
, endExclusive
, and startExclusive
, but they should have the same values as their respective lowerBound
and upperBound
counterparts. This also begs the question if start
should be deprecated and explicitly be labeled startInclusive
or startExclusive
.
public interface ClosedOpenRange<T: Comparable<T>>: Range<T> {
/**
* The minimum value in the range.
*/
public val startInclusive: T get() = lowerBound
/**
* The maximum value in the range (exclusive).
*/
public val endExclusive: T get() = upperBound
/**
* Checks whether the specified [value] belongs to the range.
*/
override operator fun contains(value: T): Boolean = value >= start && value < endExclusive
/**
* Checks whether the range is empty.
*/
override fun isEmpty(): Boolean = start > endExclusive
}
public interface ClosedRange<T: Comparable<T>>: Range<T> {
/**
* The minimum value in the range.
*/
public val startInclusive: T get() = lowerBound
/**
* The maximum value in the range (inclusive).
*/
public val endInclusive: T get() = upperBound
/**
* Checks whether the specified [value] belongs to the range.
*/
override operator fun contains(value: T): Boolean = value >= start && value <= endInclusive
/**
* Checks whether the range is empty.
*/
override fun isEmpty(): Boolean = start > endInclusive
}
The Range
interface maybe can have additional properties describing inclusivity/exclusivity, such as isStartInclusive
. This may add clutter so we should consider this augmentation carefully.
It might also be beneficial to explore progressions with continuous numeric types, so expressions like 10.0 downTo 2.0 step 0.5
can be used. This can have a broad range of use cases, including temperature schedules for simulated annealing.
Considering there is a ClosedRange
, the additional types OpenRange
, ClosedOpenRange
, and OpenClosedRange
follow a stdlib-like convention.
For the OpenRange
and ClosedOpenRange
implementations, the lowerBound
must be less than the upperBound
. Otherwise the range should be empty.
For the OpenClosedRange
implementations, the lowerBound
must be less than or equal to the upperBound
. Otherwise the range should be empty.
The usage of start
in ClosedRange
may need to be deprecated in favor of explicit startInclusive
and startExclusive
.