Skip to content

Latest commit

 

History

History
268 lines (177 loc) · 10.3 KB

exclusive-ranges.md

File metadata and controls

268 lines (177 loc) · 10.3 KB

Exclusive Ranges and Range Interface

  • Type: Standard Library API proposal
  • Author: Thomas Nield
  • Contributors: Thomas Nield, Burkhard Mittelbach
  • Status: Submitted
  • Prototype: Not started

Summary

After doing some substantial exploration using Kotlin for statistics and stochastic optimization, I think there are opportunties to take advantage of a better implementation for ranges, and be able to support an until infix operator implementation for Double and Float.

Kotlin's stdlib has an implementation for ClosedRange, but not OpenRange, OpenClosedRange, and ClosedOpenRange. I believe the latter items need to be implemented at least for continuous Double and Float ranges, where the exclusive start/end point cannot be achieved discretely with a ClosedRange.

The ClosedRange, OpenRange, OpenClosedRange, and ClosedOpenRange should also share a common Range parent, so they all can be mixed together in a List<Range> (i.e. histograms or probability density functions).

Also proposed is deprecating the start property in ClosedRange for startInclusive. The Range interface would also benefit from having upperBound and lowerBound properties that do not define a specific inclusivity/exclusivity behavior.

Similar API review

Here are some other comparable API's on other platforms:

Google Guava- This is probably the best source for inspiration, and has several interesting ideas for further range functionalities down the road.

Python - Surprisingly, Python does not support float ranges but makes it easy to implement yourself via generators.

R - Supports ranges as sequencies in a similiar manner as Kotlin progressions, but with float/double support.

Use cases

Binning and Bucketing Continuous Ranges

Discretization of continuous features is a common mathematical operation. This task comes up in mathematical modeling, basic statistics, probability, and machine learning.

For instance, I may bin Sale objects on their price into interval buckets of size 20.0.

import java.time.LocalDate

fun main(args: Array<String>) {

    data class Sale(val accountId: Int, val date: LocalDate, val price: Double)

    val sales = listOf(
            Sale(1, LocalDate.of(2016,12,3), 180.0),
            Sale(2, LocalDate.of(2016, 7, 4), 140.2),
            Sale(3, LocalDate.of(2016, 6, 3), 111.4),
            Sale(4, LocalDate.of(2016, 1, 5), 192.7),
            Sale(5, LocalDate.of(2016, 5, 4), 137.9),
            Sale(6, LocalDate.of(2016, 3, 6), 125.6),
            Sale(7, LocalDate.of(2016, 12,4), 164.3),
            Sale(8, LocalDate.of(2016, 7,11), 144.2)
            )

    //bin by double ranges
    val binned = sales.binByDouble(
            valueSelector = { it.price },
            binSize = 20.0,
            rangeStart = 100.0
    )
	
val ranges = binned.ranges // should return endExclusive ranges

    ranges.forEach(::println) 
	
	/*
	OUTPUT:
	100..<120
	120..<140
	140..<160
	160..<180
	180..<200
	*/
}

I can also define my own ranges for a continuous histogram of values, and putting an until between two Double or Float values will conveniently return an ClosedOpenRange. I should have the option of putting in a ClosedRange so the last bin can capture the final end boundary.

val histogramBins = listOf(
		0.0 until 0.2,
		0.2 until 0.4,
		0.4 until 0.6,
		0.6 until 0.9,
		0.9..1.0 
)

To support a collection having both ClosedRange and ClosedOpenRange types, extracting a common Range interface might be necessary (with contains(), isEmpty(), lowerBound, and upperBound functions and properties).

Probability and Weighted Sampling

Another use case is random sampling with a probability density function in some form. While it is unlikely the end/start of each continuous range will be selected in a random sampling, it is still not kosher for those points to be inclusive and overlap on each other.

Below is an implementation of a WeightedDice that takes T sides with an associated probability. It uses an OpenEndDoubleRange with an endExclusive. While it might be probabilistically negligible, there is no chance a random Double will belong to two ranges because it falls on a border. Also, even if a ClosedRange is not doing damage to fair sampling, it is still misleading especially if those ranges are exposed via the API.

/**
 *  Assigns a probabilty to each distinct `T` item, and randomly selects `T` values given those probabilities.
 *  
 *  In other words, this is a Probability Density Function (PDF) for discrete `T` values
 */
class WeightedDice<T>(val probabilities: Map<T,Double>) {

    constructor(vararg values: Pair<T, Double>): this(
            values.toMap()
    )

    private val sum = probabilities.values.sum()

    val rangedDistribution = probabilities.let {

        var binStart = 0.0

        it.asSequence().sortedBy { it.value }
                .map { it.key to (binStart until (it.value + binStart)) }
                .onEach { binStart = it.second.endExclusive }
                .toMap()
    }

    /**
     * Randomly selects a `T` value with probability
     */
    fun roll() = ThreadLocalRandom.current().nextDouble(0.0, sum).let {
        rangedDistribution.asIterable().first { rng -> it in rng.value }.key
    }
}

Alternatives

The alternative would be for the Kotlin-Statistics library to continue making its own Range implementations. This is not desirable because this would make its ranges incompatible with stdlib's, and it does not get the language support with operators like .. and until.

Dependencies

What are the dependencies of the proposed API:

None

Placement

package kotlin.ranges

Reference implementation

You can find implementations here in Kotlin-Statistics.

Unresolved questions

If we were to extract a Range parent for ClosedRange, OpenRange, OpenClosedRange, and ClosedOpenRange, what should it contain?

Here is one proposed implementation: a lowerBound and upperBound should be defined to generalize the start and end, but not indicate whether they are inclusive or exclusive. This allows a List<Range> to still have access to the start and end values, regardless if they are inclusive or exclusive.

/**
 * A `Range` is an abstract interface common to all ranges, regardless of their inclusive or exclusive nature
 */
public interface Range<T: Comparable<T>> {

    /**
     * The minimum value in the range, regardless if it is inclusive or exclusive
     */
    public val lowerBound: T
    /**
     * The maximum value in the range, regardless if it is inclusive or exclusive
     */
    public val upperBound: T

    /**
     * Checks whether the specified [value] belongs to the range.
     */
    public operator fun contains(value: T): Boolean

    /**
     * Checks whether the range is empty.
     */
    public fun isEmpty(): Boolean
}

The child implementations can still have their own properties such as startInclusive endExclusive, endInclusive, endExclusive, and startExclusive, but they should have the same values as their respective lowerBound and upperBound counterparts. This also begs the question if start should be deprecated and explicitly be labeled startInclusive or startExclusive.

public interface ClosedOpenRange<T: Comparable<T>>: Range<T> {
    /**
     * The minimum value in the range.
     */
    public val startInclusive: T get() = lowerBound

    /**
     * The maximum value in the range (exclusive).
     */
    public val endExclusive: T get() = upperBound

    /**
     * Checks whether the specified [value] belongs to the range.
     */
    override operator fun contains(value: T): Boolean = value >= start && value < endExclusive

    /**
     * Checks whether the range is empty.
     */
    override fun isEmpty(): Boolean = start > endExclusive
}



public interface ClosedRange<T: Comparable<T>>: Range<T> {
    /**
     * The minimum value in the range.
     */
    public val startInclusive: T get() = lowerBound

    /**
     * The maximum value in the range (inclusive).
     */
    public val endInclusive: T get() = upperBound

    /**
     * Checks whether the specified [value] belongs to the range.
     */
    override operator fun contains(value: T): Boolean = value >= start && value <= endInclusive

    /**
     * Checks whether the range is empty.
     */
    override fun isEmpty(): Boolean = start > endInclusive
}

Future advancements

The Range interface maybe can have additional properties describing inclusivity/exclusivity, such as isStartInclusive. This may add clutter so we should consider this augmentation carefully.

It might also be beneficial to explore progressions with continuous numeric types, so expressions like 10.0 downTo 2.0 step 0.5 can be used. This can have a broad range of use cases, including temperature schedules for simulated annealing.


Naming

Considering there is a ClosedRange, the additional types OpenRange, ClosedOpenRange, and OpenClosedRange follow a stdlib-like convention.

Contracts

For the OpenRange and ClosedOpenRange implementations, the lowerBound must be less than the upperBound. Otherwise the range should be empty.

For the OpenClosedRange implementations, the lowerBound must be less than or equal to the upperBound. Otherwise the range should be empty.

Compatibility impact

The usage of start in ClosedRange may need to be deprecated in favor of explicit startInclusive and startExclusive.