Median of a List of Lists (with no concatenation)

"CPU Heat Sink" by Michael Dziedzic is licenced under Unsplash License

Description

This kata focuses on practicing TDD with algorithms.

TDD does not 'magically' generate non-trivial algorithms.

TDD can help in a different way though. You can use it to validate that your algorithm works, and then use the tests to refactor your algorithm.

Here is the process (roughly):

Think about the algorithm that will solve your problem.
Think of an order of tests that will allow you to build this algorithm incrementally.
Use TDD to write a crude algorithm that solves your problem, and that can later be re-factored into our target algorithm. This usually involves laying down the main blocks we will need, while using very low performance algorithms as much as possible.
With the tests in place, refactor the algorithm to what we want.

Part 1 - Solving the Problem

Problem Description

We have a cluster of machines that each holds a very large list of integers. We want to get the median of all these numbers, but we cannot bring all the numbers on a single machine.

The central question is: how to compute the median value of a list of lists without concatenating all the lists? (A Median value does not need to be an element of the list, and can be a float value even if the lists only contain integers).

Possible Solution

The definition of a median is that there are as many values that are greater as values that are smaller.

A simple way to find the median, is to search for a value that fits the definition by asking for the count of greater and smaller values to all lists, and then summing.

Here is a starting skeleton of the algorithm, that leaves place for incremental optimizations:

List<List<int>> values = ...

isMedian = function(n) {
    totalCountSmaller = sum(countSmaller(n, values[0]), countSmaller(n, values[1]), countSmaller(n, values[2]), ...)
    totalCountGreater = sum(countGreater(n, values[0]), countGreater(n, values[1]), countGreater(n, values[2]), ...)
    return totalCountSmaller == totalCountGreater
}

maxValue = max(max(values[0]), max(values[1]), max(values[2]), ...)
minValue = min(min(values[0]), min(values[1]), min(values[2]), ...)

return findByPredicateBetween(minValue, maxValue, isMedian)

At first, we can implement all the bricks (min, max, findByPredicateBetween, countSmaller, countGreater) in a very basic brute force way.

Part 2 - Code Optimization

The current solution is working, but it is slow! Your task now is to reach a faster solution! Therefore, do not attempt this step before completing part 1 (with a greedy solution).

For this, we will be using two techniques:

Big(O) analysis of each function
Code profiling: For this technique, you rely on tools to help you analyze the space and time complexity of your program. Refer to the below pages for more details:
- Java
- C++
- Go
- Python
- Typescript

Detailed Steps

Here are the steps we suggest you follow:

Introduce a benchmark test
Do a big(O) analysis. For each function, compute the Big(O) (comment the result at the top of the function)
Run the profiler and compare the result to the results you got from the previous step.
Based on the analysis you've done in previous steps, think of optimizations you can do
Apply the optimizations
Re-run the profiler and analyze the output. Was there any improvement in time?
Plan further possible optimization (if any)
Repeat steps 4-7

Possible Optimizations

Use a dichotomous search
Sort the lists to speed up max and min
Use a binary search for countSmaller and countGreater
Compute countSmaller and countGreater in a single call (remember the lists are supposed to be on a different machine)
Parallelize the computations happening on the different lists (BONUS: how to abstract this to keep the unit tests deterministic and single threaded?)

Getting Started

Session Quick Retrospective

You can fill it from here

Useful Links

For this Kata

General

TCR (Test && Commit || Revert) wrapper utility
Collaborative timer for pairing or mobbing: mobti.me or agility timer

Session Information

Style & Duration

2-hour Randori in Pairs

Topic

TDD on algorithms

Focus Points

Acceptance tests
Greedy or not?
⚠ Premature optimization

Source Files

License

Kata-MedianListOfLists and the accompanying materials are made available under the terms of the MIT License which accompanies this distribution, and is available at the Open Source site

Acknowledgements

See ACKNOWLEDGEMENTS.md for more information.

Contributors

Name		Name	Last commit message	Last commit date
Latest commit History 188 Commits
.github		.github
cpp		cpp
doc		doc
go		go
images		images
java		java
python		python
tcr		tcr
typescript		typescript
.gitattributes		.gitattributes
.gitignore		.gitignore
ACKNOWLEDGEMENTS.md		ACKNOWLEDGEMENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE.md		LICENSE.md
QuickRetrospective.md		QuickRetrospective.md
README.md		README.md
pull-push.sh		pull-push.sh
watch.sh		watch.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Median of a List of Lists (with no concatenation)

Description

Part 1 - Solving the Problem

Problem Description

Possible Solution

Part 2 - Code Optimization

Detailed Steps

Possible Optimizations

Getting Started

Session Quick Retrospective

Useful Links

For this Kata

General

Session Information

Style & Duration

Topic

Focus Points

Source Files

License

Acknowledgements

Contributors

About

Contributors 5

Languages

License

murex/Kata-MedianListOfLists

Folders and files

Latest commit

History

Repository files navigation

Median of a List of Lists (with no concatenation)

Description

Part 1 - Solving the Problem

Problem Description

Possible Solution

Part 2 - Code Optimization

Detailed Steps

Possible Optimizations

Getting Started

Session Quick Retrospective

Useful Links

For this Kata

General

Session Information

Style & Duration

Topic

Focus Points

Source Files

License

Acknowledgements

Contributors

About

Topics

Resources

License

Code of conduct

Stars

Watchers

Forks

Contributors 5

Languages