Using this to predict data in the future with only partial observations available #34

Closed
atsepkov opened this issue Jun 11, 2022 · 4 comments
@atsepkov

Sorry if this is a naive question, but let's say I have a 1D dataset of 10 observations, and I want to predict the 15th datapoint without being able to measure observations 11-15, assuming the pattern/trajectory established by the first 10 observations holds. I can't pad the array, as that would skew predictions. Is it possible to use your library to perform Kalman predictions into the future without the update step after the initial set of observations?

atsepkov changed the title from "Using this to predict data in the future without access to observations" to "Using this to predict data in the future with only partial observations available" on Jun 11, 2022
@piercus
Owner

piercus commented Jun 11, 2022

@atsepkov yes you can do this.

The main idea is to pass a function as observation.covariance that returns a huge variance when the observation is null. You then use a null observation wherever your data is not correct.

const dynamic = yourDynamic;
const huge = 1e15;
const baseVariance = 1;
new KalmanFilter({
  dynamic,
  observation : {
    ...
    covariance: function(o){
        if(observation[0] === null){
          return [huge]
        } else {
          return [baseVariance]
        }
    }
  }
})

If you can try this and share some example code (a fiddle or just a piece of Node.js code), I can help you.

I'm thinking about creating a pre-made 'sensor-nullable' observationType, which would make this use case even easier, but it would help if you could share your code.
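
To see why a huge observation variance amounts to skipping the update step, consider the scalar correction step of a Kalman filter. The symbols are standard textbook notation and an assumption here, not names taken from the library: $\hat{x}^-$ and $P^-$ are the predicted mean and variance, $z$ is the observation, and $R$ is the observation variance.

```latex
\begin{aligned}
K &= \frac{P^-}{P^- + R}, \\
\hat{x}^+ &= \hat{x}^- + K\,(z - \hat{x}^-), \\
P^+ &= (1 - K)\,P^-,
\end{aligned}
\qquad
R \to \infty \;\Rightarrow\; K \to 0,\quad \hat{x}^+ \to \hat{x}^-,\quad P^+ \to P^-.
```

With $R = 10^{15}$ and a predicted variance of order 1, $K$ is of order $10^{-15}$, so the corrected state equals the predicted state for practical purposes, whatever placeholder value $z$ holds.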

@atsepkov
Author

atsepkov commented Jun 11, 2022

I think I understand: you're effectively flagging the entry as an outlier? I can describe my use case, but it will be hard for me to show my code without extracting the data into a one-off example, as this is a small subset of a larger project I've been working on for a couple of years.

I'm basically smoothing out/predicting trends based on data I've got from a few different sources (CENSUS, IRS, FBI). Let's say, for example, we're looking at crime trends for a given region based on historical FBI trends. At this point I've already averaged it into a single per-year entry (although perhaps it's possible to do something smarter by using a multi-dimensional filter that factors in correlations between different sources and/or neighboring regions - I just don't understand the full capability of Kalman to make use of these yet).

For regions with larger populations, the trends don't vary much, so my naive extrapolation logic worked OK. But for more rural regions, the data can be all over the place from year to year (i.e. no crime at all in certain years, and a single crime affects these much more due to the small population). What I have is effectively a list of readings from the period when digitized data is available (around a 20-year timeframe for most of these sources). This data also tends to lag behind by a few years by the time the agencies digitize it, and I'm trying to basically extrapolate it to today.

So let's say I grabbed the crime data for a given region and I'm trying to extrapolate it a few years into the future. I've tried the approach you describe above, but seem to be having trouble due to unfamiliarity with your library. Here is what I have so far:

const {KalmanFilter} = require('kalman-filter');

let dataset = [0,0,0,0,16.1,0,0,30.9,0,0,0,0,26.1,null,null] // this is a "index" metric based on crime/population, not individual crimes
let baseVariance = 1;
let huge = 1e15;
let kf = new KalmanFilter({
  // do I need a dynamic? based on ones I see in the lib, it seems like constant-acceleration
  // would make the most sense but it requires projection to be defined?
  observation : {
    dimension: 1, // without this I was getting a dimension error
    covariance: function (o) {
        if (o[0] === null){
          return [huge]
        } else {
          return [baseVariance]
        }
    }
  }
})

const res = kf.filterAll(dataset);
console.log(dataset, res);

I get the following error when I attempt to run the above, which seems to imply that reduce isn't returning an array:

./node_modules/kalman-filter/lib/utils/check-matrix.js:4
        if (matrix.reduce((a, b) => a.concat(b)).filter(a => Number.isNaN(a)).length > 0) {
                                                 ^
TypeError: matrix.reduce(...).filter is not a function

piercus added a commit that referenced this issue Jun 13, 2022
@piercus
Owner

piercus commented Jun 13, 2022

@atsepkov

Yes, you are right: the matrix should be a 2-dimensional array, and if you are using null, observations should be 2-dimensional arrays too.

Corresponding unit tests are in a226029
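
The error message follows from how `reduce` behaves without an initial value. Here is a small sketch, independent of the library, that reuses the flattening expression from the stack trace (the `flatten` name is made up for illustration):

```javascript
// Same flattening expression as in the check-matrix.js line of the stack trace.
const flatten = matrix => matrix.reduce((a, b) => a.concat(b));

// 2D matrix with one row: reduce returns that row, an array, so .filter exists.
const fromMatrix = flatten([[1e15]]);

// 1D array with one number: reduce returns the number itself without ever
// calling the callback, and numbers have no .filter method.
const fromFlatArray = flatten([1e15]);

console.log(Array.isArray(fromMatrix), typeof fromFlatArray); // true number
```

This is why returning `[huge]` from the covariance function fails while `[[huge]]` passes the check.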

Here is the corrected version:

	const {KalmanFilter} = require('kalman-filter');

	// Each observation here should be an array, which allows multi-dimensional observations.
	// This is not necessary when using numbers only, but with null we are kind of using a trick
	// that requires us to be more explicit about observation formatting.
	let dataset = [0,0,0,0,16.1,0,0,30.9,0,0,0,0,26.1,null,null].map(a => [a]);
	let baseVariance = 1;
	let huge = 1e15;
	let kf = new KalmanFilter({
	  observation : {
	    dimension: 1, 
	    covariance: function (o) {
	        if (o.observation[0][0] === null){ // each observation here is a column matrix because kalman filter is a matrix library, so [0] is formatted as [[0]]
	          return [[huge]]
	        } else {
	          return [[baseVariance]]
	        }
	    }
	  }
	})

	const res = kf.filterAll(dataset);
	console.log(dataset, res);
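
As a rough illustration of what happens at the trailing null entries, here is a hand-rolled scalar constant-position filter. It does not use the library and is not its implementation. `scalarFilter`, `processVariance` and the initial variance of 1e6 are assumptions chosen for this sketch.

```javascript
// Minimal scalar Kalman filter with a constant-position dynamic (illustrative only).
function scalarFilter(observations, {processVariance = 1, baseVariance = 1, huge = 1e15} = {}) {
  let mean = 0;
  let variance = 1e6; // vague initial guess (assumption)

  return observations.map(z => {
    // Predict: constant position keeps the mean, but uncertainty grows.
    variance += processVariance;

    // Correct: a null observation gets a huge variance, so the gain is ~0.
    const observationVariance = z === null ? huge : baseVariance;
    const value = z === null ? 0 : z; // placeholder value, effectively ignored
    const gain = variance / (variance + observationVariance);
    mean += gain * (value - mean);
    variance *= 1 - gain;
    return mean;
  });
}

const dataset = [0, 0, 0, 0, 16.1, 0, 0, 30.9, 0, 0, 0, 0, 26.1, null, null];
const estimates = scalarFilter(dataset);
console.log(estimates.slice(-3));
```

The last three estimates should agree to within floating-point noise: with a constant-position dynamic, the two null steps simply carry the last corrected value forward.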

> do I need a dynamic? based on ones I see in the lib, it seems like constant-acceleration

I feel that constant-acceleration is dangerous. If you do not specify a dynamic, constant-position is the default one, which means that by default value[t+1] ~ value[t].

If you want to extrapolate, constant-speed is a good choice; it basically assumes that value[t+1] ~ value[t] + (value[t]-value[t-1]).

But the more complex your filter is, the harder it will be to fine-tune it and get good results, so I would suggest avoiding constant-acceleration in your situation.
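
The difference between the two assumptions can be sketched without the library. The helper names below are hypothetical and only apply the two formulas above to the last known values. A real filter would also weigh in the noise.

```javascript
// Constant position: value[t+1] ~ value[t], so the forecast is flat.
function extrapolateConstantPosition(values, steps) {
  const last = values[values.length - 1];
  return Array.from({length: steps}, () => last);
}

// Constant speed: value[t+1] ~ value[t] + (value[t] - value[t-1]),
// so the forecast continues the last observed slope.
function extrapolateConstantSpeed(values, steps) {
  const last = values[values.length - 1];
  const speed = last - values[values.length - 2];
  return Array.from({length: steps}, (_, i) => last + speed * (i + 1));
}

const history = [10, 12, 14];
const flat = extrapolateConstantPosition(history, 3); // [14, 14, 14]
const trend = extrapolateConstantSpeed(history, 3); // [16, 18, 20]
console.log(flat, trend);
```

On noisy series with many zeros, the slope between the last two points can be misleading, which is one reason to let the filter estimate the speed instead of using raw differences.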

> although perhaps it's possible to do something smarter by using a multi-dimensional filter that factors in correlations between different sources and/or neighboring regions - I just don't understand the full capability of Kalman to make use of these yet

Yes, you could easily extend this and use multi-dimensional arrays.

Let's say you have 2 different sources; you could do something like:

	const {KalmanFilter} = require('kalman-filter');
	const {diag} = require('kalman-filter').linalgebra; // only available for now in the branch issue-34

	const dataset = [
		[22, null],
		[25, null],
		[4, 4],
		[4, 4],
		[22, 5],
		[null, null],
		[34, 45]
	];

	const baseVariance = 1;
	const huge = 1e15;
	const kf = new KalmanFilter({
		observation: {
			stateProjection: [[1], [1]], // each of the 2 measurements is projected onto the same 'state' dynamic axis. This also gives the observation dimension (2) and the dynamic dimension (1)
			covariance(o) {
				const variances = o.observation.map(a => {
					if (a[0] === null) {
						return huge;
					}

					return baseVariance;
				});
				return diag(variances);
			}
		}
	});

	const response = kf.filterAll(dataset);
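
If the `linalgebra` export is not available in the version you have installed, the diagonal covariance can be built by hand. This is a minimal sketch: `diag` and `observationCovariance` are local helpers written for this example, not library functions, and the input is assumed to be the column-matrix form used above (one `[value]` row per source).

```javascript
// Local stand-in for a diag helper: square matrix with `values` on the diagonal.
function diag(values) {
  return values.map((value, i) => values.map((_, j) => (i === j ? value : 0)));
}

// observation is a column matrix, e.g. [[22], [null]] for two sources.
// A missing source gets a huge variance, the other keeps the base variance.
function observationCovariance(observation, baseVariance = 1, huge = 1e15) {
  return diag(observation.map(row => (row[0] === null ? huge : baseVariance)));
}

const covariance = observationCovariance([[22], [null]]);
console.log(covariance); // [[1, 0], [0, 1e15]]
```

Because the off-diagonal terms are zero, each source is down-weighted independently: a row with one null still lets the other source correct the state.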

@piercus piercus reopened this Jun 13, 2022
github-actions bot pushed a commit that referenced this issue Jun 13, 2022
# [1.10.0](v1.9.4...v1.10.0) (2022-06-13)

### Bug Fixes

* [#34](#34) unit tests ([64c87d1](64c87d1))
* add 1D and 2D tests for [#34](#34) ([a226029](a226029))

### Features

* expose linalgebra ([fdc5b14](fdc5b14))
@piercus
Owner

piercus commented Jun 13, 2022

🎉 This issue has been resolved in version 1.10.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@piercus piercus closed this as completed Jun 16, 2022
github-actions bot pushed a commit to psantos9/kalman-filter that referenced this issue Nov 4, 2024