
Where functionality in xarray including else case (dask compatibility) #1604

Closed
rpnaut opened this issue Oct 4, 2017 · 8 comments

Comments

@rpnaut

rpnaut commented Oct 4, 2017

I need flexibility to compute different types of skill scores with xarray. With the attached code in mind (a method for computing a modified mean squared error skill score, "AVSS"), I am struggling with the following problems:

  1. I want to keep the code user-friendly so that my program can be extended to other skill scores. The middle part of the attached method, which uses the if-then-else statement, should therefore be moved into a separate function.
  2. Skill scores take three input datasets: self.DSref = observations, self.DSrefmod = reference model, self.DSproof = model to evaluate. I have to combine all three with simple arithmetic (subtraction), but xarray does not allow simple arithmetic when the coordinates of the three datasets differ slightly (including when the data type of the coordinates differs, e.g. float64 versus float). My ugly workaround is to loop over all variables I want to evaluate and, for each variable: a) create a new dataset "DSnew" from the dataset variable "self.DSproof[varnsproof]"; b) rename the variable in "DSnew" to the name I want for the evaluation result (e.g. bias of temperature or skill score of temperature); c) create helper variables such as "DSnew['MSE_p1']" by copying; d) overwrite the data of those helper variables with the parts of the skill score that are invariant to temporal aggregation; e) apply grouping and resampling to compute climate statistics such as monthly means or daily cycles; and f) perform the final mathematical operation of the skill score, which has to be done after temporal aggregation. Is there a better way to handle these operations, one that avoids creating new datasets, copying variables, and the outer loop over the variables? What would your short code for this problem look like?
  3. The where functionality is sometimes needed to compute skill scores. I have used numpy's where function, but as I read in the xarray documentation, explicitly calling numpy functions is not compatible with dask arrays. Is there an analogue in the xarray package?

import numpy as np
import xarray

# Excerpt from a class method: self, GeneralUtils, dim_time, aggregtime and
# varns_result are defined elsewhere in the program.

def squarefunc(x):
    return xarray.ufuncs.square(x)

def AVSS_def(x):
    # varnsres is the result-variable name set by the loop below
    AVSS_p1 = x["MSE_p1"] / x["MSE_p2"] * (-1.0) + 1.0
    AVSS_p2 = x["MSE_p2"] / x["MSE_p1"] - 1.0
    x[varnsres].data = np.where((x["MSE_p2"] - x["MSE_p1"]) > 0, AVSS_p1, AVSS_p2)
    return x

endresult = xarray.Dataset()
for varnsrefmod, varnsproof, varnsref, varnsres in zip(self.varns_refmod, self.varns_proof, self.varns_ref, varns_result):
    # a) + b) new dataset holding the variable under its result name
    DSnew = xarray.merge([xarray.Dataset(), self.DSproof[varnsproof]])
    DSnew = DSnew.rename({varnsproof: varnsres})
    # c) + d) helper variables: squared errors of the model and the reference model
    DSnew["MSE_p1"] = DSnew[varnsres].copy()
    DSnew["MSE_p2"] = DSnew[varnsres].copy()
    DSnew["MSE_p1"].data = squarefunc(self.DSproof[varnsproof].data - self.DSref[varnsref].data)
    DSnew["MSE_p2"].data = squarefunc(self.DSrefmod[varnsrefmod].data - self.DSref[varnsref].data)
    coordtime = GeneralUtils.FromDimList2Pyxarray(dim_time[varnsref])
    # e) temporal aggregation
    if aggregtime == 'fullperiod':
        DSnew = DSnew.mean(coordtime)
        self.RepairTime.update({'Needed': False})
    elif aggregtime == '-':
        # no aggregation requested
        self.RepairTime.update({'Needed': False})
    elif "overyears" in aggregtime:
        grpby_method = GeneralUtils.ConvertAggregationKey2XRgroupby(aggregtime)
        DSnew = DSnew.groupby(coordtime + '.' + grpby_method).mean(coordtime)
        self.RepairTime.update({'Needed': True})
        self.RepairTime.update({'start': self.DSref[coordtime].data[0]})
        self.RepairTime.update({'end': self.DSref[coordtime].data[-1]})
    else:
        resamplefreq = GeneralUtils.ConvertAggregationKey2Resample(aggregtime)
        DSnew = DSnew.resample(resamplefreq, dim=coordtime, how='mean')
        self.RepairTime.update({'Needed': False})
    # f) final skill-score operation, applied after temporal aggregation
    AVSS_def(DSnew)
    self.Update_Attributes(Datasetobj=DSnew, variable=varnsres, stdname=varnsres,
                           units=self.DSref[varnsref].attrs['units'],
                           longname="temporal AVSS of " + self.DSref[varnsref].attrs['long_name'])
    # collect the result of every variable, not only the last one
    endresult = xarray.merge([endresult, DSnew])

@jhamman
Member

jhamman commented Oct 4, 2017

I think the 3-argument version of where implemented in #1496 will suit your purposes. This is currently in the development branch of xarray and will be part of the next 0.10 release.
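
As a rough illustration of how the 3-argument where could replace the np.where call in AVSS_def, here is a minimal sketch. The function name avss and the sample values are made up for this example; only xarray.where itself comes from the thread, and it needs xarray 0.10 or later.

```python
import numpy as np
import xarray as xr


def avss(mse_model, mse_reference):
    """Modified MSE skill score with an explicit else branch (hypothetical helper)."""
    # True where the evaluated model beats the reference model
    better = (mse_reference - mse_model) > 0
    score_if_better = 1.0 - mse_model / mse_reference
    score_if_worse = mse_reference / mse_model - 1.0
    # if / then / else, evaluated element-wise on labelled arrays
    return xr.where(better, score_if_better, score_if_worse)


# Made-up mean squared errors at three grid points
mse_model = xr.DataArray([1.0, 4.0, 2.0], dims="x")
mse_reference = xr.DataArray([2.0, 2.0, 8.0], dims="x")

result = avss(mse_model, mse_reference)
print(result.values)
```

Because the inputs and the result are DataArrays, there is no need to assign into the .data attribute, and the function can be applied after any temporal aggregation.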

@rpnaut
Author

rpnaut commented Oct 11, 2017

Thank you very much, jhamman, for pointing me to #1496. I would really like that feature.

Hopefully, I will also find a way in my script to overcome the problem with simple arithmetic operators on Datasets or DataArrays. I do not like always having to access only the underlying data (numpy array) instead of the Dataset or DataArray.

@jhamman
Member

jhamman commented Oct 11, 2017

@rpnaut - You probably don't need to be operating on the data attribute as you are above. However, it's not entirely clear what you're trying to do since we're missing some of the scope in your workflow.

If you can create a simpler example of what you're trying to do, I think your use case would make a good StackOverflow question.
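
One possible way to avoid the data attribute when coordinates differ only slightly is sketched below. It assumes the mismatch comes from a float32 versus float64 coordinate and that both arrays describe the same grid; the variable names, values, and tolerance are invented for the example. Plain arithmetic aligns on exactly matching labels, so snapping one array onto the other's coordinates first (here with reindex_like and a nearest-neighbour tolerance) lets the subtraction work on the labelled objects.

```python
import numpy as np
import xarray as xr

# Same grid, but the model stores its coordinate as float32 (assumed scenario)
lat = np.array([0.1, 0.2, 0.3])
obs = xr.DataArray([1.0, 2.0, 3.0], coords={"lat": lat}, dims="lat")
model = xr.DataArray(
    [1.5, 2.5, 2.0], coords={"lat": lat.astype("float32")}, dims="lat"
)

# Snap the model onto the observation coordinates; labels further away
# than the tolerance would become NaN instead of being matched.
model_on_obs = model.reindex_like(obs, method="nearest", tolerance=1e-5)

# Arithmetic now works directly on the DataArrays, without touching .data
squared_error = (model_on_obs - obs) ** 2
print(squared_error)
```

The tolerance should be chosen well below the grid spacing so that genuinely different grid points are never paired.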

@jhamman
Member

jhamman commented Nov 1, 2017

@rpnaut - any update here or should we close this?

@Zac-HD
Contributor

Zac-HD commented Dec 14, 2017

Closed by #1496, I think.

@shoyer
Member

shoyer commented Dec 14, 2017

Yes, xarray.where() is available in the v0.10 release.

shoyer closed this as completed Dec 14, 2017
@rpnaut
Author

rpnaut commented Jun 13, 2018

The where operator only allows an 'if-then' construct, not an 'if-then-else' construct. I cannot explicitly specify which values to write at those places where the condition is not fulfilled; they automatically become 'NA'. This costs a lot of computation time and a lot of memory.

@shoyer
Member

shoyer commented Jun 13, 2018

Use the xarray.where function. It supports the full ternary if/then/else.
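
To make the difference concrete, here is a small sketch with made-up values. It contrasts the one-argument where method, which fills with NaN, against the xarray.where function with an explicit else value. The last line uses the optional second argument of the where method, which as far as I know is also accepted from v0.10 on.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([-2.0, -1.0, 1.0, 2.0], dims="x")

# if / then only: positions where the condition is False become NaN
masked = da.where(da > 0)

# if / then / else: the third argument supplies the else value
clipped = xr.where(da > 0, da, 0.0)

# Method form with an explicit fill value (second argument "other")
also_clipped = da.where(da > 0, 0.0)

print(masked.values)
print(clipped.values)
```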
