pemtk.fit._filters

Module Contents

Functions

thresFits(self[, thres, dataType, key, dataDict])

Very basic threshold mask for Pandas DataFrame. Note mask = True for values < thres.

_subsetFromXS(self[, selectors, data])

Subselect from data using pandas.xs.

getFitInds(self[, selectors, key, dataDict, inds])

Get fit indexes from subselection (Pandas datasets).

filterData(self[, filterOptions, keys, dim, dTypes])

Very basic filter/mask generation function.

getDataDict(self, dim[, key, dTypes, returnType, dropna])

Return specific dataset from various dictionaries by dimension name.

pemtk.fit._filters.thresFits(self, thres=None, dataType=None, key='fits', dataDict='dfLong')

Very basic threshold mask for Pandas DataFrame. Note mask = True for values < thres.

For more sophisticated multi-param filtering, see TMO-DEV filterData. Should be applicable here for Pandas data too…?

TODO: more sophisticated methods, np or pd masked arrays?
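The thresholding convention described above (mask is True where values fall below thres) can be illustrated with plain pandas. This is a minimal sketch, not the library's implementation: the DataFrame layout and the column name "chisqr" are assumptions made for the example.

```python
import pandas as pd

# Hypothetical fit results; the index and column names are illustrative only.
df = pd.DataFrame(
    {"Fit": [0, 1, 2, 3], "chisqr": [0.5, 2.0, 0.1, 5.0]}
).set_index("Fit")

thres = 1.0

# Mask is True where the value is below the threshold.
mask = df["chisqr"] < thres

# Boolean indexing keeps only the rows that pass the threshold.
selected = df[mask]
```

Here selected holds the rows for fits 0 and 2, the only ones with values below thres.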

pemtk.fit._filters._subsetFromXS(self, selectors={}, data=None)

Subselect from data using pandas.xs.
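As a rough illustration of subselection with pandas.xs, the sketch below applies a dictionary of selectors one index level at a time. The {level: key} form of selectors and the level names "Fit" and "Type" are assumptions for this example and may not match what the function expects.

```python
import pandas as pd

# Hypothetical MultiIndex data; level names are illustrative only.
index = pd.MultiIndex.from_product(
    [[0, 1], ["amp", "phase"]], names=["Fit", "Type"]
)
df = pd.DataFrame({"value": [1.0, 0.1, 2.0, 0.2]}, index=index)

# Assumed selector form: {index level name: key to select}.
selectors = {"Type": "amp"}

# Apply one cross-section per selector, keeping all index levels.
subset = df
for level, key in selectors.items():
    subset = subset.xs(key, level=level, drop_level=False)
```

Passing drop_level=False keeps the selected level in the index, which is convenient if later steps need to look up values from it.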

pemtk.fit._filters.getFitInds(self, selectors={}, key='fits', dataDict='dfWide', inds='Fit')

Get fit indexes from subselection (Pandas datasets).
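One way to recover index values from a subselection is to take a cross-section and then read the unique values of the index level of interest. The sketch below assumes a MultiIndex with a "Fit" level (matching the default inds='Fit') plus a hypothetical "status" level; the real data layout may differ.

```python
import pandas as pd

# Hypothetical wide-form fit table; "status" is an illustrative level name.
index = pd.MultiIndex.from_tuples(
    [(0, "ok"), (1, "failed"), (2, "ok")], names=["Fit", "status"]
)
df_wide = pd.DataFrame({"chisqr": [0.5, 9.0, 0.2]}, index=index)

# Subselect, keeping all levels so "Fit" is still available afterwards.
subset = df_wide.xs("ok", level="status", drop_level=False)

# Unique values of the "Fit" level for the selected rows.
fit_inds = subset.index.get_level_values("Fit").unique().tolist()
```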

pemtk.fit._filters.filterData(self, filterOptions={}, keys=None, dim='energies', dTypes=['raw', 'metrics'])

Very basic filter/mask generation function.

filterOptions : dict, containing {dim:values} to filter on.

Singular values are matched. Pairs of values are used as ranges. For multidim parameter sets, specify which source column to use as the 3rd parameter.

keys : list, optional, default = None

Datasets to process, defaults to self.runs['proc']

dim : str, optional, default = 'energies'

Data to use as template. Not usually required, unless multidim return and/or default data is missing.

dTypes : list, optional, default = ['raw', 'metrics']

Data dicts to use for filtering. TODO: move this elsewhere!

TODO:

  • More flexibility.

  • Filter functions, e.g. saturated electron detector shots? (‘xc’ > 0).sum(axis = 1) <=1000 in this case I think.

07/12/20: added support for “metrics” data.
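The filterOptions convention (a single value is matched, a pair of values is treated as a range) could be sketched as follows. This is an illustration only: build_mask is a hypothetical helper, the data layout is made up, treating range limits as inclusive is an assumption, and the optional 3rd parameter (source column) is not handled.

```python
import numpy as np


def build_mask(data, filter_options):
    """Combine per-dimension conditions into one boolean mask (hypothetical helper)."""
    n = len(next(iter(data.values())))
    mask = np.ones(n, dtype=bool)
    for dim, values in filter_options.items():
        arr = np.asarray(data[dim])
        vals = np.atleast_1d(values)
        if vals.size == 1:
            # Single value: exact match.
            mask &= arr == vals[0]
        else:
            # Pair of values: range, assumed inclusive at both ends.
            mask &= (arr >= vals[0]) & (arr <= vals[1])
    return mask


# Illustrative data only; "shot" is not a name taken from the library.
data = {
    "energies": np.array([1.0, 2.0, 3.0, 4.0]),
    "shot": np.array([0, 1, 1, 1]),
}
mask = build_mask(data, {"energies": [2.0, 3.5], "shot": 1})
```

The resulting mask is True only for entries that satisfy every condition, so the two filters above select the second and third elements.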

pemtk.fit._filters.getDataDict(self, dim, key=None, dTypes=None, returnType='dType', dropna=False)

Return specific dataset from various dictionaries by dimension name.

dim : string

Dimension (data) to find/check.

key : string, int, optional, default = None

Run key into main data structure. If None, use the first run in self.runs['proc'].

dTypes : str, list, optional, default = self.dTypes

Data dicts to check, defaults to global settings.

returnType : str, optional, default = 'dType'
  • 'dType': return data type to use as index.

  • 'data': return data array.

  • 'lims': return min & max values only.

  • 'unique': return list of unique values.
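For orientation, the 'lims' and 'unique' options correspond to simple reductions over the data array. The NumPy sketch below shows one plausible form of each; the exact return types used by the function (tuple, list, array) are an assumption here.

```python
import numpy as np

# Illustrative data only.
data = np.array([3.0, 1.0, 2.0, 3.0])

# 'lims'-style result: minimum and maximum values only.
lims = (float(data.min()), float(data.max()))

# 'unique'-style result: sorted unique values as a list.
unique = np.unique(data).tolist()
```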

dropna : bool, optional, default = False

Drop NaNs in data? These cause issues for np.histogram with 'auto' binning. However, in the current code with basic masking, dropping them breaks the mask if array sizes become inconsistent. Better to filter out with ranges?
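The size mismatch mentioned above can be seen in a small NumPy sketch: dropping NaNs shortens the array, so a mask built for the original length no longer lines up. Folding the NaN test into the mask keeps everything the same length. The arrays here are made up for illustration, and this is one possible workaround rather than the approach taken in the library.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

# A mask built against the original, full-length array.
mask = np.array([True, False, True, True])

# Dropping NaNs first changes the length, so `mask` can no longer index it.
dropped = data[~np.isnan(data)]

# Alternative: keep the array length fixed and reject NaNs via the mask.
aligned = mask & np.isfinite(data)

# The histogram then only sees finite, selected values.
counts, edges = np.histogram(data[aligned], bins="auto")
```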

08/12/20: first attempt, to replace repeated code in various base functions, and allow for multiple types (e.g. ‘raw’, ‘metrics’ etc.)

TODO: may also want to add a datatype option to the array conversion routine, since this will otherwise default to float64 and can be memory hungry. May also want to add chunking here too.
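The memory point in the TODO above can be checked directly in NumPy: converting floating-point data without a dtype gives float64, while requesting float32 halves the storage at the cost of precision. This is only a sketch of the idea, not a proposed change to the conversion routine.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Default conversion of Python floats gives float64 (8 bytes per element).
default = np.array(values)

# Requesting float32 uses 4 bytes per element.
compact = np.asarray(values, dtype=np.float32)

default_bytes = default.nbytes
compact_bytes = compact.nbytes
```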

TO FIX: dTypes checking buggy, for multiple matched dTypes only returns last matching item.
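The behaviour described in this note (only the last matching item is returned) is typical of a loop that overwrites a single result variable. The sketch below shows that pattern next to a variant that collects every match. It is a guess at the kind of logic involved, using a made-up dictionary layout, and is not taken from the source code.

```python
# Hypothetical layout: {dType: {dimension name: data}}.
data_dicts = {
    "raw": {"energies": [1.0, 2.0]},
    "metrics": {"energies": [1.5, 2.5], "gmd": [0.1, 0.2]},
}
dim = "energies"

# Overwriting pattern: each match replaces the previous one.
last_match = None
for d_type, contents in data_dicts.items():
    if dim in contents:
        last_match = d_type

# Collecting pattern: every matching dType is kept, in insertion order.
all_matches = [d_type for d_type, contents in data_dicts.items() if dim in contents]
```

With the data above, last_match ends up as "metrics" while all_matches contains both "raw" and "metrics".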