pemtk.fit.fitClass module

PEMtk fitting base classes

Development version of the fitting class: takes pemtk.dataClass as a base and adds some subselection & fitting methods.

10/05/21 v1b added fitting routines using ePSproc and lmfit libraries.

Adapted from a dev code notebook (functional forms); still needs some tidying up & wrapping for the class. For dev code see http://127.0.0.1:8888/lab/tree/dev/PEMtk/fitting/fitting_routines_dev_PEMtk_300421.ipynb

13/10/20 v1

TODO:

  • Clean up/finalise data scheme. Currently a mix of dictionary style, self.data[dataset][datatype], and class attribs self.datatype. Better to go with the former for flexibility, or the latter for use as a simple base class to wrap later? The former works with some existing plotting fns., but is complicated.

  • afblmMatEfit() defaults are similarly messy, although basically working.

  • More analysis tools for fitting & results to add; currently there is just self.fit() to run fits.

class pemtk.fit.fitClass.pemtkFit(*args: Any, **kwargs: Any)[source]

Bases: dataClass

Class prototype for pemtkFit class. Dev version builds on dataClass, and adds some basic subselection & fitting capabilities.

BLMfitPlot(keys=None, dataType='AFBLM', Etype='t', thres=0.01, col=None, **kwargs)

Wrap BLMplot for data + simulation results with default params.

TODO:

  • Better plotting (HV?).

  • Fix legend & colour mapping.

BLMsetPlot(key='fits', dataDict='AFxr', agg=True, ref=True, overlay=['l', 'm'], pType='r', thres=0.001, sel=None, xDim=None, sq=True, drop=True, unstack=True, plotDict='plots')

Plot sets of BLM results from Xarray datasets with Holoviews.

For plotting individual datasets with more control, see BLMfitPlot().

TODO:

  • Add Seaborn plotting options.

  • Streamline; should be able to use recursively to stack additional plots…?

Parameters:
  • agg (bool, default = True) – If True, define reduced data as hv.reduce([‘Fit’], np.mean, spreadfn=np.std) NOTE: if False, rendering can be quite slow for large datasets. TODO: more options here.

  • ref (bool, default = True) – If True, include original fitted data in plots. TODO: more options here.

afblmMatEfit(matE=None, data=None, lmmuList=None, basis=None, ADM=None, pol=None, resetBasis=False, selDims={}, thres=None, thresDims='Eke', lmModelFlag=False, XSflag=True, weights=None, backend=None, debug=False, **kwargs)[source]

Wrap epsproc.geomFunc.afblmXprod() for use with lmfit fitting routines.

Parameters:
  • matE (Xarray or lmfit Parameters object) – Matrix elements to use in calculation. For Parameters object, also require lmmuList to convert to Xarray. If not passed, use self.data[self.subKey][‘matE’].

  • data (Xarray, optional, default = None) – Data for fitting. If set, return residual. If not set, return model result.

  • lmmuList (list, optional, default = None) – Mapping for parameters. Uses self.lmmu if not passed.

  • basis (dict, optional, default = None) – Pre-computed basis set to use for calculations. If not set try to use self.basis, or passed set of ADMs. NOTE: currently defaults to self.basis if it exists, pass resetBasis=True to force overwrite.

  • ADM (Xarray) – Set of ADMs (alignment parameters). Not required if basis is set.

  • pol (Xarray) – Set of polarization geometries (Euler angles). NOTE: currently NOT used for epsproc.geomFunc.afblmXprod(). Not required if basis is set. (If not set, defaults to ep.setPolGeoms().)

  • resetBasis (bool, optional, default=False) – Force self.basis overwrite with updated values. NOT YET IMPLEMENTED

  • selDims (optional, default = {}) – Selectors passed to backend. TODO: should use global options here.

  • thres (optional, default = None) – Selectors passed to backend. TODO: should use global options here.

  • thresDims (optional, default = 'Eke') – Selectors passed to backend. TODO: should use global options here.

  • lmModelFlag (bool, optional, default=False) – Output option for flat results structure for lmfit testing.

  • XSflag (bool, optional, default=True) – Use absolute cross-section (XS) in fitting? This is passed to backends as BLMRenorm flag. If true, use passed B00(t) values in fit, and do not renormalise. If false, renorm by B00(t), i.e. all values will be set to unity (B00(t)=1).

  • weights (int, Xarray or np.array, optional, default = None) –

    Weights to use for residual calculation.
    - If set, return np.sqrt(weights) * residual. (Must match the size of data along the key dimension(s).)
    - If None, try to use self.data[self.subKey]['weights']. If that is not found, or is None, an unweighted residual will be returned.

    For bootstrap sampling, Poissonian weights can be set, see https://en.wikipedia.org/wiki/Bootstrapping_(statistics)#Poisson_bootstrap. Use self.setWeights() for this, e.g. weights = rng.poisson(weights, data.t.size). To use uncertainties from the data, set weights = 1/(sigma^2).

  • backend (function, optional, default = None) –

    UPDATED 21/08/22: now default = None, which uses self.fitOpts['backend']; this is set at class init, see also self.backends().

    Testing 12/08/22: supports backend = afblmXprod or mfblmXprod, tested OK with the latter. NOTE: when passing a function externally, it may need to be defined in the base namespace (the default case uses a locally defined function). E.g.

        data.afblmMatEfit(backend = ep.mfblmXprod)   # OK, following import epsproc as ep
        data.afblmMatEfit(backend = mfblmXprod)      # will fail

  • debug (bool, optional, default = False) – Print additional debug output for testing.

  • **kwargs (optional) – Additional args passed to backends.

NOTE:

  • some assumptions here, will probably need to run once to setup (with ADMs), then fit using basis returned.

  • Currently fitting abs matrix elements and renorm Betas. This sort-of works, but gives big errors on |matE|. Should add options to renorm matE for this case, and options for known B00 values.

21/08/22: now with improved backend handling; working for AF and MF cases.

12/08/22: testing for MF fitting. Initial tests for the case where the BASIS is PASSED ONLY, otherwise still runs the AF calc.

02/05/22: added weights options and updated docs.
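Putting the notes above together, a minimal sketch of the two-stage pattern (run once with ADMs to generate the basis, then reuse the returned basis for fitting). The return signature and the self.data['ADM'] key structure are assumed here for illustration, not confirmed by this docstring:

    import epsproc as ep
    from pemtk.fit.fitClass import pemtkFit

    data = pemtkFit()  # assumes matE, ADMs & data already set, e.g. via setMatE()/setADMs()/setData()

    # First call: pass ADMs to build & return the basis set (returns assumed)
    BetaNormX, basis = data.afblmMatEfit(ADM=data.data['ADM']['ADM'])

    # Subsequent calls: reuse the basis directly for speed
    model = data.afblmMatEfit(basis=basis)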

aggToXR(key='agg', cols={'comp': ['m', 'p'], 'compC': ['n', 'pc']}, EkeList=[1.1], dType='matE', conformDims=True, refKey=None, returnType='ds', simpleForm=False)

Pull columns from PD dataframe & stack to XR dataset.

Parameters:
  • cols (dict, optional, default = {'comp':['m','p'], 'compC':['n','pc']}) – Dict of keys for output items/columns, and [mag, phase] columns to convert.

TODO:

  • EkeList from input data subset?

  • Use existing routines for more flexible dim handling? E.g. pemtk.sym._util.toePSproc

  • More returnTypes; currently set for a single dataset or a set of arrays (per col).

UPDATE 19/07/22: implemented ep.misc.restack(), which includes dim checking and expansions; set conformDims True/False to toggle this. The Eke dim is still handled separately here. NOTE: conformDims=False with refKey only works reliably for the 'da' return type; otherwise it may fail at dataset stacking.

analyseFits(dataRange=None, batches=None, keyDims='t')

Collate fit data from multiple runs; data is pulled from self.

For individual fits and dev code, see https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_analysis_demo_150621-tidy.html

backends(backend=None)[source]

Set backends (model functions) for fitting & select for use.

Pass backend = ‘name of backend’ to select and set backend (model function) from the presets.

Pass None to return dict of available backends.

If a function is passed it is set directly.

Settings are pushed to self.backend (name) and self.fitOpts[‘backend’] (function handle).

01/09/22 v2: modified to set directly, rather than by return.

22/08/22 v1.
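A brief usage sketch for backend selection, assuming an initialised pemtkFit instance named data; the preset name 'af' is illustrative only:

    import epsproc as ep

    backendDict = data.backends()                  # pass None to return the dict of available backends
    data.backends(backend='af')                    # select a preset by name (name illustrative)
    data.backends(backend=ep.geomFunc.mfblmXprod)  # or pass a function to set it directly
    # Settings land in self.backend (name) and self.fitOpts['backend'] (function handle)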

classifyFits(key='fits', dataDict='dfPF', dataType='redchi', group=None, bins=None, labels=None, plotHist=True, propagate=True, batch=False)

Classify fit result sets (DataFrame) based on chisqr or redchi values.

Parameters:
  • bins (int or list, default = None) – Bins setting for the classifier:
    - Set as None for the default case; will bin by (min - min*0.05, min*5, 10).
    - Set as int to define a specific number of (equally spaced) bins, for (min - min*0.05, min*5, numbins).
    - Set as a list [start, stop] or [start, stop, bins] for a specific range.
    - Set as a list (>3 elements) to define specific bin intervals.

  • dataType (str, default = 'redchi') – DataType to classify. (Currently only supports a single dataType.)

  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.

  • group (str, optional, default = None) – Name for classifier column in dataframe. If None, defaults to {dataType}Group

  • labels (list, optional, default = None) – Specify names for group members. Defaults to alphabetic labels.

  • plotHist (bool, optional, default = True) – Plot histogram of results & show tabular summary (from self.data[key][group])

  • propagate (bool, optional, default = True) – Propagate classifications to other data types?

  • batch (bool, optional, default = False) –

    Dynamically group by batches? Note that different bins settings apply per batch in this case, where x is the batch dataframe (see the numeric sketch below):

    bins = np.linspace(x.min(), x.min() * bins[0], bins[1])

    where the default case is bins = [1.05, 10].
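To make the batch binning rule concrete, a small numeric sketch of the default case bins = [1.05, 10], with an illustrative batch minimum:

    import numpy as np

    bins = [1.05, 10]   # default case
    xMin = 0.2          # illustrative minimum for the batch dataframe, x.min()
    edges = np.linspace(xMin, xMin * bins[0], bins[1])
    # 10 equally spaced bin edges spanning [0.2, 0.21]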

corrPlot(key='fits', dataDict='dfWide', hue='redchiGroup', hRound=None, dataType=None, level=None, sel=None, selLevel='redchiGroup', dataPF='dfPF', plotDict='plots', remap=None, backend='sns', pairgrid=False, **kwargs)

Similar to paramPlot(), but set for correlation matrix plotter.

This requires wide-form parameters data self.data[‘fits’][‘dfWide’]. Two levels of selection are currently supported (index data only, NOT columns)

**kwargs are passed to Seaborn’s pairplot routine, https://seaborn.pydata.org/generated/seaborn.pairplot.html

TODO: if numerical data columns are added for hue mapping they may result in additional plots too.

TODO: add HV gridmatrix + linked brushing: http://holoviews.org/user_guide/Linked_Brushing.html

TODO: FIX HORRIBLE SELECTION ROUTINES.

fit(fcn_args=None, fcn_kws=None, fitInd=None, keepSubset=False, **kwargs)[source]

Wrapper to run lmfit.Minimizer, for details see https://lmfit.github.io/lmfit-py/fitting.html#lmfit.minimizer.Minimizer

Uses preset self.params for parameters, and self.data[self.subKey] for data.

Default case runs a Levenberg-Marquardt minimization (method='leastsq'), using scipy.optimize.leastsq(); see the Scipy docs for more options. This uses the AF fitting model epsproc.geomFunc.afblmXprod() calculation routine by default. For an MF fitting backend set fcn_kws['backend'] = ep.geomFunc.mfblmXprod

Parameters:
  • fcn_args (tuple, optional, default = None) – Positional arguments to pass to the fitting function. If None, will be set as (self.data[self.subKey][‘AFBLM’], self.lmmu, self.basis)

  • fcn_kws (dict, optional, default = {}) – Keyword arguments to pass to the fitting function. For MF fitting backend set fcn_kws[‘backend’] = ep.geomFunc.mfblmXprod

  • fitInd (int, optional, default = None) – If None, will use self.fitInd. For parallel usage, supply an explicit fitInd instead of using the class var, to ensure a unique key per fit.

  • keepSubset (bool, optional, default = False) – If True, keep a copy of self.data[self.subKey] in self.data[fitInd][self.subKey].

  • **kwargs – Passed to the fitting functions; for options see the lmfit.Minimizer docs linked above.

18/08/22: debugged for MF fitting case, now can pass MF backend via fcn_kws[‘backend’] = ep.geomFunc.mfblmXprod

02/05/22: added **kwags for backends.

07/09/21: updating for parallel use.

Note that the main outputs (self.results etc.) are now dropped. May want to set to the last result?
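A minimal usage sketch for the default and MF cases described above, assuming self.params and the data subset are already set:

    import epsproc as ep

    data.fit()                                             # default Levenberg-Marquardt AF fit
    data.fit(fcn_kws={'backend': ep.geomFunc.mfblmXprod})  # MF fitting backend
    data.fit(fitInd=5)                                     # explicit index, e.g. for parallel use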

fitHist(bins='auto', dataType='redchi', key='fits', dataDict='dfPF', thres=None, mask=True, binRange=None, backend='hv', plotDict='plots')

Basic histogram plot of batch fit results.

Parameters:
  • bins (str, int or list, optional, default = 'auto') – Bins setting for the histogram, essentially as per the Numpy routine https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html
    - Set as string for various auto options.
    - Set as int to define a specific number of (equally spaced) bins.
    - Set as list to define specific bin intervals.
    NOTE: some combinations are currently not working with the 'hv' backend.

  • dataType (str, default = 'redchi') – DataType to histogram. (Currently only supports a single dataType.)

  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.

  • thres (float, optional, default = None) – Set threshold for plotting, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key][‘mask’]. For more control use binRange setting.

  • mask (bool, optional, default = True) – Use self.data[key]['mask'] to subselect data if set.

  • binRange (list, optional, default = None) – Specify range for binning. Note this is only used by HV plotter, and will override bins settings for auto types. Specify bins = int and binRange = [start, stop] for full control.

  • backend (str, optional, default = 'hv') – Specify backend: - ‘hv’ for Holoviews - ‘pd’ or ‘mpl’ for Pandas.hist()

  • plotDict (str, optional, default = 'plots') – For hv case, return plot object & data to self.data[plotDict] as [‘fitHistPlot’] and [‘fitHistData’]

Notes:

  • Data to plot is specified by self.data[key][dataDict][dataType].

  • Setting thres sets mask; this will overwrite any existing selection mask.

  • If self.data[key]['mask'] exists, it will be used if mask = True.

TODO:

  • bins: added try/except on MemoryError, which can occur in the 'auto' case for a large data range.

  • Import chain - see TMO-DEV.

  • Implement, but better, with decorators for data checking & selection.

  • Data subselection by threshold or range. (Again, see TMO-DEV routines for ideas.)

  • Fix binning issues with certain cases.

  • Holoviews stuff.
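A short usage sketch covering the bins/binRange combinations described above (all values illustrative):

    data.fitHist()                           # defaults: bins='auto', hv backend
    data.fitHist(bins=50, binRange=[0, 10])  # int bins + explicit range for full control
    data.fitHist(thres=5.0, backend='mpl')   # threshold mask + Pandas/Matplotlib backend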

fitsReport(key='fits', dataDict='dfPF', thres=None, mask=True)

Generate fit report/metrics, defaults to self.data[‘fits’][‘dfPF’]. Results are printed if self.verbose, and also set to self.fitsSummary.

Parameters:
  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.

  • thres (float, optional, default = None) – Set threshold for subselection, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key][‘mask’].

  • mask (bool, optional, default = True) – Use self.data[key][‘mask’] to subselect data if set.

13/09/22 Added more stats to output.

getFitInds(selectors={}, key='fits', dataDict='dfWide', inds='Fit')

Get fit indexes from a subselection (Pandas datasets).

hvSave(key='plots', pTypes=None, outStem=None, outPath=None, outTypes=['png', 'html'])

Wrapper for quick plot save routine from data dict.

If data is an HV object, set key=None to save directly. Update: removed this, since it's not very clear or useful (missing defaults).

lmPlotFit(keys=None, dataType='AFBLM', Etype='t', thres=0.01, **kwargs)

Wrap lmPlot for data + simulation results with default params.

loadFitData(fList=None, dataPath=None, batch=None, **kwargs)

Load data dumps from a file or set of files (and stack).

Currently Pickle files only.

See writeFitData for other options/file types - to be added here too.

NOTE: currently only supports a single dir for the dir scan. For file lists, names only can be passed (in which case dataPath will be prepended), or full paths can be passed in the list.

multiFit(nRange=[0, 10], parallel=True, num_workers=None, randomizeParams=True, seedParams=None)

Basic wrapper for pemtk.fitClass.fit() for multiprocess execution.

Run a batch of fits in parallel, and return results to main class structure.

Currently wraps xyzpy’s run_combos for parallel functionality and data handling. See the xyzpy docs for details.

Note: full set of results currently returned as an Xarray DataSet, then sorted back to base class as self.data[n] (integer n). In future may just want to use Xarray return directly?

Parameters:
  • nRange (list) – Fit indexers. Set [nStart, nStop], full run will be set as list(range(nRange[0],nRange[1])). TODO: more flexibility here, and auto.

  • parallel (bool, default = True) – Run fit jobs in parallel? Note - in testing this seemed to be ignored?

  • num_workers (int, default = None) – Number of cores to use if parallel job. Currently set to default to ~90% of mp.cpu_count()

  • randomizeParams (bool, default = True) – Randomize seed parameters per fit?

  • seedParams (int, default = None) – NOT IMPLEMENTED, but will provide an option to seed fits from a previous result.
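A minimal usage sketch for a parallel batch of fits (worker count illustrative):

    data.multiFit(nRange=[0, 100], parallel=True, num_workers=8)
    # Results are sorted back to the class as self.data[n] for integer n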

paramFidelity(key='fits', dataDict='dfWide', refDict='dfRef', phaseCorr=True, phaseCorrParams={})

Quick prototype for differences/fidelity per fit parameter set compared to refs.

Similar to paramsCompare, except per fit, rather than for aggregate data.

19/05/22 v1

paramPlot(selectors={'Type': 'm'}, hue=None, hRound=7, x='Param', y='value', key='fits', dataDict='dfWide', dataPF='dfPF', plotDict='plots', hvType=None, plotScatter=True, remap=None, backend='sns', returnFlag=False, **kwargs)

Basic scatter-plot of parameter values by name/type.

Currently supports the Seaborn backend only, and requires a wide-form dataDict as input.

Parameters:
  • selectors (dict, optional, default = {'Type':'m'}) – Used to cross-section data (pd.xs).

  • hue (string, optional, default = None) – Variable name to use for colour mapping (scatter plot points).

  • hRound (int, optional, default = 7) – Rounding for colour mapping scale. May need tweaking for cases with very closely clustered values.

  • x (default = 'Param')

  • y (default = 'value')

  • key (default = 'fits') – Key into the main self.data dictionary.

  • dataDict (default = 'dfWide') – Dataset to use, from self.data[key].

  • dataPF (default = 'dfPF') – Per-fit metrics dataset, from self.data[key].

  • plotDict (default = 'plots') – Key for output plot objects.

  • hvType (default = None)

  • plotScatter (default = True)

TODO:

  • Better and more concise dim handling for multi-level selection. Integrate to a single dict of selectors? (See tmo-dev?)

  • Box/violin plot options. Also an option to drop the scatter plot in these cases (now partially implemented for HV only).

  • Add ref data to plots! See e.g. paramsFidelity and paramsCompare.

  • HV support? Basic support now in place, but cmapping needs some work for non-cat data. SEE NOTES ELSEWHERE! Also breaks for the subselection case unless another hue dim is set. 18/11/21: better, but messy, support now in place; includes Violin & BoxWhisker options.

  • Implement grouping and/or holomap for extra dims.

Currently: have selLevel and hue, which must be different in general.
  - sel and selLevel define subselection by a value in a column, e.g. sel = 'E', selLevel = 'redchiGroup' selects value 'E' in the column given by selLevel.
  - hue specifies hue mapping for the Seaborn plot, which must be a column name. If hue is not in the input data, it will be taken from the per-fit dataframe.

Ref: Seaborn catplot, https://seaborn.pydata.org/generated/seaborn.catplot.html. Ref: HV scatter.

For usage notes see https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_fitting_multiproc_class_analysis_141121-tidy.html

05/07/22: marginally improved plot type handling for the HV case.

19/05/22: updated for multiple XS from selectors, now passed as a dictionary with items {level:value}.
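A brief usage sketch, per the selection scheme above (column and group names from the standard per-fit metrics; values illustrative):

    data.paramPlot(selectors={'Type': 'm'}, hue='redchi')        # magnitudes, coloured by redchi
    data.paramPlot(selectors={'Type': 'p', 'redchiGroup': 'A'})  # phases for classified group 'A'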

paramsCompare(params=None, ref=None, phaseCorr=True, phaseCorrParams={})

Compare extracted parameter set with reference data.

NOTE: currently assumes self.paramsSummary and self.params for aggregate fit results & reference data.

Parameters:
  • params (pd.DataFrame, optional, default = None) – Fit parameters to tabulate. Will use self.paramsSummary in default case (and run self.paramsReport() if missing).

  • ref (pd.DataFrame, optional, default = None) – Reference parameter set to compare with. Will use self.data[‘fits’][‘dfRef’] in default case (and attempt to set this if missing).

  • phaseCorr (bool, optional, default = True) – Run phase correction routine for reference parameters.

  • phaseCorrParams (dict, optional, default = {}) – Pass dictionary to additionally set parameters for phaseCorrection() method. Default cases runs with {‘dataDict’:’dfRef’, ‘dataOut’:’dfRefPC’, ‘useRef’:False}, these parameters will update the defaults. Note - these params are only used if phaseCorr = True.

TODO:

  • Better dim handling.

  • Generalize to compare any pair of parameter sets. (Just need to loop over param sets and check attrs labels.)

paramsReport(key='fits', dataDict='dfWide', groups='Type', inds={}, aggList=['min', 'mean', 'median', 'max', 'std', 'var'])

Generate parameter report/metrics, defaults to self.data[‘fits’][‘dfWide’]. Results are printed if self.verbose, and also set to self.paramsSummary.

Parameters:
  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfWide') – Dataset to use, from self.data[key]. Default case is per-fit metrics.

  • groups (str or list of strings, optional, default = 'Type') – Additional groupings to use for output (pd.groupby).

  • inds (dict, optional, default = {}) – Set of indexes to subselect from, as dictionary items. E.g. inds = {'redchiGroup':'C'} will select group 'C' (via pd.xs).

  • aggList (list, optional, default = ['min', 'mean', 'median', 'max', 'std', 'var']) – List of aggregator functions to use. These are passed to Pandas.agg(), a list of common functions can be found at https://pandas.pydata.org/docs/user_guide/basics.html#descriptive-statistics

TODO: consolidate indexing methods with other functions & extend to thresholds and cross-ref (column) values.

pdConv(fitVars=['success', 'chisqr', 'redchi'], paramVars=['value', 'stderr', 'vary', 'expr'], dataRange=None, batches=None)

Basic conversion for set of fit results > Pandas, long format.

Extract fit and parameter results from lmFit objects and stack to PD dataframe.

Parameters:
  • fitVars (optional, list, default = ['success', 'chisqr', 'redchi']) – Values to extract from lmfit result object (per fit).

  • paramVars (optional, list, default = ['value', 'stderr', 'vary', 'expr']) – Values to extract from lmfit params object (per parameter per fit).

  • dataRange (optional, list, default = None) – Range of indexes to use, defaults to [0, self.fitInd].

  • batches (optional, int, default = None) – Additional batch labelling for fits (see the worked example after this entry).
    - If int, label as ceil(fit # / batches), e.g. batches = 100 will label fits per 100.
    - If list, use as labels per fit. (NOT YET IMPLEMENTED)

TODO:

  • Additional batching options, inc. by file for the multiple read case.

13/07/22: Added type checking and casting; this seems to be an issue now/sometimes (PD version?) - currently defaulting all types to 'object' in testing, although it was working previously!
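To illustrate the integer batches labelling rule above:

    import math

    batches = 100
    fitN = 256
    batchLabel = math.ceil(fitN / batches)  # = 3, i.e. fits 201-300 are labelled as batch 3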

pdConvRef(paramVars=['value'], outputIndex=['Fit', 'Type', 'pn'])

Convert reference params set to reference PD table.

Basic routine stripped from main pdConv() method for reuse elsewhere.

TODO: add flexibility here.

13/07/22: Added type checking and casting, this seems to be an issue now/sometimes (PD version?) - currently defaulting all types to ‘object’ in testing, although was working previously!

pdConvSetFit(matE, colDim='it')

Restack matE to pd.DataFrame and force to 1D.

Utility function for setting up fit parameter sets.

phaseCorrection(key='fits', dataDict='dfLong', dataOut='dfWide', renorm=True, dataRef='dfRef', useRef=True, returnFlag=False, **kwargs)

Wrapper for ._util.phaseCorrection() (functional form).

Parameters:
  • key (str, default = 'fits') – Data key for analysis dataframe.

  • dataDict (str, default = 'dfLong') – Data dict for analysis dataframe. Note default case uses self.data[key][dataDict]

  • dataOut (str, default = 'dfWide') – Output dict key for phase-corrected dataframe.

  • renorm (bool, default = True) – Also set renormalised magnitudes if True (via pemtk.fit._util.renormMagnitudes())

  • dataRef (str, default = 'dfRef') – Reference dict key for phase. Default case uses self.data[key][dataRef]. Note this is ONLY USED IF useRef = True is set.

  • useRef (bool, default = True) – Use reference phase from self.data[key][dataRef]? Otherwise ref phase will be set to phasesIn.columns[0] (as per pemtk.fit._util.phaseCorrection())

  • returnFlag (bool, default = False) – If True, return the phase-corrected data. If False, set data to self.data[key][dataOut].

  • **kwargs – Passed to pemtk.fit._util.phaseCorrection()

NOTE: this currently only sets phaseCorrected data in wide-form dataset, self.data[key][dataOut]. May want to push to long-form too? (Otherwise this will be lost by self._setWide().)

TODO: tidy up options here, a bit knotty.

processedToHDF5(dataKey='fits', dataTypes=['dfLong', 'AFxr'], fType='pdHDF', outStem=None, multiFile=False, timeStamp=True, **kwargs)

Save processed fit data to HDF5.

Write self.data[‘fits’][‘dfLong’] and self.data[‘fits’][‘AFxr’] to file.

Wraps self.writeFitData for processed data types.

TODO: generalise to arb set of dataTypes and add checks.

randomizeParams()[source]

Set random values for self.params.

reconParams(params=None, lmmuList=None)[source]

Convert parameters object > Xarray for tensor multiplication.

VERY UGLY! Should be a neater way to do this with existing Xarray and just replace/link values (i.e. pointers)?

… but it does work.

setADMs(**kwargs)[source]

Thin wrapper for ep.setADMs(), pass args & set returns to self.data[‘ADM’]

setAggMatE(key='agg', dataOut=None, compDataLabels={'comp': ['m', 'p'], 'compC': ['n', 'pc']}, simpleForm=False, dropLabelsList=['Cont', 'Targ', 'Total', 'mu', 'it'], dropLevelsList=['Targ', 'Total', 'it'])

Set aggregate results to matE format (Pandas)

If key='ref', use self.data[self.subKey]['matE'] instead of aggregate data.

18/07/22 - quickly hacked in ref data case for consistent results tabulations, probably already have this stuff elsewhere.

See also pemtk.fit._conv.pdConvRef() and self.setMatEFit()

setClassArgs(args)
setData(keyExpt=None, keyData=None)[source]

Data prototype - this will be used to set experimental data to the master structure. For now, data is set by passing it directly, and computational data can be set for testing by passing a key. This basically assumes that the expt. provides AFBLMs.

TO CONSIDER:

  • Data format, file IO for HDF5?

  • Routines to read VMI images and process etc, planned for experimental code-base.

  • Further simulation options, e.g. add noise etc., for testing.

setMatE(**kwargs)[source]

Thin wrapper for ep.setMatE.setMatE(), pass args & set returns to self.data[‘matE’]

setMatEFit(matE=None, paramsCons='auto', refPhase=0, colDim='it', verbose=1)[source]

Convert an input Xarray into (mag,phase) array of matrix elements for fitting routine.

Parameters:
  • matE (Xarray) – Input set of matrix elements, used to set allowed (l,m,mu) and input parameters. If not passed, use self.data[self.subKey][‘matE’].

  • paramsCons (dict, optional, default = 'auto') – Input dictionary of constraints (expressions) to be set for the parameters. See https://lmfit.github.io/lmfit-py/constraints.html If ‘auto’, parameters will be set via self.symCheck()

  • refPhase (int or string, default = 0) – Set reference phase by integer index or name (string). If set to None (or other types) no reference phase will be set.

  • colDim (str, default = 'it') –

    Quick hack to allow for restacking via ep.multiDimXrToPD; this will set cols = 'it', then restack to a 1D dataframe. This should always work for setting matE > fit parameters, but can be overridden if required.

    This is convenient for converting to Pandas > lmfit inputs, but should be redone directly from Xarray for more robust treatment. For ePS matrix elements the default should always work, although it will drop degenerate cases (it>1); this shouldn't matter here. TODO:

    • make this better, support for multiple selectors.

    • For eps case, matE.pd may already be set?

Returns:

  • params (lmfit parameters object) – Set of fitting parameters.

  • lmmu (dict) – List of states and mappings from states to fitting parameters (names & indexes).

29/06/21: Adapted to use 'it' on restack, then set to a single column with a dummy dim. No selection methods; use self.setSubset() for this.
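A minimal usage sketch following the Returns above; the named refPhase value is illustrative, and passing paramsCons=None to skip constraints is assumed behaviour:

    params, lmmu = data.setMatEFit()                    # defaults: paramsCons='auto' via self.symCheck()
    params, lmmu = data.setMatEFit(paramsCons=None,
                                   refPhase='m0')       # no auto constraints, named ref phase (illustrative)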

setPolGeoms(**kwargs)[source]

Thin wrapper for ep.setPolGeoms(), pass args & set returns to self.data[‘pol’]

setSubset(dataKey, dataType, sliceParams=None, subKey=None, resetSelectors=False, **kwargs)[source]

Threshold and subselect on matrix elements.

Wrapper for epsproc.Esubset() and epsproc.matEleSelector(), to handle data array slicing & subselection from params dict.

Subselected elements are set to self.data[subKey][dataType], where subKey defaults to self.subKey (uses existing .data structure for compatibility with existing functions!)

To additionally slice data, set dict of parameters sliceParams = {‘sliceDim’:[start, stop, step]}

To reset existing parameters, pass resetSelectors = True.

To do: better slice handling - will likely have issues with different slices for different dataTypes in current form.
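A short usage sketch for thresholding plus slicing (data keys, dims and ranges illustrative; thres is assumed to pass through **kwargs to matEleSelector):

    data.setSubset('sim', 'matE', thres=0.01)                     # threshold matrix elements
    data.setSubset('sim', 'AFBLM', sliceParams={'t': [4, 5, 4]})  # slice on dim 't' as [start, stop, step]
    # Subselections are set to self.data[self.subKey][dataType]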

setTimeStampedFileName(outStem=None, n=None, ext='pickle', timeString=None, timeFormat='%d%m%y_%H-%M-%S')

Set a unique filename as f'{outStem}_n{n}_{timeString.strftime("%d%m%y_%H-%M-%S")}.{ext}'

Parameters:
  • outStem (str, optional, default = None) – Stem for output file. If None, set to ‘PEMtk_data_dump’

  • n (int, optional, default = None) – Int index to include in file name. If None, this will be omitted.

  • ext (str, optional, default = 'pickle') – File ending.

  • timeString (Datetime object, optional, default = None) – Timestamp for the file. If None, the current time will be used.

  • timeFormat (Datetime format string, optional, default = '%d%m%y_%H-%M-%S') – Format for the timestamp.

TODO: additional formatting options, data[key][item] naming option?
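For example (the return is assumed here; the timestamp comes from runtime):

    fName = data.setTimeStampedFileName(outStem='dataDump', n=1, ext='pickle')
    # e.g. 'dataDump_n1_010922_12-30-45.pickle'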

setWeights(wConfig=None, keyExpt=None, keyData=None, **kwargs)[source]

Wrapper for setting weights for/from data. Basically follows self.setData, with some additional options.

Will set self.data[keyExpt][‘weights’] from existing data if keyData is a string, or from keyData as passed otherwise.

Parameters:
  • wConfig (str, optional, default = None) – Additional handling for weights (see the sketch below):
    - 'poission': set Poissonian weights to match data dims, using self.setPoissWeights().
    - 'errors': set weights as 1/(self.data[keyExpt]['weights']**2).
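A short usage sketch for the wConfig cases above (option strings as per the docstring; data key illustrative):

    data.setWeights(wConfig='poission', keyExpt='sim')  # Poissonian weights matching data dims
    data.setWeights(wConfig='errors', keyExpt='sim')    # weights = 1/(existing weights**2)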

symCheck(pdTest=None, matE=None, colDim='it', lams=None, verbose=1)

Check symmetrization of input matrix elements.

Parameters:
  • pdTest (pandas DataFrame, optional, default = None) – Matrix elements to check, as set in a Pandas table format. Currently expects 1D array of matrix elements, as set in setMatEFit(). If None, matE will be used to create the test data.

  • matE (Xarray, optional, default = None) – Matrix elements to check. If None, then uses self.data[self.subKey][‘matE’]

  • colDim (str, default = 'it') – Quick hack to allow for restacking via ep.multiDimXrToPD; this will set cols = 'it', then restack to a 1D dataframe. This should always work for setting matE > fit parameters, but can be overridden if required.

  • lams (dict, optional, default = None) – Dictionary of test lambda functions. If not set, will use lams = symCheckDefns()

Returns:

  • dict – Set of parameter mappings/constraints, suitable to use for self.setMatEFit(paramsCons = newDict)

  • dict of DataFrames

    • 'unique': reduced set of unique matrix elements only.

    • 'constraints': list of constraints (as per parameters dict).

    • 'tests': full list of tests & relations found.

TODO:

  • Wrap for class.

  • Input checks and set default cases (see setMatEFit()). Should tidy to single input and then check type?

thresFits(thres=None, dataType=None, key='fits', dataDict='dfLong')

Very basic threshold mask for Pandas DataFrame. Note mask = True for values < thres.

For more sophisticated multi-param filtering, see TMO-DEV filterData. Should be applicable here for Pandas data too…?

TODO: more sophisticated methods, np or pd masked arrays?

writeFitData(dataPath=None, fName=None, outStem=None, n=None, fType='pickle', ext=None, **kwargs)

Dump fit data with various backends.