pemtk.fit._analysis

Module Contents

Functions

analyseFits(self[, dataRange, batches, keyDims])

Collate fit data from multiple runs.

_setData(self, key, dataDict[, dataType, thres, mask])

_setWide(self[, indexDims, valueDims, key, dataDict, ...])

Set "wide" format data table from current dfLong array, with optional index dims.

fitsReport(self[, key, dataDict, thres, mask])

Generate fit report/metrics, defaults to self.data['fits']['dfPF'].

paramsReport(self[, key, dataDict, groups, inds, aggList])

Generate parameter report/metrics, defaults to self.data['fits']['dfWide'].

paramsCompare(self[, params, ref, phaseCorr, ...])

Compare extracted parameter set with reference data.

paramFidelity(self[, key, dataDict, refDict, ...])

Quick prototype for differences/fidelity per fit parameter set compared to refs.

classifyFits(self[, key, dataDict, dataType, group, ...])

Classify fit result sets (DataFrame) based on chisqr or redchi values.

phaseCorrection(self[, key, dataDict, dataOut, ...])

Wrapper for ._util.phaseCorrection() (functional form).

fitHist(self[, bins, dataType, key, dataDict, thres, ...])

Basic histogram plot of batch fit results.

_mergePFLong(self, pData, key, dataPF, hue, hRound)

Merge per-fit data to existing dataset & force to long form.

corrPlot(self[, key, dataDict, hue, hRound, dataType, ...])

Similar to paramPlot(), but set for correlation matrix plotter.

paramPlot(self[, selectors, hue, hRound, x, y, key, ...])

Basic scatter-plot of parameter values by name/type.

pemtk.fit._analysis.analyseFits(self, dataRange=None, batches=None, keyDims='t')[source]

Collate fit data from multiple runs.

Data from self. For individual

See https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_analysis_demo_150621-tidy.html for dev code.

pemtk.fit._analysis._setData(self, key, dataDict, dataType=None, thres=None, mask=True)[source]

pemtk.fit._analysis._setWide(self, indexDims=['Fit', 'Type', 'chisqrGroup', 'redchiGroup', 'batch'], valueDims=['value'], key='fits', dataDict='dfLong', dataWide='dfWide', dataIn=None, returnFlag=False)[source]

Set “wide” format data table from current dfLong array, with optional index dims.

Defaults to self.data[key][dataDict], or pass dataIn to use this instead and return wide-form data directly.

NOTE: current form checks indexDims for valid subselection. May want to add smarter dim matching for groups?
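As a rough illustration of the long-to-wide reshaping described above, the following pandas sketch pivots a small long-form table. The column names and values are hypothetical, and this is not the PEMtk implementation, only a minimal example of the general technique.

```python
import pandas as pd

# Hypothetical long-form table: one row per (Fit, Type, Param) value.
dfLong = pd.DataFrame({
    "Fit":   [0, 0, 0, 0, 1, 1, 1, 1],
    "Type":  ["m", "m", "p", "p", "m", "m", "p", "p"],
    "Param": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "value": [1.0, 2.0, 0.1, 0.2, 1.1, 2.1, 0.3, 0.4],
})

indexDims = ["Fit", "Type"]   # dims kept as the row index of the wide table
valueDims = ["value"]         # column(s) holding the data

# One column per parameter, indexed by the chosen dims.
dfWide = dfLong.pivot_table(index=indexDims, columns="Param", values=valueDims[0])
print(dfWide)
```

Only index dims that exist in the long-form table can be used here, which is presumably why the routine checks indexDims for a valid subselection first.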

pemtk.fit._analysis.fitsReport(self, key='fits', dataDict='dfPF', thres=None, mask=True)[source]

Generate fit report/metrics, defaults to self.data[‘fits’][‘dfPF’]. Results are printed if self.verbose, and also set to self.fitsSummary.

Parameters:
  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.

  • thres (float, optional, default = None) – Set threshold for subselection, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key][‘mask’].

  • mask (bool, optional, default = True) – Use self.data[key][‘mask’] to subselect data if set.

13/09/22 Added more stats to output.
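The thresholding and summary steps can be sketched with plain pandas. The table below is a hypothetical stand-in for self.data['fits']['dfPF'], and thresholding on the redchi column is an assumption made for illustration; the statistics actually reported by fitsReport() may differ.

```python
import pandas as pd

# Hypothetical per-fit metrics table (stand-in for self.data['fits']['dfPF']).
dfPF = pd.DataFrame(
    {"chisqr": [0.8, 1.2, 5.0, 0.9], "redchi": [0.010, 0.015, 0.062, 0.011]},
    index=pd.Index([0, 1, 2, 3], name="Fit"),
)

thres = 0.02
# Boolean mask keeping fits with redchi in the range [0, thres] (assumed column).
mask = (dfPF["redchi"] >= 0) & (dfPF["redchi"] <= thres)

# Summary statistics for the selected fits only.
summary = dfPF[mask].describe()
print(summary)
```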

pemtk.fit._analysis.paramsReport(self, key='fits', dataDict='dfWide', groups='Type', inds={}, aggList=['min', 'mean', 'median', 'max', 'std', 'var'])[source]

Generate parameter report/metrics, defaults to self.data[‘fits’][‘dfWide’]. Results are printed if self.verbose, and also set to self.paramsSummary.

Parameters:
  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfWide') – Dataset to use, from self.data[key]. Default case is wide-form parameter data.

  • groups (str or list of strings, optional, default = 'Type') – Additional groupings to use for output (pd.groupby).

  • inds (dict, optional, default = {}) – Set of indexes to subselect from, as dictionary items. E.g. inds = {‘redchiGroup’:‘C’} will select group ‘C’.

  • aggList (list, optional, default = ['min', 'mean', 'median', 'max', 'std', 'var']) – List of aggregator functions to use. These are passed to Pandas.agg(), a list of common functions can be found at https://pandas.pydata.org/docs/user_guide/basics.html#descriptive-statistics

TODO: consolidate indexing methods with other functions & extend to thresholds and cross-ref (column) values.
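The subselection, grouping and aggregation described for paramsReport() map onto standard pandas calls. The sketch below uses a hypothetical wide-form table with an assumed index layout; it illustrates the pattern and is not the routine itself.

```python
import pandas as pd

# Hypothetical wide-form table indexed by (Fit, Type, redchiGroup).
index = pd.MultiIndex.from_tuples(
    [(0, "m", "A"), (0, "p", "A"), (1, "m", "C"),
     (1, "p", "C"), (2, "m", "C"), (2, "p", "C")],
    names=["Fit", "Type", "redchiGroup"],
)
dfWide = pd.DataFrame(
    {"a": [1.0, 0.1, 1.2, 0.3, 1.4, 0.5], "b": [2.0, 0.2, 2.2, 0.4, 2.4, 0.6]},
    index=index,
)

aggList = ["min", "mean", "max"]

# Select group 'C' on the 'redchiGroup' index level, keeping the level in place.
subset = dfWide.xs("C", level="redchiGroup", drop_level=False)

# Group by parameter type and apply each aggregator to each column.
report = subset.groupby(level="Type").agg(aggList)
print(report)
```

The result has one row per group and a two-level column index of (parameter, aggregator).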

pemtk.fit._analysis.paramsCompare(self, params=None, ref=None, phaseCorr=True, phaseCorrParams={})[source]

Compare extracted parameter set with reference data.

NOTE: currently assumes self.paramsSummary and self.params for aggregate fit results & reference data.

Parameters:
  • params (pd.DataFrame, optional, default = None) – Fit parameters to tabulate. Will use self.paramsSummary in default case (and run self.paramsReport() if missing).

  • ref (pd.DataFrame, optional, default = None) – Reference parameter set to compare with. Will use self.data[‘fits’][‘dfRef’] in default case (and attempt to set this if missing).

  • phaseCorr (bool, optional, default = True) – Run phase correction routine for reference parameters.

  • phaseCorrParams (dict, optional, default = {}) – Dictionary of additional parameters for the phaseCorrection() method. The default case runs with {‘dataDict’:‘dfRef’, ‘dataOut’:‘dfRefPC’, ‘useRef’:False}; any parameters passed here will update these defaults. Note that these parameters are only used if phaseCorr = True.

TODO:

  • Better dim handling.

  • Generalize to compare any pair of parameter sets. (Just need to loop over param sets and check attrs labels.)
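One plausible form of such a comparison is a table of differences between aggregate fit values and reference values. The sketch below is an assumption for illustration only: the parameter names, values and output columns are hypothetical, and the columns actually produced by paramsCompare() are not documented here.

```python
import pandas as pd

# Hypothetical aggregate fit results (mean per parameter) and reference values.
fitMean = pd.Series({"a": 1.05, "b": 1.9}, name="mean")
ref = pd.Series({"a": 1.0, "b": 2.0}, name="ref")

# Align on parameter name, then compute absolute and percentage differences.
comparison = pd.DataFrame({"mean": fitMean, "ref": ref})
comparison["diff"] = comparison["mean"] - comparison["ref"]
comparison["diff/%"] = 100 * comparison["diff"] / comparison["ref"]
print(comparison)
```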

pemtk.fit._analysis.paramFidelity(self, key='fits', dataDict='dfWide', refDict='dfRef', phaseCorr=True, phaseCorrParams={})[source]

Quick prototype for differences/fidelity per fit parameter set compared to refs.

Similar to paramsCompare, except per fit, rather than for aggregate data.

19/05/22 v1

pemtk.fit._analysis.classifyFits(self, key='fits', dataDict='dfPF', dataType='redchi', group=None, bins=None, labels=None, plotHist=True, propagate=True, batch=False)[source]

Classify fit result sets (DataFrame) based on chisqr or redchi values.

Parameters:
  • bins (int or list, default = None) – Bins setting for classifier.

    - Set as None for default case, will bin by (min - min*0.05, min*5, 10).

    - Set as int to define a specific number of (equally spaced) bins, for (min - min*0.05, min*5, numbins).

    - Set as a list [start,stop] or [start,stop,bins] for specific range.

    - Set as list (>3 elements) to define specific bin intervals.

  • dataType (str, default = 'redchi') – DataType to classify. (Currently only supports a single dataType.)

  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.

  • group (str, optional, default = None) – Name for classifier column in dataframe. If None, defaults to {dataType}Group

  • labels (list, optional, default = None) – Specify names for group members. Defaults to alphabetic labels.

  • plotHist (bool, optional, default = True) – Plot histogram of results & show tabular summary (from self.data[key][group])

  • propagate (bool, optional, default = True) – Propagate classifications to other data types?

  • batch (bool, optional, default = False) –

    Dynamically group by batches? Note different bins settings per batch in this case, where x is the batch dataframe.

    bins = np.linspace(x.min(), x.min() * bins[0], bins[1])

    Where default case bins = [1.05, 10]
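The default binning scheme described above can be sketched with np.linspace and pd.cut. The redchi values are made up, and treating the final number in (min - min*0.05, min*5, 10) as the number of bin edges passed to np.linspace is an assumption based on the batch-mode formula.

```python
import string

import numpy as np
import pandas as pd

# Hypothetical per-fit redchi values.
redchi = pd.Series([0.010, 0.0102, 0.011, 0.020, 0.031, 0.049], name="redchi")

# Default-style bin edges: from 5% below the minimum up to 5x the minimum.
xmin = redchi.min()
edges = np.linspace(xmin - xmin * 0.05, xmin * 5, 10)

# Alphabetic labels, one per interval (10 edges give 9 intervals).
labels = list(string.ascii_uppercase[: len(edges) - 1])

# Assign each fit to a labelled interval.
groups = pd.cut(redchi, bins=edges, labels=labels)
print(groups.value_counts(sort=False))
```

Values outside the outermost edges are assigned NaN by pd.cut, so fits with redchi above 5x the minimum are left unclassified in this sketch.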

pemtk.fit._analysis.phaseCorrection(self, key='fits', dataDict='dfLong', dataOut='dfWide', renorm=True, dataRef='dfRef', useRef=True, returnFlag=False, **kwargs)[source]

Wrapper for ._util.phaseCorrection() (functional form).

Parameters:
  • key (str, default = 'fits') – Data key for analysis dataframe.

  • dataDict (str, default = 'dfLong') – Data dict for analysis dataframe. Note default case uses self.data[key][dataDict]

  • dataOut (str, default = 'dfWide') – Output dict key for phase-corrected dataframe.

  • renorm (bool, default = True) – Also set renormalised magnitudes if True (via pemtk.fit._util.renormMagnitudes())

  • dataRef (str, default = 'dfRef') – Reference dict key for phase. Default case uses self.data[key][dataRef]. Note this is ONLY USED IF useRef = True is set.

  • useRef (bool, default = True) – Use reference phase from self.data[key][dataRef]? Otherwise ref phase will be set to phasesIn.columns[0] (as per pemtk.fit._util.phaseCorrection())

  • returnFlag (bool, default = False) – If True return phase-corrected data. If False, set data to self.data[key][dataOut]

  • **kwargs – Passed to pemtk.fit._util.phaseCorrection()

NOTE: this currently only sets phaseCorrected data in wide-form dataset, self.data[key][dataOut]. May want to push to long-form too? (Otherwise this will be lost by self._setWide().)

TODO: tidy up options here, a bit knotty.
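To make the idea of a reference phase concrete, the sketch below subtracts the first column from all phases and wraps the result. This is an assumed, generic form of phase correction; the convention used by pemtk.fit._util.phaseCorrection() (sign, wrapping range, units) is not reproduced here, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-form phases (radians): one row per fit, one column per parameter.
phasesIn = pd.DataFrame({"p1": [0.5, -2.0], "p2": [1.5, 2.5], "p3": [-1.0, 0.0]})

# Reference phase: first column, as in the useRef = False case described above.
refPhase = phasesIn[phasesIn.columns[0]]

# Subtract the reference from every column (row-wise alignment on the fit index).
shifted = phasesIn.sub(refPhase, axis=0)

# Wrap to the interval [-pi, pi).
phasesCorr = (shifted + np.pi) % (2 * np.pi) - np.pi
print(phasesCorr)
```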

pemtk.fit._analysis.fitHist(self, bins='auto', dataType='redchi', key='fits', dataDict='dfPF', thres=None, mask=True, binRange=None, backend='hv', plotDict='plots')[source]

Basic histogram plot of batch fit results.

Parameters:
  • bins (str, int or list, optional, default = 'auto') – Bins setting for histogram, essentially as per Numpy routine https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html

    - Set as string for various auto options.

    - Set as int to define a specific number of (equally spaced) bins.

    - Set as list to define specific bin intervals.

    NOTE: some combinations currently not working with ‘hv’ backend.

  • dataType (str, default = 'redchi') – DataType to histogram. (Currently only supports a single dataType.)

  • key (str, optional, default = 'fits') – Key into main self.data dictionary.

  • dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.

  • thres (float, optional, default = None) – Set threshold for plotting, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key][‘mask’]. For more control use binRange setting.

  • mask (bool, optional, default = True) – Use self.data[key][‘mask’] to subselect data if set.

  • binRange (list, optional, default = None) – Specify range for binning. Note this is only used by HV plotter, and will override bins settings for auto types. Specify bins = int and binRange = [start, stop] for full control.

  • backend (str, optional, default = 'hv') – Specify backend:

    - ‘hv’ for Holoviews

    - ‘pd’ or ‘mpl’ for Pandas.hist()

  • plotDict (str, optional, default = 'plots') – For hv case, return plot object & data to self.data[plotDict] as [‘fitHistPlot’] and [‘fitHistData’]

Notes:

  • Data to plot is specified by self.data[key][dataDict][dataType].

  • Threshold value sets mask, this will overwrite existing selection mask.

  • If self.data[key]['mask'] exists, this will be used if mask = True.

TODO:

  • Added try/except on MemoryError for bins, can get this in 'auto' case for a large data range.

  • See TMO-DEV (https…)

  • Implement, but better, with decorators for data checking & selection.

  • Import chain

  • Data subselection by threshold or range. (Again see TMO-DEV routines for ideas.)

  • Fix binning issues with certain cases.

  • Holoviews stuff
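The bins and binRange options follow numpy.histogram_bin_edges, and the MemoryError guard mentioned in the notes can be sketched as below. The helper name safe_bin_edges and its fallback bin count are hypothetical, chosen for illustration; the behaviour of fitHist() on failure is not documented here.

```python
import numpy as np

rng = np.random.default_rng(0)
redchi = rng.uniform(0.01, 0.05, size=200)   # hypothetical per-fit redchi values

def safe_bin_edges(data, bins="auto", binRange=None, fallback=50):
    """Return histogram bin edges, using a fixed bin count if MemoryError is raised."""
    try:
        return np.histogram_bin_edges(data, bins=bins, range=binRange)
    except MemoryError:
        # Automatic estimators can request a very large number of bins
        # when the data range is large; fall back to a fixed count (assumed).
        return np.histogram_bin_edges(data, bins=fallback, range=binRange)

edgesAuto = safe_bin_edges(redchi)                               # string option
edgesInt = safe_bin_edges(redchi, bins=10, binRange=(0, 0.05))   # int + explicit range
counts, _ = np.histogram(redchi, bins=edgesInt)
print(len(edgesAuto) - 1, counts.sum())
```

Passing an integer bins together with an explicit range gives full control over the edges, matching the advice given for binRange above.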

pemtk.fit._analysis._mergePFLong(self, pData, key, dataPF, hue, hRound)[source]

Merge per-fit data to existing dataset & force to long form.

Used for Seaborn plotting routines with hue mapping.
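A minimal sketch of this kind of merge and reshape, assuming hypothetical table layouts and that hRound is a number of decimal places applied to the hue column:

```python
import pandas as pd

# Hypothetical wide-form parameters, one row per fit.
pData = pd.DataFrame({"a": [1.0, 1.1], "b": [2.0, 2.1]},
                     index=pd.Index([0, 1], name="Fit"))
# Hypothetical per-fit metrics.
dataPF = pd.DataFrame({"redchi": [0.0101234, 0.0204321]},
                      index=pd.Index([0, 1], name="Fit"))

hue, hRound = "redchi", 3

# Attach the rounded hue column to each fit, joining on the shared 'Fit' index.
merged = pData.join(dataPF[hue].round(hRound))

# Melt to long form: one row per (Fit, hue value, Param) with a 'value' column.
dfLong = merged.reset_index().melt(
    id_vars=["Fit", hue], var_name="Param", value_name="value"
)
print(dfLong)
```

Rounding the hue values reduces the number of distinct colour levels, which is useful when the metric is continuous.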

pemtk.fit._analysis.corrPlot(self, key='fits', dataDict='dfWide', hue='redchiGroup', hRound=None, dataType=None, level=None, sel=None, selLevel='redchiGroup', dataPF='dfPF', plotDict='plots', remap=None, backend='sns', pairgrid=False, **kwargs)[source]

Similar to paramPlot(), but set for correlation matrix plotter.

This requires wide-form parameters data self.data[‘fits’][‘dfWide’]. Two levels of selection are currently supported (index data only, NOT columns).

**kwargs are passed to Seaborn’s pairplot routine, https://seaborn.pydata.org/generated/seaborn.pairplot.html

TODO: if numerical data columns are added for hue mapping they may result in additional plots too.

TODO: add HV gridmatrix + linked brushing: http://holoviews.org/user_guide/Linked_Brushing.html

TODO: FIX HORRIBLE SELECTION ROUTINES.

pemtk.fit._analysis.paramPlot(self, selectors={'Type': 'm'}, hue=None, hRound=7, x='Param', y='value', key='fits', dataDict='dfWide', dataPF='dfPF', plotDict='plots', hvType=None, plotScatter=True, remap=None, backend='sns', returnFlag=False, **kwargs)[source]

Basic scatter-plot of parameter values by name/type.

Currently supports Seaborn for back-end only, and requires wide-form dataDict as input.

Parameters:
  • selectors (dict, optional, default = {'Type':'m'}) – Used to cross-section data (pd.xs).

  • hue (string, optional, default = None) – Variable name to use for colour mapping (scatter plot points).

  • hRound (int, optional, default = 7) – Rounding for colour mapping scale. May need tweaking for cases with very closely clustered values.

  • x = 'Param'

  • y = 'value'

  • key = 'fits'

  • dataDict = 'dfWide'

  • dataPF = 'dfPF'

  • plotDict = 'plots'

  • hvType = None

  • plotScatter = True

TODO:

  • Better and more concise dim handling for multi-level selection. Integrate to single dict of selectors? (See tmo-dev?)

  • Box/violin plot options. Also option to drop scatter plot in these cases (now partially implemented for HV only).

  • Add ref data to plots! See e.g. paramFidelity and paramsCompare.

  • HV support?

    - Basic support now in place, but cmapping needs some work for non-cat data. SEE NOTES ELSEWHERE!

    - Also breaks for subselection case unless another hue dim is set.

    - 18/11/21: better, but messy, support now in place. Includes Violin & BoxWhisker options.

    - TODO: implement grouping and/or holomap for extra dims.

Currently: have selLevel and hue, which must be different in general.

  • sel and selLevel define subselection by a value in a column, e.g. sel = ‘E’, selLevel = ‘redchiGroup’ for values E in column ‘selLevel’.

  • hue specifies hue mapping for Seaborn plot, which must be a column name.

  • If hue is not in input data, it will be taken from the per-fit dataframe.

Ref: Seaborn catplot, https://seaborn.pydata.org/generated/seaborn.catplot.html

Ref: HV scatter,

For usage notes see https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_fitting_multiproc_class_analysis_141121-tidy.html

05/07/22: marginally improved plot type handling for HV case.

19/05/22: updated for multiple XS from selectors, now passed as dictionary with items {level:value}
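The selectors dictionary of {level:value} items can be applied as successive cross-sections, after which the data is reshaped so that x = 'Param' and y = 'value' are plain columns. The table below is hypothetical and the sketch only illustrates the data preparation, not the plotting code in paramPlot().

```python
import pandas as pd

# Hypothetical wide-form parameters indexed by (Fit, Type).
index = pd.MultiIndex.from_product([[0, 1], ["m", "p"]], names=["Fit", "Type"])
dfWide = pd.DataFrame(
    {"a": [1.0, 0.1, 1.2, 0.3], "b": [2.0, 0.2, 2.2, 0.4]}, index=index
)

selectors = {"Type": "m"}

# Apply each {level: value} selector as a cross-section, keeping the level.
pData = dfWide
for level, value in selectors.items():
    pData = pData.xs(value, level=level, drop_level=False)

# Reshape to long form with 'Param' and 'value' columns for plotting.
pLong = pData.reset_index().melt(
    id_vars=["Fit", "Type"], var_name="Param", value_name="value"
)
print(pLong)
```

A table of this shape could then be passed to a categorical plotter such as seaborn.catplot(data=pLong, x='Param', y='value').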