`pemtk.fit._analysis`

Module Contents

Functions

`analyseFits`(self[, dataRange, batches, keyDims])	Collate fit data from multiple runs.
`_setData`(self, key, dataDict[, dataType, thres, mask])
`_setWide`(self[, indexDims, valueDims, key, dataDict, ...])	Set "wide" format data table from current dfLong array, with optional index dims.
`fitsReport`(self[, key, dataDict, thres, mask])	Generate fit report/metrics, defaults to self.data['fits']['dfPF'].
`paramsReport`(self[, key, dataDict, groups, inds, aggList])	Generate parameter report/metrics, defaults to self.data['fits']['dfWide'].
`paramsCompare`(self[, params, ref, phaseCorr, ...])	Compare extracted parameter set with reference data.
`paramFidelity`(self[, key, dataDict, refDict, ...])	Quick prototype for differences/fidelity per fit parameter set compared to refs.
`classifyFits`(self[, key, dataDict, dataType, group, ...])	Classify fit result sets (DataFrame) based on chisqr or redchi values.
`phaseCorrection`(self[, key, dataDict, dataOut, ...])	Wrapper for ._util.phaseCorrection() (functional form).
`fitHist`(self[, bins, dataType, key, dataDict, thres, ...])	Basic histogram plot of batch fit results.
`_mergePFLong`(self, pData, key, dataPF, hue, hRound)	Merge per-fit data to existing dataset & force to long form.
`corrPlot`(self[, key, dataDict, hue, hRound, dataType, ...])	Similar to paramPlot(), but set for correlation matrix plotter.
`paramPlot`(self[, selectors, hue, hRound, x, y, key, ...])	Basic scatter-plot of parameter values by name/type.

pemtk.fit._analysis.analyseFits(self, dataRange=None, batches=None, keyDims='t')

Collate fit data from multiple runs.

Data from self. For individual

See https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_analysis_demo_150621-tidy.html for dev code.

pemtk.fit._analysis._setData(self, key, dataDict, dataType=None, thres=None, mask=True)

pemtk.fit._analysis._setWide(self, indexDims=['Fit', 'Type', 'chisqrGroup', 'redchiGroup', 'batch'], valueDims=['value'], key='fits', dataDict='dfLong', dataWide='dfWide', dataIn=None, returnFlag=False)

Set “wide” format data table from current dfLong array, with optional index dims.

Defaults to self.data[key][dataDict], or pass dataIn to use this instead and return wide-form data directly.

NOTE: current form checks indexDims for valid subselection. May want to add smarter dim matching for groups?
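As a rough illustration of the long-to-wide reshaping described above, the following minimal sketch uses plain pandas. It is not the PEMtk implementation: the toy table, the 'Param' column used for the wide-form columns, and the filtering of indexDims to those present in the data are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical long-form table: one row per (Fit, Type, Param) with a single 'value' column.
dfLong = pd.DataFrame({
    "Fit":   [0, 0, 0, 0, 1, 1, 1, 1],
    "Type":  ["m", "m", "p", "p", "m", "m", "p", "p"],
    "Param": ["a", "b", "a", "b", "a", "b", "a", "b"],
    "value": [1.0, 0.5, 0.1, -0.2, 1.1, 0.4, 0.2, -0.1],
})

indexDims = ["Fit", "Type", "redchiGroup"]  # Requested index dims (one is absent here)
valueDims = ["value"]

# Assumption: keep only the index dims that exist in the data (a simple "valid subselection" check).
indexDims = [d for d in indexDims if d in dfLong.columns]

# Spread parameter names over columns to get the wide-form table.
dfWide = dfLong.pivot_table(index=indexDims, columns="Param",
                            values=valueDims[0], aggfunc="first")
print(dfWide)
```

Each row of the result is one (Fit, Type) pair and each column is one parameter, which is the layout the reporting and plotting routines below refer to as wide form.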

pemtk.fit._analysis.fitsReport(self, key='fits', dataDict='dfPF', thres=None, mask=True)

Generate fit report/metrics, defaults to self.data['fits']['dfPF']. Results are printed if self.verbose, and also set to self.fitsSummary.

Parameters:

key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.
thres (float, optional, default = None) – Set threshold for subselection, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key]['mask'].
mask (bool, optional, default = True) – Use self.data[key]['mask'] to subselect data if set.

13/09/22 Added more stats to output.
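The threshold and mask behaviour can be pictured with a small pandas sketch. This is an assumption-laden illustration rather than the library code: the per-fit table and its 'redchi' column are invented here, and describe() stands in for whatever statistics the report actually computes.

```python
import pandas as pd

# Hypothetical per-fit metrics table, indexed by fit number.
dfPF = pd.DataFrame({"redchi": [0.8, 1.2, 5.0, 0.9, 12.0]},
                    index=pd.Index(range(5), name="Fit"))

thres = 2.0

# Boolean mask for the range [0, thres] (both ends inclusive in this sketch).
mask = dfPF["redchi"].between(0, thres)

# Summary statistics over the selected fits only.
summary = dfPF.loc[mask, "redchi"].describe()
print(summary)
```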

pemtk.fit._analysis.paramsReport(self, key='fits', dataDict='dfWide', groups='Type', inds={}, aggList=['min', 'mean', 'median', 'max', 'std', 'var'])

Generate parameter report/metrics, defaults to self.data['fits']['dfWide']. Results are printed if self.verbose, and also set to self.paramsSummary.

Parameters:

key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfWide') – Dataset to use, from self.data[key]. Default case is per-fit metrics.
groups (str or list of strings, optional, default = 'Type') – Additional groupings to use for output (pd.groupby).
inds (dict, optional, default = {}) – Set of indexes to subselect from, as dictionary items. E.g. inds = {'redchiGroup':'C'} will select group 'C'.
aggList (list, optional, default = ['min', 'mean', 'median', 'max', 'std', 'var']) – List of aggregator functions to use. These are passed to Pandas.agg(), a list of common functions can be found at https://pandas.pydata.org/docs/user_guide/basics.html#descriptive-statistics

TODO: consolidate indexing methods with other functions & extend to thresholds and cross-ref (column) values.
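The combination of index subselection, grouping and aggregation described by these parameters might look roughly as follows in plain pandas. The toy wide-form table and its index levels are assumptions for illustration, and the sketch is not taken from the PEMtk source.

```python
import pandas as pd

# Hypothetical wide-form parameter table with a (Fit, Type, redchiGroup) index.
index = pd.MultiIndex.from_tuples(
    [(0, "m", "A"), (0, "p", "A"), (1, "m", "A"), (1, "p", "A"), (2, "m", "C"), (2, "p", "C")],
    names=["Fit", "Type", "redchiGroup"],
)
dfWide = pd.DataFrame({"a": [1.0, 0.1, 1.2, 0.3, 9.0, 2.0],
                       "b": [0.5, -0.2, 0.7, -0.4, 8.0, 3.0]}, index=index)

inds = {"redchiGroup": "A"}       # Index values to subselect
groups = "Type"                   # Grouping for the report
aggList = ["min", "mean", "max"]  # Aggregators passed to .agg()

# Cross-section on the requested index level(s), keeping the levels for later grouping.
subset = dfWide.xs(tuple(inds.values()), level=list(inds.keys()), drop_level=False)

# One row per group, one (parameter, aggregator) column pair per statistic.
report = subset.groupby(level=groups).agg(aggList)
print(report)
```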

pemtk.fit._analysis.paramsCompare(self, params=None, ref=None, phaseCorr=True, phaseCorrParams={})

Compare extracted parameter set with reference data.

NOTE: currently assumes self.paramsSummary and self.params for aggregate fit results & reference data.

Parameters:

params (pd.DataFrame, optional, default = None) – Fit parameters to tabulate. Will use self.paramsSummary in default case (and run self.paramsReport() if missing).
ref (pd.DataFrame, optional, default = None) – Reference parameter set to compare with. Will use self.data['fits']['dfRef'] in default case (and attempt to set this if missing).
phaseCorr (bool, optional, default = True) – Run phase correction routine for reference parameters.
phaseCorrParams (dict, optional, default = {}) – Pass dictionary to additionally set parameters for phaseCorrection() method. The default case runs with {'dataDict':'dfRef', 'dataOut':'dfRefPC', 'useRef':False}; any parameters passed here will update these defaults. Note: these parameters are only used if phaseCorr = True.

TODO:

Better dim handling.
Generalize to compare any pair of parameter sets. (Just need to loop over param sets and check attrs labels.)

pemtk.fit._analysis.paramFidelity(self, key='fits', dataDict='dfWide', refDict='dfRef', phaseCorr=True, phaseCorrParams={})

Quick prototype for differences/fidelity per fit parameter set compared to refs.

Similar to paramsCompare, except per fit, rather than for aggregate data.

19/05/22 v1

pemtk.fit._analysis.classifyFits(self, key='fits', dataDict='dfPF', dataType='redchi', group=None, bins=None, labels=None, plotHist=True, propagate=True, batch=False)

Classify fit result sets (DataFrame) based on chisqr or redchi values.

Parameters:

bins (int or list, default = None) – Bins setting for classifier.
- Set as None for default case, will bin by (min - min*0.05, min*5, 10).
- Set as int to define a specific number of (equally spaced) bins, for (min - min*0.05, min*5, numbins).
- Set as a list [start,stop] or [start,stop,bins] for specific range.
- Set as list (>3 elements) to define specific bin intervals.
dataType (str, default = 'redchi') – DataType to classify. (Currently only supports a single dataType.)
key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.
group (str, optional, default = None) – Name for classifier column in dataframe. If None, defaults to {dataType}Group
labels (list, optional, default = None) – Specify names for group members. Defaults to alphabetic labels.
plotHist (bool, optional, default = True) – Plot histogram of results & show tabular summary (from self.data[key][group])
propagate (bool, optional, default = True) – propagate classifications to other data types?
batch (bool, optional, default = False) –
Dynamically group by batches? Note different bins settings per batch in this case, where x is the batch dataframe.

bins = np.linspace(x.min(), x.min() * bins[0], bins[1])

Where default case bins = [1.05, 10]
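To make the binning scheme more concrete, here is a minimal sketch of the non-batch default case using NumPy and pandas. It reads the triple (min - min*0.05, min*5, 10) as np.linspace(start, stop, num), which is an assumption, and uses pd.cut with alphabetic labels as a stand-in classifier; the actual routine may differ in detail.

```python
import string

import numpy as np
import pandas as pd

# Hypothetical per-fit reduced chi-squared values.
redchi = pd.Series([1.00, 1.02, 1.5, 2.4, 3.9, 4.8, 1.01], name="redchi")

xmin = redchi.min()

# Assumed reading of the default bins: 10 equally spaced edges from 0.95*min to 5*min.
edges = np.linspace(xmin - xmin * 0.05, xmin * 5, 10)

# Alphabetic labels, one per interval (9 intervals for 10 edges).
labels = list(string.ascii_uppercase[: len(edges) - 1])

# Classify each fit into a labelled group.
redchiGroup = pd.cut(redchi, bins=edges, labels=labels)
print(redchiGroup.value_counts().sort_index())
```

Starting the first edge slightly below the minimum keeps the best fit inside the first (right-closed) interval, so it is labelled 'A' rather than falling outside the bins.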

pemtk.fit._analysis.phaseCorrection(self, key='fits', dataDict='dfLong', dataOut='dfWide', renorm=True, dataRef='dfRef', useRef=True, returnFlag=False, **kwargs)

Wrapper for ._util.phaseCorrection() (functional form).

Parameters:

key (str, default = 'fits') – Data key for analysis dataframe.
dataDict (str, default = 'dfLong') – Data dict for analysis dataframe. Note default case uses self.data[key][dataDict]
dataOut (str, default = 'dfWide') – Output dict key for phase-corrected dataframe.
renorm (bool, default = True) – Also set renormalised magnitudes if True (via pemtk.fit._util.renormMagnitudes())
dataRef (str, default = 'dfRef') – Reference dict key for phase. Default case uses self.data[key][dataRef]. Note this is ONLY USED IF useRef = True is set.
useRef (bool, default = True) – Use reference phase from self.data[key][dataRef]? Otherwise ref phase will be set to phasesIn.columns[0] (as per pemtk.fit._util.phaseCorrection())
returnFlag (bool, default = False) – If True return phase-corrected data. If False, set data to self.data[key][dataOut]
**kwargs – Passed to pemtk.fit._util.phaseCorrection()

NOTE: this currently only sets phaseCorrected data in wide-form dataset, self.data[key][dataOut]. May want to push to long-form too? (Otherwise this will be lost by self._setWide().)

TODO: tidy up options here, a bit knotty.
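For orientation only, a generic way to reference phases to one column is sketched below. This is not the pemtk.fit._util.phaseCorrection() code: the toy phasesIn table, the choice of the first column as reference (mirroring the useRef = False case described above), and the wrapping convention are all assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-form phases in radians: one row per fit, one column per parameter.
phasesIn = pd.DataFrame({"p0": [0.5, -2.0, 3.0],
                         "p1": [1.0, 2.5, -3.0]})

# Use the first column as the reference phase.
refCol = phasesIn.columns[0]

# Subtract the reference phase row by row, so the reference column becomes zero.
shifted = phasesIn.sub(phasesIn[refCol], axis=0)

# Wrap the relative phases back into [-pi, pi).
phasesCorr = (shifted + np.pi) % (2 * np.pi) - np.pi
print(phasesCorr)
```

Since only relative phases are physically meaningful in this kind of fit, fixing one phase to zero makes results from different fits directly comparable.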

pemtk.fit._analysis.fitHist(self, bins='auto', dataType='redchi', key='fits', dataDict='dfPF', thres=None, mask=True, binRange=None, backend='hv', plotDict='plots')

Basic histogram plot of batch fit results.

Parameters:

bins (str, int or list, optional, default = 'auto') – Bins setting for histogram, essentially as per Numpy routine https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html
- Set as string for various auto options.
- Set as int to define a specific number of (equally spaced) bins.
- Set as list to define specific bin intervals.
NOTE: some combinations currently not working with 'hv' backend.
dataType (str, default = 'redchi') – DataType to histogram. (Currently only supports a single dataType.)
key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.
thres (float, optional, default = None) – Set threshold for plotting, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key]['mask']. For more control use binRange setting.
mask (bool, optional, default = True) – Use self.data[key]['mask'] to subselect data if set.
binRange (list, optional, default = None) – Specify range for binning. Note this is only used by HV plotter, and will override bins settings for auto types. Specify bins = int and binRange = [start, stop] for full control.
backend (str, optional, default = 'hv') – Specify backend:
- 'hv' for Holoviews
- 'pd' or 'mpl' for Pandas.hist()
plotDict (str, optional, default = 'plots') – For hv case, return plot object & data to self.data[plotDict] as ['fitHistPlot'] and ['fitHistData']

Notes:

- Data to plot is specified by self.data[key][dataDict][dataType].
- Threshold value sets mask (this will overwrite existing selection mask).
- If self.data[key]['mask'] exists, this will be used if mask = True.

TODO:

- Added try/except on MemoryError for bins; can get this in 'auto' case for a large data range.
- See TMO-DEV.
- Implement, but better, with decorators for data checking & selection.
- Import chain.
- Data subselection by threshold or range. (Again see TMO-DEV routines for ideas.)
- Fix binning issues with certain cases.
- Holoviews stuff:
- Fix data subset to plotter, otherwise get full dataset to tooltip.
- Hist bar options to fix. UPDATE: now set to bins='auto' as default, which works well.
- See hv.help(histogram) or http://holoviews.org/user_guide/Transforming_Elements.html for more.
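Since the bins setting follows numpy.histogram_bin_edges, the three input styles (string, int, list) can be tried directly in NumPy. The data values below are invented for illustration, and the range argument is used here as a rough analogue of binRange, which is an assumption about how the two relate.

```python
import numpy as np

# Hypothetical reduced chi-squared values from a batch of fits.
redchi = np.array([0.8, 0.9, 1.0, 1.1, 1.3, 2.0, 4.5, 9.0])

# String: let NumPy pick the bin edges automatically.
edgesAuto = np.histogram_bin_edges(redchi, bins="auto")

# Int plus explicit range: a fixed number of equally spaced bins over [start, stop].
edgesInt = np.histogram_bin_edges(redchi, bins=4, range=(0.0, 2.0))

# List: explicit bin edges, which need not be equally spaced.
edgesList = np.histogram_bin_edges(redchi, bins=[0.0, 1.0, 2.0, 10.0])
counts, _ = np.histogram(redchi, bins=edgesList)

print(edgesAuto, edgesInt, counts, sep="\n")
```

With a long-tailed distribution such as this one, explicit edges or an int with a restricted range give finer control over the region near the best fits than the automatic options.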

pemtk.fit._analysis._mergePFLong(self, pData, key, dataPF, hue, hRound)

Merge per-fit data to existing dataset & force to long form.

Used for Seaborn plotting routines with hue mapping.
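A merge-then-melt pattern of this kind can be sketched in pandas as below. The tables, column names and the rounding of the hue values are assumptions chosen to match the argument names in the signature; the actual routine may handle indexes and dtypes differently.

```python
import pandas as pd

# Hypothetical wide-form parameters and per-fit metrics, both indexed by fit number.
pData = pd.DataFrame({"a": [1.0, 1.2], "b": [0.5, 0.7]},
                     index=pd.Index([0, 1], name="Fit"))
dataPF = pd.DataFrame({"redchi": [1.23456789, 2.3456789]},
                      index=pd.Index([0, 1], name="Fit"))

hue = "redchi"  # Per-fit column to use for colour mapping
hRound = 3      # Decimal places kept for the hue values

# Attach the (rounded) per-fit column to the parameter table by fit index.
merged = pData.join(dataPF[hue].round(hRound))

# Force to long form: one row per (Fit, Param), with the hue value repeated per row.
longForm = merged.reset_index().melt(id_vars=["Fit", hue],
                                     var_name="Param", value_name="value")
print(longForm)
```

Rounding the hue column reduces the number of distinct colour levels, which helps when many fits have nearly identical metric values.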

pemtk.fit._analysis.corrPlot(self, key='fits', dataDict='dfWide', hue='redchiGroup', hRound=None, dataType=None, level=None, sel=None, selLevel='redchiGroup', dataPF='dfPF', plotDict='plots', remap=None, backend='sns', pairgrid=False, **kwargs)

Similar to paramPlot(), but set for correlation matrix plotter.

This requires wide-form parameters data self.data['fits']['dfWide']. Two levels of selection are currently supported (index data only, NOT columns).

**kwargs are passed to Seaborn’s pairplot routine, https://seaborn.pydata.org/generated/seaborn.pairplot.html

TODO: if numerical data columns are added for hue mapping they may result in additional plots too.

TODO: add HV gridmatrix + linked brushing: http://holoviews.org/user_guide/Linked_Brushing.html

TODO: FIX HORRIBLE SELECTION ROUTINES.

pemtk.fit._analysis.paramPlot(self, selectors={'Type': 'm'}, hue=None, hRound=7, x='Param', y='value', key='fits', dataDict='dfWide', dataPF='dfPF', plotDict='plots', hvType=None, plotScatter=True, remap=None, backend='sns', returnFlag=False, **kwargs)

Basic scatter-plot of parameter values by name/type.

Currently supports Seaborn for back-end only, and requires wide-form dataDict as input.

Parameters:

selectors (dict, optional, default = {'Type':'m'}) – Used to cross-section data (pd.xs).
hue (string, optional, default = None) – Variable name to use for colour mapping (scatter plot points).
hRound (int, optional, default = 7) – Rounding for colour mapping scale. May need tweaking for cases with very closely clustered values.
x = 'Param'
y = 'value'
key = 'fits'
dataDict = 'dfWide'
dataPF = 'dfPF'
plotDict = 'plots'
hvType = None
plotScatter = True

TODO:

- Better and more concise dim handling for multi-level selection. Integrate to single dict of selectors? (See tmo-dev?)
- Box/violin plot options. Also option to drop scatter plot in these cases (now partially implemented for HV only).
- Add ref data to plots! See e.g. paramFidelity and paramsCompare.
- HV support?

Basic support now in place, but cmapping needs some work for non-cat data. SEE NOTES ELSEWHERE!

Also breaks for subselection case unless another hue dim is set.

18/11/21: better, but messy, support now in place. Includes Violin & BoxWhisker options.

TODO: implement grouping and/or holomap for extra dims.

Currently: have selLevel and hue, which must be different in general.
- sel and selLevel define subselection by a value in a column, e.g. sel = 'E', selLevel = 'redchiGroup' for values E in column 'selLevel'.
- hue specifies hue mapping for Seaborn plot, which must be a column name.
- If hue is not in input data, it will be taken from the per-fit dataframe.

Ref: Seaborn catplot, https://seaborn.pydata.org/generated/seaborn.catplot.html
Ref: HV scatter,

For usage notes see https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_fitting_multiproc_class_analysis_141121-tidy.html

05/07/22: marginally improved plot type handling for HV case.
19/05/22: updated for multiple XS from selectors, now passed as dictionary with items {level:value}
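A selectors dictionary with several {level:value} items maps naturally onto a multi-level pandas cross-section. The sketch below illustrates that idea only: the table layout is invented, and the final melt to 'Param'/'value' columns is an assumed preparation step for an x='Param', y='value' scatter plot rather than a description of the actual plotting code.

```python
import pandas as pd

# Hypothetical wide-form parameter table with a (Fit, Type, redchiGroup) index.
index = pd.MultiIndex.from_tuples(
    [(0, "m", "A"), (0, "p", "A"), (1, "m", "B"), (1, "p", "B"), (2, "m", "A"), (2, "p", "A")],
    names=["Fit", "Type", "redchiGroup"],
)
dfWide = pd.DataFrame({"a": [1.0, 0.1, 1.5, 0.2, 1.2, 0.3],
                       "b": [0.5, -0.2, 0.9, -0.3, 0.7, -0.4]}, index=index)

# Multiple cross-sections passed as {level: value} items.
selectors = {"Type": "m", "redchiGroup": "A"}

# Select on all requested levels at once; the selected levels are dropped from the index.
subset = dfWide.xs(tuple(selectors.values()), level=list(selectors.keys()))

# Long form with 'Param' and 'value' columns, ready for a categorical scatter plot.
longForm = subset.reset_index().melt(id_vars="Fit", var_name="Param", value_name="value")
print(longForm)
```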
