pemtk.fit._analysis
Module Contents
Functions
analyseFits | Collate fit data from multiple runs.
_setWide | Set "wide" format data table from current dfLong array, with optional index dims.
fitsReport | Generate fit report/metrics, defaults to self.data['fits']['dfPF'].
paramsReport | Generate parameter report/metrics, defaults to self.data['fits']['dfWide'].
paramsCompare | Compare extracted parameter set with reference data.
paramFidelity | Quick prototype for differences/fidelity per fit parameter set compared to refs.
classifyFits | Classify fit result sets (DataFrame) based on chisqr or redchi values.
phaseCorrection | Wrapper for ._util.phaseCorrection() (functional form).
fitHist | Basic histogram plot of batch fit results.
_mergePFLong | Merge per-fit data to existing dataset & force to long form.
corrPlot | Similar to paramPlot(), but set for correlation matrix plotter.
paramPlot | Basic scatter-plot of parameter values by name/type.
- pemtk.fit._analysis.analyseFits(self, dataRange=None, batches=None, keyDims='t')[source]
Collate fit data from multiple runs.
Data from self. For individual
See https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_analysis_demo_150621-tidy.html for dev code.
- pemtk.fit._analysis._setWide(self, indexDims=['Fit', 'Type', 'chisqrGroup', 'redchiGroup', 'batch'], valueDims=['value'], key='fits', dataDict='dfLong', dataWide='dfWide', dataIn=None, returnFlag=False)[source]
Set “wide” format data table from current dfLong array, with optional index dims.
Defaults to self.data[key][dataDict], or pass dataIn to use this instead and return wide-form data directly.
NOTE: current form checks indexDims for valid subselection. May want to add smarter dim matching for groups?
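The long-to-wide conversion described above can be illustrated with plain pandas. This is a minimal sketch under assumptions, not the PEMtk implementation: the table layout (a 'Param' column spread into wide columns) and the names df_long, index_dims and value_dim are hypothetical.

```python
import pandas as pd

# Hypothetical long-form table: one row per (Fit, Type, Param), single 'value' column.
df_long = pd.DataFrame(
    {
        "Fit": [0, 0, 0, 0, 1, 1, 1, 1],
        "Type": ["m", "m", "p", "p", "m", "m", "p", "p"],
        "Param": ["A", "B", "A", "B", "A", "B", "A", "B"],
        "value": [1.0, 2.0, 0.1, 0.2, 1.5, 2.5, 0.3, 0.4],
    }
)

# Requested index dims; 'batch' is not present in this toy table.
index_dims = ["Fit", "Type", "batch"]
value_dim = "value"

# Keep only the index dims that exist in the data (a simple validity check).
index_dims = [d for d in index_dims if d in df_long.columns]

# Spread the 'Param' column into one column per parameter.
df_wide = df_long.pivot_table(index=index_dims, columns="Param", values=value_dim)
print(df_wide)
```

Each row of df_wide is then one (Fit, Type) combination, with one column per parameter.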
- pemtk.fit._analysis.fitsReport(self, key='fits', dataDict='dfPF', thres=None, mask=True)[source]
Generate fit report/metrics, defaults to self.data['fits']['dfPF']. Results are printed if self.verbose, and also set to self.fitsSummary.
- Parameters:
key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.
thres (float, optional, default = None) – Set threshold for subselection, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key]['mask'].
mask (bool, optional, default = True) – Use self.data[key]['mask'] to subselect data if set.
13/09/22 Added more stats to output.
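As an illustration of the threshold subselection and summary described above, the following sketch builds a boolean mask for the range [0, thres] and aggregates the selected fits. It assumes a per-fit table with chisqr and redchi columns; the data and the names df_pf and summary are hypothetical, and the statistics reported by fitsReport itself may differ.

```python
import pandas as pd

# Hypothetical per-fit metrics, indexed by fit number.
df_pf = pd.DataFrame(
    {"chisqr": [2.0, 8.0, 40.0, 3.0], "redchi": [0.2, 0.8, 4.0, 0.3]},
    index=pd.Index([0, 1, 2, 3], name="Fit"),
)

thres = 1.0

# Boolean mask selecting fits with redchi in [0, thres] (inclusive at both ends).
mask = df_pf["redchi"].between(0, thres)

# Summary statistics for the selected fits only.
summary = df_pf[mask].agg(["count", "min", "mean", "max"])
print(summary)
```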
- pemtk.fit._analysis.paramsReport(self, key='fits', dataDict='dfWide', groups='Type', inds={}, aggList=['min', 'mean', 'median', 'max', 'std', 'var'])[source]
Generate parameter report/metrics, defaults to self.data['fits']['dfWide']. Results are printed if self.verbose, and also set to self.paramsSummary.
- Parameters:
key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfWide') – Dataset to use, from self.data[key]. Default case is wide-form parameter data.
groups (str or list of strings, optional, default = 'Type') – Additional groupings to use for output (pd.groupby).
inds (dict, optional, default = {}) – Set of indices to subselect from, as dictionary items. E.g. inds = {'redchiGroup':'C'} will select group 'C'.
aggList (list, optional, default = ['min', 'mean', 'median', 'max', 'std', 'var']) – List of aggregator functions to use. These are passed to Pandas.agg(); a list of common functions can be found at https://pandas.pydata.org/docs/user_guide/basics.html#descriptive-statistics
TODO: consolidate indexing methods with other functions & extend to thresholds and cross-ref (column) values.
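The combination of inds, groups and aggList can be sketched with plain pandas as follows. This is an illustrative example only: the wide-form table, its index levels and the parameter names param1 and param2 are hypothetical, and the actual paramsReport code may organise the selection differently.

```python
import pandas as pd

# Hypothetical wide-form table: index levels (Fit, Type, redchiGroup), one column per parameter.
index = pd.MultiIndex.from_tuples(
    [(0, "m", "A"), (0, "p", "A"), (1, "m", "A"), (1, "p", "A"), (2, "m", "B"), (2, "p", "B")],
    names=["Fit", "Type", "redchiGroup"],
)
df_wide = pd.DataFrame(
    {"param1": [1.0, 0.1, 1.2, 0.3, 5.0, 2.0], "param2": [2.0, 0.2, 2.2, 0.4, 6.0, 3.0]},
    index=index,
)

inds = {"redchiGroup": "A"}          # index values to subselect
groups = "Type"                      # grouping for the report
agg_list = ["min", "mean", "max"]    # aggregator functions

# Apply each {level: value} selection in turn, keeping the level in the index.
subset = df_wide
for level, value in inds.items():
    subset = subset.xs(value, level=level, drop_level=False)

# Aggregate per group; columns become (parameter, aggregator) pairs.
report = subset.groupby(groups).agg(agg_list)
print(report)
```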
- pemtk.fit._analysis.paramsCompare(self, params=None, ref=None, phaseCorr=True, phaseCorrParams={})[source]
Compare extracted parameter set with reference data.
NOTE: currently assumes self.paramsSummary and self.params for aggregate fit results & reference data.
- Parameters:
params (pd.DataFrame, optional, default = None) – Fit parameters to tabulate. Will use self.paramsSummary in default case (and run self.paramsReport() if missing).
ref (pd.DataFrame, optional, default = None) – Reference parameter set to compare with. Will use self.data['fits']['dfRef'] in default case (and attempt to set this if missing).
phaseCorr (bool, optional, default = True) – Run phase correction routine for reference parameters.
phaseCorrParams (dict, optional, default = {}) – Pass dictionary to additionally set parameters for phaseCorrection() method. The default case runs with {'dataDict':'dfRef', 'dataOut':'dfRefPC', 'useRef':False}; any parameters passed here will update these defaults. Note: these parameters are only used if phaseCorr = True.
TODO:
Better dim handling.
Generalize to compare any pair of parameter sets. (Just need to loop over param sets and check attrs labels.)
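A comparison of aggregate fit results against reference values might look like the sketch below. The choice of metrics (absolute and percentage difference) and all names and numbers here are assumptions for illustration; the source does not specify which quantities paramsCompare tabulates.

```python
import pandas as pd

# Hypothetical aggregate fit results (e.g. mean over fits) and reference values.
fit_mean = pd.Series({"p1": 1.1, "p2": 1.9}, name="fit")
ref = pd.Series({"p1": 1.0, "p2": 2.0}, name="ref")

# Tabulate side by side, aligned on parameter name.
comparison = pd.DataFrame({"fit": fit_mean, "ref": ref})

# Absolute and percentage differences relative to the reference.
comparison["diff"] = comparison["fit"] - comparison["ref"]
comparison["diff/%"] = 100 * comparison["diff"] / comparison["ref"]
print(comparison)
```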
- pemtk.fit._analysis.paramFidelity(self, key='fits', dataDict='dfWide', refDict='dfRef', phaseCorr=True, phaseCorrParams={})[source]
Quick prototype for differences/fidelity per fit parameter set compared to refs.
Similar to paramsCompare, except per fit, rather than for aggregate data.
19/05/22 v1
- pemtk.fit._analysis.classifyFits(self, key='fits', dataDict='dfPF', dataType='redchi', group=None, bins=None, labels=None, plotHist=True, propagate=True, batch=False)[source]
Classify fit result sets (DataFrame) based on chisqr or redchi values.
- Parameters:
bins (int or list, default = None) – Bins setting for classifier.
- Set as None for the default case, which will bin by (min - min*0.05, min*5, 10).
- Set as int to define a specific number of (equally spaced) bins, for (min - min*0.05, min*5, numbins).
- Set as a list [start,stop] or [start,stop,bins] for a specific range.
- Set as a list (>3 elements) to define specific bin intervals.
dataType (str, default = 'redchi') – DataType to classify. (Currently only supports a single dataType.)
key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.
group (str, optional, default = None) – Name for classifier column in dataframe. If None, defaults to {dataType}Group
labels (list, optional, default = None) – Specify names for group members. Defaults to alphabetic labels.
plotHist (bool, optional, default = True) – Plot histogram of results & show tabular summary (from self.data[key][group])
propagate (bool, optional, default = True) – propagate classifications to other data types?
batch (bool, optional, default = False) – Dynamically group by batches? Note that the bins settings differ per batch in this case, where x is the batch dataframe:
bins = np.linspace(x.min(), x.min() * bins[0], bins[1])
where the default case is bins = [1.05, 10].
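The default binning and alphabetic labelling can be sketched with numpy and pandas. This example assumes the tuple (min - min*0.05, min*5, 10) is passed directly to np.linspace, which gives 10 bin edges and therefore 9 groups; whether classifyFits interprets the last value as edges or as bins is not stated in the source. The data and variable names are hypothetical.

```python
import string

import numpy as np
import pandas as pd

# Hypothetical reduced chi-squared values, one per fit.
redchi = pd.Series([1.0, 1.3, 2.2, 4.9, 7.0], name="redchi")

# Bin edges from the minimum value, as in the default case described above.
x_min = redchi.min()
bins = np.linspace(x_min - x_min * 0.05, x_min * 5, 10)

# Alphabetic labels, one per interval between consecutive edges.
labels = list(string.ascii_uppercase[: len(bins) - 1])

# Classify; values outside the binned range are left unclassified (NaN).
groups = pd.cut(redchi, bins=bins, labels=labels)
print(pd.DataFrame({"redchi": redchi, "redchiGroup": groups}))
```

Note that fits with values above the last bin edge (here 7.0) fall outside every interval in this sketch.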
- pemtk.fit._analysis.phaseCorrection(self, key='fits', dataDict='dfLong', dataOut='dfWide', renorm=True, dataRef='dfRef', useRef=True, returnFlag=False, **kwargs)[source]
Wrapper for ._util.phaseCorrection() (functional form).
- Parameters:
key (str, default = 'fits') – Data key for analysis dataframe.
dataDict (str, default = 'dfLong') – Data dict for analysis dataframe. Note default case uses self.data[key][dataDict]
dataOut (str, default = 'dfWide') – Output dict key for phase-corrected dataframe.
renorm (bool, default = True) – Also set renormalised magnitudes if True (via pemtk.fit._util.renormMagnitudes()).
dataRef (str, default = 'dfRef') – Reference dict key for phase. Default case uses self.data[key][dataRef]. Note this is ONLY USED IF useRef = True is set.
useRef (bool, default = True) – Use reference phase from self.data[key][dataRef]? Otherwise ref phase will be set to phasesIn.columns[0] (as per pemtk.fit._util.phaseCorrection()).
returnFlag (bool, default = False) – If True return phase-corrected data. If False, set data to self.data[key][dataOut].
**kwargs – Passed to pemtk.fit._util.phaseCorrection().
NOTE: this currently only sets phaseCorrected data in wide-form dataset, self.data[key][dataOut]. May want to push to long-form too? (Otherwise this will be lost by self._setWide().)
TODO: tidy up options here, a bit knotty.
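The idea of referencing phases to the first column (the useRef = False case described above) can be sketched as follows. This is an assumption-laden illustration, not the pemtk.fit._util.phaseCorrection() code: the wrapping convention to [-pi, pi) and all names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical phases (radians): one row per fit, one column per parameter.
phases = pd.DataFrame({"p1": [0.5, -2.0], "p2": [1.0, 2.0], "p3": [-3.0, 0.0]})

# Reference phase: first column, as in the useRef = False case.
ref = phases[phases.columns[0]]

# Subtract the reference phase row by row.
shifted = phases.sub(ref, axis=0)

# Wrap the result into the interval [-pi, pi).
corrected = (shifted + np.pi) % (2 * np.pi) - np.pi
print(corrected)
```

After correction the reference column is zero for every fit, so the remaining columns hold phases relative to it.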
- pemtk.fit._analysis.fitHist(self, bins='auto', dataType='redchi', key='fits', dataDict='dfPF', thres=None, mask=True, binRange=None, backend='hv', plotDict='plots')[source]
Basic histogram plot of batch fit results.
- Parameters:
bins (str, int or list, optional, default = 'auto') – Bins setting for histogram, essentially as per the Numpy routine https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html
- Set as string for various auto options.
- Set as int to define a specific number of (equally spaced) bins.
- Set as list to define specific bin intervals.
NOTE: some combinations currently not working with 'hv' backend.
dataType (str, default = 'redchi') – DataType to histogram. (Currently only supports a single dataType.)
key (str, optional, default = 'fits') – Key into main self.data dictionary.
dataDict (str, optional, default = 'dfPF') – Dataset to use, from self.data[key]. Default case is per-fit metrics.
thres (float, optional, default = None) – Set threshold for plotting, for range [0, thres]. This is passed to self.thresFits() and sets self.data[key]['mask']. For more control use binRange setting.
mask (bool, optional, default = True) – Use self.data[key]['mask'] to subselect data if set.
binRange (list, optional, default = None) – Specify range for binning. Note this is only used by HV plotter, and will override bins settings for auto types. Specify bins = int and binRange = [start, stop] for full control.
backend (str, optional, default = 'hv') – Specify backend: 'hv' for Holoviews, 'pd' or 'mpl' for Pandas.hist().
plotDict (str, optional, default = 'plots') – For hv case, return plot object & data to self.data[plotDict] as ['fitHistPlot'] and ['fitHistData'].
Notes:
- Data to plot is specified by self.data[key][dataDict][dataType].
- Threshold value sets mask, this will overwrite existing selection mask.
- If self.data[key]['mask'] exists, this will be used if mask = True.
TODO:
- Added try/except on MemoryError for bins, can get this in 'auto' case for a large data range.
- see TMO-DEV (https…)
- Implement, but better, with decorators for data checking & selection.
- Import chain
- Data subselection by threshold or range. (Again see TMO-DEV routines for ideas.)
- Fix binning issues with certain cases.
- Holoviews stuff
Fix data subset to plotter, otherwise get full dataset to tooltip.
Hist bar options to fix. UPDATE: now set to bins='auto' as default, which works well.
See hv.help(histogram) or http://holoviews.org/user_guide/Transforming_Elements.html for more.
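The interplay of thres, bins and binRange can be illustrated with numpy alone. This sketch only covers the binning step, not the Holoviews or Pandas plotting; the data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical reduced chi-squared values from a batch of fits.
redchi = np.array([0.2, 0.4, 0.6, 1.1, 1.7, 3.5, 9.0])

# Threshold subselection for the range [0, thres].
thres = 2.0
selected = redchi[(redchi >= 0) & (redchi <= thres)]

# 'auto' binning: numpy chooses the bin edges from the selected data.
edges_auto = np.histogram_bin_edges(selected, bins="auto")

# Full control: integer bin count plus an explicit [start, stop] range.
bin_range = (0.0, thres)
counts, edges = np.histogram(selected, bins=4, range=bin_range)
print(edges_auto)
print(counts, edges)
```

Applying the threshold before binning keeps outliers (here 3.5 and 9.0) from stretching the 'auto' bin range.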
- pemtk.fit._analysis._mergePFLong(self, pData, key, dataPF, hue, hRound)[source]
Merge per-fit data to existing dataset & force to long form.
Used for Seaborn plotting routines with hue mapping.
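A merge of per-fit metrics into parameter data, followed by a reshape to long form, could look like the sketch below. The table layouts and the use of join and melt are assumptions for illustration; the source only states that per-fit data is merged, forced to long form, and used for hue mapping, with hRound assumed here to round the hue values.

```python
import pandas as pd

# Hypothetical wide-form parameter data and per-fit metrics, both indexed by Fit.
p_data = pd.DataFrame(
    {"A": [1.0, 1.2], "B": [2.0, 2.5]}, index=pd.Index([0, 1], name="Fit")
)
df_pf = pd.DataFrame(
    {"redchi": [0.123456789, 0.987654321]}, index=pd.Index([0, 1], name="Fit")
)

hue = "redchi"
h_round = 3

# Attach the (rounded) per-fit hue column to the parameter data.
merged = p_data.join(df_pf[hue].round(h_round))

# Force to long form: one row per (Fit, Param), with the hue value repeated.
df_long = merged.reset_index().melt(
    id_vars=["Fit", hue], var_name="Param", value_name="value"
)
print(df_long)
```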
- pemtk.fit._analysis.corrPlot(self, key='fits', dataDict='dfWide', hue='redchiGroup', hRound=None, dataType=None, level=None, sel=None, selLevel='redchiGroup', dataPF='dfPF', plotDict='plots', remap=None, backend='sns', pairgrid=False, **kwargs)[source]
Similar to paramPlot(), but set for correlation matrix plotter.
This requires wide-form parameters data self.data['fits']['dfWide']. Two levels of selection are currently supported (index data only, NOT columns).
**kwargs are passed to Seaborn’s pairplot routine, https://seaborn.pydata.org/generated/seaborn.pairplot.html
TODO: if numerical data columns are added for hue mapping they may result in additional plots too.
TODO: add HV gridmatrix + linked brushing: http://holoviews.org/user_guide/Linked_Brushing.html
TODO: FIX HORRIBLE SELECTION ROUTINES.
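As a numerical counterpart to the pair plot, the sketch below selects one group from the index (in the manner of sel and selLevel) and computes the pairwise correlation matrix with pandas. It does not call Seaborn and is not the corrPlot implementation; the data and names are hypothetical.

```python
import pandas as pd

# Hypothetical wide-form parameters, indexed by (Fit, redchiGroup).
index = pd.MultiIndex.from_tuples(
    [(0, "A"), (1, "A"), (2, "A"), (3, "B")], names=["Fit", "redchiGroup"]
)
df_wide = pd.DataFrame(
    {"p1": [1.0, 2.0, 3.0, 10.0], "p2": [2.0, 4.0, 6.0, -5.0], "p3": [3.0, 2.0, 1.0, 0.0]},
    index=index,
)

# Subselect by a value in an index level (index data only, not columns).
sel, sel_level = "A", "redchiGroup"
subset = df_wide.xs(sel, level=sel_level)

# Pairwise (Pearson) correlations between parameters for the selected fits.
corr = subset.corr()
print(corr)
```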
- pemtk.fit._analysis.paramPlot(self, selectors={'Type': 'm'}, hue=None, hRound=7, x='Param', y='value', key='fits', dataDict='dfWide', dataPF='dfPF', plotDict='plots', hvType=None, plotScatter=True, remap=None, backend='sns', returnFlag=False, **kwargs)[source]
Basic scatter-plot of parameter values by name/type.
Currently supports Seaborn for back-end only, and requires wide-form dataDict as input.
- Parameters:
selectors (dict, optional, default = {'Type':'m'}) – Used to cross-section data (pd.xs).
hue (string, optional, default = None) – Variable name to use for colour mapping (scatter plot points).
hRound (int, optional, default = 7) – Rounding for colour mapping scale. May need tweaking for cases with very closely clustered values.
x = 'Param'
y = 'value'
key = 'fits'
dataDict = 'dfWide'
dataPF = 'dfPF'
plotDict = 'plots'
hvType = None
plotScatter = True
TODO:
- Better and more concise dim handling for multi-level selection. Integrate to single dict of selectors? (See tmo-dev?)
- Box/violin plot options. Also option to drop scatter plot in these cases (now partially implemented for HV only).
- Add ref data to plots! See e.g. paramFidelity and paramsCompare.
- HV support?
Basic support now in place, but cmapping needs some work for non-cat data. SEE NOTES ELSEWHERE!
Also breaks for subselection case unless another hue dim is set.
18/11/21: better, but messy, support now in place. Includes Violin & BoxWhisker options.
TODO: implement grouping and/or holomap for extra dims.
Currently: have selLevel and hue, which must be different in general.
- sel and selLevel define subselection by a value in a column, e.g. sel = 'E', selLevel = 'redchiGroup' for values 'E' in the column named by selLevel.
- hue specifies hue mapping for Seaborn plot, which must be a column name.
- If hue is not in input data, it will be taken from the per-fit dataframe.
Ref: Seaborn catplot, https://seaborn.pydata.org/generated/seaborn.catplot.html
Ref: HV scatter,
For usage notes see https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_fitting_multiproc_class_analysis_141121-tidy.html
05/07/22: marginally improved plot type handling for HV case.
19/05/22: updated for multiple XS from selectors, now passed as dictionary with items {level:value}.
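The selectors dictionary with items {level:value} can be applied as successive cross-sections (pd.xs), after which the data is reshaped to the x = 'Param', y = 'value' layout used for plotting. This is a minimal sketch assuming a wide-form table indexed by (Fit, Type); the data and names are hypothetical and the plotting call itself is omitted.

```python
import pandas as pd

# Hypothetical wide-form parameters, indexed by (Fit, Type).
index = pd.MultiIndex.from_tuples(
    [(0, "m"), (0, "p"), (1, "m"), (1, "p")], names=["Fit", "Type"]
)
df_wide = pd.DataFrame(
    {"A": [1.0, 0.1, 1.5, 0.3], "B": [2.0, 0.2, 2.5, 0.4]}, index=index
)

selectors = {"Type": "m"}

# Apply one cross-section per {level: value} item.
subset = df_wide
for level, value in selectors.items():
    subset = subset.xs(value, level=level)

# Long form with 'Param' and 'value' columns, ready for a scatter or categorical plot.
plot_data = subset.reset_index().melt(
    id_vars="Fit", var_name="Param", value_name="value"
)
print(plot_data)
```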