06/12/21 - TO FIX: push/debug ePSproc updates, currently throwing loading errors on Jake

Analysis routines

  • 06/12/21 v2 with updated param analysis and tabulation routines.

  • 14/11/21 v1 (still in development)

In this notebook some of the packaged analysis routines are demonstrated. For an earlier exploration of the basic/underlying analysis routines, see the previous fit fidelity & analysis notebook.

Setup

Load fit & analysis class and import data (currently a bit messy).

[1]:
# Import & set paths
import pemtk
from pemtk.fit.fitClass import pemtkFit

from pathlib import Path

# Path for demo script
demoPath = Path(pemtk.__file__).parent.parent/Path('demos','fitting')

# Some additional default plot settings
# TODO: this is already run at class init, but out of notebook scope? Should fix.
from epsproc.plot import hvPlotters
hvPlotters.setPlotters()
*** ePSproc installation not found, setting for local copy.
C:\Users\femtolab\.conda\envs\ePSdev\lib\importlib\_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\xyzpy\plot\xyz_cmaps.py:6: MatplotlibDeprecationWarning:
The revcmap function was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use Colormap.reversed() instead.
  return LinearSegmentedColormap(name, cm.revcmap(cmap._segmentdata))
[2]:
# Here we'll just load some test data
# TODO: wrap this to class!

# Load sample dataset
# Full path to the file may be required here, in repo/demos/fitting
import pickle
from pathlib import Path

# dataPath = Path(pemtk.__path__[0]).parent/Path('demos','fitting')  # Test data in repo
dataPath = demoPath

# Basic test data - simulated results with no noise, 100 fits
# dataFile = 'dataDump_100fitTests_10t_randPhase_130621.pickle'

# Noisey test data - simulated results with noise, 1000 fits
dataFile = 'dataDump_1000fitTests_multiFit_noise_051021.pickle'


# Set for empty class, or with full demo setup (includes ref. parameter set)
demo = False

if demo:
    # Version with full demo setup
    %run {demoPath/"setup_fit_demo.py"}


else:
    data = pemtkFit()  # Data loads OK for blank class, but may be missing some necessary vars.

data.verbose['sub'] = 1
with open( dataPath/dataFile, 'rb') as handle:
    data.data = pickle.load(handle)

data.fitInd = list(data.data.keys())[-1]  # Set final key from data - probably not a robust method however!
                                          # This is currently used for indexing

Data overview

For more on the data structures used here, see the setup & batch runs notebook.

[3]:
# Check number of datasets loaded
data.fitInd
[3]:
999
[4]:
# Run basic stats
# TODO: add more outputs here & tidy up output formatting.
data.analyseFits()
Pandas reference table not set, missing self.params data.
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type'], cols = value
{ 'Fits': 999,
  'Minima': {'chisqr': 0.04215746268613921, 'redchi': 0.00022911664503336526},
  'Success': 991}
[5]:
# Basic histogram of fit sets
# Note this defaults to Holoviews/Bokeh for plotting, which produces an interactive plot.
# Set backend = 'mpl' if Holoviews is not available.
data.fitHist()

The overview histogram is fairly coarse, since it shows all results. It is clear, however, that there are many results at the low end of the range (by default this plots reduced \(\chi^2\) values). This can be explored in further detail via the interactive plot (if using Holoviews), or via further plotting with specified thresholds or ranges (see below).

Dataset notes:

  • For “perfect” data (dataDump_1000fitTests_multiFit_130921.pickle) can get a large spread here, with best results at \(10^{-11}\).

  • For noisey data (dataDump_1000fitTests_multiFit_noise_011021.pickle) have a smaller spread, best results low \(10^{-4}\).

[6]:
# With threshold set & fine binning.
# See fitHist() docs for more options.
data.fitHist(thres = 2.4e-4, bins = 100)

# data.fitHist(bins=20, binRange = [2.3e-4, 2.4e-4])   # Example with a range set
Mask selected 561 results (from 999).

Data exploration

The general aim in this procedure is to ascertain whether there was a good spread of parameters explored, and a single (or few sets) of best-fit results. There are a few procedures and helper methods for this…

View results

Single results sets can be viewed in the main data structure, indexed by #.

[7]:
# Check keys
fitNumber = 2
data.data[fitNumber].keys()
[7]:
dict_keys(['AFBLM', 'residual', 'results'])

Here results is an lmFit object, which includes final fit results and information, and AFBLM contains the model output. (TODO: helper functions for this.)

An example is shown below. Of particular note here is which parameters have vary=True - these are included in the fitting - and if there is a column expression, which indicates any parameters defined to have specific relationships (see the basic demo notebook for more). Any correlations found during fitting are also shown, which can also indicate parameters which are related (even if this is not predefined or known a priori).

For the current dataset (‘dataDump_1000fitTests_multiFit_noise_051021.pickle’), note that p_PU_SG_PU_1_n1_1_1 has vary=False, hence the results are already defined as relative phases (and, in this example, the fixed value is kept at the original model value).

[8]:
# Show some results
data.data[fitNumber]['results']
[8]:

Fit Statistics

fitting methodleastsq
# function evals2812
# data points195
# variables11
chi-square 0.04215747
reduced chi-square 2.2912e-04
Akaike info crit.-1623.67186
Bayesian info crit.-1587.66886

Variables

name value standard error relative error initial value min max vary
m_PU_SG_PU_1_n1_1_1 1.61758339 227151.843 (14042666.65%) 0.4963375310160162 1.0000e-04 5.00000000 True
m_PU_SG_PU_1_1_n1_1 1.60986876 227266.217 (14117064.81%) 0.349972501184159 1.0000e-04 5.00000000 True
m_PU_SG_PU_3_n1_1_1 1.14366717 214620.511 (18765993.77%) 0.43544003283291643 1.0000e-04 5.00000000 True
m_PU_SG_PU_3_1_n1_1 1.14104307 213989.161 (18753819.79%) 0.16101762796183416 1.0000e-04 5.00000000 True
m_SU_SG_SU_1_0_0_1 2.65253557 0.06899386 (2.60%) 0.8234700291388766 1.0000e-04 5.00000000 True
m_SU_SG_SU_3_0_0_1 1.14378960 0.14902324 (13.03%) 0.5411867072477154 1.0000e-04 5.00000000 True
p_PU_SG_PU_1_n1_1_1 -0.86104140 0.00000000 (0.00%) -0.8610414024232179 -3.14159265 3.14159265 False
p_PU_SG_PU_1_1_n1_1 -0.85729904 167895.839 (19584279.35%) 0.021992539259326538 -3.14159265 3.14159265 True
p_PU_SG_PU_3_n1_1_1 1.20005808 224407.741 (18699740.07%) 0.023782497271738423 -3.14159265 3.14159265 True
p_PU_SG_PU_3_1_n1_1 1.19902071 296532.682 (24731239.48%) 0.5554954153030479 -3.14159265 3.14159265 True
p_SU_SG_SU_1_0_0_1 2.43703747 83742.0262 (3436222.35%) 0.5266569153447359 -3.14159265 3.14159265 True
p_SU_SG_SU_3_0_0_1 -0.70401900 83725.8242 (11892551.81%) 0.004178913362662073 -3.14159265 3.14159265 True

Correlations (unreported correlations are < 0.100)

m_PU_SG_PU_1_n1_1_1m_PU_SG_PU_1_1_n1_1-1.0000
m_PU_SG_PU_3_n1_1_1m_PU_SG_PU_3_1_n1_1-1.0000
p_SU_SG_SU_1_0_0_1p_SU_SG_SU_3_0_0_11.0000
p_PU_SG_PU_1_1_n1_1p_SU_SG_SU_1_0_0_10.9999
p_PU_SG_PU_1_1_n1_1p_SU_SG_SU_3_0_0_10.9999
m_PU_SG_PU_3_n1_1_1p_PU_SG_PU_3_n1_1_10.9923
m_PU_SG_PU_3_1_n1_1p_PU_SG_PU_3_n1_1_1-0.9921
m_PU_SG_PU_1_n1_1_1m_PU_SG_PU_3_1_n1_10.9852
m_PU_SG_PU_1_n1_1_1m_PU_SG_PU_3_n1_1_1-0.9849
m_PU_SG_PU_1_1_n1_1m_PU_SG_PU_3_1_n1_1-0.9848
m_PU_SG_PU_1_1_n1_1m_PU_SG_PU_3_n1_1_10.9845
m_PU_SG_PU_1_n1_1_1p_PU_SG_PU_3_n1_1_1-0.9585
m_PU_SG_PU_1_1_n1_1p_PU_SG_PU_3_n1_1_10.9579
m_SU_SG_SU_1_0_0_1m_SU_SG_SU_3_0_0_1-0.9312
p_PU_SG_PU_3_n1_1_1p_PU_SG_PU_3_1_n1_1-0.8223
m_PU_SG_PU_3_n1_1_1p_PU_SG_PU_3_1_n1_1-0.7514
m_PU_SG_PU_3_1_n1_1p_PU_SG_PU_3_1_n1_10.7503
p_PU_SG_PU_1_1_n1_1p_PU_SG_PU_3_1_n1_10.6663
p_PU_SG_PU_3_1_n1_1p_SU_SG_SU_1_0_0_10.6480
p_PU_SG_PU_3_1_n1_1p_SU_SG_SU_3_0_0_10.6476
m_PU_SG_PU_1_n1_1_1p_PU_SG_PU_3_1_n1_10.6332
m_PU_SG_PU_1_1_n1_1p_PU_SG_PU_3_1_n1_1-0.6316
m_PU_SG_PU_3_n1_1_1m_SU_SG_SU_1_0_0_1-0.5284
m_PU_SG_PU_3_1_n1_1m_SU_SG_SU_1_0_0_10.5284
m_SU_SG_SU_1_0_0_1p_PU_SG_PU_3_n1_1_1-0.5263
m_PU_SG_PU_1_n1_1_1m_SU_SG_SU_1_0_0_10.5166
m_PU_SG_PU_1_1_n1_1m_SU_SG_SU_1_0_0_1-0.5164
m_PU_SG_PU_3_n1_1_1m_SU_SG_SU_3_0_0_10.4968
m_PU_SG_PU_3_1_n1_1m_SU_SG_SU_3_0_0_1-0.4967
m_SU_SG_SU_3_0_0_1p_PU_SG_PU_3_n1_1_10.4949
m_PU_SG_PU_1_n1_1_1m_SU_SG_SU_3_0_0_1-0.4853
m_PU_SG_PU_1_1_n1_1m_SU_SG_SU_3_0_0_10.4851
m_SU_SG_SU_1_0_0_1p_PU_SG_PU_3_1_n1_10.4101
m_SU_SG_SU_3_0_0_1p_PU_SG_PU_3_1_n1_1-0.3865
m_PU_SG_PU_1_1_n1_1p_SU_SG_SU_3_0_0_10.1724
m_PU_SG_PU_1_1_n1_1p_SU_SG_SU_1_0_0_10.1719
m_PU_SG_PU_1_n1_1_1p_SU_SG_SU_3_0_0_1-0.1702
m_PU_SG_PU_1_n1_1_1p_SU_SG_SU_1_0_0_1-0.1697
m_PU_SG_PU_1_1_n1_1p_PU_SG_PU_1_1_n1_10.1670
m_PU_SG_PU_1_n1_1_1p_PU_SG_PU_1_1_n1_1-0.1648
p_PU_SG_PU_1_1_n1_1p_PU_SG_PU_3_n1_1_1-0.1330
p_PU_SG_PU_3_n1_1_1p_SU_SG_SU_1_0_0_1-0.1088
p_PU_SG_PU_3_n1_1_1p_SU_SG_SU_3_0_0_1-0.1083

Classify fits

To review and classify the “sets” of fit results which are apparent here, use the classifyFits() method. This will rebin/categorise the data and label each bin alphabetically.

[9]:
# The default case simply rebins all data into 10 categories.
data.classifyFits()
success chisqr redchi Min Max
count unique top freq count unique top freq count unique top freq
redchiGroup
A 835 2 True 833 835.0 835.0 0.044630 1.0 835.0 835.0 0.000229 1.0 0.000218 0.000321
B 80 2 True 79 80.0 80.0 0.069253 1.0 80.0 80.0 0.000377 1.0 0.000321 0.000424
C 14 2 True 13 14.0 14.0 0.088500 1.0 14.0 14.0 0.000440 1.0 0.000424 0.000527
D 11 1 True 11 11.0 11.0 0.103048 1.0 11.0 11.0 0.000605 1.0 0.000527 0.000630
E 6 1 True 6 6.0 6.0 0.132356 1.0 6.0 6.0 0.000719 1.0 0.000630 0.000733
F 18 1 True 18 18.0 18.0 0.140435 1.0 18.0 18.0 0.000775 1.0 0.000733 0.000836
G 4 1 True 4 4.0 4.0 0.154999 1.0 4.0 4.0 0.000842 1.0 0.000836 0.000939
H 7 1 True 7 7.0 7.0 0.173290 1.0 7.0 7.0 0.000942 1.0 0.000939 0.001042
I 2 1 True 2 2.0 2.0 0.193286 1.0 2.0 2.0 0.001053 1.0 0.001042 0.001146
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type', 'redchiGroup'], cols = value
Set redchiGroup for data frame dfLong.
Set redchiGroup for data frame AFpdLong.
Couldn't set redchiGroup for data frame mask. Error <class 'KeyError'>: ('dType',).
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_16_2.png
[10]:
# Again, more control can be obtained by specifiying the desired binning.
data.classifyFits(bins = [2.26e-4, 2.4e-4,20])
success chisqr redchi Min Max
count unique top freq count unique top freq count unique top freq
redchiGroup
A 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000226 0.000227
B 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000227 0.000227
C 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000227 0.000228
D 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000228 0.000229
E 349 1 True 349 349 349 0.0421575 1 349 349 0.000229117 1 0.000229 0.000230
F 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
G 27 1 True 27 27 27 0.0425252 1 27 27 0.000231115 1 0.000230 0.000231
H 43 1 True 43 43 43 0.0426516 1 43 43 0.000231163 1 0.000231 0.000232
I 24 1 True 24 24 24 0.0426877 1 24 24 0.000231998 1 0.000232 0.000233
J 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000233 0.000233
K 11 1 True 11 11 11 0.0430736 1 11 11 0.000234096 1 0.000233 0.000234
L 27 1 True 27 27 27 0.0432038 1 27 27 0.000234803 1 0.000234 0.000235
M 3 2 True 2 3 3 0.0433464 1 3 3 0.000235579 1 0.000235 0.000236
N 54 1 True 54 54 54 0.0433468 1 54 54 0.000235581 1 0.000236 0.000236
O 5 1 True 5 5 5 0.0435333 1 5 5 0.00023659 1 0.000236 0.000237
P 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000237 0.000238
Q 15 2 True 14 15 15 0.0438441 1 15 15 0.000237988 1 0.000238 0.000239
R 3 1 True 3 3 3 0.0439158 1 3 3 0.000238673 1 0.000239 0.000239
S 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000239 0.000240
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type', 'redchiGroup'], cols = value
Set redchiGroup for data frame dfLong.
Set redchiGroup for data frame AFpdLong.
Couldn't set redchiGroup for data frame mask. Error <class 'KeyError'>: ('dType',).
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_17_2.png

In this case, there are 349 fit results within the lowest (occupied) category (set E), which we can take to be the best fits possible in this case.

To visualise the associated parameter sets, we can use the corrPlot() and paramPlot() methods. These default to the currently set classified data (this is found in the wide-form data output by classifyFits(), which defaults to self.[fits][dfWide]), with colour-coding by group.

corrPlot()

corrPlot() uses Seaborn’s pairplot routine to generate a full pair-wise correlation plot.

TODO: add HV gridmatrix + linked brushing: http://holoviews.org/user_guide/Linked_Brushing.html

[11]:
# Default plot will give output by grouping
data.corrPlot()
Mask not set for dataType = None.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
<seaborn.axisgrid.PairGrid at 0x20519f5e048>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_20_2.png
[12]:
# For more control, standard catplot parameters can be passed.
# See https://seaborn.pydata.org/generated/seaborn.pairplot.html

# Plot with colour by Type and histograms on the diagonal
data.corrPlot(hue='Type', diag_kind='hist')

# Additional per-fit data can also be set for the hue mapping,
# e.g. plot with colour by redchi (to 6 dp) and histograms on the diagonal
# data.corrPlot(hue='redchi', hRound = 6, diag_kind='hist')
Mask not set for dataType = None.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
<seaborn.axisgrid.PairGrid at 0x2051b21eeb8>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_21_2.png
[13]:
# Tabulated output from the last plot can be found in self.data['plots']['<plotType>Data'], and the plot object in self.data['plots']['<plotType>Plot']
data.data['plots']['corrData']
[13]:
Param Type redchiGroup PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
0 m Q 1.992972 1.921566 0.466055 0.000100 2.461439 1.556832
1 p Q -0.861561 -0.861041 3.141575 1.313656 -3.032645 -0.647618
2 m E 1.609869 1.617583 1.141043 1.143667 2.652536 1.143790
3 p E -0.857299 -0.861041 1.199021 1.200058 2.437037 -0.704019
4 m K 1.932782 1.931857 0.355341 0.368418 2.533506 1.418030
... ... ... ... ... ... ... ... ...
1117 p L 0.145056 -0.861041 2.001652 1.692756 -3.141593 0.230230
1118 m N 1.952031 1.992561 0.000100 0.805098 2.387993 1.651091
1119 p N -0.739526 -0.861041 -1.434696 0.937920 1.420065 -0.905355
1120 m E 1.613212 1.614101 1.142782 1.142106 2.652523 1.143815
1121 p E -0.862937 -0.861041 1.195923 1.197572 2.434125 -0.708249

1122 rows × 8 columns

paramPlot()

For examining specific datatypes in more detail paramPlot() uses Seaborn’s catplot routine to generate scatter plots for a specified data type.

[14]:
# Plot all data, magnitudes
data.paramPlot(dataType='m')
Mask not set for dataType = None.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
<seaborn.axisgrid.FacetGrid at 0x2051e7fa240>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_24_2.png
[15]:
# Plot all data, phases
data.paramPlot(dataType='p')
Mask not set for dataType = None.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
<seaborn.axisgrid.FacetGrid at 0x2051e9c09e8>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_25_2.png

Here we can see that there are various sets of parameters which do, indeed, cluster by group (hence \(\chi^2\)) for the most part, although there is a full spread in some of the phase parameters. We can look more closely at this by selecting just the best group(s) and colour-coding by \(\chi^2\)

[16]:
# Subselect on redchiGroup and colour by redchi value (note this automatically rounds to 7 dp, set hRound = N to control this)
data.paramPlot(dataType='m', sel = 'E', hue = 'redchi')
Mask not set for dataType = None.
<seaborn.axisgrid.FacetGrid at 0x2051eb04c50>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_27_2.png
[17]:
# As above, but for phase data
data.paramPlot(dataType='p', sel = 'E', hue = 'redchi')
Mask not set for dataType = None.
<seaborn.axisgrid.FacetGrid at 0x2051fd0bdd8>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_28_2.png

In this case, there are clear sub-groupings by \(\chi^2\). For the phases, these can potentially be cleaned up a bit by setting a reference phase & wrapping the values.

[18]:
# Tabulated output from the last plot can be found in self.data['plots']['<plotType>Data'], and the plot object in self.data['plots']['<plotType>Plot']
data.data['plots']['paramData']
[18]:
Fit Param value redchi
0 2 PU_SG_PU_1_1_n1_1 -0.857299 0.000229
1 8 PU_SG_PU_1_1_n1_1 -1.057287 0.000229
2 9 PU_SG_PU_1_1_n1_1 -1.303957 0.000229
3 11 PU_SG_PU_1_1_n1_1 -0.862917 0.000229
4 13 PU_SG_PU_1_1_n1_1 -0.863154 0.000229
... ... ... ... ...
2089 984 SU_SG_SU_3_0_0_1 -0.702315 0.000229
2090 987 SU_SG_SU_3_0_0_1 -0.704560 0.000229
2091 988 SU_SG_SU_3_0_0_1 -1.015555 0.000229
2092 992 SU_SG_SU_3_0_0_1 -0.710594 0.000229
2093 997 SU_SG_SU_3_0_0_1 -0.708249 0.000229

2094 rows × 4 columns

[19]:
# Basic Holoviews support is also working, with a few additional plot options
# data.paramPlot(dataType='p', backend='hv')
data.paramPlot(dataType='p', backend='hv', hvType='violin')   # Include violin KDE
Mask not set for dataType = None.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
[20]:
# Single group with 'redchi' cmap
data.paramPlot(dataType='p', sel = 'E', hue = 'redchi', backend='hv', hvType='violin')
# data.paramPlot(dataType='p', sel = 'E', backend='hv')  # NOTE: this currently doesn't work without Hue set to a numerical category.
Mask not set for dataType = None.

Phase shifts & corrections

To set the phases relative to a speific parameter, and wrap to a specified range, use the phaseCorrection() method. This defaults to using the first parameter as a reference phase, and wraps to \(-\pi:\pi\). The phase-corrected values are output to a new Type, ‘pc’.

[21]:
# Phase correction method
# This defaults to using the first parameter as a reference phase. The phase-corrected values are output to a new Type, 'pc'.
data.phaseCorrection()
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type', 'redchiGroup'], cols = value
Set ref param = PU_SG_PU_1_1_n1_1
Param PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
Fit Type redchiGroup
1 m Q 1.992972 1.921566 0.466055 0.000100 2.461439 1.556832
p Q -0.861561 -0.861041 3.141575 1.313656 -3.032645 -0.647618
pc Q 0.000000 0.000519 -2.280049 2.175216 -2.171084 0.213943
2 m E 1.609869 1.617583 1.141043 1.143667 2.652536 1.143790
p E -0.857299 -0.861041 1.199021 1.200058 2.437037 -0.704019
... ... ... ... ... ... ... ... ...
996 p N -0.739526 -0.861041 -1.434696 0.937920 1.420065 -0.905355
pc N 0.000000 -0.121515 -0.695170 1.677446 2.159591 -0.165829
997 m E 1.613212 1.614101 1.142782 1.142106 2.652523 1.143815
p E -0.862937 -0.861041 1.195923 1.197572 2.434125 -0.708249
pc E 0.000000 0.001896 2.058860 2.060509 -2.986124 0.154688

1683 rows × 6 columns

We can replot the best fits for the phase-corrected case… this looks slightly cleaner, but still shows some +/- pairs and spread in retrieved values.

[22]:
data.paramPlot(dataType='pc', sel = 'E', hue = 'redchi')
Mask not set for dataType = None.
<seaborn.axisgrid.FacetGrid at 0x2051de4d438>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_36_2.png
[23]:
# data.data[1]['results']  #.keys()  #['fits']['dfRef']  #.keys()

Selecting & testing candidate parameter sets

After reviewing a batch of results (parameter sets), the best result(s) need more careful investigation…

[24]:
# Set new classifiers for small range
data.classifyFits(bins = [2.29e-4, 2.3e-4,20])
success chisqr redchi Min Max
count unique top freq count unique top freq count unique top freq
redchiGroup
A 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000229 0.000229
B 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000229 0.000229
C 203 1 True 203 203 203 0.0421575 1 203 203 0.000229117 1 0.000229 0.000229
D 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000229 0.000229
E 44 1 True 44 44 44 0.0421805 1 44 44 0.000229242 1 0.000229 0.000229
F 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000229 0.000229
G 55 1 True 55 55 55 0.0421963 1 55 55 0.000229328 1 0.000229 0.000229
H 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000229 0.000229
I 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000229 0.000229
J 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000229 0.000230
K 47 1 True 47 47 47 0.0422356 1 47 47 0.000229542 1 0.000230 0.000230
L 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
M 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
N 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
O 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
P 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
Q 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
R 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
S 0 0 NaN NaN 0 0 NaN NaN 0 0 NaN NaN 0.000230 0.000230
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type', 'redchiGroup'], cols = value
Set redchiGroup for data frame dfLong.
Set redchiGroup for data frame AFpdLong.
Couldn't set redchiGroup for data frame mask. Error <class 'KeyError'>: ('dType',).
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_39_2.png

Drill down into the best (lowest \(\chi^2\)) grouping ‘C’…

[25]:
# Note hRound (== decimal places) specified to avoid collapsing colour-mapping to a single value.
data.paramPlot(dataType='m', sel = 'C', hue = 'redchi', backend='hv', hvType='box', hRound = 9)
Mask not set for dataType = None.
[26]:
data.paramPlot(dataType='p', sel = 'C', hue = 'redchi', backend='hv', hRound = 9)  # Plot with redchi cmap
# data.paramPlot(dataType='pc', sel = 'C', hue = 'Fit', backend='hv', hRound = 9)  # Plot with Fit # cmap
Mask not set for dataType = None.

In this case, the results look consistent for both magnitudes and phases (with +/- pairs evident in the latter case). We can look more carefully at any correlations here with corrPlot() with subselections (TODO: make this cleaner!).

[27]:
# corrPlot() with subselections (two levels)
# NOTE: with hue mapping for a continuous range this can be quite slow (minutes).
data.corrPlot(dataType = 'C', level = 'redchiGroup', sel = 'p', selLevel = 'Type', hue = 'redchi')  #, diag_kind = 'scatter')

# For a faster plot either skip hue mapping and force diag to scatter (otherwise KDE generation can be slow)
# data.corrPlot(dataType = 'C', level = 'redchiGroup', sel = 'p', selLevel = 'Type', diag_kind = 'scatter')
Mask not set for dataType = None.
<seaborn.axisgrid.PairGrid at 0x2051ebd7438>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_44_2.png

In this case there appears to be a direct correlation between +/- phase pairs over all parameters - for example, the third row (PU_SG_PU_3_1_n1_1) appears to show only +ve results sets or -ve results sets. This is expected if the dataset is not sensitive to the sign of the phases, and means that the phases are essentially defined modulo(\(\pi\)).

Again, the phase-corrected values can also be defined & visualised to check this, and can also be mapped to the range \(0:\pi\) by setting absFlag=True.

[28]:
data.phaseCorrection(absFlag=True)  # NOTE - need to rerun phaseCorrection after classification
# data.phaseCorrection()
data.paramPlot(dataType='pc', sel = 'C', hue = 'redchi', backend='hv', hRound = 9)
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type', 'redchiGroup'], cols = value
Set ref param = PU_SG_PU_1_1_n1_1
Param PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
Fit Type redchiGroup
2 m C 1.609869 1.617583 1.141043 1.143667 2.652536 1.143790
p C -0.857299 -0.861041 1.199021 1.200058 2.437037 -0.704019
pc C 0.000000 0.003742 2.056320 2.057357 2.988849 0.153280
8 m G 1.569258 1.654651 1.166865 1.207240 2.659540 1.129703
p G -1.057287 -0.861041 -2.876167 3.141593 2.040967 -1.129251
... ... ... ... ... ... ... ... ...
992 p C -0.867768 -0.861041 1.193419 1.195135 2.431628 -0.710594
pc C 0.000000 0.006727 2.061187 2.062903 2.983790 0.157174
997 m C 1.613212 1.614101 1.142782 1.142106 2.652523 1.143815
p C -0.862937 -0.861041 1.195923 1.197572 2.434125 -0.708249
pc C 0.000000 0.001896 2.058860 2.060509 2.986124 0.154688

1047 rows × 6 columns

Mask not set for dataType = None.
[29]:
# corrPlot() with subselections (two levels)
# NOTE: with hue mapping for a continuous range this can be quite slow (minutes).
data.corrPlot(dataType = 'C', level = 'redchiGroup', sel = 'pc', selLevel = 'Type', hue = 'redchi')  # Check pair correlations including redchi
# data.corrPlot(dataType = 'C', level = 'redchiGroup', sel = 'pc', selLevel = 'Type', hue = 'Fit')   # Check pair correlations including Fit #
Mask not set for dataType = None.
<seaborn.axisgrid.PairGrid at 0x2052b4c42b0>
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_47_2.png

Here a few things are clearer - in particular the relative phases seem to be well-behaved, with a fairly small scatter, and the (1D) minima of the fitting hyperspace are visible in the \(\chi^2\) plots (bottom row).

Best values and statistics

To get a final parameter set and associated statistics, based on a subset of the fit results, the paramsReport() method is available. In this case, it will provide summary statistics corresponding to the corrPlot() above.

[30]:
# Get parameter statistics.
# Specify subselection criteria with a dictionary of selectors (currently uses index values only)
# TODO: more general selection & groupings here.
data.paramsReport(inds = {'redchiGroup':'C'})
Mask not set for dataType = None.
Set parameter stats to self.paramsSummary.
Type m p pc
Param Agg
PU_SG_PU_1_1_n1_1 min 1.605239e+00 -0.883176 0.000000
mean 1.613850e+00 -0.859382 0.000000
median 1.613657e+00 -0.860940 0.000000
max 1.625738e+00 -0.738983 0.000000
std 2.084992e-03 0.014059 0.000000
var 4.347190e-06 0.000198 0.000000
PU_SG_PU_1_n1_1_1 min 1.601543e+00 -0.861041 0.000019
mean 1.613570e+00 -0.861041 0.005372
median 1.613690e+00 -0.861041 0.002132
max 1.627827e+00 -0.861041 0.122059
std 2.202383e-03 0.000000 0.013093
var 4.850490e-06 0.000000 0.000171
PU_SG_PU_3_1_n1_1 min 1.124479e+00 -2.931328 2.007461
mean 1.142652e+00 0.144138 2.058870
median 1.142526e+00 1.196919 2.058890
max 1.174468e+00 1.266499 2.091134
std 3.045999e-03 1.800711 0.005289
var 9.278107e-06 3.242561 0.000028
PU_SG_PU_3_n1_1_1 min 1.124041e+00 -2.927744 1.984783
mean 1.142349e+00 0.143498 2.058926
median 1.142249e+00 1.196777 2.058825
max 1.172665e+00 1.243822 2.154092
std 3.032450e-03 1.801116 0.009501
var 9.195751e-06 3.244017 0.000090
SU_SG_SU_1_0_0_1 min 2.652285e+00 2.116167 2.933528
mean 2.652524e+00 2.356759 2.986923
median 2.652526e+00 2.434259 2.986913
max 2.653001e+00 2.493262 3.048962
std 4.240577e-05 0.134550 0.006770
var 1.798250e-09 0.018104 0.000046
SU_SG_SU_3_0_0_1 min 1.142825e+00 -1.025558 0.105030
mean 1.143807e+00 -0.785126 0.154275
median 1.143806e+00 -0.708064 0.154106
max 1.144052e+00 -0.635932 0.222062
std 7.751534e-05 0.134263 0.006839
var 6.008628e-09 0.018027 0.000047
[31]:
# For more control, pass a list of aggregation functions
# These are passed to Pandas.agg(), a list of common functions can be found at https://pandas.pydata.org/docs/user_guide/basics.html#descriptive-statistics
data.paramsReport(inds = {'redchiGroup':'C'}, aggList = ['mean', 'std'])
Mask not set for dataType = None.
Set parameter stats to self.paramsSummary.
Type m p pc
Param Agg
PU_SG_PU_1_1_n1_1 mean 1.613850 -0.859382 0.000000
std 0.002085 0.014059 0.000000
PU_SG_PU_1_n1_1_1 mean 1.613570 -0.861041 0.005372
std 0.002202 0.000000 0.013093
PU_SG_PU_3_1_n1_1 mean 1.142652 0.144138 2.058870
std 0.003046 1.800711 0.005289
PU_SG_PU_3_n1_1_1 mean 1.142349 0.143498 2.058926
std 0.003032 1.801116 0.009501
SU_SG_SU_1_0_0_1 mean 2.652524 2.356759 2.986923
std 0.000042 0.134550 0.006770
SU_SG_SU_3_0_0_1 mean 1.143807 -0.785126 0.154275
std 0.000078 0.134263 0.006839

Model results

For each fit, the model output (AF-\(\beta_{LM}\) parameter) can also be investigated…

Basic plotters

Basic per-fit outputs using wrapped ePSproc routines.

[32]:
# Plot selected results from per-fit datasets.
data.BLMfitPlot(keys = ['subset', 0, 50])  # This works, but need to add selectors? And control output.
Dataset: subset, AFBLM
Dataset: 0, AFBLM
Dataset: 50, AFBLM
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_54_1.png
[33]:
# Per-fit strip-chart style plotting (inc. all labelled dims) with lmPlot() wrapper.
# NOTE: self.lmPlotOpts will be set and is sticky. (As per epsproc-dev version 23/11/21)
# data.lmPlotFit(keys = ['subset', 0, 50])
data.lmPlotFit(keys = [50])   #, xDim=['t'])
Plotting data (No filename), pType=a, thres=0.01, with Seaborn
D:\code\github\ePSproc\epsproc\_sns_matrixMod.py:282: PendingDeprecationWarning:
The label function will be deprecated in a future version. Use Tick.label1 instead.
  fontsize = tick.label.get_size()
../_images/fitting_PEMtk_fitting_multiproc_class_analysis_141121-tidy_55_2.png

Fit set plotters

For large sets the BLMsetPlot() method implements Holoviews to allow for fit browsing and aggregations.

For manual examples with Seaborn (not yet wrapped), see https://pemtk.readthedocs.io/en/latest/fitting/PEMtk_analysis_demo_150621-tidy.html#Model-results

[34]:
# The default routine aggreagates over all fits (mean + std for spread) and stacks plots
# Set xDim to ensure consistent plotting
data.BLMsetPlot(xDim = 't')
[35]:
# Browse a set of fits interactively.
# Note this may be slow to render for large datasets (==minutes) if ref data is included.

# Get indexes for a group
inds = data.getFitInds(selectors = {'redchiGroup':'C'} )

# HV plot without aggregation - this will automatically Holomap other dims
data.BLMsetPlot(xDim = 't', sel = {'Fit':inds}, agg = False, ref = False)
Mask not set for dataType = None.

Comparison of parameters/matrix elements with inputs (if known)

For model data and testing the final parameter set(s) can be compared with the known inputs. This is facilitated by the paramsCompare() method. This compares the data in self.paramsSummary with self.params. If not set, the routine will attempt to set these from defaults (for input values this may be in self.data['fits']['dfRef'] or will be pulled from the matrix elements in self.data['subset']['matE']).

TODO: generalise the routine for comparison of arb pairs/sets of parameters.

[36]:
# Run with defaults.
# Note this will produce extra output if setting self.params, as in this example
data.paramsCompare()
self.params not set, setting ref values from self.data[subset]['matE'] without constraints.
Set 6 complex matrix elements to 12 fitting params, see self.params for details.
name value initial value min max vary
m_PU_SG_PU_1_n1_1_1 1.78461575 1.784615753610107 1.0000e-04 5.00000000 True
m_PU_SG_PU_1_1_n1_1 1.78461575 1.784615753610107 1.0000e-04 5.00000000 True
m_PU_SG_PU_3_n1_1_1 0.80290495 0.802904951323892 1.0000e-04 5.00000000 True
m_PU_SG_PU_3_1_n1_1 0.80290495 0.802904951323892 1.0000e-04 5.00000000 True
m_SU_SG_SU_1_0_0_1 2.68606212 2.686062120382649 1.0000e-04 5.00000000 True
m_SU_SG_SU_3_0_0_1 1.10915311 1.109153108617096 1.0000e-04 5.00000000 True
p_PU_SG_PU_1_n1_1_1 -0.86104140 -0.8610414024232179 -3.14159265 3.14159265 False
p_PU_SG_PU_1_1_n1_1 -0.86104140 -0.8610414024232179 -3.14159265 3.14159265 True
p_PU_SG_PU_3_n1_1_1 -3.12044446 -3.1204444620772467 -3.14159265 3.14159265 True
p_PU_SG_PU_3_1_n1_1 -3.12044446 -3.1204444620772467 -3.14159265 3.14159265 True
p_SU_SG_SU_1_0_0_1 2.61122920 2.611229196458127 -3.14159265 3.14159265 True
p_SU_SG_SU_3_0_0_1 -0.07867828 -0.07867827542158025 -3.14159265 3.14159265 True
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type'], cols = value
Set ref param = PU_SG_PU_1_1_n1_1
Param PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
Fit Type
ref m 1.784616 1.784616 0.802905 0.802905 2.686062 1.109153
p -0.861041 -0.861041 -3.120444 -3.120444 2.611229 -0.078678
pc 0.000000 0.000000 -2.259403 -2.259403 -2.810915 0.782363
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\IPython\core\interactiveshell.py:2886: PerformanceWarning: indexing past lexsort depth may impact performance.
  return runner(coro)
Set parameter comparison to self.paramsSummaryComp.
Param PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
Type Source dType
m mean num 1.613850 1.613570e+00 1.142652 1.142349 2.652524 1.143807
ref num 1.784616 1.784616e+00 0.802905 0.802905 2.686062 1.109153
diff % 10.581260 1.060047e+01 29.733209 29.714546 1.264393 3.029722
num -0.170766 -1.710460e-01 0.339747 0.339444 -0.033538 0.034654
std % 0.129194 1.364913e-01 0.266573 0.265457 0.001599 0.006777
num 0.002085 2.202383e-03 0.003046 0.003032 0.000042 0.000078
diff/std % 8190.232468 7.766406e+03 11153.883911 11193.713369 79089.050126 44706.225465
p mean num -0.859382 -8.610414e-01 0.144138 0.143498 2.356759 -0.785126
ref num -0.861041 -8.610414e-01 -3.120444 -3.120444 2.611229 -0.078678
diff % 0.193090 4.512885e-13 2264.893304 2274.558140 10.797481 89.978902
num 0.001659 3.885781e-15 3.264583 3.263942 -0.254471 -0.706448
std % 1.635940 0.000000e+00 1249.292494 1255.151491 5.709133 17.100819
num 0.014059 0.000000e+00 1.800711 1.801116 0.134550 0.134263
diff/std % 11.802998 inf 181.294078 181.217818 189.126456 526.167201
pc mean num 0.000000 5.372290e-03 2.058870 2.058926 2.986923 0.154275
ref num 0.000000 0.000000e+00 -2.259403 -2.259403 -2.810915 0.782363
diff % NaN 1.000000e+02 209.739951 209.736965 194.107385 407.123575
num 0.000000 5.372290e-03 4.318273 4.318329 5.797837 -0.628088
std % NaN 2.437070e+02 0.256878 0.461470 0.226641 4.433055
num 0.000000 1.309265e-02 0.005289 0.009501 0.006770 0.006839
diff/std % NaN 4.103288e+01 81649.662014 45449.762175 85645.382379 9183.815020
[37]:
# With abs phaseCorrection flag set
data.paramsCompare(phaseCorrParams={'absFlag':True})
Setting wide-form data self.[fits][dfWide] from self.[fits][dfLong] (as pivot table).
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
*** Warning: found MultiIndex for DataFrame data.index - checkDims doesn't yet support Pandas MultiIndex.
Index(es) = ['Fit', 'Type'], cols = value
Set ref param = PU_SG_PU_1_1_n1_1
Param PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
Fit Type
ref m 1.784616 1.784616 0.802905 0.802905 2.686062 1.109153
p -0.861041 -0.861041 -3.120444 -3.120444 2.611229 -0.078678
pc 0.000000 0.000000 2.259403 2.259403 2.810915 0.782363
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\IPython\core\interactiveshell.py:2886: PerformanceWarning: indexing past lexsort depth may impact performance.
  return runner(coro)
Set parameter comparison to self.paramsSummaryComp.
Param PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
Type Source dType
m mean num 1.613850 1.613570e+00 1.142652 1.142349 2.652524 1.143807
ref num 1.784616 1.784616e+00 0.802905 0.802905 2.686062 1.109153
diff % 10.581260 1.060047e+01 29.733209 29.714546 1.264393 3.029722
num -0.170766 -1.710460e-01 0.339747 0.339444 -0.033538 0.034654
std % 0.129194 1.364913e-01 0.266573 0.265457 0.001599 0.006777
num 0.002085 2.202383e-03 0.003046 0.003032 0.000042 0.000078
diff/std % 8190.232468 7.766406e+03 11153.883911 11193.713369 79089.050126 44706.225465
p mean num -0.859382 -8.610414e-01 0.144138 0.143498 2.356759 -0.785126
ref num -0.861041 -8.610414e-01 -3.120444 -3.120444 2.611229 -0.078678
diff % 0.193090 4.512885e-13 2264.893304 2274.558140 10.797481 89.978902
num 0.001659 3.885781e-15 3.264583 3.263942 -0.254471 -0.706448
std % 1.635940 0.000000e+00 1249.292494 1255.151491 5.709133 17.100819
num 0.014059 0.000000e+00 1.800711 1.801116 0.134550 0.134263
diff/std % 11.802998 inf 181.294078 181.217818 189.126456 526.167201
pc mean num 0.000000 5.372290e-03 2.058870 2.058926 2.986923 0.154275
ref num 0.000000 0.000000e+00 2.259403 2.259403 2.810915 0.782363
diff % NaN 1.000000e+02 9.739951 9.736965 5.892615 407.123575
num 0.000000 5.372290e-03 -0.200533 -0.200477 0.176008 -0.628088
std % NaN 2.437070e+02 0.256878 0.461470 0.226641 4.433055
num 0.000000 1.309265e-02 0.005289 0.009501 0.006770 0.006839
diff/std % NaN 4.103288e+01 3791.665217 2109.989164 2599.979908 9183.815020

In this particular case a few points of note:

  • The final results are generally quite close to the inputs. (In this case, the input dataset had reference data had ~10% random noise added, and the 1st phase was fixed as a reference during fitting; all other parameters were free.)

  • The absolute phase values appear to be quite far off in some cases, but the phase corrected values typically look quite good; this is consistent with a lack of sensitivity in the test dataset to the sign of the phases (as discussed elsewhere), but the remapped phases (\(0:\pi\)) are precise.

  • The differences between the data and reference values are much larger than the standard deviation of the fits. This is indicative of a good (==close/singular) batch of fits, but reveals that the results obtained are not perfect - as expected for noisey data. Adding more data-points to the fit, and/or using higher fidelity data would help in this case (and will be explored elsewhere in future).

  • Colour coding and further analysis may help with interpretation, see below for a quick demo.

[84]:
# Try checking full DF, rather than subset/slice - WITH ADDTIONS
# See, e.g., https://stackoverflow.com/questions/50220200/conditional-styling-in-pandas-using-other-columns

# Basically, set mask then styles in new dataframe.
import pandas as pd
idx = pd.IndexSlice
# This works for %/abs pairs
# Groupby to set for type subsets?
# Code from https://stackoverflow.com/questions/50220200/conditional-styling-in-pandas-using-other-columns
def select_pairs(x, props = '', bg = ''):
    #compare columns
    mask = idx[x.loc[idx[:,:,'%'],:] > 50] .droplevel('dType')
#     mask = idx[data.paramsSummaryComp.loc[idx[:,:,'%'],:] > 50]
#     mask = idx[x.loc[idx[:,:,'%']] > 50]   # Select and drop - only works in pd >v1.2?
    #DataFrame with same index and columns names as original filled empty strings
    df1 =  pd.DataFrame(bg, index=x.index, columns=x.columns)
    #modify values of df1 column by boolean mask
#     df1.where(mask, props, inplace = True)
    df1.where(~(mask.reindex_like(x).fillna(False)), props, inplace = True)

    return df1


# dfT.style.apply(highlight_cols, props='color:white;background-color:darkblue', axis=None)  # OK
# data.paramsSummaryComp.style.apply(select_pairs, props='color:white;background-color:darkblue', axis=None)  # OK
data.paramsSummaryComp.style.apply(select_pairs, props='background-color:orange',axis=None)  # OK
[84]:
Param PU_SG_PU_1_1_n1_1 PU_SG_PU_1_n1_1_1 PU_SG_PU_3_1_n1_1 PU_SG_PU_3_n1_1_1 SU_SG_SU_1_0_0_1 SU_SG_SU_3_0_0_1
Type Source dType
m mean num 1.613850 1.613570 1.142652 1.142349 2.652524 1.143807
ref num 1.784616 1.784616 0.802905 0.802905 2.686062 1.109153
diff % 10.581260 10.600471 29.733209 29.714546 1.264393 3.029722
num -0.170766 -0.171046 0.339747 0.339444 -0.033538 0.034654
std % 0.129194 0.136491 0.266573 0.265457 0.001599 0.006777
num 0.002085 0.002202 0.003046 0.003032 0.000042 0.000078
diff/std % 8190.232468 7766.405823 11153.883911 11193.713369 79089.050126 44706.225465
p mean num -0.859382 -0.861041 0.144138 0.143498 2.356759 -0.785126
ref num -0.861041 -0.861041 -3.120444 -3.120444 2.611229 -0.078678
diff % 0.193090 0.000000 2264.893304 2274.558140 10.797481 89.978902
num 0.001659 0.000000 3.264583 3.263942 -0.254471 -0.706448
std % 1.635940 0.000000 1249.292494 1255.151491 5.709133 17.100819
num 0.014059 0.000000 1.800711 1.801116 0.134550 0.134263
diff/std % 11.802998 inf 181.294078 181.217818 189.126456 526.167201
pc mean num 0.000000 0.005372 2.058870 2.058926 2.986923 0.154275
ref num 0.000000 0.000000 2.259403 2.259403 2.810915 0.782363
diff % nan 100.000000 9.739951 9.736965 5.892615 407.123575
num 0.000000 0.005372 -0.200533 -0.200477 0.176008 -0.628088
std % nan 243.707016 0.256878 0.461470 0.226641 4.433055
num 0.000000 0.013093 0.005289 0.009501 0.006770 0.006839
diff/std % nan 41.032877 3791.665217 2109.989164 2599.979908 9183.815020

Versions

[39]:
import scooby
scooby.Report(additional=['epsproc', 'pemtk', 'xarray', 'jupyter'])
[39]:
Tue Dec 07 10:44:58 2021 Eastern Standard Time
OS Windows CPU(s) 32 Machine AMD64
Architecture 64bit RAM 63.9 GB Environment Jupyter
Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
epsproc 1.3.1-dev pemtk 0.0.1 xarray 0.15.0
jupyter Version unknown numpy 1.19.2 scipy 1.3.0
IPython 7.12.0 matplotlib 3.3.1 scooby 0.5.6
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191125 for Intel(R) 64 architecture applications
[85]:
# Check current Git commit for local pemtk version
!git -C {Path(pemtk.__file__).parent} branch
!git -C {Path(pemtk.__file__).parent} log --format="%H" -n 1
* master
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\IPython\utils\_process_win32.py:131: ResourceWarning: unclosed file <_io.BufferedWriter name=8>
  return process_handler(cmd, _system_body)
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\IPython\utils\_process_win32.py:131: ResourceWarning: unclosed file <_io.BufferedReader name=9>
  return process_handler(cmd, _system_body)
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\IPython\utils\_process_win32.py:131: ResourceWarning: unclosed file <_io.BufferedReader name=10>
  return process_handler(cmd, _system_body)
4d97c2d50d362168fd720c8c357b0f6e1eb66c3f
[86]:
# Check current remote commits
!git ls-remote --heads git://github.com/phockett/pemtk
# !git ls-remote --heads git://github.com/phockett/epsman
4d97c2d50d362168fd720c8c357b0f6e1eb66c3f        refs/heads/master
[87]:
# Check current Git commit for local ePSproc version
import epsproc as ep
!git -C {Path(ep.__file__).parent} branch
!git -C {Path(ep.__file__).parent} log --format="%H" -n 1
* dev
  master
  numba-tests
5bfa0c6c47ed5f23eff65a290624ffff4e9c5924
[88]:
# Check current remote commits
!git ls-remote --heads git://github.com/phockett/ePSproc
# !git ls-remote --heads git://github.com/phockett/epsman
8fb518c63bfb4edc49ef994808f1c18fb68ff0c3        refs/heads/dependabot/pip/notes/envs/envs-versioned/pip-21.1
5bfa0c6c47ed5f23eff65a290624ffff4e9c5924        refs/heads/dev
5278f0389878f6e7e5eb9b0c013ff3594c70938f        refs/heads/master
69cd89ce5bc0ad6d465a4bd8df6fba15d3fd1aee        refs/heads/numba-tests
ea30878c842f09d525fbf39fa269fa2302a13b57        refs/heads/revert-9-master
[ ]: