Fit fidelity & analysis

22/06/21

Here we’ll look in detail at a batch of fit results (see the batch run notebook for the initial setup & running of the fits), along with some statistical analysis.

TODO: final uncertainty analysis, bootstrapping and testing with noise etc.

Batch results

Generally, there are many things to consider in terms of the quality/fidelity of the fit results. In particular, we might want to investigate:

  • \(\chi^2\) values.

  • Estimated uncertainties.

  • Uniqueness of fit.

This is generally aided by having a large batch of fit results, with randomised initial parameters, to allow for statistical analysis and ensure a full probing of the solution hyper-space.
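As an illustrative sketch only (not the PEMtk API), randomised initial parameters for such a batch might be drawn along these lines, with magnitudes and phases sampled uniformly over the fitting bounds:

```python
import numpy as np

rng = np.random.default_rng(42)

def randomInitialParams(nParams, magRange=(1e-4, 5.0), phaseRange=(-np.pi, np.pi)):
    """Draw random magnitudes & phases for one fit's starting point (illustrative only)."""
    mags = rng.uniform(*magRange, nParams)
    phases = rng.uniform(*phaseRange, nParams)
    return mags, phases

# E.g. a batch of 100 fits over 6 complex matrix elements (12 real parameters)
batch = [randomInitialParams(6) for _ in range(100)]
```

Sampling the full parameter bounds per fit, rather than perturbing a single guess, is what ensures the batch probes the whole solution hyper-space.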

For further discussion, see, for example:

[1] Hockett, Quantum Metrology with Photoelectrons, Volume 2: Applications and advances. IOP Publishing, 2018. doi: 10.1088/978-1-6817-4688-3.

[2] Hockett, “Photoionization dynamics of polyatomic molecules,” PhD thesis, University of Nottingham, 2009. Available: http://eprints.nottingham.ac.uk/10857/

Load sample dataset

[1]:
# If running from scratch, init a blank fit object first
import pemtk as pm
from pemtk.fit.fitClass import pemtkFit
data = pemtkFit()
*** ePSproc installation not found, setting for local copy.
[2]:
# Load sample dataset
# Full path to the file may be required here, in repo/demos/fitting
import pickle
from pathlib import Path

dataFile = 'dataDump_100fitTests_10t_randPhase_130621.pickle'
dataPath = Path(pm.__path__[0]).parent/Path('demos','fitting')

with open( dataPath/dataFile, 'rb') as handle:
    data.data = pickle.load(handle)
[3]:
# The sample data dictionary contains 100 fits, as well as the data used to set things up.
data.data.keys()
[3]:
dict_keys(['orb6', 'orb5', 'ADM', 'pol', 'subset', 'sim', 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])
[4]:
# Set max fit ind for reference
data.fitInd = 99

# Set reference values from input matrix elements (data.data['subset']['matE'])
data.setMatEFit()
Set 6 complex matrix elements to 12 fitting params, see self.params for details.
name value initial value min max vary
m_PU_SG_PU_1_n1_1 1.78461575 1.784615753610107 1.0000e-04 5.00000000 True
m_PU_SG_PU_1_1_n1 1.78461575 1.784615753610107 1.0000e-04 5.00000000 True
m_PU_SG_PU_3_n1_1 0.80290495 0.802904951323892 1.0000e-04 5.00000000 True
m_PU_SG_PU_3_1_n1 0.80290495 0.802904951323892 1.0000e-04 5.00000000 True
m_SU_SG_SU_1_0_0 2.68606212 2.686062120382649 1.0000e-04 5.00000000 True
m_SU_SG_SU_3_0_0 1.10915311 1.109153108617096 1.0000e-04 5.00000000 True
p_PU_SG_PU_1_n1_1 -0.86104140 -0.8610414024232179 -3.14159265 3.14159265 False
p_PU_SG_PU_1_1_n1 -0.86104140 -0.8610414024232179 -3.14159265 3.14159265 True
p_PU_SG_PU_3_n1_1 -3.12044446 -3.1204444620772467 -3.14159265 3.14159265 True
p_PU_SG_PU_3_1_n1 -3.12044446 -3.1204444620772467 -3.14159265 3.14159265 True
p_SU_SG_SU_1_0_0 2.61122920 2.611229196458127 -3.14159265 3.14159265 True
p_SU_SG_SU_3_0_0 -0.07867828 -0.07867827542158025 -3.14159265 3.14159265 True

Process results

Here we’ll use Seaborn and Holoviews to look at the fit results output from lmfit; for this, the results will first be restacked to a long-form Pandas data-frame.

[5]:
# Additional import for data analysis
import xarray as xr
import numpy as np
import pandas as pd
import string

# pd.options.display.max_rows = 50
pd.set_option("display.max_rows", 50)

# For AFBLM > PD conversion
import epsproc as ep

# For plotting
import matplotlib.pyplot as plt  # Required for the plt.ylim() calls below
import seaborn as sns
import holoviews as hv
from holoviews import opts

# Some additional default plot settings
# TODO: should set version for PEMtk; currently just sets various default plotters & HV backends.
from epsproc.plot import hvPlotters
hvPlotters.setPlotters()

[6]:
# The converter currently returns directly, not to the class.
dfLong, dfRef = data.pdConv()
[7]:
# The long-format data includes per fit and per parameter data
dfLong
[7]:
value stderr vary expr Param success chisqr redchi
Fit Type pn
0 m 0 1.89405 0.00952422 False m_PU_SG_PU_1_1_n1 PU_SG_PU_1_n1_1 True 9.83239e-05 5.25796e-07
1 1.89405 0.00952422 True None PU_SG_PU_1_1_n1 True 9.83239e-05 5.25796e-07
2 0.489781 0.0369748 False m_PU_SG_PU_3_1_n1 PU_SG_PU_3_n1_1 True 9.83239e-05 5.25796e-07
3 0.489781 0.0369748 True None PU_SG_PU_3_1_n1 True 9.83239e-05 5.25796e-07
4 2.57267 0.0125781 True None SU_SG_SU_1_0_0 True 9.83239e-05 5.25796e-07
... ... ... ... ... ... ... ... ... ... ...
98 p 7 2.46926 23279.8 True None PU_SG_PU_1_1_n1 True 0.000136565 7.30294e-07
8 0.1535 23279.8 False p_PU_SG_PU_3_1_n1 PU_SG_PU_3_n1_1 True 0.000136565 7.30294e-07
9 0.1535 23279.8 True None PU_SG_PU_3_1_n1 True 0.000136565 7.30294e-07
10 -1.04185 23279.8 True None SU_SG_SU_1_0_0 True 0.000136565 7.30294e-07
11 2.14965 23279.8 True None SU_SG_SU_3_0_0 True 0.000136565 7.30294e-07

1188 rows × 8 columns

For the fit/model results (AF-\(\beta_{LM}\) parameters), the output is an Xarray per fit; these can be stacked to a dataset for batched analysis.

[8]:
# Stack fits to Xarrays
# Don't recall the best way to do this - here pass to list then stack directly

# AFstack = xr.DataArray()
AFstack = []

for n in range(0, data.fitInd):
    AFstack.append(data.data[n]['AFBLM'].expand_dims({'Fit':[n]})) # NOTE [n] here, otherwise gives length not coord
                                                                   # http://xarray.pydata.org/en/stable/generated/xarray.DataArray.expand_dims.html

AFxr = xr.concat(AFstack,'Fit')

… and further convert to Pandas for Seaborn plotting…

[9]:
AFpd, AFxrRS = ep.multiDimXrToPD(AFxr.squeeze().pipe(np.abs), colDims=['t'],
                                 thres = 1e-4, fillna=True)

AFpdLong = AFpd.reset_index().melt(id_vars=['Fit','l','m'])  # This works for pushing to full long-format with col by t

AFpdLong
[9]:
Fit l m t value
0 0 0 0 4.02 1.668968
1 0 2 0 4.02 0.924928
2 0 4 0 4.02 0.147711
3 0 6 0 4.02 0.007405
4 1 0 0 4.02 1.668803
... ... ... ... ... ...
5143 97 6 0 4.96 0.010057
5144 98 0 0 4.96 1.599609
5145 98 2 0 4.96 0.947416
5146 98 4 0 4.96 0.088204
5147 98 6 0 4.96 0.009020

5148 rows × 5 columns

Quick overviews

Pandas’ describe() method gives some summary details, which may (or may not) be useful here.

[10]:
dfLong.describe()
[10]:
value stderr vary expr Param success chisqr redchi
count 1188.000000 1116.000000 1188 396 1188 1188 1188.000000 1.188000e+03
unique 790.000000 1116.000000 2 4 6 2 99.000000 9.900000e+01
top 3.141593 47132.056337 True m_PU_SG_PU_3_1_n1 SU_SG_SU_3_0_0 True 0.000098 7.302939e-07
freq 3.000000 1.000000 792 99 198 1176 12.000000 1.200000e+01

\(\chi^2\)

Any datatype (column) can be plotted… here are the \(\chi^2\) histograms (note the frequencies here are currently per parameter, not per fit).

[11]:
dfLong['redchi'].hist(bins=50)  # Plot redchi histogram - note this currently reports fits x params values.
# dfLong.hist(column='redchi', by='Fit')  # Should work for per-fit results, but not currently.
[11]:
<AxesSubplot:>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_19_1.png
[12]:
# Zoom in on the lowest set via a selection mask

mask = dfLong['redchi']<1e-5

dfLong['redchi'][mask].hist(bins=20)
[12]:
<AxesSubplot:>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_20_1.png
[13]:
# Interactive version with Holoviews - scatter plot + histograms
# For additional control, calculate histograms first, then use hv.Histogram()
# See http://holoviews.org/reference/elements/bokeh/Histogram.html
hv.Scatter(dfLong['redchi'][mask].reset_index(), kdims='redchi').hist(dimension=['redchi','Fit'])
[13]:

In this case, it looks like three \(\chi^2\) groupings (local minima) are consistently found, suggesting three good fit candidate parameter sets, with possibly another two rare candidates.

For easy reference later, these can be categorised in the data table:

[14]:
# Set groupings and label by letter
dfLong['pGroups'] = pd.cut(dfLong['redchi'], bins = np.linspace(0,1e-6,10), labels = list(string.ascii_uppercase[0:9]))

# Check result - note this is ordered by first appearance in the dataFrame, unless sorted first. Also NaNs are dropped.
dfLong['pGroups'][mask].sort_values().hist(bins=20)
[14]:
<AxesSubplot:>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_24_1.png

Testing local and global minimum candidates

We can investigate more carefully by examining the unique values within a group…

[15]:
# mask = dfLong['redchi']<3e-7
mask = dfLong['pGroups'] == 'A'

# Tabulate
print(f"Total results: {mask.sum()}, unique values: {dfLong['redchi'][mask].unique().size}")
print(dfLong['redchi'][mask].unique())

# Plot
# hv.Curve(dfLong['redchi'][mask].unique()) # Linear scale
hv.Curve(dfLong['redchi'][mask].apply(lambda x: np.log10(x)).unique())  # Log10 scale


Total results: 228, unique values: 19
[1.2604134959527601e-18 5.3106194674890043e-20 4.303597576031393e-18
 2.494886350900348e-14 1.7186846944721688e-18 8.259039836499798e-19
 1.7837416345305325e-20 1.4191997193877506e-20 7.204261384699086e-19
 7.760645030848521e-21 5.393095140259509e-28 3.345117650051837e-16
 3.7633462560795635e-22 1.5760735489756745e-22 2.469959625269985e-09
 3.4643812630697878e-22 5.7835502902223365e-24 1.2442220028745357e-18
 7.6046693509239e-23]
[15]:

For this group, the candidate parameter sets typically give orders of magnitude lower \(\chi^2\) than the higher groups, so this group is likely a genuine global minimum in the solution space. This will be investigated further below. (Note that, in this demo case where a “perfect” fit is possible, very low \(\chi^2\) values like this might be expected; more generally, with real/noisy/imperfect datasets, there may not be such an obvious “best fit” parameter grouping.)
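A quick way to tally group membership is a value count over the categorical labels; here is a generic pandas sketch with toy \(\chi^2\) values (not the fitted data), using the same binning scheme as the `pd.cut()` call above:

```python
import numpy as np
import pandas as pd
import string

# Toy reduced chi^2 values spanning a few local minima
redchi = pd.Series([3e-7, 3.2e-7, 5e-7, 5.1e-7, 9e-7, 2.9e-7])

# Bin into lettered groups, as with the pd.cut() call above
groups = pd.cut(redchi, bins=np.linspace(0, 1e-6, 10),
                labels=list(string.ascii_uppercase[0:9]))

# Count entries per candidate group (categories with zero counts included)
counts = groups.value_counts().sort_index()
```

The resulting per-group counts give a direct measure of how often each candidate minimum is found in the batch.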

Per-fit data spread & correlations

Returning to the full batch of results, it is also useful to look at the spread and potential correlations of the parameters themselves.

[16]:
# For per-fit analysis, set also a "wide" format table
dfWide = dfLong.reset_index().pivot_table(columns = 'Param', values = ['value'], index=['Fit','Type','pGroups'],aggfunc=np.sum)

dfWide
[16]:
value
Param PU_SG_PU_1_1_n1 PU_SG_PU_1_n1_1 PU_SG_PU_3_1_n1 PU_SG_PU_3_n1_1 SU_SG_SU_1_0_0 SU_SG_SU_3_0_0
Fit Type pGroups
0 m E 1.894049 1.894049 0.489781 0.489781 2.572670 1.352896
p E -3.059175 -3.059175 -0.673191 -0.673191 -0.763252 3.120775
1 m G 1.582893 1.582893 1.150754 1.150754 2.710144 1.048704
p G 2.822310 2.822310 -1.145108 -1.145108 0.050245 -3.141212
2 m G 1.582889 1.582889 1.150759 1.150759 2.710143 1.048707
... ... ... ... ... ... ... ... ...
96 p E -2.675860 -2.675860 1.221342 1.221342 1.311402 -2.572625
97 m E 1.894049 1.894049 0.489781 0.489781 2.572669 1.352897
p E 3.038308 3.038308 0.652325 0.652325 0.742385 3.141543
98 m G 1.582888 1.582888 1.150761 1.150761 2.710142 1.048709
p G 2.469261 2.469261 0.153500 0.153500 -1.041850 2.149648

182 rows × 6 columns

[17]:
# Look at parameter correlations

sns.pairplot(dfWide.reset_index().drop('Fit', axis=1), hue='Type') # Drop Fit #
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\pandas\core\generic.py:3936: PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
  obj = obj._drop_axis(labels, axis, level=level, errors=errors)
[17]:
<seaborn.axisgrid.PairGrid at 0x204e828e0f0>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_30_2.png

This indicates that there is generally a good (narrow) distribution of magnitudes (m) in the batch of results, but the phases (p) are broadly distributed, although they appear to be strongly correlated.

This is (somewhat) expected in this case, since the reference phase was also randomised per fit… the correlations are interesting, however, since they indicate that (as we might hope) the relative phases are well-defined.
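The phase correlations suggested by the pair plot can also be quantified with a correlation matrix; a self-contained pandas sketch (toy data, not the `dfWide` table), showing that two phases sharing a random per-fit offset correlate perfectly, even though their absolute values are broadly spread:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
nFits = 50

# Random per-fit reference phase, as in the randomised-phase batch
offset = rng.uniform(-np.pi, np.pi, nFits)

df = pd.DataFrame({
    'p_A': offset + 0.5,   # well-defined relative phase of +0.5
    'p_B': offset - 1.0,   # well-defined relative phase of -1.0
    'p_C': rng.uniform(-np.pi, np.pi, nFits),  # genuinely undefined phase
})

corr = df.corr()  # p_A & p_B correlate perfectly; p_C does not
```

A near-unity off-diagonal element is exactly the signature of a well-defined relative phase riding on a randomised reference.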

[18]:
dfWide['value','PU_SG_PU_1_n1_1'].xs('p',level=1).hist()
[18]:
<AxesSubplot:>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_32_1.png
[19]:
# We can further group or filter by the defined candidate parameter sets
# sns.pairplot(dfWide.xs('m',level=1).reset_index().drop('Fit', axis=1), hue='pGroups') # Drop Fit, magnitudes only
sns.pairplot(dfWide.reset_index().drop('Fit', axis=1), hue='pGroups')
C:\Users\femtolab\.conda\envs\ePSdev\lib\site-packages\pandas\core\generic.py:3936: PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
  obj = obj._drop_axis(labels, axis, level=level, errors=errors)
[19]:
<seaborn.axisgrid.PairGrid at 0x204eae25cc0>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_33_2.png

There’s a lot going on in this visualisation, but - broadly speaking - it is quite clear that the main distributions (diagonal) are group-dependent.

Model results

For each fit, the model output (AF-\(\beta_{LM}\) parameters) can also be investigated…

[21]:
# If Fit is not mapped to style/size etc., Seaborn aggregates over fits and shows an error band
# More details at https://seaborn.pydata.org/tutorial/relational.html#aggregation-and-representing-uncertainty
sns.relplot(data = AFpdLong, x='t', y='value', hue='l', style='m', kind='line', marker = 'x')

# No error bar with this dataset?  Odd - worked in testing. Maybe only a couple of outliers/bad fits?
[21]:
<seaborn.axisgrid.FacetGrid at 0x204eb6f0fd0>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_36_1.png
[22]:
# Without aggregation and groupby Fit - this will likely make sense where there is a lot to show!
sns.relplot(data = AFpdLong, x='t', y='value', hue='l', style="m", estimator=None, units="Fit", kind='line')
[22]:
<seaborn.axisgrid.FacetGrid at 0x204ed9524e0>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_37_1.png
[23]:
# Add pGroups to the AFBLM listing to allow for sub-selection...

# Fit > pGroups mapping, from https://stackoverflow.com/questions/58025433/drop-duplicates-based-on-first-level-column-in-multiindex-dataframe
# dfLong.reset_index().loc[dfLong.reset_index()['Fit'].drop_duplicates().index]
dfFits = dfLong.reset_index().loc[dfLong.reset_index()['Fit'].drop_duplicates().index].set_index('Fit')
AFpdLong = AFpdLong.merge(dfFits['pGroups'].reset_index(), on='Fit', how='left')

# Could also try as a mapping with a dict, https://datascience.stackexchange.com/questions/39773/mapping-column-values-of-one-dataframe-to-another-dataframe-using-a-key-with-dif
# Should be other ways too, maybe with multiindex slices?
[24]:
# Without aggregation and groupby Fit, hue by pGroups
sns.relplot(data = AFpdLong, x='t', y='value', hue='pGroups', style="l", estimator=None, units="Fit", kind='line')
# sns.relplot(data = AFpdLong, x='t', y='value', hue='pGroups', style="l", estimator=None, units="pGroups", kind='line')
# sns.relplot(data = AFpdLong, x='t', y='value', hue='pGroups', style="l", kind='line')
[24]:
<seaborn.axisgrid.FacetGrid at 0x204e82867f0>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_39_1.png

In this case - where only the best fits are assigned to groups - there is little to see or choose from!

NOTE: some work to do on the manipulation & plotting here.

Parameter estimation & fidelity

For the test case we can, of course, compare the fit results to the known input parameters. This should give a feel for how well the data defines the matrix elements (parameters) in this case…

[25]:
# For the multiindex case, the above code throws missing data (columns) errors, but seems OK with a reset_index()
# This basically forces the indexes to long format.
g = sns.catplot(x='Param', y='value', hue = 'Type', data = dfLong.reset_index(), kind='box')
g.set_xticklabels(rotation=-60)

# Add ref case
sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
[25]:
<AxesSubplot:xlabel='Param', ylabel='value'>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_42_1.png

This indicates that most of the magnitudes are well-defined, with a fairly low spread of values, and averages right at the reference values, but - as per note above - the (absolute) phases are undefined. (Note the O and X markers here show the reference results.)

We can refine things further by applying the \(\chi^2\) masks or groupings from above, and we’ll look at just the magnitudes. This reduces the statistical uncertainties.

[109]:
# With masking for best fits only
pType = 'm'
# mask = dfLong['redchi']<1e-5
mask = dfLong['pGroups']=='A'

# g = sns.catplot(x='Param', y='value', hue = 'Type', data = dfLong[mask].reset_index(), kind='box')
# g = sns.catplot(x='Param', y='value', hue = 'redchi', data = dfLong[mask].xs('m',level=1).reset_index(), kind='box')  # TODO: fix cmapping here to allow sub-cat
g = sns.catplot(x='Param', y='value', data = dfLong[mask].xs(pType,level=1).reset_index(), kind='box') # With mask
# g = sns.catplot(x='Param', y='value', row='pGroups', data = dfLong.xs(pType,level=1).reset_index(), kind='box')  # Rows
g.set_xticklabels(rotation=-60)

# Add ref case
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red', s=50)
[109]:
<AxesSubplot:xlabel='Param', ylabel='value'>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_44_1.png

For parameter set A, things look excellent for the magnitudes.

[110]:
# With masking for best fits only
pType = 'm'
# mask = dfLong['redchi']<1e-5
mask = dfLong['pGroups']=='E'

# g = sns.catplot(x='Param', y='value', hue = 'Type', data = dfLong[mask].reset_index(), kind='box')
# g = sns.catplot(x='Param', y='value', hue = 'redchi', data = dfLong[mask].xs('m',level=1).reset_index(), kind='box')  # TODO: fix cmapping here to allow sub-cat
g = sns.catplot(x='Param', y='value', data = dfLong[mask].xs(pType,level=1).reset_index(), kind='box') # With mask
# g = sns.catplot(x='Param', y='value', row='pGroups', data = dfLong.xs(pType,level=1).reset_index(), kind='box')  # Rows
g.set_xticklabels(rotation=-60)

# Add ref case
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red', s=50)
[110]:
<AxesSubplot:xlabel='Param', ylabel='value'>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_46_1.png

For the next-best candidates, set E, the spread in the parameters is small, but the values are a bit off from the reference case.

We can also plot all the data to get a better comparison…

[111]:
# With masking for best fits only
pType = 'm'
# mask = dfLong['redchi']<1e-5
# mask = dfLong['pGroups']=='E'

# g = sns.catplot(x='Param', y='value', hue = 'Type', data = dfLong[mask].reset_index(), kind='box')
# g = sns.catplot(x='Param', y='value', hue = 'redchi', data = dfLong[mask].xs('m',level=1).reset_index(), kind='box')  # TODO: fix cmapping here to allow sub-cat
g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = dfLong.xs(pType,level=1).reset_index(), kind='box') # With mask
# g = sns.catplot(x='Param', y='value', row='pGroups', data = dfLong.xs(pType,level=1).reset_index(), kind='box')  # Rows
g.set_xticklabels(rotation=-60)

# Add ref case
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red', s=50)
[111]:
<AxesSubplot:xlabel='Param', ylabel='value'>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_48_1.png

This needs some work, but does show that the spread in the magnitudes is generally small… whilst the spread in the phases is large…

[113]:
# With masking for best fits only
pType = 'p'
# mask = dfLong['redchi']<1e-5
# mask = dfLong['pGroups']=='E'

# g = sns.catplot(x='Param', y='value', hue = 'Type', data = dfLong[mask].reset_index(), kind='box')
# g = sns.catplot(x='Param', y='value', hue = 'redchi', data = dfLong[mask].xs('m',level=1).reset_index(), kind='box')  # TODO: fix cmapping here to allow sub-cat
g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = dfLong.xs(pType,level=1).reset_index(), kind='box') # With mask
# g = sns.catplot(x='Param', y='value', row='pGroups', data = dfLong.xs(pType,level=1).reset_index(), kind='box')  # Rows
g.set_xticklabels(rotation=-60)

# Add ref case
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red')
[113]:
<AxesSubplot:xlabel='Param', ylabel='value'>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_50_1.png

Set reference phase

In this case - with a randomised ref phase - to get the corrected (relative) fitted phase results, a reference needs to be set. (Note this can also be set for the initial fit, rather than during post-fit analysis - results should be identical.)

[136]:
# Phase correction/shift function

def phaseCorrection(dfWide, dfRef = None, refParam = None, wrapFlag = True):
    """
    Phase correction/shift/wrap function.

    Prototype from test code:

    - Assumes full Pandas tabulated wide-form dataset as input.
    - Supply dfRef to use reference phase (abs phase values), otherwise will be relative with refParam set to zero.
    - wrapFlag: wrap to -pi:pi range? Default True.

    """

    phasesIn = dfWide['value'].xs('p',level=1)  # Set phase data
    phaseCorr = phasesIn.copy()

#     print(refParam)

    if refParam is None:
        refParam = phasesIn.columns[0]  # Default ref phase

    refPhase = dfWide['value', refParam].xs('p',level=1)
#     print(refPhase)

    # For abs ref phase, set that too
#     absFlag = False
    if dfRef is not None:
        refPhase = refPhase - dfRef[dfRef['Param'] == refParam].xs('p',level=1)['value'].item()
#         absFlag = True

    print(f"Set ref param = {refParam}")

    # Subtract (shift) by refPhase
    phaseCorr = phaseCorr.subtract(refPhase, axis='index')  # Subtract ref phase, might be messing up sign here?

    # Rectify phases...? Defined here for -pi:pi range.
    if wrapFlag:
        # phaseCorrRec = phaseCorr.apply(lambda x: np.sign(x)*np.mod(np.abs(x),np.pi)) # This will wrap towards zero, should be OK for zero ref phase case.

        # Use arctan, defined for -pi:pi range
        return np.arctan2(np.sin(phaseCorr), np.cos(phaseCorr))


    else:
        return phaseCorr
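The `np.arctan2(np.sin(x), np.cos(x))` construction used above is a standard idiom for wrapping arbitrary angles onto the \(-\pi:\pi\) range; a minimal standalone check:

```python
import numpy as np

def wrapPhase(phi):
    """Wrap angle(s) onto the -pi:pi range via arctan2."""
    return np.arctan2(np.sin(phi), np.cos(phi))

# Out-of-range angles wrap back in...
assert np.isclose(wrapPhase(3 * np.pi / 2), -np.pi / 2)
assert np.isclose(wrapPhase(-3 * np.pi / 2), np.pi / 2)
# ...while in-range angles are unchanged
assert np.isclose(wrapPhase(0.5), 0.5)
```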

This defaults to setting the first phase term as the reference…

[134]:
phaseCorrection(dfWide)
Set ref param = PU_SG_PU_1_1_n1
[134]:
Param PU_SG_PU_1_1_n1 PU_SG_PU_1_n1_1 PU_SG_PU_3_1_n1 PU_SG_PU_3_n1_1 SU_SG_SU_1_0_0 SU_SG_SU_3_0_0
Fit pGroups
0 E 0.0 0.0 2.385985 2.385985 2.295923 6.179950
1 G 0.0 0.0 -3.967418 -3.967418 -2.772066 -5.963522
2 G 0.0 0.0 3.967417 3.967417 2.772063 -0.319630
3 E 0.0 0.0 -3.897202 -3.897202 -3.987262 -0.103235
4 G 0.0 0.0 -2.315762 -2.315762 -3.511114 -0.319602
... ... ... ... ... ... ... ...
94 E 0.0 0.0 -2.385983 -2.385983 -2.295923 -6.179951
95 E 0.0 0.0 2.385983 2.385983 2.295923 6.179950
96 E 0.0 0.0 3.897201 3.897201 3.987262 0.103235
97 E 0.0 0.0 -2.385984 -2.385984 -2.295923 0.103235
98 G 0.0 0.0 -2.315761 -2.315761 -3.511111 -0.319613

91 rows × 6 columns

In general, one would set the ref. phase to zero but, for a known test case, we can also set it to the known input (reference) phase.

[132]:
phaseCorrection(dfWide, dfRef = dfRef, refParam = 'PU_SG_PU_1_n1_1')
Set ref param = PU_SG_PU_1_n1_1
[132]:
Param PU_SG_PU_1_1_n1 PU_SG_PU_1_n1_1 PU_SG_PU_3_1_n1 PU_SG_PU_3_n1_1 SU_SG_SU_1_0_0 SU_SG_SU_3_0_0
Fit pGroups
0 E -0.861041 -0.861041 1.524943 1.524943 1.434882 5.318908
1 G -0.861041 -0.861041 -4.828460 -4.828460 -3.633107 -6.824564
2 G -0.861041 -0.861041 3.106375 3.106375 1.911022 -1.180671
3 E -0.861041 -0.861041 -4.758243 -4.758243 -4.848303 -0.964276
4 G -0.861041 -0.861041 -3.176804 -3.176804 -4.372156 -1.180644
... ... ... ... ... ... ... ...
94 E -0.861041 -0.861041 -3.247025 -3.247025 -3.156964 -7.040992
95 E -0.861041 -0.861041 1.524942 1.524942 1.434882 5.318909
96 E -0.861041 -0.861041 3.036160 3.036160 3.126221 -0.757807
97 E -0.861041 -0.861041 -3.247025 -3.247025 -3.156964 -0.757807
98 G -0.861041 -0.861041 -3.176802 -3.176802 -4.372153 -1.180654

91 rows × 6 columns

[141]:
# Full phase spread, without wrapping
phaseCorr = phaseCorrection(dfWide, wrapFlag = False)

pType = 'p'

# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit']), kind='box')  # .melt() to force long-format
# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
# g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']))  # pGroups + scatter plot - this shows groupings better
g.set_xticklabels(rotation=-60)

# Add ref case - note this may rescale y-axis as set
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red', s=100)  #, size = 10)

# Force limits
padding = 0.2
mult = 3
plt.ylim(mult*(-np.pi-padding), mult*(np.pi+padding))
Set ref param = PU_SG_PU_1_1_n1
[141]:
(-10.02477796076938, 10.02477796076938)
../_images/fitting_PEMtk_analysis_demo_150621-tidy_57_2.png
[142]:
# Full phase spread, WITH WRAPPING (default case)
phaseCorr = phaseCorrection(dfWide, wrapFlag = True)

pType = 'p'

# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit']), kind='box')  # .melt() to force long-format
# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
# g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']))  # pGroups + scatter plot - this shows groupings better
g.set_xticklabels(rotation=-60)

# Add ref case - note this may rescale y-axis as set
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red', s=100)  #, size = 10)

# Force limits
padding = 0.2
mult = 3
plt.ylim(mult*(-np.pi-padding), mult*(np.pi+padding))
Set ref param = PU_SG_PU_1_1_n1
[142]:
(-10.02477796076938, 10.02477796076938)
../_images/fitting_PEMtk_analysis_demo_150621-tidy_58_2.png
[144]:
# Full phase spread, WITH WRAPPING (default case) + set ref phase from known value
phaseCorr = phaseCorrection(dfWide, dfRef, wrapFlag = True)

pType = 'p'

# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit']), kind='box')  # .melt() to force long-format
# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
# g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']))  # pGroups + scatter plot - this shows groupings better
g.set_xticklabels(rotation=-60)

# Add ref case - note this may rescale y-axis as set
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red', s=100)  #, size = 10)

# Force limits
padding = 0.2
mult = 1
plt.ylim(mult*(-np.pi-padding), mult*(np.pi+padding))
Set ref param = PU_SG_PU_1_1_n1
[144]:
(-3.3415926535897933, 3.3415926535897933)
../_images/fitting_PEMtk_analysis_demo_150621-tidy_59_2.png

Here it looks like we have - roughly - sets of phase groups, symmetric relative to the ref. cases…

What’s going on…?

  • Issue with rectification?

  • Insensitivity to phases? (Periodicity in \(\chi^2\).)

  • Unsigned phases? (Periodicity in \(\chi^2\).)

  • Not enough sampling?

Let’s look at the correlations to see if they show anything interesting…

Looking in more detail, for set A only…

[152]:
# Full phase spread, WITH WRAPPING (default case) + set ref phase from known value
phaseCorr = phaseCorrection(dfWide, dfRef, wrapFlag = True)

pType = 'p'

# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit']), kind='box')  # .melt() to force long-format
# g = sns.catplot(x='Param', y='value', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
# g = sns.catplot(x='Param', y='value', hue = 'pGroups', data = phaseCorr.reset_index().melt(id_vars=['Fit','pGroups']), kind='box')  # For pGroups case
g = sns.catplot(x='Param', y='value', hue = 'Fit', data = phaseCorr.xs('A', level=1).reset_index().melt(id_vars=['Fit']))  # pGroups + scatter plot - this shows groupings better
g.set_xticklabels(rotation=-60)

# Add ref case - note this may rescale y-axis as set
# sns.scatterplot(x='Param', y='value', hue = 'Type', style = 'Type', data = dfRef.reset_index(), legend=False)
sns.scatterplot(x='Param', y='value', data = dfRef.xs(pType,level=1).reset_index(), legend=False, marker = 'x', color='red', s=100)  #, size = 10)

# Force limits
padding = 0.2
mult = 1
plt.ylim(mult*(-np.pi-padding), mult*(np.pi+padding))
Set ref param = PU_SG_PU_1_1_n1
[152]:
(-3.3415926535897933, 3.3415926535897933)
../_images/fitting_PEMtk_analysis_demo_150621-tidy_62_2.png

This looks promising - there seem to be two correlated sets of solutions.

We can also look at the pair plots to see the solution set correlations in more detail:

[147]:
sns.pairplot(phaseCorr.xs('A', level=1).reset_index())  #, hue='Fit')
[147]:
<seaborn.axisgrid.PairGrid at 0x20484467cc0>
../_images/fitting_PEMtk_analysis_demo_150621-tidy_64_1.png
[153]:
# Full set
# phaseCorr.xs('A', level=1)  #['PU_SG_PU_3_1_n1']

# Inspect unique value sets
# phaseCorrRec.xs('A', level=1).apply(np.round)  #.drop_duplicates()
phaseCorr.xs('A', level=1).apply(lambda x: np.round(x,4)).drop_duplicates()  # Get 7 sets returned down to 4 d.p. Fit 12 matches ref. values
[153]:
Param PU_SG_PU_1_1_n1 PU_SG_PU_1_n1_1 PU_SG_PU_3_1_n1 PU_SG_PU_3_n1_1 SU_SG_SU_1_0_0 SU_SG_SU_3_0_0
Fit
8 -0.861 -0.861 -3.1204 -3.1204 2.6112 -0.0787
9 -0.861 -0.861 1.3984 1.3984 1.9499 -1.6434
64 -0.861 -0.861 1.3957 1.3957 1.9604 -1.6421

This now starts to make sense… the results are showing pairs of best solutions in phase space. This is actually a good indication that the fitting routine is probing the full solution space, and things are working properly:

  • As expected, the PU_SG_PU_1* variables are correlated, since this was defined for the fitting, and they take only a single value, since they were set as the reference case.

  • The PU_SG_PU_3* pair show mirror symmetry about the reference pair, with a +ve and -ve solution found.

  • There are three sets of SU_SG_SU* solutions, correlated with the +/- solutions, although two of the “unique” solutions are very close, and can be essentially regarded as defining the uncertainty in this case.

  • The SU_SG_SU_1_0_0 solution pair seems to break the mirror symmetry (about the reference phase), but this is simply due to the wrapping of the -ve solution.

Why do we see this? For a cylindrically symmetric problem the sign of the phases is not defined, hence symmetric phase groups \(\hat{\phi_{\Gamma}} = \pm|\phi_{ref}\pm\phi_{\Gamma}|\) are expected in this case, and these will produce identical results.
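This symmetry is easy to check numerically: sign-flipping a set of relative phases and re-wrapping gives the mirrored solution set (a numpy sketch with hypothetical phase values, not the fitted results):

```python
import numpy as np

def wrap(phi):
    # Wrap onto -pi:pi, as in phaseCorrection() above
    return np.arctan2(np.sin(phi), np.cos(phi))

# Hypothetical relative phases (reference phase subtracted, so first term is zero)
phases = np.array([0.0, -3.1204, 2.6112, -0.0787])

# For a cylindrically symmetric problem the sign-flipped set is an equally
# valid solution, mirrored about the (zero) reference phase
mirrored = wrap(-phases)

# The two sets sum to zero (mod 2*pi), i.e. they are mirror images
assert np.allclose(wrap(phases + mirrored), 0.0)
```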

So, in this case, we find two good (valid) solution sets from an initial trial run of 100 fits, using 10 data points.

For this test case it was easy to find these, since the \(\chi^2\) values were significantly lower than those of other candidate sets, but in general this may not be the case, and further tests might be required, e.g. adding more data points or testing via independent measurements. This will be explored further in the bootstrapping notebook (to follow), along with the effects of noise etc.

Versions

[154]:
import scooby
scooby.Report(additional=['epsproc', 'pemtk', 'xarray', 'jupyter'])
[154]:
Thu Jun 24 11:03:34 2021 Eastern Daylight Time
OS Windows CPU(s) 32 Machine AMD64
Architecture 64bit RAM 63.9 GB Environment Jupyter
Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)]
epsproc 1.3.0-dev pemtk 0.0.1 xarray 0.15.0
jupyter Version unknown numpy 1.19.2 scipy 1.3.0
IPython 7.12.0 matplotlib 3.3.1 scooby 0.5.6
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191125 for Intel(R) 64 architecture applications
[155]:
# Check current Git commit for local ePSproc version
!git -C {Path(ep.__file__).parent} branch
!git -C {Path(ep.__file__).parent} log --format="%H" -n 1
* dev
  master
  numba-tests
da12376cc36f640d8974f5ce2c121be3d391caab
[156]:
# Check current remote commits
!git ls-remote --heads git://github.com/phockett/ePSproc
# !git ls-remote --heads git://github.com/phockett/epsman
16cfad26e658b740f267baa89d1550336b0134bf        refs/heads/dev
82d12cf35b19882d4e9a2cde3d4009fe679cfaee        refs/heads/master
69cd89ce5bc0ad6d465a4bd8df6fba15d3fd1aee        refs/heads/numba-tests
ea30878c842f09d525fbf39fa269fa2302a13b57        refs/heads/revert-9-master