Data reduction and evaluation workflow#

This notebook is based on famewoks, a work-in-progress project that may contain bugs.
famewoks is open-source Python code, publicly available at:
- https://gitlab.esrf.fr/F-CRG/fames/famewoks (repository)
- https://famewoks.readthedocs.io (documentation)
Subscribe to the fame-data-analysis@esrf.fr mailing list (https://sympa.esrf.fr/sympa/info/fame-data-analysis) to be kept updated about bug fixes and new features.
Report bugs and feature requests in the famewoks issue tracker or by emailing mauro.rovezzi@esrf.fr directly.
To run this notebook:
- BM16 and BM30: use Visual Studio Code and the sloth (2507) kernel
- On jupyter-slurm follow the ESRF installation

Main imports and global variables#

NOTE: to restart the notebook, you need to restart the underlying kernel process. Simply closing the notebook window/tab does not restart it.

[ ]:

%load_ext autoreload
%autoreload 2

# Uncomment the following two lines if the plots do not show in the notebook
# import plotly.io as pio
# pio.renderers.default = "iframe"

from famewoks import __version__ as wkflver
from famewoks import __date__ as wkfldate
from famewoks.models import ExpSession, ExpCounters
from famewoks.plots import plot_data, plot_eshift
from famewoks.bliss2larch import (
    get_group,
    set_enealign,
    apply_eshift,
    set_eshift,
    search_samples,
    show_samples_info,
    search_datasets,
    load_data,
    set_bad_fluo_channels,
    set_bad_scans,
    set_bad_samples,
    merge_data,
    rebin_data,
    save_data,
)
from famewoks.bliss2larch import _logger


# adjust the logger level:
# "DEBUG" -> show all messages
# "INFO" -> useful messages
# "WARNING" -> warnings only
# "ERROR" -> only errors
_logger.setLevel("INFO")

# show workflow version
_logger.info(f"--> Using famewoks version: {wkflver} [{wkfldate}]")
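As a minimal illustration of how the logger level filters messages, the sketch below uses only the standard `logging` module. The logger name `demo_workflow` is hypothetical and is not the famewoks logger.

```python
import logging

# Hypothetical logger, used only to illustrate level filtering
demo_logger = logging.getLogger("demo_workflow")
demo_logger.setLevel("WARNING")  # level names are accepted as strings

# With the level set to WARNING, INFO messages are dropped
info_enabled = demo_logger.isEnabledFor(logging.INFO)
warning_enabled = demo_logger.isEnabledFor(logging.WARNING)
print(info_enabled, warning_enabled)
```

Lowering the level to "DEBUG" would let all messages through.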

Initialize the ExpSession object, which is the representation of the whole experimental session. The three required fields are:

datadir: the directory where the data is stored (stop at the session level)
counters: the counter names, mapped in the ExpCounters object
rebin_pars: the rebinning parameters (TIP: use Larix once for defining them visually)
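To give an idea of what such rebinning parameters usually describe, here is a rough sketch of a three-region energy grid: a coarse pre-edge region, a fine XANES region, and an EXAFS region that is uniform in k. The function `sketch_rebin_grid` is hypothetical, and the interpretation of the parameters (energies relative to `e0`, constant k step above `exafs1`) is an assumption. It is not the famewoks or Larch implementation.

```python
import math

KTOE = 3.80998  # approx. hbar^2 / (2 m_e) in eV * Angstrom^2


def sketch_rebin_grid(e0, emax, pre1=-35, pre2=-15, pre_step=2.0,
                      xanes_step=0.3, exafs1=25, exafs_kstep=0.05):
    """Return an illustrative energy grid (eV) made of three regions."""
    grid = []
    # pre-edge: constant energy step from e0+pre1 up to (not including) e0+pre2
    n_pre = int(round((pre2 - pre1) / pre_step))
    grid += [e0 + pre1 + i * pre_step for i in range(n_pre)]
    # XANES: finer constant energy step from e0+pre2 up to (not including) e0+exafs1
    n_xanes = int(round((exafs1 - pre2) / xanes_step))
    grid += [e0 + pre2 + i * xanes_step for i in range(n_xanes)]
    # EXAFS: constant step in k, converted back to energy with E = e0 + KTOE * k^2
    k0 = math.sqrt(exafs1 / KTOE)
    kmax = math.sqrt((emax - e0) / KTOE)
    n_k = int((kmax - k0) / exafs_kstep)
    grid += [e0 + KTOE * (k0 + i * exafs_kstep) ** 2 for i in range(n_k + 1)]
    return grid


grid = sketch_rebin_grid(e0=7112.0, emax=7612.0)
print(len(grid), grid[0], grid[-1])
```

The energy spacing grows with energy in the EXAFS region, which is why a constant k step is commonly preferred there.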

[ ]:

# DEFINE the counters names
MYCNTS_BM16 = ExpCounters(
    ene="energy_enc",
    ix=["I0", "I1", "I2"],  #: I0, I1, I2
    #fluo_roi=["xpad_roi1"],  # all detector names for ROI1
    #fluo_corr=["xpad_roi1"],  # all detector names (DT corrected)
    fluo_roi=["mercury4_det2_roi1"],  # all detector names for ROI1
    fluo_corr=["mercury4_det2_roi1"],  # all detector names (DT corrected)
    fluo_time=["sec"],  # elapsed time, which is different for the spikes
    time="sec",  # "musst_timer"
    ref_fluo=False,  #: use reference as I2/I1
    independent_fluo_channels=True,  #: use fluorescence channels as independent detectors *TODO*: fix it
)

MYCNTS_BM30 = ExpCounters(
    ene="energy_enc",
    ix=["I0", "I1", "I2"],  #: I0, I1, I2
    fluo_roi=[
        #before april 2026 was: "roi1_detXX",
        "xglab_det0_roi1",
        "xglab_det1_roi1",
        "xglab_det2_roi1",
        "xglab_det3_roi1",
        "xglab_det4_roi1",
        "xglab_det5_roi1",
        "xglab_det6_roi1",
        "xglab_det8_roi1",
        "xglab_det9_roi1",
        "xglab_det10_roi1",
        "xglab_det11_roi1",
        "xglab_det12_roi1",
        "xglab_det13_roi1",
        "xglab_det14_roi1",
    ],  # all detector names for ROI1
    fluo_corr=[
        #NOTE: before march 2026 shutdown was: roi1_corr_detXX
        "roi1_det00_corr",
        "roi1_det01_corr",
        "roi1_det02_corr",
        "roi1_det03_corr",
        "roi1_det04_corr",
        "roi1_det05_corr",
        "roi1_det06_corr",
        "roi1_det08_corr",
        "roi1_det09_corr",
        "roi1_det10_corr",
        "roi1_det11_corr",
        "roi1_det12_corr",
        "roi1_det13_corr",
        "roi1_det14_corr",
    ],  # all detector names (DT corrected)
    # fluo_roi = [f"roi1_det{num:02d}" for num in range(16)],
    # fluo_corr = [f"roi1_det_corr{num:02d}" for num in range(16)],
    fluo_time=[
        "sec" for n in range(14)
    ],  # elapsed time, which is different for the spikes
    time="sec",  # "musst_timer"
    ref_fluo=False,  #: use reference as I2/I1
)

# INIT the experimental session object
session = ExpSession(
    datadir="/data/visitor/ch7994/bm30/20260422",
    counters=MYCNTS_BM30,
    rebin_pars={
        "e0": None,
        "pre1": -35,
        "pre2": -15,
        "pre_step": 2,
        "xanes_step": 0.3,
        "exafs1": 25,
        "exafs2": None,
        "exafs_kstep": 0.05,
        "method": "boxcar",
    },
)

# to display the session metadata
# display(asdict(session))
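The long detector lists above can also be generated programmatically. The sketch below reproduces the BM30 names used in this notebook, assuming the naming patterns stay as written and that detector 7 is intentionally absent from the lists.

```python
# Detector 7 does not appear in the lists of this notebook, so it is skipped here
active_detectors = [n for n in range(15) if n != 7]

fluo_roi = [f"xglab_det{n}_roi1" for n in active_detectors]
fluo_corr = [f"roi1_det{n:02d}_corr" for n in active_detectors]
fluo_time = ["sec"] * len(active_detectors)  # one time counter per detector

print(fluo_roi[:3], fluo_corr[:3], len(fluo_time))
```

Generating the names this way keeps the three lists the same length and in the same order.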

Assign a reference spectrum for the energy calibration#

To use the calc_eshift=True option in the load_data() function, you first need to assign a reference spectrum against which the energy is calibrated. This operation is done only once:

select a sample and a dataset containing the reference spectrum to be used for the energy calibration
load the data without calculating the energy shift (calc_eshift=False)
get the energy reference group and assign it to session

NOTE: if you do not know the sample and dataset at this stage, use the Users workflow section below to search/load/plot the datasets.

[ ]:

samples = search_samples(session, ignore_names=[], verbose=False)
sel_sample = 'bl_align'
datasets = search_datasets(session, sample=sel_sample, verbose=False)
sel_dataset = 0
dataset = datasets[sel_dataset]
load_data(
    session,
    sel_sample,
    sel_dataset,
    use_fluo_corr=False,
    iskip=1, #: ignore the first point
    istrip=1, #: ignore the last point
    calc_eshift=False,
    merge=False,
    skip_scans=[],
)
# set the reference
energy_reference_group = get_group(dataset, scanint=26, data="ref")
set_enealign(session, energy_reference_group)

Users workflow#

A minimal/typical workflow for the users consists of:

search_samples
select a sample
search_datasets
select a dataset
load_data
plot_data
remove bad channels/scans
check the energy shifts
save_data
import the Athena project in Larix and continue your data analysis workflow there

Search and select samples and datasets#

Search for the sample names available in the given experimental session. Use the parameter ignore_names = ["list", "of", "strings"] to ignore the sample names containing any of those strings.

[ ]:

samples = search_samples(session, ignore_names=[])
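The effect of ignore_names can be pictured as a simple substring filter. The sketch below assumes plain substring matching. The helper `filter_names` and the sample names are hypothetical, not part of famewoks.

```python
def filter_names(names, ignore_names):
    """Keep only the names that contain none of the ignored strings."""
    return [
        name for name in names
        if not any(word in name for word in ignore_names)
    ]


# Hypothetical sample names, for illustration only
all_names = ["bl_align", "Fe_foil", "test_sample", "sample_A"]
kept = filter_names(all_names, ignore_names=["test", "align"])
print(kept)
```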

[ ]:

sel_sample = 8
datasets = search_datasets(session, sample=sel_sample)

[ ]:

sel_dataset = 5
dataset = datasets[sel_dataset]

Load data#

Load the data for a given sample/dataset into the session (i.e. read the data from the HDF5 files on disk into memory).

Parameters for ``load_data()``

skip_scans: the scans that will not be loaded (e.g. bad scans); either a list of numbers [1,2,3] or a string "1:4, 7"
use_fluo_corr: if True, use the dead-time corrected fluorescence channels (NOTE: this correction usually fails at low count rates; compare the data with and without the correction to see which configuration gives the lower noise)
iskip: the index of the initial data points to skip (default: None)
istrip: the index, relative to the last data point, of the final data points to strip (default: None)
merge: automatically merge the scans in a dataset (default: True)
calc_eshift: fit the energy shift using the first scan of the dataset as reference (NOTE: this slows down the loading)

[ ]:

load_data(
    session,
    sel_sample,
    sel_dataset,
    use_fluo_corr=True,
    iskip=1, #: ignore the first point
    istrip=1, #: ignore the last point
    calc_eshift=True,
    merge=True,
    skip_scans=[],
)
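The iskip and istrip parameters can be thought of as trimming points from the two ends of each scan. The sketch below shows one plausible reading: iskip points are dropped from the start and istrip points from the end. The helper `trim_points` is hypothetical, and the exact indexing convention used by famewoks is an assumption.

```python
def trim_points(values, iskip=None, istrip=None):
    """Drop `iskip` points from the start and `istrip` points from the end."""
    start = iskip or 0
    stop = len(values) - (istrip or 0)
    return values[start:stop]


raw = [10, 11, 12, 13, 14]
trimmed = trim_points(raw, iskip=1, istrip=1)
print(trimmed)
```

With both parameters left at None the data are returned unchanged.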

Plot the data#

Plot the data for a given loaded dataset

Parameters for ``plot_data()``

data can be:
- "fluos": to show all fluorescence channels
- "fluo": sum of active fluorescence channels (use set_bad_fluo_channels() for excluding bad ones)
- "trans": sample transmission (muT1)
- "ref": reference “foil” transmission (muT2)
- None: shows only I0
ynorm: None, area (shows y data normalized by their area), flat (show flattened) or True (show normalized)
show_slide: if True, shows one scan at a time with a slider
show_i0: True shows I0 signal (NOTE for data = "ref" it is I1 signal)
show_e0: True shows E0 (as found by the pre_edge() function of Larch)
show_deriv: True shows the derivative of the signal
show_merge: True shows the merged signal (sum of the channels for the current scan)
- if "rebin", shows the rebinned version of the merge (NOTE: the single scans are never rebinned, as they are meant to be merged first and then rebinned)
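For reference, the data channels listed above are commonly built from the raw counters as follows. This is a sketch using the textbook definitions (transmission as ln(I0/I1), reference as ln(I1/I2), fluorescence as the summed channels divided by I0). The helper names are hypothetical, and whether famewoks uses exactly these expressions is an assumption.

```python
import math


def mu_trans(i0, i1):
    """Sample transmission: ln(I0 / I1), point by point."""
    return [math.log(a / b) for a, b in zip(i0, i1)]


def mu_ref(i1, i2):
    """Reference foil transmission: ln(I1 / I2), point by point."""
    return [math.log(a / b) for a, b in zip(i1, i2)]


def mu_fluo(channels, i0):
    """Fluorescence: sum over the channels, normalized by I0."""
    summed = [sum(points) for points in zip(*channels)]
    return [s / a for s, a in zip(summed, i0)]


# Made-up counter values, for illustration only
i0 = [100.0, 100.0]
i1 = [50.0, 25.0]
i2 = [25.0, 5.0]
fluo = mu_fluo([[1.0, 2.0], [3.0, 4.0]], i0)
print(mu_trans(i0, i1), mu_ref(i1, i2), fluo)
```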

[ ]:

fig = plot_data(
    dataset,
    data="fluo",
    ynorm="area",
    show_slide=False,
    show_i0=False,
    show_e0=False,
    show_deriv=False,
    show_merge="rebin",
)
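As an illustration of what ynorm="area" means, the sketch below divides a signal by its integrated area, computed with the trapezoidal rule. The helper names are hypothetical, and the integration method used by famewoks is an assumption.

```python
def trapezoid_area(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
        for i in range(len(x) - 1)
    )


def normalize_by_area(x, y):
    """Scale y so that its integrated area is 1."""
    area = trapezoid_area(x, y)
    return [value / area for value in y]


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 2.0, 2.0, 0.0]
y_norm = normalize_by_area(x, y)
print(y_norm, trapezoid_area(x, y_norm))
```

Area normalization makes scans with different overall intensity directly comparable in shape.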

Data cleaning#

This section describes how to flag (set the flag attribute to 0=bad, 1=good) samples/datasets/scans/fluorescence_channels.

NOTES:

in all the following functions, passing None means “set all good”;
the variables can be a list of integers or a string that is interpreted as a list (always put a space after each comma).
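A string selection such as "9:12, 15, 16" could be expanded into a list of integers along these lines. The helper `parse_scans` is hypothetical, and treating the upper bound of a range as inclusive is an assumption. Check the famewoks documentation for the actual convention.

```python
def parse_scans(spec):
    """Expand a selection like "9:12, 15" into a list of integers.

    ASSUMPTION: ranges written as "first:last" include both ends.
    """
    if spec is None:
        return None  # None is passed through ("set all good")
    if not isinstance(spec, str):
        return [int(item) for item in spec]
    scans = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if ":" in token:
            first, last = (int(part) for part in token.split(":"))
            scans.extend(range(first, last + 1))
        else:
            scans.append(int(token))
    return scans


print(parse_scans("9:12, 15, 16"))
```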

To skip some samples

[ ]:

set_bad_samples(session, [15])

To show all the samples, including those marked as bad (flag = 0):

[ ]:

show_samples_info(session, all=True)

To exclude bad scans

[ ]:

set_bad_scans(session, sel_sample, sel_dataset, scans="9:12, 15, 16")

To re-enable all scans:

[ ]:

set_bad_scans(session, sel_sample, sel_dataset, scans=None) #: all marked as good

To exclude bad fluorescence channels (if scans=None, the channels are excluded for all scans in the dataset):

[ ]:

set_bad_fluo_channels(session, sel_sample, sel_dataset, channels=[1,2,3], scans=[1,2,3])

To re-enable all channels on all scans:

[ ]:

set_bad_fluo_channels(session, sel_sample, sel_dataset, channels=None, scans=None)
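The flag mechanism can be pictured with a small stand-alone model: each channel carries a flag (0=bad, 1=good) and only the good ones contribute to the summed fluorescence signal. The `Channel` class and `sum_active` helper are hypothetical and do not mirror the famewoks data model.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    counts: list
    flag: int = 1  # 1 = good, 0 = bad


def sum_active(channels):
    """Sum, point by point, the counts of the channels flagged as good."""
    active = [ch for ch in channels if ch.flag == 1]
    return [sum(points) for points in zip(*(ch.counts for ch in active))]


channels = [
    Channel("det0", [1, 2, 3]),
    Channel("det1", [10, 20, 30]),
    Channel("det2", [100, 200, 300]),
]
channels[1].flag = 0  # mark det1 as bad
total = sum_active(channels)
print(total)
```

Because the data are only flagged and not deleted, setting the flag back to 1 restores the channel.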

Check the energy shifts#

When loading the data with the option calc_eshift=True, the energy shift is calculated and reported as info. Nevertheless, it is important to check that the shift found is correct before applying it. This can be done with the function plot_eshift().

[ ]:

efig = plot_eshift(session=session, dset=dataset, show_e0=True, array="dmude")
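To make the idea of an energy shift concrete, the sketch below estimates the shift that best aligns a scan with a reference, using a brute-force least-squares search over trial shifts. This is a swapped-in illustrative technique, not the fitting procedure used by famewoks. All function names and the synthetic Gaussian data are hypothetical.

```python
import math


def interp_linear(x, xs, ys):
    """Linear interpolation of (xs, ys) at x, clamped at the edges."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def estimate_eshift(e_ref, y_ref, e_scan, y_scan, search=3.0, step=0.1):
    """Return the shift (added to e_scan) that minimizes the squared residual."""
    n = int(round(search / step))
    best_shift, best_resid = 0.0, math.inf
    for i in range(-n, n + 1):
        shift = i * step
        shifted = [e + shift for e in e_scan]
        resid = sum(
            (interp_linear(er, shifted, y_scan) - yr) ** 2
            for er, yr in zip(e_ref, y_ref)
        )
        if resid < best_resid:
            best_shift, best_resid = shift, resid
    return best_shift


# Synthetic data: the scan peak sits 1 eV above the reference peak
energy = [90 + 0.25 * i for i in range(81)]
reference = [math.exp(-((e - 100.0) ** 2) / 8.0) for e in energy]
scan = [math.exp(-((e - 101.0) ** 2) / 8.0) for e in energy]
shift = estimate_eshift(energy, reference, energy, scan)
print(shift)
```

In practice the comparison is often done on the derivative of the spectrum, which is more sensitive to the edge position.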

It is also possible to adjust the shifts manually by acting on the variable scans_eshifts, then replotting to check that the shifts are correct.

[ ]:

#dataset.scans_eshifts = [shift + 1.9 for shift in dataset.scans_eshifts]
#dataset.scans_eshifts[8] += -1

Once you are happy with the shifts, you can apply them to the data channel of your choice with the function apply_eshift(). NOTE: the data channel to which the energy shift is applied must be specified, e.g. data = ["fluo", "ref", "trans"].

[ ]:

apply_eshift(dataset, data='ref')
apply_eshift(dataset, data='fluo')

Save/export the data#

Save the data to an Athena project file (NOTE: the files are overwritten each time). To save the rebinned spectra, use the option save_rebinned=True. The data channel should be specified, e.g. data = ["fluo", "ref", "trans"]. An Athena project for each data channel is created. If you want to change the scans saved in the Athena project, simply use the set_bad_scans() function (first enable all scans and then select those to export, see example below).

By default the output filename is {dataset name}_{time stamp}_{data}_{suffix}.prj. Use the variable suffix to add your own string.

[ ]:

#mydatadir = None #use this for testing, to save into a temporary directory `/tmp/PROCESSED_DATA/famewoks`
mydatadir = session.datadir #use this to save into `PROCESSED_DATA/famewoks`
save_data(dataset, data=["fluo"], datadir=mydatadir, save_rebinned=True)
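The default filename pattern can be reproduced with a short helper, for example to predict where a file will land. The function `build_output_name` is hypothetical, and the time stamp format used below is an assumption. The actual format written by famewoks may differ.

```python
from datetime import datetime


def build_output_name(dataset_name, data, suffix, when=None):
    """Compose '{dataset name}_{time stamp}_{data}_{suffix}.prj'."""
    # ASSUMPTION: time stamp formatted as YYYYMMDD_HHMMSS
    stamp = (when or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{dataset_name}_{stamp}_{data}_{suffix}.prj"


name = build_output_name(
    "Fe_foil_0001", "fluo", "test", when=datetime(2026, 4, 22, 10, 30, 0)
)
print(name)
```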
