Datasets module

The tabular dataset is created with the particle physics simulation tools Pythia 8.2 and Delphes 3.5.0. Proton-proton collision events are generated with Pythia 8 at a center-of-mass energy of 13 TeV and then passed through Delphes to produce simulated detector measurements. An ATLAS-like detector description is used to bring the dataset closer to experimental data. The events are divided into four groups:

  1. Higgs boson signal (\(H \rightarrow \tau \tau\))

  2. \(Z\) boson background (\(Z \rightarrow \tau \tau\))

  3. Diboson background (\(VV \rightarrow \tau \tau\))

  4. \(t\bar{t}\) background (\(t \bar{t}\))

By default, the repo includes a sample dataset. To get the larger public dataset, run:

.. code-block:: python3

    from datasets import Neurips2024_public_dataset as public_dataset

    data = public_dataset()

This code is already included in the starting kit and you can run it to get the public dataset.

class datasets.Data(input_dir)

Bases: object

A class to represent a dataset.

Parameters:
  • input_dir (str): The directory path of the input data.

Attributes:
  • __train_set (dict): A dictionary containing the train dataset.

  • __test_set (dict): A dictionary containing the test dataset.

  • input_dir (str): The directory path of the input data.

Methods:
  • load_train_set(): Loads the train dataset.

  • load_test_set(): Loads the test dataset.

  • get_train_set(): Returns the train dataset.

  • get_test_set(): Returns the test dataset.

  • delete_train_set(): Deletes the train dataset.

  • get_syst_train_set(): Returns the train dataset with systematic variations.
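
The methods listed above suggest a load-then-get workflow. A minimal sketch, assuming `data` is a `Data` instance and that both getters return dictionaries as documented; the helper name `summarize_splits` is hypothetical and not part of the module:

```python
def summarize_splits(data):
    """Load both splits and report the top-level keys of each dictionary."""
    # Populate the train and test sets from the input directory.
    data.load_train_set()
    data.load_test_set()

    train_set = data.get_train_set()
    test_set = data.get_test_set()

    summary = {
        "train_keys": list(train_set),
        "test_keys": list(test_set),
    }

    # Drop the train set once it is no longer needed.
    data.delete_train_set()
    return summary
```

The helper only inspects dictionary keys, so it makes no assumption about what each dictionary contains.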

delete_train_set()

Deletes the train dataset.

get_syst_train_set(tes=1.0, jes=1.0, soft_met=0.0, ttbar_scale=None, diboson_scale=None, bkg_scale=None, dopostprocess=False)

Returns the train dataset with systematic variations.

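
As an illustration of how `get_syst_train_set` might be called, the sketch below requests one varied train set per value of the `tes` keyword and leaves every other argument at its default. The helper name `scan_tes` and the three values are assumptions chosen for illustration, not recommended settings, and the return type of `get_syst_train_set` is not assumed:

```python
def scan_tes(data, tes_values=(0.97, 1.0, 1.03)):
    """Request one systematically varied train set per `tes` value."""
    varied_sets = {}
    for tes in tes_values:
        # Only `tes` is varied; the other keywords keep their defaults.
        varied_sets[tes] = data.get_syst_train_set(tes=tes)
    return varied_sets
```

The same pattern would apply to the other keywords in the signature, such as `jes` or `soft_met`.
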
get_test_set()

Returns the test dataset.

Returns:

dict: The test dataset.

get_train_set()

Returns the train dataset.

Returns:

dict: The train dataset.

load_test_set()

Loads the test dataset.

load_train_set(sample_size=None, selected_indices=None)

Loads the train dataset.

datasets.Neurips2024_public_dataset()

Downloads and extracts the Neurips 2024 public dataset.

Returns:

Data: A Data object for the extracted input data.

Raises:

  • HTTPError: If there is an error while downloading the dataset.

  • FileNotFoundError: If the downloaded dataset file is not found.

  • zipfile.BadZipFile: If the downloaded file is not a valid zip file.
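
A caller may want to handle the documented exceptions explicitly. The sketch below wraps any download callable and catches the two exceptions whose origin is unambiguous. The wrapper name `fetch_public_dataset` is hypothetical, and `HTTPError` is deliberately left uncaught because this page does not say which library raises it:

```python
import zipfile


def fetch_public_dataset(download):
    """Call `download` and return its result, or None on a documented error."""
    try:
        return download()
    except FileNotFoundError as err:
        print(f"Downloaded dataset file is missing: {err}")
    except zipfile.BadZipFile as err:
        print(f"Downloaded file is not a valid zip archive: {err}")
    return None
```

In practice `download` would be `Neurips2024_public_dataset`, for example `data = fetch_public_dataset(Neurips2024_public_dataset)`.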

For more details on Data, see the data page.