Datasets module
The tabular dataset is created with the particle physics simulation tools Pythia 8.2 and Delphes 3.5.0. Proton-proton collision events are generated at a center-of-mass energy of 13 TeV with Pythia 8.2. These events are then passed through Delphes to produce simulated detector measurements. We used an ATLAS-like detector description to bring the dataset closer to experimental data. The events are divided into four groups:
Higgs boson signal (\(H \rightarrow \tau \tau\))
\(Z\) boson background (\(Z \rightarrow \tau \tau\))
Diboson background (\(VV \rightarrow \tau \tau\))
\(t\bar{t}\) background (\(t \bar{t}\))
By default, the repo contains a sample dataset. To get the larger public dataset, run:

.. code-block:: python3

    from datasets import Neurips2024_public_dataset as public_dataset

    data = public_dataset()
This code is already included in the starting kit and you can run it to get the public dataset.
- class datasets.Data(input_dir)
Bases: object
A class to represent a dataset.
- Parameters:
input_dir (str): The directory path of the input data.
- Attributes:
__train_set (dict): A dictionary containing the train dataset.
__test_set (dict): A dictionary containing the test dataset.
input_dir (str): The directory path of the input data.
- Methods:
load_train_set(): Loads the train dataset.
load_test_set(): Loads the test dataset.
get_train_set(): Returns the train dataset.
get_test_set(): Returns the test dataset.
delete_train_set(): Deletes the train dataset.
get_syst_train_set(): Returns the train dataset with systematic variations.
- delete_train_set()
Deletes the train dataset.
- get_syst_train_set(tes=1.0, jes=1.0, soft_met=0.0, ttbar_scale=None, diboson_scale=None, bkg_scale=None, dopostprocess=False)
Returns the train dataset with systematic variations.
- get_test_set()
Returns the test dataset.
- Returns:
dict: The test dataset.
- get_train_set()
Returns the train dataset.
- Returns:
dict: The train dataset.
- load_test_set()
Loads the test dataset.
- load_train_set(sample_size=None, selected_indices=None)
Loads the train dataset.
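The call order implied by the methods above (load, then get, then delete) can be sketched as follows. This is a minimal illustration, not the library's implementation: `FakeData` is a hypothetical in-memory stand-in that copies only the method names and signatures listed above, its dictionary contents are placeholders, and its behavior after `delete_train_set()` is an assumption. With the real package, an instance of `datasets.Data` would take its place.

```python
class FakeData:
    """Hypothetical stand-in for datasets.Data (NOT the real class).

    It mirrors only the documented method names so the example runs
    without any dataset files on disk.
    """

    def __init__(self, input_dir):
        self.input_dir = input_dir
        self._train_set = None
        self._test_set = None

    def load_train_set(self, sample_size=None, selected_indices=None):
        # The arguments are accepted to match the documented signature,
        # but this stand-in ignores them. The dictionary layout is a placeholder.
        self._train_set = {"placeholder": [0, 1, 2]}

    def load_test_set(self):
        self._test_set = {"placeholder": [0, 1]}

    def get_train_set(self):
        return self._train_set

    def get_test_set(self):
        return self._test_set

    def delete_train_set(self):
        # Assumption: after deletion the stand-in simply holds no train set.
        self._train_set = None


def run_workflow(data):
    """Load both sets, read them, then delete the train set."""
    data.load_train_set()
    data.load_test_set()
    train = data.get_train_set()
    test = data.get_test_set()
    n_train_keys, n_test_keys = len(train), len(test)
    # Delete the train set once it is no longer needed.
    data.delete_train_set()
    return n_train_keys, n_test_keys


data = FakeData("path/to/input_data")  # hypothetical path
counts = run_workflow(data)
print(counts)
```

Because `run_workflow` relies only on the documented method names, it should accept any object that exposes them, which keeps the sketch independent of how the real class stores its data.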
- datasets.Neurips2024_public_dataset()
Downloads and extracts the NeurIPS 2024 public dataset.
- Returns:
Data: A Data object pointing to the extracted input data.
- Raises:
HTTPError: If there is an error while downloading the dataset.
FileNotFoundError: If the downloaded dataset file is not found.
zipfile.BadZipFile: If the downloaded file is not a valid zip file.
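The two file-related failure modes listed above can be reproduced with the standard library alone. The sketch below is an assumption-laden illustration rather than the function's actual code: `extract_archive` is a hypothetical helper, and the download step (and therefore `HTTPError`, whose exact class depends on the HTTP client in use) is left out.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive_path, output_dir):
    """Hypothetical helper: extract a zip file and report (ok, message)."""
    try:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(output_dir)
    except FileNotFoundError:
        # Raised when the archive path does not exist.
        return False, f"archive not found: {archive_path}"
    except zipfile.BadZipFile:
        # Raised when the file exists but is not a valid zip archive.
        return False, f"not a valid zip file: {archive_path}"
    return True, f"extracted to {output_dir}"


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # A valid archive containing a single small file.
    good = tmp / "good.zip"
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("hello.txt", "hello")

    # A file with a .zip name whose content is not a zip archive.
    bad = tmp / "bad.zip"
    bad.write_text("this is not a zip archive")

    results = {
        "good": extract_archive(good, tmp / "out"),
        "bad": extract_archive(bad, tmp / "out"),
        "missing": extract_archive(tmp / "missing.zip", tmp / "out"),
    }
    extracted_ok = (tmp / "out" / "hello.txt").read_text() == "hello"

for name, (ok, message) in results.items():
    print(name, ok, message)
```

Callers who prefer exceptions over status tuples can instead wrap the call to `Neurips2024_public_dataset()` in a `try` block and catch the exception types listed above.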
For more details on Data, see the data page.