Data
The tabular dataset is created with the particle physics simulation tools Pythia 8.2 and Delphes 3.5.0. Proton-proton collision events are generated at a center-of-mass energy of 13 TeV with Pythia 8. These events are then passed through Delphes to produce simulated detector measurements. We used an ATLAS-like detector description to bring the dataset closer to experimental data. The events are divided into four groups:
Higgs boson signal (\(H \rightarrow \tau \tau\))
\(Z\) boson background (\(Z \rightarrow \tau \tau\))
Diboson background (\( VV \rightarrow \tau \tau\))
ttbar background (\(t \bar{t}\))
Higgs Signal:
The Higgs bosons are produced through all possible production modes and decay into two tau leptons. The tau leptons are in turn allowed to decay into all possible final states, but only final states with one lepton (electron or muon) and one hadronically decaying tau are kept.
Z boson Background:
Only background events coming from \(Z\) bosons are included in this challenge. While simulating the process, interference effects between \(Z\) bosons and photons are included. Similar to signal events, only the tau-tau decay mode of the \(Z\) boson is included in the dataset.
⚠️ Note:
The training events have weights.
Event Weights:
Event weights are defined as:
\( w = \frac{\textrm{Cross-Section} ~ \times ~ \textrm{Luminosity}}{\textrm{Total number of generated events}} \)
The challenge considers the analysis of proton-proton collision data corresponding to an integrated luminosity of \(10 ~\textrm{fb}^{-1}\), as collected by the ATLAS experiment.
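The weight formula above can be sketched in a few lines. This is a minimal illustration, not the challenge's own code: the function name `event_weight`, the example cross-section, and the number of generated events are hypothetical values chosen only to show the arithmetic. The cross-section is assumed to be expressed in fb so that it cancels against the luminosity in \(\textrm{fb}^{-1}\).

```python
def event_weight(cross_section_fb: float, luminosity_fb_inv: float, n_generated: int) -> float:
    """Per-event weight w = (cross-section x luminosity) / number of generated events."""
    if n_generated <= 0:
        raise ValueError("n_generated must be a positive integer")
    return cross_section_fb * luminosity_fb_inv / n_generated


# Luminosity of the scenario described in the text.
LUMINOSITY_FB_INV = 10.0

# Hypothetical numbers, for illustration only (not taken from the dataset).
example_cross_section_fb = 2000.0
example_n_generated = 1_000_000

w = event_weight(example_cross_section_fb, LUMINOSITY_FB_INV, example_n_generated)

# Summing the weights of all generated events of one process recovers
# the expected number of events, cross-section x luminosity.
expected_events = w * example_n_generated
```

With this definition, the sum of weights over a process is the expected event count for that process at the chosen luminosity, regardless of how many events were simulated.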
Features in the data
Prefix-less variables
Weight, Label, and DetailedLabel have a special role and should NOT be used as regular features for the model:
Variable | Description |
---|---|
Weight | The event weight \(w_i\). |
Label | The event label \(y_i \in \{1,0\}\) (1 for signal, 0 for background). |
Detailed Label | The event detailed label \(\in\{ htautau, ztautau, diboson, ttbar \}\) |
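One way to keep the special columns out of the model inputs is to split each event before training. The sketch below uses plain dictionaries and assumes the column names are spelled exactly `Weight`, `Label`, and `DetailedLabel`; the helper names and the toy events are hypothetical and the numbers are made up.

```python
# Assumed spellings of the special columns; adjust to match the actual files.
SPECIAL_COLUMNS = ("Weight", "Label", "DetailedLabel")


def split_event(event: dict) -> tuple[dict, dict]:
    """Return (features, special) so the special columns never reach the model."""
    features = {k: v for k, v in event.items() if k not in SPECIAL_COLUMNS}
    special = {k: event[k] for k in SPECIAL_COLUMNS if k in event}
    return features, special


def weighted_yields(events: list[dict]) -> tuple[float, float]:
    """Sum of weights for signal (Label == 1) and background (Label == 0)."""
    signal = sum(e["Weight"] for e in events if e["Label"] == 1)
    background = sum(e["Weight"] for e in events if e["Label"] == 0)
    return signal, background


# Toy events with made-up values, for illustration only.
toy_events = [
    {"PRI_had_pt": 35.0, "PRI_lep_pt": 28.0, "Weight": 0.5, "Label": 1, "DetailedLabel": "htautau"},
    {"PRI_had_pt": 41.0, "PRI_lep_pt": 22.0, "Weight": 2.0, "Label": 0, "DetailedLabel": "ztautau"},
    {"PRI_had_pt": 30.0, "PRI_lep_pt": 25.0, "Weight": 1.5, "Label": 0, "DetailedLabel": "ttbar"},
]

features, special = split_event(toy_events[0])
signal_yield, background_yield = weighted_yields(toy_events)
```

The weights still matter for training and evaluation (for example as sample weights), but they enter through the loss or the yield computation rather than as an input feature.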
Primary Features
The variables prefixed with PRI (for PRImitives) are “raw” quantities about the bunch collision as measured by the detector, essentially parameters of the momenta of particles.
Variable | Description |
---|---|
PRI_had_pt | The transverse momentum \(\sqrt{{p_x}^2 + {p_y}^2}\) of the hadronic tau. |
PRI_had_eta | The pseudorapidity \(\eta\) of the hadronic tau. |
PRI_had_phi | The azimuth angle \(\phi\) of the hadronic tau. |
PRI_lep_pt | The transverse momentum \(\sqrt{{p_x}^2 + {p_y}^2}\) of the lepton (electron or muon). |
PRI_lep_eta | The pseudorapidity \(\eta\) of the lepton. |
PRI_lep_phi | The azimuth angle \(\phi\) of the lepton. |
PRI_met | The missing transverse energy \({E}^{miss}_{T}\). |
PRI_met_phi | The azimuth angle \(\phi\) of the missing transverse energy. |
PRI_jet_num | The number of jets (integer with a value of 0, 1, 2 or 3; possible larger values have been capped at 3). |
PRI_jet_leading_pt | The transverse momentum \(\sqrt{{p_x}^2 + {p_y}^2}\) of the leading jet, that is, the jet with the largest transverse momentum (undefined if PRI_jet_num = 0). |
PRI_jet_leading_eta | The pseudorapidity \(\eta\) of the leading jet (undefined if PRI_jet_num = 0). |
PRI_jet_leading_phi | The azimuth angle \(\phi\) of the leading jet (undefined if PRI_jet_num = 0). |
PRI_jet_subleading_pt | The transverse momentum \(\sqrt{{p_x}^2 + {p_y}^2}\) of the subleading jet, that is, the jet with the second largest transverse momentum (undefined if PRI_jet_num ≤ 1). |
PRI_jet_subleading_eta | The pseudorapidity \(\eta\) of the subleading jet (undefined if PRI_jet_num ≤ 1). |
PRI_jet_subleading_phi | The azimuth angle \(\phi\) of the subleading jet (undefined if PRI_jet_num ≤ 1). |
PRI_jet_all_pt | The scalar sum of the transverse momentum of all the jets of the events. |
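The \((p_T, \eta, \phi)\) triplets in the table map onto Cartesian momentum components through the standard collider relations \(p_x = p_T \cos\phi\), \(p_y = p_T \sin\phi\), \(p_z = p_T \sinh\eta\). The sketch below illustrates this conversion; the function names are hypothetical and the input values are made up.

```python
import math


def to_cartesian(pt: float, eta: float, phi: float) -> tuple[float, float, float]:
    """Convert (pt, eta, phi) to Cartesian momentum components (px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    return px, py, pz


def transverse_momentum(px: float, py: float) -> float:
    """Transverse momentum sqrt(px^2 + py^2), as defined in the table."""
    return math.hypot(px, py)


# Made-up values standing in for PRI_had_pt, PRI_had_eta, PRI_had_phi.
px, py, pz = to_cartesian(pt=30.0, eta=0.5, phi=1.0)

# Round trip: the transverse momentum recovered from (px, py) matches the input pt.
recovered_pt = transverse_momentum(px, py)
```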
Derived Features
These variables are derived from the primary variables with the help of derived_quantities.py. The test sets are built with the derived quantities already included. The train set does not include them, because they change after systematics are applied; participants are therefore advised to use the systematics function to compute them.
Variable | Description |
---|---|
DER_mass_transverse_met_lep | The transverse mass between the missing transverse energy and the lepton. |
DER_mass_vis | The invariant mass of the hadronic tau and the lepton. |
DER_pt_h | The modulus of the vector sum of the transverse momentum of the hadronic tau, the lepton and the missing transverse energy vector. |
DER_deltaeta_jet_jet | The absolute value of the pseudorapidity separation between the two jets (undefined if PRI_jet_num ≤ 1). |
DER_mass_jet_jet | The invariant mass of the two jets (undefined if PRI_jet_num ≤ 1). |
DER_prodeta_jet_jet | The product of the pseudorapidities of the two jets (undefined if PRI_jet_num ≤ 1). |
DER_deltar_had_lep | The R separation between the hadronic tau and the lepton. |
DER_pt_tot | The modulus of the vector sum of the missing transverse momenta and the transverse momenta of the hadronic tau, the lepton, the leading jet (if PRI_jet_num ≥ 1) and the subleading jet (if PRI_jet_num = 2) (but not of any additional jets). |
DER_sum_pt | The sum of the moduli of the transverse momenta of the hadronic tau, the lepton, the leading jet (if PRI_jet_num ≥ 1) and the subleading jet (if PRI_jet_num = 2) and the other jets (if PRI_jet_num = 3). |
DER_pt_ratio_lep_tau | The ratio of the transverse momenta of the lepton and the hadronic tau. |
DER_met_phi_centrality | The centrality of the azimuthal angle of the missing transverse energy vector w.r.t. the hadronic tau and the lepton. |
DER_lep_eta_centrality | The centrality of the pseudorapidity of the lepton w.r.t. the two jets (undefined if PRI_jet_num ≤ 1). |
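To make a few of these definitions concrete, the sketch below computes three of the derived quantities from primary features using the standard textbook formulas. It is not the code in derived_quantities.py, which may differ in detail; in particular, the transverse mass here assumes massless objects. All function names are hypothetical and the input values are made up.

```python
import math


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def transverse_mass(lep_pt: float, lep_phi: float, met: float, met_phi: float) -> float:
    """Transverse mass of lepton + missing transverse energy (massless approximation)."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(lep_phi - met_phi)))


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Separation sqrt(d_eta^2 + d_phi^2) between two objects."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def pt_ratio(lep_pt: float, had_pt: float) -> float:
    """Ratio of the lepton and hadronic tau transverse momenta."""
    return lep_pt / had_pt


# Made-up event: lepton and missing energy back to back in azimuth.
m_t = transverse_mass(lep_pt=40.0, lep_phi=0.0, met=40.0, met_phi=math.pi)
dr = delta_r(eta1=0.3, phi1=3.0, eta2=-0.1, phi2=-3.0)
ratio = pt_ratio(lep_pt=25.0, had_pt=50.0)
```

Wrapping the azimuthal difference matters for objects on either side of the \(\pm\pi\) boundary: in the example, \(\phi = 3.0\) and \(\phi = -3.0\) are only about 0.28 rad apart, not 6 rad.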
Preselection Cuts
Criteria | Pre-selection cut | Post-selection cut |
---|---|---|
Number of \(\tau_{had}\) | 1 | |
Number of \(\tau_{lep}\) | 1 | |
\(p_T\) of \(\tau_{had}\) | > 20 GeV | > 26 GeV |
\(p_T\) of \(\tau_{lep}\) | > 20 GeV | > 20 GeV |
\(p_T\) of leading jet | > 20 GeV | > 26 GeV |
\(p_T\) of subleading jet | > 20 GeV | > 26 GeV |
Charge | Opposite charges | |
⚠️ Note: The post-selection cuts are the cuts applied after systematics are applied.
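As an illustration of the post-selection thresholds, the sketch below filters events on the hadronic tau and lepton transverse momenta only. The jet thresholds are deliberately left out, because the table does not say whether a jet below threshold removes the whole event or only that jet. The function name and the toy events are hypothetical, and the feature values are assumed to be in GeV.

```python
# Post-selection thresholds from the table, in GeV.
HAD_PT_MIN_GEV = 26.0
LEP_PT_MIN_GEV = 20.0


def passes_tau_lepton_post_selection(event: dict) -> bool:
    """True if the hadronic tau and lepton pT both exceed the post-selection cuts."""
    return event["PRI_had_pt"] > HAD_PT_MIN_GEV and event["PRI_lep_pt"] > LEP_PT_MIN_GEV


# Toy events with made-up values.
toy_events = [
    {"PRI_had_pt": 30.0, "PRI_lep_pt": 25.0},  # passes both cuts
    {"PRI_had_pt": 24.0, "PRI_lep_pt": 25.0},  # fails the hadronic tau cut
    {"PRI_had_pt": 30.0, "PRI_lep_pt": 20.0},  # fails: the cut is strict (>), not >=
]

selected = [e for e in toy_events if passes_tau_lepton_post_selection(e)]
```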
How to get Public Data?
Download Neurips_Public_data_26_08_2024 (6.5 GB),
or use the following command to download it from a terminal:
wget -O public_data.zip https://www.codabench.org/datasets/download/b9e59d0a-4db3-4da4-b1f8-3f609d1835b2/