FAQ
Answers to common questions about the platform, metrics, and how to participate. If your question is not answered here, open a challenge proposal or reach out via the Community Proposals workflow.
Metric Overview
The platform uses different metric families depending on the forecast objective of the challenge. The sections below group the metric explanations by forecast type so you can jump directly to the metrics that matter for your submissions.
Why are MAPE and MASE not shown in the forecast error chart?
MAPE (Mean Absolute Percentage Error) and MASE (Mean Absolute Scaled Error) are widely used in the forecasting literature but have known shortcomings.
MAPE is undefined or unstable when truth values are near zero, which is a frequent issue for solar or low-demand periods. MASE requires a baseline forecast to normalise against, which makes it sensitive to implementation choices and harder to compare across platforms.
Both metrics are still computed internally and available via the raw leaderboard API, but they are excluded from the default chart view to avoid misleading comparisons. The default ranking focuses on RMSE, MAE, WIS, CRPS, Energy Score, and Variogram Score depending on the challenge type.
Point Forecast Metrics
What is Root Mean Squared Error (RMSE)?
RMSE is the square root of the mean of squared errors between forecast and truth.
Lower is better. RMSE is the primary point-forecast metric on this platform and is aligned with mean-oriented forecasts. Because RMSE is in the same unit as the target variable, it is directly comparable to MAE. A large gap between RMSE and MAE indicates occasional large errors.
For ensemble challenges, RMSE(mean) applies the same formula to the mean of the submitted ensemble members instead of a single point forecast.
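As a minimal sketch (function names illustrative, using NumPy; not the platform's actual scoring code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean of squared errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_mean(y_true, ensemble):
    """RMSE(mean): RMSE applied to the mean of the ensemble members.

    `ensemble` has shape (n_members, n_timestamps)."""
    return rmse(y_true, np.asarray(ensemble, float).mean(axis=0))
```

Note that RMSE is always at least as large as MAE on the same errors, which is why a widening gap between the two signals occasional large misses.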
What is Symmetric Mean Absolute Percentage Error (SMAPE)?
SMAPE normalises the absolute error by the average magnitude of forecast and truth.
Lower is better. SMAPE can become unstable or misleading when both forecast and truth are close to zero, which is common for solar generation at night.
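One common SMAPE variant can be sketched as follows (conventions differ across the literature, e.g. in the percent scaling and the handling of zero denominators; the `eps` guard here is an illustrative choice, not necessarily the platform's):

```python
import numpy as np

def smape(y_true, y_pred, eps=1e-9):
    """Symmetric MAPE: |error| normalised by the average magnitude of
    forecast and truth, in percent. `eps` guards the near-zero
    denominator (e.g. solar generation at night)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_pred - y_true) / np.maximum(denom, eps)))
```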
Quantile Forecast Metrics
What is Weighted Interval Score (WIS)?
WIS is the primary probabilistic metric on the platform. It jointly rewards narrow prediction intervals and accurate coverage at every submitted quantile level.
Lower is better. WIS reduces to MAE when only the median is submitted, and generalises naturally to any set of symmetric quantile pairs, which makes it useful for comparing forecasts across different challenge formats.
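A sketch of one widely used WIS formulation (Bracher-style weights over K symmetric interval pairs plus the median; the platform's exact weighting is an assumption here):

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """Interval score for a central (1 - alpha) prediction interval:
    width plus penalties for observations outside the interval."""
    width = upper - lower
    below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return width + below + above

def wis(y, median, lowers, uppers, alphas):
    """Weighted interval score for one observation: median term plus
    alpha-weighted interval scores, normalised by K + 1/2."""
    k = len(alphas)
    total = 0.5 * abs(y - median)
    for lo, up, a in zip(lowers, uppers, alphas):
        total += (a / 2.0) * interval_score(y, lo, up, a)
    return total / (k + 0.5)
```

With no interval pairs (K = 0) this reduces to `abs(y - median)`, matching the statement that WIS collapses to MAE for a median-only submission.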
What is Quantile Loss (LQS / Pinball Loss)?
LQS, also called pinball loss, is the average quantile loss across all submitted quantile levels.
Lower is better. This metric is meaningful only for participants who submit probabilistic quantile forecasts.
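A minimal pure-Python sketch of the averaged pinball loss (names illustrative):

```python
def pinball_loss(y, quantile_preds, levels):
    """Average pinball (quantile) loss across submitted quantile levels.

    `quantile_preds[i]` is the forecast for quantile level `levels[i]`,
    e.g. levels = [0.025, 0.25, 0.5, 0.75, 0.975]."""
    y = float(y)
    losses = []
    for q, tau in zip(quantile_preds, levels):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        losses.append(tau * diff if diff >= 0 else (tau - 1.0) * diff)
    return sum(losses) / len(losses)
```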
What is Mean Absolute Error (MAE)?
MAE measures the average magnitude of prediction errors without squaring them.
Lower is better. On this platform, MAE is used as a median-oriented metric when a 50% quantile is available, because minimising MAE corresponds to the median functional.
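The median-functional claim can be illustrated numerically: for a constant prediction, MAE is minimised at the sample median, while MSE (and hence RMSE) is minimised at the mean. A small NumPy demonstration on a skewed sample:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # sample with one large outlier
candidates = np.linspace(0.0, 100.0, 100001)  # candidate constant predictions

# MAE and MSE of each constant prediction c against the sample.
mae = np.abs(data[:, None] - candidates[None, :]).mean(axis=0)
mse = ((data[:, None] - candidates[None, :]) ** 2).mean(axis=0)

best_mae = candidates[mae.argmin()]  # close to median(data) = 3.0
best_mse = candidates[mse.argmin()]  # close to mean(data) = 22.0
```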
What does Coverage (50%) / Coverage (95%) mean?
Coverage metrics report the empirical coverage of a central prediction interval: the percentage of realised observations that actually fell inside the interval.
Higher is better, and the target value equals the nominal level: a well-calibrated 95% interval should cover roughly 95% of observations. Provided the submitted quantiles do not cross, the 95% interval contains the 50% interval, so Coverage (95%) is at least as large as Coverage (50%). Coverage is a diagnostic metric and does not affect the leaderboard ranking.
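Empirical coverage is a simple counting exercise; a minimal sketch:

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    y_true = np.asarray(y_true, float)
    inside = (np.asarray(lower, float) <= y_true) & (y_true <= np.asarray(upper, float))
    return float(100.0 * inside.mean())
```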
Ensemble Forecast Metrics
What is CRPS (Continuous Ranked Probability Score)?
CRPS generalises MAE to full predictive distributions. It measures the integrated squared difference between the forecast CDF and the step function at the realised outcome.
Lower is better. On this platform, CRPS is approximated from submitted ensemble members using the energy-score decomposition. CRPS is used as the primary scoring metric for ensemble forecast challenges.
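A minimal sketch of the energy-form ensemble estimator for one timestamp (illustrative, not the platform's scoring code): CRPS ≈ E|X − y| − ½·E|X − X′|, with expectations taken over the submitted members.

```python
import numpy as np

def crps_ensemble(y, members):
    """Empirical CRPS at one timestamp from ensemble members."""
    x = np.asarray(members, float)
    # Mean absolute distance from members to the realised value.
    term1 = np.mean(np.abs(x - y))
    # Half the mean pairwise distance between members (spread reward).
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term1 - term2)
```

For a degenerate ensemble (all members equal) the spread term vanishes and the score reduces to the absolute error, consistent with CRPS generalising MAE.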
What is the Energy Score?
The Energy Score generalises CRPS to multivariate forecasts. On this platform each ensemble submission covers an entire day, so the Energy Score evaluates the full daily trajectory rather than individual timestamps in isolation.
This means the platform interprets member e1's values across all timestamps as one scenario path, member e2's as another, and so on. Ensemble member order therefore matters across time, not just within each timestamp.
Lower is better. The first term penalises distance from the truth; the second term rewards ensemble spread (diversity). A degenerate ensemble that always predicts the same trajectory minimises spread and is penalised heavily for any miss.
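The two terms can be sketched as follows, treating each member as a full daily trajectory (a minimal NumPy illustration, not the platform's scoring code):

```python
import numpy as np

def energy_score(y, members):
    """Energy score for one target day.

    `y` has shape (n_timestamps,); `members` has shape
    (n_members, n_timestamps) — each row is one scenario path."""
    y = np.asarray(y, float)
    x = np.asarray(members, float)
    # Term 1: mean Euclidean distance from each member path to the truth.
    term1 = np.mean(np.linalg.norm(x - y, axis=1))
    # Term 2: half the mean pairwise distance between member paths (spread).
    pairwise = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    return float(term1 - 0.5 * pairwise.mean())
```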
What is the Variogram Score?
The Variogram Score evaluates how well the ensemble reproduces the temporal dependence structure of the realised series, not just its marginal distributions. It compares empirical variogram increments between every pair of timestamps.
Lower is better. The Variogram Score is complementary to the Energy Score: a forecast can look good in Energy Score while misrepresenting the correlation structure between hours, and the Variogram Score will detect that.
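A minimal sketch of the Variogram Score of order p with unit weights (the order and weighting used by the platform are assumptions here):

```python
import numpy as np

def variogram_score(y, members, p=0.5):
    """Variogram score of order p for one day, unit weights.

    Compares |y_i - y_j|^p against the ensemble mean of |X_i - X_j|^p
    over every pair of timestamps (i, j)."""
    y = np.asarray(y, float)
    x = np.asarray(members, float)  # shape (n_members, n_timestamps)
    obs = np.abs(y[:, None] - y[None, :]) ** p
    fct = (np.abs(x[:, :, None] - x[:, None, :]) ** p).mean(axis=0)
    return float(((obs - fct) ** 2).sum())
```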
Leaderboard
What does the rolling window mean?
Rankings are based on a rolling evaluation window: only the most recent N target periods are included in each participant's score. This means the leaderboard reflects recent performance, not cumulative performance since the challenge started. Rolling windows of 1, 7, and 30 targets are available. The 7-target window is the default — it provides a stable weekly snapshot while reacting to recent trends.
What counts as a target period?
Each challenge defines a target period in its configuration. For all current challenges, one target period equals one full calendar day, for example 2026-03-21 in the Europe/Berlin timezone. The day-ahead submission deadline is typically 12:00 CET/CEST on the day before.
How is participation ratio calculated?
Participation ratio is the fraction of target periods within the rolling window for which a valid, evaluated submission exists. A ratio of 100% means the participant submitted a forecast for every target in the window. Missing submissions are not penalised in the metric score but are visible in the participation column.
What does 'open' vs 'closed' forecast visibility mean?
Participants can choose whether their forecast trajectories are visible in the leaderboard charts. An open forecast is plotted alongside the ground truth for everyone to see. A closed forecast participates in the ranking but the trajectory is hidden. Visibility can be changed at any time in the Dashboard.
Participation
Do I need to write code to participate?
No. The platform accepts forecasts via a standard REST API, so any tool that can send HTTP POST requests works. The starter repository provides ready-to-run Python scripts, but you could also submit from R, Julia, curl, or any programming language.
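As an illustration of how little tooling is needed, here is a standard-library Python sketch that builds a JSON POST request. The URL, auth header, and payload keys are placeholders — the actual endpoint, authentication scheme, and payload format come from the Participation Guide.

```python
import json
import urllib.request

def build_submission_request(api_url, token, payload):
    """Build a JSON POST request for a forecast submission."""
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder auth scheme
        },
        method="POST",
    )

# Sending is then a single call:
#   with urllib.request.urlopen(build_submission_request(url, token, body)) as r:
#       result = json.load(r)
```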
What forecast formats are supported?
Each challenge specifies one of three forecast objectives, visible on the challenge detail page:
- Point forecast — a single scalar value per timestamp. Scored with RMSE, R², and related error metrics.
- Quantile forecast — a vector of quantile values per timestamp in the order defined by the challenge (e.g. [q2.5, q25, q50, q75, q97.5]). Scored with WIS, LQS, and Coverage.
- Ensemble forecast — a vector of ensemble member values per timestamp. Scored with CRPS, Energy Score, Variogram Score, RMSE(mean), WIS, and Coverage.
See the Participation Guide for the exact payload format and per-challenge quantile definitions.
What happens if I miss a submission deadline?
Submissions are accepted up to the gate closure time defined per challenge, typically 12:00 CET/CEST on the day before delivery. After that, the submission window closes for that target period and a missing entry is recorded. The rolling-window leaderboard still shows your score across the remaining periods where you did submit.
How long does it take until my submission is evaluated?
Evaluation happens automatically once the ground truth for the target period is published. Ground truth values are fetched from ENTSO-E for internal scoring; the public forecast comparison chart uses SMARD as an independent source. Your submission first appears as pending in the Dashboard and transitions to evaluated once the pipeline has run and ground truth is available.