FAQ
Answers to common questions about the platform, metrics, and how to participate. If your question is not answered here, open a challenge proposal or reach out via the Community Proposals workflow.
Metric Overview
The platform uses different metric families depending on the forecast objective of the challenge. The sections below group the metric explanations by forecast type so you can jump directly to the metrics that matter for your submissions.
Why are MAPE and MASE not shown in the forecast error chart?
MAPE (Mean Absolute Percentage Error) and MASE (Mean Absolute Scaled Error) are widely used in the forecasting literature but have known shortcomings.
MAPE is undefined or unstable when truth values are near zero, which is a frequent issue for solar or low-demand periods. MASE requires a baseline forecast to normalise against, which makes it sensitive to implementation choices and harder to compare across platforms.
Both metrics are still computed internally and available via the raw leaderboard API, but they are excluded from the default chart view to avoid misleading comparisons. The default ranking focuses on RMSE, MAE, WIS, CRPS, Energy Score, and Variogram Score depending on the challenge type.
Point Forecast Metrics
What is Root Mean Squared Error (RMSE)?
RMSE is the square root of the mean of squared errors between forecast and truth.
Lower is better. RMSE is the primary point-forecast metric on this platform and is aligned with mean-oriented forecasts. Because RMSE is in the same unit as the target variable, it is directly comparable to MAE. A large gap between RMSE and MAE indicates occasional large errors.
For ensemble challenges, RMSE(mean) applies the same formula to the mean of the submitted ensemble members instead of a single point forecast.
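As a minimal sketch (function names illustrative, using NumPy; not the platform's actual scoring code):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt of the mean of squared errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmse_mean(y_true, ensemble):
    """RMSE(mean): RMSE applied to the mean of the ensemble members.

    `ensemble` has shape (n_members, n_timestamps)."""
    return rmse(y_true, np.asarray(ensemble, float).mean(axis=0))
```

Note that RMSE is always at least as large as MAE on the same errors, which is why a widening gap between the two signals occasional large misses.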
What is Symmetric Mean Absolute Percentage Error (SMAPE)?
SMAPE normalises the absolute error by the average magnitude of forecast and truth.
Lower is better. SMAPE can become unstable or misleading when both forecast and truth are close to zero, which is common for solar generation at night.
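One common SMAPE variant can be sketched as follows (conventions differ across the literature, e.g. in the percent scaling and the handling of zero denominators; the `eps` guard here is an illustrative choice, not necessarily the platform's):

```python
import numpy as np

def smape(y_true, y_pred, eps=1e-9):
    """Symmetric MAPE: |error| normalised by the average magnitude of
    forecast and truth, in percent. `eps` guards the near-zero
    denominator (e.g. solar generation at night)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(100.0 * np.mean(np.abs(y_pred - y_true) / np.maximum(denom, eps)))
```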
Quantile Forecast Metrics
What is Weighted Interval Score (WIS)?
WIS is the primary probabilistic metric on the platform. It jointly rewards narrow prediction intervals and accurate coverage at every submitted quantile level.
Lower is better. WIS reduces to MAE when only the median is submitted, and generalises naturally to any set of symmetric quantile pairs, which makes it useful for comparing forecasts across different challenge formats.
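A sketch of one widely used WIS formulation (Bracher-style weights over K symmetric interval pairs plus the median; the platform's exact weighting is an assumption here):

```python
import numpy as np

def interval_score(y, lower, upper, alpha):
    """Interval score for a central (1 - alpha) prediction interval:
    width plus penalties for observations outside the interval."""
    width = upper - lower
    below = (2.0 / alpha) * np.maximum(lower - y, 0.0)
    above = (2.0 / alpha) * np.maximum(y - upper, 0.0)
    return width + below + above

def wis(y, median, lowers, uppers, alphas):
    """Weighted interval score for one observation: median term plus
    alpha-weighted interval scores, normalised by K + 1/2."""
    k = len(alphas)
    total = 0.5 * abs(y - median)
    for lo, up, a in zip(lowers, uppers, alphas):
        total += (a / 2.0) * interval_score(y, lo, up, a)
    return total / (k + 0.5)
```

With no interval pairs (K = 0) this reduces to `abs(y - median)`, matching the statement that WIS collapses to MAE for a median-only submission.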
What is Quantile Loss (LQS / Pinball Loss)?
LQS, also called pinball loss, is the average quantile loss across all submitted quantile levels.
Lower is better. This metric is meaningful only for participants who submit probabilistic quantile forecasts.
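A minimal pure-Python sketch of the averaged pinball loss (names illustrative):

```python
def pinball_loss(y, quantile_preds, levels):
    """Average pinball (quantile) loss across submitted quantile levels.

    `quantile_preds[i]` is the forecast for quantile level `levels[i]`,
    e.g. levels = [0.025, 0.25, 0.5, 0.75, 0.975]."""
    y = float(y)
    losses = []
    for q, tau in zip(quantile_preds, levels):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        losses.append(tau * diff if diff >= 0 else (tau - 1.0) * diff)
    return sum(losses) / len(losses)
```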
What is Mean Absolute Error (MAE)?
MAE measures the average magnitude of prediction errors without squaring them.
Lower is better. On this platform, MAE is used as a median-oriented metric when a 50% quantile is available, because minimising MAE corresponds to the median functional.
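The median-functional claim can be illustrated numerically: for a constant prediction, MAE is minimised at the sample median, while MSE (and hence RMSE) is minimised at the mean. A small NumPy demonstration on a skewed sample:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # sample with one large outlier
candidates = np.linspace(0.0, 100.0, 100001)  # candidate constant predictions

# MAE and MSE of each constant prediction c against the sample.
mae = np.abs(data[:, None] - candidates[None, :]).mean(axis=0)
mse = ((data[:, None] - candidates[None, :]) ** 2).mean(axis=0)

best_mae = candidates[mae.argmin()]  # close to median(data) = 3.0
best_mse = candidates[mse.argmin()]  # close to mean(data) = 22.0
```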
What does Coverage (50%) / Coverage (95%) mean?
Coverage metrics report the empirical coverage of a central prediction interval: the percentage of realised observations that actually fell inside the interval.
Higher is better, and the target value equals the nominal level: a well-calibrated 95% interval should cover roughly 95% of observations. Provided the submitted quantiles do not cross, the 95% interval contains the 50% interval, so Coverage (95%) is at least as large as Coverage (50%). Coverage is a diagnostic metric and does not affect the leaderboard ranking.
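Empirical coverage is a simple counting exercise; a minimal sketch:

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Percentage of observations that fall inside [lower, upper]."""
    y_true = np.asarray(y_true, float)
    inside = (np.asarray(lower, float) <= y_true) & (y_true <= np.asarray(upper, float))
    return float(100.0 * inside.mean())
```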
Ensemble Forecast Metrics
What is CRPS (Continuous Ranked Probability Score)?
CRPS generalises MAE to full predictive distributions. It measures the integrated squared difference between the forecast CDF and the step function at the realised outcome.
Lower is better. On this platform, CRPS is approximated from submitted ensemble members using the energy-score decomposition. CRPS is used as the primary scoring metric for ensemble forecast challenges.
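A minimal sketch of the energy-form ensemble estimator for one timestamp (illustrative, not the platform's scoring code): CRPS ≈ E|X − y| − ½·E|X − X′|, with expectations taken over the submitted members.

```python
import numpy as np

def crps_ensemble(y, members):
    """Empirical CRPS at one timestamp from ensemble members."""
    x = np.asarray(members, float)
    # Mean absolute distance from members to the realised value.
    term1 = np.mean(np.abs(x - y))
    # Half the mean pairwise distance between members (spread reward).
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return float(term1 - term2)
```

For a degenerate ensemble (all members equal) the spread term vanishes and the score reduces to the absolute error, consistent with CRPS generalising MAE.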
What is the Energy Score?
The Energy Score generalises CRPS to multivariate forecasts. On this platform each ensemble submission covers an entire day, so the Energy Score evaluates the full daily trajectory rather than individual timestamps in isolation.
This means the platform interprets member e1's values across all timestamps as one scenario path, member e2's as another, and so on. Ensemble member order therefore matters across time, not just within each timestamp.
Lower is better. The first term penalises distance from the truth; the second term rewards ensemble spread (diversity). A degenerate ensemble that always predicts the same trajectory minimises spread and is penalised heavily for any miss.
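The two terms can be sketched as follows, treating each member as a full daily trajectory (a minimal NumPy illustration, not the platform's scoring code):

```python
import numpy as np

def energy_score(y, members):
    """Energy score for one target day.

    `y` has shape (n_timestamps,); `members` has shape
    (n_members, n_timestamps) — each row is one scenario path."""
    y = np.asarray(y, float)
    x = np.asarray(members, float)
    # Term 1: mean Euclidean distance from each member path to the truth.
    term1 = np.mean(np.linalg.norm(x - y, axis=1))
    # Term 2: half the mean pairwise distance between member paths (spread).
    pairwise = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    return float(term1 - 0.5 * pairwise.mean())
```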
What is the Variogram Score?
The Variogram Score evaluates how well the ensemble reproduces the temporal dependence structure of the realised series, not just its marginal distributions. It compares empirical variogram increments between every pair of timestamps.
Lower is better. The Variogram Score is complementary to the Energy Score: a forecast can look good in Energy Score while misrepresenting the correlation structure between hours, and the Variogram Score will detect that.
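A minimal sketch of the Variogram Score of order p with unit weights (the order and weighting used by the platform are assumptions here):

```python
import numpy as np

def variogram_score(y, members, p=0.5):
    """Variogram score of order p for one day, unit weights.

    Compares |y_i - y_j|^p against the ensemble mean of |X_i - X_j|^p
    over every pair of timestamps (i, j)."""
    y = np.asarray(y, float)
    x = np.asarray(members, float)  # shape (n_members, n_timestamps)
    obs = np.abs(y[:, None] - y[None, :]) ** p
    fct = (np.abs(x[:, :, None] - x[:, None, :]) ** p).mean(axis=0)
    return float(((obs - fct) ** 2).sum())
```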
Leaderboard
What does the rolling window mean?
Rankings are based on a rolling evaluation window: only the most recent N target periods are included in each participant's score. This means the leaderboard reflects recent performance, not cumulative performance since the challenge started. Rolling windows of 1, 7, and 30 targets are available. The 7-target window is the default — it provides a stable weekly snapshot while reacting to recent trends.
What counts as a target period?
Each challenge defines a target period in its configuration. For all current challenges, one target period equals one full calendar day, for example 2026-03-21 in the Europe/Berlin timezone. The day-ahead submission deadline is typically 12:00 CET/CEST on the day before.
How is participation ratio calculated?
Participation ratio is the fraction of target periods within the rolling window for which a valid, evaluated submission exists. A ratio of 100% means the participant submitted a forecast for every target in the window. Missing submissions are not penalised in the metric score but are visible in the participation column.
What does 'open' vs 'closed' forecast visibility mean?
Participants can choose whether their forecast trajectories are visible in the leaderboard charts. An open forecast is plotted alongside the ground truth for everyone to see. A closed forecast participates in the ranking but the trajectory is hidden. Visibility can be changed at any time in the Dashboard.
Participation
Do I need to write code to participate?
No. The platform accepts forecasts via a standard REST API, so any tool that can send HTTP POST requests works. The starter repository provides ready-to-run Python scripts, but you could also submit from R, Julia, curl, or any programming language.
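As an illustration of how little tooling is needed, here is a standard-library Python sketch that builds a JSON POST request. The URL, auth header, and payload keys are placeholders — the actual endpoint, authentication scheme, and payload format come from the Participation Guide.

```python
import json
import urllib.request

def build_submission_request(api_url, token, payload):
    """Build a JSON POST request for a forecast submission."""
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # placeholder auth scheme
        },
        method="POST",
    )

# Sending is then a single call:
#   with urllib.request.urlopen(build_submission_request(url, token, body)) as r:
#       result = json.load(r)
```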
What forecast formats are supported?
Each challenge specifies one of three forecast objectives, visible on the challenge detail page:
- Point forecast — a single scalar value per timestamp. Scored with RMSE, R², and related error metrics.
- Quantile forecast — a vector of quantile values per timestamp in the order defined by the challenge (e.g. [q2.5, q25, q50, q75, q97.5]). Scored with WIS, LQS, and Coverage.
- Ensemble forecast — a vector of ensemble member values per timestamp. Scored with CRPS, Energy Score, Variogram Score, RMSE(mean), WIS, and Coverage.
See the Participation Guide for the exact payload format and per-challenge quantile definitions.
What happens if I miss a submission deadline?
Submissions are accepted up to the gate closure time defined per challenge, typically 12:00 CET/CEST on the day before delivery. After that, the submission window closes for that target period and a missing entry is recorded. The rolling-window leaderboard still shows your score across the remaining periods where you did submit.
How long does it take until my submission is evaluated?
Evaluation happens automatically once the ground truth for the target period is published. Ground truth values are fetched from ENTSO-E for internal scoring; the public forecast comparison chart uses SMARD as an independent source. Your submission first appears as pending in the Dashboard and transitions to evaluated once the pipeline has run and ground truth is available.