How Forecasters In Allora Make AI Context-Aware and Smarter

Joel Pfeffer
April 25, 2025

Artificial intelligence (AI) thrives on accurate predictions, but what if we could predict the accuracy of AI itself? Forecasting in the Allora Network is a critical innovation that enables context-awareness by predicting the inference accuracy of participants in the network. For the first time, this allows an AI network to dynamically adjust weights assigned to different predictions depending on the current context, ensuring the most accurate collective inferences under varying conditions.

Why Do AI Networks Need Forecasters?

Traditional inference synthesis schemes use simple or historically-weighted averages to combine the predictions (inferences) from different AI agents in the network. However, the accuracy of any given model can vary based on changing conditions (e.g., market volatility, data shifts, external events).

Without forecasting, inference synthesis is blind to context and must combine inferences based on time-averaged (and sometimes outdated) performance, which can lead to suboptimal or misleading outcomes. This is a fundamental problem in federated, distributed, and decentralized learning: how can we combine inferences in a way that is cognizant of their accuracy under varying circumstances? In other words: how can we make AI context-aware? 

Allora’s forecasters predict the expected accuracy of each contributed inference under the current conditions. In this way, they add real-time context, ensuring the network assigns more weight to the inferences that are expected to outperform the others.

The Forecasting Process

The role of forecasters within the Allora Network is as follows.

  1. Workers provide raw inferences: AI agents in the network generate predictions for a given topic (a “topic” is a sub-network focussed on a specific target variable).
  2. Forecasters provide forecasts: Instead of generating inferences themselves, these models forecast the accuracy of the provided raw inferences. Each forecaster predicts one of the following:
    1. the expected error of an inference (loss);
    2. the expected error of an inference relative to that of the combined network inference (regret);
    3. the relative ranking of the regret (regret z-score).
  3. Aggregation of forecasts: The forecasted losses, regrets, or regret z-scores are used to adjust the weight given to each raw inference, combining them into a "forecast-implied inference". Inferences that are expected to be more accurate than others (lower loss or higher regret) receive a higher weight in the forecast-implied inference.
  4. The network combines all inferences: The raw inferences are combined with the forecast-implied inferences in the inference synthesis process, taking into account the historical performance of the raw or forecast-implied inferences originating from each worker. This way, the network generates a single network inference, which outperforms the best individual workers in the network thanks to its incorporation of context-dependent loss forecasts.
  5. Self improvement: When the ground truth becomes available, the actual losses of the raw inferences are fed back to the forecasting models to improve future forecasts.
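The aggregation step above can be sketched in a few lines. The snippet below is only illustrative: it maps forecasted regrets to weights with a softmax (higher forecasted regret means the inference is expected to beat the network, so it gets more weight), whereas Allora's production weighting function differs in its details. The temperature parameter is an assumption introduced here for the sketch.

```python
import math

def forecast_implied_inference(inferences, forecasted_regrets, temperature=1.0):
    """Combine raw inferences into a forecast-implied inference.

    Weights are a softmax over forecasted regrets: a higher forecasted
    regret means the inference is expected to outperform the combined
    network inference, so it receives a higher weight. Illustrative only;
    Allora's actual weighting function is different in detail.
    """
    scaled = [r / temperature for r in forecasted_regrets]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    combined = sum(w * x for w, x in zip(weights, inferences))
    return combined, weights

# Example: three workers predicting a price; the second is forecast to do best.
combined, weights = forecast_implied_inference([101.0, 103.5, 99.0], [0.2, 1.1, -0.4])
```

The combined value lands between the raw inferences, pulled towards the worker with the highest forecasted regret.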

Forecaster Design

We have designed a forecasting model that other network participants can use as a starting point to jumpstart their own forecaster designs. A good forecasting model should be flexible; the ideal setup for one topic could perform poorly on a different topic. For this reason, we have tested using different variables as a base for the forecasting model (losses and regrets), different methods of processing the base variable to define the target for the forecaster (e.g. raw values and z-scores), as well as different methods of partitioning the worker data into models (single combined vs per-inferer models). As a basis for the machine learning models, we have tested both the LightGBM and XGBoost gradient boosting frameworks, finding similar performance between the two. The figures shown here use the LightGBM model.

Both losses and regrets are a representation of the accuracy of inferences and thus are valid choices as base variables for the forecasting model. Losses represent accuracy in an absolute sense. They are independent from the inferences of other workers in the network, but forecasting the error of each inference can be as difficult as predicting the ground truth in the first place.

On the other hand, regrets are a representation of relative accuracy. As they depend on the combined inference of the network (regret is the difference between the combined network loss and the worker loss), regrets are not independent between inferences. However, the major benefit is that they do not require the actual error to be forecasted, only a measure of relative performance compared to other inferences within the network, and can provide a more stable property for the forecaster to predict. Another advantage is that regrets are used to set the weights in Allora's inference synthesis mechanism, implying that their use as a base variable for the forecaster models allows these models to more directly steer the forecast-implied inference. Indeed, we find that forecasters predicting regrets significantly outperform those predicting losses, as those predicting losses often do not outperform the best inference worker in the topic.
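Following the definition in the text, regret is simply the combined network loss minus the worker loss, so its sign immediately encodes relative performance. A minimal sketch:

```python
def regret(network_loss, worker_loss):
    """Regret as defined above: combined network loss minus worker loss.

    Positive regret  => the worker outperformed the combined network inference.
    Negative regret  => the worker underperformed it.
    """
    return network_loss - worker_loss
```

For example, a worker with loss 0.8 in an epoch where the network loss is 1.0 has a regret of 0.2, marking it as a worker the synthesis should upweight.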

Along similar lines, we have implemented different methods of processing the base variable (losses or regrets) to provide the target variable for the forecaster: raw values (no processing), differencing (predicting the change from one epoch to the next), normalisation (dividing all values by the standard deviation at that epoch), and z-scores (subtracting the mean and dividing by the standard deviation). The latter two methods (normalisation and z-scores) are motivated by a similar principle to that discussed above for regrets: in order to set weights, it is only the relative performance that needs to be predicted for different inferers by the forecasting model, not the absolute performance.
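The four processing methods described above can be sketched as follows. Normalisation and z-scores act across the cross-section of inferers at a single epoch, while differencing acts along the time axis of one inferer's series; the exact transforms used in production may differ.

```python
import statistics

def process_targets(values, method="z-score"):
    """Transform one epoch's cross-section of base values (e.g. the regrets
    of all inferers at that epoch) into forecaster target variables."""
    if method == "raw":
        return list(values)  # no processing
    if method == "normalise":
        sd = statistics.pstdev(values) or 1.0  # guard against zero spread
        return [v / sd for v in values]
    if method == "z-score":
        mu = statistics.mean(values)
        sd = statistics.pstdev(values) or 1.0
        return [(v - mu) / sd for v in values]
    raise ValueError(f"unknown method: {method}")

def difference(series):
    """Differencing acts along time: the change from one epoch to the next."""
    return [b - a for a, b in zip(series, series[1:])]
```

As the text notes, the z-score and normalisation variants discard the absolute scale, leaving only the relative performance that the weighting actually needs.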

To partition the inferer data into models we have tested two methods: using a single combined forecasting model, with inferer ID as a feature variable to distinguish workers; and separate forecasting models for each inferer. A single combined model is simpler to implement and enables training on more limited data samples, but can suffer from "regression to the mean" issues where it does not adequately distinguish the performance of different inferers. In contrast, per-inferer models have better context awareness between different inferers, but will fail where data is limited (e.g. new inferers in the network). In practice, we have implemented a hybrid scheme, where those workers with sufficient information receive an individual model, while others revert to a combined model.

Finally, we perform feature engineering on the raw information from the network (e.g. inferences, ground truth, losses, regrets) to provide further meaningful feature variables for the forecasting model, in addition to the raw values. In particular, engineered properties such as difference from the moving average, rates of change (momentum, acceleration, gradient, percentage change) and standard deviation of the exponential moving average are often among the most important features for the models.
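A few of the engineered features mentioned above can be sketched in plain Python. The span-to-alpha mapping follows the common EMA convention; the feature names here are illustrative rather than the exact set used by our models, and the percentage-change feature assumes non-zero values.

```python
def ema(series, span):
    """Exponential moving average with the usual span -> alpha mapping."""
    alpha = 2.0 / (span + 1.0)
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1.0 - alpha) * out[-1])
    return out

def engineer_features(series, span=7):
    """Derive simple features from a raw series (e.g. losses or regrets)."""
    trend = ema(series, span)
    return {
        # Distance of each value from its exponential moving average.
        "ema_diff": [x - m for x, m in zip(series, trend)],
        # One-step rate of change (momentum).
        "momentum": [b - a for a, b in zip(series, series[1:])],
        # Percentage change between consecutive epochs (assumes non-zero values).
        "pct_change": [(b - a) / a for a, b in zip(series, series[1:])],
    }
```

Features like these expose how far an inferer's recent performance deviates from its trend, which is exactly the context-dependent signal the forecaster needs.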

The above figure shows the resulting performance (log loss) of different forecasting model setups for a BTC/USD 5-minute price prediction topic over 500 epochs. For reference, the horizontal lines show the loss of the naive network inference (black dashed line) and best inference worker (grey dash-dotted line). The thin coloured lines show the results for the forecasting models for different target variables (raw regret, raw loss and regret z-score). Solid lines show the single combined model, dashed lines show per-inferer models, and different colours indicate different choices for EMA spans (blue: 3; orange: 7; green: 3 and 7; red: 7, 14 and 30). For most tests, we find that performance differences from EMA spans are marginal, but generally shorter spans (3 and 7) give the best performance.

The figure shows that, for a given target variable, the per-inferer models outperform (lower loss) a single combined forecasting model, with only per-inferer models (predicting raw regrets or z-scores) outperforming the best inference worker. Of the three target variables, forecasting models predicting raw losses show the worst performance; the combined model performs worse than the naive network inference, while even the per-inferer models do not outperform the best inference worker in the network. Overall, per-inferer forecasting models which predict regret z-scores show the best performance in this test, outperforming both the naive network inference and the best inference worker in the network, with per-inferer models predicting raw regrets a close second. However, we have observed some variance in the relative performances of these forecaster models, suggesting that a suite of forecasters covering these different setups will best serve the network.

Illustrative Examples

As a demonstration of the ability of Allora's forecasting mechanism to improve the performance of the network inference, we have applied forecasting to ETH/USD and BTC/USD 5-minute price prediction topics. These topics collect raw inferences of models predicting what the Ethereum USD and Bitcoin USD prices are going to be 5 minutes into the future. The forecasting model in these examples predicts the z-score of the regrets, which we found to be the best performing model for both topics.

First, we compare results from the ETH/USD topic. The above time series figure compares the naive network inference (dashed line; this is a network inference that only combines the raw inferences in a weighted average) to the forecast-implied inference generated from their forecasted losses (orange line) and the actual Ethereum price (blue line). The forecasting model uses past trends and current circumstances (current inference, rate of change of the ground truth, etc.) to estimate the expected accuracy of the raw inferences. The network then transforms these forecasted losses into weights, which are used in the calculation of the forecast-implied inference.

The figure shows that during periods of rapid change, the historically-weighted naive network inference (dashed line) adapts slowly to the changing conditions and lags behind the ground truth (blue line). In contrast, the forecasting worker identifies inference workers that outperform during these periods and weights them highly in the forecast-implied inference (orange line). This context awareness enables the forecast-implied inference (log loss=0.93) to outperform both the naive network inference (log loss=1.25) and the best inference worker in the network (log loss=1.17).

Next, we compare results from the BTC/USD topic, with the above figure again showing a time series comparing the raw inferences, naive network inference and forecast-implied inference to the actual Bitcoin price. We find very similar behaviour to that observed in the ETH/USD topic, with the naive network inference adapting poorly to periods of rapid change. Again, the ability of the forecasting worker to identify the best performing workers in different circumstances enables the forecast-implied inference (log loss=4.06) to outperform the naive network inference (log loss=4.35) and best inference worker in the network (log loss=4.23).

The Future of Forecasting in AI

Allora’s novel forecasting mechanism enhances AI predictions by considering current conditions and anticipating the errors of inferences before the outcome is known. By using a network of forecasters to predict inference accuracy, Allora achieves a self-improving, decentralized intelligence that continuously refines its decision-making over time and, by design, outperforms the best individual participant. As AI continues to evolve, Allora’s forecasting mechanism will play a critical role in creating more transparent, adaptive, and intelligent systems.

About the Allora Network

Allora is a self-improving decentralized AI network.

Allora enables applications to leverage smarter, more secure AI through a self-improving network of ML models. By combining innovations in crowdsourced intelligence, reinforcement learning, and regret minimization, Allora unlocks a vast new design space of applications at the intersection of crypto and AI.

To learn more about Allora Network, visit the Allora website, X, Blog, Discord, Research Hub, and developer docs.