Don't model an individual dataset, model the world!
Conventional statistical and machine learning models are trained on a single dataset. They don't learn general patterns, only those that occur in that dataset.
The Tree of Knowledge models are instead trained on many different datasets with observations from many different scenarios.
The findings from Tree of Knowledge are therefore more generally valid and significantly more accurate and robust.
Don't just make predictions, fully understand the system!
At the core of Tree of Knowledge lies a Probabilistic Programming engine which allows users to create simulation-based models. With this type of model you can easily model and understand complex behaviours and interactions.
Furthermore, simulation-based models make it easy to investigate phenomena that are not directly observed or that are entangled with other phenomena: you no longer have to control for variables, use instrumental variables, etc.
Be certain about uncertainty!
Thanks to Bayesian inference, Bayesian updating and Monte-Carlo simulation, all uncertainties of the system are captured and modelled. The error margins of the simulations correspond exactly to our level of understanding of the system.
Stand on the shoulders of giants!
When you use Tree of Knowledge to model your system of interest, you don't really create a new model; rather, you adapt part of a big, holistic model.
This holistic model grows and becomes more detailed and accurate as users model new systems with it. As the holistic model improves, the individual simulations become more accurate, too.
How it works
With Tree of Knowledge you create simulation models with one or multiple interacting agents.
The agents' behaviours are governed by behaviour rules, which change the agents' internal state depending on internal and external conditions.
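As a rough illustration of how a behaviour rule might change an agent's internal state, here is a minimal Python sketch. It assumes a rule can be represented as a condition plus an effect, both functions of the agent's state and its environment; the classes `BehaviourRule` and `Agent` and the hunger rules are hypothetical and not part of the Tree of Knowledge.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

State = dict
Environment = dict


@dataclass
class BehaviourRule:
    """Hypothetical rule: if `condition` holds, `effect` returns updated state fields."""
    name: str
    condition: Callable[[State, Environment], bool]
    effect: Callable[[State, Environment], State]


@dataclass
class Agent:
    kind: str
    state: State = field(default_factory=dict)

    def step(self, rules: list[BehaviourRule], environment: Environment) -> None:
        # All rules see the same snapshot of the state taken at the start of the step;
        # if several rules set the same field, the rule listed later wins.
        snapshot = dict(self.state)
        updates: State = {}
        for rule in rules:
            if rule.condition(snapshot, environment):
                updates.update(rule.effect(snapshot, environment))
        self.state.update(updates)


# Example rules for a "person" agent: one driven by internal state only,
# one that also depends on an outside condition (whether food is available).
hunger_grows = BehaviourRule(
    "hunger grows over time",
    condition=lambda s, env: True,
    effect=lambda s, env: {"hunger": s["hunger"] + 1},
)
eats_when_hungry = BehaviourRule(
    "eats when hungry and food is available",
    condition=lambda s, env: s["hunger"] >= 3 and env["food_available"],
    effect=lambda s, env: {"hunger": 0},
)

person = Agent("person", {"hunger": 0})
rules = [hunger_grows, eats_when_hungry]
for _ in range(4):
    person.step(rules, {"food_available": False})
print(person.state)  # {'hunger': 4}
person.step(rules, {"food_available": True})
print(person.state)  # {'hunger': 0}
```

Evaluating all rules against one snapshot is just one possible design choice; it keeps one rule's effect from triggering another rule within the same time step.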
The basic premise of the Tree of Knowledge is that objects/agents should behave according to the same rules independent of what scenario they are in!
For example, people remain the same people and act according to the same basic rules whether they are shopping at a street market or conducting job interviews.
So, to model your system of interest, you only need to specify which objects are present in your system and what their initial state is.
The objects' behaviours come from a centralized repository of object behaviours.
If necessary, you may also add new behaviour rules to the repository. This might be especially relevant when modelling a new type of object or objects in a completely new environment.
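The division of labour between a scenario (objects plus their initial state) and a shared behaviour repository could look roughly like the following Python sketch. `BehaviourRepository`, `simulate` and the `spends_money_at_market` rule are hypothetical names chosen for illustration; the sketch only shows that the same rules, looked up by object type, drive the objects in two different scenarios.

```python
from collections import defaultdict


class BehaviourRepository:
    """Hypothetical central store: object type -> list of (rule name, update function)."""

    def __init__(self):
        self._rules = defaultdict(list)

    def add_rule(self, object_type, name, update):
        # Users extend the repository by registering new rules for an object type.
        self._rules[object_type].append((name, update))

    def rules_for(self, object_type):
        return list(self._rules[object_type])


def simulate(repository, initial_objects, environment, steps):
    """Run a scenario described only by its objects and their initial states."""
    objects = [(obj_type, dict(state)) for obj_type, state in initial_objects]
    for _ in range(steps):
        for obj_type, state in objects:
            # Behaviours are looked up in the repository, not defined per scenario.
            for _name, update in repository.rules_for(obj_type):
                state.update(update(state, environment))
    return objects


def spends_money_at_market(state, environment):
    if environment["location"] == "market" and state["money"] >= 10:
        return {"money": state["money"] - 10}
    return {}


repo = BehaviourRepository()
repo.add_rule("person", "spends money at a market", spends_money_at_market)

# Two scenarios, same object type, same rules; only the setting differs.
market = simulate(repo, [("person", {"money": 25})], {"location": "market"}, steps=3)
interview = simulate(repo, [("person", {"money": 25})], {"location": "job interview"}, steps=3)
print(market)     # [('person', {'money': 5})]
print(interview)  # [('person', {'money': 25})]
```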
Learning Object Behaviours
When users add new behaviour rules to objects, the question remains: 'Which of the behaviour rules are actually true?'
This is where the strength of the Tree of Knowledge truly comes into play:
Over time, users will have modelled different systems using the same objects, so the system can automatically test the objects' behaviours in all these different scenarios!
After all, if we say that some type of objects (e.g. people) have a certain behaviour, then this behaviour must be valid in all circumstances/scenarios.
By testing the behaviour rules this way we can get truly robust insights about our objects of interest.
Also, the insights will be more accurate than with existing methods, because they are learned from more data.
The behaviour rules are optimized such that in as many simulations as possible the simulated values are as close as possible to the true, observed values.
Through the optimization a probability is learned for each behaviour rule - the probability that the rule is true (in all the different scenarios).
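The matching criterion behind this optimization can be illustrated with a simple score: the fraction of scenarios in which a candidate rule's simulated outcome lands close to the observed value. This is a hedged sketch, not the Tree of Knowledge's actual objective; the toy observations, the `tolerance` parameter and the function names are all made up, and the real system turns such comparisons into rule probabilities via Bayesian inference rather than a single score.

```python
from typing import Callable

# A scenario: (initial value, number of steps, observed final value).
Scenario = tuple[float, int, float]


def simulate(rule: Callable[[float], float], initial: float, steps: int) -> float:
    value = initial
    for _ in range(steps):
        value = rule(value)
    return value


def match_score(rule: Callable[[float], float], scenarios: list[Scenario], tolerance: float) -> float:
    """Fraction of scenarios in which the simulated outcome is within `tolerance` of the observation."""
    hits = sum(
        abs(simulate(rule, initial, steps) - observed) <= tolerance
        for initial, steps, observed in scenarios
    )
    return hits / len(scenarios)


# Toy, made-up observations from three different scenarios.
scenarios = [(100.0, 1, 110.0), (50.0, 2, 60.5), (200.0, 3, 266.2)]


def grows_by_ten_percent(x: float) -> float:
    return x * 1.1


def grows_by_ten_units(x: float) -> float:
    return x + 10.0


print(match_score(grows_by_ten_percent, scenarios, tolerance=0.5))  # 1.0
print(match_score(grows_by_ten_units, scenarios, tolerance=0.5))
```

The second rule reproduces only one of the three toy scenarios, which is the kind of evidence that would lower its learned probability.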
The Rule Learning can also easily be used for hypothesis testing: You can formulate any relation, dependence or behaviour as a new rule and the system will learn the probability that it is true.
In fact, thanks to Bayesian inference and updating, the system learns not only the probability that each rule is true but also the exact confidence in that probability, in the form of a probability density function (pdf; see Figure 2).
Using simulations has the added advantage that behaviours/phenomena can be tested even if they were not directly observed or if they co-occur with other phenomena.
Figure 1: To facilitate learning across different models, the objects and their behaviour rules are sorted into a hierarchy of objects.
Figure 2: This probability density function (pdf) shows that the corresponding rule is likely to be 100% true, but the probability of it being true could also be lower (tail to the left). The more data and simulations a rule has been evaluated on, the narrower its pdf becomes, i.e. the more precisely its probability can be determined.
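The narrowing described in Figure 2 can be reproduced with a much simpler model than the one Tree of Knowledge uses. The sketch below assumes, purely for illustration, that each evaluation of a rule counts as a success or a failure and that the rule's probability starts from a uniform Beta(1, 1) prior; the conjugate Beta update then gives a posterior whose standard deviation shrinks as evaluations accumulate. The function names are hypothetical.

```python
import math


def beta_posterior(successes: int, failures: int, prior_a: float = 1.0, prior_b: float = 1.0):
    """Conjugate update: a Beta(prior_a, prior_b) prior becomes Beta(a, b) after the observations."""
    return prior_a + successes, prior_b + failures


def beta_mean_sd(a: float, b: float):
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


# A rule that matched the observations in every one of n evaluations.
# With failures == 0 the pdf (a * x**(a - 1)) peaks at x = 1 and has a tail
# to the left, like Figure 2; its spread shrinks as n grows.
summary = []
for n in (5, 50, 500):
    a, b = beta_posterior(successes=n, failures=0)
    mean, sd = beta_mean_sd(a, b)
    summary.append((n, mean, sd))
    print(f"{n:>3} evaluations: mean={mean:.3f}, sd={sd:.3f}")
```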
As already mentioned, there is a centralized repository for object behaviours. The second key component of the Tree of Knowledge's holistic model is the knowledge base - a centralized store for all the data.
A knowledge base is a database which stores data in an integrated fashion. As with existing knowledge bases (DBpedia, YAGO, etc.), any structured data can be uploaded to the Tree of Knowledge's knowledge base.
The Tree of Knowledge's knowledge base, however, goes much further: for every data point in the knowledge base we know its full context: when it was observed, which attribute or relation of which specific object was observed, and much more.
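One way to picture a data point that carries its full context is a record that stores, next to the value, which object and attribute were observed, when, and from which dataset. The schema below (`DataPoint`, its fields, and the `latest_value` query) is a hypothetical Python sketch, not the knowledge base's real data model, and the example records are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical record: one observed value plus the context it was observed in."""
    object_id: str         # which specific object was observed
    object_type: str       # the object's type, e.g. "person"
    attribute: str         # which attribute or relation was observed
    value: object          # the observed value
    observed_at: datetime  # when the observation was made
    source: str            # the uploaded dataset it came from


def latest_value(points: list, object_id: str, attribute: str, as_of: datetime) -> Optional[object]:
    """Most recent observed value of one object's attribute at or before `as_of`."""
    candidates = [
        p for p in points
        if p.object_id == object_id and p.attribute == attribute and p.observed_at <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.observed_at).value


utc = timezone.utc
points = [
    DataPoint("person-42", "person", "monthly_income", 3000, datetime(2019, 1, 1, tzinfo=utc), "survey_a"),
    DataPoint("person-42", "person", "monthly_income", 3200, datetime(2020, 5, 1, tzinfo=utc), "survey_b"),
]
print(latest_value(points, "person-42", "monthly_income", datetime(2020, 1, 1, tzinfo=utc)))  # 3000
print(latest_value(points, "person-42", "monthly_income", datetime(2021, 1, 1, tzinfo=utc)))  # 3200
```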
Furthermore, a high level of data quality is assured through strict data monitoring and data validation.
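Data validation of this kind usually means checking each incoming record against explicit rules before it is stored. The following Python sketch assumes records are dictionaries and that every attribute has a declared type and allowed range; `ATTRIBUTE_RULES`, `REQUIRED_FIELDS` and `validate` are illustrative names, and the Tree of Knowledge's actual checks may differ.

```python
from datetime import datetime

# Hypothetical validation rules per attribute: (expected type, minimum, maximum).
ATTRIBUTE_RULES = {
    "age": (int, 0, 130),
    "monthly_income": (float, 0.0, None),
}
REQUIRED_FIELDS = ("object_id", "attribute", "value", "observed_at")


def validate(record: dict) -> list[str]:
    """Return a list of problems with a data record; an empty list means it passes."""
    problems = [f"missing field '{name}'" for name in REQUIRED_FIELDS if name not in record]
    if problems:
        return problems
    if not isinstance(record["observed_at"], datetime):
        problems.append("observed_at must be a datetime")
    rule = ATTRIBUTE_RULES.get(record["attribute"])
    if rule is None:
        problems.append(f"unknown attribute '{record['attribute']}'")
        return problems
    expected_type, low, high = rule
    value = record["value"]
    if not isinstance(value, expected_type):
        problems.append(f"value must be of type {expected_type.__name__}")
    elif (low is not None and value < low) or (high is not None and value > high):
        problems.append(f"value {value} outside allowed range")
    return problems


ok = {"object_id": "person-42", "attribute": "age", "value": 35, "observed_at": datetime(2020, 1, 1)}
bad = {"object_id": "person-42", "attribute": "age", "value": 200, "observed_at": datetime(2020, 1, 1)}
print(validate(ok))   # []
print(validate(bad))  # ['value 200 outside allowed range']
```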
This powerful knowledge base enables much of the other functionality, such as automatic model validation and rule learning, automatic model initialisation and advanced querying.
Growing the most detailed and accurate model
The holistic model behind the Tree of Knowledge grows the more it is used. It is the only existing data-based model whose capability and accuracy grow in this way.
As users upload new datasets, the knowledge base will grow into an extensive data portal. It has the capacity to accurately catalogue most of human knowledge.
The growth/refinement of object behaviours is even more significant!
As users build new models that portray the objects in new scenarios (the scenarios they are trying to understand, or for which they have data), the objects' behaviours become increasingly comprehensive and exact. If objects have been simulated and evaluated in sufficiently many environments, we expect that it will be possible to create the most accurate economic models with them!
The Tree of Knowledge uses likelihood-free Bayesian inference and Bayesian updating to determine the rule probabilities.
This is a robust, statistically rigorous formalism that captures all uncertainties (those originating from imperfect knowledge of the world, and those from inherent randomness).
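A common likelihood-free method is approximate Bayesian computation (ABC). The sketch below uses ABC rejection sampling for model choice: draw whether the rule is true from a prior, simulate every scenario, keep the draw only if the simulated outcomes are close to the observed ones, and read off the rule's posterior probability as the share of kept draws in which the rule was active. Which likelihood-free variant Tree of Knowledge actually uses is not stated here; the toy growth rule, the made-up observations and the parameters `epsilon` and `n_samples` are assumptions.

```python
import random


def simulate(initial: float, steps: int, rule_active: bool, rng: random.Random, noise: float = 1.0) -> float:
    """Toy stochastic model: the candidate rule adds 5% growth per step; noise is always present."""
    value = initial
    for _ in range(steps):
        if rule_active:
            value *= 1.05
        value += rng.gauss(0.0, noise)
    return value


def abc_rule_probability(scenarios, prior=0.5, epsilon=3.0, n_samples=5000, seed=0):
    """ABC rejection sampling for model choice: estimate P(rule is true | observations)."""
    rng = random.Random(seed)
    accepted = accepted_with_rule = 0
    for _ in range(n_samples):
        rule_active = rng.random() < prior  # draw "rule true/false" from the prior
        # Distance between simulated and observed outcomes, averaged over scenarios.
        distance = sum(
            abs(simulate(initial, steps, rule_active, rng) - observed)
            for initial, steps, observed in scenarios
        ) / len(scenarios)
        if distance <= epsilon:  # keep only draws that reproduce the data closely
            accepted += 1
            accepted_with_rule += rule_active
    # Posterior probability = share of accepted draws in which the rule was active.
    return accepted_with_rule / accepted if accepted else float("nan")


# Toy, made-up observations: (initial value, number of steps, observed final value).
scenarios = [(100.0, 5, 127.5), (40.0, 10, 65.0), (200.0, 3, 231.0)]
print(abc_rule_probability(scenarios))
```

Running it again on a new, independent batch of scenarios with the earlier estimate passed as `prior` is one simple way to approximate Bayesian updating in this sketch.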
When making predictions with Tree of Knowledge, the Monte-Carlo simulation engine captures and propagates all previously learned uncertainties, so the uncertainty of the predicted outcomes corresponds exactly to the true uncertainty.
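Propagating learned uncertainty by Monte-Carlo simulation can be sketched as follows: each run first draws whether the rule holds (from its learned probability) and the value of its uncertain parameter, then simulates with inherent noise; the spread of the collected outcomes is the predictive uncertainty. The function `predict` and the learned values used here (`p_rule=0.8`, a growth rate of 0.05 ± 0.01) are hypothetical.

```python
import random
import statistics


def predict(initial, steps, p_rule, growth_mean, growth_sd, n_runs=10000, seed=1):
    """Monte-Carlo prediction that propagates rule, parameter and noise uncertainty."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_runs):
        rule_active = rng.random() < p_rule         # uncertainty about whether the rule holds
        growth = rng.gauss(growth_mean, growth_sd)  # uncertainty about the rule's parameter
        value = initial
        for _ in range(steps):
            if rule_active:
                value *= 1.0 + growth
            value += rng.gauss(0.0, 1.0)            # inherent randomness
        outcomes.append(value)
    cuts = statistics.quantiles(outcomes, n=20)     # 19 cut points at 5%, 10%, ..., 95%
    return statistics.mean(outcomes), (cuts[0], cuts[-1])


# Hypothetical learned values: the rule is true with probability 0.8 and its
# growth rate is known only approximately.
uncertain = predict(100.0, 5, p_rule=0.8, growth_mean=0.05, growth_sd=0.01)
# The same prediction if rule and parameter were known exactly.
certain = predict(100.0, 5, p_rule=1.0, growth_mean=0.05, growth_sd=0.0)
print(uncertain)
print(certain)
```

Comparing the two calls shows the intended effect: the 90% interval for `uncertain` is much wider than for `certain`, because it also carries the uncertainty about the rule and its parameter, not just the inherent randomness.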