Implementing a conditional Gaussian scoring function in pygobnilp and learning Bayesian networks from complete mixed data
pygobnilp is a program developed by James Cussens for learning Bayesian networks from complete data. The original version cannot handle mixed data, that is, data containing both discrete and continuous variables. This project adds to pygobnilp a method for scoring a Bayesian network against a mixed dataset, so that networks can be learned from discrete, continuous, and mixed data.
A Bayesian network (BN) is represented as a directed acyclic graph (DAG) G=(V,E), that is, a directed graph with no cycles. The following image illustrates an example, called 'Asia', which was introduced by Lauritzen and Spiegelhalter. It represents a probabilistic model for a medical expert system. Each variable is either TRUE (t) or FALSE (f), with the following meanings: A = "visit to Asia", T = "Tuberculosis", X = "Normal X-Ray result", E = "Either tuberculosis or lung cancer", L = "Lung cancer", D = "Dyspnea (shortness of breath)", S = "Smoker", and B = "Bronchitis". In this BN, variable D is a 'child' of the 'parents' E and B, which shows at a glance that bronchitis and "either tuberculosis or lung cancer" directly influence the probability of dyspnea. In this way, BNs present probabilistic relationships compactly.
pygobnilp looks for the most probable BN by scoring candidate DAGs against the provided dataset.
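As a minimal sketch of the DAG view described above, the Asia network can be written as a mapping from each variable to its set of parents. The text only states the parents of D; the remaining edges follow the commonly cited Asia structure and should be read as an assumption here. The names ASIA_PARENTS and topological_order are hypothetical and are not part of pygobnilp.

```python
from graphlib import TopologicalSorter, CycleError  # standard library, Python 3.9+

# Each variable maps to the set of its parents.
# Only D's parents (E and B) are stated in the text; the other edges are
# the commonly cited Asia structure and are an assumption of this sketch.
ASIA_PARENTS = {
    "A": set(),
    "S": set(),
    "T": {"A"},
    "L": {"S"},
    "B": {"S"},
    "E": {"T", "L"},
    "X": {"E"},
    "D": {"E", "B"},
}


def topological_order(parents):
    """Return the variables ordered so that parents come before children.

    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


order = topological_order(ASIA_PARENTS)
print(order)
```

A graph with a cycle, such as {"a": {"b"}, "b": {"a"}}, makes topological_order raise ValueError, which is the acyclicity condition a BN structure has to satisfy.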
The main differences between this new pygobnilp and the original pygobnilp are in pygobnilp/pygobnilp/scoring.py. The following classes are added or modified:
- MixedData class: This class holds and handles mixed data. It is constructed from a data_source, which can be a str, an array_like, or a pandas.DataFrame, and you can optionally specify varnames and arities. For example, you may call MixedData('sample.txt').
- _AbsLLPenalised class: This is an abstract class for calculating penalised log-likelihood scores.
- GaussianLL class: This class calculates the Gaussian LL score, which can be used to evaluate a DAG. In this case, the DAG must be learned from a dataset consisting only of continuous data. In particular, the newly added method score_dag calculates the Gaussian score, or the local Gaussian scores, of a given DAG based on the previously supplied dataset.
- AbsCGaussianLLScore class: This is an abstract class for calculating mixed log-likelihood scores.
- CGaussianLL class: This class calculates the conditional Gaussian LL score, which can be used to evaluate a DAG. In this case, the DAG must be learned from a dataset consisting of discrete and continuous data.
- CGaussianBIC class: This class calculates the conditional Gaussian BIC score, which is the log-likelihood penalised by df * log(N) / 2 for each pair of child and parents, where df is the degrees of freedom of the pair and N is the number of data points (rows) in the dataset.
- CGaussianAIC class: This class calculates the conditional Gaussian AIC score, which is the log-likelihood penalised by df for each pair of child and parents.
A simple usage example is given in the testMixedDataLearning.ipynb notebook. The tutorial uses mixed and continuous datasets taken from the R package bnlearn.
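The BIC and AIC penalties can be illustrated with a small standalone sketch. It does not use pygobnilp and is not the code in scoring.py; it only scores a single continuous variable with no parents, fitted by maximum likelihood, and takes N to be the number of data rows as in the usual BIC definition. The function names gaussian_loglik and penalised_score are hypothetical.

```python
import math


def gaussian_loglik(xs):
    """Maximised Gaussian log-likelihood of a sample with no parents.

    Uses the maximum-likelihood mean and variance (dividing by n).
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var <= 0.0:
        raise ValueError("sample variance must be positive")
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def penalised_score(loglik, df, n_rows, kind):
    """Subtract the BIC or AIC penalty from a log-likelihood.

    df is the number of free parameters of the child/parents pair.
    """
    if kind == "bic":
        return loglik - df * math.log(n_rows) / 2.0
    if kind == "aic":
        return loglik - df
    raise ValueError(f"unknown penalty kind: {kind!r}")


sample = [1.0, 2.0, 3.0, 4.0]   # illustrative values, not real data
ll = gaussian_loglik(sample)
# A Gaussian with no parents has two free parameters: mean and variance.
bic = penalised_score(ll, df=2, n_rows=len(sample), kind="bic")
aic = penalised_score(ll, df=2, n_rows=len(sample), kind="aic")
print(ll, bic, aic)
```

Because the penalty is applied per child/parents pair, the score of a whole DAG under this scheme is the sum of such penalised local scores.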
pygobnilp depends on (1) a number of Python packages (scipy, pygraphviz, matplotlib, networkx, pandas, numpy, scikit-learn and numba) and (2) the Gurobi MIP solver. pygraphviz also requires graphviz to be installed.
Although you can install all of these separately, the easier option is to install Anaconda Python and Gurobi together. Just go here. Installing Anaconda provides most of the required packages but not (at present) pygraphviz, which you can install once Anaconda is in place with:
conda install pygraphviz
graphviz is not a Python package and has to be installed separately (if you do not already have it on your system).
Gurobi is a commercial system and requires a licence to run. However, academic licences are free; see https://www.gurobi.com/academia/academic-program-and-licenses/. pygobnilp can also be used with a restricted licence, but the output then differs from the output obtained with an academic licence.
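To see which of the dependencies listed above are already importable in the current environment, a short check along these lines may help. It is only a sketch: it assumes the usual import names (sklearn for scikit-learn, gurobipy for Gurobi's Python interface), and it does not check for the graphviz system package or for a valid Gurobi licence. The names REQUIRED_MODULES and missing_modules are hypothetical.

```python
from importlib.util import find_spec

# Import names assumed for the packages listed above.
REQUIRED_MODULES = [
    "scipy",
    "pygraphviz",
    "matplotlib",
    "networkx",
    "pandas",
    "numpy",
    "sklearn",    # scikit-learn
    "numba",
    "gurobipy",   # Python interface to the Gurobi solver
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All required modules were found.")
```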
One installation option is to download this repository and run the following command:
python pygobnilp/setup.py develop
Full documentation is also available in pygobnilp/_build/html/index.html. The original source code of pygobnilp can be found here.
A summary of the report and further documentation are in the doc directory. Please take a look if you are interested.
