Probabilistic Neural Programming (PNP) is a Scala library for expressing, training and running inference in neural network models that include discrete choices. The enhanced expressivity of PNP is useful for structured prediction, reinforcement learning, and latent variable models.
Probabilistic neural programs have several advantages over computation graph libraries for neural networks, such as TensorFlow:
- Probabilistic inference is implemented within the library. For example, running a beam search to (approximately) generate the highest-scoring output sequence of a sequence-to-sequence model takes 1 line of code in PNP.
- Additional training algorithms that require running inference during training are part of the library. This includes learning-to-search algorithms, such as LaSO, reinforcement learning, and training latent variable models.
- Computation graphs are a subset of probabilistic neural programs. We use DyNet to express neural networks, which provides a rich set of operations and efficient training.
This library depends on DyNet with the Scala DyNet bindings. See the link for build instructions. After building DyNet, run the following commands from the `pnp` root directory:
```
cd lib
ln -s <PATH_TO_DYNET>/build/contrib/swig/dynet_swigJNI_scala.jar .
ln -s <PATH_TO_DYNET>/build/contrib/swig/dynet_swigJNI_dylib.jar .
```
That's it! Verify that your installation works by running `sbt test` in the root directory.
This section describes how to use probabilistic neural programs to define and train a model. The typical usage has three steps:
- Define a model. Models are implemented by writing a function that takes your problem input and outputs `Pnp[X]` objects. The probabilistic neural program type `Pnp[X]` represents a function from neural network parameters to probability distributions over values of type `X`. Each program describes a (possibly infinite) space of executions, each of which returns a value of type `X`.
- Train. Training is performed by passing a list of examples to a `Trainer`, where each example consists of a `Pnp[X]` object and a label. Labels are implemented as functions that assign costs to program executions or as conditional distributions over correct executions. Many training algorithms can be used, from loglikelihood to learning-to-search algorithms.
- Run the model. A model can be run on a new input by constructing the appropriate `Pnp[X]` object, then running inference on this object with trained parameters.
These steps are illustrated in detail for a sequence-to-sequence model in Seq2Seq2.scala. For a more complex example, run the GeoQuery semantic parsing experiment.
Probabilistic neural programs are specified by writing the forward computation of a neural network, using the `choose` operation to represent discrete choices. Roughly, we can write:
```scala
val pnp = for {
  scores1 <- ... some neural net operations ...
  // Make a discrete choice
  x1 <- choose(values, scores1)
  scores2 <- ... more neural net operations, may depend on x1 ...
  ...
  xn <- choose(values, scoresn)
} yield {
  xn
}
```
`pnp` then represents a function that takes some neural network parameters and returns a distribution over possible values of `xn` (which in turn depends on the values of intermediate choices). We evaluate `pnp` by running inference, which simultaneously runs the forward pass of the network and performs probabilistic inference:
```scala
val nnParams = ...
val dist = pnp.beamSearch(10, nnParams)
```
The `choose` operator defines a distribution over a list of values:
```scala
val flip: Pnp[Boolean] = choose(Array(true, false), Array(0.5, 0.5))
```
This snippet creates a probability distribution that returns either true or false with 50% probability. `flip` has type `Pnp[Boolean]`, which represents a function from neural network parameters to probability distributions over values of type `Boolean`. (In this case it's just a probability distribution since we haven't referenced any parameters.) Note that `flip` is not a draw from the distribution; rather, it is the distribution itself. The probability of each choice can be given to `choose` either in an explicit list (as above) or via an `Expression` of a neural network.
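For instance (a minimal sketch; `scores` stands for any score `Expression` computed by the network, exactly as in the multilayer perceptron example below):

```scala
// scores: a 2-element Expression of unnormalized scores, one per candidate value.
val labelChoice: Pnp[Boolean] = choose(Array(true, false), scores)
```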
We compose distributions using `for {...} yield {...}`:
```scala
val twoFlips: Pnp[Boolean] = for {
  x <- flip
  y <- flip
} yield {
  x && y
}
```
This program returns `true` if two independent draws from `flip` both return `true`. The notation `x <- flip` can be thought of as drawing a value from `flip` and assigning it to `x`. However, we can only use the value within the for/yield block to construct another probability distribution. We can now run inference on this object:
```scala
val marginals3 = twoFlips.beamSearch(5)
println(marginals3.marginals().getProbabilityMap)
```
This prints out the expected probabilities:

```
{false=0.75, true=0.25}
```

The program returns true only when both draws come up true (0.5 × 0.5 = 0.25), and false otherwise (1 − 0.25 = 0.75).
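Because later lines of a `for` block can use the values of earlier choices, one choice can depend on another. The following sketch (the `dependentFlips` name and the 0.9/0.1 weights are purely illustrative) skews the second draw whenever the first comes up true:

```scala
val dependentFlips: Pnp[Boolean] = for {
  x <- flip
  // The distribution of the second choice depends on the value drawn for x.
  y <- if (x) { choose(Array(true, false), Array(0.9, 0.1)) } else { flip }
} yield {
  x && y
}
```

Running `dependentFlips.beamSearch(5)` as above would assign probability 0.5 × 0.9 = 0.45 to `true`.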
Probabilistic neural programs have access to an underlying computation graph that is used to define neural networks:
```scala
def mlp(x: FloatVector): Pnp[Boolean] = {
  for {
    // Get the computation graph
    cg <- computationGraph()

    // Get the parameters of a multilayer perceptron by name.
    // The dimensionalities and values of these parameters are
    // defined in a PnpModel that is passed to inference.
    weights1 <- param("layer1Weights")
    bias1 <- param("layer1Bias")
    weights2 <- param("layer2Weights")

    // Input the feature vector to the computation graph and
    // run the multilayer perceptron to produce scores.
    inputExpression = input(cg.cg, Seq(FEATURE_VECTOR_DIM), x)
    scores = weights2 * tanh((weights1 * inputExpression) + bias1)

    // Choose a label given the scores. Scores is expected to
    // be a 2-element vector, where the first element is the score
    // of true, etc.
    y <- choose(Array(true, false), scores)
  } yield {
    y
  }
}
```
We can then evaluate the network on an example:
```scala
val model = PnpModel.init(true)

// Initialize the network parameters. The values are
// chosen randomly.
model.addParameter("layer1Weights", Seq(HIDDEN_DIM, FEATURE_VECTOR_DIM))
model.addParameter("layer1Bias", Seq(HIDDEN_DIM))
model.addParameter("layer2Weights", Seq(2, HIDDEN_DIM))

// Run the multilayer perceptron on featureVector
val featureVector = new FloatVector(Seq(1.0f, 2.0f, 3.0f))
val dist = mlp(featureVector)
val marginals = dist.beamSearch(2, model)
for (x <- marginals.executions) {
  println(x)
}
```
This prints something like:
```
[Execution true -0.4261836111545563]
[Execution false -1.058420181274414]
```
Each execution has a single value that is an output of our program and a score derived from the neural network computation. In this case the scores are log probabilities (note that exp(-0.426) + exp(-1.058) ≈ 0.65 + 0.35 = 1), but scores may have different semantics depending on how the model is defined and how its parameters are trained.
PNP uses DyNet as the underlying neural network library, which provides a rich set of operations (e.g., LSTMs). See the DyNet documentation for details, along with the documentation for the DyNet Scala bindings.
TODO: document usage of RNNBuilders, which have to be used statelessly.
Probabilistic neural programs can be easily composed to construct richer models using `for {...} yield {...}`. For example, we can define a CRF sequence tagger using the multilayer perceptron above:
```scala
def sequenceTag(xs: Seq[FloatVector]): Pnp[List[Boolean]] = {
  xs.foldLeft(Pnp.value(List[Boolean]()))((x, y) => for {
    cur <- mlp(y)
    rest <- x
    cg <- computationGraph()
    _ <- if (rest.length > 0) {
      // Add a factor to the model that scores adjacent labels
      // in the sequence. Here, labelNn runs a neural network
      // whose inputs are cur and the next label, and whose output
      // is a 1-element vector containing the score.
      score(labelNn(cur, rest.head, cg.cg))
    } else {
      value(())
    }
  } yield {
    cur :: rest
  })
}
```
We can now run this model on a sequence of feature vectors in the same way as the multilayer perceptron:
```scala
// Same model as before, but make it globally normalized
// and add some more parameters for labelNn
model.locallyNormalized = false
model.addLookupParameter("left", 2, Seq(LABEL_DIM))
model.addLookupParameter("right", 2, Seq(LABEL_DIM))

val featureVectors = Seq(new FloatVector(...), new FloatVector(...), new FloatVector(...))
val dist = sequenceTag(featureVectors)
val marginals = dist.beamSearch(5, model)
for (x <- marginals.executions) {
  println(x)
}
```
This prints something like:

```
[Execution List(true, true, true) 5.28779661655426]
[Execution List(false, true, true) 1.7529568672180176]
[Execution List(true, true, false) 1.4970757961273193]
[Execution List(true, false, false) -0.007531404495239258]
[Execution List(true, false, true) -0.42748916149139404]
```

Because this model is globally normalized, the scores are unnormalized log-scores rather than log probabilities.
TODO
TODO