Functional libraries in the OpenMOLE ecosystem

Jonathan Passerat-palmbach <j.passerat-palmbach@imperial.ac.uk>

 Scala X bytes - 10 Jul 2017

Model?

Stuff that you can launch, taking inputs and producing outputs

Example?

Model / Parameter Exploration

Limits of complete / systematic parameter exploration

  • very different parameter sets can lead to similar outputs

Limits of complete / systematic parameter exploration

  • very close parameter sets can lead to very different outputs

Another approach?

Explore output space instead!

Pattern Space Exploration

MGO (Multi-Goal Optimisation)

Beyond Corroboration: Strengthening Model Validation by Looking for Unexpected Patterns, Cherel et al., 2015

A New Method to Evaluate Simulation Models: The Calibration Profile (CP) Algorithm, Reuillon et al., 2015

From a user point of view

import algorithm.pse._

val pse = PSE(
  lambda = 10,
  phenotype = zdt4.compute,
  pattern =
    boundedGrid(
      lowBound = Vector(0.0, 0.0),
      highBound = Vector(1.0, 200.0),
      definition = Vector(10, 10)),
  genomeSize = 10)
val runResult = run(pse).
                until(afterGeneration(1000)).
                trace((s, is) => println(s.generation))

runResult: mgo.RunResult[mgo.algorithm.pse.PSE,
                         mgo.algorithm.pse.context.M,
                         mgo.algorithm.pse.Individual,
                         mgo.algorithm.pse.Genome,
                         mgo.algorithm.EvolutionState[mgo.algorithm.pse.HitMap]]

Data representation

package object pse {
  ...
  
  case class PSE(
    lambda: Int,
    phenotype: Expression[Vector[Double], Vector[Double]],
    pattern: Vector[Double] => Vector[Int],
    genomeSize: Int,
    operatorExploration: Double = 0.1)

  case class Genome(values: Array[Double], operator: Option[Int])
  case class Individual(genome: Genome, fitness: Array[Double], age: Long)

}
package object algorithm {  
  
    case class EvolutionState[S](
    generation: Long = 0,
    startTime: Long = System.currentTimeMillis(),
    random: util.Random = newRNG(System.currentTimeMillis()),
    s: S) // -> this is our HitMap
}

Using it?

val runResult = run(pse).
                until(afterGeneration(1000)).
                trace((s, is) => println(s.generation))

val (finalState, finalPopulation) =
  runResult.eval(new util.Random(42))

Typeclass to guide (future) implementations

trait Algorithm[T, M[_], I, G, S] {
  def initialState(t: T, rng: Random): S
  def initialPopulation(t: T): M[Vector[I]]
  def step(t: T): Kleisli[M, Vector[I], Vector[I]]
  def state: M[S]
  def run[A](m: M[A], s: S): A
}

Monadic style computation

package object mgo {
    ...
  case class RunResult[T, M[_], I, G, S](...) {
  
    def evolution = for {
      initialPop <- algo.initialPopulation(t)
      finalPop <- step.fold(initialPop)(stopCondition.getOrElse(never[M, I])) // until(afterGeneration(1000))
      s <- algo.state
    } yield (s, finalPop)
  ...
  }

}

Zooming in step

  implicit def isAlgorithm = new Algorithm[PSE, M, Individual, Genome, EvolutionState[HitMap]] {
  
    def step(t: PSE) =
      deterministicStep[M, Individual, Genome](
        pse.breeding(t.lambda, t.pattern, t.operatorExploration),
        pse.expression(t.phenotype),
        pse.elitism(t.pattern))
  }

  def breeding[M[_]: Monad: Generation: Random, I, G](
      fitness: I => Vector[Double],
      genome: I => G,
      genomeValues: G => Vector[Double],
      genomeOperator: G => Option[Int],
      buildGenome: (Vector[Double], Option[Int]) => G)
}

Decoupling without enforcing design

pseOperations.breeding[M, Individual, Genome](
      Individual.genome.get,
      vectorValues.get,
      Genome.operator.get,
      vectorPhenotype.get _ andThen pattern,
      ...)

@Lenses case class Genome(values: Array[Double], operator: Option[Int])
@Lenses case class Individual(genome: Genome, fitness: Array[Double], age: Long)

def vectorPhenotype = Individual.phenotype composeLens arrayToVectorLens
def vectorValues = Genome.values composeLens arrayToVectorLens

What makes MGO

  • Typeclass (contract of what an Algorithm is)
  • Case classes (represent domain data structures)
  • Optics (Lenses to access a specific field as a function)
  • Monadic computations
  • Kleisli
  • (Free)Monad

What do we have so far?

  • Estimations:
    • Initial computation time (grid search) = 2551 years
    • PSE = 65 years

Not bad..

But what if we had access to more machines?

GridScale

  • High level access to various High Performance / Throughput Computing environments
  • Batch jobs

High Throughput Computing

  • Multi-thread
  • Delegation to SSH server
  • Batch schedulers (via SSH)
    • PBS
    • SLURM
    • Condor
    • SGE
    • OAR
  • Grid Computing (EGI trough DIRAC)

Cake pattern / Mix-ins

trait SLURMJobService extends JobService
                      with SSHHost 
                      with SSHStorage 
                      with BashShell

Problems

trait SLURMJobService extends JobService
                      with SSHHost
                      with SSHStorage
                      with SSHSimpleConnection
                      with SSHCachedConnection
                      with BashShell
  • Which Connection is used?
  • No connection?
  • Implementation leaks in type
    • SSH
    • Bash

Function composition


def submit[M[_]: Monad, S, D](
    workDirectory: D ⇒ String,
    buildScript: (D, String) ⇒ String,
    scriptSuffix: ⇒ String,
    submitCommand: String ⇒ String,
    retrieveJobID: String ⇒ BatchJobID)
    ( server: S, jobDescription: D )

Guide the design with a TypeClass

trait BatchScheduler[D] {

  def submit[M[_]: Monad, S](server: S, jobDescription: D)(implicit hn: HeadNode[S, M], system: System[M], errorHandler: ErrorHandler[M]): M[BatchJob]

  def state[M[_]: Monad, S](server: S, job: BatchJob)(implicit hn: HeadNode[S, M], error: ErrorHandler[M]): M[JobState]

  def clean[M[_]: Monad, S](server: S, job: BatchJob)(implicit hn: HeadNode[S, M]): M[Unit]

  def stdOut[M[_], S](server: S, job: BatchJob)(implicit hn: HeadNode[S, M]): M[String] = BatchScheduler.stdOut[M, S](server, job)
  
  def stdErr[M[_], S](server: S, job: BatchJob)(implicit hn: HeadNode[S, M]): M[String] = BatchScheduler.stdErr[M, S](server, job)

}

Here comes this imperative style again..


for {
  _ ← hn.execute(server, s"mkdir -p $workDir")
  
  uniqId ← system.randomUUID.map(_.toString)
  
  script = buildScript(jobDescription, uniqId)
  sName = scriptName(scriptSuffix)(uniqId)
  
  _ ← hn.write(server, script.getBytes, scriptPath(workDir, scriptSuffix)(uniqId))
  
  cmdRet ← hn.execute(server, command)
  ExecutionResult(ret, out, error) = cmdRet
  
  _ ← if (ret != 0) errorHandler.errorMessage(ExecutionResult.error(command, cmdRet)) 
      else ().pure[M]
  _ ← if (out == null) errorHandler.errorMessage(s"$submitCommand did not return a JobID")
      else ().pure[M]
  jobId = retrieveJobID(out)
} yield BatchJob(uniqId, jobId, workDir)

Free Monad

  • Same pattern as in MGO
  • Express imperative style computation
  • Account for side effects
  • Distinct modules => separate DSLs

Chris Birchall’s talk: free vs tagless final

FreeDSL

  • Macro generating boilerplate on top of Freek
  • Runtime merge function
  • Highly composable DSL
  • ISCPIF/freedsl
  • Because we can

FreeDSL applied to GridScale


val context = merge(SSH, System, ErrorHandler)

import context.M
import context.implicits._
import gridscale.slurm.slurmDSL._

val jobDescription = SlurmJobDescription(...)

val res = for {
  job ← submit[M, SSHServer](headNode, jobDescription)
  s ← waitUntilEnded[M](state[M, SSHServer](headNode, job))
  out ← stdOut[M, SSHServer](headNode, job)
  _ ← clean[M, SSHServer](headNode, job)
} yield (s, out)

val interpreter = merge(SSH.interpreter, System.interpreter, ErrorHandler.interpreter)
println(interpreter.run(res))

OO designs post-mortem

  • Object didn’t work for us
  • Initial design impacts future enhancements
  • Learnt it the hard way…
  • Functional patterns provide (more?) powerful abstractions

What do we have so far?

Usability?

Usability?

Scala as a tool to build DSLs

Parthenon Architecture

Courtesy of Dean Wampler & Alex Paynem, “Programming Scala”, Chap 23., 2nd edition, 2015

Parthenon Architecture

applied to the OpenMOLE ecosystem

OpenMOLE DSL

// simplified
implicit class PuzzlePieceDecorator(puzzle: PuzzlePiece) {
  def on(env: EnvironmentProvider) =
    puzzle.copy(environment = Some(env))
  
  def hook(hooks: Hook*) =
    puzzle.copy(hooks = puzzle.hooks.toList ::: hooks.toList)
  
  def by(strategy: Grouping) =
    puzzle.copy(grouping = Some(strategy))
}

Demo Time

Summary - OpenMOLE

  • Scientific platform to explore models
  • (Hyper)Parameter tuning
  • Transparent use of HTC / distributed computing (GridScale)
  • Genetic-Algorithm based optimisation methods (MGO)
  • Ecosystem of functional libraries:
    • GridScale
    • MGO
    • FreeDSL

See also

Thanks!

romain.reuillon@iscpif.fr paul.chapron@iscpif.fr
mathieu.leclaire@iscpif.fr guillaume.cherel@iscpif.fr
j.passerat-palmbach@imperial.ac.uk you?