Reading List

The Selfish Gene
The Psychopath Test: A Journey Through the Madness Industry
Bad Science
The Feynman Lectures on Physics
The Theory of Everything: The Origin and Fate of the Universe


ifknot's favorite books »

Saturday 7 June 2014

Mathematical Semantics for FBP - Describing Fundamental FBP Program Types

A glittering treasure of mathematical semantics for FBP or fool's gold?


TL;DR Examples of using mathematical semantics for FBP by combining (named channel) Theory of CSP and Graph Theory to describe 3 fundamental FBP program graph types which can form the basis for both reasoning about experimenting with libfbp programs.

As I have argued so far here, here, here, here, here and here, my overarching proposition in searching for a useful mathematical description for Flow Based Programming is that an FBP network of black box processing components that communicate information packets via one-way connections can be modelled as a directed graph of communicating sequential processes.

But how might such a model be usefully applied to reasoning about FBP and what might a fundamental FBP program be?

In searching for fundamental classes of FBP programs is it worth trying to define programmatic identities that can be compared both within and across implementations to reason about a class of FBP programs?


My wider aim with this work is to describe basic classes of FBP programs and then use libfbp to produce concrete executable examples to observe performance and, in particular, the scalability of such libfbp software artefacts.

1. A totally disconnected embarrassingly parallel composition by interleaving FBP program.


The most reduced form of a FBP workflow is the isolated component. Although this proposition may be controversial if one considers the working definition of an FBP program as connected components. However, there is nothing about FBP that prevents such a reduction and, besides, denying this proposition removes a fundamental aspect from the ensuing mathematical semantics and denies FBP programmers the right to develop solutions for embarrassingly parallel problems.  

Consider then, such an isolated sequential processing fbp::component v as the sole occupant of its component set V of the FBP program graph G with an empty fbp::connection set i.e. where E= ∅

Prior to G starting execution v takes on, by some form of deserialization, an initial internal state s that is wholly unobservable by the external environment. At the termination of execution v has a final state s' which is serialized to some form of display or storage. 

In this way the component v behaves like a CHAOS component process ignoring all other inputs but making internal progress until an internal SUCCESS event is generated. Strictly speaking because v is processing a potentially infinite sequence of internal events, symbolically τ  within CSP, then it is the constant process div.

Consider then if the order of the component set |V| is increased to some arbitrarily large value n by adding more fbp::components but with no connections such that E= ∅, then because of the invariant, and therefore defining, properties of G such that the order |V| is arbitrarily large and the degree |E| = 0 it follows that G meets the definition for an embarrassingly parallel problem and each Component v ∈ V can execute in interleaved parallel, denoted by the operator ||| as per Hoare [1]

G ∷ [ v1 ||| v2 ||| … ||| vn ]

Therefore, for any problem P that can be expressed as a libfbp program graph G=(V,E) where |V|=n and n is arbitrarily large but E= ∅ then P meets the criteria established above for an embarrassingly parallel problem. Then G can be implemented as a graph of disconnected internally progressing CHAOS components consuming event b from their internal traces s, denoted s\b until, possibly, transitioning to SKIP consuming and STOP (successfully): 

n( CHAOS →s\b→ SKIP →✓→ STOP )

Or diagrammatically:

Figure 1: Embarrassingly parallel FBP problem 

The conclusion that logically follows is that for the class of all such disconnected interleaved problems P||| there exists an FBP program graph:

 G||| = ( VCHAOS, E ) 

With the invariant properties:
 |V| ≥ 1 ⋀ E=∅ 

Of which an embarrassingly parallel problem P|||e has a defining FBP program graph:

G|||e ⇔ |V|⟶∞ ⋀ E=∅

In this manner might a totally disconnected, embarrassingly parallel libfbp software artefact be constructed whose solution times could be observed in order to draw conclusions about how such solution time vary with the number of processors for this special case class of problems—as desired.

2. A directed connected pipeline parallel composition by intersection FBP program.


Perhaps less controversially, the most reduced form of a connected FBP workflow is a pair of fbp::connection connected fbp::component(s) v1 and v2. (Although the single component with a connection that loops back on itself is also an option but one that actually has the component serving 2 jobs as both producer and consumer and results in a more confusing mathematical description that leads to the conclusion that it is, in fact, not the most reduced connected form.)

Consider an FBP program graph G defined as follows: 2 components v1 and v2 communicate via a single connection e1 of any type T.

As G executes on some hardware processing substructure v1 sends t of type T into e1 and v2 receives t from e1. In this way the component v1 behaves as the CSP constant RUN processes that will always engage in any event offered by e1 i.e. it can be defined as a sink for all and any inputs in.t behaving like a ‘black hole’ down which events disappear.

The corollary of this definition is that the component v1 behaves like a source, generating events out.t and that both are chained together, or using CSP notation the chain operator >>, and results in the following redefinition for the fundamental connected pair:

SOURCE ≫ SINK

Henceforth v1 and v2 are termed vsource and vsink connected by e1. This minimalist graph can be grown by inserting a new component vn between vsource and vsink such that vn now receives in.t from e1 and communicates out.t via a second connection e2 of type T to the terminating component vsink the component vn behaves as a filter sending an out.t event for every in.t event received and can be represented with the following CSP notation:

SOURCE ≫ FILTER ≫ SINK

This process can be repeated to grow a pipeline of components by adding n filter components.

SOURCE ≫ FILTER0 ≫ ⋯ ≫ FILTERn ≫ SINK

In this manner the vertex-edge-vertex journey of the information packet, or walk in graph theory terminology, of the graph G can be considered as a pipeline of directed connection connected components.

Regardless of size, then the directed connected FBP program graph G has the invariant, but not defining, properties that its size (the number of edges or in this context the number of connections) |E| is equal to its order (the number of vertices or in this context number of components) |V|-1.

However these properties do not necessarily define the desired invariant of a pipeline walk because multiple connections and circular connections are not excluded. Therefore, in order to define the desired pipeline graph it is necessary to specify invariant connectivity properties for the maximum degree (the maximum number of edges that can connect to a vertex) of the graph G, denoted by Δ(G), and the minimum degree (the minimum number of edges that can connect to a vertex) of the graph, denoted by δ(G) which are, hence, the maximum and minimum connections to its components.

Then G can be implemented as a graph of n connected RUN components communicating an event en.t consisting of an information packet t of type T via named ports en such that by the logically simultaneous and invisible sending of everything output by out.RUNn-1 on channel out as input to in.RUNon channel in they are all chained together:

RUN1 ≫ RUN2 ≫ ⋯ ≫ RUNn

Therefore, for a problem P that can be expressed as an FBP program graph G=(V,E) where ∀|V|>1 and Δ(G)=2 and δ(G)=1 the P meets the criteria established above for a directed connected pipeline. Then G can be implemented as a graph of chained RUN components connected by a typed connection such that each component v∈V can execute in intersecting parallel, denoted || as per Hoare. [1]

G ∷ [ v1 || v2 |||| vn ]

Or diagrammatically:

Figure 2: Directed connected interleaved parallel FBP problem

The conclusion that logically follows is that for the class of all such directed connected intersected problems P|| there exists an FBP program graph:

G|| = ( VRUN, E ) 

With the invariant properties:

Δ(G) = 2 ⋀ δ(G) = 1

In this manner might a pipeline libfbp software artefact be constructed whose solution times could be observed in order to draw conclusions about how the solution time varies with the number of processors for this special case class of problems in order to draw conclusions about performance and scalability.

3. A multiple pipeline directed connected components parallel composition by interleaving intersection FBP program.


It then follows that the classes of problems that can be expressed as an FBP program graph examined above so far are the two special cases of a general class of problems that can be composed from them.

Built into this proposition is the assumption that for any single problem demonstrating parallel composition solely by intersection that can be expressed as FBP program graph G=(V,E) where the maximum degree (the maximum number of connections) is greater than 2 i.e. Δ(G) > 2 ⋀ δ(G)=1 then G can be decomposed into n subgraphs:

Sn = ( VRUN, E|V|-1 ) 

Where the maximum number of connections is 2, invariant properties such that:

 Δ(Sn) = 2 ⋀ δ(Sn ) = 1

And ∀S parallel composition is by interleaving, denoted by the operator ||| as per Hoare [1] .

In this way the proposition follows that for the disconnected embarrassingly parallel first class of general problems, above, composed by interleaving P|| and the single directed connected pipeline composed by intersection second class of general problems P||| then there is a third class of general problems P that are composed from each of them:

P = P|| ∪ P|||

Then it follows, as its constituent classes of problems have been shown above, for such a general problem it can be expressed as an FBP program graph G built, recursively as necessary, from interleaved pipelines:

G = G|| ∪ G|||

Consider then the simplified symmetrical FBP program graph G consisting of j subgraphs S1, S2 ... Sj where j has the possibility of becoming arbitrarily large and S is a pipeline of order i components described above such that:

S = (V,E) where |V| = i  |E| = i-1 ⋀ Δ(Sn) = 2 ⋀ δ(Sn ) = 1

Then:
                                                                       
S1 ∷ [ v1 || v2 || … || vi ]
|||
S2 ∷ [ v1 || v2 || … || vi ]
|||
G ::                             .                                   
.
.
|||
Sj ∷ [ v1 || v2 || … || vi ]
                                                                      ⌋

Or diagrammatically:

Figure 3: Intersected directed connected interleaved parallel FBP problem

The conclusion that logically follows is that for the class of all such problems P, composed in this manner, there exists an FBP program graph:

G = G|| ∪ G|||

In this manner might a multiple pipeline libfbp software artefact be constructed whose solution times could be observed in order to draw conclusions about how the solution time varies with the number of processors for this class of problems.

However, before I can start experimenting with libfbp implementations of these 3 problems I need an implementation of the "the hardware processing substructure" that will allow me to vary the number of processing cores available to the test artefact.

But how can this be achieved?

Yes you've guessed it... What an interesting topic for my next blog post!

References:




No comments:

Post a Comment