CSP for Executable Scientific Workflows

Rune Møllegaard Friborg


Abstract

This thesis presents CSP as a means of orchestrating the execution of tasks in a scientific workflow. Scientific workflow systems, which organise tasks in directed graphs, are popular in a wide range of scientific areas. The workflow system handles the execution of such graphs, which can usually benefit, performance-wise, from multiprocessing, cluster and grid environments.
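As a rough illustration of what executing such a graph involves, the sketch below runs a small directed acyclic graph of tasks. It starts each task as soon as all of its dependencies have finished, so independent branches run concurrently. It uses only Python's standard `concurrent.futures` thread pool, not CSP or PyCSP, and the function `run_workflow` and the task names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_workflow(tasks, deps, max_workers=4):
    """Run a DAG of tasks; each task starts once all of its dependencies are done.

    tasks: dict mapping a task name to a callable that receives a dict of
           {dependency name: dependency result}.
    deps:  dict mapping a task name to the names it depends on (the graph edges).
    """
    results = {}
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose dependencies have all produced results.
            ready = [name for name, d in remaining.items() if d.issubset(results)]
            for name in ready:
                del remaining[name]
                inputs = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], inputs)] = name
            if not running:
                # Nothing can run, yet tasks remain: the graph has a cycle
                # or refers to a task that does not exist.
                raise ValueError("workflow graph has a cycle or a missing dependency")
            # Block until at least one running task finishes, then record it.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
    return results


# Hypothetical workflow: "square" and "double" both depend on "load" and can
# run concurrently; "combine" waits for both.
out = run_workflow(
    tasks={
        "load": lambda r: list(range(10)),
        "square": lambda r: [x * x for x in r["load"]],
        "double": lambda r: [2 * x for x in r["load"]],
        "combine": lambda r: sum(r["square"]) + sum(r["double"]),
    },
    deps={"square": ["load"], "double": ["load"], "combine": ["square", "double"]},
)
print(out["combine"])  # 375
```

A real workflow system would dispatch tasks to processes, cluster nodes or grid resources rather than threads; the thread pool only keeps the sketch self-contained.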

PyCSP is an implementation of Communicating Sequential Processes (CSP) for the Python programming language; it takes advantage of CSP's formal, verifiable approach to controlling concurrency and of the readability of Python source code. Python is a popular programming language in the scientific community, with many scientific libraries (modules) and simple integration with external languages. This thesis presents a version of PyCSP extended with many new features and a more robust implementation, allowing scientific applications to run on heterogeneous hardware that combines multiple hardware architectures. This is especially important in scientific computing, as a computational task may run orders of magnitude faster on one hardware architecture than on another.
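In CSP, processes share no state and communicate only over channels, and communication is synchronous: a write completes only when a reader has taken the value. The sketch below illustrates that rendezvous behaviour with plain `threading` and a three-stage pipeline. It does not use or reproduce PyCSP's API. The `Channel` class is a hypothetical stand-in, and the `None` end-of-stream sentinel is a simplification in place of the termination mechanisms a CSP library would provide.

```python
import threading


class Channel:
    """Hypothetical unbuffered channel: write() blocks until a reader has taken the value."""

    def __init__(self):
        self._cond = threading.Condition()
        self._item = None
        self._full = False   # a value has been offered by a writer
        self._taken = False  # a reader has taken the offered value

    def write(self, value):
        with self._cond:
            while self._full:  # wait until any earlier handoff has completed
                self._cond.wait()
            self._item, self._full, self._taken = value, True, False
            self._cond.notify_all()
            while not self._taken:  # rendezvous: block until a reader has it
                self._cond.wait()
            self._full = False
            self._cond.notify_all()

    def read(self):
        with self._cond:
            while not self._full or self._taken:
                self._cond.wait()
            value, self._taken = self._item, True
            self._cond.notify_all()
            return value


def producer(out):
    for x in range(5):
        out.write(x)
    out.write(None)  # sentinel: end of stream (simplification)


def worker(inp, out):
    while (x := inp.read()) is not None:
        out.write(x * x)
    out.write(None)


def consumer(inp, results):
    while (x := inp.read()) is not None:
        results.append(x)


a, b, results = Channel(), Channel(), []
procs = [
    threading.Thread(target=producer, args=(a,)),
    threading.Thread(target=worker, args=(a, b)),
    threading.Thread(target=consumer, args=(b, results)),
]
for p in procs:
    p.start()
for p in procs:
    p.join()
print(results)  # [0, 1, 4, 9, 16]
```

Because every write waits for its matching read, each stage can be at most one item ahead of the next, and the processes' only interaction is through the channels.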

To ensure the robustness of the PyCSP library, its internal synchronisation model has been successfully model-checked using the SPIN Model Checker. The check covered the presence of deadlocks, livelocks, starvation and race conditions, as well as correct channel communication behaviour.
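To give a feel for what a model checker does when it searches for deadlocks, the sketch below performs an explicit-state breadth-first search over every interleaving of a tiny model in which two processes acquire two locks. A state is reported as a deadlock when no process can take a step although at least one has not finished. This is neither SPIN nor Promela and it does not model PyCSP's synchronisation; the lock model and the functions `successors` and `find_deadlock` are hypothetical, and only deadlock detection is shown, not the livelock, starvation or race-condition checks.

```python
from collections import deque


def successors(state, programs):
    """Yield every state reachable in one step by any single process (interleaving)."""
    pcs, owners = state
    for pid, prog in enumerate(programs):
        pc = pcs[pid]
        if pc == len(prog):
            continue  # this process has terminated
        op, lock = prog[pc]
        own = dict(owners)
        if op == "acquire":
            if own[lock] is not None:
                continue  # blocked: another process holds the lock
            own[lock] = pid
        else:  # "release"
            own[lock] = None
        new_pcs = pcs[:pid] + (pc + 1,) + pcs[pid + 1:]
        yield new_pcs, tuple(sorted(own.items()))


def find_deadlock(programs):
    """Breadth-first search of all reachable states; return the first deadlock or None."""
    locks = sorted({lock for prog in programs for _, lock in prog})
    init = (tuple(0 for _ in programs), tuple((lock, None) for lock in locks))
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        nxt = list(successors(state, programs))
        finished = all(pc == len(prog) for pc, prog in zip(state[0], programs))
        if not nxt and not finished:
            return state  # no process can move, but not every process is done
        for s in nxt:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return None


# Two processes taking locks A and B in opposite orders can deadlock...
p0 = [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")]
p1 = [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")]
print(find_deadlock([p0, p1]))  # ((1, 1), (('A', 0), ('B', 1)))
# ...while taking them in the same order cannot.
print(find_deadlock([p0, p0]))  # None
```

The reported state reads as: each process has executed one instruction, process 0 holds A and process 1 holds B, and each is waiting for the lock the other holds.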

The use of PyCSP for scientific workflows is demonstrated through examples. By providing a robust library for organising scientific workflows in a Python application, I hope to inspire scientific users to adopt PyCSP. As a proof of concept, this thesis shows that three scientific applications (kNN, stochastic minimum search and McStas) scale well on multiprocessing and cluster computing environments using PyCSP. Additionally, McStas is shown to utilise grid computing resources using PyCSP.

Finally, this thesis presents a new dynamic channel model, which has not yet been implemented for PyCSP. The dynamic channel can change its internal synchronisation mechanisms on the fly, depending on the location and number of connected channel-ends. Thus, it may start out as a simple local pipe and evolve into a distributed channel spanning multiple nodes. Such a channel is a necessary next step towards giving PyCSP complete freedom in executing CSP processes on local and remote resources.
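The decision at the heart of such a channel, choosing a synchronisation mechanism from the location and number of connected channel-ends, can be sketched as follows. Since the dynamic channel is not implemented for PyCSP, everything here is hypothetical: the classes `ChannelEnd` and `DynamicChannel`, the mechanism labels and the selection rules are illustrative assumptions, and no actual transport is implemented.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare ends by identity, so disconnect removes this exact end
class ChannelEnd:
    kind: str  # "reader" or "writer"
    host: str  # node where the process holding this end runs


@dataclass
class DynamicChannel:
    """Hypothetical channel that re-selects its mechanism whenever its ends change."""

    home: str  # node where the channel was created
    ends: list = field(default_factory=list)

    def connect(self, end):
        self.ends.append(end)
        return self.mechanism()

    def disconnect(self, end):
        self.ends.remove(end)
        return self.mechanism()

    def mechanism(self):
        """Choose the cheapest (hypothetical) mechanism covering all connected ends."""
        if any(end.host != self.home for end in self.ends):
            return "distributed"  # at least one end lives on another node
        readers = sum(end.kind == "reader" for end in self.ends)
        writers = sum(end.kind == "writer" for end in self.ends)
        if readers <= 1 and writers <= 1:
            return "local-pipe"  # one-to-one on a single node
        return "local-shared"  # any-to-any on a single node, needs locking


ch = DynamicChannel(home="node0")
print(ch.connect(ChannelEnd("writer", "node0")))  # local-pipe
print(ch.connect(ChannelEnd("reader", "node0")))  # local-pipe
print(ch.connect(ChannelEnd("reader", "node0")))  # local-shared
remote = ChannelEnd("reader", "node1")
print(ch.connect(remote))                         # distributed
print(ch.disconnect(remote))                      # local-shared
```

A working implementation would also have to migrate in-flight communications safely when the mechanism changes, which this sketch ignores.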
Original language: English
Place of Publication: Niels Bohr Institute
Publisher: University of Copenhagen
Number of pages: 235
Publication status: Published - 29 Nov 2011
