|
|
|
| Synthesizing
a Parallel Sorting Algorithms based on Design Patterns Dr. Sabah M.A. Mohammed Dr. Jinan A.W. Fiaidhi |
|
Abstract Design Patterns have proven valuable in
the creation of flexible and reusable object-oriented designs. In recent
years there has been much interest in this topic and several catalogues
of design patterns have been created. However, the issue of the application
of a design pattern to an existing program has not received much attention.
Applying a design pattern to a program may involve a large amount of
code restructuring and extension. This paper refactors a traditional
and important problem of sorting data based on design patterns and parallelism.
This refactoring process passes through three stages: Setting Common
Ontology, Partitioning using Divide and Conquer, and Building using
Design Patterns. 1. Introduction Sorting is arguably the most studied problem in Computer Science, both because of its intrinsic theoretical importance and its use in so many applications. Traditionally, sorting algorithms are classified into three categories: insertion, selection, and exchange, according to their main operational characteristics [1]. An alternative taxonomy proposed in Merritt [2] which recognize sorting as a split and join procedure: given a set of things to sort, split the set into two parts, recursively sort each part; and finally join the two parts into a sorted set. Indeed, Merritt's divide and conquer sorting taxonomy is considered as very powerful method for studying and understanding sorting because it exhibits much object oriented flavor and potentially can be described in terms of object oriented concepts [3]. Moreover, parallel algorithms for sorting have been studied intensively ever since Batcher [4] and followed by many other theoretical [5,6] and empirical studies in the late 1980's when multiprocessors and concurrent programming languages [7,8] became widely available. On one hand, parallel computing has failed to attract significant numbers of programmers outside the specialized world of supercomputing. This is due in part to the fact that despite the best efforts of many researchers, parallel programming environments are so hard to use that most programmers won't risk using them. We believe the major cause for this state of affairs is that most parallel programming environments focus on how concurrency is implemented rather than on how concurrency is exploited in a program design. In other words, bringing parallel computing to a broader audience requires solving the problem at the algorithm design level. On the other hand, for software development in general, the use of design patterns has emerged as an effective way to help programmers design high-quality software. To be most useful, patterns that work together to solve design problems are collected into structured hierarchical catalogues called design patterns or pattern languages [3]. The purpose of design patterns is to capture software design know-how and make it reusable. Design patterns provide a powerful way to describe useful architectural configurations, building on the collective experience of skilled designers and software engineers. Patterns capture these solutions in an approachable form that describes problems, contexts and examples to which they apply. This paper develops parallel sorting algorithms based on the spirit of design patterns. The quality, in this instance, is not necessarily the ability to draw people in (though that would be a nice side effect). Rather, it's the ability to restrict ambiguity in the problem space while simultaneously introducing latitude into the solution. 2. Using a Pattern Language for Setting a Common Ontology An ontology formally defines a common set
of terms that are used to describe and represent a domain. However,
a design pattern is a solution to a problem in a context. So one needs
to look for a design pattern within a common terminology. Indeed, we
find design patterns by looking at high-quality solutions to recurring
problems. The design pattern is written down in a systematic way giving
us a record of that quality-without-a-name that distinguishes a good
solution from a poor one. We can design complex systems in terms of
patterns. At each decision point, the designer selects the appropriate
pattern from a pattern catalogue. Each pattern leads to other patterns,
resulting in a final design as a web of patterns. A structured catalogue
of patterns that supports this design style is called a pattern language.
It is more than just a catalogue of patterns. A pattern language embodies
a design methodology and provides domain-specific advice to the designer.
The pattern language described here is intended for application programmers
who want to design new sorting programs for execution on parallel computers.
The top-level patterns help the designer expose exploitable concurrency.
The lower-level patterns guide the programmer as they decide how to
take advantage of the problem's concurrency. Finally, the lowest-level
patterns encode specific constructs that coordinate the execution of
concurrent tasks. 2.1 Finding Concurrency Looking at a problem and finding concurrency
that can be effectively exploited is a difficult problem. However, one
may follow the following sequel (See Figure 1) .
Getting Started: Where do you start
when you want to solve a problem using a parallel algorithm? This
pattern helps you answer that question. After working through these patterns, you will have the information you need to work with the Algorithm Structure patterns. 2.2 Algorithm Structure To create a parallel program, you need two things. First, you must have concurrency: It must be possible to decompose your problem into tasks that can execute simultaneously. Second, you must be able to map these tasks onto units of execution (usually threads or processes) so you can take advantage of the concurrency in a parallel program. The first of these requirements ÜÜ finding
exploitable concurrency ÜÜ is addressed in the Finding Concurrency
design space. After you work through the patterns in the Finding Concurrency
design space, you will have decomposed your problem into tasks, ordered
groups of tasks, and data that could be concurrently updated by the
tasks. The second requirement ÜÜ how to take those tasks/data/groups
and map them onto units of execution ÜÜis addressed in the present
design space, which we call Algorithm Structure. In a way, in this
design space we are refining the parallel algorithm design from abstract
concurrency into a form that can be realized on the target machine.
This will require two lower design spaces, Supporting Structures and
Implementation Mechanisms. 2.2.1 "Organize by ordering" patterns These patterns are used when the ordering
of groups of tasks is the major organizing principle for the parallel
algorithm. This group has two members, reflecting two ways task groups
can be ordered. One choice represents "regular" orderings
that do not change during the algorithm; the other represents "irregular"
orderings that are more dynamic and unpredictable. 2.2.2 "Organize by tasks" patterns These patterns are those for which the
tasks themselves are the best organizing principle. There are many
ways to work with such "task-parallel" problems, making
this the largest pattern group. 2.2.3 "Organize by data" patterns These patterns are those for which the
decomposition of the data is the major organizing principle in understanding
the concurrency. There are two patterns in this group, differing in
how the decomposition is structured (linearly in each dimension or
recursively). We are now ready to actually select
an algorithm structure. We do this with an understanding of constraints
from our target platform, an appreciation of the role of hierarchy
and composition, and a major organizing principle for our problem.
To guide us through this selection, we use the decision tree presented
in the following figure.
Starting at the top of the tree, consider your concurrency and the major organizing principle, and use this information to select one of the three branches of the tree. Then follow the discussion below for the appropriate subtree. 2.3 Supporting Structures This design space represents an optional intermediate stage between the Algorithm Structure and Implementation Mechanisms design spaces. While it is sometimes possible to go directly from the Algorithm Structure space to an implementation for a target programming environment, it often makes sense to construct the implementation in terms of other patterns that constitute an intermediate stage between the problem-oriented patterns of the Algorithm Structure design space and the machine-oriented "patterns" of the Implementation Mechanisms space. Two important groups of patterns in this space are those that represent program-structuring constructs (such as SPMD) and those that represent commonly used shared data structures (such as Shared Queue). We organize these patterns into the Supporting Structures design space. The patterns of the Algorithm Structure design space capture recurring solutions to the problem of turning problems into parallel algorithms. But these patterns in turn contain recurring solutions to the problem of mapping high-level parallel algorithms into programs using a particular parallel language or library. Those solutions are what the patterns in this space capture. Two important groups of patterns in this space are those that represent program-structuring constructs and those that represent commonly-used shared data structures. Many of the elements of this design space are not usually thought of as patterns, and indeed some of them (Shared Queue, for example) would ideally be provided to the programmer as part of a framework of reusable software components. We nevertheless document them as patterns, for two reasons: First, documenting all the elements of our pattern language as patterns provides a consistent notation for the programmer. Second, the existence of such pattern descriptions of components provides guidance for programmers who might need to create their own implementations. Patterns in this space fall into two main groups. Patterns in the first group describe
program-structuring constructs i.e., constructs for structuring
source code. Patterns in the second group describe
commonly-used shared data structures. 2.4 Implementation Mechanisms The Implementation Mechanisms design space is concerned with how the patterns of the higher-level spaces are mapped into particular programming environments. We use it to provide pattern-based descriptions of common mechanisms for process/thread management (e.g., creating or destroying processes/threads) and process/thread interaction (e.g., monitors, semaphores, barriers, or message-passing). Patterns in this design space, like those in the Supporting Structures design space, describe entities that strictly speaking are not patterns at all. As noted previously, however, we include them in our pattern language to provide a complete path from problem description to code, and we document them using our pattern notation for the sake of consistency. The implementation of these patterns will vary greatly depending on the environment. Most of them correspond to primitives in one or more parallel programming environments.
Consider the divide-and-conquer
strategy employed in many sequential algorithms. With this strategy,
a problem is solved by splitting it into subproblems, solving them
independently, and merging their solutions into a solution for the
whole problem. The subproblems can be solved directly, or they can
in turn be solved using the same divide-and-conquer strategy, leading
to an overall recursive program structure. The potential concurrency
in this strategy is not hard to see: Since the subproblems are solved
independently, their solutions can be computed concurrently, leading
to a parallel program that is very similar to its sequential counterpart.
The following figure illustrates the strategy and the potential
concurrency.
Similarly in sorting, we may use the divide and conquer algorithm to sort n size data. Data can be split into segments according to the sorting mechanism under selection and sorted by N processes/threads where N<n. What is important after that, is how to merge any two sorted subsequence and produce a new merged sorted sequence. The resulted sequence can be iteratively merged with the next sequence until all sequences are merged to produce a whole sorted list. However, this merging process can be performed in parallel using an algorithm used in[5] and its latest developments[9]. 4. Using Template Method and Strategy Patterns For Constructability Patterns for software development
are one of the latest "hot topics" to emerge from the
object-oriented community. They are a literary form of software
engineering problem-solving discipline. Software patterns first
became popular with the wide acceptance of the book Design Patterns:
Elements of Reusable Object-Oriented Software by Erich Gamma,
Richard Helm, Ralph Johnson, and John Vlissides (frequently referred
to as the Gang of Four or just GoF)[3]. Patterns have been used
for many different domains ranging from organizations and processes
to teaching and architecture. At present, the software community
is using patterns largely for software architecture and design,
and (more recently) software development processes and organizations
[10].
5. Conclusions This paper presented a model
for synthesizing parallel sorting algorithms based on design patterns.
It draws on the principle of divide and conquer. Design patterns
provide a clean and concise way of formulating and implementing
such model. Sorting is modeled as an abstract class with a template
method to perform the sorting. This method relegates the splitting
and joining of arrays to the concrete subclasses, which uses an
abstract strategy pattern to perform comparisons. The sorting
model also provides the flexibility to add new capabilities without
modification of existing code. For example, by using the decorator
pattern, one can add performance visualizations without even knowing
what sort algorithm is being used.
Moreover, Figure 6 shows the performance advantage of using our model in comparison with the traditional non-parallel bubble sort algorithm based on different array sizes.
[1] KnuthD., "The Art of Computer
Programming", Sorting and Searching Reading :
|
|
|
|