- Martin Dietzfelbinger, Miroslaw Kutylowski, Rüdiger Reischuk:
Feasible Time-Optimal Algorithms for Boolean Functions on Exclusive - Write PRAMS.
Technischer Bericht SIIM-TR-A-95-05,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
It was shown some years ago that the computation time for many important Boolean
functions of n arguments on concurrent-read exclusive-write parallel random-access
machines (CREW PRAMs) of unlimited size is at least v(n) = 0.72 log_2 n.
On the other hand, it is known that every Boolean function of n arguments can be
computed in v(n)+1 steps on a CREW PRAM with n * 2^{n-1} processors and memory cells.
In the case of the OR of n bits, n processors and cells are sufficient.
In this paper it is shown that for many important functions
there are CREW PRAM algorithms that almost meet the lower bound
in that they take v(n) + o(log n) steps,
but use only a small number of processors and memory cells (in most cases, n).
In addition, the cells only have to store binary words of bounded length
(in most cases, length 1). We call such algorithms "feasible".
The functions concerned include: the PARITY function and, more generally,
all symmetric functions; a large class of Boolean formulas; some functions over
non-Boolean domains {0,.. ,k-1} for small k, in particular parallel prefix sums;
addition of n-bit-numbers; sorting n/l binary numbers of length l.
Further, it is shown that Boolean circuits with fan-in 2, depth d, and size s
can be evaluated by CREW PRAMs with fewer than s processors in
v(2^d)+o(d) = 0.72d + o(d) steps.
For the exclusive-read exclusive-write model (EREW PRAM) a feasible algorithm is
described that computes PARITY of n bits in 0.86 log_2 n steps.
Key words:
parallel random-access machine, exclusive-write,
concurrent-read, exclusive-read, parallel time complexity,
Boolean functions, Boolean formulas, Boolean circuits, symmetric functions,
parallel prefix, parity, addition, sorting
AMS(MOS) subject classifications: 68Q10, 68Q05, 68Q25
A preliminary version of these results has been presented at the
Second Annual ACM Symposium on Parallel Algorithms and Architectures, Crete, July 1990.
This extended and improved version will appear in SIAM Journal on Computing.
- Danny Dolev, Rüdiger Reischuk, Ray Strong, Ed Wimmers:
A Decentralized High Performance Time Service Architecture.
Technischer Bericht SIIM-TR-A-95-26,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
A time service is a means by which users, applications, etc.
obtain values associated with some measurement of elapsed time from
some agreed on event. In a relativistic world, time services are
necessarily imperfect, there being no such thing as simultaneity.
In this paper we discuss an architecture for delivering
required time services to a collection of heterogeneous machines
communicating over a wide area network consisting of multiply connected
and overlapping local area networks with various communication media.
The primary building block of our architecture is a logical clock.
We explain how logical clock objects simplify the treatment and
resolution of a number of problems associated with time services.
Having used logical clock objects to decouple clock synchronization
problems from problems associated with smoothness and conformity
to political time, we provide a straightforward master-slave
algorithm for clock synchronization and show how to transform it
into an eavesdropping algorithm that takes advantage of
the broadcast nature of most local area networks to dramatically
reduce message overhead.
Finally, we show how to integrate eavesdropping and regular
clock synchronization algorithms into a fault tolerant
network time service that tolerates and coordinates multiple
master clocks or external sources of time.
- Danny Dolev, Rüdiger Reischuk, Ray Strong:
Observable Clock Synchronization.
Technischer Bericht SIIM-TR-A-95-04,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
While the synchronization of time-of-day clocks ordinarily requires
information flow in both directions between the clocks, this information
need not flow directly via messages.
However, to take advantage of indirect information flow, we have to
make a number of complex assumptions about the behavior of the communication
medium. To facilitate the verification of such assumptions, we
develop a relativistic theory of observable clock synchronization that
does not use or depend on a Newtonian framework or real time. Within
the context of this theory, we focus on the problem of estimating the
time on a remote clock.
We generalize the concept of {\bf rapport} to capture the situation when
such an estimate is sufficient for clock synchronization purposes.
With a single property, called the Observable Drift Property, we characterize
the information flow required for obtaining rapport.
We compare our relativized and observable concepts with analogs based
on the notion of real time in order to show that we are studying
the right quantities.
We then give several clock synchronization algorithms that replace round trip
synchronization algorithms but use only passive message receipt in a broadcast
medium. These algorithms make the message cost of synchronization independent
of the number of participants. Each algorithm measures and can report
its own precision and does not depend on any assumed bound on
message transmission and processing time. These properties have
in the past only been associated with variants of round trip
synchronization where the message cost is proportional to the number
of participants. Verifying that a particular network can support one
of these algorithms is reduced (by our results) to verifying
four straightforward axioms and the Observable Drift Property.
presented at the 14.~ACM Symposium on Principles of Distributed Computing,
Los Angeles, 1994
- Andreas Jakoby, Rüdiger Reischuk, Christian Schindelhauer:
Circuit Complexity: from the Worst Case to the Average Case.
Technischer Bericht SIIM-TR-A-95-02,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
In contrast to machine models like Turing machines or random access
machines, circuits are a rigid computational model. The internal
information flow of a computation is fixed in advance, independent
of the actual input. Therefore, in complexity theory only worst case
complexity measures have been used to analyse this model. Concerning
the circuit size this seems to be the best one can do.
The delay between feeding the input bits into a circuit and being
able to read off the output bits at the output gates is usually
measured by the depth of the circuit. One might try to take advantage
of favourable cases in which the output values are obtained much
earlier. This will be the case when critical paths, e.g. paths
between input and output gates of maximal length, have no influence
on the final output.
Inspired by recent successful attempts to develop a meaningful
average case analysis for TM computations [L86,G91,BCGL92,RS93a], we
follow the same goal for the circuit model. For this purpose, a new
comlexity measure for the internal delay is defined, called time.
This may be given implicitely or explicitely, where in the latter case
a gate has to signal that it is ``ready'', e.g. has computed the desired
result. Using a simple coding both models are shown to be equivalent.
By doubling the number of gates any circuit with implicit timing can
easily be transformed to an equivalent circuit with explicit time
signals.
Based on the notion of time, two average case measures for the circuit
delay are defined. The analysis is not restricted to uniform distributions
over the input space, instead a large class of distributions will be
considered. For this purpose, a complexity notion is needed for
distributions. We define a measure based on the circuit model by considering
the complexity of circuits with uniform distributed random input bits
that generate such distributions.
Finally, this new approach is applied to concrete examples. We derive
matching lower and upper bounds for the average circuit delay for basic
functions like the OR, ADDITION, THRESHOLD and PARITY. It will be shown
that for PARITY the average delay remains logarithmic. In many cases,
however, an exponential speedup compared to the worst case can be achieved.
For example, the average delay for n-Bit-ADDITION is of order loglog n.
The circuit designs to achieve these bounds turn out to be very different
from the standard ones for optimal worst case results.
A preliminary version was presented at 26th STOC'94
- Andreas Jakoby, Rüdiger Reischuk:
Data Transmission in Processor Networks.
Technischer Bericht SIIM-TR-A-95-18,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
We investigate the communication capacity and optimal data
transmission schedules for processor networks connected by
communication links, for example Transputer clusters. Each
link allows the two processors at its endpoints to exchange
data with a given fixed transmission rate tau_d. The commu-
nication itself is done in a blocking mode, that means the
two processors have to synchronize before starting to exchange
data and at any time each processor cannot communicate with
more than one other processor.
Our efficiency analysis will be more realistic by also taking
into account the setup time for a communication, which will be
assumed to be a fixed constant tau_s > 0. Thus, a large amount
of data can be sent from one processor to a neighbour faster by
a single long communication step than by a bunch of small data
exchange steps: sending p data units in one step takes time
tau_s + p * tau_d. However, there is a tradeoff since the receiver
has to wait until it has received the complete set of data before
it can forward pieces to other processors.
The following prototype task called scattering will be considered:
At the beginning one processor called the source possesses a set of
unit size data packets, one for each processor in the network. The
goal is to distribute the packets in minimal time to all recipients.
- Andreas Jakoby, Rüdiger Reischuk, Christian Schindelhauer:
Malign Distributions for Average Case Circuit Complexity.
Technischer Bericht SIIM-TR-A-95-06,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
In contrast to machine models like Turing machines or random access
machines, circuits are a rigid computational model. The internal
information flow of a computation is fixed in advance, independent of
the actual input. Therefore, in complexity theory only worst case
complexity measures have been used to analyse this model. In [JRS94] we
have defined an average case measure for the time complexity of circuits.
Using this notion tight upper and lower bounds could be obtained for
the average case complexity of several basic Boolean functions.
In this paper we will examine the asymptotic average case complexity of
the set of n-input Boolean functions. In contrast to the worst case a
simple counting argument does not work. We prove that almost all Boolean
function require at least n-log n-llog n expected time even for the uniform
probability distribution. On the other hand, we show that there are
significant subsets of functions that can be computed with a constant
average delay.
Finally, the worst case and average case complexity of a Boolean function
will be compared. We show that for each function that requires circuit
depth d, the expected time complexity will be at least d-log n-log d
with respect to an explicitely defined probability distribution. A nontrivial
bound on the complexity of such a distribution is obtained.
- Andreas Jakoby, Rüdiger Reischuk, Christian Schindelhauer, Stephan Weis:
The Average Case Complexity of the Parallel Prefix Problem.
Technischer Bericht SIIM-TR-A-95-03,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
We analyse the average case complexity of evaluating all prefixes of
an input vector over a given semigroup. As computational model
circuits over the semigroup are used and a complexity measure for the
average delay of such circuits, called time, is introduced. Based on
this notion, we then define the average case complexity of a
computational problem for arbitrary input distributions.
For highly nonuniform distributions the average case complexity turns
out to be as large as the worst case complexity. Thus, in order to
make the average case analysis meaningful we also develop a complexity
measure for distributions.
Using this framework we show that two n-bit numbers can be added with
an average delay of order loglog n for a large class of distributions.
We then give a complete characterization of the average case
complexity of the parallel prefix problem with respect to the
underlying semigroup. By considering a related reachability problem
for finite automata it is shown that the complexity only depends on a
property of the semigroup we will call a confluence.
Our analysis yields that only two different cases can arise for the
reachability question. We show that the parallel prefix problem
either can be solved with an average delay of order loglog n, that
means with an exponential speedup compared to the worst case, or in
case of nonconfluent semigroups that no speedup is possible. Circuit
designs are presented that for confluent semigroups achieve the
optimal double logarithmic delay while keeping the circuit size
linear.
The analysis and results are illustrated at some concrete functions.
For the n-ary Boolean OR, THRESHOLD and PARITY, for example, the
average case circuit delay is determined exactly up to small constant
factors for arbitrary distributions.
Finally, we determine the complexity of the reachability problem
itself and show that it is at most quadratic in the size of the
semigroup.
- Andreas Jakoby, Rüdiger Reischuk, Christian Schindelhauer:
The Complexity of Broadcasting in Planar and Decomposable Graphs.
Technischer Bericht SIIM-TR-A-95-08,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
Broadcasting in processor networks means disseminating a single piece
of information, which is originally known only at some nodes, to all
members of the network. The goal is to inform everybody using as few
rounds as possible, that is minimize the broadcasting time.
Given a graph and a subset of nodes, the sources, the problem to determine
its specific broadcast time, or more general to find a broadcast schedule
of minimal length has shown to be NP-complete. In contrast to other
optimization problems for graphs, like vertex cover or traveling salesman,
little was known about restricted graph classes for which polynomial time
algorithms exist, for example for graphs of bounded treewidth. The
broadcasting problem is harder in this respect because it does not have the
finite state property. Here, we will investigate this problem in detail and
prove that it remains hard even if one restricts to planar graphs of bounded
degree or constant broadcasting time. A simple consequence is that the
minimal broadcasting time cannot even be approximated with an error less than
1/8, unless P = NP.
On the other hand, we will investigate for which classes of graphs this
problem can be solved efficiently and show that broadcasting and even a more
general version of this problem becomes easy for graphs with good
decomposition properties. The solution strategy can efficiently be
parallelized, too. Combining the negative and the positive results reveals
the parameters that make broadcasting difficult. Depending on simple graph
properties the complexity jumps from NC or P to NP.
Classification: algorithms and data structures, computational complexity.
- Maciej Liskiewicz, Rüdiger Reischuk:
The Complexity World below Logarithmic Space.
Technischer Bericht SIIM-TR-A-95-07,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
We investigate space complexity classes defined by
Turing machines that use less than logarithmic space.
Because of the limited counting ability of such machines
most of the standard simulation techniques do not work for
sublogarithmic space classes.
However, machines with such little space may still be quite powerful.
Therefore, it was not obvious how to obtain analogs for inclusion and
separation results known for classes above logspace.
We review known facts about the sublogarithmic space world
and present several new results, which
show that these classes really behave differently.
For example, certain closure properties do not hold.
The restricted power of these machines makes it possible to prove explicit
separations -- even for alternating complexity classes --
by combinatorial arguments and to obtain a hierarchy of
non-relativized complexity classes without any unproven assumption.
We will also discuss upward and downward translation issues.
Finally, these complexity classes are related to other classes within P,
in particular to context-free languages.
- Maciej Liskiewicz, Rüdiger Reischuk:
The Sublogarithmic Alternating Space World.
Technischer Bericht SIIM-TR-A-95-01,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1995.
Postscript anzeigen | Zusammenfassung anzeigen
This paper tries to fully characterize the properties and relationships
of space classes defined by Turing machines
that use less than logarithmic space -- may they be deterministic,
nondeterministic or alternating (DTM, NTM or ATM).
We provide several examples of specific languages and show that such machines
are unable to accept these languages.
The basic proof method is a nontrivial extension of the
1^n -> 1^{n+n!} technique to alternating TMs.
Let \llog denote the logarithmic function \log iterated twice,
and Sigma_k Space(S), Pi_k Space (S) be the complexity classes defined
by S-space-bounded ATMs that alternate at most k-1 times and
start in an existential, resp.~universal state.
Our first result shows that for each k > 1 the sets
Sigma_k Space(llog) \ Pi_k Space (o(log)) and
Pi_k Space(llog) \ Sigma_k Space (o(log))
are both not empty. This implies that for each S in the intersection of
Omega(llog) and o(log )
Sigma_1 Space(S), Sigma_2 Space(S), Sigma_3 Space(S) .. ..
form an infinite hierarchy.
Furthermore, this separation is extended to space classes defined
by ATMs with a nonconstant alternation bound A provided that
the product A * S grows sublogarithmically.
These lower bounds can also be used to show that basic closure
properties do not hold for such classes. We obtain
that for any S in Omega(llog) intersection o(log) and all k>1
\Sigma_k Space (S) and \Pi_k Space (S) are not
closed under complementation and concatenation.
Moreover, \Sigma_k Space (S) is not closed under intersection,
and \Pi_k Space (S) is not closed under union.
It is also shown that ATMs recognizing bounded languages can always be
guaranteed to halt. For the class of Z--bounded languages with
Z <= \exp S we obtain the equality
co-Sigma_k Space(S) = Pi_k Space (S) .
Finally, for sublogarithmic bounded ATMs we give a separation between
the weak and strong space measure, and prove a logarithmic lower space
bound for the recognition of nonregular context-free languages.
Key words:
space complexity, sublogarithmic complexity bounds,
alternating Turing machines, halting computations,
complementation of languages,
complexity hierachies, closure properties,
context-free languages, bounded languages.
AMS(MOS) subject classifications: 68Q05, 68Q10, 68Q25, 68Q45
Preliminary versions of these results have been presented at the
10. GI-AFCET Symposium on Theoretical Aspects of Computer Science,
W"urzburg, 1993,
and at the 9. IEEE-SIGACT-EATCS Symposium on Structure in
Complexity Theory, Amsterdam, 1994.
A final version will appear in SIAM Journal on Computing.
- Maciej Liskiewicz, Rüdiger Reischuk:
Separating Small Space Complexity Classes of Stochastic Turing Machines.
Technischer Bericht SIIM-TR-A-96-17,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1996.
Postscript anzeigen | Zusammenfassung anzeigen
Stochastic Turing machines are Turing machines that can perform
nondeterministic and probabilistic moves.
Such devices are also called games against nature, Arthur-Merlin games,
or interactive proof systems with public coins.
We investigate stochastic machines
with space ressources between constant and logarithmic size
and constant or sublinear bounds on the number of alternations
between nondeterministic and probabilistic moves.
Separation results are proven for the corresponding complexity classes
by showing the inclusion, resp.~noninclusion of certain languages.
The lower bounds hold without any restriction on the run time
and will follow from more general combinatorial properties.
These results imply an infinite hierarchy with linear distance
based on the number of alternations.
The separations are obtained by extending and improving lower bound techniques
developed for probabilistic automata and stochastic machines by which
previously the lower levels have been separated, resp.~a hierarchy with
exponential distance has been shown.
Furthermore, we establish a hierarchy for stochastic Turing machines
with arbitrary small nonconstant space bounds.
COMMENTS: extends Technical Report A-96-08
- Maciej Liskiewicz, Rüdiger Reischuk:
Space Bounds for Interactive Proof Systems with Public Coins and Bounded Number of Rounds.
Technischer Bericht SIIM-TR-A-96-08,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1996.
Postscript anzeigen | Zusammenfassung anzeigen
This paper studies interactive proofs, resp. Arthur-Merlin games
with a sublogarithmic space-bounded verifier.
We provide examples of specific languages and show that such systems
are unable to accept these languages. As a consequence, a new separation
is obtained for the complexity classes on the second
level of the round/alternation hierarchy.
- Rüdiger Reischuk:
Zeit und Raum in Rechnernetzen.
Technischer Bericht SIIM-TR-A-96-07,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1996.
Postscript anzeigen | Zusammenfassung anzeigen
Verteilte Rechnersysteme erweisen sich als eine attraktive Architektur
mit einer Reihe von Vorteilen.
Vor allem ist hier zu nennen die lokale Verf"ugbarkeit und
die M"oglichkeit des dezentralen Managements, so da"s bei Ausfall einzelner
Komponenten nicht zwangsl"aufig das gesamte System zum Stillstand kommen
mu"s. Andererseits erzeugt gerade diese r"aumliche Trennung eine neuartige
Synchronisationsproblematik.
Wir diskutieren exemplarisch zwei Prototypprobleme: Konsistenz bei
divergenter lokaler Information -- auch Consensus oder
das Problem der Byzantinischen Gener"ale genannt -- und die
zeitliche Synchronisation der lokalen Uhren in den einzelnen Knoten
und gemeinsamer Aktionen.
Grundlegend hierf"ur ist eine pr"azise Analyse wie aus lokalem Wissen
der einzelnen Komponenten globales Wissen "uber den Gesamtzustand
des Systems gewonnen werden kann, und zwar durch den Austausch von
Nachrichten und das Eintreffen oder Nichteintreffen von Ereignissen.
Es zeigt sich, da"s in Abh"angigkeit von bestimmten Systemeigenschaften
diese Ziele nur bis zu einem gewissen Grad erreichbar sind.
Dabei k"onnen kryptografische und probabilistische Verfahren hilfreich sein.
Auf die wesentlichen Systemparameter und ihre Bedeutung wird n"aher eingegangen.
Fehlertoleranz ist in manchen F"allen "uberhaupt nicht erzielbar.
Wir stellen in diesem Aufsatz
einige der grundlegenden Ergebnisse und methodischen Ans"atze vor.
- Andreas Jakoby, Rüdiger Reischuk:
Scheduling Trees with Communication Delays.
Technischer Bericht SIIM-TR-A-97-15,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1997.
Postscript anzeigen | Zusammenfassung anzeigen
Several variants of scheduling task graphs with interprocessor
communication delays have been shown to be NP-complete.
This paper completely characterizes the complexity of this problem under
the restriction that the task graphs are trees and that each task has
unit execution time. It is shown that the problem remains
NP-complete for binary trees with uniform communication delays.
The same holds for complete binary trees, but varying communication delays.
On the other hand, a polynomial time algorithm
is presented and analyzed that constructs an optimal schedule
for complete k-ary trees with uniform communication delays.
Finally, the parallel complexity of scheduling trees and dags with constant
communication delay is estimated by proving inclusion in NC, resp.
P-completeness results.
- Rüdiger Reischuk:
Can Large Fanin Circuits Perform Reliable Computations in the Presence of Faults?
Technischer Bericht SIIM-TR-A-97-05,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1997.
Postscript anzeigen | Zusammenfassung anzeigen
For ordinary circuits with a fixed upper bound on the maximal fanin
of gates it has been shown that logarithmic redundancy is necessary and
sufficient to overcome random hardware faults.
Here, we consider the same question for unbounded fanin circuits that
in the noiseless case can compute Boolean functions in sublogarithmic depth.
In this case the details of the fault model become more important.
One may assume that only gates, resp.~only wires may deliver wrong values,
or that both gates and wires may behave faulty due to random noise.
The fault tolerance depends on the types of gates that are used, and whether
the error probabilities are known exactly or only an upper bound for them.
Concerning the first distinction the two most important models are circuits
consisting of and/or-gates with arbitrarily many inputs,
and circuits built from the more general type of threshold gates.
We will show that reliable computation is basically impossible
for such circuits with unknown error probabilities.
Gates with large fanin are of no use in this case.
Circuits of arbitrary size, but fixed depth can compute
only a tiny subset of all Boolean functions reliably.
Only in case of threshold circuits and exactly known error probabilities
redundancy is able to compensate faults.
We describe a transformation from fault-free to fault-tolerant circuits
that is optimal with respect to depth keeping the circuit size polynomial.
- Rüdiger Reischuk, Thomas Zeugmann:
Learning One-Variable Pattern Languages in Linear Average Time.
Technischer Bericht SIIM-TR-A-97-13,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1997.
Postscript anzeigen | Zusammenfassung anzeigen
A new algorithm for learning one-variable pattern languages is
proposed and analyzed with respect to its average-case behavior.
We consider the total learning time that takes into account all
operations till an algorithm has converged to a correct hypothesis.
For the expectation it is shown that for almost all meaningful distributions
defining how the pattern variable is replaced by a string to generate
random samples of the target pattern language this algorithm converges
within a constant number of rounds with a total learning time
that is linear in the pattern length.
Thus, the algorithm is average-case optimal in a strong sense.
- Karin Genther, Rüdiger Reischuk:
Analysing Data Access Strategies in Cache Coherent Architectures.
Technischer Bericht SIIM-TR-A-98-25,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1998.
Postscript anzeigen | Zusammenfassung anzeigen
The conception of multiprocessor systems based on shared memory is
to provide logically uniform data access, even in systems with many
processors. Since data requests that cannot be satisfied physically
within the processor environment suffer from high latencies, these
systems employ memory hierarchies including private caches. Two
powerful techniques that reduce and hide latency in memory
hierarchies are buffering and pipelining of data accesses. In order
to utilize these techniques in global shared-memory architectures,
new memory consistency models have been defined. In this paper, we
discuss the cache coherence problem and three consistency models
that are commonly used to enhance performance. We develop a uniform
parameterized model for different parallel architectures and
estimate the latency for data access in each consistency model
abstracting from the details of hardware mechanisms. The goal of
our approach is to obtain realistic predictions of running-times for
the various models.
- Andreas Jakoby, Rüdiger Reischuk:
Average Case Complexity of Unbounded Fanin Circuits.
Technischer Bericht SIIM-TR-A-98-17,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1998.
Postscript anzeigen | Zusammenfassung anzeigen
Several authors have shown that the PARITY-function cannot be computed by
unbounded fanin circuits of small depth and polynomial size.
Even more, constant depth k circuits of size exp n^Theta(1/k) give
wrong results for PARITY for almost half of all inputs.
We generalize these results in two directions.
First, we obtain similar tight lower bounds for the average
case complexity of circuits, measuring the computational delay
instead of the static circuit depth.
Secondly, with respect to average delay of unbounded fanin circuits
we completely classify all parallel prefix functions,
for which PARITY is just one promiment example.
It is shown that only two cases can occur: a parallel prefix functions f either
has the same average complexity as PARITY, that is the average delay has to be
of order Theta(log n / loglog s) for circuits of size s,
or f can be computed with constant average delay and almost linear size --
there is no complexity level in between.
This classification is achieved by analyzing the algebraic structure of
semigroups that correspond to parallel prefix functions.
It extends methods developed for bounded fanin circuits
by the first author in his Ph.D. Thesis.
- Andreas Jakoby, Maciej Liskiewicz, Rüdiger Reischuk:
Scheduling Dynamic Graphs.
Technischer Bericht SIIM-TR-A-98-07,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1998.
Postscript anzeigen | Zusammenfassung anzeigen
In parallel and distributed computing scheduling low level tasks on the
available hardware is a fundamental problem. Traditionally, one has assumed
that the set of tasks to be executed is known beforehand. Then the scheduling
constraints are given by a precedence graph. Nodes represent the elementary
tasks and edges the dependencies among tasks. This static approach is not
appropriate in situations where the set of tasks is not known exactly in
advance, for example, when different options how to continue a program may
be granted.
In this paper a new model for parallel and distributed programs, the dynamic
process graph will be introduced, which represents all possible executions of
a program in a compact way. The size of this representation is small -- in
many cases only logarithmically with respect to the size of any execution.
An important feature of our model is that unlike precedence graphs, the
encoded executions are directed acyclic graphs having a "regular" structure
that is typical of parallel programs. Dynamic process graphs embed
constructors for parallel programs, synchronization mechanisms as well as
conditional branches. With respect to such a compact representation we
investigate the complexity of different aspects of the scheduling problem:
the question whether a legal schedule exists at all and how to find an
optimal schedule. Our analysis takes into account communication delays
between processors exchanging data.
Precise characterization of the computational complexity of various variants
of this compact scheduling problem will be given in this paper. The results
range from easy, that is NL-complete, to very hard, namely NEXPTIME-complete.
- Rüdiger Reischuk, Thomas Zeugmann:
A Complete and Tight Average-Case Analysis of Learning Monomials.
Technischer Bericht IIM-TR-A-98-15,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1998.
Postscript anzeigen | Zusammenfassung anzeigen
We advocate to analyze the average complexity of learning problems.
An appropriate framework for this purpose is introduced.
Based on it we consider the problem of learning monomials and the
special case of learning monotone monomials
in the limit and for on-line predictions in two variants:
from positive data only, and from positive and negative examples.
The well-known Wholist algorithm is completely analyzed,
in particular its average-case behavior with respect to the class of
binomial distributions. We consider different complexity measures:
the number of mind changes, the number of prediction errors, and
the total learning time.
Tight bounds are obtained implying that worst case bounds are too pessimistic.
On the average learning can be achieved exponentially faster.
Furthermore, we study a new learning model, stochastic finite learning,
in which, in contrast to PAC learning, some information about the
underlying distribution is given and the goal is to find a correct
(not only approximatively correct) hypothesis.
We develop techniques to obtain good bounds for stochastic finite learning
from a precise average case analysis of strategies for learning in the limit
and illustrate our approach for the case of learning monomials.
- Rüdiger Reischuk, Thomas Zeugmann:
An Average-Case Optimal One-Variable Pattern Learner.
Technischer Bericht SIIM-TR-A-98-22,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
1998.
Postscript anzeigen | Zusammenfassung anzeigen
A new algorithm for learning one-variable pattern languages from positive data
is proposed and analyzed with respect to its average-case behavior.
We consider the total learning time that takes into account all
operations till convergence to a correct hypothesis is achieved.
For almost all meaningful distributions
defining how the pattern variable is replaced by a string to generate
random examples of the target pattern language,
it is shown that this algorithm converges within an expected
constant number of rounds and a total learning time that is linear in the
pattern length. Thus, our solution is average-case optimal in a strong sense.
Though one-variable pattern languages can neither be finitely inferred
from positive data nor PAC-learned,
our approach can also be extended to a probabilistic finite learner
that it exactly infers all one-variable pattern languages from positive
data with high confidence.
It is a long standing open problem whether pattern languages
can be learned in case that substitutions of pattern variables by
the empty string can also occur.
Our learning strategy can be generalized to this situation as well.
Finally, we show some experimental results for the behaviour
of this new learning algorithm in pratice.
This TR extends A-97-13
- Andreas Jakoby, Maciej Liskiewicz, Rüdiger Reischuk:
Space Efficient Algorithms for Series-Parallel Graphs.
Technischer Bericht SIIM-TR-A-00-17,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2000.
Postscript anzeigen | Zusammenfassung anzeigen
The subclass of directed series-parallel graphs plays an important role in
computer science. To determine whether a graph is series-parallel is a
well studied problem in algorithmic graph theory. Fast sequential and
parallel algorithms for this problem have been developed in a sequence
of papers. For series-parallel graphs methods are also known to solve
the reachability and the decomposition problem time efficiently.
However, no dedicated results have been obtained for the space complexity
of these problems -- the topic of this paper.
For this special class of graphs, we develop deterministic algorithms
for the recognition, reachability, decomposition and the path counting
problem that use only logarithmic space. Since for arbitrary directed
graphs reachability and path counting are believed not to be solvable
in log-space the main contribution of this work are novel deterministic
path finding routines that work correctly in series-parallel graphs,
and a new characterization of series-parallel graphs by forbidden subgraphs.
We then sketch how these results can be generalised to extension of
the series-parallel graph family: to graphs with multiple sources or multiple
sinks and to the class of minimal vertex series-parallel graphs.
It it also shown that the space bounds are best possible,
i.e. the decision problems are L-complete with respect to AC-reductions.
Finally, we discuss implications of theses small space bounds for
the parallel time complexity of series-parallel graphs.
- Andreas Jakoby, Maciej Liskiewicz, Rüdiger Reischuk:
The Expressive Power and Complexity of Dynamic Process Graphs.
Technischer Bericht SIIM-TR-A-00-07,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2000.
Postscript anzeigen | Zusammenfassung anzeigen
A model for parallel and distributed programs, the dynamic process graph,
is investigated under graph-theoretic and complexity aspects. Such graphs
are capable of representing all possible executions of a parallel or
distributed program in a very compact way. The size of this representation
is small -- in many cases only logarithmic with respect to the size of any
execution of the program. An important feature of this model is that
the encoded executions are directed acyclic graphs with a regular structure,
which is typical of parallel programs, and that it embeds constructors for
parallel programs, synchronization mechanisms as well as conditional branches.
In a previous paper we have analysed the expressive power of the general
model and various restrictions. Furthermore, from an algorithmic point
of view it is important to decide whether a given dynamic process graph can
be executed correctly and to estimate the minimal deadline given enough
parallelism. Our model takes into account communication delays between
processors when exchanging data.
In this paper we study a variant with output restriction. It is appropriate
in many situations, but its expressive power has not been known exactly.
First, we investigate structural properties of the executions of such dynamic
process graphs G. A natural graph-theoretic conjecture that executions must
always split into components isomorphic to subgraphs of G turns out to be
wrong. We are able to establish a weaker property. This implies a quadratic
bound on the maximal deadline in contrast to the general case, where
the execution time may be exponential. However, we show that the problem
to determine the minimal deadline is still intractable, namely this problem
is NEXPTIME-complete as is the general case. The lower bound is obtained by
showing that this kind of dynamic process graphs can represent certain
Boolean formulas in a highly succinct way.
This is an extended abstract to be presented at 26th International Workshop
on Graph-Theoretic Concepts in Computer Science, Konstanz, Germany, 2000.
- Andreas Jakoby, Maciej Liskiewicz, Rüdiger Reischuk:
Dynamic Process Graphs and the Complexity of Scheduling.
Technischer Bericht SIIM-TR-A-00-02,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2002.
Postscript anzeigen | Zusammenfassung anzeigen
In parallel and distributed computing scheduling low level tasks on the
available hardware is a fundamental problem. Traditionally, one has
assumed that the set of tasks to be executed is known beforehand.
Then the scheduling constraints are given by a precedence graph.
Nodes represent the elementary tasks and edges the dependencies among
tasks. This static approach is not appropriate in situations where the
set of tasks is not known exactly in advance, for example, when different
options how to continue a program may be granted.
In this paper a new model for parallel and distributed programs, the dynamic
process graph, will be introduced, which represents all possible executions
of a program in a compact way. The size of this representation is small -- in
many cases only logarithmically with respect to the size of any execution.
An important feature of our model is that the encoded executions are directed
acyclic graphs having a 'regular' structure that is typical of parallel
programs.
Dynamic process graphs embed constructors for parallel programs,
synchronization mechanisms as well as conditional branches. With respect
to such a compact representation we investigate the complexity of different
aspects of the scheduling problem: the question whether a legal schedule
exists at all and how to find an optimal schedule. Our analysis takes into
account communication delays between processors exchanging data.
Precise characterization of the computational complexity of various variants
of this compact scheduling problem will be given in this paper. The results
range from easy, that is NL-complete, to very hard, namely NEXPTIME-complete.
This is a revised and complete version of results discussed in TR A-98-07.
An extended abstract of this paper has been presented at 16th International
Symposium on Theoretical Aspects of Computer Science, Trier, Germany, 1999.
- Bodo Manthey, Rüdiger Reischuk:
The Intractability of Computing the Hamming Distance.
Technischer Bericht SIIM-TR-A-02-17,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2002.
Postscript anzeigen | Zusammenfassung anzeigen
Given a language $L$, the Hamming distance of a string $x$ to $L$
is the minimum Hamming distance of $x$ to any string in $L$. The
edit distance of a string to a given language is defined similarly.
First, we prove that the Hamming distance to languages in $\AC^0$
is hard to approximate; it cannot be approximated within a factor
$O\bigl(n^{\frac 13 - \epsilon}\bigr)$, for any $\epsilon > 0$,
unless $\mathsf P = \mathsf{NP}$ ($n$ denotes the length of the
input string).
Second, we show the parameterised intractability of computing the
Hamming distance. We prove that for every $t \in \nat$ there exists
a language in $\AC^0$ such that the problem of computing the
Hamming distance is $\mathsf W[t]$-hard. Moreover, there is a
language in \DP\ such that computing the Hamming distance to it is
$\mathsf W[P]$-hard.
Furthermore, we show that the problems of computing the Hamming
distance and of computing the edit distance are in some sense
equivalent by presenting reductions from the former to the latter
problem and vice versa.
- Jan Arpe, Rüdiger Reischuk:
Robust Inference of Functional Relations.
Technischer Bericht SIIM-TR-A-03-12,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2003.
Postscript anzeigen | Zusammenfassung anzeigen
Given n Boolean input variables representing a set of attritubes, we
consider Boolean functions f (i.e., binary classifications of tuples)
that actually depend only on a small but unknown subset of these
variables/attributes, in the following called relevant. The goal is to
determine these relevant attributes given a sequence of examples -
input vectors X with their classification f(X).
We analyze two simple greedy strategies and prove that they are able
to achieve this goal for various kinds of Boolean functions and
various input distributions according to which the examples are drawn
at random. This generalizes results obtained by Akutsu, Miyano, and
Kuhara for the uniform distribution.
The analysis also provides explicit upper bounds on the number of
necessary examples. They depend on the distribution and combinatorial
properties of the function to be inferred.
Our second contribution is an extension of these results to a
situation where the examples contain errors, that means a certain
number of input bits x_i may be wrong. This is a typical situation,
for example in medical research, where not all attributes can be
measured reliably.
We show that even in such an error-prone situation reliable inference
of the relevant attributes can be performed by proving that our greedy
strategies are robust even against a linear number of errors.
- Jan Arpe, Rüdiger Reischuk:
On the Complexity of Optimal Grammar Based Compression.
Technischer Bericht SIIM-TR-A-04-14,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2004.
Postscript anzeigen | Zusammenfassung anzeigen
The task of grammar-based compression is to find a small context-free grammar generating exactly one given string.
We investigate the relationship between grammar-based compression of strings over unbounded and bounded alphabets.
Specifically, we show how to transform a grammar for a string over an unbounded alphabet into a grammar for a block
coding of that string over a fixed bounded alphabet and vice versa. From these constructions, we obtain asymptotically
tight relationships between the minimum grammar sizes for strings and their block codings. Finally, we exploit an
improved bound of our construction for overlap-free block codings to show that a polynomial time algorithm for
approximating the minimum grammar for binary strings within a factor of c yields a polynomial time algorithm for
approximating the minimum grammar for strings over arbitrary alphabets within a factor of 24c+? (for arbitrary ?>0).
Since the latter problem is known to be NP-hard to approximate within a factor of 8569/8568, we provide a first step
towards solving the long standing open question whether minimum grammar-based compression of binary strings is
NP-complete.
- Bodo Manthey, Rüdiger Reischuk:
Smoothed Analysis of Binary Search Trees.
Technischer Bericht SIIM-TR-A-05-17,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2005.
Postscript anzeigen | Zusammenfassung anzeigen
Binary search trees are one of the most fundamental data structures. While the worst case height of a binary search tree is linear, the average height of binary search trees under the uniform distribution is logarithmic and one of the best studied problems in average case complexity.
We investigate what happens between worst and average case, namely the smoothed height of binary search trees: we randomly perturb a given (adversarial) sequence and then consider the height of the binary search tree generated by the resulting sequence. As perturbation models, we consider partial permutations, partial alterations, and partial deletions.
On the one hand, we prove tight lower and upper bounds of roughly ?(?n) (in contrast to height n in the worst case and ?(log n) on average) for the expected height of binary search trees under partial permutations and partial alterations. That means worst case instances break down under slight perturbations. On the other hand, we examine how much a perturbation can increase the height of a binary search tree, i.e. how much worse best case instances can become.
- Maciej Liskiewicz, Rüdiger Reischuk, Ulrich Wölfel:
Grey-Box Steganography.
Technischer Bericht SIIM-TR-A-09-03,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2009.
PDF anzeigen | Zusammenfassung anzeigen
In Steganography secret messages are encoded into unsuspicious covertexts such that an
adversary cannot distinguish the resulting stegotexts from original covertexts. To accomplish their respective tasks, encoder and adversary need information about the covertext distribution. In previous analyses, the knowledge about the covertext channel was unbalanced: while the adversary had full knowledge, the encoder could only query a black-box sampling oracle. In such a situation, the only general steganographic method known is rejection sampling, for which the sampling complexity has been shown to be exponential in the message length. To overcome this somewhat unrealistic scenario
and to get finer-grained security analyses, we propose a new model, called grey-box steganography. Here, the encoder starts with at least some partial knowledge about the type of covertext channel. Using the sampling oracle, he first uses machine learning techniques to learn the covertext distribution and then tries to actively construct a suitable stegotext – either by modifying a covertext or by creating a new one. The efficiency of grey-box steganography depends on the learning complexity of the concept class for the covertext channel, the efficiency of membership tests, and the complexity of the modification
procedure. We will give examples showing that this finer-grained distinction provides more insight and helps constructing efficient stegosystems that are provably secure against chosen hiddentext attacks. This way we can also easily model semantic steganography.