A Fault Tolerant Java Virtual Machine

2005-06-22
Introduction
What is fault tolerance?
Definition
... is the property of a system that continues operating properly in the
event of failure of some of its parts.
www.wikipedia.org
In our case we implement a system (the JRE) that tolerates fail-stop
failures: in response to a failure, a component changes to a state that
permits other components to detect that a failure has occurred, and then
stops. Note that this does not cover Byzantine failures.
Malte Tiedje
Seminar: Zuverlässigkeit von Software in sicherheitskritischen Systemen (Reliability of Software in Safety-Critical Systems)
28 June 2005
Why Java?
Java is ...
portable
secure: strong typing, ...
distributed: RMI
and of course: object-oriented, simple, widely used
but Java is not fault-tolerant
Because the JVM is defined independently of the hardware that
implements it, Java programs can run unmodified on any platform
that implements a JVM. The implementation described here likewise changes
only machine-independent code to achieve fault tolerance.
The approach
4 Steps
1. Define a deterministic state machine as the unit of replication
2. Implement independently failing replicas of the state machine
3. Ensure that all replicas start from identical states and perform the same
sequence of state transitions
4. Ensure that each output-producing transition results in a single output to
the environment
Currently, fault tolerance is handled at the application level, for example
with transaction numbers or group technology.
State Machines
State machines are ...
a set of state variables and a sequence of commands
A command ...
reads a subset of the state variables (read-set values = rsvs)
modifies a subset of the state variables (write-set values = wsvs)
A command is deterministic ...
when it produces deterministic wsvs and outputs for a given rsvs
A deterministic state machine ...
reads a fixed sequence of deterministic commands
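The definitions above can be illustrated with a minimal Java sketch. All names here (Command, StateMachine, StateMachineDemo, ADD, STAMP) are hypothetical and chosen only for illustration; they are not taken from the talk or from any JVM. The sketch shows one deterministic command, whose written value depends only on its read set, and one non-deterministic command, whose written value depends on the clock.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A command reads some state variables (its read set) and writes others (its write set). */
interface Command {
    void apply(Map<String, Long> state);
}

/** A state machine: a set of state variables plus a sequence of commands applied in order. */
final class StateMachine {
    private final Map<String, Long> state = new HashMap<>();

    StateMachine(Map<String, Long> initialState) {
        state.putAll(initialState);
    }

    void run(List<Command> commands) {
        for (Command c : commands) {
            c.apply(state);
        }
    }

    Map<String, Long> snapshot() {
        return new HashMap<>(state);
    }
}

final class StateMachineDemo {
    /** Deterministic: the written value "sum" depends only on the read set {x, y}. */
    static final Command ADD = s -> s.put("sum", s.get("x") + s.get("y"));

    /** Non-deterministic: the written value "t" depends on the clock, not on the state. */
    static final Command STAMP = s -> s.put("t", System.nanoTime());

    /** Starts a fresh replica from a fixed initial state and runs the given commands. */
    static Map<String, Long> runReplica(List<Command> commands) {
        Map<String, Long> initial = new HashMap<>();
        initial.put("x", 2L);
        initial.put("y", 3L);
        StateMachine replica = new StateMachine(initial);
        replica.run(commands);
        return replica.snapshot();
    }

    public static void main(String[] args) {
        List<Command> program = List.of(ADD, ADD);
        // Same initial state and same deterministic commands: both replicas end in the same state.
        System.out.println(runReplica(program).equals(runReplica(program)));
    }
}
```

Replacing ADD with STAMP in the program would generally make the two replicas diverge, which is the situation the later slides on non-deterministic commands address.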
Fault-tolerance by duplication
Replication
Definition
Providing multiple identical instances of the same system, directing tasks
to all of them in parallel, and choosing the correct result on the basis of a
quorum
www.wikipedia.org
Each replica undergoes the same sequence of state transitions and
produces the same output!
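The quorum in the replication definition can be sketched as a simple majority vote over the results reported by the replicas. This is only an illustration of the quoted definition: the rest of these slides describe a primary and a backup rather than voting, and the class and method names (QuorumVote, majority) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class QuorumVote {
    /** Returns the result reported by a strict majority of replicas, if there is one. */
    static <T> Optional<T> majority(List<T> replicaResults) {
        Map<T, Integer> counts = new HashMap<>();
        for (T result : replicaResults) {
            counts.merge(result, 1, Integer::sum);
        }
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            if (e.getValue() * 2 > replicaResults.size()) {
                return Optional.of(e.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(majority(List.of(42, 42, 7))); // prints Optional[42]
        System.out.println(majority(List.of(1, 2, 3)));   // prints Optional.empty
    }
}
```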
JVM as State Machine I
Implementing replica coordination in the JVM poses 3 challenges:
1. Not all commands executed by the JVM are deterministic
2. Replicas of a JVM do not in general execute identical sequences of
commands
3. The read set for a given command is not guaranteed to contain
identical values at all replicas
JVM as State Machine II
Problem: the JVM is multi-threaded, whereas state machines typically are not
Solution: every thread is a state machine, and the JVM is a set of
cooperating state machines
In particular: BEEs (Bytecode Execution Engines), sets of functions
that together define a replica
Napper 2003
Although BEEs do not explicitly exist as components of the JVM, we
can conceptually associate a BEE with a set of functions that perform
bytecode execution and track the state of each thread.
Details
Non-deterministic commands
Invoked exclusively through the Java Native Interface (JNI)
e.g. reading the hardware clock
Problem: the replicas see different input values, because the input is
performed outside the scope of the JVM
Solution: the protocol forces the backup to adopt the write-set
values produced by the primary
But this is not enough: we have to restrict the behavior of native
methods
Restriction 1
Native methods must not produce non-deterministic output to the
environment
Example
// violates Restriction 1: prints a non-deterministic value
native void DoNotDo() {
    lc = read_time_of_day();
    print(lc);
}
// input and output split into separate native methods
native long Input() {
    return read_time_of_day();
}
native void Output(long lc) {
    print(lc);
}
Restriction 2
Native methods must invoke other methods deterministically
Example
// violates Restriction 2: the lock is acquired depending on a non-deterministic value
native void DoNotDo() {
    lc = read_time_of_day();
    if (lc > 17:24:32)
        acquire_lock();
}
// only the input is native; the decision is made outside the native method
native long Input() {
    return read_time_of_day();
}
void do(long lc) {
    lc = Input();
    if (lc > 17:24:32)
        acquire_lock();
}
Implementation
Checked all native methods in the JRE libraries
fewer than 100 are non-deterministic
Stored the signature of each such method in a hash table
(class, method, arguments)
When the primary invokes a native method, check the hash table
On a match, send the return values and modified arguments to the backup
On recovery, the backup may use the logged values
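The hash-table lookup and logging described under Implementation might look roughly like the following sketch. It is an assumption-laden illustration, not the implementation from the talk: the class NativeCallReplication, its methods, the use of plain strings as signatures, and the in-memory queue standing in for the channel between primary and backup are all hypothetical, and only return values of type long are handled.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;
import java.util.function.LongSupplier;

final class NativeCallReplication {
    /** Signatures of native methods known to be non-deterministic (hypothetical format). */
    private final Set<String> nonDeterministic = new HashSet<>();
    /** Stand-in for the channel from primary to backup; a real system would use the network. */
    private final Queue<Long> log = new ArrayDeque<>();

    void register(String signature) {
        nonDeterministic.add(signature);
    }

    /** Primary: run the native call and, on a hash-table match, log its result for the backup. */
    long invokeAtPrimary(String signature, LongSupplier nativeCall) {
        long result = nativeCall.getAsLong();
        if (nonDeterministic.contains(signature)) {
            log.add(result);
        }
        return result;
    }

    /** Backup: adopt the primary's logged result instead of running a non-deterministic call. */
    long invokeAtBackup(String signature, LongSupplier nativeCall) {
        if (nonDeterministic.contains(signature)) {
            Long logged = log.poll(); // null once the primary's log is exhausted
            if (logged != null) {
                return logged;
            }
        }
        // Deterministic call, or no logged value left: execute the call locally.
        return nativeCall.getAsLong();
    }

    public static void main(String[] args) {
        NativeCallReplication r = new NativeCallReplication();
        r.register("clock");
        long atPrimary = r.invokeAtPrimary("clock", System::currentTimeMillis);
        long atBackup = r.invokeAtBackup("clock", System::currentTimeMillis);
        System.out.println(atPrimary == atBackup); // backup adopted the primary's value
    }
}
```

Once the log is empty, the backup in this sketch simply executes the call itself, which corresponds to the backup continuing on its own after it has consumed all logged values.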
Non-deterministic rsvs
Non-deterministic Read Sets I
Because of multi-threading in the JVM, the values of shared variables
are non-deterministic
Solution I:
All access to shared data is wrapped by correct use of monitors (using
synchronized)
therefore we need to replicate the lock synchronization
Sun’s JVM provides two implementations of multithreading. The native
threads version leaves scheduling to the underlying OS, while the green
threads version implements a user-level thread library for a uniprocessor
inside the JVM.
Implementation I
Definition
< tid, tasn, lid, lasn >
tid: thread id of the locking thread
tasn: thread acquire sequence number, recording the number of
locks acquired so far by thread tid
lid: lock id
lasn: lock acquire sequence number, recording the number of times
lid has been acquired so far
Napper 2003
Implementation II
Hard to create unambiguous ids
Cannot use the object address as lid
Cannot use the order of events at the primary
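A minimal sketch of how the tuple < tid, tasn, lid, lasn > could be recorded at the primary and enforced at the backup is given below. It rests on several assumptions that the slides do not confirm: thread ids and lock ids are plain numbers that are identical at both replicas (which the slides note is hard to achieve), sequence numbers start at 1 for the first acquisition, and the class names LockRecord, PrimaryLockLog, and BackupLockGate are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One logged acquisition: the tasn-th acquisition by thread tid was the lasn-th of lock lid. */
final class LockRecord {
    final long tid, tasn, lid, lasn;

    LockRecord(long tid, long tasn, long lid, long lasn) {
        this.tid = tid;
        this.tasn = tasn;
        this.lid = lid;
        this.lasn = lasn;
    }
}

final class PrimaryLockLog {
    private final Map<Long, Long> acquiredByThread = new HashMap<>();
    private final Map<Long, Long> acquiredOnLock = new HashMap<>();
    final List<LockRecord> records = new ArrayList<>();

    /** Called when thread tid acquires lock lid at the primary. */
    LockRecord onAcquire(long tid, long lid) {
        long tasn = acquiredByThread.merge(tid, 1L, Long::sum);
        long lasn = acquiredOnLock.merge(lid, 1L, Long::sum);
        LockRecord r = new LockRecord(tid, tasn, lid, lasn);
        records.add(r);
        return r;
    }
}

final class BackupLockGate {
    private final Map<String, LockRecord> byLockTurn = new HashMap<>();
    private final Map<Long, Long> acquiredByThread = new HashMap<>();
    private final Map<Long, Long> acquiredOnLock = new HashMap<>();

    BackupLockGate(List<LockRecord> logFromPrimary) {
        for (LockRecord r : logFromPrimary) {
            byLockTurn.put(r.lid + ":" + r.lasn, r);
        }
    }

    /** True if the primary gave the next turn on lock lid to thread tid at this point of its run. */
    boolean mayAcquire(long tid, long lid) {
        long nextLasn = acquiredOnLock.getOrDefault(lid, 0L) + 1;
        long nextTasn = acquiredByThread.getOrDefault(tid, 0L) + 1;
        LockRecord r = byLockTurn.get(lid + ":" + nextLasn);
        return r != null && r.tid == tid && r.tasn == nextTasn;
    }

    /** Records that thread tid has now acquired lock lid at the backup. */
    void acquired(long tid, long lid) {
        acquiredByThread.merge(tid, 1L, Long::sum);
        acquiredOnLock.merge(lid, 1L, Long::sum);
    }
}
```

In this sketch a backup thread that asks too early is simply refused; a real implementation would block it until its turn arrives.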
Non-deterministic Read Sets II
Solution I: many programs do not meet this condition (not even Sun’s
JRE)
Example
class Example {
    static Formatter shared_data = null;
    String toString() {
        if (shared_data == null) {   // shared data accessed outside any monitor
            shared_data = new Formatter();
            synchronized_method();
            ...
        }
    }
}
Solution II:
A thread has exclusive access to all shared variables while scheduled
therefore we need to replicate the thread scheduling
Implementation
Definition
< brcnt, pcoff, moncnt, lasn, tid >
brcnt: counts the control flow changes executed (e.g. branches,
jumps, and method invocations)
pcoff: records the bytecode offset of the PC within the method
currently executed by t
moncnt: counts the monitor acquisitions and releases performed by t
lasn: records the lock acquisition sequence number when t is
rescheduled while waiting on a lock
tid: the thread id of the next scheduled thread
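Replicating the thread scheduling means the backup must switch threads at the same point in execution as the primary. A minimal sketch, assuming the record < brcnt, pcoff, moncnt, lasn, tid > from the slides, could compare a thread's progress counters against a logged record. The class names ThreadProgress and ScheduleRecord and the method reachedBy are hypothetical, and the sketch ignores lasn, which the slides say matters only when a thread is rescheduled while waiting on a lock.

```java
/** Progress counters a JVM would have to maintain for the currently running thread t. */
final class ThreadProgress {
    long brcnt;  // control flow changes executed so far
    int pcoff;   // bytecode offset of the PC within the current method
    long moncnt; // monitor acquisitions and releases performed so far
}

/** One scheduling decision logged by the primary. */
final class ScheduleRecord {
    final long brcnt;
    final int pcoff;
    final long moncnt;
    final long lasn;    // not used in this sketch
    final long nextTid; // thread the primary scheduled next

    ScheduleRecord(long brcnt, int pcoff, long moncnt, long lasn, long nextTid) {
        this.brcnt = brcnt;
        this.pcoff = pcoff;
        this.moncnt = moncnt;
        this.lasn = lasn;
        this.nextTid = nextTid;
    }

    /** True once the backup's thread has reached the point where the primary switched threads. */
    boolean reachedBy(ThreadProgress p) {
        return p.brcnt == brcnt && p.pcoff == pcoff && p.moncnt == moncnt;
    }
}
```

When reachedBy returns true, the backup in this sketch would preempt the running thread and schedule nextTid, mirroring the primary's decision.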
Output to the environment
Objective: simulate a single, fault-tolerant state machine
In general impossible
Restriction 3
All native method output to the environment is either idempotent or
testable
therefore we need a Side Effect Handler
A function f is idempotent iff (f ∘ f)(x) = f(x) for all x.
An action is testable if the environment can be queried to determine whether a
specific output has completed.
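The two properties can be illustrated with a small sketch of what a backup might do with an output whose completion status is unknown after the primary fails. Only the operation name test is taken from the Side Effect Handler slide; the interface shape, the perform method, and the classes AppendSink, Cell, and Recovery are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handler shape for one kind of output to the environment. */
interface SideEffectHandler {
    /** Queries the environment: did the given output already complete? */
    boolean test(String output);

    /** Performs the output. */
    void perform(String output);
}

/** Testable but not idempotent: appending twice duplicates the line, but the sink can be queried. */
final class AppendSink implements SideEffectHandler {
    final List<String> lines = new ArrayList<>();

    public boolean test(String output) {
        return lines.contains(output);
    }

    public void perform(String output) {
        lines.add(output);
    }
}

/** Idempotent: writing the same value twice leaves the same state as writing it once. */
final class Cell {
    String value;

    void write(String v) {
        value = v;
    }
}

final class Recovery {
    /** Backup takes over with an uncertain output: repeat it only if it did not complete. */
    static void completeUncertainOutput(SideEffectHandler handler, String output) {
        if (!handler.test(output)) {
            handler.perform(output);
        }
    }
}
```

An idempotent output such as Cell.write can simply be repeated by the backup, while a testable one goes through a check like completeUncertainOutput so that the environment sees it exactly once.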
Side Effect Handler
register: the method’s signature, what should be logged, etc.
test: called on testable, uncertain commands
log & receive: how primary and backup exchange state
restore: called at the backup during recovery
Napper 2003
Evaluation
Overhead: depends on the application and on the technique (replicated lock
synchronization or replicated thread scheduling)
Experiments: SPEC JVM98 benchmark (among others: compress, db, raytracer
rendering)
Results: overhead ranges from 5% up to 375%; average 60% for rts (replicated
thread scheduling), 140% for rla (replicated lock acquisition)
[Figure: Eval.: Replicated Lock Acquisition]
[Figure: Eval.: Replicated Thread Scheduling]
References
[Napper 2003] Jeff Napper, Lorenzo Alvisi, Harrick Vin: A Fault-Tolerant Java Virtual Machine
http://www.cs.utexas.edu/users/jmn/papers/napper03fault.ppt
www.wikipedia.org
Conclusion
A fault-tolerant JVM (at a reasonable cost)
Write Once, Run Anywhere
A framework for replicating multi-threaded state machines (SMs)