Collaborative Climate Community Data and Processing Grid (C3Grid)

T5.1: Grid Data Management Architecture and Specification

Work Package: AP 5: Grid Data Management
Author(s): Tobias Langhammer, Florian Schintke
Version Identifier: C3Grid-T5.1-002-Draft
Publication Date: March 2006
Work Package Coordinator: Zuse Institute Berlin (ZIB)
Partners: ZIB, GKSS, Uni Köln, FUB, DLR, DWD
Contact: Tobias Langhammer, Florian Schintke
E-mail: [email protected], [email protected]
Contents

1. Introduction
   1.1. Notation†
   1.2. Common Terms†
2. The C3Grid Project†
3. Global C3Grid Architecture†
4. The C3Grid Grid Data Management System
   4.1. Requirements
   4.2. Available Tools and Solutions
        4.2.1. dCache
        4.2.2. Storage Resource Broker (SRB)
        4.2.3. gridFTP
        4.2.4. Chimera
   4.3. Open Challenges
        4.3.1. Grid State Prediction
        4.3.2. ISO 19115 as Scheme for Climate Metadata
5. Architecture of the Grid Data Management System
   5.1. Design
6. AP5 Use Cases
   6.1. Querying the Current State
   6.2. Querying State Predictions
   6.3. Agreements
   6.4. Staging and Transferring Files
   6.5. Registration of Generated Files
   6.6. High-level File Operations
7. Interfaces of the Grid Data Management System
   7.1. Interface for the Workflow Scheduler
        7.1.1. Exceptions
        7.1.2. Datatypes
        7.1.3. Operations of the DataService
   7.2. Interface of the Primary Data Provider for the Data Management Service
        7.2.1. Operations of the DatabaseAccess
        7.2.2. Operations of the FlatFileAccess
8. Authorization†
9. Conclusion
A. Interface Specification Syntax†
B. Questionnaire for Users and Data Providers
   B.1. Important Aspects
   B.2. Miscellaneous
   B.3. Aspects of Data Management
   B.4. Aspects of Metadata Management
C. Quotes from the Development Discussion†
   C.1. E-Mail
   C.2. C3Grid-Wiki
        C.2.1. WF im Portal (Stand 06.03.2006)
        C.2.2. Metadaten im WF (Stand 06.03.2006)
        C.2.3. AG-Metadaten (Auszug, Stand 06.02.2006)
        C.2.4. AG-Metadaten-Meeting1 (Stand 06.02.2006)
        C.2.5. Metadaten-Meeting-Marum (Stand 06.03.2006)
   C.3. Mailing Lists
Bibliography
1. Introduction
This document describes the results of the discussion on the architecture, interfaces, and functionality
of the C3Grid Grid Data Management.
The document is structured as follows ...
1.1. Notation†

The †-Sign

The two documents [LS06a, LS06b], which describe the C3Grid Grid Information Service and the C3Grid Grid Data Management System, are related to each other. To make each document self-contained, some sections are shared between both documents and printed in exactly the same way in both documents. Such chapters, sections, figures, etc. are marked by the †-sign.
Interface Definitions

For the specification of interfaces, this document uses a notation which is described in the following.
Datatypes used are common basic datatypes (like int, double, string) and constructed types. Constructed types can be lists, pairs, records, enumerations or variants. They can be bound to new names by a type definition. For example,

type foo = (int,double) list

defines a new type foo which is a list of (int,double) pairs. For defining a record with an int and a double entry we write

type foobar = { foo : int, bar : double }

For defining enumerations we write

type color = ( Red | Green | Blue | Yellow )

Variants are a generalization of enumerations, where each handle can be assigned a type (1):

type color = Name string | RGB (int, int, int)

Service interfaces are described in sections called 'Operations of the FooService'. Such a section lists the respective operations of the interface. Callers can use the functionality of the interfaces by invoking the listed operations remotely (using Web Services).
Remote operations are given by their signature (consisting of the name, parameter list and return type), a short functional description, and a detailed list describing the parameters, the return value and exceptions. The following example demonstrates this notation.

(1) Note that a variant data type can be mapped to a C union.
foo (in huey : double, inout dewey : bool, out louie : string) : int
Example operation with some special functionality.
Parameters:
  in huey      Example input argument which is a floating point value.
  inout dewey  Example boolean argument which is modified by the operation.
  out louie    Example string argument which is set by the operation.
Returns:
  Example integer return value.
For a detailed BNF-style description of the interface definition syntax see Appendix A.
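Purely as an illustration of this notation, the constructed types above could be mapped onto Python roughly as follows; the Python names are not part of the specification, and the actual C3Grid interfaces are invoked as Web Service operations rather than through a Python API.

# Illustrative Python counterparts of the constructed types (assumption: a variant
# maps naturally to a union of small classes, analogous to a C union).
from dataclasses import dataclass
from typing import List, Tuple, Union

Foo = List[Tuple[int, float]]      # type foo = (int,double) list

@dataclass
class FooBar:                      # type foobar = { foo : int, bar : double }
    foo: int
    bar: float

@dataclass
class Name:                        # variant handle: Name string
    value: str

@dataclass
class RGB:                         # variant handle: RGB (int, int, int)
    r: int
    g: int
    b: int

Color = Union[Name, RGB]           # type color = Name string | RGB (int, int, int)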
1.2. Common Terms†

The terms must, must not, should, should not and may are used in accordance with RFC 2119 [Bra97]. This specification also uses the following terms.

base data: data which is not metadata, i.e. actual climate data files.

grid sites: sites within the C3Grid providing data capacity (workspaces) for processing and storing files temporarily.

metadata: Data that describes other data. In the C3Grid we distinguish metadata by their origin and purpose.

   primary metadata: metadata generated and stored at primary data sites. Though parts of this metadata may be cached by the Data Information System, the original copy always remains out of reach for the C3Grid middleware.

   grid metadata: metadata generated in the grid workspaces and managed by the Data Information Service. This metadata also describes data only stored in the grid workspaces.

   discovery metadata: data necessary to find climate data in the C3Grid.

   use metadata: metadata describing how the processing tools can access a data object.

   file metadata: metadata used by the Grid Data Management System in order to track the location of replicated files.

   Note that the set of primary metadata objects and the set of grid metadata objects are mutually exclusive. Discovery and use metadata are either primary or grid metadata; file metadata are always grid metadata.

primary data: Read-only data at a primary site. The creation of primary data is outside the scope of the C3Grid.

primary site: Organisation providing a large data repository which allows read-only access (DBMS, archive, ...).

process: In terms of workflows, an atomic processing unit. In terms of the operating system, a running program.

workspace: Local (to a grid site) disk space, where data is made available for processing.
DOI: Digital Object Identifier [DOI]. A system for identifying objects on the Internet. The DOI of a data object is unique and does not bind it to a certain location like a URI does. DOIs provide a persistent way of accessing data objects via a mapping service.
In the C3Grid, DOIs are used for identifying single datasets. The data service provides DOIs to identify search results. These DOIs can then be used for requesting primary data from the data repositories.
2. The C3Grid Project†

The aim of the Collaborative Climate Community Data and Processing Grid is to develop a productive, grid-based environment for German earth system science that supports effective, distributed data processing and the interchange of high-volume datasets from models and observations between participating institutions. The collaborative system as a whole will provide unified and transparent access to the geographically distributed archives of different institutions and simplify the processing of these data for the individual researcher.
Specific challenges are to provide a collaborative platform for accessing and processing huge amounts of data, as well as the effective management of community-specific meta information.
Earth system research studies the Earth system and the processes within it, and aims to develop models to predict climate change. The C3Grid will provide the infrastructure to share data and processing power across institutional boundaries.
3. Global C3Grid Architecture†

The C3Grid provides an infrastructure which is designed for the specific needs of the community of climate researchers. It provides a means of collaboration, integrating the heterogeneous data sources of the participating organizations and offering easy access through a common user interface.
The main application of the C3Grid is the analysis of huge amounts of data. To increase efficiency, the system provides automatic reduction or pre-processing operations and exploits data locality, replication and caching.
Figure 3.1.: Architecture of the C3Grid. (Diagram: user interfaces (API, GUI); distributed grid infrastructure with grid scheduling, grid data management, grid information service and data-transfer service; local interfaces to institutional resources for data, metadata, jobs, pre-processing and archive access; existing resources of the institutes, i.e. distributed data archives and distributed compute resources.)
The architecture overview of Figure 3.1 shows the major components of the C3Grid in their operational context. Existing data repositories are integrated by dedicated interfaces for providing primary base data and the respective metadata to the middleware. Pre-processing capabilities allow a first on-site data reduction.
A distributed grid middleware consists of an information service, data management and transfer services, and a scheduler. Actual jobs are executed on data which was extracted (staged) from primary repositories and made available on a local disk share. Each processing step not only produces new base data, but also generates new metadata from the metadata describing the original input.
The information service provides a comprehensive search for resources, primary data available in local repositories, and data created at local grid shares. The Portal uses these services to assist users in their data analysis workflows.
4. The C3Grid Grid Data Management System
Still to be written: Responsibilities of the Service.
4.1. Requirements
The following requirements for the Grid Data Management System are specified in the C3Grid project proposal [DG].

(R1) The system must coordinate the access and transfer of data in the grid. Files must not be created or transferred unless it is necessary. The system must be economic in handling huge amounts of data.
(R2) The system must provide consistent access to all data provided by primary repositories.
(R3) On request, the system must provide replicas at processing resources.
(R4) The system must provide information about transfer times by referring to knowledge about network topologies and bandwidths.
(R5) The system must provide information about replicas.
(R6) The system must be tightly coordinated with the Workflow Scheduler to support it in providing efficient data and process scheduling.
(R7) The system must be fault tolerant, i.e., in the case of a breakdown it should recover automatically.
Special requirements derive from the answers given by the C3Grid users and data providers in the questionnaire of Appendix B and from the discussion among the C3Grid development groups (see Appendix C).

(R8) The system must be usable with databases (like CERA or Pangaea) as well as flat file storages, like storage area networks (SAN) and hierarchical storage management systems (HSM).
(R9) The system must support data providers with a total amount of several hundred terabytes, data sets of several hundreds of gigabytes, and files of several gigabytes.
(R10) For staging data from databases, the system must pass pre-processing specifications from the workflow specification to the data provider. The system may not be aware of the structure of these specifications. (1)
(R11) The system must support metadata as input and output of data sets, at least in the form of files.
4.2. Available Tools and Solutions
The following tools are to be examined with respect to:
1. transfer,
2. management and replicas,
3. access to tertiary storage,
4. a global directory of existing storage.

(1) See also the quotes from the C3Grid wiki in Section C.2.5 and the use case discussed in the e-mail of Section C.1.
Tools:
- dCache (3)
- SRB (1, 2, 3)
- gridFTP, GSI OpenSSH (1)
- Chimera (2)
- EGEE?
4.2.1. dCache
DCache [Fuh] is a data management system which aims to provide the storage and retrieval of huge amounts of data. This data is distributed over a large number of heterogeneous data nodes.
Transparent access is provided via a logical file system. The exchange of data between the backend servers is automated. The system provides features like disk space management, automatic data replication, hot-spot detection and error recovery. External hierarchical storage management systems (HSM) are integrated transparently and automatically. An NFS interface allows name space operations (no reading and writing of files).
DCache was developed for a high energy physics experiment at CERN which will produce a continuous stream of 400 MB/s in 2007. The experiment's computing model consists of three tiers. A single tier 0 site (CERN) will be the main data source. A few tier 1 sites save this data in a distributed scheme providing persistence through tape backups. Many tier 2 sites provide additional CPU and storage resources.
The architecture of dCache consists of a central workload manager and resource broker accepting jobs and passing them to local sites. A site consists of a compute element (CE) accessing a storage element (SE). Each SE is controlled by a Storage Resource Manager (SRM) and is connected to remote SEs to exchange data via GsiFTP or GridFTP.
Applicability for the Grid Data Management System
DCache meets very few of the requirements of the Grid Data Management System. As its name already suggests, it is mainly a distributed cache integrating different sources of a single virtual organization. This architecture does not apply to the independent administrative domains of the C3Grid data providers. The transparent replication mechanism of dCache does not meet the need for close collaboration between the Grid Data Management System and the Workflow Scheduler. This collaboration involves detailed agreements about transfer and staging times.
Furthermore, with respect to the requirements of AP2 [LS06a], dCache does not provide the management of additional metadata.
4.2.2. Storage Resource Broker (SRB)
The Storage Resource Broker (SRB) is a data management system which has been developed by the San Diego Supercomputer Center (SDSC). In the beginning, this development was motivated by the need for uniform access to the data of the SDSC. Today it is deployed in many different projects worldwide. The projects managed by the SDSC alone use SRB for a total of 626 TB, 99 million files and 5000 users.
SRB provides uniform access to heterogeneous data resources, like file systems and relational databases. Uniformity is achieved by a logical namespace and a common metadata and user management. SRB also provides the creation and management of file replicas.
Table 4.1.: Repository types supported by SRB.†

Abstraction         Systems
Database            IBM DB2, Oracle, Sybase, PostgreSQL, Informix
Storage Repository  Archives: Tape, SAM-QFS, HPSS, ADSM, UniTree, ADS
                    ORB
                    File systems: UNIX, NT, Mac OS X
                    Databases: DB2, Oracle, PostgreSQL, mySQL, Informix
Components of the SRB†

An SRB deployment is structured in three layers. The physical layer consists of the different types of data repositories SRB provides access to. There are two abstractions providing either a database or a storage repository style of access. Table 4.1 gives an overview of the repositories supported so far.
The central layer of an SRB deployment consists of the SRB middleware, which provides the main functionalities: logical namespace, latency management, data transport and metadata transport. These functionalities are used by a common consistency and metadata management. The middleware also provides authorization, authentication and auditing.
The application layer consists of a collection of tools and interfaces providing high-level access. The following list gives an incomplete overview.
- APIs for C, Java, Python, Perl, ...
- Unix shell commands.
- Graphical user clients (NT Browser, mySRB, ...).
- Access via HTTP, DSpace, OpenDAP, GridFTP.
- Web service based access via OAI, WSDL, WSRF.
Federated Zones†

The SRB middleware is typically distributed over administrative units called SRB zones. A zone consists of one or more SRB servers. One server also keeps the metadata catalog (MCAT) of the SRB zone. The MCAT contains information like user information, access rights and resource information. It is deployed in a relational DBMS. Each SRB server can provide access to several resources.
For the connection of different zones, SRB provides the federated zones mechanism. It allows mutual access to resources, data and metadata of several zones. Through federation, the user accounts of each local zone are made available in all remote zones. Still, these zones remain independent administrative domains.
Synchronization of the MCATs of federated zones is achieved by the periodic execution of a provided script.
Metadata†

Metadata stored in the MCAT can be classified by their purpose.
- Administrative and system metadata allow the mapping from logical to physical file names and the authentication of users.
- User metadata provide a means of describing data objects by attribute values.
- Extensible schema metadata use the facility SRB provides to integrate external metadata schemas into the MCAT.
SRB provides a detailed search on its MCAT metadata, e.g., for file creation times or user-given attributes.
Replica Management†

Replication is used by client- or server-side strategies which aim to improve access times by providing a file closer to the location where it is needed. Replicas can be created in different ways: by manual copying, implicitly by logical resources, or by user registration. Replicas are synchronized by calling a special command.
Rights Management†

Users of an SRB zone must be registered in the respective MCAT. For authentication, several techniques are provided (password, GSI, etc.), though they are not supported by all clients. The definition of user groups is possible as well. The SRB server provides fine-grained authorized access to its objects by access control lists (ACLs).
An additional concept of authorization is via tickets. Users who access the SRB with tickets need not be registered in the MCAT.
Scalability by Master and Slave MCATs†

In order to improve the response time of MCAT operations, the MCAT can be replicated to several SRB servers in a single zone. These MCAT replicas are able to share the request load. Read requests are handled by one of the many slave MCATs. Modifying access is handled by a dedicated master MCAT. To be able to replicate MCATs, the underlying DBMS must support database replication.
Logical Resources†

Another feature of the SRB server is to provide a single logical view of a number of physical resources. A special application of this is the automatic replication of objects to several servers. For example, two servers may each have a local physical resource and share both as a single logical resource. By setting the number of replicas to 2, a replication on both physical resources is guaranteed.
Other Features†

The SRB comprises the following additional features.
- Extraction of data can be combined with pre-processing, e.g. to generate previews of images.
- Many small files can be combined in containers to prevent their fragmentation over several tape archives.
- Support of parallel I/O operations.
- Bulk operations, which speed up the transport of many small files.
SDSC SRB vs. Nirvana SRB†

Currently, there are two development branches of the Storage Resource Broker. SDSC maintains the open source branch of the SRB development, whereas Nirvana offers a commercial version of SRB. Though Nirvana SRB originates from the SDSC branch, both versions are mutually incompatible. The following list gives a brief overview of the differences between the two versions.
- Nirvana SRB does not support federated zones.
- Nirvana SRB uses system daemons to guarantee the synchronicity of replicas and the global namespace. SDSC SRB uses an rsync-like mechanism, which has to be called externally.
- Nirvana SRB supports drivers for a wider range of repository types.
- The licence of SDSC SRB is restricted to non-profit use. The short-term use in the C3Grid would meet this criterion.
- Nirvana provides commercial support. SDSC provides free community support.
Applicability for the Grid Data Management System
SRB provides many features which are required in the Grid Data Management System. This makes it a candidate to be assessed for two types of deployments: as the main component of the Grid Data Management System, or as multiple deployments providing coherent access to data at the local sites individually.
In the first case, SRB's federated zones are a must, because C3Grid sites remain independent administrative domains. This requirement is met by the SDSC SRB and rules out the application of Nirvana SRB.
The replica management of SRB provides basic functionality. Nevertheless, the close coordination with the Workflow Scheduler to provide efficient data and process scheduling implies agreements about the availability of replicas at defined periods of time (including planned periods in the future). Therefore, a lot of extra coding is required to create, transfer and manage replicas according to the agreed lifetimes.
Extra coding is also required to make the Grid Data Management System aware of metadata files. The registration of a file as metadata can be done by defining a special attribute in the MCAT; the integration of ISO 19115 in the DIS still needs to be solved (see also [LS06a]).
SRB is especially suited to access grid processing workspaces or file storage systems. Database access would require a deep intrusion into existing DBMSs, by-passing existing abstractions and interfaces which already provide features required in the C3Grid (spatial and temporal cuts, aggregations, etc.).
The implementation of file transfers in SRB, using techniques like parallel I/O, suits the need for optimized data exchange between C3Grid sites well.
The user management and authorization scheme of SRB, which addresses another important requirement of the C3Grid, could be integrated into the Grid Data Management System.
The second type of deployment, i.e., multiple SRB installations at the sites, may be considered by local data providers and is outside the scope of the Grid Data Management System. The data management could use the SRB interface as a common interface for accessing all data providers. Nevertheless, this setting does not seem reasonable because it requires all data providers to either use SRB or to implement even the parts of the interface which are not used.
4.2.3. gridFTP
GridFTP is a protocol for secure, robust, fast and efficient transfer of data. The Globus Toolkit 4 provides an implementation of GridFTP, which is the most commonly used tool for this protocol. It consists of a GridFTP server, which has access to data through an appropriate Data Storage Interface. This typically means a standard POSIX file system, but it can also be a storage system like the SRB.
To access remote GridFTP servers, GT4 also provides a respective client. It is capable of accessing data via a range of protocols (http, https, ftp, gsiftp, and file). Because the client is a command line tool, it is especially suited for scripting solutions. For special demands, GT4 provides a set of development libraries for custom client operations.
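As a rough sketch of such a scripting solution, the following Python fragment wraps the globus-url-copy client of GT4; the host names and paths are placeholders, and a valid grid proxy certificate is assumed to be in place.

import subprocess

def transfer_file(src_url: str, dst_url: str) -> None:
    # gsiftp:// URLs address remote GridFTP servers, file:/// addresses the local disk.
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

# Example (placeholder URLs): copy a file from a provider's GridFTP server into a local workspace.
transfer_file("gsiftp://provider.example.org/workspace/exp1/data.nc",
              "file:///c3grid/workspace/exp1/data.nc")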
Applicability for the Grid Data Management System
GridFTP is a suitable tool for transferring files between C3Grid sites. It can be integrated into any implementation of a Grid Data Management System quite easily.
Still to be written.
4.2.4. Chimera
Still to be written.
4.3. Open Challenges
4.3.1. Grid State Prediction
A key unique feature of the C3Grid middleware is the ability to use knowledge about access and transfer times to optimize file staging and job execution. Because the DMS directly communicates with the data providers and executes file transfers within the grid, it also needs to request and manage temporal information. Based on this information, the Grid Data Management System provides the Workflow Scheduler with the following information.
- The estimation of staging and transfer times
- The estimation of file availability at a given location in the future
- The estimation of future disk space use
Furthermore, the DMS reaches agreements with the Workflow Scheduler based on this information. So the actual execution plan of workflows will be based not only on present knowledge but also on predictions about the planned grid state in the future.
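A minimal sketch of such a prediction is given below; the function name and the simple bandwidth model are assumptions for illustration only and not part of the specification.

from datetime import datetime, timedelta

def estimate_availability(size_bytes: int, staging_seconds: float,
                          bandwidth_bytes_per_s: float,
                          earliest_start: datetime) -> datetime:
    # Earliest point in time at which a replica could be complete at the target host:
    # staging time at the provider plus transfer time over the assumed link bandwidth.
    transfer_seconds = size_bytes / bandwidth_bytes_per_s
    return earliest_start + timedelta(seconds=staging_seconds + transfer_seconds)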
4.3.2. ISO 19115 as Scheme for Climate Metadata
Still to be written.
Time constraints, AP2/AP5/AP6 coupling, ISO 19115.
5. Architecture of the Grid Data Management System
Figure 5.1.: Overview of the external interfaces of the data information service (DIS) and the data management service (DMS).† (Diagram: the portal (GUI/API); the distributed grid infrastructure with the global DIS (AP2), the workflow scheduler (AP6) and the global DMS with its catalog (AP5); the local resources and interfaces at the institutes with primary metadata, primary data, local DIS, local DMS, pre-processing, process execution and workspace. The external interfaces are labelled A to D; interface B connects the workflow scheduler to the DMS.)
5.1. Design
The main purpose of the Grid Data Management System (DMS) is to access data stores and to manage files used as input or output of computations at grid compute resources. It must cooperate with the following components of the grid middleware.
- the Workflow Scheduler, issuing staging requests,
- the local data providers, offering primary base data and the respective metadata,
- the Data Information System, keeping track of which meta-information corresponds to which data.
Virtualization of Files
The DMS aims to offer a high-level view on data by hiding the actual location of files. The users should be able to specify workflows by giving the required input data and compute capabilities without the need to know where these resources are actually located.
Consequently, the DMS manages file transfers at a lower level. It also keeps track of distributed copies of a file to be able to refer to its metadata.
To avoid confusion, we use the term logical file, replicated file or simply file if we refer to the high-level view of the location-independent data unit. The term replica is used for a single remote copy of a file. This term also implies that the DMS knows about the respective representation as a logical file.
Local Workspaces
Each C3Grid site offers a file system share which we call a workspace. A workspace is used as the target location for files which were requested from data providers or produced by grid jobs.
The grid workspace at each site is identified by a root path pointing to a location in the file system. All sub-directories of this root path are dedicated to be controlled by the DMS. In particular, no other user of the local system should have read or write access to this file share.
Primary Data Providers
Primary data providers are the main source of data for the processing tasks of the grid. The way the DMS can access them mainly depends on whether they are storage systems providing flat files or databases providing more flexible access. In the first case, if the DMS gets a staging request, it simply keeps a reference to the location of the requested file. In the case of databases, the DMS must stage data to flat files in order to make it accessible for processing. Because databases already implement pre-processing capabilities (e.g., by selective queries of temporal or spatial cuts), the DMS provides extra operations for this type of data provider. The output of such a pre-processing is a new data object which is stored only in the grid workspaces and has no counterpart in the database it originates from.
Note that we consider pre-processing a processing capability which is provided as an integral feature of a data repository. Independent tools may also provide on-site data reduction (e.g., CDOs) but are better specified inside a job to be passed to the Workflow Scheduler. Only this way can the distributed grid environment be used.
Internal Structure
The Grid Data Management System consists of two conceptually independent components: a grid-wide global DMS and many local DMSs.
The global DMS is closely coupled with the Data Information Service (DIS) [LS06a] of AP2, with which it shares a common data model, which is depicted as an entity-relationship diagram (1) in Figure 5.2. This data model contains not only DMS-related information, but also DIS-related information and relations between both. The most important aspects of this model are as follows.
- A single set of discovery metadata describes one or many data files.
- For each set of discovery metadata describing files in the grid there is an extra file containing this metadata. (2)
- The respective discovery metadata of a file can be identified by referring to the metadata object identifier OID.
- A file has a name and a workspace path.

(1) An entity-relationship diagram uses two kinds of concepts: entities (boxes) and relations between entities (diamonds connecting boxes). Both kinds of concepts can have attributes (ovals). Key attributes (underlined attribute names) uniquely identify instances of the respective entity. Relations are annotated with (min, max) information. A pair (n, m) between an entity E and a relation R indicates that an instance of E participates in R at least n and at most m times.
(2) See also Section C.2.2.
Figure 5.2.: Entity-relationship diagram of the common data model of the Data Information Service and the Grid Data Management System.† (Diagram: on the DIS side, a discovery metadata entity (OID, ISO model, project, search attributes) describes one or more files and keeps meta information; on the DMS side, a file (logical path, format) is located on replicas (URI, host, min/max life, pin) and may refer to a primary path; use metadata (CERA, Pangaea, ..., grid workspace) is stored or generated at the local providers.)
- A replica has a host where it is saved, a lifetime range, and a pinning flag to save it from automatic deletion.
- A replica may have a primary path if it is not located below the workspace root path. (In fact, such a 'replica' is a primary flat file the DMS has direct read access to.)
A local DMS runs at each C3Grid site which offers storage and file access capabilities. Its task is to communicate with primary data providers and to manage local file stores.
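As an informal illustration of this data model, the following Python sketch captures the file and replica entities with their most important attributes; the class and field names are illustrative only and not part of the specification.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Replica:
    uri: str                            # physical location (RFC 1738 file scheme)
    host: str                           # site where the replica is stored
    min_life: Optional[str] = None      # agreed lifetime range
    max_life: Optional[str] = None
    pinned: bool = False                # protects the replica from automatic deletion
    primary_path: Optional[str] = None  # set if this is a primary flat file outside the workspace

@dataclass
class LogicalFile:
    logical_path: str                   # path relative to the workspace root, shared by all replicas
    format: str                         # e.g. GRIB or NetCDF
    oid: str                            # OID of the discovery metadata describing this file
    replicas: List[Replica] = field(default_factory=list)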
Replica Management
In order to facilitate the staging of processing input, the DMS creates file replicas at the site where the data are needed. In order to facilitate the management of replicas, all replicas of a single file are stored at the same location relative to the workspace root path.
In the management of replicas, the DMS takes special care of flat files offered by a primary data provider. In this case, the DMS creates replicas only in workspaces remote to the primary file. At the location of the primary file, no extra replica is generated in the workspace because the file is accessible directly via its path. Note that this scheme of omitting unnecessary replication conforms to requirement (R1).
In case a primary provider is a transparent hierarchical storage management system (HSM), the DMS must assure that the file it provides to the Workflow Scheduler remains staged on disk (e.g., by requesting a pin from the HSM). If the HSM is not able to keep the primary file staged, the DMS must copy it to its local workspace.
The path of a primary file points outside the sub-directory structure of the local workspace root directory. Therefore, the uniform naming scheme for replicas in remote workspaces cannot be maintained. For this purpose, the DMS keeps two paths: a primary path pointing to the primary file in its local file system, and a logical path within the workspace namespace used for all remote replicas.
In case of a disk space shortage, the DMS may decide to remove individual replicas. By setting a special flag (pinning), a replica will be saved from deletion.
The Data Management System as Data Provider
From a higher-level perspective, not only a primary data source but also the Grid Data Management System itself acts as a data provider. Especially for the Workflow Scheduler, the distinction between staging from a primary source and creating remote replicas is irrelevant. For the purpose of file requests it uses the same operations of the DMS interface. The scheme for providing replicas at an agreed host and time is presented below in the section about use cases.
6. AP5 Use Cases
As described in the definition of the Data Information Service [LS06a], the portal can request detailed information about how to access data. The following subset of this information is needed to specify the input of a workflow.
- An object identifier
- Optional aggregation or pre-processing commands for databases
- A list of base data file names
- A metadata file name
With this information the Workflow Scheduler starts to interact with the DMS to prepare an optimized execution of the workflows.
6.1. Querying the Current State
Figure 6.1.: Sequence diagram: Two types of state queries, current state and predicted state. (Diagram: for a current-state query the DMS answers from a lookup in its catalog or workspace; for a state prediction it additionally obtains a staging time estimate from the primary provider.)
The following queries are sent by the Workflow Scheduler in order to get information about the current state of data (see also Figure 6.1).
- What is the provider of a data object?
  Input: object identifier
  Output: host name, host type (database, flat file, or workspace only)
- What are the files of a data object?
  Input: object identifier
  Output: list of logical file paths
  Note that in the case of a database a replica of a file may not exist. However, there is always a logical file.
- What is the name of the metadata file?
  Input: object identifier
  Output: logical path of the metadata file
  Note that in the case of a database a replica of the metadata file may not exist. However, there is always a logical metadata file.
- What are the replicas of a file?
  Input: file path
  Output: list of replica URIs. In the case of a database, no replica may exist for the given file.
- What is the respective object identifier of a file?
  Input: file path
  Output: object identifier
6.2. Querying State Predictions
The following queries by the Workflow Scheduler require the prediction of the future state of data in the grid. This prediction is based on knowledge about the respective staging and transfer times. Because the DMS knows only about transfer times within the grid, it has to ask data providers for an estimate of the time they need for staging.
- When can a certain file be available at a specific host?
  Input: file path, host
  Output: time stamp
- What is the earliest time a specific file can be available?
  Input: file path, time window of interest, period of lifetime
  Output: time stamp
  Exception: availability cannot be guaranteed for the given constraints.
- What is the earliest time a specific number of bytes can be available?
  Input: number of bytes, time window of interest, period of lifetime
  Output: time stamp
  Exception: availability cannot be guaranteed for the given constraints.
- What is the transfer time for a specific file to a specific host?
  Input: file path, target host, start time
  Output: period of time
6.3. Agreements
The Workflow Scheduler and the DMS meet agreements about the predictions made from the operations described in the use case of Section 6.2. Operations offering agreements are called in two steps:
1. a preliminary request,
2. a final commitment.
Figure 6.2.: Sequence diagram: Agreement for providing a file at a specified host and time. (Diagram: the Workflow Scheduler sends "provide file at t" as a REQUEST, the DMS obtains a staging time estimate from the data provider and answers OK, the Workflow Scheduler sends the COMMIT, the DMS stages the file, and at time t the Workflow Scheduler checks that the file is available.)
If the request can be fulfilled, the DMS confirms it and includes it in its own scheme. However, the request remains preliminary until the Workflow Scheduler acknowledges it in the second step. After a predefined period of time, the DMS will discard requests for which no acknowledgment has been received.
Figure 6.2 shows a successful agreement for providing a file on a target host. After the request and commit steps, the DMS manages the required staging of the file so that it is finished at the specified time. Nevertheless, the Workflow Scheduler should always check the availability of the respective replica before using it.
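A minimal sketch of this two-step negotiation from the Workflow Scheduler's point of view follows; the dms client stub is hypothetical, while the operation names and the Request/Commit actions correspond to the interface in Section 7.1.3.

def agree_on_replica(dms, file_uri: str, host: str, duration):
    # Ask for the earliest predicted availability of the replica at the target host.
    t = dms.availableReplica(file_uri, host, minTime=None, maxTime=None, duration=duration)
    # Step 1: place a preliminary request for that time.
    if not dms.provideReplica(file_uri, host, t, duration, action="Request"):
        raise RuntimeError("preliminary request rejected")
    # Step 2: commit the agreement; the DMS now schedules staging and transfer.
    if not dms.provideReplica(file_uri, host, t, duration, action="Commit"):
        raise RuntimeError("commitment failed")
    return t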
6.4. Staging and Transferring Files
For providing a file at a specified host, the DMS needs to take different actions, depending on the type of data provider it is dealing with. As mentioned before, the C3Grid has three types of data providers: primary database systems, primary file systems, and the DMS itself.
For the Workflow Scheduler, which requests the DMS to provide a replica, the source of the file is irrelevant as long as the replica is available at the agreed time. Therefore, the sequence of operations of the Workflow Scheduler depicted in the example of Figure 6.2 applies to all data providers. The preparation of time estimates and the actual staging or transfer differ for each provider.
Table 6.1 gives an overview of the different types of data providers and how file requests by the Workflow Scheduler are handled.
6.5. Registration of Generated Files
Commonly, new grid files are results of jobs executed by the Workflow Scheduler. As a convention in the C3Grid, the output of a single job execution contains base data files and one metadata file.
Table 6.1.: Execution of data providing requests for different storage types.

Request arguments:
  Primary Database:        object id, climate parameter, spatial/temporal cut, pre-processing spec., target host
  Primary Flat File Store: object id, target host
  DMS Workspace:           object id, target host

Target host is local to provider:
  Target store:  local workspace (Primary Database); reference to primary file (Primary Flat File Store); local workspace (DMS Workspace)
  Action by DMS: delegation to provider (Primary Database); pin file at primary store (Primary Flat File Store); none (DMS Workspace)

Target host is remote to provider:
  Target store:  remote workspace (all provider types)
  Action by DMS: local staging + replication (Primary Database); local staging + replication (Primary Flat File Store); replication (DMS Workspace)
Because only the Workflow Scheduler knows about the execution and output of jobs, it must also take care of the registration of new files in the Grid Data Management System as well as in the Data Information Service. Figure 6.3 depicts this registration, which is done by a single interface operation for both data services.
  Input: URIs of the physical base files and of one physical metadata file
  Output: object identifier
Note that the input of this registration consists of physical files in one of the grid workspaces. By registration, each physical file becomes a replica and gets associated with a logical file.
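A hypothetical invocation of the corresponding registerFile operation (Section 7.1.3) might look as follows; the dms stub and the workspace paths are placeholders.

def register_job_output(dms):
    # Register the output of a finished job: two base data files plus one metadata file.
    # The returned value is the new object identifier.
    return dms.registerFile(
        base=["file://siteA/c3grid/workspace/job42/out1.nc",
              "file://siteA/c3grid/workspace/job42/out2.nc"],
        meta="file://siteA/c3grid/workspace/job42/metadata.xml",
    )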
6.6. High-level File Operations
The DMS provides the following operations for managing files at a host-independent level.
- Copy a file to a logical path.
- Move a file to a logical path.
- Create a logical sub-directory.
- Remove a logical sub-directory.
- List all files in a directory.
- Show the size of a file.
- Remove a file.
Note that an operation on a file applies to all replicas, i.e., replicas are copied, moved or deleted locally. The operation is ignored for hosts where a replica does not exist.
The remove operation has non-standard semantics if a site does not provide a replica but a path to a primary flat file. The DMS does not remove the primary file but releases it from its control. For example, if the file is in a transparent HSM, it is un-pinned to allow the HSM to remove it from its cache.
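The following sketch illustrates these replica-wide semantics for the remove operation, reusing the illustrative data structures from the sketch in Section 5.1; the delete_from_workspace callable is a hypothetical helper.

def remove_file(f, delete_from_workspace):
    # f is a LogicalFile; the operation is applied to every known replica.
    for r in f.replicas:
        if r.primary_path is not None:
            # Primary flat file: do not delete it, only release it from DMS control (un-pin).
            r.pinned = False
        else:
            delete_from_workspace(r.host, f.logical_path)
    # After the operation the DMS no longer tracks any replica of the logical file.
    f.replicas.clear()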
Figure 6.3.: Sequence diagram: registration of new base data and metadata produced in the grid.† (Diagram: the Workflow Scheduler registers the base and metadata files (1) at the common DIS/DMS interface, which registers the discovery and use metadata in the Data Information System (1.1) and the files and local replicas in the Data Management System (1.2), and returns an object identifier.)
7. Interfaces of the Grid Data Management System
7.1. Interface for the Workflow Scheduler

The interface for the Workflow Scheduler (named B in Figure 5.1) provides the functionality of both the Grid Data Management System and the Data Information Service. Therefore, most of the operations and remote object specifications of the interface for the Portal are also part of this interface.
The data services consist of two catalogs: one which keeps meta-information of primary and grid data objects, and another which keeps track of replicated work copies.
7.1.1. Exceptions
DiskSpaceException: This exception is thrown when the available disk space is exceeded.
FileRetrievalException: This exception is thrown when the retrieval of a file fails.
7.1.2. Datatypes
Type oid is used for unique identifiers of metadata sets.
Type uri is a string for identifying logical files and replicas. This URI uses the file scheme of RFC 1738. For logical files, the host section of the URI is left out.
Type providerType is used to indicate the provider of an object.

type providerType = ( DB | FlatFile | DMS )

DB and FlatFile indicate a primary provider; DMS indicates files which are managed by the DMS and are not stored at a primary site.
Type action is used for the agreements.

type action = ( Request | Commit )

If an operation is called with the action argument Request, it only returns a prediction for the request. If the Workflow Scheduler considers the prediction useful, it acknowledges it by re-invoking the same operation with a Commit action argument.
7.1.3. Operations of the DataService
getProvider (in object : oid) : (host, providerType)
Returns the name and type of the data provider of a given data object.
Parameters:
  object  Data object identifier
Returns:
  Host and provider type of the data object.
getFiles (in object : oid) : uri list
Returns a list of logical file paths. These files constitute the content of the data object. This operation does not discriminate between base data and metadata files. Note that a logical file path may not be equivalent to the workspace sub-path of a respective replica.
Parameters:
  object  Data object identifier
Returns:
  List of file paths.
getMetaFile (in object : oid) : uri
Returns the logical path of the metadata file of a given data object. Note that the logical file path may not be equivalent to the workspace sub-path of a respective replica. Also, in the case of a database provider, there is no guarantee that a replica of the metadata file currently exists.
Parameters:
  object  Data object identifier
Returns:
  Path of the metadata file.
getReplica (in file : uri) : uri list
Get all replica locations as URIs for a virtual file.
Parameters:
  file  Virtual file location, i.e. the path of the file.
Returns:
  List of replica locations.
getOID (in file : uri) : oid
Get the identifier of the object a given file is part of.
Parameters:
  file  Virtual file location.
Returns:
  The respective object identifier.
availableReplica (in file : uri, in host : string, in minTime : time, in maxTime : time, in duration : time) : time
Request information about the predicted availability of a replica. The returned value is the earliest time within the time interval [minTime, maxTime] at which the file can be available at the given host for a time period of duration.
Parameters:
  file      Virtual file location.
  host      Target host for the replica.
  minTime   Earliest time of interest. An undefined value defaults to the time of the operation call.
  maxTime   Latest time of interest. An undefined value defaults to infinity.
  duration  Period of time the replica should be available. An undefined value defaults to maxTime - minTime.
Returns:
  The earliest time the replica can be made available.
provideReplica (in file : uri, in host : string, in time : time, in duration : time, in action : action) : bool
Negotiate the creation of a replica at a specified host. The caller and the callee negotiate by two calls of this operation. First, an informative request is sent (action=Request). Then, on success, the request is confirmed (action=Commit).
Parameters:
  file      Path of the file to be retrieved.
  host      Target host.
  time      The time the replica shall be available.
  duration  Minimum lifetime of the replica. An undefined value defaults to an infinite lifetime.
  action    Request or Commit.
Returns:
  true on success, false otherwise.
availableSpace (in bytes : int, in host : string, in minTime : time, in maxTime : time, in duration : time) : time
Request information about the predicted availability of free workspace. The returned value is the earliest time within the time interval [minTime, maxTime] at which the given number of bytes of disk space can be available at the host for a time period of duration.
Parameters:
  bytes     Number of free bytes.
  host      Target host.
  minTime   Earliest time of interest. An undefined value defaults to the time of the operation call.
  maxTime   Latest time of interest. An undefined value defaults to infinity.
  duration  Period of time the disk space should be available. An undefined value defaults to maxTime - minTime.
Returns:
  The earliest time the requested disk space can be made available.
provideSpace (in bytes : int, in host : string, in time : time, in duration : time, in action : action) : bool
Negotiate the allocation of free disk space at a specified host. The caller and the callee negotiate by two calls of this operation. First, an informative request is sent (action=Request). Then, on success, the request is confirmed (action=Commit).
Parameters:
  bytes     Number of bytes to be allocated.
  host      Target host.
  time      The time the space shall be available.
  duration  Minimum lifetime of the allocation. An undefined value defaults to an infinite lifetime.
  action    Request or Commit.
Returns:
  true on success, false otherwise.
Raises:
  DiskSpaceException
registerFile (in base : uri list, in meta : uri) : oid
Register new files in the DMS. New files commonly originate from grid processing. After registration, the new files become replicas of respective logical files.
Parameters:
  base  List of base files
  meta  Metadata file
Returns:
  New unique object identifier.
7.2. Interface of the Primary Data Provider for the Data Management Service
7.2.1. Operations of the DatabaseAccess
The staging of files from databases follows a special scheme because they provide special pre-processing functionality. (1)

(1) See also Section C.2.5.
stageFiles (in objs : oid list, in constraints : (attribute,value) list, in outDir : uri, out basefiles : uri list, out newObj : oid, out metafile : uri, out stagingTime : time, out dataSize : int, in dummy : bool) : void
Stage files from a database to the workspace. This will create a new data object with a new unique object identifier.
Parameters:
  in objs          List of object identifiers to be staged.
  in constraints   Constraints reducing the data of the request. A database provider must support at least the following attributes:
                   parameters (list of climate parameters)
                   minLat, maxLat, minLong, ... (3-dimensional spatial cut)
                   minTime, maxTime (temporal cut)
                   preproc (pre-processing specification)
                   format (file format: GRIB, NetCDF, ...)
  in outDir        Output directory as target for the staging.
  out basefiles    List of file paths of the generated base data files.
  out metafile     Name of the metadata file.
  out newObj       New object identifier.
  out stagingTime  Estimated staging time.
  out dataSize     Estimated data size.
  in dummy         If true, do not stage, just estimate the staging time and data size.
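A hypothetical call of this operation in dummy mode is sketched below; the db stub and the constraint values are placeholders, and the out parameters are modelled as a returned tuple because Python has no output arguments.

def estimate_staging(db):
    # Ask the database provider for an estimate only (dummy=True); the same call is repeated
    # with dummy=False once the agreement with the Workflow Scheduler has been committed.
    return db.stageFiles(
        objs=["oid-1234"],
        constraints=[("parameters", "tas"),
                     ("minTime", "1979-01-01"), ("maxTime", "1993-12-31"),
                     ("format", "NetCDF")],
        outDir="file:///c3grid/workspace/stage-42/",
        dummy=True,
    )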
simpleStageFiles (in obj : oid, in outDir : uri, out base : uri list, out meta : uri, out stageTime : time, out dataSize : int, in dummy : bool) : void
Requests the data provider to dump a data object as files to the local disk space. Note that this operation does not create a new object but replicates a database object as a file.
Parameters:
  in obj         Object identifier.
  in outDir      Output directory as target for the dump.
  out base       Paths of the local files. These files must be readable by the DMS and located in outDir.
  out meta       Path of the metadata file.
  out stageTime  Estimated staging time.
  out dataSize   Estimated data size.
  in dummy       If true, do not stage, just estimate the staging time and data size.
7.2.2. Operations of the FlatFileAccess
pinFiles (in obj : oid, out base : uri list, out meta : uri, out stageTime : time, out dataSize : int, in dummy : bool) : void
Requests the data provider to stage the files of a given data object in its local disk space.
Parameters:
  in obj         Object identifier.
  out base       Paths of the local files. These files must be readable by the DMS and staged in the provider's local disk space.
  out meta       Path of the metadata file.
  out stageTime  Estimated staging time.
  out dataSize   Estimated data size.
  in dummy       If true, do not stage, just estimate the staging time and data size.
8. Authorization†
Still to be written.
9. Conclusion
Still to be written.
A. Interface Specification Syntax†
For the specification of interfaces, the following BNF-style syntax is used in the definition of remote functions. Non-terminals are set in italics, 'terminals' are quoted.

functionDef = name '(' paramList ') :' returnType                  (A.1)
paramList   = param ( ',' param )*                                 (A.2)
param       = ( 'in' | 'out' | 'inout' ) paramName ':' type        (A.3)
returnType  = 'void' | type                                        (A.4)
Each parameter of a function has a specifier, which defines whether an argument is read (in), set (out), or read and set (inout) by the called function.
For the definition of constructed datatypes the following syntax is used:

typedef  = 'type' typeName '=' type                                           (A.5)
type     = typeName | basetype | record | variant                             (A.6)
basetype = 'int' | 'double' | 'string' | type 'list' | '(' type ',' type ')'  (A.7)
record   = '{' name ':' type ( ',' name ':' type )* '}'                       (A.8)
variant  = '(' handle type? ( '|' handle type? )* ')'                         (A.9)
Record entry names (name) and type names (typeName) consist of arbitrary upper or lower case alphanumeric characters, except for the first character, which must be an alphabetic character. Variant handles (handle) are names starting with an upper-case character.
B. Questionnaire for Users and Data Providers
This chapter still needs some translation.
The following questionnaire was filled out by the users and data providers of the C3Grid. Its purpose
is to give a detailed description of the compute and data resources present at the institutes.
Due to the demand of the users, some questions were rephrased after 2006-12-19 to be more
concise. The answers that we received before the rephrasing are documented with an 'old' appended
to the question number.
The questions and answers were partly translated from the German original to English by the authors
of this document.
B.1. Important Aspects
Question 1.1: Storage location: local, distributed, partitioned, replicated?
Answers:
HH - DKRZ: local: global shared filesystem (GFS from NEC, SAN, FC based);
for subsystems:
  archive -> $UT (HSM) / $UTF
  cross (compute server): $WRKSHR, ...
  hurikan (HPC: NEC SX6)
HH - MPI: yang (SunFire 15k): locally in /scratch/local1 (1 FS (42 TB (+ 32 TB = 74 TB in 01/2006)); QFS
HH - CERA DB: 1 metadata DB, several (currently 5) DBs for actual data
Uni Berlin: partial datasets (flat files; do not contain all information, i.e. parameters, levels, etc.)
Uni Köln: like Uni Berlin
PIK: locally. Note: items 1. to 5. only represent the data that have to be provided in the C3Grid project:
  1. Model output of the CLM validation run, Europe with ERA15 forcing, 10'x10', 6-hourly values 1979-1993
  2. Model output of CLM multi-run experiments for uncertainty studies of extreme episodes in selected regions of Central Europe for reference periods and scenarios (max. 5 months each)
  3. Model output STAR: gridded daily meteorology for Germany, 7 km x 7 km, reference period 1950-2004 and 3 SRES scenarios 2005-2055
  4. PIK-CDS dataset: global monthly meteorology (tmean, precip) over land masses, 30'x30', 1901-2003
  5. Model output of the global vegetation model LPJ, monthly data, 30'x30', reference period 1901-2003 and 3 SRES scenarios 2004-2100
DWD: local
Question 1.2: Storage system: (RAID) server, tape system with NFS interface?
Answers:
HH: tape system: yes (DKRZ); HSM with GFS and NFS frontend. RAID system: yes, for all DKRZ servers and for yang. NFS interface: no mount for non-DKRZ users. CERA DB data are stored in the HSM in a split way.
Uni Berlin: tape archive
Uni Köln: SAN
PIK: GPFS with NFS, in the future SAN; tape robot using IBM TSM
DWD: server
Question 1.3: Storage type: database system (which?), files?
Answers:
HH - flat files: $UT (see above) with flat files (ca. 4 PBytes)
HH - CERA DB: in a first step, ca. 100 TB of currently ca. 200 TB go into the grid. The amount of data will keep growing. These data are stored in a database: Oracle 9.2.0.5 with Oracle Application Server 10g. Access is either via the application server (http, Connection Manager) or directly from the databases.
HH - DKRZ: ca. 5% of the total data volume is currently in the DB, the rest is stored in the archive; the tendency towards storage in the DB is increasing, e.g. now also 6h resolution, which is why MPI partly speaks of 10% in the DB.
PIK: 1., 2.: files; 3.-5.: RDBMS Oracle
DWD: database system, Oracle v9.2.0.5.0
Question 1.4: Structure of data?
How are the different data units (experiments, time series, ...) distributed over files, table entries, blobs, etc.? For example, one time series of one variable per file/table/row.
Answers:
PIK:
1. one file per timestep (6h)
2. one file per single run of a multi-run experiment and per post-processor output for a multi-run experiment
3.-5. one Oracle table per reference period/scenario and variable
DWD: relational DBMS
Question 1.4.old: Structure of the data, e.g., one dataset per file, per table/row?
Answers:
HH - flat files: Let a dataset be the data from one experiment: up to 10 TB per experiment, composed e.g. of monthly files of 1-7 GB each (so e.g. a (typical?)
experiment over 100 years with 5 GB/month -> 1200 files: S = 1200 * 5 GB = 6 TB;
IPCC-T63L31: 2400 files (200 years) with 15.8 GB/year (12 files!) -> 3 TB)
HH - CERA DB: datasets correspond to whole tables. A granular query over the time-series increments stored in the metadata is possible. Further processing (cutting out regions) is in beta. A dataset is usually (but not always) the time series of one variable. An experiment is the sum of all datasets. Currently an experiment is up to ca. 20 TB, a dataset up to 300 GB, i.e. a table of up to 300 GB in units (blobs from a few kB up to a few hundred MB). Larger units are also quite possible.
Uni Berlin: the partial datasets group individual data records (individually identifiable via GRIB), namely
1. for a model run, data from one run per file, but not all information (see above),
2. for ERA reanalysis data, data for a specific period (a month) in one file.
Addition of 12/01/2006:
The files offered by Köln/Berlin are structured differently; they can be described by the following three examples:
1. "surface data": temperature, dew point, surface pressure, ...
2. "atmospheric data a)": geopotential at 1000, 925, 800, 750 ... hPa
3. "atmospheric data b)": wind: u and v components at several levels
We offer files that contain a section of a time series (usually one month) and may contain several levels of the same physical quantity and/or several variables.
Uni Köln: like Uni Berlin
Question 1.5: Number of files, rows, blobs?
Answers:
PIK:
1. 22,000
2. depends on the experiment type (e.g. global sensitivity analysis, Monte Carlo analysis), in total 1000
3. 1.5G Oracle rows + index per reference period/scenario
4. 200M Oracle rows + index
5. 1.5G Oracle rows + index per reference period/scenario
DWD: 40-50 tables with climate measurement values
Question 1.5.old: Number of files, rows (datasets)?
Answers:
HH - DKRZ: ca. 30 million files
HH - CERA DB: ca. 5 billion rows
Unis Berlin/Köln: ERA reanalyses: 480 files each for 1. atmospheric level groups and 2. surface data, together about 1000 files.
Question 1.6: Size of one file, row, blob?
Answers:
PIK:
1. 18 MB per file
2. 5 GB per file
DWD: table sizes from 1 MB (e.g. long-period values per observation date) up to 8 GB (e.g. per-date climate values)
Question 1.6.old: Size of one file or dataset, respectively?
Answers:
HH - flat files: file size: from 1 MB to 7 GB
dataset: from a few MB up to 10 TB (see above)
HH - CERA DB: for the units see 1.4
Unis Berlin/Köln: between 80 MB and 1 GB
Question 1.7: Total size of the data?
Answers:
HH - flat files: 4 PB, growing exponentially
HH - CERA DB: 200 TB, growing exponentially; growth up to 1/3 of that of the DKRZ
Uni Köln: 1/2 TB
Question 1.8: Format of data?
(CSV, XML, GRIB, description not accessible, ...)
Answers:
ZMAW: GRIB, NetCDF and binary
HH - CERA DB: stage 0: only GRIB, later also netCDF, IEG, ASCII, tar files, ...
Unis Berlin/Köln: (remark: intermediate products in the diagnostic workflows are currently also stored in the LOLA format; these are, however, not contained in the data mentioned above.)
PIK:
1. GRIB1
2. NetCDF-CF 1.0
3. rows in Oracle tables
4. rows in Oracle tables
5. rows in Oracle tables
DWD: ASCII
Question 1.9: Which access methods do you or your programs use?
file-based (read, seek, regular patterns, other patterns)
SQL, XML query (XPath, XQuery, ...), ...
access library (freely available, sources available?)
own programs (which language?)
Answers:
HH - flat files: C lib, much random seek (partly also seek), GRIB lib, netCDF lib (MPI, UCAR?), CDO (ANSI C) & pthreads
HH - CERA DB: SQL
Unis Berlin/Köln:
file-based: READ
SQL, XML query: (cf. access to the CERA database)
access library: GRIB decoding via the EMOS lib of the ECMWF, freely available, sources available.
own programs: FTN90
PIK:
1., 2.: file-based read
3.-5.: SQL scripts and HTML/Java front-ends for generating files with extracted data, which are then processed further.
DWD: SQL
Question 1.10: Are datasets modified or are they written only once and read an arbitrary number of times?
Answers:
HH: mainly 'subset extraction'
Unis Berlin/Köln: partial datasets: written once and read arbitrarily often.
PIK: written once, read arbitrarily often
DWD: observation data are modified and extended
Question 1.11: How often/when is new data generated?
Answers:
HH: continuously during the experiments on the NEC SX6 and written from there to the archive, ca. 2 TB/day
Unis Berlin/Köln: partial datasets:
1. once a new model run that is to be analysed becomes available,
2. when other diagnostic tools are applied, or the same tool is applied in a modified way, for which no suitable partial datasets exist yet.
PIK: 1.-3., 5.: after new scenario computations
4.: one new year is added once per year
DWD: continuously (every minute)
Question 1.12: Description of software resources, e.g., data filter services.
=> Possibly part of an application scenario!?
Answers:
HH: CDOs, afterburner; see the generic scripts, cf. the C3Grid usage scenario
Unis Berlin/Köln: EMOS-LIB: documented.
PIK:
1. post-processor Vadi (Fortran)
2. possibly the SimEnv experiment post-processor (Fortran, C): depends on whether experiments are to be post-processed within C3Grid or whether only post-processed experiment output is to be accessed.
3.-5. depends on the models that further process the files containing the extracted data
DWD: QualiMet, checking and validation of meteorological data
Question 1.13: Life cycle of metadata for the data.
(e.g. created once/read N times, alternately modified/read, deleted, etc.)
Answers:
HH - MPI: as far as, or as soon as, metadata are used at all:
self-describing part: ca. 2 years
rest (=??): unlimited
mostly alternately modified/read, deleted, etc.
HH - CERA DB: usually created once and rarely modified; exception: the dataset size, which changes permanently. Once a DOI has been created, selected metadata can no longer be changed.
Unis Berlin/Köln: so far no metadata for partial datasets, except possibly the GRIB headers.
PIK: created once, read n times
DWD: alternately modified/read
Question 1.14: Availability of the metadata.
(table schema + individual values; located at the beginning of the file; generated once or dynamically)
Answers:
HH: examples: ...!! Generated once in the headers of GRIB and netCDF files.
Unis Berlin/Köln: not applicable
PIK: table schema currently PIK-CERA-2; for 1. and 2. additionally the header information of the individual files
DWD: available as XML
Question 1.15: Extent of the metadata schema for an object, e.g., fixed/unchangeable, extendable
Answers:
HH - MPI: extent still undetermined? (future vision) changeable, extendable
HH - CERA DB: extendable and changeable
Unis Berlin/Köln: not applicable
PIK: extendable
DWD: extendable
B.2. Miscellaneous
Question 2.1: Which constraints of your international integration must/should be
considered?
data formats
access protocols
metadata schemes
etc.
Answers:
HH - flat files:
data formats: GRIB, CF-1.0 convention
access protocols: whatever suggests itself: gridFTP, eccopy, ...???
metadata schemata (incl. data types): self-describing, table based, standardized (WMO, CF group)
HH - CERA DB: function as WDCC, possibly guidelines of the project cooperation; see also the metadata meeting in Hamburg
Unis Berlin/Köln:
data formats: GRIB, NetCDF (WMO standard)
access protocols: authorization for data access must be checked.
metadata schemata (incl. data types): not applicable.
PIK: data formats: for 1. and 2. NetCDF, GRIB; access protocols: access authorization must be checked; metadata schemata: no constraints
DWD: data formats: XML; access protocols: web service, OGSA-DAI/UNIDART; metadata schemata (incl. data types): metadata schema according to ISO 19115
Question 2.2: How many participants, resource/data providers and consumers are in your
domain? How are they distributed (local, Campus, D-Grid, international)?
Answers:
HH - flat files:
local (MPI): ca. 200 (ca. 75%)
campus (ZMAW): + 10%
D-Grid: not yet, or associated C3Grid institutes (AWI, GKSS): + 10%
international: + 5% (strongly fluctuating)
HH - CERA DB: ca. 20 providers, but changing; ca. 700 registered users
Unis Berlin/Köln: international providers: DKRZ, ECMWF, Hadley Centre (England), NCEP (USA), ...; participants in the diagnostics cluster: currently 4 in Köln, 4 in Berlin.
PIK: for 1.-5.:
resource providers: local and international
resource consumers: campus and C3Grid
data providers: campus and international
data consumers: campus, C3Grid and international
DWD: local, international (via UNIDART ca. 12 participants)
B.3. Aspects of Data Management
Question 3.1: Access restrictions.
granularity: directories, files, databases, tables, rows
users, groups, organizations
Answers:
HH - flat files: UNIX groups and users
HH - CERA DB: access restrictions at table level; metadata are free
Unis Berlin/Köln:
granularity: defined via user and group rights; quota on data volume.
users, groups, organizations: only with authorization
PIK:
granularity: file, table row
currently authorization for individual users
DWD:
granularity: databases, tables
users
Question 3.2: Access methods/data searches that meet certain criteria
test on exact match
test on membership in a range (range queries)
Answers:
HH - flat files: currently folders in the cabinet of the respective staff member (experiment descriptions have been started in the Dokuwiki)
HH - CERA DB: a metadata DB is available
Unis Berlin/Köln: none (only flat files, identified by name and storage location).
PIK:
1., 2.: exact match and range queries
3.-5.: range queries
DWD: the full range of SQL
B.4. Aspects of Metadata Management
Question 4.1a: Where can metadata be stored? At one of your hardware resources?
(requires installation of the respective middleware)
Answers:
HH: databases, local resources (hardware)
Unis Berlin/Köln: yes!
PIK: on our own hardware resources
DWD: Oracle / XML
Question 4.1b: Where can metadata be stored? At external resources?
Answers:
HH: for community experiments in the M&D CERA database
Unis Berlin/Köln: yes
Question 4.2: How long can/must metadata be stored? Must they be archived?
Answers:
HH - flat files: unlimited; the self-describing part can be deleted
HH - CERA DB: unlimited
Unis Berlin/Köln: as long as the files exist.
PIK: as long as the data are to be kept.
DWD: permanently
Question 4.3: Do certain datatypes have to be supported for the attributes? Which are these?
Answers:
HH: what exactly is meant by datatype?
HH - flat files?: -> LDAP
HH - CERA DB: storage of the data in blobs?
Unis Berlin/Köln: unclear.
PIK: no
DWD: no
Question 4.4: What is the volume of the metadata (bytes)? How many attributes does each
schema contain?
Answers:
flat files: ? folders
CERA DB: depends on the focus of the metadata; core: ca. 6 GB, in total ca. 500 GB
PIK:
total metadata volume 10 MB
40 attributes per entry
DWD: 659 MB of XML
C. Quotes from the Development Discussiony
This appendix gives selected excerpts from the discussion among the C3Grid developers. The architecture specification in this document is largely based on this discussion.
C.1. E-Mail
Date: Sat, 4 Feb 2006 13:13:27 +0100 (CET)
Subject: Re: [C3] Anwendungsszenario AP2, AP5
From: "Uwe Ulbrich" <[email protected]>
To: [email protected]
Cc: "Tobias Langhammer" <[email protected]>
Dear workflow collaborators (and for the attention of Tobias Langhammer),

I assume that there has been no reaction from our side to this extremely
constructive e-mail yet (at least I have not seen one). Here is my
OPINION, which should, however, be discussed first:
> 1. The user submits a search query for discovery metadata (Portal->DIS)
>    -> the search is performed in the global metadata database
>    -> the results are units of discovery metadata
>
> 2. The user selects the workflow-relevant results in the portal.
>
> 3. The user submits a query for use metadata for the selection made
>    in 2. (Portal->DIS)
>    -> the query is forwarded by the DIS to the local metadata
>       providers.

I am not sure what exactly distinguishes discovery from use metadata, but
on second glance the distinction seems helpful to me. Let me restate it
in my own words:
1. The user submits his query and is first given the option of selecting
the model experiments (or meteorological observation data/analyses)
available in the global database with respect to the workflow he is going
to specify. This guarantees that the workflow can be realized in
principle. At this point he does NOT yet see where which partial datasets
or replicas/copies are available. All that matters is that a complete
original version exists (for which all metadata were generated first).
Discovery would thus refer only to the existence of an original version.
2.-3. The selection is submitted. As a result the user receives a few
options describing which existing datasets could be used or combined to
realize his workflow. One option is always the use of the original base
dataset. There is room for later extensions here, e.g. estimated times
for completion. The information about duplicates/replicas that an
advanced user can request for his anticipated later tasks corresponds,
FOR THE BASE DATA (i.e. in the first parts of the diagnostic workflows)
and as I currently understand it, to the Hamburg DATA EXTRACTION
WORKFLOW.
> 4. The user completes the workflow specification with use information
>    and sends it to the scheduler
4. With that, the selection of the datasets to be used (and possibly
combined) has been made, and the job to fetch the data goes out.

> 5. The scheduler issues a data staging job to the DMS
>    -> the DMS delegates the job to the local data providers
>    -> the data providers copy data + metadata to a local share
>
> 6. The scheduler issues a job to the DMS to create replicas at a
>    specified location
>    -> a processing step is executed on the replicated data, which
>       produces new data + metadata.
>
> 7. The scheduler issues a job to the DMS/DIS to add the new metadata
>    to the global database.

7. ...and with that the data are available to the user himself and to
all other users.

So far this matches my expectations exactly.
> ***************
>
> In this scenario the following points are still unresolved for us:
>
> - As we understand it, use metadata describe among other things a) in
>   which format the data are available and b) how the data are
>   distributed over files. Are several use metadata records therefore
>   assigned to one discovery metadata record?

As I understand it, a) is correct, but b) would have to hold individually
for each COMPLETE dataset. Two example cases:
i) The data reside in a database: then all data belonging to the original
dataset are reported as available from there, i.e. they are combined in
ONE metadata record.
ii) The data are flat files. The metadata contain information for the
dataset as a whole.
The possibility of extraction from a database, or the corresponding
preprocessing of flat files, must also be handled in the information
system. That is part of the Hamburg workflow, isn't it?!
> - How are the metadata stored after a processing step?
>   One file with discovery & use metadata?
>   Or one file with discovery metadata and many files with use metadata,
>   i.e. one use metadata file per generated data file?

As I understand it, the data belonging to an original version of the base
data are stored and reported per flat file/database. When the metadata of
an original version are queried, all metadata referring to it are
checked.
Going far beyond the present questions, I point out that combined
datasets referring to several base datasets are NOT foreseen here (even
though this might be a nice feature at some point...).
> Regarding step 3 of the scenario I see an implication that should
> perhaps be highlighted: the result of the query, i.e. the use metadata,
> determines the input of the workflow. If preprocessing on the data
> provider's side is foreseen, the provider must generate use metadata
> for the result before the actual preprocessing results are made
> available. The reason is that the workflow specification takes place
> before the actual
> preprocessing run.

That is correct. This raises the question of where the preprocessing
runs, and whether (and if so, where) the results of the preprocessing
should be made available again. For a database that holds the requested
data, one will typically not keep the preprocessing results for the
longer term (or will one?). They would, however, end up on a suitable
platform (concretely: after retrieval from CERA, on the UTF disk in
Hamburg, or on the user's file system) and would be reported from there
via metadata.
> A follow-up question: will there be a standard preprocessing that all
> data providers are expected to support? In the meeting of 13.01. (see
> http://web.awi-bremerhaven.de/php/c3ki/index.php/Metadaten-Meeting-Marum)
> we already talked about temporal and spatial subsetting. These are
> capabilities that databases partly provide already. For a provider of
> flat files, however, this means extra implementation effort. It may
> also not be sensible for them, because this processing is better
> specified in a workflow so that it can be run distributed in the grid.

This indeed raises the question of where the preprocessing takes place. I
tend to see it as part of the (Hamburg) workflow and thus as part of the
diagnostics cluster workflows. The question of where this preprocessing
takes place is to be solved as the Hamburg workflow in the grid.
> So there will be very individual preprocessing options, and over time
> new ones will probably be added. For our DIS/DMS it is therefore
> probably most sensible to pass a preprocessing specification through
> from the portal to the data and metadata providers as a black box in
> general. That is, searching for variables and space-time subsets etc.
> is the task of our DIS, but selecting data and metadata according to
> these criteria is not.

I now see this as a hierarchy (and please correct me if I am wrong here):
the preprocessing IS the Hamburg workflow. This workflow must therefore
comprise the selection, the preprocessing and the merging of the data for
further processing. The diagnostic workflows (which implicitly contain
this part) build on top of it.

I hope these ideas are acceptable and do not contradict the views of the
other diagnostics cluster collaborators.

Best regards, Uwe
> Best regards,
> Tobias
>
> -> Dipl-Inf. Tobias Langhammer
>
> Konrad-Zuse-Zentrum für
> Informationstechnik Berlin (ZIB)
> Department Computer Science Research
>
> Takustr. 7, D-14195 Berlin-Dahlem, Germany
>
> [email protected]
>
> Tel: +49-30-84185-410, Fax: +49-30-84185-311
--------------------------------------------------
Prof. Dr. Uwe Ulbrich
Institut fuer Meteorologie
Freie Universitaet Berlin
Carl-Heinrich-Becker-Weg 6-10
12165 Berlin
Tel.: +49 (0)30 838 71186
Fax: +49 (0)30 838 71128
email: [email protected]
C.2. C3Grid-Wiki
C.2.1. WF im Portal (as of 2006-03-06)
Finding data:
Proposal of a multi-stage search via keywords and/or ...
via keywords:
project (IPCC, WOCE, ...)
model (ECHAM, OPA, MPI-OM, ...), observation (...), ...
compartment (atmosphere, ocean, lithosphere, ...)
provider (WDCs, institutions)
Selection:
region of interest: temporal, spatial, variables
processing/workflow
Display:
before the request: size, format, database or raw data, location, (creation time?) of the selected data; estimated size of the output file; estimated time for staging + processing
after the request: status of the request, estimated remaining time for staging + processing
C.2.2. Metadaten im WF (as of 2006-03-06)
Granularity in the workflow:
Proposal: one metadata file per workflow (processing) or per output file. For the representative ECHAM workflow this means that when raw data are accessed, only one metadata file is processed in order to build a time series from the individual files.
Advantages:
only few metadata files have to be processed (better performance of the tools)
the problem of reducing the history entries when merging the individually processed files into a time series disappears
Disadvantages: ...
Alternative: one metadata record per requested dataset
C.2.3. AG-Metadaten (excerpt, as of 2006-02-06)
Discovery metadata
Metadata that are used to find data products. A first meeting of the metadata group resulted in an agreement on an ISO 19115 conformant description of the discovery metadata, see AG-Metadaten-Meeting1 (Section C.2.4).
Next meeting on 13.03 in Bremen: MetadatenTreffen Bremen
Use metadata
Metadata that are used for working with data product instances (e.g. in processing tools); a rough sketch of the distinction follows at the end of this section.
Exchange of metadata
Definition of the protocol for exchanging discovery metadata between the data providers and the central C3Grid metadata catalogue
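As a purely illustrative sketch in the interface specification syntax of Appendix A (the field names are assumptions chosen for illustration, not agreed schema elements), the distinction between the two kinds of metadata could be written roughly as follows: one discovery metadata record describes a data product for search purposes, while each associated use metadata record describes the format and file layout of a concrete instance:

    type UseMetadata       = { format : string, files : string list }
    type DiscoveryMetadata = { identifier : string, title : string, instances : UseMetadata list }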
C.2.4. AG-Metadaten-Meeting1 (as of 2006-02-06)
The essential result of the first metadata meeting was the decision for the ISO 19139 metadata standard. Current schemas in the (almost) final version: http://ws.pangaea.de/schemas/iso19139/
The entry point is http://ws.pangaea.de/schemas/iso19139/gmd/metadataEntity.xsd
An attempt was made to correlate the elements of the schema with the metadata available at the data centres and to comment on them. The result is not complete; following the work of Uwe Schindler and in discussions with Wolfgang Kresse (TC211) we have added and changed a number of things.
In the following, some comments on essential elements in XPath notation:
MD_Metadata
- contact: the contact for the metadata description, not the PIs or authors of the dataset!
- dateStamp: last modification of the metadata
- identificationInfo
  citation: bibliographic data
  extent with a bounding box (BBOX) is mandatory for C3; a point is a BBOX with identical coordinates; verticalExtent still lacks a CRS definition (coordinate reference system); Citation.Identifier with or without localization of the dataset; access restrictions under ResourceConstraints
- contentInfo:
  MD_CoverageDescription (n times): here parameters/variables/measured quantities etc., including the definition of methods, medium and units, can be listed. Parameter lists should be reconciled with each other and can be cited via xlink.
  MD_FeatureCatalogueDescription: citation of the parameter list, the weather station list (DWD) or model descriptions (CF) as a whole.
- distributionInfo: the URI of the dataset or general information on how the dataset can be accessed; under Format, list the formats in which the data are available; still needs polishing (format conversions etc.)
- dataQualityInfo:
  scope: type of the data (e.g. "dataset")
  lineage/source: references to the dataset
  lineage/processStep: information on station or sampling events, data processing steps
  report: free text describing the quality
- series: by referring to the parent dataset, hierarchies and links (replicas, versions) can be built up.
Examples
The following URLs give examples of the metadata from PANGAEA:
http://doi.pangaea.de/10.1594/PANGAEA.80967
http://doi.pangaea.de/10.1594/PANGAEA.75909
The same as XML according to the internal proprietary PANGAEA schema:
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=pan_md&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.80967
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=pan_md&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.75909
And finally according to ISO 19139. Here the records are already considerably more complete. Individual points may not be immediately understandable and need to be discussed and documented further. In general, however, the examples can be used as a template for the implementation at the other data centres:
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=iso19139&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.80967
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=iso19139&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.75909
The metadata XML is wrapped in a frame prescribed by OAI (Open Archives Initiative). The actual metadata are located inside the <metadata> element.
C.2.5. Metadaten-Meeting-Marum (as of 2006-03-06)
Essential results of the meeting:
1. the joint work on the definition of a WSDL for data queries
2. ISO 19115 + 19139 as the metadata schema was discussed again
Prototypical specification of the input and output of the web service (a sketch in the interface specification syntax follows at the end of this section):
INPUT (cf. a)
1. search criteria: parameters, space, time, list of identifiers
at least one identifier or one parameter must be given
identifiers should be persistent and result from a preceding search in the portal
the portal must ensure that sensible queries are created
2. target location of the directory for the result files
default: local
possibly also as output, then generated first and returned
3. output format
4. preprocessing (cf. b)
corresponds to GAT or CDOs (HH)
as a list of desired tasks
OUTPUT
1. time estimate for processing the request
optional
2. size estimate of the result set
optional
3. errors
Each job creates at least two files:
1. data file(s)
2. metadata file
as a compilation or as a list of citations (still open)
a) GAT or similar should be taken as a template
b) can also be executed sequentially as a separate downstream job
Metadata
1. discovery metadata
OAI-PMH, either DIF or ISO 19115
2. metadata for the data information service
holds only dynamically generated entries
metadata are recorded in a database for display in the portal, for statistics, etc.
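The input and output items listed above could be summarized, purely as an illustrative sketch in the interface specification syntax of Appendix A, roughly as follows. The names are assumptions chosen for illustration and do not reproduce the agreed WSDL; the optional character of the estimates and the exact form of the spatial and temporal search criteria are simplified:

    type DataRequest  = { identifiers : string list, parameters : string list,
                          targetDirectory : string, outputFormat : string,
                          preprocessing : string list }
    type DataEstimate = { estimatedTime : double, estimatedSize : double }
    type DataResponse = ( Accepted DataEstimate | Error string )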
C.3. Mailing Lists