Collaborative Climate Community Data and Processing Grid (C3Grid)

T5.1: Grid Data Management Architecture and Specification

Work Package:              AP 5: Grid Data Management
Author(s):                 Tobias Langhammer, Florian Schintke
Version Identifier:        C3Grid-T5.1-002-Draft
Publication Date:          March 2006
Work Package Coordinator:  Zuse Institute Berlin (ZIB)
Partners:                  ZIB, GKSS, Uni Köln, FUB, DLR, DWD
Contact:                   Tobias Langhammer, Florian Schintke
E-mail:                    [email protected], [email protected]

Contents

1. Introduction
   1.1. Notation†
   1.2. Common Terms†
2. The C3Grid Project†
3. Global C3Grid Architecture†
4. The C3Grid Grid Data Management System
   4.1. Requirements
   4.2. Available Tools and Solutions
        4.2.1. dCache
        4.2.2. Storage Resource Broker (SRB)
        4.2.3. gridFTP
        4.2.4. Chimera
   4.3. Open Challenges
        4.3.1. Grid State Prediction
        4.3.2. ISO 19115 as Scheme for Climate Metadata
5. Architecture of the Grid Data Management System
   5.1. Design
6. AP5 Use Cases
   6.1. Querying the Current State
   6.2. Querying State Predictions
   6.3. Agreements
   6.4. Staging and Transferring Files
   6.5. Registration of Generated Files
   6.6. High-level File Operations
7. Interfaces of the Grid Data Management System
   7.1. Interface for the Workflow Scheduler
        7.1.1. Exceptions
        7.1.2. Datatypes
        7.1.3. Operations of the DataService
   7.2. Interface of the Primary Data Provider for the Data Management Service
        7.2.1. Operations of the DatabaseAccess
        7.2.2. Operations of the FlatFileAccess
8. Authorization†
9. Conclusion
A. Interface Specification Syntax†
B. Questionnaire for Users and Data Providers
   B.1. Important Aspects
   B.2. Miscellaneous
   B.3. Aspects of Data Management
   B.4. Aspects of Metadata Management
C. Quotes from the Development Discussion†
   C.1. E-Mail
   C.2. C3Grid-Wiki
        C.2.1. WF im Portal (Stand 06.03.2006)
        C.2.2. Metadaten im WF (Stand 06.03.2006)
        C.2.3. AG-Metadaten (Auszug, Stand 06.02.2006)
        C.2.4. AG-Metadaten-Meeting1 (Stand 06.02.2006)
        C.2.5. Metadaten-Meeting-Marum (Stand 06.03.2006)
   C.3. Mailing Lists
Bibliography

1. Introduction

This document describes the results of the discussion on the architecture, interfaces, and functionality of the C3Grid Grid Data Management. The document is structured as follows ...

1.1. Notation†

The †-Sign

The two documents [LS06a, LS06b], which describe the C3Grid Grid Information Service and the C3Grid Grid Data Management System, are related to each other. To make each document self-contained, some sections are shared between both documents and printed in exactly the same way in both. Such chapters, sections, figures, etc. are marked by the †-sign.

Interface Definitions

For the specification of interfaces, this document uses the notation described in the following. The datatypes used are common basic datatypes (like int, double, string) and constructed types. Constructed types can be lists, pairs, records, enumerations or variants. They can be bound to new names by a type definition. For example,

    type foo = (int, double) list

defines a new type foo which is a list of (int, double) pairs. For defining a record with an int and a double entry we write

    type foobar = { foo : int, bar : double }

For defining enumerations we write

    type color = ( Red | Green | Blue | Yellow )

Variants are a generalization of enumerations, where each handle can be assigned a type:

    type color = Name string | RGB (int, int, int)

(Note that a variant data type can be mapped to a C union.)

Service interfaces are described in sections called 'Operations of the FooService'. Such a section lists the respective operations of the interface. Callers can use the functionality of the interfaces by invoking the listed operations remotely (using Web Services). Remote operations are given by their signature (consisting of the name, parameter list and return type), a short functional description and a detailed list describing the parameters, the return value and exceptions. The following example demonstrates this notation.

foo (in huey : double, inout dewey : bool, out louie : string) : int
Example operation with some special functionality.
Parameters:
    in huey        Example input argument which is a floating point value.
    inout dewey    Example boolean argument which is modified by the operation.
    out louie      Example string argument which is set by the operation.
Returns: Example integer return value.
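As an illustration only, the constructed types of this notation can be mapped onto most implementation languages. The following non-normative Python sketch shows one possible mapping of the examples above (the class names follow the type definitions, everything else is our own choice for this example); the variant type color becomes a tagged union of two payload classes.

    from dataclasses import dataclass
    from typing import List, Tuple, Union

    # type foo = (int, double) list
    Foo = List[Tuple[int, float]]

    # type foobar = { foo : int, bar : double }
    @dataclass
    class FooBar:
        foo: int
        bar: float

    # type color = Name string | RGB (int, int, int)
    @dataclass
    class Name:
        value: str

    @dataclass
    class RGB:
        r: int
        g: int
        b: int

    Color = Union[Name, RGB]   # each variant handle carries its own payload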
For a detailed BNF-style description of the interface definition syntax see Appendix A.

1.2. Common Terms†

The terms must, must not, should, should not and may are used in accordance with RFC 2119 [Bra97]. This specification also uses the following terms.

base data: Data which is not metadata, i.e. actual climate data files.

grid sites: Sites within the C3Grid providing data capacity (workspaces) for processing and storing files temporarily.

metadata: Data that describes other data. In the C3Grid we distinguish metadata by their origin and purpose.

primary metadata: Metadata generated and stored at primary data sites. Though parts of this metadata may be cached by the Data Information System, the original copy always remains out of reach for the C3Grid middleware.

grid metadata: Metadata generated in the grid workspaces and managed by the Data Information Service. This metadata describes data that is only stored in the grid workspaces.

discovery metadata: Data necessary to find climate data in the C3Grid.

use metadata: Metadata describing how the processing tools can access a data object.

file metadata: Metadata used by the Grid Data Management System in order to track the location of replicated files.

Note that the set of primary metadata objects and the set of grid metadata objects are mutually exclusive. Discovery and use metadata are either primary or grid metadata; file metadata are always grid metadata.

primary data: Read-only data at a primary site. The creation of primary data is outside the scope of the C3Grid.

primary site: An organisation providing a large data repository which allows read-only access (DBMS, archive, ...).

process: In terms of workflows, an atomic processing unit. In terms of the operating system, a running program.

workspace: Disk space local to a grid site where data is made available for processing.

DOI: Digital Object Identifier [DOI]. A system for identifying objects on the Internet. The DOI of a data object is unique and, unlike a URI, does not bind the object to a certain location. DOIs provide a persistent way of accessing data objects via a mapping service. In the C3Grid, DOIs are used for identifying single datasets. The data service provides DOIs to identify search results. These DOIs can then be used for requesting primary data from the data repositories.

2. The C3Grid Project†

The aim of the Collaborative Climate Community Data and Processing Grid is to develop a productive, grid-based environment for German earth system science that supports effective, distributed data processing and the interchange of high-volume datasets from models and observations between the participating institutions. The collaborative system as a whole will provide unified and transparent access to the geographically distributed archives of different institutions and simplify the processing of these data for the individual researcher. Specific challenges are to provide a collaborative platform for accessing and processing huge amounts of data, as well as the effective management of community-specific meta-information.

Earth system research studies the system of the earth and the processes within it, and aims to develop models to predict its climate changes. The C3Grid will provide the infrastructure to share data and processing power across institutional boundaries.
3. Global C3Grid Architecture†

The C3Grid provides an infrastructure which is designed for the specific needs of the climate research community. It provides a means of collaboration, integrating the heterogeneous data sources of the participating organizations and offering easy access through a common user interface. The main application of the C3Grid is the analysis of huge amounts of data. To increase efficiency, the system provides automatic reduction and pre-processing operations and exploits data locality, replication and caching.

Figure 3.1.: Architecture of the C3Grid. (The figure shows the user interfaces (GUI/API), the distributed grid infrastructure (grid scheduling, grid data management, grid information service, data-transfer service, user modules), the local interfaces to the institutional resources (data, metadata, jobs, pre-processing, archive access) and the existing resources of the institutes (distributed data archives, distributed compute resources).)

The architecture overview of Figure 3.1 shows the major components of the C3Grid in their operational context. Existing data repositories are integrated by dedicated interfaces providing primary base data and the respective metadata to the middleware. Pre-processing capabilities allow a first on-site data reduction. The distributed grid middleware consists of an information service, data management and transfer services, and a scheduler. Actual jobs are executed on data which was extracted (staged) from primary repositories and made available on a local disk share. Each processing step not only produces new base data, but also generates new metadata from the metadata describing the original input. The information service provides a comprehensive search for resources, for primary data available in local repositories, and for data created on local grid shares. The Portal uses these services to assist users in their data analysis workflows.

4. The C3Grid Grid Data Management System

Still to be written: Responsibilities of the Service.

4.1. Requirements

The following requirements for the Grid Data Management System are specified in the C3Grid project proposal [DG].

(R1) The system must coordinate the access and transfer of data in the grid. Files must not be created or transferred unless necessary. The system must be economical in handling huge amounts of data.

(R2) The system must provide consistent access to all data provided by primary repositories.

(R3) On request, the system must provide replicas at processing resources.

(R4) The system must provide information about transfer times by referring to knowledge about network topologies and bandwidths.

(R5) The system must provide information about replicas.

(R6) The system must be tightly coordinated with the Workflow Scheduler to support it in providing efficient data and process scheduling.

(R7) The system must be fault tolerant, i.e., in the case of a breakdown it should recover automatically.

Special requirements derive from the answers given by the C3Grid users and data providers in the questionnaire of Appendix B and from the discussion among the C3Grid development groups (see Appendix C).

(R8) The system must be usable with databases (like CERA or Pangaea) as well as with flat-file storage, such as storage area networks (SAN) and hierarchical storage management systems (HSM).
(R9) The system must support data providers with a total volume of several hundred terabytes, data sets of several hundred gigabytes, and files of several gigabytes.

(R10) For staging data from databases, the system must pass pre-processing specifications from the workflow specification to the data provider. The system may not be aware of the structure of these specifications. (See also the quotes from the C3Grid wiki in Section C.2.5 and the use case discussed in the e-mail of Section C.1.)

(R11) The system must support metadata as input and output files of data sets, at least in the form of files.

4.2. Available Tools and Solutions

To be examined with respect to: 1. transfer management and replicas, 2. access to tertiary storage, 3. a global directory of existing storage.

Tools: dCache (3), SRB (1, 2, 3), gridFTP, GSI-OpenSSH (1), Chimera (2), EGEE?

4.2.1. dCache

dCache [Fuh] is a data management system which aims to provide storage and retrieval of huge amounts of data. The data is distributed over a large number of heterogeneous data nodes. Transparent access is provided via a logical file system. The exchange of data between the backend servers is automated. The system provides features like disk space management, automatic data replication, hot-spot detection and error recovery. External hierarchical storage systems (HSM) are integrated transparently and automatically. An NFS interface allows name space operations (but no reading and writing of files).

dCache was developed in the context of a CERN high energy physics experiment which will produce a continuous data stream of 400 MB/s in 2007. The experiment's computing model consists of three tiers. A single tier 0 site (CERN) is the main data source. A few tier 1 sites save this data in a distributed scheme, providing persistence through tape backups. Many tier 2 sites provide additional CPU and storage resources. The architecture consists of a central workload manager and resource broker accepting jobs and passing them to local sites. A site consists of a compute element (CE) accessing a storage element (SE). Each SE is controlled by a Storage Resource Manager (SRM) and is connected to remote SEs to exchange data via GsiFTP or GridFTP.

Applicability for the Grid Data Management System

dCache meets very few of the requirements of the Grid Data Management System. As its name suggests, it is mainly a distributed cache integrating different sources of a single virtual organization. This architecture does not fit the independent administrative domains of the C3Grid data providers. The transparent replication mechanism of dCache does not meet the need for close collaboration between the Grid Data Management System and the Workflow Scheduler, which involves detailed agreements about transfer and staging times. Furthermore, with respect to the requirements of AP2 [LS06a], dCache does not provide the management of additional metadata.

4.2.2. Storage Resource Broker (SRB)

The Storage Resource Broker (SRB) is a data management system developed by the San Diego Supercomputer Center (SDSC). Initially, this development was motivated by the need for uniform access to the data of the SDSC. Today it is deployed in many different projects worldwide. The SDSC-managed projects alone use SRB for a total of 626 TB, 99 million files and 5000 users. SRB provides uniform access to heterogeneous data resources, like file systems and relational databases.
Uniformity is achieved by a logical namespace and a common metadata and user management. SRB also provides the creation and management of file replicas.

Table 4.1.: Repository types supported by SRB.†

    Abstraction          Systems
    Database             IBM DB2, Oracle, Sybase, PostgreSQL, Informix
    Storage Repository   Archives – Tape, SAM-QFS, HPSS, ADSM, UniTree, ADS, ORB;
                         File systems – UNIX, NT, Mac OS X;
                         Databases – DB2, Oracle, PostgreSQL, mySQL, Informix

Components of the SRB†

An SRB deployment is structured in three layers. The physical layer consists of the different types of data repositories SRB provides access to. There are two abstractions, providing either database-style or storage-repository-style access. Table 4.1 gives an overview of the repositories supported so far.

The central layer of an SRB deployment consists of the SRB middleware, which provides the main functionalities: logical namespace, latency management, data transport and metadata transport. These functionalities are used by a common consistency and metadata management. The middleware also provides authorization, authentication and auditing.

The application layer consists of a collection of tools and interfaces providing high-level access. The following list gives an incomplete overview.

    APIs for C, Java, Python, Perl, ...
    Unix shell commands.
    Graphical user clients (NT Browser, mySRB, ...).
    Access via HTTP, DSpace, OpenDAP, GridFTP.
    Web-service-based access via OAI, WSDL, WSRF.

Federated Zones†

The typically distributed deployment of the SRB middleware is structured in administrative units, called SRB zones. A zone consists of one or more SRB servers. One server also keeps the metadata catalog (MCAT) of the SRB zone. The MCAT contains information like user information, access rights and resource information. It is deployed in a relational DBMS. Each SRB server can provide access to several resources.

For the connection of different zones, SRB provides the federated zones mechanism. It allows mutual access to resources, data and metadata of several zones. Through federation, the user accounts of each local zone are made available in all remote zones. Still, these zones remain independent administrative domains. Synchronization of the MCATs of federated zones is achieved by periodic execution of a provided script.

Metadata†

Metadata stored in the MCAT can be classified by its purpose.

administrative and system metadata allow the mapping from logical to physical file names and the authentication of users.

user metadata provide a means of describing data objects by attribute values.

extensible schema metadata use the facility SRB provides to integrate external metadata schemas into the MCAT.

SRB provides a detailed search on its MCAT metadata, e.g., for file creation times or user-given attributes.

Replica Management†

Replication is driven by client- or server-side strategies which aim to reduce access times by providing a file closer to the location where it is needed. Replicas can be created in different ways: by manual copying, implicitly via logical resources, or by user registration. Replicas are synchronized by calling a special command.

Rights Management†

Users of an SRB zone must be registered in the respective MCAT. For authentication, several techniques are provided (password, GSI, etc.), although not all of them are supported by all clients. The definition of user groups is possible as well.
The SRB server provides fine-grained, authorized access to its objects via access control lists (ACLs). An additional concept for authorization are tickets: users who access the SRB with tickets need not be registered in the MCAT.

Scalability by Master and Slave MCATs†

In order to improve the response time of MCAT operations, the MCAT can be replicated to several SRB servers within a single zone. These MCAT replicas share the request load. Read requests are handled by one of the many slave MCATs; modifying access is handled by a dedicated master MCAT. To be able to replicate MCATs, the underlying DBMS must support database replication.

Logical Resources†

Another feature of the SRB server is to provide a single logical view of a number of physical resources. A special application of this is the automatic replication of objects to several servers. For example, two servers may each have a local physical resource and share both as a single logical resource. By setting the number of replicas to 2, replication on both physical resources is guaranteed.

Other Features†

The SRB comprises the following additional features.

    Extraction of data can be combined with pre-processing, e.g. to generate previews of images.
    Many small files can be combined in containers to prevent their fragmentation over several tape archives.
    Support of parallel I/O operations.
    Bulk operations, which speed up the transport of many small files.

SDSC SRB vs. Nirvana SRB†

Currently, there are two development branches of the Storage Resource Broker. SDSC maintains the open source branch of the SRB development, whereas Nirvana offers a commercial version of SRB. Though Nirvana SRB originates from the SDSC branch, both versions are mutually incompatible. The following list gives a brief overview of the differences between the two versions.

    Nirvana SRB does not support federated zones.
    Nirvana SRB uses system daemons to guarantee the synchronicity of replicas and the global namespace. SDSC SRB uses an rsync-like mechanism, which has to be called externally.
    Nirvana SRB supports drivers for a wider range of repository types.
    The licence of SDSC SRB is restricted to non-profit use. The short-term use in the C3Grid would meet this criterion.
    Nirvana provides commercial support. SDSC provides free community support.

Applicability for the Grid Data Management System

SRB provides many of the features required in the Grid Data Management System, which makes it a candidate to be assessed for two types of deployments: as the main component of the Grid Data Management System, or as multiple deployments providing coherent access to data at the local sites individually.

In the first case, SRB's federated zones are a must, because C3Grid sites remain independent administrative domains. This requirement is met by the SDSC SRB and rules out the application of Nirvana SRB. The replica management of SRB provides basic functionality. Nevertheless, the close coordination with the Workflow Scheduler to provide efficient data and process scheduling implies agreements about the availability of replicas for defined periods of time (including planned periods in the future). Therefore, considerable extra coding is required to create, transfer and manage replicas according to the agreed lifetimes. Extra coding is also required to make the Grid Data Management System aware of metadata files.
The registration of a file as metadata can be done by defining a special attribute in the MCAT; the integration of ISO 19115 in the DIS still needs to be solved (see also [LS06a]).

SRB is especially suited to accessing grid processing workspaces or file storage systems. Database access would require a deep intrusion into existing DBMSs, by-passing existing abstractions and interfaces which already provide features required in the C3Grid (spatial and temporal cuts, aggregations, etc.). The implementation of file transfers in SRB, using techniques like parallel I/O, suits the need for optimized data exchange between C3Grid sites well. The user management and authorization scheme of SRB, an important requirement of the C3Grid, could also be integrated in the Grid Data Management System.

The second type of deployment, i.e., multiple SRB installations at the sites, may be considered by local data providers and is outside the scope of the Grid Data Management System. The data management could use the SRB interface as a common interface for accessing all data providers. Nevertheless, this setting does not seem reasonable because it would require all data providers either to use SRB or to implement even the parts of the interface which are not used.

4.2.3. gridFTP

GridFTP is a protocol for secure, robust, fast and efficient transfer of data. The Globus Toolkit 4 (GT4) provides an implementation of GridFTP, which is the most commonly used tool for this protocol. It consists of a GridFTP server, which has access to data through an appropriate Data Storage Interface. This typically means a standard POSIX file system, but it can also be a storage system like the SRB. To access remote GridFTP servers, GT4 also provides a respective client. It is capable of accessing data via a range of protocols (http, https, ftp, gsiftp, and file). Because the client is a command-line tool, it is especially suited for scripting solutions. For special demands, GT4 provides a set of development libraries for custom client operations.

Applicability for the Grid Data Management System

GridFTP is a suitable tool for transferring files between C3Grid sites. It can be integrated quite easily in any implementation of a Grid Data Management System. Still to be written.

4.2.4. Chimera

Still to be written.

4.3. Open Challenges

4.3.1. Grid State Prediction

A key unique feature of the C3Grid middleware is the ability to use knowledge about access and transfer times to optimize file staging and job execution. Because the DMS communicates directly with the data providers and executes file transfers within the grid, it also needs to request and manage temporal information. Based on this information, the Grid Data Management System provides the Workflow Scheduler with the following information:

    estimates of staging and transfer times,
    estimates of file availability at a given location in the future,
    estimates of future disk space use.

Furthermore, the DMS reaches agreements with the Workflow Scheduler based on this information. The actual execution plan of workflows will thus be based not only on present knowledge but also on predictions about the planned grid state in the future.

4.3.2. ISO 19115 as Scheme for Climate Metadata

Still to be written. Time constraints, AP2/AP5/AP6 coupling, ISO 19115.
5. Architecture of the Grid Data Management System

Figure 5.1.: Overview of the external interfaces of the data information service (DIS) and the data management service (DMS).† (The figure shows the portal (GUI/API), the distributed grid infrastructure (global DIS of AP2, global DMS with its catalog of AP5, workflow scheduler of AP6) and the local resources and interfaces at the institutes (local DIS, local DMS, primary metadata and primary data, pre-processing, workspace, process execution), together with the external interfaces A–D, the directions of requests and the internal data flow.)

5.1. Design

The main purpose of the Grid Data Management System (DMS) is to access data stores and to manage files used as input or output of computations at grid compute resources. It must co-operate with the following components of the grid middleware:

    the Workflow Scheduler, issuing staging requests,
    the local data providers, offering primary base data and the respective metadata,
    the Data Information System, keeping track of which meta-information corresponds to which data.

Virtualization of Files

The DMS aims to offer a high-level view on data by hiding the actual location of files. Users should be able to specify workflows by giving the required input data and compute capabilities without needing to know where these resources are actually located. Consequently, the DMS manages file transfers at a lower level. It also keeps track of distributed copies of a file in order to be able to refer to its metadata.

To avoid confusion, we use the term logical file, replicated file or simply file if we refer to the high-level view of the location-independent data unit. The term replica is used for a single remote copy of a file. This term also implies that the DMS knows about the respective representation as a logical file.

Local Workspaces

Each C3Grid site offers a file system share which we call a workspace. A workspace is used as the target location for files which were requested from data providers or produced by grid jobs. The grid workspace at each site is identified by a root path pointing to a location in the file system. All sub-directories of this root path are dedicated to being controlled by the DMS. In particular, no other user of the local system should have read or write access to this file share.

Primary Data Providers

Primary data providers are the main source of data for the processing tasks of the grid. The way the DMS can access them mainly depends on whether they are storage systems providing flat files or databases providing more flexible access. If the DMS receives a staging request, in the first case it simply keeps a reference to the location of the requested file. In the case of databases, the DMS must stage data to flat files in order to make it accessible for processing. Because databases already implement pre-processing capabilities (e.g., by selective queries for temporal or spatial cuts), the DMS provides extra operations for this type of data provider. The output of such a pre-processing step is a new data object which is stored only in the grid workspaces and has no counterpart in the database it originates from.

Note that we consider pre-processing a processing capability which is provided as an integral feature of a data repository. Independent tools may also provide on-site data reduction (e.g., CDOs) but are better specified inside a job to be passed to the Workflow Scheduler. Only in this way can the distributed grid environment be used.
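To make the naming convention for workspaces concrete, the following minimal Python sketch shows one way a replica location could be derived from a workspace root and a location-independent logical path. It is only an illustration of the convention described above; the function name, the file-URI layout and the example hosts and paths are our assumptions, not part of the specification.

    from pathlib import PurePosixPath

    def replica_uri(host: str, workspace_root: str, logical_path: str) -> str:
        """Map a location-independent logical path to the expected replica
        location inside the workspace of a given grid site (assumed layout)."""
        # All replicas of a file share the same path relative to the workspace root.
        physical = PurePosixPath(workspace_root) / logical_path.lstrip("/")
        return f"file://{host}{physical}"

    # Example: the same logical file resolved at two different (hypothetical) sites.
    print(replica_uri("site-a.example.org", "/c3grid/workspace", "exp42/run1/tas.nc"))
    print(replica_uri("site-b.example.org", "/data/ws", "exp42/run1/tas.nc"))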
Internal Structure

The Grid Data Management System consists of two conceptually independent components: a grid-wide global DMS and many local DMSs. The global DMS is closely coupled with the Data Information Service (DIS) [LS06a] of AP2, with which it shares a common data model, depicted as an entity-relationship diagram¹ in Figure 5.2. This data model contains not only DMS-related information, but also DIS-related information and the relations between both. The most important aspects of this model are as follows.

    A single set of discovery metadata describes one or many data files.
    For each set of discovery metadata describing files in the grid there is an extra file containing this metadata.²
    The respective discovery metadata of a file can be identified by referring to the metadata object identifier OID.
    A file has a name and a workspace path.
    A replica has a host where it is saved, a lifetime range, and a pinning flag to protect it from automatic deletion.
    A replica may have a primary path if it is not located below the workspace root path. (In fact, this 'replica' is a primary flat file the DMS has direct read access to.)

¹ An entity-relationship diagram uses two kinds of concepts: entities (boxes) and relations between entities (diamonds connecting boxes). Both kinds of concepts can have attributes (ovals). Key attributes (underlined attribute names) uniquely identify instances of the respective entity. Relations are annotated with (min, max) information. A pair (n, m) between an entity E and a relation R indicates that an instance of E participates in R at least n and at most m times.
² See also Section C.2.2.

Figure 5.2.: Entity-relationship diagram of the common data model of the Data Information Service and the Grid Data Management System.† (Discovery metadata (OID, ISO model, project, search attributes) describes one or more files (logical path, format, primary path); a file is located at one or more replicas (URI, host, minimum/maximum lifetime, pinning flag); the discovery metadata refers to use metadata (CERA, Pangaea, grid workspace, ...) stored or generated at the local providers.)

A local DMS is running at each C3Grid site which offers storage and file access capabilities. Its task is to communicate with primary data providers and to manage the local file stores.
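For illustration, the following sketch renders the DMS-related part of this data model as Python dataclasses. It is a simplified, non-normative reading of Figure 5.2 under our own naming assumptions (class and field names are not prescribed by the specification); lifetimes are represented as plain timestamps.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Replica:
        uri: str                           # physical location, file scheme of RFC 1738
        host: str                          # grid site keeping this copy
        min_life: Optional[float] = None   # agreed lifetime range (POSIX timestamps)
        max_life: Optional[float] = None
        pinned: bool = False               # protects the replica from automatic deletion
        primary_path: Optional[str] = None # set if this is a primary flat file
                                           # outside the workspace root

    @dataclass
    class LogicalFile:
        logical_path: str                  # location-independent path below the workspace root
        oid: str                           # object identifier of the describing discovery metadata
        format: str = ""                   # e.g. GRIB, NetCDF
        replicas: List[Replica] = field(default_factory=list)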
Replica Management

In order to facilitate the staging of processing input, the DMS creates file replicas at the site where the data are needed. In order to facilitate the management of replicas, all replicas of a single file are stored at the same location relative to the workspace root path.

In the management of replicas, the DMS takes special care of flat files offered by primary data providers. In this case, the DMS creates replicas only in workspaces remote to the primary file. At the location of the primary file no extra replica is created in the workspace, because the file is directly accessible via its path. Note that this scheme of omitting unnecessary replication conforms to requirement (R1).

In case a primary provider is a transparent hierarchical storage system (HSM), the DMS must ensure that the file it provides to the Workflow Scheduler remains staged to disk (e.g., by requesting a pin from the HSM). If the HSM is not able to keep the primary file staged, the DMS must copy it to its local workspace.

The path of a primary file points outside the sub-directory structure of the local workspace root directory. Therefore, the uniform naming scheme for replicas in remote workspaces cannot be maintained for it. For this purpose the DMS keeps two paths: a primary path pointing to the primary file in its local file system, and a logical path within the workspace namespace used for all remote replicas.

In case of a disk space shortage, the DMS may decide to remove individual replicas. By setting a special flag (pinning), a replica is protected from deletion.

The Data Management System as Data Provider

From a higher-level perspective, not only a primary data source but also the Data Management System itself acts as a data provider. Especially for the Workflow Scheduler, the distinction between staging from a primary source and the creation of remote replicas is irrelevant. For the purpose of file requests it uses the same operations of the DMS interface. The scheme for providing replicas at an agreed host and time is presented below in the chapter about use cases.

6. AP5 Use Cases

As described in the definition of the Data Information Service [LS06a], the portal can request detailed information about how to access data. The following subset of this information is needed to specify the input of a workflow:

    an object identifier,
    optional aggregation or pre-processing commands for databases,
    a list of base data file names,
    a metadata file name.

With this information the Workflow Scheduler starts to interact with the DMS to prepare an optimized execution of the workflows.

6.1. Querying the Current State

Figure 6.1.: Sequence diagram: two types of state queries — current state and predicted state. (The Workflow Scheduler queries the DMS; current-state queries are answered by a lookup in the catalog or workspace, while state predictions additionally require a staging time estimate from the primary provider.)

The following queries are sent by the Workflow Scheduler in order to get information about the current state of data (see also Figure 6.1).

What is the provider of a data object?
    Input: object identifier
    Output: host name, host type (database, flat file, or workspace only)

What are the files of a data object?
    Input: object identifier
    Output: list of logical file paths
    Note that in the case of a database a replica of a file may not exist. However, there is always a logical file.

What is the name of the metadata file?
    Input: object identifier
    Output: logical path of the metadata file
    Note that in the case of a database a replica of the metadata file may not exist. However, there is always a logical metadata file.

What are the replicas of a file?
    Input: file path
    Output: list of replica URIs
    In the case of a database no replica may exist for the given file.

What is the respective object identifier of a file?
    Input: file path
    Output: object identifier

6.2. Querying State Predictions

The following queries by the Workflow Scheduler require a prediction of the future state of data in the grid. This prediction is based on knowledge about the respective staging and transfer times. Because the DMS only knows about transfer times within the grid, it has to ask the data providers for an estimate of the time they need for staging.

When can a certain file be available at a specific host?
    Input: file path, host
    Output: time stamp

What is the earliest time a specific file can be available?
    Input: file path, time window of interest, period of lifetime
    Output: time stamp
    Exception: availability cannot be guaranteed for the given constraints.

What is the earliest time a specific number of bytes can be available?
    Input: number of bytes, target host, time window of interest, period of lifetime
    Output: time stamp
    Exception: availability cannot be guaranteed for the given constraints.

What is the transfer time for a specific file to a specific host?
    Input: file path, target host, start time
    Output: period of time
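The following minimal sketch illustrates how such a prediction could be composed from a provider's staging estimate and a network transfer estimate, in the spirit of Section 4.3.1. It assumes a simple size/bandwidth model plus a fixed overhead; the function names, the bandwidth table and all numbers are illustrative assumptions, not part of the specification.

    # Assumed bandwidth estimates between grid sites in MB/s (illustrative values only).
    BANDWIDTH_MBPS = {("site-a", "site-b"): 40.0, ("site-b", "site-a"): 40.0}

    def estimate_transfer_time(size_mb: float, source: str, target: str,
                               overhead_s: float = 5.0) -> float:
        """Estimated transfer time in seconds for moving size_mb between two sites."""
        if source == target:
            return 0.0
        bandwidth = BANDWIDTH_MBPS.get((source, target), 10.0)  # pessimistic default
        return overhead_s + size_mb / bandwidth

    def estimate_availability(staging_estimate_s: float, size_mb: float,
                              provider_site: str, target_site: str, now_s: float) -> float:
        """Earliest time (seconds since epoch) a file could be available at target_site,
        given the provider's own staging estimate."""
        return now_s + staging_estimate_s + estimate_transfer_time(size_mb, provider_site, target_site)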
6.3. Agreements

The Workflow Scheduler and the DMS reach agreements about the predictions obtained from the operations described in the use case of Section 6.2. Operations offering agreements are called in two steps:

    1. a preliminary request,
    2. a final commitment.

Figure 6.2.: Sequence diagram: agreement for providing a file at a specified host and time. (The Workflow Scheduler sends a preliminary 'provide file at t' request (REQUEST); the DMS obtains a staging time estimate from the data provider and confirms. The Workflow Scheduler then commits (COMMIT), the DMS performs the staging in time, and the Workflow Scheduler finally checks whether the file is available.)

If the request can be fulfilled, the DMS confirms it and includes it in its own schedule. However, the request remains preliminary until the Workflow Scheduler acknowledges it in the second step. After a predefined period of time the DMS will discard requests for which no acknowledgment has been received.

Figure 6.2 shows a successful agreement for providing a file on a target host. After the request and commit steps, the DMS manages the required staging of the file so that it is finished at the specified time. Nevertheless, the Workflow Scheduler should always check the availability of the respective replica before using it.

6.4. Staging and Transferring Files

For providing a file at a specified host, the DMS needs to take different actions depending on the type of data provider it is dealing with. As mentioned before, the C3Grid has three types of data providers: primary database systems, primary file systems, and the DMS itself. For the Workflow Scheduler, which requests the DMS to provide a replica, the source of the file is irrelevant as long as the replica is available at the agreed time. Therefore, the sequence of operations of the Workflow Scheduler depicted in the example of Figure 6.2 applies to all data providers. The preparation of time estimates and the actual staging or transfer differ for each provider. Table 6.1 gives an overview of the different types of data providers and how file requests by the Workflow Scheduler are handled.

Table 6.1.: Execution of data providing requests for different storage types.

                    Primary Database              Primary Flat File Store      DMS Workspace
  Request           object id, climate            object id, target host       object id, target host
                    parameters, spatial/
                    temporal cut, pre-
                    processing spec., target host

  Target host is local to the provider:
  Target store      local workspace               reference to primary file    local workspace
  Action by DMS     delegation to provider        pin file at primary store    none

  Target host is remote to the provider:
  Target store      remote workspace              remote workspace             remote workspace
  Action by DMS     local staging + replication   local staging + replication  replication
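As an illustration of Table 6.1, the following Python sketch shows how a DMS implementation might dispatch a file request by provider type and target host. It is a simplified sketch under our own naming assumptions; the `dms` helper methods are placeholders for the actual staging, pinning and transfer mechanisms and are not defined by this specification.

    def provide_replica(provider_type: str, provider_host: str, target_host: str, dms) -> None:
        """Dispatch a file request according to Table 6.1 (sketch).
        provider_type is one of 'database', 'flat file', 'workspace';
        `dms` is assumed to offer the listed low-level operations."""
        local = (provider_host == target_host)
        if provider_type == "database":
            # Databases always stage into the provider's local workspace first.
            dms.delegate_staging_to_provider(provider_host)
            if not local:
                dms.replicate(provider_host, target_host)
        elif provider_type == "flat file":
            if local:
                dms.pin_at_primary_store(provider_host)   # keep a reference, no extra copy
            else:
                dms.stage_locally(provider_host)
                dms.replicate(provider_host, target_host)
        else:  # 'workspace': the file already lives in a grid workspace managed by the DMS
            if not local:
                dms.replicate(provider_host, target_host)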
6.5. Registration of Generated Files

Commonly, new grid files are the results of jobs executed by the Workflow Scheduler. As a convention in the C3Grid, the output of a single job execution contains base data files and one metadata file. Because only the Workflow Scheduler knows about the execution and output of jobs, it must also take care of the registration of new files in the Grid Data Management System as well as in the Data Information Service.

Figure 6.3 depicts this registration, which is done by a single interface operation for both data services.

    Input: URIs of the physical base files, and one physical metadata file.
    Output: object identifier

Note that the input of this registration consists of physical files in one of the grid workspaces. By registration, each physical file becomes a replica and gets associated with a logical file.

Figure 6.3.: Sequence diagram: registration of new base data and metadata produced in the grid.† (The Workflow Scheduler calls the common DIS/DMS interface (1. register base and meta file); the interface registers the discovery and use metadata with the Data Information System (1.1) and the files and local replicas with the Data Management System (1.2); an object identifier is returned.)

6.6. High-level File Operations

The DMS provides the following operations for managing files at a host-independent level.

    Copy a file to a logical path.
    Move a file to a logical path.
    Create a logical sub-directory.
    Remove a logical sub-directory.
    List all files in a directory.
    Show the size of a file.
    Remove a file.

Note that an operation on a file applies to all of its replicas, i.e., replicas are copied, moved or deleted locally. The operation is ignored for hosts where a replica does not exist.

The remove operation has non-standard semantics if a site does not provide a replica but a path to a primary flat file. In this case the DMS does not remove the primary file but releases it from its control. E.g., if the file is kept in a transparent HSM, it is un-pinned to allow the HSM to remove it from its cache.

7. Interfaces of the Grid Data Management System

7.1. Interface for the Workflow Scheduler

The interface for the Workflow Scheduler (named B in Figure 5.1) provides the functionality of both the Grid Data Management System and the Data Information Service. Therefore, most of the operations and remote object specifications of the interface for the Portal are also part of this interface. The data services consist of two catalogs: one which keeps meta-information about primary and grid data objects, and another which keeps track of replicated work copies.

7.1.1. Exceptions

DiskSpaceException      This exception is thrown when the available disk space is exceeded.

FileRetrievalException  This exception is thrown when the retrieval of a file failed.

7.1.2. Datatypes

Type oid is used for unique identifiers of metadata sets.

Type uri is a string identifying logical files and replicas. It uses the file scheme of RFC 1738. For logical files the host section of the URI is left out.

Type providerType is used to indicate the provider of an object.

    type providerType = ( DB | FlatFile | DMS )

DB and FlatFile indicate a primary provider; DMS indicates files which are managed by the DMS and are not stored at a primary site.

Type action is used for the agreements.

    type action = ( Request | Commit )

If an operation is called with the action argument Request, it only returns a prediction for the request. If the Workflow Scheduler considers the prediction useful, it acknowledges it by re-invoking the same operation with a Commit action argument.
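For illustration, these datatypes could be represented in a client or server implementation roughly as follows. This is a non-normative Python sketch: the type names and values follow the interface definition above, everything else is our assumption.

    from enum import Enum
    from typing import NewType

    OID = NewType("OID", str)   # unique identifier of a metadata set
    URI = NewType("URI", str)   # RFC 1738 file URI; no host part for logical files

    class ProviderType(Enum):
        DB = "DB"                # primary database provider
        FLAT_FILE = "FlatFile"   # primary flat file provider
        DMS = "DMS"              # file managed by the DMS only

    class Action(Enum):
        REQUEST = "Request"      # preliminary, returns a prediction only
        COMMIT = "Commit"        # acknowledges a previously requested prediction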
7.1.3. Operations of the DataService

getProvider (in object : oid) : (host, providerType)
Returns the name and type of the data provider of a given data object.
Parameters:
    object      Data object identifier.
Returns: Host and provider type of the data object.

getFiles (in object : oid) : uri list
Returns a list of logical file paths. These files constitute the content of the data object. This operation does not distinguish between base data and metadata files. Note that a logical file path may not be equivalent to the workspace sub-path of a respective replica.
Parameters:
    object      Data object identifier.
Returns: List of file paths.

getMetaFile (in object : oid) : uri
Returns the logical path of the metadata file of a given data object. Note that the logical file path may not be equivalent to the workspace sub-path of a respective replica. Also, in the case of a database provider, there is no guarantee that a replica of the metadata file currently exists.
Parameters:
    object      Data object identifier.
Returns: Path of the metadata file.

getReplica (in file : uri) : uri list
Gets all replica locations as URIs for a virtual file.
Parameters:
    file        Virtual file location, i.e. the path of the file.
Returns: List of replica locations.

getOID (in file : uri) : oid
Gets the identifier of the object a given file is part of.
Parameters:
    file        Virtual file location.
Returns: The respective object identifier.

availableReplica (in file : uri, in host : string, in minTime : time, in maxTime : time, in duration : time) : time
Requests information about the predicted availability of a replica. The returned value is the earliest time within the time interval [minTime, maxTime] at which file file can be available at host host for a time period of duration.
Parameters:
    file        Virtual file location.
    host        Target host for the replica.
    minTime     Earliest time of interest. An undefined value defaults to the time of the operation call.
    maxTime     Latest time of interest. An undefined value defaults to infinity.
    duration    Period of time the replica should be available. An undefined value defaults to maxTime - minTime.
Returns: The earliest time the replica can be made available.

provideReplica (in file : uri, in host : string, in time : time, in duration : time, in action : action) : bool
Negotiates the creation of a replica at a specified host. The caller and the callee negotiate by two calls of this operation. First, an informative request is sent (action=Request). Then, on success, the request is confirmed (action=Commit).
Parameters:
    file        Path of the file to be retrieved.
    host        Target host.
    time        The time the replica must be available.
    duration    Minimum lifetime of the replica. An undefined value defaults to an infinite lifetime.
    action      Request or Commit.
Returns: true on success, false otherwise.

availableSpace (in bytes : int, in host : string, in minTime : time, in maxTime : time, in duration : time) : time
Requests information about the predicted availability of free workspace. The returned value is the earliest time within the time interval [minTime, maxTime] at which bytes bytes of disk space can be available at host for a time period of duration.
Parameters:
    bytes       Number of free bytes.
    host        Target host.
    minTime     Earliest time of interest. An undefined value defaults to the time of the operation call.
    maxTime     Latest time of interest. An undefined value defaults to infinity.
    duration    Period of time the disk space should be available. An undefined value defaults to maxTime - minTime.
Returns: The earliest time the requested disk space can be made available.

provideSpace (in bytes : int, in host : string, in time : time, in duration : time, in action : action) : bool
Negotiates the allocation of free disk space at a specified host. The caller and the callee negotiate by two calls of this operation. First, an informative request is sent (action=Request). Then, on success, the request is confirmed (action=Commit).
Parameters:
    bytes       Number of bytes to be allocated.
    host        Target host.
    time        The time the disk space must be available.
    duration    Minimum duration of the allocation. An undefined value defaults to an infinite lifetime.
    action      Request or Commit.
Returns: true on success, false otherwise.
Raises: DiskSpaceException

registerFile (in base : uri list, in meta : uri) : oid
Registers new files in the DMS. New files commonly originate from grid processing. After registration, the new files become replicas of the respective logical files.
Parameters:
    base        List of base files.
    meta        Metadata file.
Returns: New unique object identifier.
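To illustrate how a Workflow Scheduler might use these operations, the following hedged Python sketch walks through the two-step agreement of Section 6.3: it queries a prediction with availableReplica and then calls provideReplica twice, first with Request and then with Commit. The method names mirror the operations above, but the surrounding code (how the Web Service stub `data_service` is obtained, the time representation, the error handling) is an assumption for the sake of the example.

    def schedule_input_staging(data_service, file_uri: str, target_host: str,
                               earliest: float, latest: float, lifetime: float) -> float:
        """Negotiate that `file_uri` is available at `target_host`; returns the agreed time.
        `data_service` is assumed to be a client stub exposing the DataService operations."""
        # 1. Ask for a prediction (may fail if availability cannot be guaranteed).
        t = data_service.availableReplica(file_uri, target_host, earliest, latest, lifetime)

        # 2. Preliminary request: the DMS reserves the staging in its own schedule.
        if not data_service.provideReplica(file_uri, target_host, t, lifetime, "Request"):
            raise RuntimeError("DMS rejected the preliminary request")

        # 3. Final commitment: without it, the DMS discards the request after a while.
        if not data_service.provideReplica(file_uri, target_host, t, lifetime, "Commit"):
            raise RuntimeError("DMS did not confirm the commitment")

        # The scheduler should still check the replica's availability before using it.
        return t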
7.2. Interface of the Primary Data Provider for the Data Management Service

7.2.1. Operations of the DatabaseAccess

The staging of files from databases follows a special scheme because databases provide special pre-processing functionality (see also Section C.2.5).

stageFiles (in objs : oid list, in constraints : (attribute, value) list, in outDir : uri, out basefiles : uri list, out newObj : oid, out metafile : uri, out stagingTime : time, out dataSize : int, in dummy : bool) : void
Stages files from the database to the workspace. This creates a new data object with a new unique object identifier.
Parameters:
    in objs           List of object identifiers to be staged.
    in constraints    Constraints reducing the data of the request. A database provider must support at least the following attributes:
                          parameters (list of climate parameters)
                          minLat, maxLat, minLong, ... (3-dimensional spatial cut)
                          minTime, maxTime (temporal cut)
                          preproc (pre-processing specification)
                          format (file format: GRIB, NetCDF, ...)
    in outDir         Output directory as target for the staging.
    out basefiles     List of file paths of the generated base data files.
    out metafile      Name of the metadata file.
    out newObj        New object identifier.
    out stagingTime   Estimated staging time.
    out dataSize      Estimated data size.
    in dummy          If true, do not stage, just estimate the staging time and data size.

simpleStageFiles (in obj : oid, in outDir : uri, out base : uri list, out meta : uri, out stageTime : time, out dataSize : int, in dummy : bool) : void
Requests the data provider to dump a data object as files to the local disk space. Note that this operation does not create a new object but replicates a database object as files.
Parameters:
    in obj            Object identifier.
    in outDir         Output directory as target for the dump.
    out base          Paths of the local files. These files must be readable by the DMS and located in outDir.
    out meta          Path of the local metadata file.
    out stageTime     Estimated staging time.
    out dataSize      Estimated data size.
    in dummy          If true, do not stage, just estimate the staging time and data size.

7.2.2. Operations of the FlatFileAccess

pinFiles (in obj : oid, out base : uri list, out meta : uri, out stageTime : time, out dataSize : int, in dummy : bool) : void
Requests the data provider to stage the files of a given data object in its local disk space.
Parameters:
    in obj            Object identifier.
    out base          Paths of the local files. These files must be readable by the DMS and be staged to disk.
    out meta          Path of the local metadata file.
    out stageTime     Estimated staging time.
    out dataSize      Estimated data size.
    in dummy          If true, do not stage, just estimate the staging time and data size.
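As a sketch of the provider side, the following Python fragment shows how a flat-file provider might serve pinFiles, including the dummy mode that only returns estimates. The staging-rate constant, the helper function, the convention that the metadata file is the last list entry, and the return structure are illustrative assumptions; a real provider would wrap this logic in the Web Service binding used by the C3Grid.

    import os
    from typing import Dict, List

    ASSUMED_STAGE_RATE_MB_S = 50.0   # illustrative tape-to-disk staging rate

    def hsm_stage_and_pin(path: str) -> None:
        """Placeholder for the HSM-specific call that stages a file to disk and pins it."""
        pass

    def pin_files(obj_files: List[str], dummy: bool) -> Dict[str, object]:
        """Sketch of a FlatFileAccess.pinFiles handler for one data object.
        obj_files: the object's base files plus, as last entry, its metadata file
        (an assumed convention for this example)."""
        total_bytes = sum(os.path.getsize(p) for p in obj_files if os.path.exists(p))
        estimate_s = (total_bytes / 1e6) / ASSUMED_STAGE_RATE_MB_S
        if not dummy:
            for path in obj_files:
                hsm_stage_and_pin(path)
        return {
            "base": obj_files[:-1],    # out base
            "meta": obj_files[-1],     # out meta
            "stageTime": estimate_s,   # out stageTime (seconds, assumed unit)
            "dataSize": total_bytes,   # out dataSize in bytes
        }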
8. Authorization†

Still to be written.

9. Conclusion

Still to be written.

A. Interface Specification Syntax†

For the specification of interfaces, the following BNF-style syntax is used in the definition of remote functions. Non-terminals are set in italics, 'terminals' are quoted.

    functionDef ::= name '(' paramList ') :' returnType                            (A.1)
    paramList   ::= param ( ',' param )*                                           (A.2)
    param       ::= ( 'in' | 'out' | 'inout' ) paramName ':' type                  (A.3)
    returnType  ::= 'void' | type                                                  (A.4)

Each parameter of a function has a specifier, which defines whether an argument is read (in), set (out) or read and set (inout) by the called function.

For the definition of constructed datatypes the following syntax is used:

    typedef  ::= 'type' typeName '=' type                                          (A.5)
    type     ::= typeName | basetype | record | variant                            (A.6)
    basetype ::= 'int' | 'double' | 'string' | type 'list' | '(' type ',' type ')' (A.7)
    record   ::= '{' name ':' type ( ',' name ':' type )* '}'                      (A.8)
    variant  ::= '(' handle type? ( '|' handle type? )* ')'                        (A.9)

Record entry names (name) and type names (typeName) consist of arbitrary upper- or lower-case alphanumeric characters, except for the first character, which must be an alphabetic character. Variant handles (handle) are names starting with an upper-case character.

B. Questionnaire for Users and Data Providers

The following questionnaire was filled out by the users and data providers of the C3Grid. Its purpose is to give a detailed description of the compute and data resources present at the institutes. Due to the demand of the users, some questions were rephrased after 2006-12-19 to be more concise. The answers that we received before the rephrasing are documented with an 'old' appended to the question number. The questions and answers were partly translated from the German original to English by the authors of this document.

B.1. Important Aspects

Question 1.1: Storage location: local, distributed, partitioned, replicated?

Answers:
HH – DKRZ: local: global shared file system (GFS from NEC, SAN, FC-based) for the subsystems: archive -> $UT (HSM) / $UTF; cross (compute server): $WRKSHR, ...; hurikan (HPC: NEC SX-6).
HH – MPI: yang (SunFire 15k): locally in /scratch/local1 (one file system, 42 TB (+ 32 TB = 74 TB in 01/2006)); QFS.
HH – CERA DB: one metadata database, several (currently 5) databases for the actual data.
Uni Berlin: partial datasets (flat files; they do not contain all information, i.e. parameters, levels, etc.).
Uni Köln: like Uni Berlin.
PIK: locally. Note: items 1 to 5 represent only the data that have to be provided in the C3Grid project:
    1. model output of the CLM validation run, Europe with ERA15 forcing, 10'x10', 6-hourly values 1979-1993,
    2. model output of CLM multi-run experiments for uncertainty studies of extreme episodes in selected regions of central Europe, for reference periods and scenarios (each at most 5 months),
    3. model output of STAR: gridded daily meteorology for Germany, 7 km x 7 km, reference period 1950-2004 and 3 SRES scenarios 2005-2055,
    4. PIK-CDS dataset: global monthly meteorology (tmean, precip) over land masses, 30'x30', 1901-2003,
    5. model output of the global vegetation model LPJ, monthly data, 30'x30', reference period 1901-2003 and 3 SRES scenarios 2004-2100.
DWD: local.

Question 1.2: Storage system: (RAID) server, tape system with NFS interface?
Answers:
HH: tape system: yes (DKRZ); HSM with GFS and NFS front-end. RAID system: yes, for all DKRZ servers and for yang. NFS interface: not for non-DKRZ users; mount. CERA DB data are stored in the HSM in a split way.
Uni Berlin: tape archive.
Uni Köln: SAN.
PIK: GPFS with NFS, in the future SAN; tape robot using IBM TSM.
DWD: server.

Question 1.3: Storage type: database system (which?), files?

Answers:
HH – flat files: $UT (see above) with flat files (ca. 4 PB).
HH – CERA DB: in a first step ca. 100 TB of the currently ca. 200 TB will go into the grid. The data volume will keep growing. The data are stored in a database: Oracle 9.2.0.5 with Oracle Application Server 10g. Access is either via the application server (HTTP, Connection Manager) or directly from the databases.
HH – DKRZ: currently ca. 5% of the total data volume is in the database, the rest is stored in the archive; the trend towards storing in the database is increasing (e.g. now also 6-hourly resolution), which is why MPI partly also speaks of 10% in the database.
PIK: 1., 2.: files; 3.-5.: Oracle RDBMS.
DWD: database system, Oracle v9.2.0.5.0.

Question 1.4: Structure of data? How are the different data units (experiments, time series, ...) distributed over files, table entries, blobs, etc.? For example, one time series of one variable per file/table/row.

Answers:
PIK: 1. one file per time step (6 h); 2. one file per single run of a multi-run experiment and per post-processor output of a multi-run experiment; 3.-5. one Oracle table per reference period / scenario and variable.
DWD: relational DBMS.

Question 1.4.old: Structure of the data, e.g., one dataset per file, per table/row?

Answers:
HH – flat files: Let a dataset be the data from one experiment: up to 10 TB per experiment, composed e.g. of monthly files of 1-7 GB each (so, e.g., a (typical?) experiment over 100 years with 5 GB/month -> 1200 files: S = 1200 * 5 GB = 6 TB; IPCC T63L31: 2400 files (200 years) with 15.8 GB/year (12 files!) -> 3 TB).
HH – CERA DB: Datasets correspond to whole tables. A granular query on the increments of the time series recorded in the metadata is possible. Further processing (cutting out regions) is in beta. A dataset is usually (but not always) the time series of one variable. An experiment is the sum of all datasets. Currently an experiment is up to ca. 20 TB, a dataset up to 300 GB, i.e. a table of up to 300 GB in units (blobs from a few kB up to several hundred MB). Larger units are also possible.
Uni Berlin: The partial datasets aggregate individual data records (individually identifiable via GRIB), namely: 1. for a model run, one file per run, but not all information (see above); 2. for ERA reanalysis data, data for a certain period (month) in one file. Addendum 12/01/2006: The files offered by Köln / Berlin are structured differently; they can be described by the following three examples: 1. "surface data": temperature, dew point, surface pressure, ...; 2. "atmospheric data a)": geopotential at 1000, 925, 800, 750, ... hPa; 3. "atmospheric data b)": wind, u and v components at several levels. We offer files which contain a section of a time series (usually one month) and which may contain several levels of the same physical quantity and/or several variables.
Uni Köln: like Uni Berlin.

Question 1.5: Number of files, rows, blobs?

Answers:
PIK: 1. 22,000;
Answers:
PIK: 1. 22000; 2. depends on the experiment type (e.g. global sensitivity analysis, Monte Carlo analysis), about 1000 in total; 3. per reference period/scenario 1.5G Oracle rows + index; 4. 200M Oracle rows + index; 5. per reference period/scenario 1.5G Oracle rows + index
DWD: 40-50 tables with climate measurement values

Question 1.5.old: Number of files, rows (datasets)?

Answers:
HH - DKRZ: ca. 30 million files
HH - CERA DB: ca. 5 billion rows
Unis Berlin/Koln: ERA reanalyses: 480 files each for 1. atmospheric level groups and 2. surface data, together around 1000 files

Question 1.6: Size of one file, row, blob?

Answers:
PIK: 1. 18 MB per file; 2. 5 GB per file
DWD: table sizes from 1 MB (e.g. long-period values per observation time) up to 8 GB (e.g. climate observation-time values)

Question 1.6.old: Size of one file or dataset, respectively?

Answers:
HH - flat files: file size from 1 MB to 7 GB; dataset from a few MB up to 10 TB (see above)
HH - CERA DB: for the units see 1.4
Unis Berlin/Koln: between 80 MB and 1 GB

Question 1.7: Total size of the data?

Answers:
HH - flat files: 4 PB, growing exponentially
HH - CERA DB: 200 TB, growing exponentially; growth of up to 1/3 of that of the DKRZ
Uni Koln: 1/2 TB

Question 1.8: Format of data? (CSV, XML, GRIB, description not accessible, ...)

Answers:
ZMAW: GRIB, NetCDF and binary
HH - CERA DB: stage 0: only GRIB, later also NetCDF, IEG, ASCII, tar files, ...
Unis Berlin/Koln: (Remark: intermediate products in the diagnostic workflows are currently also stored in the LOLA format; these are not contained in the data mentioned above.)
PIK: 1. GRIB1; 2. NetCDF-CF 1.0; 3.-5. rows in Oracle tables
DWD: ASCII

Question 1.9: Which access methods do you or your programs use? File-based (read, seek, regular patterns, other patterns); SQL, XML query (XPath, XQuery, ...); access library (freely available, sources available?); own programs (which language?)

Answers:
HH - flat files: C library, much random seek (partly also sequential seek), GRIB library, NetCDF library (MPI, UCAR?), CDO (ANSI C) & pthreads
HH - CERA DB: SQL
Unis Berlin/Koln: file-based: read; SQL, XML query: (cf. access to the CERA database); access library: GRIB decoding via the EMOS library of the ECMWF, freely available, sources available; own programs: Fortran 90
PIK: 1., 2.: file-based read; 3.-5.: SQL scripts and HTML/Java front-ends for generating files with extracted data, which are then processed further
DWD: SQL

Question 1.10: Are datasets modified or are they written only once and read an arbitrary number of times?

Answers:
HH: mainly 'subset extraction'
Unis Berlin/Koln: partial datasets are written once and read an arbitrary number of times
PIK: written once, read an arbitrary number of times
DWD: observational data are modified and extended

Question 1.11: How often/when is new data generated?

Answers:
HH: continuously during the experiments on the NEC SX6 and written from there to the archive, ca. 2 TB/day
Unis Berlin/Koln: partial datasets: 1. when a new model run that is to be analysed becomes available; 2. when other diagnostic tools, or a modified application of the same tool, are used for which no suitable partial datasets exist yet
PIK: 1.-3., 5.: after new scenario computations; 4.: one new year is added once per year
DWD: continuously (every minute)

Question 1.12: Description of software resources, e.g., data filter services. (Possibly part of an application scenario!?)
Answers:
HH: CDOs, afterburner; see the generic scripts, cf. the C3Grid usage scenario
Unis Berlin/Koln: EMOS-LIB, documented
PIK: 1. post-processor Vadi (Fortran); 2. possibly the SimEnv experiment post-processor (Fortran, C), depending on whether experiments are to be post-processed within C3Grid or only post-processed experiment output is to be accessed; 3.-5. depends on the models that further process the files with the extracted data
DWD: QualiMet, checking and validation of meteorological data

Question 1.13: Life cycle of the metadata for the data, e.g. created once / read n times, alternately modified/read, deleted, etc.

Answers:
HH - MPI: as far as, or as soon as, metadata are used at all: self-describing part: ca. 2 years; rest (=??): unlimited; mainly alternately modified/read, deleted, etc.
HH - CERA DB: usually created once and rarely modified; exception: the dataset size, which changes permanently. Once a DOI has been created, selected metadata can no longer be changed.
Unis Berlin/Koln: so far no metadata for partial datasets, except possibly the GRIB header
PIK: created once, read n times
DWD: alternately modified/read

Question 1.14: Availability of the metadata. (Table schema + individual values; located at the beginning of the file; generated once or dynamically?)

Answers:
HH: examples: ...!! They are generated once, in the headers of GRIB and NetCDF files.
Unis Berlin/Koln: not applicable
PIK: table schema currently PIK-CERA-2; for 1. and 2. additionally the header information of the individual files
DWD: available in XML

Question 1.15: Extent of the metadata schema for an object, e.g., fixed/unchangeable, extendable?

Answers:
HH - MPI: extent still undetermined? (vision for the future) modifiable, extendable
HH - CERA DB: extendable and modifiable
Unis Berlin/Koln: not applicable
PIK: extendable
DWD: extendable

B.2. Miscellaneous

Question 2.1: Which constraints of your international integration must/should be considered? Data formats, access protocols, metadata schemes, etc.

Answers:
HH - flat files: data formats: GRIB, CF-1.0 convention; access protocols: whatever suggests itself: gridFTP, eccopy, ...???; metadata schemes (incl. datatypes): self-describing, table-based, standardised (WMO, CF group)
HH - CERA DB: function as WDCC, possibly guidelines of the project cooperation, see also the metadata meeting in Hamburg
Unis Berlin/Koln: data formats: GRIB, NetCDF (WMO standard); access protocols: authorisation for data access must be checked; metadata schemes (incl. datatypes): not applicable
PIK: data formats: for 1. and 2. NetCDF, GRIB; access protocols: access authorisation must be checked; metadata schemes: no conditions
DWD: data formats: XML; access protocols: web service, OGSA-DAI/UNIDART; metadata schemes (incl. datatypes): metadata schema according to ISO 19115

Question 2.2: How many participants, resource/data providers and consumers are in your domain? How are they distributed (local, campus, D-Grid, international)?

Answers:
HH - flat files: local (MPI): ca. 200 (ca. 75%); campus (ZMAW): + 10%; D-Grid: not yet, or associated C3Grid institutes (AWI, GKSS): + 10%; international: + 5% (strongly fluctuating)
HH - CERA DB: ca. 20 providers, but changing; ca. 700 registered users
Unis Berlin/Koln: international providers: DKRZ, ECMWF, Hadley Centre (England), NCEP (USA), ...
Participants in the diagnostics cluster: currently 4 in Koln and 4 in Berlin.
PIK: for 1.-5.: resource providers: local and international; resource consumers: campus and C3Grid; data providers: campus and international; data consumers: campus, C3Grid and international
DWD: local, international (ca. 12 participants via UNIDART)

B.3. Aspects of Data Management

Question 3.1: Access restrictions. Granularity: directories, files, databases, tables, rows; users, groups, organizations?

Answers:
HH - flat files: UNIX groups and users
HH - CERA DB: access restrictions at the table level; metadata are freely accessible
Unis Berlin/Koln: granularity: defined by user and group permissions, quota on data volume; users, groups, organizations: only with authorisation
PIK: granularity: file, table row; currently authorisation for individual users
DWD: granularity: databases, tables; users

Question 3.2: Access methods / search for data which meet certain criteria: test for an exact match, test for membership in a range (range queries)?

Answers:
HH - flat files: currently folders in the cabinet of the respective staff member (experiment descriptions have been started in the DokuWiki)
HH - CERA DB: a metadata DB is available
Unis Berlin/Koln: none (only flat files, identified by name and storage location)
PIK: 1., 2.: exact match and range queries; 3.-5.: range queries
DWD: the scope of SQL

B.4. Aspects of Metadata Management

Question 4.1a: Where can metadata be stored? At one of your hardware resources? (Requires installation of the respective middleware.)

Answers:
HH: databases, local resources (hardware)
Unis Berlin/Koln: yes!
PIK: on our own hardware resources
DWD: Oracle / XML

Question 4.1b: Where can metadata be stored? At external resources?

Answers:
HH: for community experiments in the M&D CERA DB database
Unis Berlin/Koln: yes

Question 4.2: How long can/must metadata be stored? Must they be archived?

Answers:
HH - flat files: indefinitely; the self-describing part can be deleted
HH - CERA DB: indefinitely
Unis Berlin/Koln: as long as the files exist
PIK: as long as the data are to be kept available
DWD: permanently

Question 4.3: Do certain datatypes have to be supported for the attributes? Which are these?

Answers:
HH: What exactly is meant by datatype? HH - flat files?: -> LDAP; HH - CERA DB: storage of the data in blobs?
Unis Berlin/Koln: unclear
PIK: no
DWD: no

Question 4.4: What is the volume of the metadata (bytes)? How many attributes does each schema contain?

Answers:
Flat files: ? folders
CERA DB: depends on the focus of the metadata; core: ca. 6 GB, total: ca. 500 GB
PIK: total volume of the metadata 10 MB, 40 attributes per entry
DWD: 659 MB of XML

C. Quotes from the Development Discussiony

This appendix gives selected excerpts from the discussion among the C3Grid developers. The architecture specification in this document is largely based on this discussion.

C.1. E-Mail

Date: Sat, 4 Feb 2006 13:13:27 +0100 (CET)
Subject: Re: [C3] Anwendungsszenario AP2, AP5
From: "Uwe Ulbrich" <[email protected]>
To: [email protected]
Cc: "Tobias Langhammer" <[email protected]>

Dear workflow colleagues (and for the attention of Tobias Langhammer),

I assume that there has not yet been any reaction from our side to this extremely constructive e-mail (at least I have not seen one).
Here is my OPINION, which should, however, first be discussed:

> 1. The user issues a search query for discovery metadata (portal -> DIS)
>    -> the search is performed in the global metadata database
>    -> the result is a set of discovery metadata entries
>
> 2. The user selects workflow-relevant results in the portal.
>
> 3. The user issues a search query for use metadata for the selection
>    made in 2. (portal -> DIS)
>    -> the query is forwarded by the DIS to the local metadata providers.

I am not sure what exactly constitutes the difference between discovery and use metadata, but at second glance it seems helpful to me. Let me put it in my own words:

1. The user poses his query and is first given the possibility to select the model experiments (or meteorological observations/analyses) available in the global database with respect to the workflow he specifies. This guarantees that the workflow can be realised in principle. At this point he does NOT yet see where which partial datasets or replicas/copies are located. The only decisive point is that a complete original version (for which all metadata were generated first) exists. Discovery would thus refer only to the existence of an original version.

2.-3. The selection is submitted. As a result, the user receives a small number of options stating on the basis of which existing datasets (or combinations of them) his workflow could be realised. One option is always the use of the original base dataset. There is room for later extensions here, e.g. estimated times for realisation. The information about duplicates/replicas that an advanced user can request for his expected later tasks corresponds, FOR THE BASE DATA (i.e. in the first parts of the diagnostic workflows), to the Hamburg DATA EXTRACTION WORKFLOW as I currently understand it.

> 4. The user completes the workflow specification with use information
>    and sends it to the scheduler.

4. With this, the selection of the datasets to be used (and, if applicable, combined) has been made, and the order to fetch the data goes out.

> 5. The scheduler gives the DMS the task of providing the data
>    -> the DMS delegates the task to the local data providers
>    -> the data providers copy data + metadata to the local share
>
> 6. The scheduler gives the DMS the task of creating replicas at a
>    specified location
>    -> a processing step is executed on the replicated data, which
>    generates new data + metadata.
>
> 7. The scheduler gives the DMS/DIS the task of adding new metadata to
>    the global database.

7. ... and with this the data are available to the user himself and to all other users.

So far this corresponds exactly to my ideas.

> ***************
>
> In this scenario the following points are still unclear to us:
>
> - According to our understanding, use metadata describe, among other
>   things, a) in which format the data are available and b) how data are
>   distributed over files. Are several use metadata records therefore
>   assigned to one discovery metadata record?

According to my understanding a) is correct, but b) would have to apply individually to each COMPLETE dataset. Two example cases: i) the data reside in a database: then all data belonging to the original dataset are reported as available from there, i.e. they are summarised in ONE metadata record. ii) the data are flat files.
The metadata then contain information for the dataset as a whole.

The possibility of extracting from a database, or the corresponding pre-processing of flat files, must also be handled in the information system. That is part of the Hamburg workflow, isn't it?!

> - How are the metadata stored after a processing step? One file with
>   discovery & use metadata? Or one file with discovery metadata and many
>   files with use metadata, i.e. one use metadata file per generated data
>   file?

According to my understanding, the data belonging to an original version of the base data are stored and reported per flat file / per database. When the metadata for an original version are queried, all metadata referring to it are checked. Going far beyond the current questions, I point out that combined datasets which refer to several base datasets are NOT foreseen here (even if it might be a nice feature at some point...).

> Regarding step 3 of the scenario I see an implication that perhaps
> should be emphasised: the result of the query, i.e. the use metadata,
> determines the input of the workflow. If pre-processing on the side of
> the data provider is foreseen, the provider must generate use metadata
> for the result before the actual pre-processing results are made
> available. The reason is that the workflow specification takes place
> before the actual pre-processing run.

That is correct. This raises the question of where pre-processing runs, and whether (and if so, where) the results of the pre-processing should be made available again. For a database in which the requested data reside, one will typically not store the pre-processing results for a longer period (or will one?). They would, however, end up on a suitable platform (concretely: after being fetched from CERA, on the UTF disk in Hamburg, or on the user's file system) and would be reported from there via metadata.

> A related question: will there be a standard pre-processing that is to
> be supported by all data providers? In the meeting of 13.01. (see
> http://web.awi-bremerhaven.de/php/c3ki/index.php/Metadaten-Meeting-Marum)
> we already talked about temporal and spatial subsetting. These are
> capabilities that databases partly already provide. For the provider of
> flat files, however, this means extra implementation effort. It may also
> not be sensible for him, because such processing is better specified in
> a workflow so that it can be run in a distributed way in the grid.

Here the question of where the pre-processing takes place does indeed arise. I tend to see it as part of the (Hamburg) workflow and thus as part of the diagnostics cluster workflows. The question of where this pre-processing takes place is then to be solved in the grid as the Hamburg workflow.

> Thus there will be very individual pre-processing possibilities, and
> over time new ones will probably be added. For our DIS/DMS it is
> therefore probably most sensible to pass a pre-processing specification
> through from the portal to the data and metadata providers generally as
> a black box. That means that searching for variables, spatio-temporal
> subsets, etc. is the task of our DIS, but the selection of data and
> metadata according to these criteria is not.
I now see this as a hierarchy (and please correct me if I am wrong here): the pre-processing IS the Hamburg workflow. This workflow must therefore comprise the selection, pre-processing and merging of the data for further processing. The diagnostic workflows (which implicitly contain this part) build on top of it.

I hope these ideas are acceptable as they are and do not contradict the views of the other diagnostics cluster colleagues.

Best regards,
Uwe

> Best regards,
> Tobias
>
> Dipl.-Inf. Tobias Langhammer
>
> Konrad-Zuse-Zentrum fuer Informationstechnik Berlin (ZIB)
> Department Computer Science Research
>
> Takustr. 7, D-14195 Berlin-Dahlem, Germany
>
> [email protected]
>
> Tel: +49-30-84185-410, Fax: +49-30-84185-311

--------------------------------------------------
Prof. Dr. Uwe Ulbrich
Institut fuer Meteorologie
Freie Universitaet Berlin
Carl-Heinrich-Becker-Weg 6-10
12165 Berlin
Tel.: +49 (0)30 838 71186
Fax: +49 (0)30 838 71128
email: [email protected]

C.2. C3Grid-Wiki

C.2.1. WF im Portal (Stand 06.03.2006)

Finding data: proposal of a multi-stage search via keywords and/or ...
Via keywords: project (IPCC, WOCE, ...), model (ECHAM, OPA, MPI-OM, ...), observation (...), ...; compartment (atmosphere, ocean, lithosphere, ...); provider (WDCs, institutions).
Selection: region of interest: temporal, spatial, variables; processing/WF.
Display: before the request: size, format, DB or raw data, location, (creation time?) of the selected data; estimated size of the output file; estimated time for provisioning + processing. After the request: status of the request, estimated remaining time for provisioning + processing.

C.2.2. Metadaten im WF (Stand 06.03.2006)

Granularity in the workflow: proposal of one metadata file per WF (processing) or per output file. For the representative ECHAM WF this means that for raw data access only one metadata file is processed in order to generate a time series from the individual files.
Advantages: only few metadata files have to be processed (better performance of the tools); the problem of reducing history entries when merging the individually processed files into a time series disappears.
Disadvantages: ...
Alternative: one metadata record per requested dataset.

C.2.3. AG-Metadaten (Auszug, Stand 06.02.2006)

Discovery metadata: metadata used to find data products. A first meeting of the metadata group resulted in an agreement on an ISO 19115 conformant description of the discovery metadata, see AG-Metadaten-Meeting1. Next meeting on 13.03 in Bremen: MetadatenTreffen Bremen.
Use metadata: metadata used for using data product instances (e.g. in processing tools).
Exchange of metadata: definition of the protocol for exchanging discovery metadata between the data providers and the central C3Grid metadata catalogue.

C.2.4. AG-Metadaten-Meeting1 (Stand 06.02.2006)

The essential result of the first metadata meeting was the decision in favour of the ISO 19139 metadata standard. The current schemas in the (almost) final version: http://ws.pangaea.de/schemas/iso19139/ The entry point is http://ws.pangaea.de/schemas/iso19139/gmd/metadataEntity.xsd
An attempt was made to correlate the elements of the schema with the metadata available at the data centres and to comment on them.
The result is not complete; following the work of Uwe Schindler and in discussions with Wolfgang Kresse (TC211), we have added and changed a number of things. Below are some comments on essential elements in XPath notation:

MD_Metadata
- contact: the contact for the metadata description, not the PIs or authors of the dataset!
- dateStamp: last change of the metadata
- identificationInfo:
  citation: bibliographic data.
  extent with BBOX is mandatory for C3; a point is a BBOX with identical coordinates; for verticalExtent a CRS (Coordinate Reference System) definition is still missing.
  Citation.Identifier with or without localisation of the dataset; access restrictions under ResourceConstraints.
- contentInfo:
  MD_CoverageDescription (n times): here parameters/variables/measured quantities etc. can be listed, including the definition of methods, medium and units. Parameter lists should be reconciled with each other and can be cited via xlink.
  MD_FeatureCatalogDescription: citation of the parameter list, weather station list (DWD) or model descriptions (CF) as a whole.
- distributionInfo: here the URI to the dataset, or general information on how the dataset can be accessed; under Format, list in which formats the data are available; still needs polishing (format conversions etc.).
- dataQualityInfo:
  scope: type of the data (e.g. "dataset")
  lineage/source: references to the dataset
  lineage/processStep: information on station or sampling events, data processing steps
  report: free text describing the quality
- series: by referencing the superordinate dataset, hierarchies and links (replicas, versions) can be built up.

Examples

The following URLs give examples of the metadata from PANGAEA:
http://doi.pangaea.de/10.1594/PANGAEA.80967
http://doi.pangaea.de/10.1594/PANGAEA.75909

The same as XML according to the internal proprietary PANGAEA schema:
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=pan_md&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.80967
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=pan_md&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.75909

And finally according to ISO 19139. Here the records are already filled in much more completely. Individual points may not be immediately understandable and need to be discussed and documented further. In general, however, the examples can be taken as a template for the implementation at the other data centres:
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=iso19139&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.80967
http://ws.pangaea.de/oai/?verb=GetRecord&metadataPrefix=iso19139&identifier=oai:pangaea.de:doi:10.1594/PANGAEA.75909

The metadata XML is wrapped in an envelope prescribed by OAI (Open Archives Initiative). The actual metadata are located inside the <metadata> element.

C.2.5. Metadaten-Meeting-Marum (Stand 06.03.2006)

Essential results of the meeting:
1. joint work on the definition of a WSDL for data queries
2. ISO 19115 + 19139 as the metadata schema was discussed again

Prototypical specification of the input and output of the web service:

INPUT (cf. a)
1. Search criteria: parameters, space, time, list of identifiers; at least one identifier or one parameter must be given; identifiers should be persistent and result from a preceding search in the portal; the portal must ensure that sensible queries are generated
2. Target for the location of the directory for the result files; default is local; possibly also as output, in which case it is generated first and returned
3. Output format
4. Preprocessing (cf. b); corresponds to GAT or CDOs (HH), given as a list of desired tasks

OUTPUT
1. Time estimate for processing the request (optional)
2. Size estimate of the result set (optional)
3. Errors

Each job generates at least two files:
1. data file(s)
2. metadata file, as a compilation or a list of citations (still open)

a) GAT or something similar should be used as a template
b) can also be processed sequentially as a separate downstream job

Metadata
1. Discovery metadata: OAI-PMH, either DIF or ISO 19115
2. Metadata for the data information service contain only dynamically generated entries

The metadata are recorded in a database for display in the portal, for statistics, etc.

C.3. Mailing Lists