Technical Papers
Wednesday, Oct 14
10:30 - 12:00 - Past, Present, and Future
13:00 - 14:30 - Clouds 1
14:45 - 16:15 - Clouds 2, Grids 1
Thursday, Oct 15
10:30 - 12:00 - Grids 2
13:00 - 14:30 - Grids 3, Resource Management
14:45 - 16:15 - IEEE Grid Invited Speaker: Mike Vetterli
--------------------------------------------------------------------------------------------------
Session 1: Past, Present, and Future
Critical Perspectives on Large-Scale Distributed Applications and Production Grids
Shantenu Jha, Daniel S. Katz, Manish Parashar, Omer Rana and Jon Weissman
It is generally accepted that the ability to develop large-scale distributed applications that are extensible and independent of infrastructure details has lagged seriously behind other developments in cyberinfrastructure. As the sophistication and scale of distributed infrastructure increase, the complexity of successfully developing and deploying distributed applications increases both quantitatively and in qualitatively new ways. In this paper we trace the evolution of a representative set of “state-of-the-art” distributed applications and production infrastructure. In doing so we aim to provide insight into the evolving sophistication of distributed applications, from simple generalizations of legacy static high-performance applications to applications composed of multiple loosely-coupled and dynamic components. The ultimate aim of this work is to highlight that, even accounting for the fact that developing applications for distributed infrastructure is a difficult undertaking, there are suspiciously few novel and interesting distributed applications that utilize production Grid infrastructure. Along the way, we aim to show that the development of distributed applications and the theory and practice of production Grid infrastructure have often not progressed in phase. Progress in the next phase and generation of distributed applications will require stronger coupling between the design and implementation of production infrastructure and the theory of distributed applications. This includes, but is not limited to, explicit support for distributed application usage modes and advances that enable distributed applications to scale out.
Exploring Mobile Devices as Grid Resources: Using an x86 Virtual Machine to Run BOINC on an iPhone
Michael Black and William Edgar
The increasing power and number of mobile devices make them an attractive target for grid computing. To date, most research related to mobile devices and grid computing has focused on the access and use of grid resources. In this paper we propose using mobile devices themselves as grid computing nodes. We demonstrate the feasibility of this concept by implementing the BOINC client on an Apple iPhone. Work units are downloaded from a BOINC server and executed on the iPhone via a virtual machine emulating an x86 processor, and the results are uploaded to the server. The world of mobile devices brings renewed challenges to the problem of grid client design in the areas of network bandwidth, processor capability, storage, and energy consumption. Using our prototype, we conduct initial studies comparing the performance, energy efficiency, and bandwidth of a cellular device against those of a more traditional grid computing node.
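The download, execute, and upload cycle described above can be sketched as a simple client loop. This is a hypothetical illustration only. `WorkUnit`, `InMemoryServer`, and `run_client` are invented names. The in-memory server stands in for a real BOINC project server, which the real client talks to through BOINC's own scheduler protocol, not through this interface. The `execute` callback stands in for running the science application inside the x86 emulator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class WorkUnit:
    # Hypothetical work unit: an id plus the input payload to process.
    wu_id: int
    payload: List[int]

@dataclass
class InMemoryServer:
    # Hypothetical stand-in for a BOINC project server.
    pending: List[WorkUnit]
    results: Dict[int, int] = field(default_factory=dict)

    def fetch(self) -> Optional[WorkUnit]:
        # Hand out the next work unit, or None when the queue is empty.
        return self.pending.pop(0) if self.pending else None

    def upload(self, wu_id: int, result: int) -> None:
        self.results[wu_id] = result

def run_client(server: InMemoryServer, execute: Callable[[List[int]], int]) -> int:
    """Download, execute, and upload work units until none remain.

    `execute` stands in for the application running under the x86 emulator.
    Returns the number of completed work units.
    """
    done = 0
    while (wu := server.fetch()) is not None:
        server.upload(wu.wu_id, execute(wu.payload))
        done += 1
    return done

server = InMemoryServer([WorkUnit(1, [1, 2, 3]), WorkUnit(2, [4, 5])])
print(run_client(server, sum), server.results)  # prints: 2 {1: 6, 2: 9}
```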
Session 2: Clouds 1 (1/2)
A Survey and Taxonomy of Infrastructure as a Service and Web Hosting Cloud Providers
Radu Prodan and Simon Ostermann
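With an increasing number of providers claiming to offer Cloud infrastructures, the community lacks a common terminology accompanied by a clear definition and classification of Cloud features. In this paper we survey a selection of Cloud providers and propose a taxonomy of eight important Cloud computing elements covering service type, resource deployment, hardware, runtime tuning, business model, middleware, and performance. We conclude that the provisioning of Service Level Agreements as utilities, open and interoperable middleware solutions, and sustained performance metrics for high-performance computing applications are the three areas most in need of further community research.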
A Quantitative Analysis of High Performance Computing with Amazon’s EC2 Infrastructure: The Death of the Local Cluster?
Zach Hill and Marty Humphrey
The introduction of affordable infrastructure on demand, specifically Amazon’s Elastic Compute Cloud (EC2), has had a significant impact on the business IT community and provides reasonable and attractive alternatives to locally-owned infrastructure. For scientific computation, however, the viability of EC2 has come into question because of its use of virtualization and network shaping and the performance impact of both. Several works have shown that EC2 cannot compete with a dedicated HPC cluster that uses high-performance interconnects. But how does EC2 compare with the smaller departmental and lab-sized commodity clusters that are often the primary computational resource for scientists? To answer that question, we ran MPI and memory bandwidth benchmarks on EC2 clusters built from each of the 64-bit instance types. We compared the performance of a 16-node cluster of each type to a dedicated, locally-owned commodity cluster based on Gigabit Ethernet. Our results show that while EC2 does experience reduced performance, it is still viable for smaller-scale applications.
Investigating the Use of Autonomic Cloudbursts for High-Throughput Medical Image Registration
Hyunjoo Kim, Manish Parashar, David J. Foran and Lin Yang
This paper investigates the use of clouds and autonomic cloudbursting to support medical image registration. The goal is to enable a virtual computational cloud that integrates local computational environments and public cloud services on the fly. This cloud supports image registration requests from different distributed research groups with varied computational requirements and QoS constraints. The virtual cloud essentially implements shared and coordinated task-spaces, which coordinate the scheduling of jobs submitted by a dynamic set of research groups to their local job queues. A policy-driven scheduling agent uses the QoS constraints, along with performance history and the state of the resources, to determine the appropriate size and mix of public and private cloud resources to allocate to a specific request. The virtual computational cloud and the medical image registration service were developed using the CometCloud engine. They have been deployed on a combination of private clouds at Rutgers University and the Cancer Institute of New Jersey and Amazon EC2. An experimental evaluation demonstrates the effectiveness of autonomic cloudbursts and application.
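The sizing decision made by the scheduling agent can be illustrated with a deliberately simple model. Per-task times on private and public resources are estimated from performance history. The model counts how many tasks the private workers can finish before the deadline and bursts the remainder to the public cloud. The function name, the constant-per-task-time assumption, and the numbers below are hypothetical; CometCloud's actual policies are not reproduced here.

```python
import math
from typing import Optional

def public_instances_needed(num_tasks: int, private_workers: int,
                            t_private: float, t_public: float,
                            deadline: float) -> Optional[int]:
    """How many public-cloud workers to add so that all tasks finish by the
    deadline, or None if no number of public workers suffices.

    t_private / t_public: estimated per-task times from performance history,
    assumed constant per task in this sketch.
    """
    per_private = math.floor(deadline / t_private)   # tasks one private worker can finish
    remaining = num_tasks - private_workers * per_private
    if remaining <= 0:
        return 0                                     # local resources alone meet the deadline
    per_public = math.floor(deadline / t_public)
    if per_public == 0:
        return None                                  # a public worker cannot finish even one task
    return math.ceil(remaining / per_public)

print(public_instances_needed(100, 8, 10.0, 12.0, 60.0))  # 11
```

With 100 tasks, 8 private workers, 10 s and 12 s per task, and a 60 s deadline, the private side finishes 48 tasks. The remaining 52 need 11 public workers, each finishing 5 tasks.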
Session 3: Clouds 2 (2/2)
Extending Grids with Cloud Resource Management for Scientific Computing
Simon Ostermann, Radu Prodan and Thomas Fahringer
From its beginnings on supercomputers, scientific computing has constantly evolved to new levels such as cluster computing, meta-computing, and computational Grids. Today, Cloud Computing is emerging as the paradigm for the next generation of large-scale scientific computing, eliminating the need to host expensive computing hardware. Scientists still have their Grid environments in place and can benefit from extending them with leased Cloud resources whenever needed. This paradigm shift raises new problems that need to be analyzed. Examples include the integration of this new resource class into existing environments, applications on the new resources, and security. The virtualization overheads for deploying and starting a virtual machine image are new factors that must be considered when choosing scheduling mechanisms. In this paper we investigate the usability of compute Clouds to extend a Grid workflow middleware, and we demonstrate this on a real implementation using scientific workflows.
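A small list-scheduling simulation shows why startup overhead matters for scheduling. In it, leased Cloud VMs become usable only after a fixed deployment and boot delay. The function names, the equal-length tasks, and the timings are hypothetical. This is not the scheduler used in the paper's middleware.

```python
import heapq
from typing import List

def makespan(task_times: List[float], grid_cores: int,
             cloud_vms: int, startup: float) -> float:
    """Greedy list scheduling: each task runs on whichever resource frees up
    first. Grid cores are free at t=0; Cloud VMs become usable only after
    `startup` seconds (image deployment and boot)."""
    free_at = [0.0] * grid_cores + [startup] * cloud_vms
    if not free_at:
        raise ValueError("need at least one grid core or cloud VM")
    heapq.heapify(free_at)
    finish = 0.0
    for t in task_times:
        start = heapq.heappop(free_at)
        end = start + t
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish

def best_lease(task_times: List[float], grid_cores: int,
               max_vms: int, startup: float) -> int:
    """Number of VMs (0..max_vms) giving the shortest makespan. Ties go to
    the smaller lease, since extra VMs cost money without helping."""
    first = 0 if grid_cores > 0 else 1
    return min(range(first, max_vms + 1),
               key=lambda k: (makespan(task_times, grid_cores, k, startup), k))

print(best_lease([10.0] * 8, grid_cores=2, max_vms=2, startup=30.0))  # 0
print(best_lease([10.0] * 8, grid_cores=2, max_vms=2, startup=5.0))   # 2
```

Take eight 10-second tasks on two Grid cores. With a 30-second startup, leasing does not help. With a 5-second startup, two VMs cut the makespan from 40 s to 25 s.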
Towards Autonomic Workload Provisioning for Enterprise Grids and Clouds
Andres Quiroz, Nathan Gnanasambandam, Manish Parashar, Hyunjoo Kim and Naveen Sharma
This paper explores autonomic approaches for optimizing provisioning for heterogeneous workloads on enterprise Grids and clouds. Specifically, it presents a decentralized, robust online clustering approach that addresses the distributed nature of these environments. The approach can detect patterns and trends and use this information to optimize the provisioning of virtual machine (VM) resources. The paper then presents a model-based approach for estimating application service time using long-term application performance monitoring. This provides feedback about the appropriateness of requested resources as well as the system’s ability to meet QoS constraints and SLAs. For high-performance computing workloads specifically, the use of a quadratic response surface model (QRSM) is justified with respect to traditional models, demonstrating the need for application-specific modeling. The proposed approaches are evaluated using a real computing center workload trace, and the results demonstrate both their effectiveness and their cost-efficiency.
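A quadratic response surface model predicts service time y from a few input factors. It uses intercept, linear, squared, and interaction terms: y = b0 + b1·x1 + b2·x2 + b11·x1² + b22·x2² + b12·x1·x2. The sketch below fits such a model in plain Python through the normal equations. The choice of two factors (for instance, processors requested and input size) is an assumption for illustration, not the factors used in the paper. For real data, `numpy.linalg.lstsq` is numerically safer than forming the normal equations.

```python
from typing import List, Sequence, Tuple

def quad_features(x1: float, x2: float) -> List[float]:
    # Full second-order model: intercept, linear, pure quadratic, interaction.
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more varied samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_qrsm(samples: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Least-squares fit of service time y to (x1, x2) via the normal
    equations (X^T X) beta = X^T y."""
    rows = [quad_features(x1, x2) for x1, x2, _ in samples]
    ys = [y for _, _, y in samples]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta: Sequence[float], x1: float, x2: float) -> float:
    return sum(b * f for b, f in zip(beta, quad_features(x1, x2)))
```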
An Efficient and Scalable Algorithm for Policy Compatibility in Service Virtualization
Eunjin (EJ) Jung, Bruno Crispo and Lasantha Ranaweera
More and more companies are adopting the emerging business model usually referred to as “Service Virtualization”. In a service virtualization scenario, resource providers can host third-party services as long as their resource and security requirements are compatible with those of the service providers. XACML is a standard language for access control specification. We propose a complete support system that allows any number of resource and service providers 1) to specify their access control and resource requirements and 2) to check the compatibility between the policies of any set of resource providers and those of service providers. We describe our algorithms and show the efficiency and scalability of our system by testing it on simulated policies and on policies from Margrave and Fedora Core. Our experimental results show that our system is fast enough to support real-time resource scheduling.
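The compatibility check can be illustrated with a deliberately simplified model. Each provider's policy maps an attribute, such as an operating system or an authentication mechanism, to the set of values it accepts. Two policies are compatible when every attribute they both constrain has a common acceptable value. This is a hypothetical stand-in. Real XACML policies have targets, rules, and combining algorithms that are not modeled here. The naive all-pairs loop below is also not the efficient algorithm the paper proposes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplified policy: attribute -> set of acceptable values.
Policy = Dict[str, Set[str]]

def compatible(resource: Policy, service: Policy) -> bool:
    """Compatible if every attribute constrained by both sides has at least
    one value acceptable to both. An attribute constrained by only one side
    is treated as unconstrained by the other."""
    shared = resource.keys() & service.keys()
    return all(resource[a] & service[a] for a in shared)

def compatible_pairs(resources: Dict[str, Policy],
                     services: Dict[str, Policy]) -> List[Tuple[str, str]]:
    """Check every (resource provider, service provider) pair."""
    return [(r, s) for r, rp in resources.items()
            for s, sp in services.items() if compatible(rp, sp)]

resources = {"R1": {"os": {"linux"}, "auth": {"x509", "saml"}},
             "R2": {"os": {"windows"}}}
services = {"S1": {"os": {"linux", "windows"}, "auth": {"saml"}},
            "S2": {"auth": {"kerberos"}}}
print(compatible_pairs(resources, services))
```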
Session 4: Grids 1 (1/3)
Connecting OGC Web Services and the Grid using Globus Toolkit 4 and OGSA-DAI
Ralf Groeper, Christopher Kunz and Christian Grimm
Adapting existing distributed infrastructures to the Grid is a major aim of Germany’s D-Grid initiative. Generic compute and data storage resources, as well as management services for Virtual Organizations, are well established in the D-Grid infrastructure. Existing research communities can therefore use these resources to enhance their existing infrastructures, or to develop new ones that would not have been possible without the enormous amount of compute power and storage provided by the Grid. GDI-Grid is one of these communities. The adaptation of existing geospatial infrastructures, especially those based on Open Geospatial Consortium (OGC) web services, is the subject of current research. In this paper we present the results of this research and the resulting implementations, with a special focus on compute services, data management, and security.
Davis: A Generic Interface for iRODS and SRB
Shunde Zhang, Paul Coddington and Andrew Wendelborn
The Storage Resource Broker (SRB) and its successor, the Integrated Rule-Oriented Data System (iRODS), are typical data grid software systems. Researchers use them widely to store large data collections with associated metadata. However, the proprietary iRODS/SRB protocol results in limited support for high-quality, generic user interfaces. A few Graphical User Interfaces (GUIs) and web interfaces have been developed for SRB and iRODS, but they either have restricted functionality or are customized for the needs of particular user groups. Moreover, they are not easy or straightforward to use, especially for basic users, which sometimes stops potential users from adopting iRODS/SRB as their storage solution. To tackle these shortcomings, Davis, a generic gateway to iRODS/SRB, has been developed. It aims to expand the use of iRODS/SRB to users with any level of computer skills and to make it easier to interface iRODS/SRB with other applications. Davis conforms to the open and broadly accepted WebDAV standard protocol, with additional features to facilitate HTTP access. This paper first investigates existing iRODS/SRB client tools. It then describes the approach taken to implement Davis and its features. Finally, it presents some use cases that demonstrate the usefulness of a WebDAV interface to iRODS/SRB.
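WebDAV clients typically list a collection by sending a `PROPFIND` request with a `Depth: 1` header and reading the `207 Multi-Status` XML reply (RFC 4918). The sketch below uses only Python's standard `xml.etree.ElementTree`. It shows such a request body and parses a reply into (path, is-collection) pairs. The paths are hypothetical, and nothing here reflects how Davis actually maps iRODS/SRB collections to URLs.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

DAV = "{DAV:}"  # ElementTree's notation for the DAV: XML namespace

# Body of a PROPFIND request asking only for each entry's resource type.
# A client would send it with the header "Depth: 1" to list one level.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:prop><D:resourcetype/></D:prop></D:propfind>'
)

def parse_multistatus(xml_text: str) -> List[Tuple[str, bool]]:
    """Turn a 207 Multi-Status reply into (href, is_collection) pairs."""
    root = ET.fromstring(xml_text)
    entries = []
    for resp in root.findall(f"{DAV}response"):
        href = resp.findtext(f"{DAV}href", default="")
        # A collection (directory) reports <D:collection/> in its resourcetype.
        is_coll = resp.find(f".//{DAV}resourcetype/{DAV}collection") is not None
        entries.append((href, is_coll))
    return entries

SAMPLE_REPLY = """<D:multistatus xmlns:D="DAV:">
  <D:response><D:href>/home/alice/</D:href>
    <D:propstat><D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
    <D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response>
  <D:response><D:href>/home/alice/data.nc</D:href>
    <D:propstat><D:prop><D:resourcetype/></D:prop>
    <D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response>
</D:multistatus>"""

print(parse_multistatus(SAMPLE_REPLY))
```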
Performance Analysis of Grid Applications in the ASKALON Environment
Radu Prodan, Simon Ostermann and Kassian Plankensteiner
Building Dynamic Integrity Protection for Multiple Independent Authorities in Virtualization-based Infrastructure
Ge Cheng, Hai Jin, Deqing Zou, Xinwen Zhang, Min Li, Chen Yu and Guofu Xiang
In grid and cloud computing infrastructures, the integrity of a computing
platform is a critical security requirement for providing secure and
honest computing environments to service providers and resource
consumers. However, the software components running on a single
platform are usually provided and maintained by different authorities
that may not trust each other. As a result, monitoring and protecting
runtime system integrity is very challenging and has not yet been well
addressed. In this paper, we present a virtualization-based dynamic
integrity protection method. It ensures that only the appropriate
authorities can control their components, without interfering with other
component providers or authorities. In our solution, the integrity
requirements defined by the authorities of upper components (e.g.,
service middleware and applications) are respected by preventing the
underlying components (e.g., the operating system) from exposing their
sensitive data. Such exposure can be caused by updates of the underlying
components or by other malicious actions. We implement our solution on a
Xen-based platform, and our evaluation results show that the solution is
effective for integrity protection with acceptable performance overhead.
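The ownership rule at the heart of this design can be sketched as a small registry of expected SHA-256 measurements. The rule is that only the authority responsible for a component may change what counts as its trusted state. Everything here is hypothetical, including `IntegrityMonitor` and the component and authority names. The paper enforces protection at the hypervisor level in Xen, which this user-space toy does not attempt to model.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict

@dataclass
class Component:
    owner: str    # the only authority allowed to update this component
    digest: str   # expected SHA-256 of the component image

class IntegrityMonitor:
    """Hypothetical registry: each authority registers its own components,
    and only that authority may change the expected measurement."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}

    def register(self, name: str, owner: str, image: bytes) -> None:
        if name in self._components:
            raise PermissionError(f"{name} is already registered")
        self._components[name] = Component(owner, hashlib.sha256(image).hexdigest())

    def update(self, name: str, requester: str, image: bytes) -> None:
        comp = self._components[name]
        if requester != comp.owner:
            raise PermissionError(f"{requester} does not control {name}")
        comp.digest = hashlib.sha256(image).hexdigest()

    def verify(self, name: str, image: bytes) -> bool:
        # Runtime check: does the currently loaded image match the expected one?
        return hashlib.sha256(image).hexdigest() == self._components[name].digest
```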
Session 5: Grids 2 (2/3)
ADEM: Automating Deployment and Management of Application Software on the Open Science Grid
Zhengxiong Hou, Mike Wilde, Xingshe Zhou, Ian Foster and Jing Tie
In grid environments, the deployment and management of application
software presents a major practical challenge for end users. Performing
these tasks manually is error-prone and not scalable to large grids. In
this work, we propose an automation tool, ADEM, for grid application
software deployment and management, and demonstrate and evaluate the
tool on the Open Science Grid. ADEM uses Globus for basic grid
services, and integrates the grid software installer Pacman. It
supports both centralized “prebuild” and on-site “dynamic-build”
approaches to software compilation, using the NMI Build and Test system
to perform central prebuilds for specific target platforms. ADEM’s
parallel workflow automatically determines available grid sites and
their platform “signatures”, checks for and integrates dependencies,
and performs software build, installation, and testing. ADEM’s tracking
log of build and installation activities is helpful for troubleshooting
potential exceptions. Experimental results on the Open Science Grid
show that ADEM is easy to use and more productive for users than manual
operation.
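Two of the workflow steps above can be sketched in Python. The first chooses between a central prebuild and an on-site dynamic build based on a site's platform signature. The second orders dependencies before installation. The signature format (OS, architecture, libc version), the site and package names, and the function names are all hypothetical. ADEM itself relies on Pacman and the NMI Build and Test system, which are not modeled. `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical platform signature: (operating system, CPU architecture, libc version).
Signature = Tuple[str, str, str]

def plan_builds(sites: Dict[str, Signature],
                prebuilt: Dict[Signature, str]) -> Dict[str, Tuple[str, str]]:
    """Reuse a central prebuild when one matches the site's signature;
    otherwise fall back to an on-site dynamic build from source."""
    plan = {}
    for site, sig in sites.items():
        if sig in prebuilt:
            plan[site] = ("prebuild", prebuilt[sig])
        else:
            plan[site] = ("dynamic-build", "source")
    return plan

def install_order(deps: Dict[str, List[str]]) -> List[str]:
    """Order packages so every dependency precedes the packages that need it.
    `deps` maps each package to its dependencies; a cycle raises CycleError."""
    return list(TopologicalSorter(deps).static_order())

sites = {"SiteA": ("linux", "x86_64", "2.5"), "SiteB": ("linux", "ia64", "2.3")}
prebuilt = {("linux", "x86_64", "2.5"): "app-x86_64.tar.gz"}
print(plan_builds(sites, prebuilt))
print(install_order({"app": ["fftw", "mpich"], "fftw": [], "mpich": []}))
```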
Semantic Framework for Free-Form Search of Grid Resources and Services
Chaitali Gupta, Rajdeep Bhowmik and Madhusudhan Govindaraju
If the model of free-form queries, which has proved successful for
HTML-based search on the Web, is made available for grid services, it
will serve as a powerful tool for scientists to retrieve information on
resources, monitoring data, replica location sets, and meta-data on
scientific data sets, etc., in an intuitive manner. To enable this
vision, there is a critical need to design and develop tools that
abstract away the fundamental complexity of XML-based grid
specifications and toolkits, and provide an elegant, simple, and
powerful free-form query-based invocation system to end users. Current
implementations of XML-based grid service descriptions require end
users to have intimate knowledge of service descriptions, related
toolkits, and query languages. We present the design, implementation,
and performance analysis of our ontological framework that employs
matching algorithms, automated extension of ontologies, and
optimizations to match free-form user queries with corresponding
operations in grid services and resource information stored in RDF/OWL
format. Our system uses Semantic Web concepts, ontologies, and lexical
tools to automate discovery and matchmaking of grid services and
resource information. We quantify the performance improvements due to
knowledge acquisition and extension of ontology files with each missed
query, and present precision and recall values that quantify the accuracy of the free-form search.
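A much-simplified version of free-form matching, together with the precision and recall measures, is sketched below. It replaces the paper's RDF/OWL ontologies and lexical tools with a small hypothetical synonym table, and it scores grid service operations by overlapping terms. Precision is the fraction of returned operations that are relevant. Recall is the fraction of relevant operations that were returned. The operation names are invented.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical synonym table standing in for a lexical database.
SYNONYMS: Dict[str, Set[str]] = {
    "cpu": {"processor"},
    "load": {"utilization"},
    "find": {"get", "query"},
}

def terms(text: str) -> Set[str]:
    """Split free text or camelCase operation names into lower-case words,
    then add any synonyms for those words."""
    words = {w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)}
    expanded = set(words)
    for w in words:
        expanded |= SYNONYMS.get(w, set())
    return expanded

def match(query: str, operations: List[str]) -> List[str]:
    """Operations sharing at least one term with the query, best overlap
    first; ties keep the input order."""
    q = terms(query)
    scored = [(len(q & terms(op)), i, op) for i, op in enumerate(operations)]
    ranked = sorted(scored, key=lambda t: (-t[0], t[1]))
    return [op for score, _, op in ranked if score > 0]

def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

ops = ["getProcessorUtilization", "getDiskSpace", "listReplicas"]
found = match("find cpu load", ops)
print(found, precision_recall(found, {"getProcessorUtilization"}))
```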
Parallel and Distributed Approach for Processing Large-Scale XML Datasets
Zacharia Fadika, Michael R. Head and Madhusudhan Govindaraju
An emerging trend is the use of XML as the data format for many distributed scientific applications, with document sizes ranging from tens of megabytes to hundreds of megabytes. Our earlier benchmarking results revealed that most of the widely available XML processing toolkits do not scale well for large XML data. A significant transformation in the design of XML processing for scientific applications is necessary so that the overall application turn-around time is not negatively affected. We present both parallel and distributed approaches and analyze how the scalability and performance requirements of large-scale XML-based data processing can be achieved. We have adapted the Hadoop implementation to determine the threshold data sizes and computation work required per node for a distributed solution to be effective. We also present an analysis of parallelism using our PIXIMAL toolkit for processing large-scale XML datasets, which exploits the parallelism available in emerging multi-core architectures. Multi-core processors are expected to be widely available in research clusters and scientific desktops. It is therefore critical to harness the opportunities for parallelism in the middleware instead of passing the task on to application programmers. Our parallelization approach for a multi-core node employs a DFA-based parser that recognizes a useful subset of the XML specification. We convert the DFA into an NFA that can be applied to an arbitrary subset of the input. Speculative NFAs are scheduled on the available cores in a node to effectively utilize its processing capabilities and achieve overall performance gains. We evaluate the efficacy of this approach in terms of the potential speedup that can be achieved for representative XML data sets.
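The speculative idea can be shown with a generic sketch, not PIXIMAL's code. A chunk's true start state is unknown until the preceding chunks are parsed, so each chunk is run from every possible DFA state. This is in effect the NFA-style simulation the abstract describes. The per-chunk results are then stitched together sequentially. The two-state DFA, which only tracks whether the parser is inside markup, is a hypothetical toy. CPython threads will not speed up this CPU-bound work because of the GIL; a process pool or native threads would be needed for real gains. The sketch therefore demonstrates that the decomposition is correct, not that it is fast.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Tuple

TEXT, MARKUP = 0, 1          # toy DFA states: outside / inside a tag
STATES = (TEXT, MARKUP)

def step(state: int, ch: str) -> Tuple[int, int]:
    """One DFA transition; returns (next_state, tags_opened)."""
    if state == TEXT and ch == "<":
        return MARKUP, 1
    if state == MARKUP and ch == ">":
        return TEXT, 0
    return state, 0

def summarize(chunk: str) -> Dict[int, Tuple[int, int]]:
    """Speculatively run a chunk from every possible start state, because
    its true start state is unknown until earlier chunks are processed."""
    out = {}
    for start in STATES:
        state, count = start, 0
        for ch in chunk:
            state, opened = step(state, ch)
            count += opened
        out[start] = (state, count)
    return out

def count_tags_parallel(doc: str, n_chunks: int = 4) -> int:
    size = max(1, -(-len(doc) // n_chunks))       # ceiling division
    chunks = [doc[i:i + size] for i in range(0, len(doc), size)]
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarize, chunks))
    # Stitch: follow the real state through the chunks, picking the
    # speculative result that matches it.
    state, total = TEXT, 0
    for summary in summaries:
        state, count = summary[state]
        total += count
    return total

def count_tags_sequential(doc: str) -> int:
    state, total = TEXT, 0
    for ch in doc:
        state, opened = step(state, ch)
        total += opened
    return total

doc = "<a><b>text</b><c x='1'/></a>"
print(count_tags_sequential(doc), count_tags_parallel(doc, 3))  # 5 5
```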
Session 6: Grids 3 (3/3)
Re:GRIDiT – Coordinating Distributed Update Transactions on Replicated Data in the Grid
Laura Cristiana Voicu, Heiko Schuldt, Fuat Akal, Yuri Breitbart and Hans-Jorg Schek
The recent proliferation of Grid environments for eScience applications has led to common computing infrastructures with nearly unlimited storage capabilities. In terms of data management, the Grid allows a large number of replicas of data objects to be kept, providing a high degree of availability, reliability, and performance. The Grid has particular characteristics, above all the absence of a global coordinator. As a result, dealing with many updateable replicas per data object urgently requires new protocols for the synchronization of updates and their subsequent propagation. Currently, no protocol can be seamlessly applied to a data Grid environment without impacting correctness and/or overall performance. In this paper we address the problem of replication in the Data Grid in the presence of updates. We have designed the Re:GRIDiT protocol, which focuses on the correct synchronization of updates to several replicas in the Grid in a completely distributed way, extending well-established database replication techniques. Globally correct execution is provided by communication between transactions and sites. Re:GRIDiT takes into account the special characteristics of eScience applications, such as the distinction between mutable objects, which can be updated by users, and immutable objects. Finally, we provide a detailed evaluation of the performance of the Re:GRIDiT protocol when applied at Grid scale.
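Re:GRIDiT's own protocol is not spelled out here. A useful point of reference is the classic textbook criterion for global correctness of replicated updates, conflict serializability. Each site reports the order in which conflicting transactions touched its replicas. The global execution is conflict-serializable if and only if the union of those orders is acyclic. A minimal Python sketch of that check follows. It is a generic technique, not the Re:GRIDiT algorithm, and the transaction names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (earlier transaction, later transaction)

def globally_serializable(local_orders: Iterable[List[Edge]]) -> bool:
    """Union the per-site conflict orders into one serialization graph and
    report whether it is acyclic (i.e. conflict-serializable)."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for edges in local_orders:
        for earlier, later in edges:
            graph[earlier].add(later)

    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on DFS stack / finished
    color: Dict[str, int] = defaultdict(int)

    def reaches_cycle(node: str) -> bool:
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True                # back edge: cycle found
            if color[nxt] == WHITE and reaches_cycle(nxt):
                return True
        color[node] = BLACK
        return False

    return not any(color[n] == WHITE and reaches_cycle(n) for n in list(graph))

# Site 1 applied T1 before T2; site 2 applied T2 before T3: one global order exists.
print(globally_serializable([[("T1", "T2")], [("T2", "T3")]]))   # True
# Two replicas saw T1 and T2 in opposite orders: no single serial order fits.
print(globally_serializable([[("T1", "T2")], [("T2", "T1")]]))   # False
```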
Finding Associations in Grid Monitoring Data
Gerhild Maier, Daniel van der Ster and Dieter Kranzlmueller
Error handling is a crucial task in infrastructures as complex as grids.
Today, several monitoring tools can be used to report failing grid jobs,
including the corresponding error codes. However, the error codes do not
always indicate the actual fault that originally caused the job failure.
Human time and expertise are required to manually trace errors back to
the real underlying fault. We perform Association Rule Mining on grid
job monitoring data to automatically retrieve knowledge about the grid
components’ behaviour, taking dependencies between grid job
characteristics into account. In this way, problematic grid components
are located automatically, and this information, expressed as
association rules, is visualised in a web interface. This work reduces
the time needed for fault recovery and improves reliability.
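Association rule mining scores a rule A → B by two measures. Its support is the fraction of all jobs that contain both A and B. Its confidence is the fraction of jobs containing A that also contain B. The Python sketch below mines only single-item rules that predict a chosen job outcome. Full algorithms such as Apriori also find multi-item antecedents. The job attributes, site names, and thresholds are hypothetical.

```python
from typing import Dict, List, Tuple

Job = Dict[str, str]
Rule = Tuple[Tuple[str, str], float, float]  # (antecedent item, support, confidence)

def mine_rules(jobs: List[Job], target: Tuple[str, str],
               min_support: float, min_confidence: float) -> List[Rule]:
    """Find rules {attr=value} -> {target} whose support and confidence
    reach the thresholds, highest confidence first."""
    if not jobs:
        return []
    t_attr, t_val = target
    # item -> [jobs with the item, jobs with the item and the target]
    counts: Dict[Tuple[str, str], List[int]] = {}
    for job in jobs:
        hit = int(job.get(t_attr) == t_val)
        for attr, val in job.items():
            if attr == t_attr:
                continue  # the target attribute cannot predict itself
            c = counts.setdefault((attr, val), [0, 0])
            c[0] += 1
            c[1] += hit
    n = len(jobs)
    rules = []
    for item, (with_item, with_both) in counts.items():
        support = with_both / n
        confidence = with_both / with_item
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, support, confidence))
    return sorted(rules, key=lambda r: -r[2])

jobs = [
    {"site": "CE-A", "vo": "atlas", "status": "failed"},
    {"site": "CE-A", "vo": "cms", "status": "failed"},
    {"site": "CE-A", "vo": "atlas", "status": "failed"},
    {"site": "CE-B", "vo": "atlas", "status": "done"},
    {"site": "CE-B", "vo": "cms", "status": "done"},
    {"site": "CE-B", "vo": "atlas", "status": "failed"},
]
for rule in mine_rules(jobs, ("status", "failed"), 0.3, 0.7):
    print(rule)
```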
PDC-NH: Popular Data Concentration on NAND Flash and Hard Disk Drive
DongKyu Lee and Kern Koh
Session 7: Resource Management
Committee-based Evaluation and Selection of Grid Resources for QoS Improvement
Zhen Wang and Junwei Cao
The grid enables distributed resource sharing but can provide only limited QoS guarantees to grid applications, e.g., in terms of execution time. In this paper, we propose a Committee-based Resource Evaluation and Selection Method (CRESM) to evaluate and select reliable resources for Users. Such resources have a higher probability of finishing tasks in time and of providing better QoS. CRESM is composed of a representative layer and a committee layer. The representative layer consists of informed and experienced Users who contribute their individual experiences to judge a particular grid resource. The committee layer aggregates these individual judgments and makes a comprehensive decision based on the aggregated information. All judgments and the final decision are made fuzzily, based on historical statistics of resource evaluation. Experimental results show that CRESM is stable and accurate in different grid environments. CRESM can also defeat collusive deceptive behavior. Moreover, CRESM-based resource scheduling can improve QoS support for grid applications.
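The two-layer structure can be sketched under loudly stated assumptions. Each representative turns its own on-time history for a resource into a fuzzy degree of "reliable" using a linear ramp membership function. The committee then aggregates those degrees with the median, which a minority of colluding representatives cannot drag arbitrarily far. The ramp thresholds, the median rule, and the selection threshold are all hypothetical choices, not CRESM's actual fuzzy rules.

```python
from statistics import median
from typing import Dict, List, Tuple

History = Tuple[int, int]  # (tasks finished on time, tasks submitted)

def reliable_membership(on_time: int, total: int,
                        low: float = 0.5, high: float = 0.9) -> float:
    """Representative layer: fuzzy degree (0..1) to which a resource is
    'reliable', as a linear ramp between two assumed on-time ratios."""
    if total == 0:
        return 0.5  # no experience: maximally uncertain
    ratio = on_time / total
    if ratio <= low:
        return 0.0
    if ratio >= high:
        return 1.0
    return (ratio - low) / (high - low)

def committee_score(histories: List[History]) -> float:
    """Committee layer: the median of the representatives' judgments, so a
    colluding minority cannot move the result arbitrarily."""
    return median([reliable_membership(ok, n) for ok, n in histories])

def select(resources: Dict[str, List[History]], threshold: float = 0.6) -> List[str]:
    """Resources whose committee score reaches the threshold, best first."""
    scores = {r: committee_score(h) for r, h in resources.items()}
    return sorted((r for r, s in scores.items() if s >= threshold),
                  key=lambda r: -scores[r])
```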
Scheduling Grid Workloads on Multicore Clusters to Minimize Energy and Maximize Performance
Michael Lammie, Paul Brenner and Douglas Thain
Energy is a significant and growing component of the cost of running a large computing facility. A grid workload consisting of millions of jobs running on thousands of processors may consume millions of kilowatt hours of electricity. However, because a grid workload generally consists of many independent sequential processes, we may shape its execution to satisfy energy constraints. By varying the number and frequency of processors available, a scheduler may trade off energy against performance. In this paper, we explore energy and performance tradeoffs in the scheduling of grid workloads on large clusters. We build upon previous work by showing the interaction of intelligent job assignment, automated node scaling, and frequency scaling on multicore clusters. An unexpected result is that, even though low frequency is the most efficient mode of operating a single node, the careful application of frequency scaling can actually reduce overall energy consumption even further by reducing the number of nodes powered on.
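The trade-off can be explored with a toy model in which CPU frequency drives both the dynamic power of busy cores and the number of nodes that must stay powered on to meet a deadline. The model makes four assumptions. A job takes work / f seconds. Each busy core draws k·f³ watts, a common simplification when voltage scales with frequency. Every powered-on node draws a constant static power. The fewest nodes that meet the deadline are used. All constants below are made up, and this is not the energy model or data from the paper.

```python
import math
from typing import Optional

def energy(jobs: int, cores: int, freq: float, work: float, deadline: float,
           p_static: float, k: float) -> Optional[float]:
    """Joules to finish `jobs` independent single-core jobs by `deadline`,
    powering on the fewest `cores`-core nodes that can do it at `freq`.

    Toy assumptions: a job takes work / freq seconds; each busy core draws
    k * freq**3 watts of dynamic power; each powered-on node draws a
    constant p_static watts for the whole makespan.
    """
    runtime = work / freq
    per_core = math.floor(deadline / runtime)     # jobs one core can finish in time
    if per_core == 0:
        return None                               # too slow for the deadline
    nodes = math.ceil(jobs / (cores * per_core))
    makespan = math.ceil(jobs / (nodes * cores)) * runtime
    static = nodes * p_static * makespan
    dynamic = jobs * runtime * k * freq ** 3
    return static + dynamic

for f in (1.0, 1.5, 2.0):
    print(f, energy(64, 8, f, work=3600, deadline=7200, p_static=200, k=2))
```

With these made-up constants, energy falls as frequency rises (about 6.22, 5.36, and 4.72 MJ) because fewer nodes stay powered on. With p_static set to 0, the lowest frequency wins instead.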
Dependable Workflow Scheduling in Global Grids
Mustafizur Rahman, Rajiv Ranjan and Rajkumar Buyya
In this paper, a reputation-based Grid workflow scheduling algorithm is proposed to counter the effect of the inherent unreliability and temporal characteristics of computing resources in large-scale, decentralized Grid overlays. The proposed approach builds upon structured peer-to-peer indexing and overlay networking techniques to create a scalable wide-area network of Grid sites that supports dependable scheduling of applications. The scheduling algorithm treats the reliability of a Grid resource as a statistical property. This property is computed globally in the decentralized Grid overlay, based on dynamic feedback or reputation scores assigned by individual service consumers (Grid Resource Brokers). The proposed algorithm can dynamically adapt to changing resource conditions. It offers significant performance gains compared to traditional approaches in the event of unsuccessful job execution or resource failure. We evaluate and demonstrate the feasibility of our approach through an extensive trace-driven simulation. The results show that our scheduling technique can reduce the makespan by up to 50% and successfully isolate failure-prone resources from the system.
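One common way to turn broker feedback into a statistical reliability score is a Beta-distribution reputation. With s successful and f failed runs reported, the expected success probability is (s + 1) / (s + f + 2). The sketch below computes that estimate centrally rather than across a peer-to-peer overlay. It then picks the resource with the lowest expected completion time when failed runs are retried. Both the Beta model and the retry-based cost are assumptions for illustration, not necessarily the formulas in the paper. The resource names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def reputation(feedback: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Expected success probability per resource from brokers' feedback,
    using a Beta(1, 1) prior: (successes + 1) / (reports + 2)."""
    ok: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for resource, success in feedback:
        total[resource] += 1
        ok[resource] += int(success)
    return {r: (ok[r] + 1) / (total[r] + 2) for r in total}

def pick(runtimes: Dict[str, float], rep: Dict[str, float]) -> str:
    """Lowest expected completion time when failed runs are retried until
    success: runtime / P(success), the mean of a geometric number of tries.
    Resources without feedback get the prior mean 0.5."""
    return min(runtimes, key=lambda r: runtimes[r] / rep.get(r, 0.5))

fb = [("A", True)] * 8 + [("B", True)] * 2 + [("B", False)] * 8
rep = reputation(fb)
print(rep, pick({"A": 100.0, "B": 40.0}, rep))
```

Resource B is faster per run, but its poor reputation (0.25) makes its expected time 160 s, versus about 111 s for A. The scheduler therefore chooses A.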
------------------------------------------------------------------------------------
Accepted papers:
1. Shunde Zhang, Paul Coddington and Andrew Wendelborn. Davis: A
Generic Interface for iRODS and SRB
2. Zhengxiong Hou, Mike Wilde, Xingshe Zhou, Ian Foster and Jing
Tie. ADEM: Automating Deployment and Management of Application Software on the Open Science Grid
3. Ralf Groeper, Christopher Kunz and Christian Grimm. Connecting
OGC Web Services and the Grid using Globus Toolkit 4 and OGSA-DAI
4. Laura Cristiana Voicu, Heiko Schuldt, Fuat Akal, Yuri Breitbart
and Hans-Jorg Schek. Re:GRIDiT – Coordinating Distributed Update
Transactions on Replicated Data in the Grid
5. Zhen Wang and Junwei Cao. Committee-based Evaluation and Selection
of Grid Resources for QoS Improvement
6. Ge Cheng, Hai Jin, Deqing Zou, Xinwen Zhang, Min Li, Chen Yu and Guofu Xiang. Building Dynamic
Integrity Protection for Multiple Independent Authorities in
Virtualization-based Infrastructure
7. Gerhild Maier, Daniel van der Ster and Dieter Kranzlmueller.
Finding Associations in Grid Monitoring Data
8. Michael Black and William Edgar. Exploring Mobile Devices
as Grid Resources: Using an x86 Virtual Machine to Run BOINC on an
iPhone
9. Radu Prodan and Simon Ostermann. A Survey and Taxonomy of Infrastructure as a Service and Web Hosting Cloud Providers
10. Simon Ostermann, Radu Prodan and Thomas Fahringer. Extending
Grids with Cloud Resource Management for Scientific Computing
11. Radu Prodan, Simon Ostermann and Kassian Plankensteiner. Performance Analysis of Grid
Applications in the ASKALON Environment
12. Chaitali Gupta, Rajdeep Bhowmik and Madhusudhan Govindaraju.
Semantic Framework for Free-Form Search of Grid Resources and
Services
13. Zacharia Fadika, Michael R. Head and Madhusudhan Govindaraju.
Parallel and Distributed Approach for Processing Large-Scale XML
Datasets
14. Michael Lammie, Paul Brenner and Douglas Thain. Scheduling
Grid Workloads on Multicore Clusters to Minimize Energy and Maximize
Performance
15. Eunjin (EJ) Jung, Bruno Crispo and Lasantha Ranaweera. An Efficient
and Scalable Algorithm for Policy Compatibility in Service
Virtualization
16. Mustafizur Rahman, Rajiv Ranjan and Rajkumar Buyya.
Dependable Workflow Scheduling in Global Grids
17. Andres Quiroz, Nathan Gnanasambandam, Manish Parashar, Hyunjoo
Kim and Naveen Sharma. Towards Autonomic Workload Provisioning for
Enterprise Grids and Clouds
18. Zach Hill and Marty Humphrey. A Quantitative Analysis of High
Performance Computing with Amazon’s EC2 Infrastructure: The Death
of the Local Cluster?
19. Shantenu Jha, Daniel S. Katz, Manish Parashar, Omer Rana and
Jon Weissman. Critical Perspectives on Large-Scale Distributed
Applications and Production Grids
20. Hyunjoo Kim, Manish Parashar, David J. Foran and Lin Yang.
Investigating the Use of Autonomic Cloudbursts for High-Throughput Medical Image Registration
