Publications
A full list of my publications can be found on Google Scholar and dblp.
Journal Publications, Conference Publications, Preprints.
Journal Publications
2022
- TONA Study on the Impact of Memory DoS Attacks on Cloud Applications and Exploring Real-Time Detection SchemesZhuozhao Li, Tanmoy Sen, Haiying Shen, and Mooi Choo ChuahIEEE/ACM Transactions on Networking, 2022
Even though memory denial-of-service attacks can cause severe performance degradations on co-located virtual machines, a previous detection scheme against such attacks cannot accurately detect the attacks and also generates high detection delay and high performance overhead since it assumes that cache-related statistics of an application follow the same probability distribution at all times, which may not be true for all types of applications. In this paper, we present the experimental results showing the impacts of memory DoS attacks on different types of cloud-based applications. Based on these results, we propose two lightweight and responsive Statistical based Detection Schemes (SDS/B and SDS/P) that can detect such attacks accurately. SDS/B constructs a profile of normal range of cache-related statistics for all applications and use statistical methods to infer an attack when the real-time collected statistics exceed this normal range, while SDS/P exploits the increased periods of access patterns for periodic applications to infer an attack. Upon SDS, we further leverage deep neural network (DNN) techniques to design a DNN-based detection scheme that is general to various types of applications and more robust to adaptive attack scenarios. Our evaluation results show that SDS/B, SDS/P and DNN outperform the state-of-the-art detection scheme, e.g., with 65% higher specificity, 40% shorter detection delay, and 7% less performance overhead. We also discuss how to use SDS and DNN-based detection schemes under different situations.
@article{li2022mdos, author = {Li, Zhuozhao and Sen, Tanmoy and Shen, Haiying and Chuah, Mooi Choo}, journal = {IEEE/ACM Transactions on Networking}, title = {A Study on the Impact of Memory DoS Attacks on Cloud Applications and Exploring Real-Time Detection Schemes}, year = {2022}, volume = {30}, number = {4}, pages = {1644-1658}, doi = {10.1109/TNET.2022.3144895}, abbr = {TON}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/9708719}, pdf = {TON-li-mdos-2022.pdf}, selected = {false} }
- TONCo-Scheduler: A Coflow-Aware Data-Parallel Job Scheduler in Hybrid Electrical/Optical Datacenter NetworksZhuozhao Li, and Haiying ShenIEEE/ACM Transactions on Networking, 2022
To support higher demand for datacenter networks, networking researchers have proposed hybrid electrical/optical datacenter networks (Hybrid-DCN) that leverages optical circuit switching (OCS) along with traditional electrical packet switching (EPS). However, due to the high reconfiguration delay of OCS, OCS is used only for bulk data transfers between racks to amortize the reconfiguration delay. Existing job schedulers for data-parallel frameworks are not designed for Hybrid-DCN, since they neither place tasks to aggregate data traffic to take advantage of OCS, nor schedule tasks to minimize the Coflow completion time (CCT). In this paper, we describe the mismatch between existing job schedulers and the advanced Hybrid-DCN, introduce the requirements for the new scheduler, and present the implementation of Co-scheduler , a job scheduler for data-parallel frameworks that aims to improve job performance by placing the tasks of jobs to aggregate enough data traffic to better leverage OCS to minimize the CCT in Hybrid-DCN. Specifically, for every job, Co-scheduler computes guidelines on how many racks to place the job’s input data and the job’s tasks. The guidelines are dynamically generated based on the real-time job characteristics or predictable job characteristics from prior runs, with the aim of leveraging OCS whenever possible and efficient and minimizing CCT of jobs. Co-scheduler then schedules the tasks of jobs based on the guidelines. We evaluate the effectiveness of Co-scheduler using trace-driven simulation. The evaluation demonstrates that Co-scheduler can improve makespan, average job completion time, and average CCT of a workload by up to 56%, 61%, and 79%, respectively, compared to the state-of-the-art schedulers.
@article{li2022coscheduler, author = {Li, Zhuozhao and Shen, Haiying}, journal = {IEEE/ACM Transactions on Networking}, title = {Co-Scheduler: A Coflow-Aware Data-Parallel Job Scheduler in Hybrid Electrical/Optical Datacenter Networks}, year = {2022}, volume = {30}, number = {4}, pages = {1599-1612}, doi = {10.1109/TNET.2022.3143232}, abbr = {TON}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/9695973}, pdf = {TON-li-coscheduler-2022.pdf}, selected = {true} }
- TPDSfuncX: Federated Function as a Service for ScienceZhuozhao Li, Ryan Chard, Yadu Babuji, Ben Galewsky, Tyler J. Skluzacek, Kirill Nagaitsev, Anna Woodard, Ben Blaiszik, Josh Bryan, Daniel S. Katz, Ian Foster, and Kyle ChardIEEE Transactions on Parallel and Distributed Systems, 2022
funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX’s endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and supercomputers, in effect turning them into function serving systems. funcX’s cloud-hosted service provides a single location for registering, sharing, and managing both functions and endpoints. It allows for transparent, secure, and reliable function execution across the federated ecosystem of endpoints—enabling users to route functions to endpoints based on specific needs. funcX uses containers (e.g., Docker, Singularity, and Shifter) to provide common execution environments across endpoints. funcX implements various container management strategies to execute functions with high performance and efficiency on diverse funcX endpoints. funcX also integrates with an in-memory data store and Globus for managing data that may span endpoints. We motivate the need for funcX, present our prototype design and implementation, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130000 concurrent workers. We show that funcX’s container warming-aware routing algorithm can reduce the completion time for 3,000 functions by up to 61% compared to a randomized algorithm and the in-memory data store can speed up data transfers by up to 3x compared to a shared file system.
@article{li2022funcx, author = {Li, Zhuozhao and Chard, Ryan and Babuji, Yadu and Galewsky, Ben and Skluzacek, Tyler J. and Nagaitsev, Kirill and Woodard, Anna and Blaiszik, Ben and Bryan, Josh and Katz, Daniel S. and Foster, Ian and Chard, Kyle}, journal = {IEEE Transactions on Parallel and Distributed Systems}, title = {funcX: Federated Function as a Service for Science}, year = {2022}, volume = {33}, number = {12}, pages = {4948-4963}, doi = {10.1109/TPDS.2022.3208767}, abbr = {TPDS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/9899739}, pdf = {TPDS-li-funcx-2022.pdf}, selected = {true} }
- JCIMHigh Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Non-Covalent InhibitorAustin Clyde, Stephanie Galanie, Daniel W. Kneller, Heng Ma, Yadu Babuji, Ben Blaiszik, Alexander Brace, Thomas Brettin, Kyle Chard, Ryan Chard, Leighton Coates, Ian Foster, Darin Hauner, Vilmos Kertesz, Neeraj Kumar, Hyungro Lee, Zhuozhao Li, Andre Merzky, Jurgen G. Schmidt, Li Tan, Mikhail Titov, Anda Trifan, Matteo Turilli, Hubertus Van Dam, Srinivas C. Chennubhotla, Shantenu Jha, Andrey Kovalevsky, Arvind Ramanathan, Martha S. Head, and Rick StevensJournal of Chemical Information and Modeling, 2022
Despite the recent availability of vaccines against the acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the search for inhibitory therapeutic agents has assumed importance especially in the context of emerging new viral variants. In this paper, we describe the discovery of a novel non-covalent small-molecule inhibitor, MCULE-5948770040, that binds to and inhibits the SARS-Cov-2 main protease (Mpro) by employing a scalable high throughput virtual screening (HTVS) framework and a targeted compound library of over 6.5 million molecules that could be readily ordered and purchased. Our HTVS framework leverages the U.S. supercomputing infrastructure achieving nearly 91% resource utilization and nearly 126 million docking calculations per hour. Downstream biochemical assays validate this Mpro inhibitor with an inhibition constant (Ki) of 2.9 \textmuM [95% CI 2.2, 4.0]. Further, using room-temperature X-ray crystallography, we show that MCULE-5948770040 binds to a cleft in the primary binding site of Mpro forming stable hydrogen bond and hydrophobic interactions. We then used multiple \textmus-timescale molecular dynamics (MD) simulations, and machine learning (ML) techniques to elucidate how the bound ligand alters the conformational states accessed by Mpro, involving motions both proximal and distal to the binding site. Together, our results demonstrate how MCULE-5948770040 inhibits Mpro and offers a springboard for further therapeutic design.Significance StatementSignificance Statement The ongoing novel coronavirus pandemic (COVID-19) has prompted a global race towards finding effective therapeutics that can target the various viral proteins. Despite many virtual screening campaigns in development, the discovery of validated inhibitors for SARS-CoV-2 protein targets has been limited. We discover a novel inhibitor against the SARS-CoV-2 main protease. Our integrated platform applies downstream biochemical assays, X-ray crystallography, and atomistic simulations to obtain a comprehensive characterization of its inhibitory mechanism. Inhibiting Mpro can lead to significant biomedical advances in targeting SARS-CoV-2 treatment, as it plays a crucial role in viral replication.Competing Interest StatementThe authors have declared no competing interest.
@article{clyde2021high, author = {Clyde, Austin and Galanie, Stephanie and Kneller, Daniel W. and Ma, Heng and Babuji, Yadu and Blaiszik, Ben and Brace, Alexander and Brettin, Thomas and Chard, Kyle and Chard, Ryan and Coates, Leighton and Foster, Ian and Hauner, Darin and Kertesz, Vilmos and Kumar, Neeraj and Lee, Hyungro and Li, Zhuozhao and Merzky, Andre and Schmidt, Jurgen G. and Tan, Li and Titov, Mikhail and Trifan, Anda and Turilli, Matteo and Van Dam, Hubertus and Chennubhotla, Srinivas C. and Jha, Shantenu and Kovalevsky, Andrey and Ramanathan, Arvind and Head, Martha S. and Stevens, Rick}, title = {High Throughput Virtual Screening and Validation of a SARS-CoV-2 Main Protease Non-Covalent Inhibitor}, journal = {Journal of Chemical Information and Modeling}, volume = {62}, number = {1}, pages = {116-128}, year = {2022}, doi = {10.1021/acs.jcim.1c00851}, note = {PMID: 34793155}, url = {https://doi.org/10.1021/acs.jcim.1c00851}, abbr = {JCIM}, bibtex_show = {true}, selected = {true}, pdf = {JCIM-clyde-screening-2022.pdf} }
2021
- JPDCDLHub: Simplifying publication, discovery, and use of machine learning models in scienceZhuozhao Li, Ryan Chard, Logan Ward, Kyle Chard, Tyler J. Skluzacek, Yadu Babuji, Anna Woodard, Steven Tuecke, Ben Blaiszik, Michael J. Franklin, and Ian FosterJournal of Parallel and Distributed Computing, 2021
Machine Learning (ML) has become a critical tool enabling new methods of analysis and driving deeper understanding of phenomena across scientific disciplines. There is a growing need for “learning systems” to support various phases in the ML lifecycle. While others have focused on supporting model development, training, and inference, few have focused on the unique challenges inherent in science, such as the need to publish and share models and to serve them on a range of available computing resources. In this paper, we present the Data and Learning Hub for science (DLHub), a learning system designed to support these use cases. Specifically, DLHub enables publication of models, with descriptive metadata, persistent identifiers, and flexible access control. It packages arbitrary models into portable servable containers, and enables low-latency, distributed serving of these models on heterogeneous compute resources. We show that DLHub supports low-latency model inference comparable to other model serving systems including TensorFlow Serving, SageMaker, and Clipper, and improved performance, by up to 95%, with batching and memoization enabled. We also show that DLHub can scale to concurrently serve models on 500 containers. Finally, we describe five case studies that highlight the use of DLHub for scientific applications.
@article{li2021dlhub, title = {DLHub: Simplifying publication, discovery, and use of machine learning models in science}, journal = {Journal of Parallel and Distributed Computing}, volume = {147}, pages = {64-76}, year = {2021}, issn = {0743-7315}, doi = {https://doi.org/10.1016/j.jpdc.2020.08.006}, url = {https://www.sciencedirect.com/science/article/pii/S0743731520303464}, author = {Li, Zhuozhao and Chard, Ryan and Ward, Logan and Chard, Kyle and Skluzacek, Tyler J. and Babuji, Yadu and Woodard, Anna and Tuecke, Steven and Blaiszik, Ben and Franklin, Michael J. and Foster, Ian}, keywords = {Learning systems, Model serving, Machine learning, DLHub}, bibtex_show = {true}, selected = {true}, abbr = {JPDC}, pdf = {JPDC-li-dlhub-2021.pdf} }
2018
- TBDAnalysis of Knowledge Sharing Activities on a Social Network Incorporated Discussion Forum: A Case Study of DISboardsZhuozhao Li, Harrison Chandler, and Haiying ShenIEEE Transactions on Big Data, 2018
DISboards is a discussion forum that provides a platform for knowledge sharing on planning and resources for Disney-related travel (Disney World, Disney Cruise Line, etc.). Since no previous work has been devoted to studying the online social networks (SNs) in the forums, we examine the SN and knowledge sharing activities in DISboards as a case study of discussion forums. Based on a large amount of data collected, we provide an in-depth study of DISboards. In particular, we analyzed SN structure, effect of SN in the forum, category characteristics and so on. We found that users with more friends are generally more active in the forum; teens are more active and constitute a significant part of the SN. We clustered the selected categories (e.g., resorts, dining, and hotels) into three groups: report, fact, discussion, and characterized their properties. Most users focus narrowly on only a few categories, while very few users participate in many categories. The development of SN should be able to attract more users to involve in the forum. We believe that the results presented in this paper are crucial in understanding SN and knowledge sharing in the forums. The paper also gives an instruction for the enhancement of SNs to incentivize users’ activeness in the forums.
@article{li2017analysis, author = {Li, Zhuozhao and Chandler, Harrison and Shen, Haiying}, journal = {IEEE Transactions on Big Data}, title = {Analysis of Knowledge Sharing Activities on a Social Network Incorporated Discussion Forum: A Case Study of DISboards}, year = {2018}, volume = {4}, number = {4}, pages = {432-446}, keywords = {}, doi = {10.1109/TBDATA.2017.2749307}, issn = {2332-7790}, month = dec, abbr = {TBD}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8025798}, pdf = {TBD-li-disboards-2017.pdf} }
- UbiCompEmploying Opportunistic Charging for Electric Taxicabs to Reduce Idle TimeLi Yan, Haiying Shen, Zhuozhao Li, Ankur Sarker, John A. Stankovic, Chenxi Qiu, Juanjuan Zhao, and Chengzhong XuProc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2018
For electric taxicabs, the idle time spent on cruising for passengers, seeking chargers, and charging is wasteful. Previous works can only save cruising time through better routing, or charger seeking and charging time through proper charger deployment, but not for both. With the advancement of wireless charging techniques, efficient opportunistic charging of electric vehicles at their parked positions becomes possible. This enables a taxicab to get charged while waiting for the next passenger. In this paper, we present an opportunistic wireless charger deployment scheme in a city, which both maximizes the taxicabs’ opportunity of picking up passengers at the chargers and supports the taxicabs’ continuous operability on roads, while minimizing the total deployment cost. We studied a metropolitan-scale taxicab dataset on several factors important for deploying wireless chargers and determining the numbers of the chargers in the regions: the number of passengers, the functionalities of buildings, and the frequency of passenger appearance in a region, and taxicab traffic flows in a city. Then, we formulate a multi-objective optimization problem and find the solution. Our trace-driven experiments demonstrate the superior performance of our scheme over other representative methods in terms of reducing idle time and supporting the operability of the taxicabs.
@article{yan2018employing, author = {Yan, Li and Shen, Haiying and Li, Zhuozhao and Sarker, Ankur and Stankovic, John A. and Qiu, Chenxi and Zhao, Juanjuan and Xu, Chengzhong}, title = {Employing Opportunistic Charging for Electric Taxicabs to Reduce Idle Time}, year = {2018}, issue_date = {March 2018}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {2}, number = {1}, url = {https://doi.org/10.1145/3191779}, doi = {10.1145/3191779}, journal = {Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.}, month = mar, articleno = {47}, numpages = {25}, keywords = {mobile data analysis, kernel density estimation, Vehicle wireless charging, charger deployment}, abbr = {UbiComp}, bibtex_show = {true}, selected = {true}, pdf = {UbiComp-yan-employing-2018.pdf} }
2017
- TPDSMeasuring Scale-Up and Scale-Out Hadoop with Remote and Local File Systems and Selecting the Best PlatformZhuozhao Li, and Haiying ShenIEEE Transactions on Parallel and Distributed Systems, 2017
MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can vary from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses Hadoop Distributed File System (HDFS) local file system, it can be configured to use a remote file system. Then, an interesting question is raised: for a given application, which is the best running platform among the different combinations of scale-up and scale-out Hadoop with remote and local file systems. However, there has been no previous research on how different types of applications (e.g., CPU-intensive, data-intensive) with different characteristics (e.g., input data size) can benefit from the different platforms. Thus, in this paper, we conduct a comprehensive performance measurement of different applications on scale-up and scale-out clusters configured with HDFS and a remote file system (i.e., OFS), respectively. We identify and study how different job characteristics (e.g., input data size, the number of file reads/writes, and the amount of computations) affect the performance of different applications on the different platforms. Based on the measurement results, we also propose a performance prediction model to help users select the best platforms that lead to the minimum latency. Our evaluation using a Facebook workload trace demonstrates the effectiveness of our prediction model. This study is expected to provide a guidance for users to choose the best platform to run different applications with different characteristics in the environment that provides both remote and local storage, such as HPC cluster and cloud environment.
@article{li2017measuring, author = {Li, Zhuozhao and Shen, Haiying}, journal = {IEEE Transactions on Parallel and Distributed Systems}, title = {Measuring Scale-Up and Scale-Out Hadoop with Remote and Local File Systems and Selecting the Best Platform}, year = {2017}, volume = {28}, number = {11}, pages = {3201-3214}, keywords = {}, doi = {10.1109/TPDS.2017.2712635}, issn = {1558-2183}, month = nov, abbr = {TPDS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7940040}, selected = {true}, pdf = {TPDS-li-measurement-2017.pdf} }
- TPDSAn Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance MeasurementsZhuozhao Li, Haiying Shen, Walter Ligon, and Jeffrey DentonIEEE Transactions on Parallel and Distributed Systems, 2017
Scale-up machines perform better for jobs with small and median (KB, MB) data sizes, while scale-out machines perform better for jobs with large (GB, TB) data size. Since a workload usually consists of jobs with different data size levels, we propose building a hybrid Hadoop architecture that includes both scale-up and scale-out machines, which however is not trivial. The first challenge is workload data storage. Thousands of small data size jobs in a workload may overload the limited local disks of scale-up machines. Jobs from scale-up and scale-out machines may both request the same set of data, which leads to data transmission between the machines. The second challenge is to automatically schedule jobs to either scale-up or scale-out cluster to achieve the best performance. We conduct a thorough performance measurement of different applications on scale-up and scale-out clusters, configured with Hadoop Distributed File System (HDFS) and a remote file system (i.e., OFS), respectively. We find that using OFS rather than HDFS can solve the data storage challenge. Also, we identify the factors that determine the performance differences on the scale-up and scale-out clusters and their cross points to make the choice. Accordingly, we design and implement the hybrid scale-up/out Hadoop architecture. Our trace-driven experimental results show that our hybrid architecture outperforms both the traditional Hadoop architecture with HDFS and with OFS in terms of job completion time, throughput and job failure rate.
@article{li2017exploration, author = {Li, Zhuozhao and Shen, Haiying and Ligon, Walter and Denton, Jeffrey}, journal = {IEEE Transactions on Parallel and Distributed Systems}, title = {An Exploration of Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements}, year = {2017}, volume = {28}, number = {2}, pages = {386-400}, keywords = {}, doi = {10.1109/TPDS.2016.2573820}, issn = {1558-2183}, month = feb, abbr = {TPDS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7480403}, selected = {true}, pdf = {TPDS-li-hybrid-2016.pdf} }
2016
- TPDSNew Bandwidth Sharing and Pricing Policies to Achieve a Win-Win Situation for Cloud Provider and TenantsHaiying Shen, and Zhuozhao LiIEEE Transactions on Parallel and Distributed Systems, 2016
For predictable application performance or fairness in network sharing in clouds, many bandwidth allocation policies have been proposed. However, with these policies, tenants are not incentivized to use idle bandwidth or prevent link congestion, and may even take advantage of the policies to gain unfair bandwidth allocation. Increasing network utilization while avoiding congestion not only benefits cloud provider but also the tenants by improving application performance. In this paper, we propose a new pricing model that sets different unit prices for reserved bandwidth, the bandwidth on congested links and on uncongested links, and makes the unit price for congested links proportional to their congestion degrees. We use game theory model to analyze tenants’ behaviors in our model and the current pricing models, which shows the effectiveness of our model in providing the incentives. With the pricing model, we propose a network sharing policy to achieve both min-guarantee and proportionality, while prevent tenants from earning unfair bandwidth. We further propose methods for each virtual machine to arrange its traffic to reduce its unsatisfied demand and maximize its utility, while increase network utilization. As a result, our solution creates a win-win situation, where tenants strive to increase their benefits in bandwidth sharing, which also concurrently increases the utilities of cloud provider and other tenants. Our simulation and trace-driven experimental results show the effectiveness of our solutions in creating the win-win situation.
@article{shen2016new, author = {Shen, Haiying and Li, Zhuozhao}, journal = {IEEE Transactions on Parallel and Distributed Systems}, title = {New Bandwidth Sharing and Pricing Policies to Achieve a Win-Win Situation for Cloud Provider and Tenants}, year = {2016}, volume = {27}, number = {9}, pages = {2682-2697}, keywords = {}, doi = {10.1109/TPDS.2015.2497701}, issn = {1558-2183}, month = sep, abbr = {TPDS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7317795}, selected = {true}, pdf = {TPDS-shen-winwin-2016.pdf} }
Conference Publications
2021
- IPDPSWCoding the Computing Continuum: Fluid Function Execution in Heterogeneous Computing EnvironmentsRohan Kumar, Matt Baughman, Ryan Chard, Zhuozhao Li, Yadu Babuji, Ian Foster, and Kyle ChardIn 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021
Advances in network technologies have greatly decreased barriers to accessing physically distributed computers. This newfound accessibility coincides with increasing hardware specialization, creating exciting new opportunities to dispatch workloads to the best resource for a specific purpose, rather than those that are closest or most easily accessible. We present Delta, a service designed to intelligently schedule function-based workloads across a distributed set of heterogeneous computing resources. Delta implements an extensible architecture in which different predictors and scheduling algorithms can be integrated to provide dynamically evolving estimates of function execution times on different resources—estimates that can be used to determine the most appropriate location for execution. We describe predictors for function runtime, data transfer time, and cold-start resource provisioning and configuration delay; dynamic learning methods that update predictor models over time; and scheduling strategies that take into account both function and endpoint information. We show that these methods can halve workload makespan when compared with a strategy that selects the fastest resource, and decrease makespan by a factor of five when compared to a round robin strategy, when deployed on a heterogeneous testbed with resources ranging from a Raspberry Pi to a GPU node in an academic cloud.
@inproceedings{kumar2021coding, author = {Kumar, Rohan and Baughman, Matt and Chard, Ryan and Li, Zhuozhao and Babuji, Yadu and Foster, Ian and Chard, Kyle}, booktitle = {2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)}, title = {Coding the Computing Continuum: Fluid Function Execution in Heterogeneous Computing Environments}, year = {2021}, volume = {}, number = {}, pages = {66-75}, keywords = {}, doi = {10.1109/IPDPSW52791.2021.00018}, url = {https://ieeexplore.ieee.org/abstract/document/9460607}, pdf = {IPDPSW-kumar-delta-2021.pdf}, issn = {}, bibtex_show = {true}, month = jun, abbr = {IPDPSW} }
- IPDPSLightweight Function Monitors for Fine-Grained Management in Large Scale Python ApplicationsTim Shaffer, Zhuozhao Li, Ben Tovar, Yadu Babuji, TJ Dasso, Zoe Surma, Kyle Chard, Ian Foster, and Douglas ThainIn 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021
Python has become a widely used programming language for research, not only for small one-off analyses, but also for complex application pipelines running at supercomputer-scale. Modern parallel programming frameworks for Python present users with a more granular unit of management than traditional Unix processes and batch submissions: the Python function. We review the challenges involved in running native Python functions at scale, and present techniques for dynamically determining a minimal set of dependencies and for assembling a lightweight function monitor (LFM) that captures the software environment and manages resources at the granularity of single functions. We evaluate these techniques in a range of environments, from campus cluster to supercomputer, and show that our advanced dependency management planning and dynamic resource management methods provide superior performance and utilization relative to coarser-grained management approaches, achieving several-fold decrease in execution time for several large Python applications.
@inproceedings{shaffer2021lightweight, author = {Shaffer, Tim and Li, Zhuozhao and Tovar, Ben and Babuji, Yadu and Dasso, TJ and Surma, Zoe and Chard, Kyle and Foster, Ian and Thain, Douglas}, booktitle = {2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)}, title = {Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications}, year = {2021}, volume = {}, number = {}, pages = {786-796}, keywords = {}, doi = {10.1109/IPDPS49936.2021.00088}, url = {https://ieeexplore.ieee.org/abstract/document/9460530}, pdf = {IPDPS-shaffer-lfm-2021.pdf}, bibtex_show = {true}, issn = {1530-2075}, abbr = {IPDPS}, selected = {true}, month = may }
- ICPPImpeccable: Integrated modeling pipeline for covid cure by assessing better leadsAymen Al Saadi, Dario Alfe, Yadu Babuji, Agastya Bhati, Ben Blaiszik, Thomas Brettin, Kyle Chard, Ryan Chard, Peter Coveney, Anda Trifan, Alex Brace, Austin Clyde, Ian Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Thorsten Kurth, Dieter Kranzlmüller, Hyungro Lee, Zhuozhao Li, Heng Ma, Andre Merzky, Gerald Mathias, Alexander Partin, Junqi Yin, Arvind Ramanathan, Ashka Shah, Abraham Stern, Rick Stevens, Li Tan, Mikhail Titov, Aristeidis Tsaris, Matteo Turilli, Huub Van Dam, Shunzhou Wan, and David WiflingIn Proceedings of the 50th International Conference on Parallel Processing, 2021
The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2-3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silicomethodologies need to be improved to better select lead compounds that can proceed to later stages of the drug discovery protocol accelerating the entire process. No single methodological approach can achieve the necessary accuracy with required efficiency. Here we describe multiple algorithmic innovations to overcome this fundamental limitation, development and deployment of computational infrastructure at scale integrates multiple artificial intelligence and simulation-based approaches. Three measures of performance are:(i) throughput, the number of ligands per unit time; (ii) scientific performance, the number of effective ligands sampled per unit time and (iii) peak performance, in flop/s. The capabilities outlined here have been used in production for several months as the workhorse of the computational infrastructure to support the capabilities of the US-DOE National Virtual Biotechnology Laboratory in combination with resources from the EU Centre of Excellence in Computational Biomedicine.
@inproceedings{saadi2020impeccable, title = {Impeccable: Integrated modeling pipeline for covid cure by assessing better leads}, author = {Saadi, Aymen Al and Alfe, Dario and Babuji, Yadu and Bhati, Agastya and Blaiszik, Ben and Brettin, Thomas and Chard, Kyle and Chard, Ryan and Coveney, Peter and Trifan, Anda and Brace, Alex and Clyde, Austin and Foster, Ian and Gibbs, Tom and Jha, Shantenu and Keipert, Kristopher and Kurth, Thorsten and Kranzlmüller, Dieter and Lee, Hyungro and Li, Zhuozhao and Ma, Heng and Merzky, Andre and Mathias, Gerald and Partin, Alexander and Yin, Junqi and Ramanathan, Arvind and Shah, Ashka and Stern, Abraham and Stevens, Rick and Tan, Li and Titov, Mikhail and Tsaris, Aristeidis and Turilli, Matteo and Dam, Huub Van and Wan, Shunzhou and Wifling, David}, booktitle = {Proceedings of the 50th International Conference on Parallel Processing}, articleno = {40}, numpages = {12}, location = {Lemont, IL, USA}, series = {ICPP '21}, url = {https://doi.org/10.1145/3472456.3473524}, year = {2021}, abbr = {ICPP}, pdf = {ICPP-saadi-impeccable-2021.pdf}, selected = {false}, bibtex_show = {true} }
- HPDCA Serverless Framework for Distributed Bulk Metadata ExtractionTyler J. Skluzacek, Ryan Wong, Zhuozhao Li, Ryan Chard, Kyle Chard, and Ian FosterIn Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021
We introduce Xtract, an automated and scalable system for bulk metadata extraction from large, distributed research data repositories. Xtract orchestrates the application of metadata extractors to groups of files, determining which extractors to apply to each file and, for each extractor and file, where to execute. A hybrid computing model, built on the funcX federated FaaS platform, enables Xtract to balance tradeoffs between extraction time and data transfer costs by dispatching each extraction task to the most appropriate location. Experiments on a range of clouds and supercomputers show that Xtract can efficiently process multi-million-file repositories by orchestrating the concurrent execution of container-based extractors on thousands of nodes. We highlight the flexibility of Xtract by applying it to a large, semi-curated scientific data repository and to an uncurated scientific Google Drive repository. We show that by remotely orchestrating metadata extraction across decentralized storage and compute nodes, Xtract can process large repositories in 50% of the time it takes just to transfer the same data to a machine within the same computing facility. We also show that when transferring data is necessary (e.g., no local compute is available), Xtract can scale to process files as fast as they are received, even over a multi-GB/s network.
@inproceedings{tyler2021serverless, author = {Skluzacek, Tyler J. and Wong, Ryan and Li, Zhuozhao and Chard, Ryan and Chard, Kyle and Foster, Ian}, title = {A Serverless Framework for Distributed Bulk Metadata Extraction}, year = {2021}, isbn = {9781450382175}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3431379.3460636}, doi = {10.1145/3431379.3460636}, booktitle = {Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing}, pages = {7–18}, numpages = {12}, keywords = {serverless, storage, files, metadata extraction, search index}, location = {Virtual Event, Sweden}, series = {HPDC'21}, bibtex_show = {true}, pdf = {HPDC-skluzacek-xtract-2021.pdf}, abbr = {HPDC} }
2020
- HPDCFuncX: A Federated Function Serving Fabric for ScienceRyan Chard, Yadu Babuji, Zhuozhao Li, Tyler Skluzacek, Anna Woodard, Ben Blaiszik, Ian Foster, and Kyle ChardIn Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020
Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX—a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. funcX’s endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX’s cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than more than 130 000 concurrent workers.
@inproceedings{chard2020funcx, author = {Chard, Ryan and Babuji, Yadu and Li, Zhuozhao and Skluzacek, Tyler and Woodard, Anna and Blaiszik, Ben and Foster, Ian and Chard, Kyle}, title = {FuncX: A Federated Function Serving Fabric for Science}, year = {2020}, isbn = {9781450370523}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3369583.3392683}, doi = {10.1145/3369583.3392683}, booktitle = {Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing}, pages = {65–76}, numpages = {12}, keywords = {federated function serving, function as a service, funcX}, location = {Stockholm, Sweden}, series = {HPDC '20}, abbr = {HPDC}, pdf = {HPDC-chard-funcx-2020.pdf}, selected = {true}, bibtex_show = {true} }
- ICPPImpact of Memory DoS Attacks on Cloud Applications and Real-Time Detection SchemesZhuozhao Li, Tanmoy Sen, Haiying Shen, and Mooi Choo ChuahIn 49th International Conference on Parallel Processing - ICPP, 2020
Even though memory-based denial-of-service attacks can cause severe performance degradations on co-located virtual machines, a previous detection scheme against such attacks cannot accurately detect the attacks and also generates high detection delay and high performance overhead since it assumes that cache-related statistics of an application follow the same probability distribution at all times, which may not be true for all types of applications. In this paper, we present the experimental results showing the impacts of memory DoS attacks on different types of cloud-based applications. Based on these results, we propose two lightweight, responsive Statistical based Detection Schemes (SDS/B and SDS/P) that can detect such attacks accurately. SDS/B constructs a profile of normal range of cache-related statistics for all applications and use statistical methods to infer an attack when the real-time collected statistics exceed this normal range, while SDS/P exploits the increased periods of access patterns for periodic applications to infer an attack. Our evaluation results show that SDS/B and SDS/P outperform the state-of-the-art detection scheme, e.g., with 65% higher specificity, 40% shorter detection delay, and 7% less performance overhead.
@inproceedings{li2020impact, author = {Li, Zhuozhao and Sen, Tanmoy and Shen, Haiying and Chuah, Mooi Choo}, title = {Impact of Memory DoS Attacks on Cloud Applications and Real-Time Detection Schemes}, year = {2020}, isbn = {9781450388160}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3404397.3404465}, doi = {10.1145/3404397.3404465}, booktitle = {49th International Conference on Parallel Processing - ICPP}, articleno = {67}, numpages = {11}, keywords = {detection schemes, memory Denial-of-Service attack, virtual machine}, location = {Edmonton, AB, Canada}, series = {ICPP '20}, abbr = {ICPP}, pdf = {ICPP-li-mdos-2020.pdf}, selected = {true}, bibtex_show = {true} }
- CHEPReal-time HEP analysis with funcX, a high-performance platform for function as a serviceAnna Elizabeth Woodard, Ana Trisovic, Zhuozhao Li, Yadu Babuji, Ryan Chard, Tyler Skluzacek, Ben Blaiszik, Daniel S Katz, Ian Foster, and Kyle ChardIn 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019), EPJ Web of Conferences, 2020
We explore how the function as a service paradigm can be used to address the computing challenges in experimental high-energy physics at CERN. As a case study, we use funcX—a high-performance function as a service platform that enables intuitive, flexible, efficient, and scalable remote function execution on existing infrastructure—to parallelize an analysis operating on columnar data to aggregate histograms of analysis products of interest in real-time. We demonstrate efficient execution of such analyses on heterogeneous resources.
@inproceedings{woodard2020real, title = {Real-time HEP analysis with funcX, a high-performance platform for function as a service}, author = {Woodard, Anna Elizabeth and Trisovic, Ana and Li, Zhuozhao and Babuji, Yadu and Chard, Ryan and Skluzacek, Tyler and Blaiszik, Ben and Katz, Daniel S and Foster, Ian and Chard, Kyle}, booktitle = {24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019), EPJ Web of Conferences}, volume = {245}, pages = {07046}, year = {2020}, url = {https://doi.org/10.1051/epjconf/202024507046}, organization = {EDP Sciences}, abbr = {CHEP}, pdf = {CHEP-woodard-funcx-2020.pdf}, bibtex_show = {true} }
- CIKMTime-Efficient Geo-Obfuscation to Protect Worker Location Privacy over Road Networks in Spatial CrowdsourcingChenxi Qiu, Anna Squicciarini, Zhuozhao Li, Ce Pang, and Li YanIn Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020
To promote cost-effective task assignment in Spatial Crowdsourcing (SC), workers are required to report their location to servers, which raises serious privacy concerns. As a solution, geo-obfuscation has been widely used to protect the location privacy of SC workers, where workers are allowed to report perturbed location instead of the true location. Yet, most existing geo-obfuscation methods consider workers? mobility on a 2 dimensional (2D) plane, wherein workers can move in arbitrary directions. Unfortunately, 2D-based geo-obfuscation is likely to generate high traveling cost for task assignment over roads, as it cannot accurately estimate the traveling costs distortion caused by location obfuscation. In this paper, we tackle the SC worker location privacy problem over road networks. Considering the network-constrained mobility features of workers, we describe workers? mobility by a weighted directed graph, which considers the dynamic traffic condition and road network topology. Based on the graph model, we design a geo-obfuscation (GO) function for workers to maximize the workers? overall location privacy without compromising the task assignment efficiency. We formulate the problem of deriving the optimal GO function as a linear programming (LP) problem. By using the angular block structure of the LP’s constraint matrix, we apply Dantzig-Wolfe decomposition to improve the time-efficiency of the GO function generation. Our experimental results in the real-trace driven simulation and the real-world experiment demonstrate the effectiveness of our approach in terms of both privacy and task assignment efficiency.
@inproceedings{qiu2020time, author = {Qiu, Chenxi and Squicciarini, Anna and Li, Zhuozhao and Pang, Ce and Yan, Li}, title = {Time-Efficient Geo-Obfuscation to Protect Worker Location Privacy over Road Networks in Spatial Crowdsourcing}, year = {2020}, isbn = {9781450368599}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3340531.3411863}, doi = {10.1145/3340531.3411863}, booktitle = {Proceedings of the 29th ACM International Conference on Information & Knowledge Management}, pages = {1275–1284}, numpages = {10}, keywords = {geo-obfuscation, location privacy, spatial crowdsourcing}, location = {Virtual Event, Ireland}, series = {CIKM '20}, abbr = {CIKM}, bibtex_show = {true}, selected = {true}, pdf = {CIKM-qiu-obfuscation-2020.pdf} }
2019
- ICDCSCo-scheduler: Accelerating Data-Parallel Jobs in Datacenter Networks with Optical Circuit SwitchingZhuozhao Li, and Haiying ShenIn 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), 2019
The optical circuit switch (OCS) in recently proposed hybrid electrical/optical datacenter networks (Hybrid-DCN) can only be used to transfer large flows (i.e., flows with a large size of data). Current job schedulers for data-parallel frameworks are not suitable for Hybrid-DCN, since they neither place tasks to aggregate data traffic to take advantage of OCS nor schedule tasks to minimize the Coflow completion time (CCT). In this paper, we propose Co-scheduler, a job scheduler for dataparallel frameworks that aims to improve job performance by attempting to place the tasks of a job to aggregate enough data traffic to take advantage of OCS and minimize the CCT in Hybrid-DCN. Specifically, for each job, Co-scheduler computes a guideline on the number of racks to place the job’s input data and to run the job’s map tasks, so that the job can potentially take full advantage of OCS to transfer its data. When the map tasks of a job complete, Co-scheduler computes all the possible schedules of the reduce tasks of the job. Each possible schedule includes the number of racks to schedule the reduce tasks that enables the job to use OCS to transfer its data, and the number of reduce tasks to place on each of the racks that minimizes CCT of the job. Next, Co-scheduler selects a best schedule among all the possible schedules so that the job completion time is minimized. Finally, Co-scheduler schedules the map tasks and reduce tasks of the job based on the computed guideline and best schedule. The evaluation demonstrates that compared to the state-of-theart schedulers, Co-scheduler achieves performance improvements on makespan, average job completion time, and average CCT by up to 51.2%, 54.6% and 73.6%, respectively.
@inproceedings{li2019co, author = {Li, Zhuozhao and Shen, Haiying}, booktitle = {2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)}, title = {Co-scheduler: Accelerating Data-Parallel Jobs in Datacenter Networks with Optical Circuit Switching}, year = {2019}, volume = {}, number = {}, pages = {186-195}, keywords = {}, doi = {10.1109/ICDCS.2019.00027}, issn = {2575-8411}, month = jul, abbr = {ICDCS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8885103}, selected = {true}, pdf = {ICDCS-li-coscheduler-2019.pdf} }
- ICDCSRoad Gradient Estimation Using Smartphones: Towards Accurate Estimation on Fuel Consumption and Air Pollution Emission on RoadsLiuwang Kang, Haiying Shen, and Zhuozhao LiIn 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS), 2019
Accurate estimations on vehicle fuel consumption and pollution emission on roads are important for vehicle velocity optimization and driving route planning. Existing methods for such estimations only consider vehicle driving speed and acceleration but neglect the influence of road gradient. This is mainly because the road gradients for most road networks are not available and none of existing methods for road gradient estimation can be conducted inexpensively in practice and keep high road gradient estimation accuracy simultaneously. Thus, how to estimate the road gradient conveniently and accurately is an important but challenging problem. To handle this challenge, we propose a new road gradient estimation system which estimates the road gradient only using a smartphone. When a vehicle is driving, a smartphone in the vehicle continuously measures vehicle states (velocity, acceleration, steering rate, position), which are used to estimate the road gradient. To eliminate measuring noise and drift noise, the deviation between the measured value and estimated value is used to adjust the estimated value. Since measured vehicle states when a vehicle changes lane adversely influence the accuracy of road gradient estimation, we design lane change detection to eliminate such influences. Finally, given a group of road gradient estimates for a given route, we use the track fusion algorithm to further eliminate measuring noise and drift noise and improve road gradient estimation accuracy. We conducted driving experiments in a city area to evaluate our system. The experimental results show that our system’s estimation error is reduced by 22% compared with existing methods. The results also demonstrate the accuracy of our lane change detection. Finally, we integrated the road gradient values into vehicle fuel consumption and air pollution emission model to estimate fuel consumption and air pollution emission and found that the estimation values increase by 33.4% compared with the values without considering road gradient.
@inproceedings{kang2019road, author = {Kang, Liuwang and Shen, Haiying and Li, Zhuozhao}, booktitle = {2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)}, title = {Road Gradient Estimation Using Smartphones: Towards Accurate Estimation on Fuel Consumption and Air Pollution Emission on Roads}, year = {2019}, volume = {}, number = {}, pages = {768-777}, keywords = {}, doi = {10.1109/ICDCS.2019.00081}, issn = {2575-8411}, month = jul, pdf = {ICDCS-kang-gradient-2019.pdf}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8884795}, abbr = {ICDCS} }
- ICCCNAccelerating Big Data Analytics Using Scale-Up/Out Heterogeneous ClustersZhuozhao Li, Haiying Shen, and Lee WardIn 2019 28th International Conference on Computer Communication and Networks (ICCCN), 2019
Production data analytic workloads typically consist of a majority of jobs with small input data sizes and a small number of jobs with large input data sizes. Recent works advocate scale-up/scale-out heterogeneous clusters (in short Hybrid clusters) to handle these heterogeneous workloads, since scaleup machines (i.e., adding more resources to a single machine) can process small jobs faster than simply scaling out the cluster with cheap machines. However, there are several challenges for job placement and data placement to implement such a Hybrid cluster. In this paper, we propose a job placement strategy and a data placement strategy to solve the challenges. The job placement strategy places a job to either scale-up or scale-out machines based on the job’s characteristics, and migrates jobs from scale-up machines to under-utilized scale-out machines to achieve load balance. The data placement strategy allocates data replicas in the two types of machines accordingly to increase the data locality in Hybrid cluster. We implemented a Hybrid cluster on Apache YARN, and evaluated its performance using a Facebook production workload. With our proposed strategies, a Hybrid cluster can reduce the makespan of the workload up to 37% and the median job completion time up to 60%, compared to traditional scale-out clusters with state-of-the-art schedulers.
@inproceedings{li2019accelerating, author = {Li, Zhuozhao and Shen, Haiying and Ward, Lee}, booktitle = {2019 28th International Conference on Computer Communication and Networks (ICCCN)}, title = {Accelerating Big Data Analytics Using Scale-Up/Out Heterogeneous Clusters}, year = {2019}, volume = {}, number = {}, pages = {1-9}, keywords = {}, doi = {10.1109/ICCCN.2019.8847060}, issn = {2637-9430}, month = jul, pdf = {ICCCN-li-hybrid-2019.pdf}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8847060}, abbr = {ICCCN} }
- IPDPSDLHub: Model and Data Serving for ScienceRyan Chard, Zhuozhao Li, Kyle Chard, Logan Ward, Yadu Babuji, Anna Woodard, Steven Tuecke, Ben Blaiszik, Michael J. Franklin, and Ian FosterIn 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the “learning systems” needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. DLHub addresses two significant shortcomings in current systems. First, its self-service model repository allows users to share, publish, verify, reproduce, and reuse models, and addresses concerns related to model reproducibility by packaging and distributing models and all constituent components. Second, it implements scalable and low-latency serving capabilities that can leverage parallel and distributed computing resources to democratize access to published models through a simple web interface. Unlike other model serving frameworks, DLHub can store and serve any Python 3-compatible model or processing function, plus multiple-function pipelines. We show that relative to other model serving systems including TensorFlow Serving, SageMaker, and Clipper, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed. We also describe early uses of DLHub for scientific applications.
@inproceedings{chard2019dlhub, author = {Chard, Ryan and Li, Zhuozhao and Chard, Kyle and Ward, Logan and Babuji, Yadu and Woodard, Anna and Tuecke, Steven and Blaiszik, Ben and Franklin, Michael J. and Foster, Ian}, booktitle = {2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)}, title = {DLHub: Model and Data Serving for Science}, year = {2019}, volume = {}, number = {}, pages = {283-292}, keywords = {}, doi = {10.1109/IPDPS.2019.00038}, issn = {1530-2075}, month = may, abbr = {IPDPS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8821027}, pdf = {IPDPS-chard-dlhub-2019.pdf} }
- HPDCParsl: Pervasive Parallel Programming in PythonYadu Babuji, Anna Woodard, Zhuozhao Li, Daniel S. Katz, Ben Clifford, Rohan Kumar, Lukasz Lacinski, Ryan Chard, Justin M. Wozniak, Ian Foster, Michael Wilde, and Kyle ChardIn Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019
High-level programming languages such as Python are increasingly used to provide intuitive interfaces to libraries written in lower-level languages and for assembling applications from various components. This migration towards orchestration rather than implementation, coupled with the growing need for parallel computing (e.g., due to big data and the end of Moore’s law), necessitates rethinking how parallelism is expressed in programs. Here, we present Parsl, a parallel scripting library that augments Python with simple, scalable, and flexible constructs for encoding parallelism. These constructs allow Parsl to construct a dynamic dependency graph of components that it can then execute efficiently on one or many processors. Parsl is designed for scalability, with an extensible set of executors tailored to different use cases, such as low-latency, high-throughput, or extreme-scale execution. We show, via experiments on the Blue Waters supercomputer, that Parsl executors can allow Python scripts to execute components with as little as 5 ms of overhead, scale to more than 250000 workers across more than 8000 nodes, and process upward of 1200 tasks per second. Other Parsl features simplify the construction and execution of composite programs by supporting elastic provisioning and scaling of infrastructure, fault-tolerant execution, and integrated wide-area data management. We show that these capabilities satisfy the needs of many-task, interactive, online, and machine learning applications in fields such as biology, cosmology, and materials science.
@inproceedings{babuji2019parsl, author = {Babuji, Yadu and Woodard, Anna and Li, Zhuozhao and Katz, Daniel S. and Clifford, Ben and Kumar, Rohan and Lacinski, Lukasz and Chard, Ryan and Wozniak, Justin M. and Foster, Ian and Wilde, Michael and Chard, Kyle}, title = {Parsl: Pervasive Parallel Programming in Python}, year = {2019}, isbn = {9781450366700}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3307681.3325400}, doi = {10.1145/3307681.3325400}, booktitle = {Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing}, pages = {25–36}, numpages = {12}, keywords = {parsl, parallel programming, python}, location = {Phoenix, AZ, USA}, series = {HPDC '19}, abbr = {HPDC}, pdf = {HPDC-babuji-parsl-2019.pdf}, selected = {true}, bibtex_show = {true} }
- ICPPJobPacker: Job Scheduling for Data-Parallel Frameworks with Hybrid Electrical/Optical Datacenter NetworksZhuozhao Li, and Haiying ShenIn Proceedings of the 48th International Conference on Parallel Processing, 2019
In spite of many advantages of hybrid electrical/optical datacenter networks (Hybrid-DCN), current job schedulers for data-parallel frameworks are not suitable for Hybrid-DCN, since the schedulers do not aggregate data traffic to facilitate using optical circuit switch (OCS). In this paper, we propose JobPacker, a job scheduler for data-parallel frameworks in Hybrid-DCN that aims to take full advantage of OCS to improve job performance. JobPacker aggregates the data transfers of a job in order to use OCS to improve data transfer efficiency. It first explores the tradeoff between parallelism and traffic aggregation for each shuffle-heavy recurring job, and then generates an offline schedule including which racks to run each job and the sequence to run the recurring jobs in each rack that yields the best performance. It has a new sorting method to prioritize recurring jobs in offline-scheduling to prevent high resource contention while fully utilizing cluster resources. In real-time scheduler, JobPacker uses the offline schedule to guide the data placement and schedule recurring jobs, and schedules non-recurring jobs to the idle resources not assigned to recurring jobs. Trace-driven simulation and GENI-based emulation show that JobPacker reduces the makespan up to 49% and the median completion time up to 43%, compared to the state-of-the-art schedulers in Hybrid-DCN.
@inproceedings{li2019jobpacker, author = {Li, Zhuozhao and Shen, Haiying}, title = {JobPacker: Job Scheduling for Data-Parallel Frameworks with Hybrid Electrical/Optical Datacenter Networks}, year = {2019}, isbn = {9781450362955}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3337821.3337880}, doi = {10.1145/3337821.3337880}, booktitle = {Proceedings of the 48th International Conference on Parallel Processing}, articleno = {30}, numpages = {10}, location = {Kyoto, Japan}, series = {ICPP 2019}, abbr = {ICPP}, bibtex_show = {true}, selected = {true}, pdf = {ICPP-li-jobpacker-2019.pdf} }
- PEARCPublishing and Serving Machine Learning Models with DLHubRyan Chard, Logan Ward, Zhuozhao Li, Yadu Babuji, Anna Woodard, Steven Tuecke, Kyle Chard, Ben Blaiszik, and Ian FosterIn Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), 2019
In this paper we introduce the Data and Learning Hub for Science (DLHub). DLHub serves as a nexus for publishing, sharing, discovering, and reusing machine learning models. It provides a flexible publication platform that enables researchers to describe and deposit models by associating publication and model-specific metadata and assigning a persistent identifier for subsequent citation. DLHub also supports scalable model inference, allowing researchers to execute inference tasks using a distributed execution engine, containerized models, and Kubernetes. Here we describe DLHub and present four scientific use cases that illustrate how DLHub can be used to reliably, efficiently, and scalably integrate ML into scientific processes.
@inproceedings{chard2019publishing, author = {Chard, Ryan and Ward, Logan and Li, Zhuozhao and Babuji, Yadu and Woodard, Anna and Tuecke, Steven and Chard, Kyle and Blaiszik, Ben and Foster, Ian}, title = {Publishing and Serving Machine Learning Models with DLHub}, year = {2019}, isbn = {9781450372275}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3332186.3332246}, doi = {10.1145/3332186.3332246}, booktitle = {Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning)}, articleno = {73}, numpages = {7}, location = {Chicago, IL, USA}, series = {PEARC '19}, abbr = {PEARC}, bibtex_show = {true}, pdf = {PEARC-chard-dlhub-2019.pdf} }
- PEARCScalable Parallel Programming in Python with ParslYadu Babuji, Anna Woodard, Zhuozhao Li, Daniel S. Katz, Ben Clifford, Ian Foster, Michael Wilde, and Kyle ChardIn Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), 2019
Python is increasingly the lingua franca of scientific computing. It is used as a higher level language to wrap lower-level libraries and to compose scripts from various independent components. However, scaling and moving Python programs from laptops to supercomputers remains a challenge. Here we present Parsl, a parallel scripting library for Python. Parsl makes it straightforward for developers to implement parallelism in Python by annotating functions that can be executed asynchronously and in parallel, and to scale analyses from a laptop to thousands of nodes on a supercomputer or distributed system. We examine how Parsl is implemented, focusing on syntax and usage. We describe two scientific use cases in which Parsl’s intuitive and scalable parallelism is used.
@inproceedings{babuji2019scalable, author = {Babuji, Yadu and Woodard, Anna and Li, Zhuozhao and Katz, Daniel S. and Clifford, Ben and Foster, Ian and Wilde, Michael and Chard, Kyle}, title = {Scalable Parallel Programming in Python with Parsl}, year = {2019}, isbn = {9781450372275}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3332186.3332231}, doi = {10.1145/3332186.3332231}, booktitle = {Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning)}, articleno = {22}, numpages = {8}, location = {Chicago, IL, USA}, series = {PEARC '19}, abbr = {PEARC}, bibtex_show = {true}, pdf = {PEARC-babuji-parsl-2019.pdf} }
- WOSCServerless Workflows for Indexing Large Scientific DataTyler J. Skluzacek, Ryan Chard, Ryan Wong, Zhuozhao Li, Yadu N. Babuji, Logan Ward, Ben Blaiszik, Kyle Chard, and Ian FosterIn Proceedings of the 5th International Workshop on Serverless Computing, 2019
The use and reuse of scientific data is ultimately dependent on the ability to understand what those data represent, how they were captured, and how they can be used. In many ways, data are only as useful as the metadata available to describe them. Unfortunately, due to growing data volumes, large and distributed collaborations, and a desire to store data for long periods of time, scientific "data lakes" quickly become disorganized and lack the metadata necessary to be useful to researchers. New automated approaches are needed to derive metadata from scientific files and to use these metadata for organization and discovery. Here we describe one such system, Xtract, a service capable of processing vast collections of scientific files and automatically extracting metadata from diverse file types. Xtract relies on function as a service models to enable scalable metadata extraction by orchestrating the execution of many, short-running extractor functions. To reduce data transfer costs, Xtract can be configured to deploy extractors centrally or near to the data (i.e., at the edge). We present a prototype implementation of Xtract and demonstrate that it can derive metadata from a 7 TB scientific data repository.
@inproceedings{skluzacek2019serverless, author = {Skluzacek, Tyler J. and Chard, Ryan and Wong, Ryan and Li, Zhuozhao and Babuji, Yadu N. and Ward, Logan and Blaiszik, Ben and Chard, Kyle and Foster, Ian}, title = {Serverless Workflows for Indexing Large Scientific Data}, year = {2019}, isbn = {9781450370387}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3366623.3368140}, doi = {10.1145/3366623.3368140}, booktitle = {Proceedings of the 5th International Workshop on Serverless Computing}, pages = {43–48}, numpages = {6}, keywords = {file systems, materials science, metadata extraction, serverless, data lakes}, location = {Davis, CA, USA}, series = {WOSC '19}, abbr = {WOSC}, pdf = {WOSC-skluzacek-xtract-2019.pdf}, bibtex_show = {true} }
- CloudComParaopt: Automated application parameterization and optimization for the cloudChaofeng Wu, Ted Summer, Zhuozhao Li, Anna Woodard, Ryan Chard, Matt Baughman, Yadu Babuji, Kyle Chard, Jason Pitt, and Ian FosterIn 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2019
The variety of instance types available on cloud platforms offers enormous flexibility to match the requirements of applications with available resources. However, selecting the most suitable instance type and configuring an application to optimally execute on that instance type can be complicated and time-consuming. For example, application parallelism flags must match available cores and problem sizes must be tuned to match available memory. As the search space of application configurations can be enormous, we propose an automated approach, called ParaOpt, to automatically explore and tune application configurations on arbitrary cloud instances. ParaOpt supports arbitrary applications, enables use of custom optimization methods, and can be configured with different optimization targets such as runtime and cost. We evaluate ParaOpt by optimizing genomics, molecular dynamics, and machine learning applications with four types of optimizers. We show with as few as 15 parameterized executions of an application, representing between 1.2%-26.7% of the search space, that ParaOpt is able to identify the optimal configuration in 32.7% of experiments and a near-optimal configuration in 83.2% of cases. As a result of using near-optimal configurations, ParaOpt reduces overall execution time by up to 85.8% when compared with using the default configuration.
@inproceedings{wu2019paraopt, title = {Paraopt: Automated application parameterization and optimization for the cloud}, author = {Wu, Chaofeng and Summer, Ted and Li, Zhuozhao and Woodard, Anna and Chard, Ryan and Baughman, Matt and Babuji, Yadu and Chard, Kyle and Pitt, Jason and Foster, Ian}, booktitle = {2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)}, pages = {255--262}, year = {2019}, organization = {IEEE}, abbr = {CloudCom}, pdf = {CloudCom-wu-paraopt-2019.pdf}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8968866}, doi = {10.1109/CloudCom.2019.00045} }
2018
- ICDCSApproaches for Resilience against Cascading Failures in Cloud DatacentersHaoyu Wang, Haiying Shen, and Zhuozhao LiIn 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018
In a modern cloud datacenter, a cascading failure will cause many Service Level Objective (SLO) violations. In a cascading failure, when a set of physical machines (PMs) in a failure domain are failed, their workloads are transferred to the PMs in another failure domain to continue. However, the new domain receiving additional workloads may become overloaded due to the resource oversubscription feature in the cloud, which easily leads to domain failures and subsequent workload transfer to other domains. This process repeats and a cascading failure is created finally. However, few previous methods can effectively handle the cascading failures. To handle this problem, we propose a Cascading Failure Resilience System (CFRS), which incorporates three methods: Overload-Avoidance VM Reassignment (OAVR), VM backup set placement (VMset) and Dynamic Oversubscription Ratio Adjustment (DOA). The experiments in trace-driven simulation show that CFRS outperforms other comparison methods in terms of the number of domain failures, the number of failed PMs and the number of SLO violations.
@inproceedings{wang2018approaches, author = {Wang, Haoyu and Shen, Haiying and Li, Zhuozhao}, booktitle = {2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)}, title = {Approaches for Resilience against Cascading Failures in Cloud Datacenters}, year = {2018}, volume = {}, number = {}, pages = {706-717}, keywords = {}, doi = {10.1109/ICDCS.2018.00074}, issn = {2575-8411}, month = jul, abbr = {ICDCS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8416337}, pdf = {ICDCS-wang-approach-2018.pdf} }
- ICDCSPageRankVM: A PageRank Based Algorithm with Anti-Collocation Constraints for Virtual Machine Placement in Cloud DatacentersZhuozhao Li, Haiying Shen, and Cole MilesIn 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), 2018
There is a dramatic increase in the variety of virtual machines (VMs) and complexity of VM placement problems in clouds. Previous VM placement approaches attempt to accommodate more VMs efficiently on fewer PMs by balancing the resource usages across multiple dimensions. However, these approaches are not sufficiently accurate in measuring the quality of the PMs in terms of fully utilizing PM resource and having the potential to accommodate more VMs. Therefore, it is critical to design a new method that can more accurately measure the probability of a PM of fully utilizing its resources after accommodating a given VM with the consideration of different types of VMs. In addition, anti-collocation constraints must be handled efficiently. We propose a PageRank based VM placement algorithm with anti-collocation constraints (PageRankVM). PageRankVM defines the best PM resource usage profile, which means that the PM has full resource utilization for every resource dimension, and then ranks PM profiles according to their convergence of transferring (by accommodating VMs) to the best profile. PageRankVM then places a given VM to the PM based on the ranks of the resulted PM profiles after accommodating the VM with the consideration of anti-collocation constraints. Compared to previous approaches, PageRankVM effectively measures the ability of different PM profiles to reach the best profiles by accommodating a given VM, and hence differentiates the effectiveness of different VM placement decisions. We conducted extensive trace-driven simulation and GENI testbed experiments and demonstrated that PageRankVM has superior performance compared with other methods in terms of reducing the number of PMs, the energy consumption, the number of VM migrations, and the service level objective (SLO) violations.
@inproceedings{li2018pagerankvm, author = {Li, Zhuozhao and Shen, Haiying and Miles, Cole}, booktitle = {2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)}, title = {PageRankVM: A PageRank Based Algorithm with Anti-Collocation Constraints for Virtual Machine Placement in Cloud Datacenters}, year = {2018}, volume = {}, number = {}, pages = {634-644}, keywords = {}, doi = {10.1109/ICDCS.2018.00068}, issn = {2575-8411}, month = jul, abbr = {ICDCS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8416331}, selected = {true}, pdf = {ICDCS-li-pagerankvm-2018.pdf} }
- CCGridA Network-Aware Scheduler in Data-Parallel Clusters for High PerformanceZhuozhao Li, Haiying Shen, and Ankur SarkerIn 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2018
In spite of many shuffle-heavy jobs in current commercial data-parallel clusters, few previous studies have considered the network traffic in the shuffle phase, which contains a large amount of data transfers and may adversely affect the cluster performance. In this paper, we propose a network-aware scheduler (NAS) that handles two main challenges associated with the shuffle phase for high performance: i) balancing cross-node network load, and ii) avoiding and reducing cross-rack network congestion. NAS consists of three main mechanisms: i) map task scheduling (MTS), ii) congestion-avoidance reduce task scheduling (CA-RTS) and iii) congestion-reduction reduce task scheduling (CR-RTS). MTS constrains the shuffle data on each node when scheduling the map tasks to balance the cross-node network load. CA-RTS distributes the reduce tasks for each job based on the distribution of its shuffle data among the racks in order to minimize cross-rack traffic. When the network is congested, CR-RTS schedules reduce tasks that generate negligible shuffle traffic to reduce the congestion. We implemented NAS in Hadoop on a cluster. Our trace-driven simulation and real cluster experiment demonstrate the superior performance of NAS on improving the throughput (up to 62%), reducing the average job execution time (up to 44%) and reducing the cross-rack traffic (up to 40%) compared with state-of-the-art schedulers.
@inproceedings{li2018network, author = {Li, Zhuozhao and Shen, Haiying and Sarker, Ankur}, booktitle = {2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)}, title = {A Network-Aware Scheduler in Data-Parallel Clusters for High Performance}, year = {2018}, volume = {}, number = {}, pages = {1-10}, keywords = {}, doi = {10.1109/CCGRID.2018.00015}, issn = {}, month = may, abbr = {CCGrid}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/8411004}, pdf = {CCGrid-li-nas-2018.pdf} }
2017
- ICDCSOpportunistic Energy Sharing Between Power Grid and Electric Vehicles: A Game Theory-Based Pricing PolicyAnkur Sarker, Zhuozhao Li, William Kolodzey, and Haiying ShenIn 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), 2017
Electric vehicles (EVs) have great potential to reduce dependency on fossil fuels. The recent surge in the development of online EV (OLEV) will help to address the drawbacks associated with current generation EVs, such as the heavy and expensive batteries. OLEVs are integrated with the smart grid of power infrastructure through a wireless power transfer system (WPT) to increase the driving range of the OLEV. However, the integration of OLEVs with the grid creates a tremendous load for the smart grid. The demand of a power grid changes over time and the price of power is not fixed throughout the day. There should be some congestion avoidance and load balancing policy implications to ensure quality of services for OLEVs. In this paper, first, we conduct an analysis to show the existence of unpredictable power load and congestion because of OLEVs. We use the Simulation for Urban MObility tool and hourly traffic counts of a road section of the New York City to analyze the amount of energy OLEVs can receive at different times of the day. Then, we present a game theory based on a distributed power schedule framework to find the optimal schedule between OLEVs and smart grid. In the proposed framework, OLEVs receive the amount of power charging from the smart grid based on a power payment function which is updated using best response strategy. We prove that the updated power requests converge to the optimal power schedule. In this way, the smart grid maximizes the social welfare of OLEVs, which is defined as mixed consideration of total satisfaction and its power charging cost. Finally, we verify the performance of our proposed pricing policy under different scenarios in a simulation study.
@inproceedings{sarker2017opportunistic, author = {Sarker, Ankur and Li, Zhuozhao and Kolodzey, William and Shen, Haiying}, booktitle = {2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)}, title = {Opportunistic Energy Sharing Between Power Grid and Electric Vehicles: A Game Theory-Based Pricing Policy}, year = {2017}, volume = {}, number = {}, pages = {1197-1207}, keywords = {}, doi = {10.1109/ICDCS.2017.219}, issn = {1063-6927}, month = jun, abbr = {ICDCS}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7980060}, pdf = {ICDCS-sarker-opportunistic-2017.pdf} }
- SoCCJob Scheduling for Data-Parallel Frameworks with Hybrid Electrical/Optical Datacenter NetworksZhuozhao Li, and Haiying ShenIn Proceedings of the 2017 Symposium on Cloud Computing, poster, 2017
In spite of many advantages of hybrid electrical/optical datacenter networks (Hybrid-DCN), current job schedulers for data-parallel frameworks are not suitable for Hybrid-DCN, since the schedulers do not aggregate data traffic to facilitate using optical circuit switch (OCS). We propose SchedOCS, a job scheduler for data-parallel frameworks in Hybrid-DCN that aims to take full advantage of the OCS to improve the job performance.
@inproceedings{li2017job, author = {Li, Zhuozhao and Shen, Haiying}, title = {Job Scheduling for Data-Parallel Frameworks with Hybrid Electrical/Optical Datacenter Networks}, year = {2017}, isbn = {9781450350280}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3127479.3132694}, doi = {10.1145/3127479.3132694}, booktitle = {Proceedings of the 2017 Symposium on Cloud Computing, poster}, pages = {662}, numpages = {1}, keywords = {optical circuit switch, traffic aggregation, parallelism}, location = {Santa Clara, California}, series = {SoCC '17}, abbr = {SoCC}, bibtex_show = {true}, pdf = {SOCC-li-ocs-2017.pdf} }
2016
- CloudComGame Theory-Based Nonlinear Bandwidth Pricing for Congestion Control in Cloud NetworksAbouzar Ghavami, Zhuozhao Li, and Haiying ShenIn 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2016
In the cloud, the network links are shared among tenants, which makes them easy to get fully congested (overloaded). Overloaded links degrade the performance of tenants’ applications, and impose additional costs to the cloud provider. In this paper, we propose a nonlinear bandwidth pricing policy for congestion control in the cloud network. In order to maximize social welfare (i.e., maximize the total satisfaction of the tenants while minimizing the congestion over the link), the cloud provider uses the nonlinear pricing policy that increases the unit price with increment of bandwidth usage. Each tenant competes for bandwidth allocation to maximize its utility (i.e., both maximize its own individual satisfaction and minimize its bandwidth payment cost). We design a game between tenants and the cloud provider, and show that there exists a unique optimal bandwidth schedule (Nash equilibrium) that jointly maximizes the social welfare and the utility of each tenant at the same time. In order to find the optimal schedule, we use an asynchronous-based best response strategy, in which each tenant updates its optimal bandwidth allocation based on the updated bandwidth payment function from the cloud provider. We prove that the updated bandwidth allocations converge to the optimal bandwidth schedule. In our simulation study and real implementation, we verify the performance of our proposed pricing mechanism under different scenarios.
@inproceedings{ghavami2016game, author = {Ghavami, Abouzar and Li, Zhuozhao and Shen, Haiying}, booktitle = {2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)}, title = {Game Theory-Based Nonlinear Bandwidth Pricing for Congestion Control in Cloud Networks}, year = {2016}, volume = {}, number = {}, pages = {214-221}, keywords = {}, doi = {10.1109/CloudCom.2016.0045}, issn = {2330-2186}, month = dec, abbr = {CloudCom}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7830686}, pdf = {CloudCom-abouzar-pricing-2016.pdf} }
- CloudComGoodbye to Fixed Bandwidth Reservation: Job Scheduling with Elastic Bandwidth Reservation in CloudsHaiying Shen, Lei Yu, Liuhua Chen, and Zhuozhao LiIn 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2016
The shared nature of cloud network infrastructures causes unpredictable network performance, which may degrade the performance of these applications. Recently, several works propose to explicitly reserve the network bandwidth in the cloud with virtual network abstraction models, which pre-specify the network bandwidth between virtual machines (VMs) for a tenant job. However, the pre-specification fails to exploit the elastic feature of the bandwidth resource (i.e., more reserved bandwidth within no-elongation threshold bandwidth leads to shorter job execution time and vice versa) in job scheduling. It is difficult for ordinary tenants (without specialized network knowledge) to estimate the exact needed bandwidth. In this paper, we propose a new cloud job scheduler, in which each tenant only needs to specify job deadline and each job’s reserved bandwidth is elastically determined by leveraging the elastic feature to maximize the total job rewards, which represent the worth of successful completion by deadlines. Finally, the scheduler tries to reduce the execution time of each job. It also jointly considers the computational capacity of VMs and reserved VM bandwidth in job scheduling. Using trace-driven and real cluster experiments, we show the efficiency and effectiveness of our job scheduler in comparison with other scheduling strategies.
@inproceedings{shen2016goodbye, author = {Shen, Haiying and Yu, Lei and Chen, Liuhua and Li, Zhuozhao}, booktitle = {2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)}, title = {Goodbye to Fixed Bandwidth Reservation: Job Scheduling with Elastic Bandwidth Reservation in Clouds}, year = {2016}, volume = {}, number = {}, pages = {1-8}, keywords = {}, doi = {10.1109/CloudCom.2016.0017}, issn = {2330-2186}, month = dec, abbr = {CloudCom}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7830658}, pdf = {CloudCom-shen-goodbye-2016.pdf} }
- BigDataComparing application performance on HPC-based Hadoop platforms with local storage and dedicated storageZhuozhao Li, Haiying Shen, Jeffrey Denton, and Walter LigonIn 2016 IEEE International Conference on Big Data (Big Data), 2016
Many high-performance computing (HPC) sites extend their clusters to support Hadoop MapReduce for a variety of applications. However, HPC cluster differs from Hadoop cluster on the configurations of storage resources. In the Hadoop Distributed File System (HDFS), data resides on the compute nodes, while in the HPC cluster, data is stored on separate nodes dedicated to storage. Dedicated storage offloads I/O load from the compute nodes and provides more powerful storage. Local storage provides better locality and avoids contention for shared storage resources. To gain an insight of the two platforms, in this paper, we investigate the performance and resource utilization of different types (i.e., I/O-intensive, data-intensive and CPU-intensive) of applications on the HPC-based Hadoop platforms with local storage and dedicated storage. We find that the I/O-intensive and data-intensive applications with large input file size can benefit more from the dedicated storage, while these applications with small input file size can benefit more from the local storage. CPU-intensive applications with a large number of small-size input files benefit more from the local storage, while these applications with large-size input files benefit approximately equally from the two platforms. We verify our findings by trace-driven experiments on different types of jobs from the Facebook synthesized trace. This work provides guidance on choosing the best platform to optimize the performance of different types of applications and reduce system overhead.
@inproceedings{li2016comparing, author = {Li, Zhuozhao and Shen, Haiying and Denton, Jeffrey and Ligon, Walter}, booktitle = {2016 IEEE International Conference on Big Data (Big Data)}, title = {Comparing application performance on HPC-based Hadoop platforms with local storage and dedicated storage}, year = {2016}, volume = {}, number = {}, pages = {233-242}, keywords = {}, doi = {10.1109/BigData.2016.7840609}, issn = {}, month = dec, abbr = {BigData}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7840609}, pdf = {BigData-li-comparing-2016.pdf} }
- ICCCNLearning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set Based ApproachZhuozhao Li, Haiying Shen, and Kang ChenIn 2016 25th International Conference on Computer Communication and Networks (ICCCN), 2016
We consider learning the underlying graph structure of a network in which infection spreads based on the observations of node infection times. We give an algorithm based on minimal hitting set to learn the exact underlying graph structure and provide sufficient condition on number of cascades required (i.e. sample complexity) for reliable recovery, which is shown to be O(logn), where n is the number of nodes in the graph. We then analytically evaluate performance of minimal hitting set approach in learning the degree distribution and detecting leaf nodes of a graph and provide a sufficient condition for its sample complexity which is shown to be lower than that of learning the whole graph. We also generalize the exact graph estimation problem to the problem of estimating the graph within a certain distortion, measured by edit distance. We show that this edit distance based graph estimator has a lower sample complexity. Our experimental results based on both synthetic network topologies and a real-world network trace show that our algorithm achieves superior performance than a previously proposed algorithm based on maximum likelihood.
@inproceedings{li2016learning, author = {Li, Zhuozhao and Shen, Haiying and Chen, Kang}, booktitle = {2016 25th International Conference on Computer Communication and Networks (ICCCN)}, title = {Learning Network Graph of SIR Epidemic Cascades Using Minimal Hitting Set Based Approach}, year = {2016}, volume = {}, number = {}, pages = {1-9}, keywords = {}, doi = {10.1109/ICCCN.2016.7568537}, issn = {}, month = aug, abbr = {ICCCN}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7568537}, pdf = {ICCCN-li-sir-2016.pdf} }
- CloudPerformance Measurement on Scale-Up and Scale-Out Hadoop with Remote and Local File SystemsZhuozhao Li, and Haiying ShenIn 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), 2016
MapReduce is a popular computing model for parallel data processing on large-scale datasets, which can vary from gigabytes to terabytes and petabytes. Though Hadoop MapReduce normally uses Hadoop Distributed File System (HDFS) local file system, it can be configured to use a remote file system. Then, an interesting question is raised: for a given application, which is the best running platform among the different combinations of scale-up and scale-out Hadoop with remote and local file systems. However, there has been no previous research on how different types of applications (e.g., CPU-intensive, data-intensive) with different characteristics (e.g., input data size) can benefit from the different platforms. Thus, in this paper, we conduct a comprehensive performance measurement of different applications on scale-up and scaleout clusters configured with HDFS and a remote file system (i.e., OFS), respectively. We identify and study how different job characteristics (e.g., input data size, the number of file reads/writes, and the amount of computations) affect the performance of different applications on the different platforms. This study is expected to provide a guidance for users to choose the best platform to run different applications with different characteristics in the environment that provides both remote and local storage, such as HPC cluster.
@inproceedings{li2016performance, author = {Li, Zhuozhao and Shen, Haiying}, booktitle = {2016 IEEE 9th International Conference on Cloud Computing (CLOUD)}, title = {Performance Measurement on Scale-Up and Scale-Out Hadoop with Remote and Local File Systems}, year = {2016}, volume = {}, number = {}, pages = {456-463}, keywords = {}, doi = {10.1109/CLOUD.2016.0067}, issn = {2159-6190}, month = jun, abbr = {Cloud}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7820304}, pdf = {Cloud-li-measurement-2016.pdf} }
- CloudOn-Demand Bandwidth Pricing for Congestion Control in Core Switches in Cloud NetworksAbouzar Ghavami, Zhuozhao Li, and Haiying ShenIn 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), 2016
The cloud networks use switches to transfer inbound and outbound traffic through the data centers. Access of multiple tenants to the limited bandwidth capacity over the network switches increases the data traffic congestion in the network. The highly congested switches are vulnerable to get overloaded, and consequently slow down the flow of data traffic in the network. This paper proposes a nonlinear pricing policy for on-demand bandwidth allocation that jointly maximizes the total satisfaction of tenants and minimizes the congestion in the core switches. The optimal schedule is found through the best response strategy, in which each tenant updates its bandwidth allocation at each step based on the updated load-dependent predetermined nonlinear bandwidth pricing functions. The updated bandwidth allocations converge to the optimal bandwidth schedule that balances the load over the core switches. The performance of proposed pricing policy is evaluated under different scenarios.
@inproceedings{ghavami2016demand, author = {Ghavami, Abouzar and Li, Zhuozhao and Shen, Haiying}, booktitle = {2016 IEEE 9th International Conference on Cloud Computing (CLOUD)}, title = {On-Demand Bandwidth Pricing for Congestion Control in Core Switches in Cloud Networks}, year = {2016}, volume = {}, number = {}, pages = {867-870}, keywords = {}, doi = {10.1109/CLOUD.2016.0125}, issn = {2159-6190}, month = jun, abbr = {Cloud}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7820362}, pdf = {Cloud-abouzar-pricing-2016.pdf} }
2015
- ICPPDesigning a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements for High Application PerformanceZhuozhao Li, and Haiying ShenIn 2015 44th International Conference on Parallel Processing, 2015
Since scale-up machines perform better for jobs with small and median (KB, MB) data sizes while scale-out machines perform better for jobs with large (GB, TB) data size, and a workload usually consists of jobs with different data size levels, we propose building a hybrid Hadoop architecture that includes both scale-up and scale-out machines, which however is not trivial. The first challenge is workload data storage. Thousands of small data size jobs in a workload may overload the limited local disks of scale-up machines. Jobs from scale-up and scale-out machines may both request the same set of data, which leads to data transmission between the machines. The second challenge is to automatically schedule jobs to either scale-up or scale-out cluster to achieve the best performance. We conduct a thorough performance measurement of different applications on scale-up and scale-out clusters, configured with Hadoop Distributed File System (HDFS) and a remote file system (i.e., OFS), respectively. We find that using OFS rather than HDFS can solve the data storage challenge. Also, we identify the factors that determine the performance differences on the scale-up and scale-out clusters and their cross points to make the choice. Accordingly, we design and implement the hybrid scale-up/out Hadoop architecture. Our trace-driven experimental results show that our hybrid architecture outperforms both the traditional Hadoop architecture with HDFS and with OFS in terms of job completion time.
@inproceedings{li2015designing, author = {Li, Zhuozhao and Shen, Haiying}, booktitle = {2015 44th International Conference on Parallel Processing}, title = {Designing a Hybrid Scale-Up/Out Hadoop Architecture Based on Performance Measurements for High Application Performance}, year = {2015}, volume = {}, number = {}, pages = {21-30}, keywords = {}, doi = {10.1109/ICPP.2015.11}, issn = {0190-3918}, month = sep, abbr = {ICPP}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/abstract/document/7349557}, pdf = {ICPP-li-hybrid-2015.pdf}, selected = {true} }
2014
- INFOCOMNew bandwidth sharing and pricing policies to achieve a win-win situation for cloud provider and tenantsHaiying Shen, and Zhuozhao LiIn IEEE INFOCOM 2014 - IEEE Conference on Computer Communications, 2014
For predictable application performance or fairness in network sharing in clouds, many bandwidth allocation policies have been proposed. However, with these policies, tenants are not incentivized to use idle bandwidth or prevent link congestion, and may even take advantage of the policies to gain unfair bandwidth allocation. Increasing network utilization while avoiding congestion not only benefits cloud provider but also the tenants by improving application performance. In this paper, we propose a new pricing model that sets different unit prices for reserved bandwidth, the bandwidth on congested links and on uncongested links, and makes the unit price for congested links proportional to their congestion degrees. We use game theory model to analyze tenants’ behaviors in our model and the current pricing models, which shows the effectiveness of our model in providing the incentives. With the pricing model, we propose a network sharing policy to achieve both min-guarantee and proportionality, while prevent tenants from earning unfair bandwidth. We further propose methods for each virtual machine to arrange its traffic to maximize its utility. As a result, our solution creates a win-win situation, where tenants strive to increase their benefits in bandwidth sharing, which also concurrently increases the utilities of cloud provider and other tenants. Our simulation and trace-driven experimental results show the effectiveness of our solution in creating the win-win situation.
@inproceedings{shen2014new, author = {Shen, Haiying and Li, Zhuozhao}, booktitle = {IEEE INFOCOM 2014 - IEEE Conference on Computer Communications}, title = {New bandwidth sharing and pricing policies to achieve a win-win situation for cloud provider and tenants}, year = {2014}, volume = {}, number = {}, pages = {835-843}, keywords = {}, doi = {10.1109/INFOCOM.2014.6848011}, issn = {0743-166X}, month = apr, abbr = {INFOCOM}, bibtex_show = {true}, url = {https://ieeexplore.ieee.org/document/6848011}, pdf = {INFOCOM-shen-winwin-2014.pdf} }
Preprints
2021
- Productive Parallel Programming with ParslKyle Chard, Yadu Babuji, Anna Woodard, Ben Clifford, Zhuozhao Li, Mihael Hategan, Ian Foster, Mike Wilde, and Daniel S. KatzAda Lett., 2021
Parsl is a parallel programming library for Python that aims to make it easy to specify parallelism in programs and to realize that parallelism on arbitrary parallel and distributed computing systems. Parsl relies on developers annotating Python functions-wrapping either Python or external applications-to indicate that these functions may be executed concurrently. Developers can then link together functions via the exchange of data. Parsl establishes a dynamic dependency graph and sends tasks for execution on connected resources when dependencies are resolved. Parsl’s runtime system enables different compute resources to be used, from laptops to supercomputers, without modification to the Parsl program.
@article{chard2021productive, author = {Chard, Kyle and Babuji, Yadu and Woodard, Anna and Clifford, Ben and Li, Zhuozhao and Hategan, Mihael and Foster, Ian and Wilde, Mike and Katz, Daniel S.}, title = {Productive Parallel Programming with Parsl}, year = {2021}, issue_date = {December 2020}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {40}, number = {2}, issn = {1094-3641}, url = {https://doi.org/10.1145/3463478.3463486}, doi = {10.1145/3463478.3463486}, journal = {Ada Lett.}, month = apr, pages = {73–75}, numpages = {3}, bibtex_show = {true}, status = {arXiv} }
2020
- arXivTargeting SARS-CoV-2 with AI-and HPC-enabled lead generation: a first data releaseYadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, and othersarXiv preprint arXiv:2006.02431, 2020
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to computer diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources representing over 4.2 B molecules enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules to enable exploration and application of image-based deep learning methods, and 3) 2D and 3D molecular descriptors to speed development of machine learning models. This data release encompasses structural information on the 4.2 B molecules and 60 TB of pre-computed data. Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.
@article{babuji2020targeting, title = {Targeting SARS-CoV-2 with AI-and HPC-enabled lead generation: a first data release}, author = {Babuji, Yadu and Blaiszik, Ben and Brettin, Tom and Chard, Kyle and Chard, Ryan and Clyde, Austin and Foster, Ian and Hong, Zhi and Jha, Shantenu and Li, Zhuozhao and others}, journal = {arXiv preprint arXiv:2006.02431}, year = {2020}, abbr = {arXiv}, status = {arXiv}, bibtex_show = {true}, selected = {true}, url = {https://arxiv.org/abs/2006.02431} }
2019
- arXivServerless supercomputing: High performance function as a service for scienceRyan Chard, Tyler J Skluzacek, Zhuozhao Li, Yadu Babuji, Anna Woodard, Ben Blaiszik, Steven Tuecke, Ian Foster, and Kyle ChardarXiv preprint arXiv:1908.04907, 2019
Growing data volumes and velocities are driving exciting new methods across the sciences in which data analytics and machine learning are increasingly intertwined with research. These new methods require new approaches for scientific computing in which computation is mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), or be offloaded to specialized accelerators. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most efficient resources. To address these needs we propose funcX—a high-performance function-as-a-service (FaaS) platform that enables intuitive, flexible, efficient, scalable, and performant remote function execution on existing infrastructure including clouds, clusters, and supercomputers. It allows users to register and then execute Python functions without regard for the physical resource location, scheduler architecture, or virtualization technology on which the function is executed—an approach we refer to as "serverless supercomputing." We motivate the need for funcX in science, describe our prototype implementation, and demonstrate, via experiments on two supercomputers, that funcX can process millions of functions across more than 65000 concurrent workers. We also outline five scientific scenarios in which funcX has been deployed and highlight the benefits of funcX in these scenarios.
@article{chard2019serverless, title = {Serverless supercomputing: High performance function as a service for science}, author = {Chard, Ryan and Skluzacek, Tyler J and Li, Zhuozhao and Babuji, Yadu and Woodard, Anna and Blaiszik, Ben and Tuecke, Steven and Foster, Ian and Chard, Kyle}, journal = {arXiv preprint arXiv:1908.04907}, year = {2019}, status = {arXiv}, abbr = {arXiv}, url = {https://arxiv.org/abs/1908.04907}, bibtex_show = {true} }
Thesis
2018
- PhD ThesisScheduling Techniques in Different Architectures of Data-parallel Clusters for High PerformanceZhuozhao LiDepartment of Computer Science, University of Virginia, 2018
@phdthesis{li2018scheduling, title = {Scheduling Techniques in Different Architectures of Data-parallel Clusters for High Performance}, author = {Li, Zhuozhao}, year = {2018}, pdf = {thesis.pdf}, html = {https://libraetd.lib.virginia.edu/public_view/0c483j89h}, bibtex_show = {true}, abbr = {PhD Thesis}, school = {Department of Computer Science, University of Virginia} }