AUTHOR=Shankar Karthick, Mahgoub Ashraf, Zhou Zihan, Priyam Utkarsh, Chaterji Somali
TITLE=Asgard: Are NoSQL databases suitable for ephemeral data in serverless workloads?
JOURNAL=Frontiers in High Performance Computing
VOLUME=Volume 1 - 2023
YEAR=2023
URL=https://www.frontiersin.org/journals/high-performance-computing/articles/10.3389/fhpcp.2023.1127883
DOI=10.3389/fhpcp.2023.1127883
ISSN=2813-7337
ABSTRACT=Serverless computing platforms are becoming increasingly popular for data analytics applications due to their low management overhead and granular billing strategies. Such analytics frameworks use a Directed Acyclic Graph (DAG) structure, in which serverless functions, which are fine-grained tasks, are represented as nodes and data dependencies between the functions are represented as edges. Passing intermediate (ephemeral) data from one function to another has been receiving attention of late, with works proposing various storage systems and methods of optimizing them. The state-of-practice method is to pass the ephemeral data through remote storage, either disk-based (e.g., Amazon S3), which is slow, or memory-based (e.g., ElastiCache Redis), which is expensive. Although several existing and widely popular NoSQL databases can leverage both memory and disk (e.g., Apache Cassandra and ScyllaDB), prior works argue that these data stores are not suited for ephemeral data since they are typically designed for reliable long-term storage. In Asgard, we quantitatively investigate whether that is actually the case and look at scenarios where NoSQL databases can achieve better performance (end-to-end latency) normalized by dollar cost when configured in a DAG-aware manner. We find that Apache Cassandra with a default configuration is up to 326% better in performance per dollar than Redis and outperforms S3 by up to 189%. Cassandra with Asgard outperforms default Cassandra by up to 47%.