Serverless Computing: One Step Forward, Two Steps Back
Serverless: FaaS + Ecosystem
Serverless combines FaaS with a vendor’s ecosystem for storage, caching, events, queuing, etc.
The Good: Serverless’ Autoscaling
Embarrassingly Parallel Functions
- Independent functions performing a map-style operation need no synchronization or communication with one another (see the sketch below)
- e.g. object recognition, image transformation, on-demand calculation
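A minimal sketch of such a map-style function: an AWS Lambda-style handler that thumbnails one image per invocation, with no state shared between invocations. The event shape, bucket, and key names are hypothetical; boto3 and Pillow are assumed available in the runtime.

```python
# Hypothetical map-style FaaS handler: each invocation transforms one
# image independently, so the platform can fan out thousands of copies
# without any coordination between them.
import io

import boto3
from PIL import Image

s3 = boto3.client("s3")

def handler(event, context):
    # One object per invocation; no state is shared between functions.
    bucket = event["bucket"]   # hypothetical event shape
    key = event["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    img = Image.open(io.BytesIO(body)).convert("RGB")
    img.thumbnail((256, 256))  # the "map" step: a pure, local transformation

    out = io.BytesIO()
    img.save(out, format="JPEG")
    s3.put_object(Bucket=bucket, Key=f"thumbs/{key}.jpg", Body=out.getvalue())
```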
Orchestration Functions
- React to serverless events and dispatch commands into a separate pipeline that autoscales on its own (sketch below)
- e.g. Google Cloud Dataprep by Trifacta
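A hedged sketch of the orchestration pattern: the function does no heavy lifting itself, it just reacts to an S3 "object created" notification and enqueues a command for a downstream pipeline that scales independently. The queue URL and message schema are invented for illustration.

```python
# Hypothetical orchestration function: fired by an S3 "object created"
# notification, it only enqueues a command for a downstream pipeline
# that scales independently of this function.
import json

import boto3

sqs = boto3.client("sqs")
# Hypothetical queue feeding the downstream pipeline.
PIPELINE_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transform-jobs"

def handler(event, context):
    for record in event["Records"]:  # standard S3 notification shape
        command = {
            "action": "transform",
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        sqs.send_message(QueueUrl=PIPELINE_QUEUE_URL,
                         MessageBody=json.dumps(command))
```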
FaaS Limitations
- Limited Lifetimes: most vendors limit a function to around 15 minutes
- I/O Bottleneck:
- vendors provision little network bandwidth (~0.5 Gb/s, roughly 10x slower than a local SSD)
- AWS packs a user's Lambda functions together onto shared VMs: scaling out adds compute but shrinks per-Lambda network bandwidth
- Communication Through Slow Storage (see the sketch after this list):
- Lambda instances are not network addressable and cannot communicate with each other directly
- instead they exchange data by writing to a separate Amazon storage service (e.g. S3)
- an extremely slow way to do IPC
- No Specialized Hardware
- FaaS functions run on uniform VMs, separate from the data they process, with no access to GPUs or other accelerators
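To make the storage-as-IPC point concrete, a sketch (with an assumed bucket and key layout) of the only message-passing pattern two Lambdas get: write an object, then poll for it. Every message pays storage round trips instead of a single network hop.

```python
# Sketch of "IPC" between two Lambdas via shared storage: the sender
# writes an object, the receiver polls for it. Each poll is another
# storage round trip, which is why this is so slow compared to sockets.
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "lambda-mailbox"  # hypothetical

def send(task_id: str, payload: bytes) -> None:
    s3.put_object(Bucket=BUCKET, Key=f"msgs/{task_id}", Body=payload)

def receive(task_id: str, timeout_s: float = 60.0) -> bytes:
    # There is no notification channel between functions, so we poll.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            obj = s3.get_object(Bucket=BUCKET, Key=f"msgs/{task_id}")
            return obj["Body"].read()
        except ClientError as err:
            if err.response["Error"]["Code"] != "NoSuchKey":
                raise
            time.sleep(0.5)  # back off before the next round trip
    raise TimeoutError(f"no message for task {task_id}")
```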
FaaS Limitations Consequence
- Shipping Data to Code
- data from everywhere is shipped to a short-lived function, paying network, startup-latency, and bandwidth costs, and no optimization survives across invocations
- we'd rather ship code to live with its data, processing it like a factory assembly line
- that would let us exploit caching and other CPU and memory locality optimizations, and perhaps even keep a small internal state
- Stymying Distributed Computing
- functions are not network addressable (and can only communicate through slow storage)
- so all coordination must funnel through reads and writes of shared global state
- classic distributed protocols (leader election, membership, consensus) cannot coordinate thousands or millions of FaaS functions this way
- FaaS stymies hardware-accelerated software innovation
- vendors offer neither specialized processors (e.g. GPGPU) nor large amounts of RAM for FaaS functions
- FaaS discourages open-source service innovation
- most FOSS isn't built for FaaS integration
- [we could imagine a user-custom FaaS, where a spec dictates how user-provided software must run to integrate alongside the vendor's FaaS infra]
Case Study: Model Training
- Training an ML model
- Needs 31 iterations
- On a Lambda instance
- cost: $0.29
- total: 465 minutes (7.75 hours)
- each invocation: 640 MB memory, runs up to the 15-minute lifetime cap
- each iteration: 3.08 seconds
- 2.49 seconds to fetch 100 MB
- 0.59 seconds to process
- On an m4.large EC2 instance
- cost: $0.04
- total: 22 minutes
- each iteration: 0.14 seconds
- 0.04 seconds to fetch 100 MB
- 0.10 seconds to process
- Lambda: 21x slower, 7.3x more expensive (ratios rechecked below)
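The ratios follow from the figures above; a quick back-of-the-envelope check in Python:

```python
# Recomputing the case-study ratios from the numbers above.
lambda_iter = 2.49 + 0.59   # fetch + process = 3.08 s per iteration
ec2_iter = 0.04 + 0.10      # fetch + process = 0.14 s per iteration
print(lambda_iter / ec2_iter)   # ~22x per-iteration slowdown
print(465 / 22)                 # ~21x end-to-end slowdown
print(0.29 / 0.04)              # ~7.3x cost ratio
```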
Case Study: Low-Latency Prediction
- Using an ML model to serve predictions
- Immediate downside: no GPU access in Lambda
- Batched inputs for prediction (via AWS SQS)
- On Lambda
- Fetch data from SQS, place results back in SQS
- Average batch latency: 467ms
- On m5.large, still using SQS
- Average batch latency: 13ms
- On m5.large, using ZeroMQ instead of SQS
- Average batch latency: 2.8ms
- Cost difference: on the order of 50x
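A sketch of what the client side of the ZeroMQ variant plausibly looks like: batches go straight over a long-lived TCP socket to the EC2 model server, with no queue service in the middle. The endpoint and framing are assumptions, not the paper's code.

```python
# Client side of the hypothetical ZeroMQ setup: one network hop out and
# one back per batch, no queue service in between.
import time

import zmq

def predict_batch(batch: bytes, endpoint: str = "tcp://model-server:5555") -> bytes:
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)
    sock.connect(endpoint)   # hypothetical EC2 model-server endpoint
                             # (a real client would reuse this socket)
    start = time.perf_counter()
    sock.send(batch)         # serialized batch of inputs
    preds = sock.recv()      # serialized predictions
    elapsed_ms = (time.perf_counter() - start) * 1e3
    print(f"batch round trip: {elapsed_ms:.1f} ms")
    return preds
```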
Case Study: Distributed Computing
- FaaS functions are not network addressable
- Communication happens through AWS storage
- Assessing cloud storage as a communication medium, the paper implements leader election on top of DynamoDB and concludes:
“Hence in the (unachievable) best-case scenario—when each leader is elected immediately after it joins the system—the system will spend 1.9% of its aggregate time simply in the leader election protocol. Even if you think this is tolerable, note that using DynamoDB as a fine-grained communication medium is incredibly expensive: Supporting a cluster of 1,000 nodes costs at minimum $450 per hour”
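For intuition, a sketch of the kind of primitive the paper is costing out: each node periodically attempts a DynamoDB conditional write to claim or renew a leader lease. The table name, key schema, and lease length are hypothetical.

```python
# Sketch of lease-based leader election over DynamoDB conditional writes.
import time

import boto3
from botocore.exceptions import ClientError

ddb = boto3.client("dynamodb")
TABLE = "leader-lease"   # hypothetical table with partition key "id"
LEASE_SECONDS = 10       # hypothetical lease length

def try_become_leader(node_id: str) -> bool:
    now = int(time.time())
    try:
        ddb.put_item(
            TableName=TABLE,
            Item={
                "id": {"S": "leader"},
                "holder": {"S": node_id},
                "expires": {"N": str(now + LEASE_SECONDS)},
            },
            # Succeed only if no one holds an unexpired lease.
            ConditionExpression="attribute_not_exists(#i) OR #e < :now",
            ExpressionAttributeNames={"#i": "id", "#e": "expires"},
            ExpressionAttributeValues={":now": {"N": str(now)}},
        )
        return True   # we hold the lease; renew it before it expires
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # someone else is leader; retry after a delay
        raise
```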
As-Is Limitations May Be Good
- FaaS forces stateless
- easy to write, debug, trivial to replicate
- FaaS forces loosely coupled design, possibly encouraging good design patterns
- “In particular, FaaS limitations favor operational flexibility over developer control”
- May resurface ideas that fit the FaaS mindset
Paper Objections
Section 3.3 of the paper responds to anticipated criticisms and clarifies some of its claims.
Improvements
- Fluid Code and Data Placement
- infra needs to allow code and data to be colocated for performance reasons (see the sketch after this list)
- infra might need to replicate data so that code and data are coupled only when needed
- high-level languages and frameworks can help the infra understand how best to move data around
- Heterogeneous Hardware Support
- offer access to diverse types of specialized hardware
- allow the infra to choose which hardware to run the code on for optimal execution
- Long-Running, Addressable Virtual Agents
- infra should allow users to specify data or hardware affinity for code
- e.g. infra would move data around, or move code and data onto specific hardware
- amortize this effort by reusing code/data/hardware placements
- requires addressable entities that persist long-term
- Flexible Programming, Common IR
- different languages, DSLs, and frameworks could share an IR that accommodates infra-specific optimizations and constructs
- e.g. data flow, hardware affinity, async events
- Service-level objectives & guarantees
- offer granular SLOs, with penalties for mis-estimations
- Security concerns
- security is aided by the cloud provider
- removing misconfiguration and mismanagement
- offering high-level abstractions that permit security-beneficial constraints
- still need to consider multi-tenancy issues and data leakage
- especially if code is allowed to move fluidly at the user's request
- may need to adopt new tech (e.g. hardware enclaves) and support auditing / post-hoc analysis
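None of the placement APIs above exist yet; as a thought experiment for the fluid-placement and affinity ideas, here is a purely hypothetical sketch of what user-specified affinity hints could look like. The decorator, the Affinity type, and the runtime that would honor them are all invented.

```python
# Imaginary API: a decorator records affinity hints that a future
# serverless runtime could use to ship this code to its data (or to
# specific hardware), instead of shipping data to the code. Nothing
# here exists in any real platform.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Affinity:
    near_data: Optional[str] = None   # e.g. a dataset the code should run beside
    hardware: Optional[str] = None    # e.g. "gpu" or "high-mem"

def placed(affinity: Affinity) -> Callable:
    # Hypothetical decorator: just attaches metadata for the runtime to read.
    def wrap(fn: Callable) -> Callable:
        fn.placement_hint = affinity
        return fn
    return wrap

@placed(Affinity(near_data="s3://training-set/shard-*", hardware="gpu"))
def train_step(model_bytes: bytes, shard: bytes) -> bytes:
    ...  # gradient computation, colocated with its shard by the runtime
```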