System design trends:
Failure-oriented design,
Error tolerance,
Any component can be killed at any time without impacting others (microservice architecture)
How
The process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements
General Process
Conceptual design: black-box transparency, macro level
Logical design: grey-box transparency
Physical design: white-box transparency, micro level
5 steps for cracking a design (SNAKE)
Based on the example of Netflix
- Scenario: ask features / interfaces / DAU or QPS
- Enumerate functions: Register/login, Play movie, Movie recommendation
- Sort top functions: Play movie (get channels, get movies in a channel, play a movie in a channel)
- Provide API: method names
- Necessary: constraints / hypotheses
- Ask for daily active users
- Predict users
- Average concurrent users = daily active users / seconds per day * average online time (in seconds)
- Predicted peak users = 6 * average concurrent users
- Predicted peak users in 3 months = predicted peak users * 2
- Predict traffic (bandwidth use)
- Traffic per user = 3 Mbps
- Max peak traffic = predicted peak users * traffic per user
- Predict Memory
- Memory per user: 10KB
- Max daily memory = daily active users * memory per user
- Predict storage
- Total movies: 10000
- Average movie size = average movie length (90 to 120 min) * size per minute
- Movie storage = total movies * average movie size
- Application: split application / service / module / algorithm
- Replay the case, add a service for each request
- Merge the services
- Kilobit: data
- Append dataset for each request below a service
- Choose storage types: MySQL, MongoDB, Files?
- Evolve (improve or scale): sharding, optimize, special case
- Analyze (consider some)
- with Better: constraints
- with Broader: new cases
- with Deeper: details
- from performance
- from scalability
- from robustness (reliability)
- Go back and evolve the design accordingly
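The formulas in the "Necessary" step can be worked through as a small calculation. This is a minimal sketch, not a definitive sizing: `dailyActiveUsers`, `averageOnlineSeconds`, `megabytesPerMinute` and the 105-minute average length are hypothetical inputs chosen only to make the arithmetic easy to follow; the factors 6 and 2, 3 Mbps, 10 KB and 10000 movies come from the notes.

```swift
// Back-of-envelope capacity estimate for the Netflix example.
struct CapacityEstimate {
    // Hypothetical inputs (not given in the notes).
    let dailyActiveUsers: Double
    let averageOnlineSeconds: Double   // average online time per user per day
    let megabytesPerMinute: Double     // encoded video size per minute

    // Figures taken from the notes.
    let secondsPerDay = 86_400.0
    let peakFactor = 6.0
    let growthFactorIn3Months = 2.0
    let mbpsPerUser = 3.0
    let memoryPerUserKB = 10.0
    let totalMovies = 10_000.0
    let averageMovieMinutes = 105.0    // assumption: midpoint of 90 to 120 min

    var averageConcurrentUsers: Double {
        return dailyActiveUsers / secondsPerDay * averageOnlineSeconds
    }
    var peakUsers: Double {
        return averageConcurrentUsers * peakFactor
    }
    var peakUsersIn3Months: Double {
        return peakUsers * growthFactorIn3Months
    }
    var maxPeakTrafficMbps: Double {
        return peakUsers * mbpsPerUser
    }
    var maxDailyMemoryKB: Double {
        return dailyActiveUsers * memoryPerUserKB
    }
    var movieStorageMB: Double {
        return totalMovies * averageMovieMinutes * megabytesPerMinute
    }
}

let estimate = CapacityEstimate(dailyActiveUsers: 8_640_000,
                                averageOnlineSeconds: 1_800,
                                megabytesPerMinute: 20)
print("average concurrent users:", estimate.averageConcurrentUsers)
print("peak users:", estimate.peakUsers)
print("peak users in 3 months:", estimate.peakUsersIn3Months)
print("max peak traffic (Mbps):", estimate.maxPeakTrafficMbps)
print("max daily memory (KB):", estimate.maxDailyMemoryKB)
print("movie storage (MB):", estimate.movieStorageMB)
```

With these assumed inputs, about 180 thousand users are online on average, and the peak is about 1.08 million.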
Compare Pull/Push Model
Use the push model when:
- fewer resources
- less coding
- low real-time requirement
- few user posts
- bidirectional follower/following relationships, no superstars (similar to broadcast)
Use the pull model when:
- more resources
- high real-time requirement
- many user posts
- one-directional follower/following relationships, with superstars (similar to broadcast)
Pull Model: (needs synchronous DB reads)
On every news feed request, the server queries and merges all related data, then returns it
Get News Feed => n DB reads, synchronous, so the user has to wait
Post Feed => 1 DB write
Process:
- Request my news feed
- Get followings
- Get the news feed from each following (synchronous DB reads; the user must wait)
- Merge and return
Optimize:
- Cache users' news feeds (1000 feeds): n synchronous DB reads -> n cache reads
- Trade-off: do not cache everything
- Memcached QPS / MySQL QPS = 100
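The pull process and its cache optimization can be sketched in memory as follows. This is an illustrative sketch under stated assumptions, not a production design: `PullNewsFeedService`, `Feed` and every method name are hypothetical, the dictionaries stand in for MySQL and Memcached, and invalidating the cache on every post is a simplification chosen for brevity.

```swift
struct Feed {
    let id: Int
    let authorId: Int
    let createdAt: Int   // timestamp; larger means newer
}

final class PullNewsFeedService {
    private var followings: [Int: [Int]] = [:]       // userId -> ids of users they follow
    private var feedsByAuthor: [Int: [Feed]] = [:]   // stand-in for the DB
    private var cache: [Int: [Feed]] = [:]           // stand-in for the cache: authorId -> recent feeds
    private let cacheLimit: Int
    private(set) var dbReads = 0                     // counts simulated DB reads

    init(cacheLimit: Int = 1000) {
        self.cacheLimit = cacheLimit
    }

    func follow(userId: Int, targetId: Int) {
        followings[userId, default: []].append(targetId)
    }

    // Post Feed => 1 DB write (plus a cache invalidation in this sketch).
    func post(_ feed: Feed) {
        feedsByAuthor[feed.authorId, default: []].append(feed)
        cache[feed.authorId] = nil
    }

    // One cache read if possible, otherwise one DB read that refills the cache.
    private func recentFeeds(of authorId: Int) -> [Feed] {
        if let cached = cache[authorId] {
            return cached
        }
        dbReads += 1
        let all = feedsByAuthor[authorId] ?? []
        let recent = Array(all.sorted { $0.createdAt > $1.createdAt }.prefix(cacheLimit))
        cache[authorId] = recent
        return recent
    }

    // Get News Feed => n reads (one per following), then merge newest first.
    func newsFeed(for userId: Int, limit: Int = 20) -> [Feed] {
        var merged: [Feed] = []
        for authorId in followings[userId] ?? [] {
            merged += recentFeeds(of: authorId)
        }
        return Array(merged.sorted { $0.createdAt > $1.createdAt }.prefix(limit))
    }
}

let pullService = PullNewsFeedService()
pullService.follow(userId: 1, targetId: 2)
pullService.follow(userId: 1, targetId: 3)
pullService.post(Feed(id: 10, authorId: 2, createdAt: 100))
pullService.post(Feed(id: 11, authorId: 3, createdAt: 200))
pullService.post(Feed(id: 12, authorId: 2, createdAt: 300))

let firstPull = pullService.newsFeed(for: 1).map { $0.id }
let dbReadsAfterFirstPull = pullService.dbReads
let secondPull = pullService.newsFeed(for: 1).map { $0.id }
let dbReadsAfterSecondPull = pullService.dbReads
print(firstPull, dbReadsAfterFirstPull, dbReadsAfterSecondPull)
```

The second request is served entirely from the cache, so the DB read counter does not grow between the two calls.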
Push Model:
The server maintains a news feed table that stores, for every user, all related feeds; when a news feed is requested, it just fetches and returns the existing data
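struct NewsFeedTable {
    var id: Int      // primary key
    var userId: Int  // foreign key: the user who owns this news feed
    var feedId: Int  // foreign key: the feed shown in that user's news feed
}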
Get News Feed => 1 DB read
Post News Feed => n DB writes, n = number of followers, written asynchronously
Process:
- Post a feed
- Server inserts the feed into the DB
- Server asynchronously sends the feed to the poster's followers
- Get the poster's followers
- Fanout: insert the feed into each follower's news feed
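The push process above can be sketched with an in-memory fanout queue. This is a minimal sketch under assumptions: `PushNewsFeedService` and its method names are hypothetical, the arrays stand in for DB tables, and `runFanoutWorker()` stands in for an asynchronous background worker that a real system would run off a message queue.

```swift
struct NewsFeedRow {
    let id: Int
    let userId: Int   // the user who owns this news feed
    let feedId: Int   // the feed to show in that user's news feed
}

final class PushNewsFeedService {
    private var followers: [Int: [Int]] = [:]                        // authorId -> follower ids
    private var feeds: [Int: Int] = [:]                              // feedId -> authorId (feed table)
    private var pendingFanout: [(feedId: Int, authorId: Int)] = []   // stand-in for an async queue
    private(set) var newsFeedTable: [NewsFeedRow] = []

    func follow(followerId: Int, authorId: Int) {
        followers[authorId, default: []].append(followerId)
    }

    // Posting does one write and enqueues the fanout; it does not wait for it.
    func post(feedId: Int, authorId: Int) {
        feeds[feedId] = authorId
        pendingFanout.append((feedId: feedId, authorId: authorId))
    }

    // Fanout: one news feed row per follower, n writes per post.
    func runFanoutWorker() {
        for job in pendingFanout {
            for followerId in followers[job.authorId] ?? [] {
                let row = NewsFeedRow(id: newsFeedTable.count + 1,
                                      userId: followerId,
                                      feedId: job.feedId)
                newsFeedTable.append(row)
            }
        }
        pendingFanout.removeAll()
    }

    // Get News Feed => a single lookup in the precomputed table, newest first.
    func newsFeed(for userId: Int) -> [Int] {
        return newsFeedTable.reversed().filter { $0.userId == userId }.map { $0.feedId }
    }
}

let pushService = PushNewsFeedService()
pushService.follow(followerId: 2, authorId: 1)
pushService.follow(followerId: 3, authorId: 1)
pushService.post(feedId: 100, authorId: 1)
let feedBeforeFanout = pushService.newsFeed(for: 2)
pushService.runFanoutWorker()
pushService.post(feedId: 101, authorId: 1)
pushService.runFanoutWorker()
let feedAfterFanout = pushService.newsFeed(for: 2)
print(feedBeforeFanout, feedAfterFanout, pushService.newsFeedTable.count)
```

Until the worker runs, followers do not see the new feed, which is why the notes list a low real-time requirement as a condition for the push model.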
Optimize
- Disk is cheap, so storing NewsFeedTable is affordable
- Inactive users waste storage:
- sort followers by last login time,
- skip some inactive users when inserting feeds
- Superstars (tons of followers) are slow to push to:
- let fans of superstars use the pull model
- others use the push model
- Trade-off: how to identify a superstar?
- < 1M followers: push
- 1M - 10M followers: pull + push (merged when the user pulls)
- > 10M followers: pull
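The follower thresholds and the inactive-user rule above could be expressed as two small functions. This is a sketch under assumptions: the names `DeliveryModel`, `deliveryModel(forFollowerCount:)`, `Follower` and `fanoutTargets` are hypothetical, and the 30-day inactivity cutoff is an illustrative value that does not come from the notes.

```swift
enum DeliveryModel {
    case push
    case pushAndPull
    case pull
}

// Thresholds from the notes: < 1M push, 1M to 10M both, > 10M pull.
func deliveryModel(forFollowerCount count: Int) -> DeliveryModel {
    switch count {
    case ..<1_000_000:
        return .push
    case 1_000_000...10_000_000:
        return .pushAndPull
    default:
        return .pull
    }
}

struct Follower {
    let id: Int
    let daysSinceLastLogin: Int
}

// Sort followers by last login (most recent first) and skip inactive ones.
// The 30-day default is an assumption for illustration only.
func fanoutTargets(_ followers: [Follower], inactiveAfterDays: Int = 30) -> [Int] {
    return followers
        .filter { $0.daysSinceLastLogin <= inactiveAfterDays }
        .sorted { $0.daysSinceLastLogin < $1.daysSinceLastLogin }
        .map { $0.id }
}

let sampleFollowers = [
    Follower(id: 1, daysSinceLastLogin: 90),
    Follower(id: 2, daysSinceLastLogin: 3),
    Follower(id: 3, daysSinceLastLogin: 0),
]
print(deliveryModel(forFollowerCount: 5_000_000))
print(fanoutTargets(sampleFollowers))
```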
News Feed Design Summary
- Requirement Analysis: Scenario
- Data/Scale Prediction: Needs
- Application Graph
- Schema Design
- Push Model vs Pull Model
- Super star problem
- Inactive users
- Following & unfollowing
- Normalize & de-normalize
- Count likes: de-normalize, keep the count in the feed model, calibrate once a day
- Comments: normalize, count in the DB
- Hot spot
- What if the cache drops a hot-spot item? Use lease get
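The de-normalized like count with a daily calibration, mentioned in the summary above, can be illustrated as follows. This is a minimal sketch under assumptions: `LikeStore` and its members are hypothetical, the `counterUpdateLost` flag only simulates a counter update that failed, and `calibrate()` stands in for a scheduled daily job.

```swift
final class LikeStore {
    // Normalized source of truth: feedId -> ids of users who liked it.
    private var likeRows: [Int: Set<Int>] = [:]
    // De-normalized counter kept on the feed model for cheap reads.
    private(set) var likeCount: [Int: Int] = [:]

    // counterUpdateLost simulates a failed counter update, which causes drift.
    func like(feedId: Int, userId: Int, counterUpdateLost: Bool = false) {
        let inserted = likeRows[feedId, default: []].insert(userId).inserted
        if inserted && !counterUpdateLost {
            likeCount[feedId, default: 0] += 1
        }
    }

    // Calibration: recompute every counter from the normalized rows.
    func calibrate() {
        for (feedId, users) in likeRows {
            likeCount[feedId] = users.count
        }
    }
}

let likeStore = LikeStore()
likeStore.like(feedId: 1, userId: 10)
likeStore.like(feedId: 1, userId: 11, counterUpdateLost: true)
likeStore.like(feedId: 1, userId: 10)   // duplicate like, ignored
let countBeforeCalibration = likeStore.likeCount[1] ?? 0
likeStore.calibrate()
let countAfterCalibration = likeStore.likeCount[1] ?? 0
print(countBeforeCalibration, countAfterCalibration)
```

Reads stay cheap because they touch only the counter, and the calibration bounds how long a drifted value can survive.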