facebook的mysql使用情况
facebook的mysql使用情况Facebook Notes
Database statistics
200 MySQL servers
Each has same schema; partitioned data
Partitioned by institution, company, etc.
Anticipate having replicas via MySQL replication; don’t now for most data (some core data is
replicated)
13 million users
Daily hits 5-10 billion
½ in 4 to 6 hours
Nets out to 200K hits/sec
OR 1000 hits/sec/server
80% reads; 20% writes
Monthly growth 30-40% per month
I/O bound
3 TBytes
No joins (< 1%)
Mostly Primary Key lookups
30,000 schemas, with several hundred tables per schema
No triggers
MySQL 4.1, not 5.0
A big schema: Stanford (5-10GBytes)
Problems with current system:
MySQL multithreading
MySQL memory usage
MySQL table locks
These *might* be issues, but they didn’t say that they ARE.
Nightly backups as the OS level, then hourly snapshots.
They make use of only simple datatypes
They don’t use DATE; they use INTEGER since it’s faster/smaller
They try not to use INTEGER, using TINYINT or SMALL where possible
(Editorial note: They’re going to hate Oracle NUMBER...) J
They use TEXT. Some applications append new data into TEXT columns, so they can be quite large.
MySQL servers are Linux 2.6 AMD64 16GB RAM 4-way Opterons (2 dual core)
Local disks (15K SCSI, 2 in RAID-1). Separate SAN used for backups.
They have two collocation centers (Santa Clara and San Francisco)
For now, most machines are in one of them (the other is quite new)
May be adding more colos, not yet decided.
Applications all connect to database; use it; disconnect. No overt connection pool.
Memcache Notes
Two collocation centers
Each colo has a set of caching servers
150 servers (approx. 75 in each colo)
Web servers always use cache servers in their colo
There is cross-colo cache invalidation; discussed later.
16 GBytes RAM; 12 GBytes dedicated for memcache
Memcache is multithreaded, using 2 or 4 way AMD64 CPUs
Objects stored in memcache: almost all serialized PHP objects, a few HTML fragments
Key: object_type:schema_version_number:primary_key
Memcache is just a simple hash table
They also hash objects across memcache servers
Each object hashes to precisely one server
That object is always stored on that server
If that server is dead, then that object can’t be cached
May use virtual IPs to resolve this in the future
Use a mixture of TCP and UDP to talk to memcache (TCP default)
Each memcache server has 30,000 to 40,000 TCP connections to it
75% of values stored in memcache are less than 5Kbytes. Max size supported is 1 MByte.
Memcache supports sending an array of ‘get’ requests in one operation.
As many as 100 requests can be batched up into a single array operation
Memcache gets a hit rate in excess of 99%.
Recall that there are 200,000 MySQL hits per second
80% reads; 20% writes
Basically all of the MySQL reads are memcache ‘misses’.
Therefore there are about 16 million memcache queries per second across the site (200,000 *
0.80 * 100)
Therefore there are at least 200,000 writes to memcache per second
Could be more; updating one row in MySQL can change multiple serialized PHP objects
in memcache.
Memcache response time for a get request is 250 microseconds
100% network bound
Every Apache process on every web server winds up with a connection to every memcache server
Approximately 100 apache processes per web server
We were earlier told there are 30,000 to 40,000 TCP connections to each memcache server
Therefore there are 300 to 400 web servers at Facebook (they agreed that number was sort of
close)
Applications running under Apache update data in MySQL, then update the entries in memcache as
required.
These updates to directly to the appropriate memcaches (remember that each cached object
hashes to precisely one memcache)
For deletes or other times when cross/collocation facility invalidation of caches is required, the
application sends an invalidation request to a local memcache proxy on the same Apache node;
it then sends appropriate delete requests to both the local memcache(s) and to any appropriate
memcache(s) at other collocation facilities.
This scheme works great for most of their data
99%+ hit ratio!
But some data – for example, relationships between objects / people – doesn’t cache well.
This sort of data gets cache ratios around 60%
They are writing a new ‘relationship cache’ to handle this sort of data. Applications all connect to database; use it; disconnect. No overt connection pool. 这么做有什么好处么? mysql 4.1 not 5.0 为什么使用4.1 ?
页:
[1]