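Do you know how much your computer can do in a second?
Let's find out how well you know computers! All of these programs have a variable NUMBER in them. Your mission: guess how big NUMBER needs to get before the program takes 1 second to run.
You don't need to guess an exact value: every answer is between 1 and 1 billion. As long as you get the order of magnitude right, you are correct! A few notes: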
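- If the answer is 38,000, we count both 10,000 and 100,000 as correct. You only need to be within a factor of 10 :)
- We know that different computers have different disk, network, and CPU speeds! What we want to make clear is the difference between code that runs 10 times per second and code that runs 100,000 times per second. A newer computer won't make your code run 1000 times faster :)
- That said, everything here was run on a new laptop with a fast SSD and a so-so network connection. The C code was compiled with gcc -O2.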
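The factor-of-10 rule above can be written down as a small check. This is only a sketch of one reading of that rule, written in Python 3; the function name `is_close_enough` is made up for illustration and is not taken from the quiz itself.

```python
def is_close_enough(guess: float, actual: float) -> bool:
    """Return True when guess is within a factor of 10 of actual (assumed rule)."""
    if guess <= 0 or actual <= 0:
        return False
    ratio = guess / actual
    return 0.1 <= ratio <= 10


if __name__ == "__main__":
    # The article's example: the real answer is 38,000.
    for guess in (1_000, 10_000, 100_000, 1_000_000):
        print(guess, is_close_enough(guess, 38_000))
```

With an answer of 38,000, the guesses 10,000 and 100,000 pass, while 1,000 and 1,000,000 do not.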
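Good luck!
Welcome to the first program! This one is just a warm-up: how many loop iterations can you get through in 1 second? (It may be more than you think!)
Guess how many loop iterations the program below runs per second: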
    #include <stdlib.h>

    // Number to guess: How many iterations of
    // this loop can we go through in a second?

    int main(int argc, char **argv) {
        int NUMBER, i, s;
        NUMBER = atoi(argv[1]);

        for (s = i = 0; i < NUMBER; ++i) {
            s += 1;
        }

        return 0;
    }
Exact answer: 550,000,000
Guess how many loop iterations the program below runs per second:
    #!/usr/bin/env python

    # Number to guess: How many iterations of an
    # empty loop can we go through in a second?

    def f(NUMBER):
        for _ in xrange(NUMBER):
            pass

    import sys
    f(int(sys.argv[1]))
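Exact answer: 68,000,000
When I looked at this code, I thought about how much gets done in 1 millisecond. I had assumed it would be negligible, but even in Python you can run 68,000 iterations of an empty loop in 1 millisecond.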
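To check the per-millisecond figure on your own machine, a rough timing harness might look like the following. It is a Python 3 sketch (the article's programs are Python 2), the helper name `iterations_per_second` is hypothetical, and the result will differ from the 68,000,000 quoted above depending on hardware and interpreter version.

```python
import time


def iterations_per_second(number: int) -> float:
    """Time an empty loop of `number` iterations and return iterations per second."""
    start = time.perf_counter()
    for _ in range(number):
        pass
    # Guard against a zero reading for very small loops.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return number / elapsed


if __name__ == "__main__":
    rate = iterations_per_second(10_000_000)
    print(f"{rate:,.0f} iterations per second")
    print(f"{rate / 1000:,.0f} iterations per millisecond")
```

Dividing the per-second rate by 1000 is all it takes to get the per-millisecond number: 68,000,000 per second is 68,000 per millisecond.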
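Now let's look at a case closer to real life. Dictionaries are used almost everywhere in Python, so how many elements can we add to one in 1 second?
Then we'll look at a more complex operation: parsing a request with Python's built-in HTTP request parser.
Guess how many loop iterations the program below runs per second: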
    #!/usr/bin/env python

    # Number to guess: How many entries can
    # we add to a dictionary in a second?

    # Note: we take `i % 1000` to control
    # the size of the dictionary

    def f(NUMBER):
        d = {}
        for i in xrange(NUMBER):
            d[i % 1000] = i

    import sys
    f(int(sys.argv[1]))
Exact answer: 11,000,000
Guess how many HTTP requests the program below parses per second:
    #!/usr/bin/env python

    # Number to guess: How many HTTP requests
    # can we parse in a second?

    from BaseHTTPServer import BaseHTTPRequestHandler
    from StringIO import StringIO

    class HTTPRequest(BaseHTTPRequestHandler):
        def __init__(self, request_text):
            self.rfile = StringIO(request_text)
            self.raw_requestline = self.rfile.readline()
            self.error_code = self.error_message = None
            self.parse_request()

        def send_error(self, code, message):
            self.error_code = code
            self.error_message = message

    request_text = """GET / HTTP/1.1
    Host: localhost:8001
    Connection: keep-alive
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
    Upgrade-Insecure-Requests: 1
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36
    Accept-Encoding: gzip, deflate, sdch
    Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
    """

    def f(NUMBER):
        for _ in range(NUMBER):
            HTTPRequest(request_text)

    import sys
    f(int(sys.argv[1]))
Exact answer: 25,000
We can parse 25,000 small HTTP requests per second! One thing worth pointing out is that the request-parsing code here is written in pure Python, not C.
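The program above is Python 2 (`BaseHTTPServer`, `StringIO`). For readers on Python 3, a rough equivalent of the same trick is sketched below: it subclasses `http.server.BaseHTTPRequestHandler` and feeds it bytes from an `io.BytesIO` instead of a socket. The shortened `REQUEST` constant is an assumption made for brevity, and timings from this port would not be comparable to the 25,000 figure.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class HTTPRequest(BaseHTTPRequestHandler):
    """Parse one raw HTTP request without opening a socket."""

    def __init__(self, request_bytes: bytes):
        # Deliberately skip the parent __init__, which expects a live socket.
        self.rfile = BytesIO(request_bytes)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()

    def send_error(self, code, message=None, explain=None):
        # Record the error instead of writing a response to a client.
        self.error_code = code
        self.error_message = message


# Shortened request used only for this illustration.
REQUEST = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost:8001\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)

if __name__ == "__main__":
    request = HTTPRequest(REQUEST)
    print(request.command, request.path, request.headers["Host"])
```

After parsing, the request line and headers are available as attributes such as `command`, `path`, and `headers`.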
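Next, let's try downloading a web page and starting a Python script! Hint: both answers are less than 100 million :)
Guess how many HTTP requests the program below completes per second: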
    #!/usr/bin/env python

    # Number to guess: How many times can we
    # download google.com in a second?

    from urllib2 import urlopen

    def f(NUMBER):
        for _ in xrange(NUMBER):
            r = urlopen("https://google.com")
            r.read()

    import sys
    f(int(sys.argv[1]))
Exact answer: 4
Guess how many loop iterations the program below runs per second:
    #!/bin/bash

    # Number to guess: How many times can we start
    # the Python interpreter in a second?

    NUMBER=$1

    for i in $(seq $NUMBER); do
        python -c '';
    done
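Exact answer: 77
Starting a program turns out to be expensive in itself; it isn't just Python. If we just run /bin/true, we can do 500 per second, so running any program at all seems to cost about 2 milliseconds (1 second / 500). Of course, how fast a web page downloads depends heavily on the page size, the speed of the network connection, and the distance to the server, but we won't get into network performance today. A friend of mine says that on a high-performance network a round trip can take as little as 250 nanoseconds (!!!), but that is with machines located much closer together and with better hardware.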
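The per-second counts quoted so far convert to a cost per operation by taking the reciprocal. The short Python 3 sketch below does that arithmetic for three figures from the text; the helper name and the labels are only illustrative.

```python
def seconds_per_operation(ops_per_second: float) -> float:
    """Convert a throughput (operations per second) into seconds per operation."""
    return 1.0 / ops_per_second


# Throughput figures quoted in the article.
RATES = {
    "start the Python interpreter": 77,
    "run /bin/true": 500,
    "download google.com": 4,
}

if __name__ == "__main__":
    for label, rate in RATES.items():
        print(f"{label}: {seconds_per_operation(rate) * 1000:.1f} ms each")
```

This gives roughly 13 ms per interpreter start, 2 ms per /bin/true run, and 250 ms per download.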
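How many bytes can we write to disk in 1 second? We all know that writing to memory is faster, but how much faster exactly? By the way, the code below was run on a computer with an SSD.
Guess how many bytes the program below writes per second: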
    #!/usr/bin/env python

    # Number to guess: How many bytes can we write
    # to an output file in a second?
    # Note: we make sure everything is sync'd to disk
    # before exiting
    import tempfile
    import os

    CHUNK_SIZE = 1000000
    s = "a" * CHUNK_SIZE

    def cleanup(f, name):
        f.flush()
        os.fsync(f.fileno())
        f.close()
        try:
            os.remove(name)
        except:
            pass

    def f(NUMBER):
        name = './out'
        f = open(name, 'w')
        bytes_written = 0
        while bytes_written < NUMBER:
            f.write(s)
            bytes_written += CHUNK_SIZE
        cleanup(f, name)

    import sys
    f(int(sys.argv[1]))
Exact answer: 342,000,000
Guess how many bytes the program below writes per second:
    #!/usr/bin/env python

    # Number to guess: How many bytes can we write
    # to a string in memory in a second?

    import cStringIO

    CHUNK_SIZE = 1000000
    s = "a" * CHUNK_SIZE

    def f(NUMBER):
        output = cStringIO.StringIO()
        bytes_written = 0
        while bytes_written < NUMBER:
            output.write(s)
            bytes_written += CHUNK_SIZE

    import sys
    f(int(sys.argv[1]))
Exact answer: 2,000,000,000
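Taken together, the two answers put memory at roughly 6 times the disk rate (2,000,000,000 / 342,000,000 ≈ 5.8). To compare the two on another machine, a Python 3 sketch along these lines may help. It assumes `tempfile.NamedTemporaryFile` and `io.BytesIO` as stand-ins for the article's `./out` file and `cStringIO`, and the function names are hypothetical. On systems where the temporary directory is memory-backed, the "disk" number will not reflect a real disk.

```python
import io
import os
import tempfile
import time

CHUNK = b"a" * 1_000_000  # same 1 MB chunk size as the article's programs


def disk_bytes_per_second(total_bytes: int) -> float:
    """Write to a temporary file, fsync it, and return bytes per second."""
    start = time.perf_counter()
    written = 0
    with tempfile.NamedTemporaryFile() as out:
        while written < total_bytes:
            out.write(CHUNK)
            written += len(CHUNK)
        out.flush()
        os.fsync(out.fileno())  # make sure the data really reached the device
    return written / max(time.perf_counter() - start, 1e-9)


def memory_bytes_per_second(total_bytes: int) -> float:
    """Write the same chunks into an in-memory buffer and return bytes per second."""
    start = time.perf_counter()
    written = 0
    buffer = io.BytesIO()
    while written < total_bytes:
        buffer.write(CHUNK)
        written += len(CHUNK)
    return written / max(time.perf_counter() - start, 1e-9)


if __name__ == "__main__":
    size = 20_000_000  # 20 MB keeps the demo quick
    print(f"disk:   {disk_bytes_per_second(size):,.0f} bytes/second")
    print(f"memory: {memory_bytes_per_second(size):,.0f} bytes/second")
```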
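Now on to files! Sometimes I run a big grep and it seems to take forever. How many bytes can grep search in 1 second?
Note that in this test the bytes grep reads are already in memory.
Listing files takes time too! How many files can be listed in 1 second?
Guess how many bytes the program below searches per second: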
    #!/bin/bash

    # Number to guess: How many bytes can `grep`
    # search, unsuccessfully, in a second?
    # Note: the bytes are in memory

    NUMBER=$1

    cat /dev/zero | head -c $NUMBER | grep blah
    exit 0
Exact answer: 2,000,000,000
Guess how many files the program below lists per second:
    #!/bin/bash

    # Number to guess: How many files can `find` list in a second?
    # Note: the files will be in the filesystem cache.

    find / -name '*' 2> /dev/null | head -n $1 > /dev/null
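Exact answer: 325,000
Serialization is a place where a lot of time commonly gets spent, and it is a real pain, especially if you end up serializing and deserializing the same data over and over. Here are a couple of benchmarks: parsing 64K of JSON, and parsing the same data encoded as msgpack (46K, according to the comment in the code).
Guess how many loop iterations the program below runs per second: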
    #!/usr/bin/env python

    # Number to guess: How many times can we parse
    # 64K of JSON in a second?

    import json

    with open('./setup/protobuf/message.json') as f:
        message = f.read()

    def f(NUMBER):
        for _ in xrange(NUMBER):
            json.loads(message)

    import sys
    f(int(sys.argv[1]))
Exact answer: 449
Guess how many loop iterations the program below runs per second:
    #!/usr/bin/env python

    # Number to guess: How many times can we parse
    # 46K of msgpack data in a second?

    import msgpack

    with open('./setup/protobuf/message.msgpack') as f:
        message = f.read()

    def f(NUMBER):
        for _ in xrange(NUMBER):
            msgpack.unpackb(message)

    import sys
    f(int(sys.argv[1]))
Exact answer: 4,000
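The `message.json` file used above is not included in the article, so the JSON measurement cannot be repeated exactly. As a rough stand-in, the Python 3 sketch below builds a synthetic JSON document of at least 64 KiB and times `json.loads` on it. The payload shape and both helper names are assumptions, and msgpack is left out because it is a third-party package.

```python
import json
import time


def make_payload(target_bytes: int = 64 * 1024) -> str:
    """Build a synthetic JSON array of at least target_bytes characters."""
    count = 100
    while True:
        records = [
            {"id": i, "name": f"user{i}", "active": i % 2 == 0}
            for i in range(count)
        ]
        payload = json.dumps(records)
        if len(payload) >= target_bytes:
            return payload
        count += 100  # grow the document until it is big enough


def parses_per_second(payload: str, repeats: int) -> float:
    """Parse the payload `repeats` times and return parses per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        json.loads(payload)
    return repeats / max(time.perf_counter() - start, 1e-9)


if __name__ == "__main__":
    payload = make_payload()
    print(f"payload size: {len(payload):,} characters")
    print(f"{parses_per_second(payload, 200):,.0f} parses per second")
```

Parse speed depends on the structure of the document as well as its size, so results from this synthetic payload may differ noticeably from the 449 per second reported above.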
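Databases. We didn't set up anything fancy like PostgreSQL. Instead we made 2 SQLite tables with 10 million rows each, one indexed and one unindexed.
Guess how many queries the program below runs per second: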
    #!/usr/bin/env python

    # Number to guess: How many times can we
    # select a row from an **indexed** table with
    # 10,000,000 rows?

    import sqlite3

    conn = sqlite3.connect('./indexed_db.sqlite')
    c = conn.cursor()

    def f(NUMBER):
        query = "select * from my_table where key = %d" % 5
        for i in xrange(NUMBER):
            c.execute(query)
            c.fetchall()

    import sys
    f(int(sys.argv[1]))
Exact answer: 53,000
Guess how many queries the program below runs per second:
    #!/usr/bin/env python

    # Number to guess: How many times can we
    # select a row from an **unindexed** table with
    # 10,000,000 rows?

    import sqlite3

    conn = sqlite3.connect('./unindexed_db.sqlite')
    c = conn.cursor()

    def f(NUMBER):
        query = "select * from my_table where key = %d" % 5
        for i in xrange(NUMBER):
            c.execute(query)
            c.fetchall()

    import sys
    f(int(sys.argv[1]))
Exact answer: 2
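The gap between the two answers comes from how SQLite finds the row: with an index it can search directly, without one it has to scan the whole table. The Python 3 sketch below makes that visible with `EXPLAIN QUERY PLAN` on a small in-memory table. The table name `my_table` and column `key` follow the article's query; the `value` column, the index name, and the row count are assumptions, since the article does not show how its databases were built.

```python
import sqlite3

QUERY = "SELECT * FROM my_table WHERE key = 5"


def build_table(indexed: bool, rows: int = 10_000) -> sqlite3.Connection:
    """Create an in-memory table shaped like the article's, optionally indexed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (key INTEGER, value TEXT)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?, ?)",
        ((i, f"row {i}") for i in range(rows)),
    )
    if indexed:
        conn.execute("CREATE INDEX my_table_key ON my_table (key)")
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


if __name__ == "__main__":
    print("unindexed:", query_plan(build_table(indexed=False), QUERY))
    print("indexed:  ", query_plan(build_table(indexed=True), QUERY))
```

The unindexed plan reports a SCAN of the table, while the indexed plan reports a SEARCH using the index; the exact wording varies between SQLite versions.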
Next up: hashing! Here we compare MD5 and bcrypt. With MD5 you can hash quite a lot of data in 1 second; with bcrypt you can't.
Guess how many bytes the program below hashes per second:
    #!/usr/bin/env python

    # Number to guess: How many bytes can we md5sum in a second?

    import hashlib

    CHUNK_SIZE = 10000
    s = 'a' * CHUNK_SIZE

    def f(NUMBER):
        bytes_hashed = 0
        h = hashlib.md5()
        while bytes_hashed < NUMBER:
            h.update(s)
            bytes_hashed += CHUNK_SIZE
        h.digest()

    import sys
    f(int(sys.argv[1]))
Exact answer: 455,000,000
Guess how many passwords the program below hashes per second:
    #!/usr/bin/env python

    # Number to guess: How many passwords
    # can we bcrypt in a second?

    import bcrypt

    password = 'a' * 100

    def f(NUMBER):
        for _ in xrange(NUMBER):
            bcrypt.hashpw(password, bcrypt.gensalt())

    import sys
    f(int(sys.argv[1]))
Exact answer: 3
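To get a feel for the contrast using only the Python 3 standard library, the sketch below times MD5 over a stream of bytes and, as a stand-in for bcrypt, a deliberately slow password hash built with `hashlib.pbkdf2_hmac`. PBKDF2 is a different algorithm from bcrypt, and the iteration count of 100,000 is an arbitrary assumption, so the second number only illustrates the idea of an intentionally expensive hash.

```python
import hashlib
import os
import time


def md5_bytes_per_second(total_bytes: int, chunk_size: int = 10_000) -> float:
    """Feed chunks of 'a' bytes to MD5 and return bytes hashed per second."""
    chunk = b"a" * chunk_size
    digest = hashlib.md5()
    hashed = 0
    start = time.perf_counter()
    while hashed < total_bytes:
        digest.update(chunk)
        hashed += chunk_size
    digest.digest()
    return hashed / max(time.perf_counter() - start, 1e-9)


def slow_hashes_per_second(count: int, iterations: int = 100_000) -> float:
    """Hash a 100-byte password `count` times with PBKDF2 (stand-in for bcrypt)."""
    password = b"a" * 100
    start = time.perf_counter()
    for _ in range(count):
        salt = os.urandom(16)  # fresh random salt per password
        hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    return count / max(time.perf_counter() - start, 1e-9)


if __name__ == "__main__":
    print(f"MD5:    {md5_bytes_per_second(50_000_000):,.0f} bytes/second")
    print(f"PBKDF2: {slow_hashes_per_second(3):,.1f} passwords/second")
```

Password hashes are slow on purpose: the work factor (iterations here, the cost parameter in bcrypt) is tuned so that guessing many passwords becomes expensive.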
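Next, let's talk about memory access. Modern CPUs have L1 and L2 caches, which are much faster to access than main memory. This means that accessing memory sequentially usually gives you faster code than accessing it out of order.
Guess how many bytes the program below writes to memory per second: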
    #include <stdlib.h>
    #include <stdio.h>

    // Number to guess: How big of an array (in bytes)
    // can we allocate and fill in a second?

    // this is intentionally more complicated than it needs to be
    // so that it matches the out-of-order version

    int main(int argc, char **argv) {
        int NUMBER, i;
        NUMBER = atoi(argv[1]);

        char* array = malloc(NUMBER);
        int j = 1;
        for (i = 0; i < NUMBER; ++i) {
            j = j * 2;
            if (j > NUMBER) {
                j = j - NUMBER;
            }
            array[i] = j;
        }

        printf("%d", array[NUMBER / 7]); // so that -O2 doesn't optimize out the loop
        return 0;
    }
Exact answer: 376,000,000
Guess how many bytes the program below writes to memory per second:
    #include <stdlib.h>
    #include <stdio.h>

    // Number to guess: How big of an array (in bytes)
    // can we allocate and fill in a second?

    // The catch: We do it out of order instead of in order.

    int main(int argc, char **argv) {
        int NUMBER, i;
        NUMBER = atoi(argv[1]);

        // One extra byte: j can equal NUMBER, which would otherwise
        // write one past the end of the array.
        char* array = malloc(NUMBER + 1);
        int j = 1;
        for (i = 0; i < NUMBER; ++i) {
            j = j * 2;
            if (j > NUMBER) {
                j = j - NUMBER;
            }
            array[j] = j;
        }

        printf("%d", array[NUMBER / 7]); // so that -O2 doesn't optimize out the loop
        return 0;
    }
Exact answer: 68,000,000
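To see what "out of order" means in the second program, it helps to list the indices it writes to. The Python 3 sketch below replays the index arithmetic of the C loop (`j = j * 2`, then subtract NUMBER when `j > NUMBER`) for a tiny NUMBER. It only models the access pattern and does not measure cache behaviour; the function name is made up for illustration.

```python
def out_of_order_indices(number: int) -> list[int]:
    """Replay the index sequence written by the out-of-order C loop."""
    indices = []
    j = 1
    for _ in range(number):
        j *= 2
        if j > number:
            j -= number
        indices.append(j)
    return indices


if __name__ == "__main__":
    print("in order:    ", list(range(10)))
    print("out of order:", out_of_order_indices(10))
    # prints: out of order: [2, 4, 8, 6, 2, 4, 8, 6, 2, 4]
```

Two things stand out even at this size. The index jumps around instead of advancing one byte at a time, which is what defeats the cache for large arrays. And because the test is `j > NUMBER` rather than `j >= NUMBER`, the index can land exactly on NUMBER (for example when NUMBER is 8), which is one position past the end of a NUMBER-byte array.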
The text and images in this article come from www.codeceo.com