Hi everyone, I'm 程序员晚枫 (Programmer Wanfeng), a developer working hands-on with all kinds of AI projects.

The golden rule of performance optimization: measure first, then optimize. Optimization without measurement is guesswork.

*Fluent Python, 2nd Edition* devotes a full chapter to performance. This lesson walks you from measurement to optimization and builds a complete Python performance-tuning workflow.
🔍 Step 1: Measure — find the real bottleneck

timeit: precisely time small snippets

```python
import timeit

setup = "data = list(range(1000))"

snippets = {
    "for loop": "result = []\nfor x in data:\n    result.append(x**2)",
    "list comprehension": "result = [x**2 for x in data]",
    "map": "result = list(map(lambda x: x**2, data))",
}

for name, code in snippets.items():
    t = timeit.timeit(code, setup=setup, number=10000)
    print(f"{name}: {t:.4f}s")
```
cProfile: function-level profiling

```python
import cProfile
import io
import pstats

def read_file(filename: str) -> list[str]:
    """Read every line of a file."""
    with open(filename) as f:
        return f.readlines()

def count_words(lines: list[str]) -> dict[str, int]:
    """Count word frequencies."""
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def process(filename: str) -> dict[str, int]:
    lines = read_file(filename)
    return count_words(lines)

profiler = cProfile.Profile()
profiler.enable()
result = process("large_file.txt")
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative')
stats.print_stats(20)
print(stream.getvalue())

# Or profile a whole script from the command line:
#   python -m cProfile -s cumulative script.py
```
line_profiler: line-level profiling

```bash
pip install line_profiler
```
```python
# Decorate the hot function with @profile (injected by kernprof),
# then run: kernprof -l -v script.py
@profile
def count_words_v1(lines: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```
memory_profiler: memory usage analysis

```bash
pip install memory_profiler
```
```python
from memory_profiler import profile

@profile
def load_data():
    # Loads the whole file into memory at once
    data = [line for line in open('large_file.txt')]
    return data

@profile
def load_data_generator():
    # Returns a lazy generator instead; memory stays flat
    return (line for line in open('large_file.txt'))

# Run the script normally; each decorated call prints
# a line-by-line memory report.
```
⚡ Step 2: Optimize — use the right tool for the bottleneck

1. Choosing the right data structure

```python
import time
from collections import Counter, defaultdict, deque

n = 100_000

# Inserting at the head of a list is O(n) per insert
start = time.time()
lst = []
for i in range(n):
    lst.insert(0, i)
print(f"list head insert: {time.time()-start:.3f}s")

# deque.appendleft is O(1)
start = time.time()
dq = deque()
for i in range(n):
    dq.appendleft(i)
print(f"deque head insert: {time.time()-start:.3f}s")

words = ["apple", "banana", "apple", "cherry", "banana", "apple"] * 10000

start = time.time()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1
print(f"dict.get: {time.time()-start:.4f}s")

start = time.time()
counts = defaultdict(int)
for word in words:
    counts[word] += 1
print(f"defaultdict: {time.time()-start:.4f}s")

start = time.time()
counts = Counter(words)
print(f"Counter: {time.time()-start:.4f}s")
```
2. String operations

```python
import time

n = 10000
parts = [str(i) for i in range(n)]

# Repeated += builds a new string on every iteration
start = time.time()
result = ""
for part in parts:
    result += part
print(f"+= concatenation: {time.time()-start:.4f}s")

# str.join builds the result in a single pass
start = time.time()
result = "".join(parts)
print(f"join: {time.time()-start:.4f}s")

# Three formatting styles; f-strings are the fastest and most readable
name, age = "Alice", 25
result1 = "name: %s, age: %d" % (name, age)
result2 = "name: {}, age: {}".format(name, age)
result3 = f"name: {name}, age: {age}"
```
3. __slots__ to save memory

```python
import sys

class PersonWithDict:
    """Regular class: attributes live in a per-instance __dict__."""
    def __init__(self, name, age, email):
        self.name = name
        self.age = age
        self.email = email

class PersonWithSlots:
    """With __slots__: fixed attributes, no __dict__, less memory."""
    __slots__ = ['name', 'age', 'email']
    def __init__(self, name, age, email):
        self.name = name
        self.age = age
        self.email = email

p1 = PersonWithDict("Alice", 25, "alice@example.com")
p2 = PersonWithSlots("Alice", 25, "alice@example.com")
print(f"with __dict__: {sys.getsizeof(p1.__dict__)} bytes")
print(f"with __slots__: {sys.getsizeof(p2)} bytes")

import tracemalloc
tracemalloc.start()

objs_dict = [PersonWithDict("name", i, "email") for i in range(100000)]
current, peak = tracemalloc.get_traced_memory()
print(f"PersonWithDict peak memory: {peak / 1024 / 1024:.1f} MB")

del objs_dict            # free the first batch so the second peak is clean
tracemalloc.reset_peak()
objs_slots = [PersonWithSlots("name", i, "email") for i in range(100000)]
current, peak = tracemalloc.get_traced_memory()
print(f"PersonWithSlots peak memory: {peak / 1024 / 1024:.1f} MB")
```
4. Generator vs list: memory-sensitive scenarios

```python
import sys

lst = [x ** 2 for x in range(1_000_000)]
print(f"list memory: {sys.getsizeof(lst) / 1024 / 1024:.1f} MB")

gen = (x ** 2 for x in range(1_000_000))
print(f"generator memory: {sys.getsizeof(gen)} bytes")

def process_large_file(filename: str):
    """Process a big file with a generator; memory stays flat."""
    with open(filename) as f:
        for line in f:
            yield line.strip()

# Aggregations can consume a generator directly, no intermediate list
total = sum(x**2 for x in range(1_000_000))
```
5. Caching in a local variable: fewer attribute lookups

```python
import time

data = list(range(100_000))

start = time.time()
result = []
for item in data:
    result.append(item * 2)   # looks up result.append on every iteration
t1 = time.time() - start

start = time.time()
result = []
append = result.append       # bind the method once
for item in data:
    append(item * 2)
t2 = time.time() - start

print(f"original: {t1:.4f}s, optimized: {t2:.4f}s, speedup: {t1/t2:.1f}x")
```
🔥 Step 3: Extreme optimization — beyond plain Python speed

NumPy vectorization: 10-100x faster numeric code

```python
import time
import numpy as np

n = 1_000_000

start = time.time()
data = list(range(n))
result = [x ** 2 for x in data]
print(f"Python list comprehension: {time.time()-start:.3f}s")

start = time.time()
arr = np.arange(n)
result = arr ** 2
print(f"NumPy vectorized: {time.time()-start:.3f}s")

a = np.random.rand(n)
b = np.random.rand(n)

start = time.time()
c = [a[i] * b[i] for i in range(n)]
print(f"Python element-wise multiply: {time.time()-start:.3f}s")

start = time.time()
c = a * b
print(f"NumPy vectorized multiply: {time.time()-start:.4f}s")
```
lru_cache: eliminate repeated computation

```python
import functools
import time

def fib_slow(n: int) -> int:
    if n < 2:
        return n
    return fib_slow(n - 1) + fib_slow(n - 2)

@functools.lru_cache(maxsize=None)
def fib_fast(n: int) -> int:
    if n < 2:
        return n
    return fib_fast(n - 1) + fib_fast(n - 2)

start = time.time()
result = fib_slow(35)
print(f"uncached fib(35) = {result}, took {time.time()-start:.3f}s")

start = time.time()
result = fib_fast(35)
print(f"cached fib(35) = {result}, took {time.time()-start:.6f}s")

print(fib_fast.cache_info())
```
Lazy loading with cached properties

```python
import functools

class DataProcessor:
    def __init__(self, filename: str):
        self.filename = filename

    @functools.cached_property
    def data(self) -> list:
        """Lazy-load the data; runs only on first access."""
        print(f"loading data from {self.filename}...")
        with open(self.filename) as f:
            return f.readlines()

    @functools.cached_property
    def word_count(self) -> int:
        """Word count; depends on data, also computed only once."""
        return sum(len(line.split()) for line in self.data)

processor = DataProcessor("large.txt")
print(processor.word_count)   # loads the file, computes, caches
print(processor.word_count)   # returns the cached value instantly
```
📈 The performance optimization workflow

```
1. Define the goal (speed? memory?)
        ↓
2. Write a baseline benchmark (timeit / time)
        ↓
3. Run cProfile to find hot functions
        ↓
4. Use line_profiler to pinpoint the hot lines
        ↓
5. Pick an optimization strategy:
   • algorithm / data structure (highest payoff)
   • NumPy vectorization (numeric code)
   • lru_cache (repeated computation)
   • __slots__ (memory)
   • C extensions / Cython (last resort)
        ↓
6. Re-run the benchmark to verify the win
        ↓
7. Review readability; weigh the added complexity
```
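The workflow can be sketched end to end on a toy problem (the functions below are made-up examples for illustration): baseline a head-insert loop, swap in a better data structure, then re-measure and verify the behavior is unchanged.

```python
import timeit
from collections import deque

def build_slow(n):
    out = []
    for i in range(n):
        out.insert(0, i)      # O(n) per insert -> O(n^2) overall
    return out

def build_fast(n):
    out = deque()
    for i in range(n):
        out.appendleft(i)     # O(1) per insert
    return list(out)

# Verify the optimized version behaves identically
assert build_slow(1000) == build_fast(1000)

# Benchmark before and after the optimization
t_slow = timeit.timeit(lambda: build_slow(5000), number=10)
t_fast = timeit.timeit(lambda: build_fast(5000), number=10)
print(f"list.insert(0): {t_slow:.4f}s, deque.appendleft: {t_fast:.4f}s")
```

The equality assertion is the often-forgotten half of step 6: a speedup that changes the output is a bug, not an optimization.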
⚠️ Common pitfalls

1. Premature optimization

```python
# "Clever" but hard to read
def clever_but_unreadable(data):
    return {k: v for k, v in zip(data[::2], data[1::2])}

# Clear and maintainable; optimize only if profiling says you must
def clear_and_readable(data):
    result = {}
    for i in range(0, len(data), 2):
        result[data[i]] = data[i + 1]
    return result
```
2. Optimizing locally while ignoring the big picture

3. Forgetting that I/O, not CPU, is usually the real bottleneck

🎯 Lesson summary

Measurement tools:
- timeit: precisely time small snippets
- cProfile: function-level hotspot analysis (start here)
- line_profiler: line-level analysis (once a hotspot is confirmed)
- memory_profiler: memory usage analysis

Common optimization techniques (in rough order of payoff):
- Pick the right data structure: deque / Counter / defaultdict
- NumPy vectorization: 10-100x faster numeric code
- lru_cache / cached_property: eliminate repeated computation
- Generators: memory savings on large data sets
- __slots__: saves 30-50% memory when you have many instances
- str.join instead of += concatenation
- Local variable caching: fewer attribute lookups

Workflow: measure → find the bottleneck → optimize the hotspot → measure again to verify
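To make pitfall 3 above concrete, here is a minimal sketch that writes a throwaway file (the name `io_demo.txt` is chosen just for this example, and the file is deleted at the end) and times the disk read separately from the in-memory processing. Exact numbers depend on your disk and OS cache, but the point is that you must measure the I/O share before tuning the CPU side.

```python
import os
import time

path = "io_demo.txt"  # throwaway file, removed at the end
with open(path, "w") as f:
    for i in range(200_000):
        f.write(f"line {i} with some words\n")

# Time the I/O on its own
start = time.time()
with open(path) as f:
    lines = f.readlines()
io_time = time.time() - start

# Time the pure in-memory processing on its own
start = time.time()
total = sum(len(line.split()) for line in lines)
cpu_time = time.time() - start

print(f"I/O: {io_time:.4f}s, processing: {cpu_time:.4f}s, words: {total}")
os.remove(path)
```

If the I/O time dominates, no amount of loop micro-optimization will move the total; the fix lives at the I/O layer (buffering, batching, caching, or a faster format).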
📚 Recommended reading: *Python Crash Course, 3rd Edition* | *Fluent Python, 2nd Edition* | *CPython Internals*

Learning path: complete beginner → *Python Crash Course* → *Fluent Python* → this course → *CPython Internals*
🎓 Join the *Fluent Python* live reading group
If you've come this far and want to systematically digest this book, come join my live reading course.

- Weekly live sessions, unpacking the key ideas chapter by chapter
- A dedicated study group for questions and discussion anytime
- Trial-run special: ¥499 → ¥299

👉 Sign up for the *Fluent Python* reading course: https://mp.weixin.qq.com/s/ivHJwn1nNx5ug4TFrapvGg
🔗 Course navigation: ← Previous: Async Programming | Next: Best Practices →

💬 Contact me
Services: AI programming training, corporate training, technical consulting

🎓 Hands-on AI programming course
Want to learn AI programming systematically? 程序员晚枫's hands-on AI programming course takes you from zero to shipping!