Hi everyone, I'm 程序员晚枫 (Wanfeng), a programmer working hands-on with all kinds of AI projects.
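Can you process a million records while holding only a few KB in memory? Can you build a data processing pipeline whose code reads like an assembly line? Generators make both possible.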
🔄 Generator functions: what yield really does

Regular functions vs. generator functions

```python
import sys


def squares_list(n: int) -> list[int]:
    # Builds the whole list in memory before returning it
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result


def squares_gen(n: int):
    # Produces one value at a time, only when asked
    for i in range(n):
        yield i ** 2


N = 1_000_000
lst = squares_list(N)
gen = squares_gen(N)

print(f"List memory: {sys.getsizeof(lst):,} bytes")      # several MB for the list object alone
print(f"Generator memory: {sys.getsizeof(gen)} bytes")   # a couple hundred bytes

for x in squares_gen(10):
    print(x, end=" ")  # 0 1 4 9 16 25 36 49 64 81
```
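How yield executes

```python
def step_by_step():
    print("Step 1")
    yield 1
    print("Step 2")
    yield 2
    print("Step 3")


gen = step_by_step()     # creates the generator; no body code runs yet

value = next(gen)        # prints "Step 1", then pauses at the first yield
print(f"Got: {value}")   # Got: 1

value = next(gen)        # resumes, prints "Step 2", pauses at the second yield
print(f"Got: {value}")   # Got: 2

try:
    next(gen)            # prints "Step 3", then the function ends
except StopIteration:
    print("Generator exhausted")
```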
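To make the pause-and-resume behaviour visible, the following sketch checks a generator's state at each step with the standard library's `inspect.getgeneratorstate`. The `countdown` generator is a hypothetical example introduced here for illustration.

```python
import inspect


def countdown(n: int):
    """Hypothetical example generator: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(2)
states = [inspect.getgeneratorstate(gen)]      # created, body has not started

first = next(gen)                              # runs up to the first yield
states.append(inspect.getgeneratorstate(gen))  # paused at a yield

remaining = list(gen)                          # drains the remaining values
states.append(inspect.getgeneratorstate(gen))  # finished

print(first, remaining)
print(states)
```

The three recorded states should be `GEN_CREATED`, `GEN_SUSPENDED`, and `GEN_CLOSED`, matching the three phases walked through above.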
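Generator expressions

```python
import sys

lst = [x ** 2 for x in range(1_000_000)]   # list comprehension: builds everything up front
gen = (x ** 2 for x in range(1_000_000))   # generator expression: computes on demand

print(sys.getsizeof(lst))  # several MB
print(sys.getsizeof(gen))  # a couple hundred bytes

# When a generator expression is the sole argument, the extra parentheses can be dropped
total = sum(x ** 2 for x in range(1000))
max_val = max(x for x in [3, 1, 4, 1, 5, 9])
```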
📡 yield from: delegating to a subgenerator

Basic usage

```python
def sub_generator():
    yield 1
    yield 2
    yield 3


def main_generator():
    yield "start"
    yield from sub_generator()   # delegate to another generator
    yield from range(10, 13)     # works with any iterable
    yield "end"


print(list(main_generator()))
# ['start', 1, 2, 3, 10, 11, 12, 'end']
```
Flattening nested structures

```python
from typing import Any, Iterator


def flatten(nested: Any) -> Iterator:
    """Recursively flatten lists and tuples nested to any depth."""
    if isinstance(nested, (list, tuple)):
        for item in nested:
            yield from flatten(item)
    else:
        yield nested


data = [1, [2, 3], [4, [5, [6, 7]]], 8]
print(list(flatten(data)))  # [1, 2, 3, 4, 5, 6, 7, 8]


# The same logic without yield from needs an explicit inner loop
def flatten_manual(nested):
    for item in nested:
        if isinstance(item, (list, tuple)):
            for sub in flatten_manual(item):
                yield sub
        else:
            yield item
```
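Passing values with yield from

yield from also forwards values sent with send() to the subgenerator. The example below shows the receiving side on its own: a generator that takes values through send().

```python
def accumulator():
    """Receive values sent from outside through yield."""
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value


gen = accumulator()
next(gen)              # prime the generator: run to the first yield, which produces 0
gen.send(10)           # returns 10
gen.send(20)           # returns 30
result = gen.send(5)
print(result)          # 35
```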
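As a minimal sketch of the forwarding itself, the example below puts a delegating generator between the caller and a subgenerator. The names `averager` and `collector` are hypothetical and chosen only for this illustration. Values passed to `send()` on the outer generator reach the inner one, and the inner generator's `return` value becomes the value of the `yield from` expression.

```python
def averager():
    """Subgenerator: receives numbers via send(), returns their mean when sent None."""
    total, count = 0.0, 0
    while True:
        value = yield
        if value is None:
            break
        total += value
        count += 1
    return total / count if count else None


def collector(results: list):
    """Delegating generator: forwards send() to averager and stores its return value."""
    while True:
        results.append((yield from averager()))


results = []
gen = collector(results)
next(gen)              # prime: runs into averager's first yield
for v in (10, 20, 30):
    gen.send(v)        # forwarded to averager by yield from
gen.send(None)         # averager returns its mean; collector starts a fresh averager
gen.close()
print(results)         # [20.0]
```

Note that the caller never touches `averager` directly: `collector` is only a conduit, which is what makes `yield from` more than shorthand for a `for` loop.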
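🏭 Data processing pipelines: the killer application for generators

One of the strongest use cases for generators is building lazy data processing pipelines: every stage is a generator, and data streams through one record at a time: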
```python
from typing import Iterator


def read_lines(filename: str) -> Iterator[str]:
    """Read a file as a stream instead of loading it all at once."""
    with open(filename, encoding='utf-8') as f:
        for line in f:
            yield line.rstrip('\n')


def filter_empty(lines: Iterator[str]) -> Iterator[str]:
    """Drop blank lines."""
    for line in lines:
        if line.strip():
            yield line


def strip_comments(lines: Iterator[str]) -> Iterator[str]:
    """Drop comment lines."""
    for line in lines:
        if not line.startswith('#'):
            yield line


def parse_fields(lines: Iterator[str]) -> Iterator[dict]:
    """Parse fields (assumes CSV rows of name, age, email with no header row)."""
    for line in lines:
        fields = line.split(',')
        if len(fields) >= 3:
            yield {
                'name': fields[0].strip(),
                'age': int(fields[1].strip()),  # raises ValueError if age is not numeric
                'email': fields[2].strip(),
            }


def filter_by_age(records: Iterator[dict], min_age: int) -> Iterator[dict]:
    """Keep records whose age is at least min_age."""
    for record in records:
        if record['age'] >= min_age:
            yield record


def process_users(filename: str) -> Iterator[dict]:
    # Assemble the pipeline; nothing is read until the result is iterated
    pipeline = read_lines(filename)
    pipeline = filter_empty(pipeline)
    pipeline = strip_comments(pipeline)
    pipeline = parse_fields(pipeline)
    pipeline = filter_by_age(pipeline, 18)
    return pipeline


# Only one line of the file is in flight at any moment
for user in process_users('users.csv'):
    print(user)
```
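The claim that data "streams" through the stages can be checked without a file. The sketch below uses two hypothetical stages, `source` and `double`, that record each step in a list, so the order of operations is observable. It is an illustration of the laziness, not part of the pipeline above.

```python
from typing import Iterable, Iterator

log: list[str] = []


def source(items: Iterable[int]) -> Iterator[int]:
    """Hypothetical first stage: records each item as it is read."""
    for item in items:
        log.append(f"read {item}")
        yield item


def double(items: Iterable[int]) -> Iterator[int]:
    """Hypothetical second stage: records each item as it is transformed."""
    for item in items:
        log.append(f"double {item}")
        yield item * 2


pipeline = double(source([1, 2, 3]))
log_after_build = list(log)      # still empty: building the pipeline runs nothing

first = next(pipeline)           # pulls exactly one item through both stages
log_after_first = list(log)

rest = list(pipeline)            # pulls the remaining items, one at a time
print(log)
```

The log interleaves `read` and `double` entries item by item. The first stage never runs ahead of the second, which is why memory use stays flat regardless of input size.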
itertools is the Python standard library's toolkit of iterator building blocks, and every one of them is lazy:
```python
import itertools

# count: infinite counter
for i in itertools.count(10, 2):
    if i > 20:
        break
    print(i, end=" ")  # 10 12 14 16 18 20

# cycle: repeat a sequence forever
colors = itertools.cycle(['red', 'green', 'blue'])
for _ in range(7):
    print(next(colors), end=" ")  # red green blue red green blue red

# chain: join several iterables end to end
result = list(itertools.chain([1, 2], [3, 4], [5, 6]))
print(result)  # [1, 2, 3, 4, 5, 6]

# islice: slice an iterator without materializing it
gen = (x**2 for x in range(1_000_000))
first_10 = list(itertools.islice(gen, 10))
print(first_10)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# takewhile: stop at the first item that fails the predicate
data = [1, 2, 3, 10, 4, 5]
result = list(itertools.takewhile(lambda x: x < 5, data))
print(result)  # [1, 2, 3]

# groupby: group consecutive items, so sort by the key first
data = [
    {'name': 'Zhang San', 'dept': 'Engineering'},
    {'name': 'Li Si', 'dept': 'Product'},
    {'name': 'Wang Wu', 'dept': 'Engineering'},
    {'name': 'Zhao Liu', 'dept': 'Product'},
]
data.sort(key=lambda x: x['dept'])
for dept, members in itertools.groupby(data, key=lambda x: x['dept']):
    print(f"{dept}: {[m['name'] for m in members]}")

# combinations and permutations
print(list(itertools.combinations('ABC', 2)))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(list(itertools.permutations('AB', 2)))   # [('A', 'B'), ('B', 'A')]
```
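These tools combine well with your own generators. As one possible sketch, the hypothetical helper `chunked` below uses `itertools.islice` to split any iterable into fixed-size batches without materializing the whole input. Python 3.12 and later ship `itertools.batched` for the same job (it yields tuples); the version here only assumes Python 3.8 or later.

```python
from itertools import islice
from typing import Iterable, Iterator


def chunked(iterable: Iterable, size: int) -> Iterator[list]:
    """Hypothetical helper: yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(iterable)
    # islice pulls up to `size` items per pass; an empty list means the input is done
    while chunk := list(islice(it, size)):
        yield chunk


squares = (x ** 2 for x in range(7))
batches = list(chunked(squares, 3))
print(batches)  # [[0, 1, 4], [9, 16, 25], [36]]
```

A helper like this could slot into a pipeline as one more stage, for example to write records to a database in batches.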
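💡 Generator coroutines: send and throw

Generators can do more than produce data. With send() they can also receive data, which gives you two-way communication: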
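```python
from typing import Generator, Optional


def running_average() -> Generator[Optional[float], Optional[float], None]:
    """Coroutine that keeps a running average of the values sent to it."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # hand back the current average, wait for the next value
        if value is None:
            break
        total += value
        count += 1
        average = total / count


coro = running_average()
next(coro)              # prime the coroutine: run to the first yield

print(coro.send(10))    # 10.0
print(coro.send(20))    # 15.0
print(coro.send(30))    # 20.0
print(coro.send(40))    # 25.0

coro.close()            # shut the coroutine down
```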
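The other half of the heading, `throw()`, raises an exception inside the generator at the point where it is paused. A rough sketch, assuming a hypothetical `ResetAverage` exception used as a signal: the coroutine catches it and clears its state instead of terminating.

```python
class ResetAverage(Exception):
    """Hypothetical signal: tells the coroutine to discard its running state."""


def resettable_average():
    """Running average that can be reset from outside via throw()."""
    total, count, average = 0.0, 0, None
    while True:
        try:
            value = yield average
        except ResetAverage:
            # The exception surfaces at the paused yield; handle it and keep running
            total, count, average = 0.0, 0, None
            continue
        total += value
        count += 1
        average = total / count


coro = resettable_average()
next(coro)                        # prime the coroutine
a = coro.send(10)                 # 10.0
b = coro.send(30)                 # 20.0
c = coro.throw(ResetAverage())    # handled inside; the next yield produces None
d = coro.send(4)                  # 4.0, computed from fresh state
coro.close()
print(a, b, c, d)
```

If the generator does not catch the thrown exception, it propagates back to the caller of `throw()` and the generator is finished.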
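⚠️ Common pitfalls

1. A generator can only be iterated once

```python
gen = (x**2 for x in range(5))
print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] because the generator is already exhausted

# Fix: if you need several passes, build a fresh generator and store its items in a list
data = list(x**2 for x in range(5))
```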
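2. Forgetting to prime the coroutine (run it to the first yield)

```python
def my_coroutine():
    while True:
        value = yield
        print(f"Got: {value}")


coro = my_coroutine()
# Calling coro.send(42) at this point raises:
# TypeError: can't send non-None value to a just-started generator

next(coro)      # prime it first: run to the first yield
coro.send(42)   # Got: 42
```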
3. The return value travels inside StopIteration

```python
def gen_with_return():
    yield 1
    yield 2
    return "final value"


gen = gen_with_return()
next(gen)  # 1
next(gen)  # 2
try:
    next(gen)
except StopIteration as e:
    print(e.value)  # final value


def delegating():
    # yield from evaluates to the subgenerator's return value
    result = yield from gen_with_return()
    print(f"Subgenerator returned: {result}")
```
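🎯 Summary of this lesson

Generator functions: use yield for lazy evaluation, computing values on demand and saving memory.
Generator expressions: (expr for x in iterable) is the most concise way to create a generator.
yield from: delegates to a subgenerator, passes values and exceptions through automatically, and supports recursive nesting.
Data pipelines: chain several generators together to build a lazy processing pipeline, well suited to large datasets.
itertools: the standard library's set of lazy tools; chain, islice, and groupby are the ones you will reach for most.
Coroutines (generator-based): send() enables two-way communication; asyncio's coroutines evolved from this mechanism.
Key distinctions: an iterator implements __iter__ and __next__; a generator is a convenient way to create an iterator; coroutines are the foundation of asynchronous code.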
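The iterator versus generator distinction is easiest to see side by side. In this sketch, `Countdown` and `countdown` are hypothetical examples: the class spells out the iterator protocol by hand, and the generator function gets the same behaviour with Python supplying `__iter__` and `__next__`.

```python
class Countdown:
    """Hand-written iterator: implements __iter__ and __next__ explicitly."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start: int):
    """Generator version: the same sequence, with the protocol provided by Python."""
    while start > 0:
        yield start
        start -= 1


from_class = list(Countdown(3))
from_gen = list(countdown(3))
gen = countdown(1)

print(from_class, from_gen)                         # [3, 2, 1] [3, 2, 1]
print(iter(gen) is gen, hasattr(gen, "__next__"))   # True True
```

The last line confirms that a generator object is itself an iterator: calling `iter()` on it returns the same object.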
📚 Recommended books

Python Crash Course (3rd edition) | Fluent Python (2nd edition) | CPython Internals
Learning path: complete beginner → Python Crash Course → Fluent Python → this course → CPython Internals