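Hi everyone, I'm 程序员晚枫, a programmer working hands-on with all kinds of AI projects.
Do you know what gives programmers the biggest headache? String handling.
Strip the whitespace from user input, validate an email format, check a file extension, join and replace text... The code ends up as scattered fragments, and you wear yourself out searching the docs.
Today I've collected the 20-plus Python string methods I use most often, each paired with a practical scenario, so you can learn them in one sitting and stop looking them up.
💡 Scenario preview: you have a piece of user-entered text and need to strip whitespace, lowercase it, validate its format, extract keywords... With these methods, each step is a one-liner.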
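1. Case conversion (5 methods)
Basic conversions
```python
text = "Hello World Python"

print(text.upper())       # HELLO WORLD PYTHON
print(text.lower())       # hello world python
print(text.title())       # Hello World Python
print(text.capitalize())  # Hello world python
print(text.swapcase())    # hELLO wORLD pYTHON
```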
Practice: case-insensitive confirmation prompt
```python
# Normalize the answer so "YES", " Yes " and "yes" are all accepted
answer = input("Continue? ").strip().lower()

if answer == "yes":
    print("✅ Continuing")
else:
    print("❌ Invalid input")
```
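Practice: formatting names with leading capitals
```python
def format_name(name):
    """Normalize a user-entered name."""
    return name.strip().title()


names = ["  alice  ", "BOB", "charlie smith", "DR. JOHN DOE"]
for name in names:
    print(f"'{name}' -> '{format_name(name)}'")
# '  alice  ' -> 'Alice'
# 'BOB' -> 'Bob'
# 'charlie smith' -> 'Charlie Smith'
# 'DR. JOHN DOE' -> 'Dr. John Doe'
```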
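2. Searching and counting (4 methods)
Basic usage
```python
text = "Python is easy, Python is powerful, Python is fun!"

# find() returns the index of the first match, or -1 if there is none
print(text.find("Python"))       # 0
print(text.find("Java"))         # -1

# Start searching at index 10
print(text.find("Python", 10))   # 16

# rfind() searches from the right
print(text.rfind("Python"))      # 36

# count() returns the number of non-overlapping matches
print(text.count("Python"))      # 3

# index() works like find() but raises ValueError when nothing matches
print(text.index("Python"))      # 0
```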
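The difference between `find()` and `index()` matters most when the substring is missing. The sketch below wraps both behaviours; the helper names `safe_find` and `strict_find` are made up for this illustration. It also shows why `if text.find(sub):` is a trap: a match at position 0 is falsy, while the "not found" value -1 is truthy.

```python
def safe_find(text, sub):
    """Return the position of sub in text, or None when it is absent."""
    pos = text.find(sub)          # find() signals "not found" with -1
    return pos if pos != -1 else None


def strict_find(text, sub):
    """Same lookup via index(), which raises ValueError instead of returning -1."""
    try:
        return text.index(sub)
    except ValueError:
        return None


text = "Python is easy, Python is powerful, Python is fun!"

print(safe_find(text, "Java"))       # None
print(strict_find(text, "Java"))     # None
print(safe_find(text, "powerful"))   # 26

# Trap: a match at index 0 is falsy, so test membership with `in` instead
print(bool(text.find("Python")))     # False, although the text starts with "Python"
print("Python" in text)              # True
```

When you only need a yes/no answer, `in` is usually the clearest choice; reach for `find()` or `index()` when the position itself is needed.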
Practice: log analysis
```python
log = """[2024-01-01 10:00:00] INFO: service started
[2024-01-01 10:00:01] ERROR: database connection failed
[2024-01-01 10:00:02] INFO: retrying connection
[2024-01-01 10:00:03] ERROR: database connection failed
[2024-01-01 10:00:04] ERROR: database connection failed
[2024-01-01 10:00:05] INFO: connected"""

# Count how many times ERROR appears
error_count = log.count("ERROR")
print(f"Error count: {error_count}")   # Error count: 3

# Collect the lines that contain an error
lines = log.split("\n")
error_lines = [line for line in lines if "ERROR" in line]
print("All errors:")
for line in error_lines:
    print(f"  {line}")
```
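3. Replacing (1 method)
Basic usage
```python
text = "Hello World, Hello Python"

# Replace every occurrence
print(text.replace("Hello", "Hi"))      # Hi World, Hi Python

# Replace only the first occurrence
print(text.replace("Hello", "Hi", 1))   # Hi World, Hello Python

# Calls can be chained
text = "Apples cost 10 yuan, bananas cost 20 yuan"
text = text.replace("Apples", "Oranges").replace("10", "15").replace("20", "25")
print(text)   # Oranges cost 15 yuan, bananas cost 25 yuan
```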
Practice: filtering sensitive words
```python
def filter_sensitive(text, sensitive_words, replacement="*"):
    """Mask every sensitive word with the replacement character."""
    for word in sensitive_words:
        text = text.replace(word, replacement * len(word))
    return text


content = "This product is terrible, absolute garbage!"
sensitive = ["terrible", "garbage"]
result = filter_sensitive(content, sensitive)
print(result)   # This product is ********, absolute *******!
```
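`replace()` is case-sensitive, so the `filter_sensitive` function above would miss a word typed in a different case. One possible workaround, sketched here with the standard `re` module, is to build a single case-insensitive pattern. The function name `filter_sensitive_ci` and the sample sentence are made up for this illustration.

```python
import re


def filter_sensitive_ci(text, sensitive_words, replacement="*"):
    """Mask sensitive words regardless of letter case."""
    if not sensitive_words:
        return text
    # Escape each word so regex metacharacters are matched literally,
    # and try longer words first so they win over shorter overlapping ones.
    ordered = sorted(sensitive_words, key=len, reverse=True)
    pattern = "|".join(re.escape(word) for word in ordered)
    # The callback masks each match with as many characters as it had.
    return re.sub(
        pattern,
        lambda match: replacement * len(match.group()),
        text,
        flags=re.IGNORECASE,
    )


print(filter_sensitive_ci("This is GARBAGE, total garbage!", ["garbage"]))
# This is *******, total *******!
```

The plain `replace()` loop remains the simpler choice when the word list is short and case does not matter.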
4. Predicate methods (9 methods)
These methods check whether a string follows a given rule and return True or False.
Basic checks
```python
# Prefix and suffix
filename = "report.pdf"
print(filename.startswith("report"))                 # True
print(filename.endswith(".pdf"))                     # True
print(filename.endswith((".pdf", ".docx", ".txt")))  # True: a tuple means "any of these"

url = "https://www.python.org"
print(url.startswith("https://"))   # True
print(url.startswith("http://"))    # False

# Character classes
print("123".isdigit())            # True
print("123a".isdigit())           # False
print("abc".isalpha())            # True
print("abc123".isalnum())         # True
print("username_123".isalnum())   # False: the underscore is neither a letter nor a digit

# Case
print("hello".islower())   # True
print("HELLO".isupper())   # True
print("Hello".istitle())   # True

# Whitespace
print("   ".isspace())     # True
print("\t\n".isspace())    # True
print("".isspace())        # False: an empty string is not whitespace
```
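One caveat about `isdigit()` before using it for validation: it also returns True for characters such as the superscript "²", which `int()` cannot parse. A minimal sketch of a stricter check using `isdecimal()` follows; the helper name `parse_int` is hypothetical.

```python
def parse_int(text):
    """Return int(text) if text contains only decimal digits, else None."""
    text = text.strip()
    # isdecimal() is stricter than isdigit(): it rejects characters like "²"
    if text.isdecimal():
        return int(text)
    return None


print("²".isdigit())      # True
print("²".isdecimal())    # False

print(parse_int(" 42 "))  # 42
print(parse_int("4²"))    # None
print(parse_int("-42"))   # None: the minus sign is not a digit
```

Note that neither method accepts a sign or a decimal point, so negative numbers and floats need separate handling, for example a `try`/`except ValueError` around `int()` or `float()`.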
Practice: validating user input
```python
def validate_user_input(username, email, password):
    """Validate registration data and return a list of error messages."""
    errors = []

    # Username: letters or digits only, 3 to 20 characters
    if not (username.isalnum() and 3 <= len(username) <= 20):
        errors.append("Username must be 3-20 letters or digits")

    # Email: a deliberately rough check, not a full address validator
    if "@" not in email or not email.endswith((".com", ".cn", ".org")):
        errors.append("Invalid email format")

    # Password: length plus character-class requirements
    if len(password) < 8:
        errors.append("Password must be at least 8 characters")
    if not any(c.isupper() for c in password):
        errors.append("Password must contain an uppercase letter")
    if not any(c.islower() for c in password):
        errors.append("Password must contain a lowercase letter")
    if not any(c.isdigit() for c in password):
        errors.append("Password must contain a digit")

    return errors


errors = validate_user_input("alice123", "alice@example.com", "Pass1234")
if errors:
    print("❌ Validation failed:")
    for e in errors:
        print(f"  - {e}")
else:
    print("✅ Validation passed")
```
Practice: detecting the file type
```python
def get_file_type(filename):
    """Classify a file by its extension."""
    ext = filename.lower()   # lowercase the whole name so ".JPG" also matches

    if ext.endswith(('.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp')):
        return "image"
    elif ext.endswith(('.mp4', '.avi', '.mov', '.mkv', '.flv')):
        return "video"
    elif ext.endswith(('.mp3', '.wav', '.flac', '.aac', '.ogg')):
        return "audio"
    elif ext.endswith(('.pdf', '.doc', '.docx', '.txt', '.rtf')):
        return "document"
    elif ext.endswith(('.zip', '.rar', '.7z', '.tar', '.gz')):
        return "archive"
    elif ext.endswith(('.py', '.js', '.java', '.cpp', '.c', '.go', '.rs')):
        return "code"
    else:
        return "unknown"


files = ["photo.jpg", "movie.mp4", "song.mp3", "report.pdf", "code.py", ".gitignore"]
for f in files:
    print(f"{f}: {get_file_type(f)}")   # .gitignore falls through to "unknown"
```
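As the list of extensions grows, the `if`/`elif` chain in `get_file_type` gets long. One alternative, shown here only as a sketch, is to extract the suffix with `pathlib` and look it up in a dictionary. The names `FILE_TYPES`, `EXT_TO_TYPE` and `get_file_type_by_suffix` are assumptions for this example, and the mapping is abbreviated.

```python
from pathlib import Path

# Abbreviated, hypothetical mapping from category to extensions
FILE_TYPES = {
    "image": {".jpg", ".jpeg", ".png", ".gif"},
    "video": {".mp4", ".avi", ".mov"},
    "audio": {".mp3", ".wav", ".flac"},
    "document": {".pdf", ".docx", ".txt"},
    "archive": {".zip", ".tar", ".gz"},
    "code": {".py", ".js", ".go"},
}

# Invert it once so every lookup is a single dictionary access
EXT_TO_TYPE = {ext: kind for kind, exts in FILE_TYPES.items() for ext in exts}


def get_file_type_by_suffix(filename):
    """Classify a file by its last extension, ignoring case."""
    suffix = Path(filename).suffix.lower()   # "" when there is no extension
    return EXT_TO_TYPE.get(suffix, "unknown")


for name in ["PHOTO.JPG", "backup.tar.gz", "code.py", "README"]:
    print(f"{name}: {get_file_type_by_suffix(name)}")
# PHOTO.JPG: image
# backup.tar.gz: archive
# code.py: code
# README: unknown
```

`Path.suffix` returns only the last extension, so `backup.tar.gz` is classified by `.gz`, the same result the `endswith()` version gives.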
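5. Stripping whitespace (3 methods)
```python
text = "  hello world  \n\t"

print(f"'{text.strip()}'")    # 'hello world'
print(f"'{text.lstrip()}'")   # leading whitespace removed, trailing kept
print(f"'{text.rstrip()}'")   # '  hello world'

# Strip specific characters instead of whitespace
print("...hello...".strip('.'))   # hello
print("!!!hi!!!".strip('!'))      # hi
print("###code###".strip('#'))    # code

# The argument is a set of characters, not a prefix or suffix
print("abcHelloabc".strip('abc'))   # Hello
```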
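Because the argument to `strip()`, `lstrip()` and `rstrip()` is a set of characters, using them to remove a fixed suffix can eat too much. The sketch below contrasts `rstrip()` with `removesuffix()`, which exists from Python 3.9 onward; the fallback helper `strip_suffix` is a hypothetical name for older versions.

```python
filename = "report.txt"

# rstrip(".txt") removes any trailing '.', 't' or 'x', including the 't' of "report"
print(filename.rstrip(".txt"))         # repor

# removesuffix() (Python 3.9+) removes the exact suffix once, if present
print(filename.removesuffix(".txt"))   # report


def strip_suffix(text, suffix):
    """Fallback for Python versions before 3.9."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(strip_suffix("report.txt", ".txt"))   # report
print(strip_suffix("report.txt", ".pdf"))   # report.txt
```

`removeprefix()` is the matching method for the start of a string.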
Practice: cleaning user input
```python
def clean_input(prompt):
    """Read a line from the user and normalize its whitespace."""
    user_input = input(prompt)

    # Remove leading and trailing whitespace
    user_input = user_input.strip()

    # Collapse runs of inner whitespace into single spaces
    user_input = " ".join(user_input.split())

    return user_input


name = clean_input("Enter your name: ")
email = clean_input("Enter your email: ")

print(f"\nName: '{name}'")
print(f"Email: '{email}'")
```
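6. Alignment and padding (4 methods)
```python
text = "Python"

# Center within a given width
print(text.center(20))         # padded with spaces on both sides
print(text.center(20, '-'))    # -------Python-------
print(text.center(11, '*'))    # an odd amount of padding is split unevenly

# Left-align
print(text.ljust(20, '*'))     # Python**************
print(text.ljust(10))          # padded with spaces on the right

# Right-align
print(text.rjust(20, '*'))     # **************Python
print(text.rjust(10))          # padded with spaces on the left

# Zero-fill on the left; a leading sign stays in front
print("42".zfill(5))           # 00042
print("-42".zfill(6))          # -00042
print("abc".zfill(10))         # 0000000abc
```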
Practice: printing a formatted table
```python
products = [
    ("Apple iPhone", 5999, 500),
    ("Samsung Galaxy", 4999, 300),
    ("Xiaomi Phone", 2999, 1000),
]

# Header row
print(f"{'Product'.ljust(20)} {'Price'.rjust(10)} {'Sales'.rjust(10)}")
print("-" * 42)

# Data rows: right-align the numbers and add thousands separators
for name, price, sales in products:
    print(f"{name.ljust(20)} {price:>10,} {sales:>10,}")
```
Practice: formatting serial numbers
```python
# Generate order IDs from ORD000001 to ORD000010
for i in range(1, 11):
    order_id = f"ORD{str(i).zfill(6)}"
    print(order_id)
```
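7. Splitting and joining
Splitting
```python
text = "apple, banana, cherry, date"
print(text.split(", "))      # ['apple', 'banana', 'cherry', 'date']
print(text.split(", ", 2))   # ['apple', 'banana', 'cherry, date']

# splitlines() splits on line boundaries
multiline = """line one
line two
line three
"""
print(multiline.splitlines())   # ['line one', 'line two', 'line three']

# split() with no argument splits on any run of whitespace
messy = "a\t b\n c d"
print(messy.split())      # ['a', 'b', 'c', 'd']
print(messy.split(" "))   # ['a\t', 'b\n', 'c', 'd']

# partition() splits once into (head, separator, tail)
path = "https://www.example.com/index.html"
result = path.partition("://")
print(result)   # ('https', '://', 'www.example.com/index.html')

# rpartition() does the same, starting from the right
email = "user@example.com"
print(email.rpartition("@"))   # ('user', '@', 'example.com')
```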
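Joining
```python
words = ['hello', 'world', 'python']
print(" ".join(words))    # hello world python
print("-".join(words))    # hello-world-python
print(", ".join(words))   # hello, world, python
print("".join(words))     # helloworldpython

# Building a SQL statement (illustration only; real code should use
# parameterized queries instead of string concatenation)
fields = ["name", "age", "city"]
values = ["'Zhang San'", "28", "'Chongqing'"]
sql = f"INSERT INTO users ({', '.join(fields)}) VALUES ({', '.join(values)});"
print(sql)
# INSERT INTO users (name, age, city) VALUES ('Zhang San', 28, 'Chongqing');
```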
Practice: parsing a log file
```python
log = """2024-01-01 10:00:00 INFO service started
2024-01-01 10:00:01 ERROR database connection failed
2024-01-01 10:00:02 WARNING memory usage is high
2024-01-01 10:00:03 ERROR retrying connection"""

for line in log.strip().split("\n"):
    # Split at most 3 times so the message keeps its own spaces
    parts = line.split(" ", 3)
    if len(parts) == 4:
        date, time, level, message = parts
        print(f"[{level}] {message}")
```
8. f-string formatting (the most common)
f-strings, available since Python 3.6, are more concise than both % formatting and .format().
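To make the comparison concrete, here is the same sentence built three ways. The variable names and the sample values are chosen just for this illustration.

```python
name = "Wanfeng"
age = 28

# 1. %-formatting: placeholders and values are kept apart
old_style = "My name is %s and I am %d years old" % (name, age)

# 2. str.format(): positional placeholders filled in by arguments
format_style = "My name is {} and I am {} years old".format(name, age)

# 3. f-string: the expressions sit right where they are printed
f_style = f"My name is {name} and I am {age} years old"

print(f_style)                                  # My name is Wanfeng and I am 28 years old
print(old_style == format_style == f_style)     # True
```

All three produce the same text; the f-string version is simply the easiest to read because each value appears in place.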
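Basic usage
```python
name = "程序员晚枫"
age = 28
city = "Chongqing"

# Insert variables directly
print(f"My name is {name}, I am {age} years old, from {city}")

# Any expression is allowed inside the braces
print(f"Next year I will be {age + 1}")

# Function and method calls work too
print(f"Name length: {len(name)}")         # Name length: 5
print(f"Uppercase name: {name.upper()}")   # Chinese has no case, so this prints the name unchanged
```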
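Number formatting
```python
import math

pi = math.pi

# Fixed number of decimal places
print(f"π ≈ {pi:.2f}")   # π ≈ 3.14
print(f"π ≈ {pi:.4f}")   # π ≈ 3.1416

# Percentage
ratio = 0.2589
print(f"Share: {ratio:.1%}")   # Share: 25.9%

# Thousands separators
big_num = 1234567890
print(f"Big number: {big_num:,}")   # Big number: 1,234,567,890
print(f"Big number: {big_num:_}")   # Big number: 1_234_567_890

# Scientific notation
print(f"Scientific: {big_num:.2e}")   # Scientific: 1.23e+09
```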
Alignment
```python
name = "晚枫"
score = 95.5

# < left-aligns within a width of 10
print(f"{name:<10} score: {score:.1f}")

# > right-aligns
print(f"{name:>10} score: {score:.1f}")

# ^ centers
print(f"{name:^10} score: {score:.1f}")

# A fill character can precede the alignment flag
print(f"{name:*^10} score: {score:.1f}")   # ****晚枫**** score: 95.5
```
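Date formatting
```python
from datetime import datetime

now = datetime(2024, 1, 15, 14, 30, 45)

# strftime codes can be used directly after the colon
print(f"Date: {now:%Y/%m/%d}")         # Date: 2024/01/15
print(f"Time: {now:%H:%M:%S}")         # Time: 14:30:45
print(f"Full: {now:%Y-%m-%d %H:%M}")   # Full: 2024-01-15 14:30
```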
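Advanced f-strings: nesting and conditionals
```python
# Nested replacement field: the width itself comes from a variable
for width in [5, 10, 15]:
    print(f"{'Hi':^{width}}")

# Conditional expression inside the braces
value = 75
print(f"Grade: {'pass' if value >= 60 else 'fail'}")   # Grade: pass

# Mix quote styles to include quotes in the output
print(f"He said: 'Hello'")
print(f'He said: "Hello"')

# Backslashes in the literal part must be escaped
print(f"Path: C:\\Users\\Name")   # Path: C:\Users\Name
```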
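9. Going further with regular expressions (the re module)
For more complex text processing, string methods and f-strings are not enough, and you need regular expressions: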
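```python
import re

text = "My email is alice@example.com, phone 138-1234-5678, backup email bob@test.cn"

# Extract every email address (a simple pattern, not a full validator)
emails = re.findall(r'\w+@\w+\.\w+', text)
print(f"Emails: {emails}")    # Emails: ['alice@example.com', 'bob@test.cn']

# Extract phone numbers
phones = re.findall(r'\d{3}-\d{4}-\d{4}', text)
print(f"Phones: {phones}")    # Phones: ['138-1234-5678']

# Extract every run of digits
numbers = re.findall(r'\d+', text)
print(f"Numbers: {numbers}")  # Numbers: ['138', '1234', '5678']

# Replace: mask the email addresses
masked = re.sub(r'\w+@\w+\.\w+', '[email hidden]', text)
print(masked)


# Validate a mainland China mobile number: 11 digits, starting with 1, second digit 3-9
def validate_phone(phone):
    pattern = r'^1[3-9]\d{9}$'
    return bool(re.match(pattern, phone))


tests = ["13812345678", "12345678901", "abc12345678", "1381234567"]
for t in tests:
    print(f"{t}: {'✅' if validate_phone(t) else '❌'}")   # only the first one passes
```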
10. Working with bytes
File and network I/O deal in bytes, so you need to convert between str and the bytes type:
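```python
# str -> bytes
text = "你好,世界!"
bytes_data = text.encode("utf-8")
print(bytes_data)   # the UTF-8 bytes, shown as b'\xe4\xbd\xa0...'

# bytes -> str
back = bytes_data.decode("utf-8")
print(back)         # 你好,世界!

# bytes objects offer many of the same methods as str
b = b"Hello World"
print(b.upper())                       # b'HELLO WORLD'
print(b.replace(b"World", b"Python"))  # b'Hello Python'
print(b.split(b" "))                   # [b'Hello', b'World']
print(b.find(b"World"))                # 6
```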
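Pitfall guide: the 6 most common string traps
Pitfall 1: strings are immutable!
```python
s = "hello"

# Wrong: item assignment is not supported
# s[0] = "H"   # TypeError: 'str' object does not support item assignment

# Right: build a new string and rebind the name
s = s.replace("h", "H")
print(s)   # Hello

# To edit character by character, go through a list
s_list = list("hello")
s_list[0] = "H"
s = "".join(s_list)
print(s)   # Hello
```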
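A quick way to see immutability in action is to check that string methods return a new object and leave the original untouched. The snippet below is a small demonstration with made-up variable names; it also shows one practical benefit, namely that strings are hashable and can serve as dictionary keys.

```python
original = "hello"
shouted = original.upper()

# The method returned a new string; the original is unchanged
print(original, shouted)   # hello HELLO

# Item assignment fails with TypeError
try:
    original[0] = "H"
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(assignment_failed)   # True

# Immutable objects are hashable, so strings work as dictionary keys
word_counts = {}
for word in ["spam", "eggs", "spam"]:
    word_counts[word] = word_counts.get(word, 0) + 1
print(word_counts)         # {'spam': 2, 'eggs': 1}
```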
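Pitfall 2: encoding Chinese text
```python
text = "你好"
bytes1 = text.encode("utf-8")   # 6 bytes
bytes2 = text.encode("gbk")     # 4 bytes

# Decode with the same codec that was used to encode
print(bytes1.decode("utf-8"))   # 你好
# bytes2.decode("utf-8") would raise UnicodeDecodeError

# Round trip: encode and decode with matching codecs
text = "你好"
encoded = text.encode("utf-8")
decoded = encoded.decode("utf-8")
print(decoded == text)   # True
```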
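Pitfall 3: strip() removes every kind of whitespace
```python
text = "\n\t hello \r\n"

# No argument: spaces, tabs and newlines are all stripped
print(f"'{text.strip()}'")      # 'hello'

# With ' ' only spaces are stripped; here the ends are newlines, so nothing changes
print(f"'{text.strip(' ')}'")
```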
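Pitfall 4: split() versus splitlines()
```python
text = "a\n\nb\n"

# split("\n") keeps an empty string after the final newline
print(text.split("\n"))        # ['a', '', 'b', '']

# splitlines() does not
print(text.splitlines())       # ['a', '', 'b']

# Pass True to keep the line endings
print(text.splitlines(True))   # ['a\n', '\n', 'b\n']
```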
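Pitfall 5: + concatenation versus join()
```python
parts = ["hello"] * 1000

# Slow pattern: each += may build a brand-new string
result = ""
for p in parts:
    result += p

# Preferred: join() assembles the result in one pass
result = "".join(parts)
```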
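One thing to watch when switching to `join()`: every item must already be a string, otherwise it raises TypeError. A minimal sketch with an assumed list of integers:

```python
numbers = [1, 2, 3]

# join() refuses non-string items
try:
    ", ".join(numbers)
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)   # True

# Convert each item first, for example with a generator expression
joined = ", ".join(str(n) for n in numbers)
print(joined)        # 1, 2, 3
```

`", ".join(map(str, numbers))` is an equivalent spelling.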
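Pitfall 6: f-string details: !r for repr(), doubled braces for literal braces
```python
data = {"name": "test"}
items = list(data.items())
print(f"Data: {items}")             # Data: [('name', 'test')]

# !r inserts repr() of the value, so strings keep their quotes
print(f"Name: {data['name']!r}")    # Name: 'test'

# Double the braces to print them literally
print(f"Literal braces: {{and}}")   # Literal braces: {and}
```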
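Performance comparison: ways to concatenate strings
```python
import time

parts = ["item" + str(i) for i in range(10000)]

# Approach 1: += in a loop
# perf_counter() has a much finer resolution than time.time(), which avoids
# a zero elapsed time (and a ZeroDivisionError below) on fast machines
start = time.perf_counter()
result = ""
for p in parts:
    result += p
plus_time = time.perf_counter() - start

# Approach 2: join()
start = time.perf_counter()
result = "".join(parts)
join_time = time.perf_counter() - start

print(f"+= took: {plus_time:.4f} s")
print(f"join took: {join_time:.4f} s")
print(f"join is about {plus_time / join_time:.0f}x faster")
```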
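A single timed run is noisy, so the numbers can vary a lot between machines and Python versions. As a sketch of a steadier measurement, the standard `timeit` module repeats each approach many times. The workload size and repeat count below are arbitrary choices for this illustration, and no particular speed-up is implied.

```python
import timeit

# Assumed workload: 1,000 short strings
parts = ["item" + str(i) for i in range(1000)]


def concat_plus():
    """Build the result with += in a loop."""
    result = ""
    for p in parts:
        result += p
    return result


def concat_join():
    """Build the result with a single join() call."""
    return "".join(parts)


# timeit.timeit() calls the function `number` times and returns total seconds
plus_time = timeit.timeit(concat_plus, number=200)
join_time = timeit.timeit(concat_join, number=200)

print(f"+= loop: {plus_time:.4f} s for 200 runs")
print(f"join():  {join_time:.4f} s for 200 runs")
```

CPython can sometimes extend a string in place during `+=`, so the gap may be smaller than expected; `join()` is still the conventional choice because its cost does not depend on that optimization.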
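Common interview questions
Q1: How do you count how often each character appears in a string?
```python
from collections import Counter

text = "hello world"
counts = Counter(text)
print(counts)
# Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

# Ignore spaces and show the three most common characters
counts = Counter(text.replace(" ", ""))
print(counts.most_common(3))   # [('l', 3), ('o', 2), ('h', 1)]
```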
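Q2: How many ways are there to reverse a string?
```python
s = "hello"

# Option 1: slicing (the most concise)
print(s[::-1])               # olleh

# Option 2: reversed() plus join()
print("".join(reversed(s)))  # olleh

# Option 3: a loop that prepends each character
result = ""
for c in s:
    result = c + result
print(result)                # olleh
```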
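Q3: How do you check whether a string is a palindrome?
```python
def is_palindrome(s):
    # Keep only letters and digits, lowercased
    s = "".join(c.lower() for c in s if c.isalnum())
    return s == s[::-1]


tests = ["A man, a plan, a canal: Panama", "race a car", "hello", "Madam, I'm Adam"]
for t in tests:
    print(f"'{t}': {'is' if is_palindrome(t) else 'is not'} a palindrome")
# The first and last entries are palindromes; the middle two are not
```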
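Q4: What does immutability mean for Python strings?
A: Once created, a string cannot be modified; every "modifying" operation actually creates a new string. This has the following consequences:
- Strings are thread-safe
- Strings can be used as dictionary keys and set elements
- Code that concatenates many times performs poorly and should use join() instead

Recommended: AI Python零基础实战营
Want to learn Python systematically and master string handling and text analysis?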
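Course contents:
✅ Python basic syntax
✅ Strings in depth
✅ Regular expressions in practice
✅ File reading and writing
✅ Text-processing projects
🎁 Limited-time bonus: a free print copy of 《Python编程从入门到实践》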
Summary

| Category | Common methods |
| --- | --- |
| Case | .upper(), .lower(), .title(), .capitalize(), .swapcase() |
| Searching | .find(), .rfind(), .index(), .count() |
| Replacing | .replace() |
| Predicates | .startswith(), .endswith(), .isdigit(), .isalpha(), .isalnum(), .isspace(), .islower(), .isupper(), .istitle() |
| Stripping | .strip(), .lstrip(), .rstrip() |
| Alignment | .center(), .ljust(), .rjust(), .zfill() |
| Splitting | .split(), .splitlines(), .partition(), .rpartition() |
| Joining | .join() |
| Formatting | f"...", .format() |
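💡 Remember: strings are one of the most used data types in Python. There are many methods, but each one is simple. Practice them a few times and they will stick.
Coming next
That wraps up strings. The next article covers file operations: reading and writing files is a core programming skill.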
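PS: Strings are one of Python's most commonly used data types. Once these methods are second nature, string-handling code gets much quicker to write.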