Lesson 15: Dictionaries and Sets

Contents

1. Dictionary Basics
2. Dictionaries in Practice
3. Set Basics
4. Sets in Practice
5. Common Errors and Debugging
6. Practical Examples

1. Dictionary Basics

1.1 What Is a Dictionary?

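Definition: A dictionary (dict) is Python's built-in data structure for storing key-value pairs. Each element is written as key: value. Keys must be unique and immutable (hashable, such as str, int or tuple), while values can be of any type. Dictionaries are written with curly braces {}.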

# Several ways to create a dictionary
d1 = {'name': 'Alice', 'age': 20}
d2 = dict(name='Bob', age=22)
d3 = dict([('gender', 'male'), ('score', 90)])
empty_dict = {}

Common operations:
• Add or update: d['key'] = value
• Delete: del d['key'] or d.pop('key')
• Check whether a key exists: 'key' in d
• Get all keys, values or items: keys(), values(), items()

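# Accessing and modifying a dictionary
d = {'name': 'Alice', 'age': 20}
print(d['name'])  # Alice
print(d.get('age'))  # 20; get() returns None instead of raising KeyError if the key is missing
d['gender'] = 'female'  # add a new key
d['age'] = 21  # update an existing key
d.pop('name')  # remove 'name' and return its value
print(d)  # {'age': 21, 'gender': 'female'}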
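The operations list also mentions `in`, `del` and `pop()`. A minimal sketch of how they behave, including what happens with a missing key (the variable name `profile` is only for illustration):

```python
profile = {'name': 'Alice', 'age': 20}

# 'in' checks keys, not values
has_age = 'age' in profile          # True
has_alice = 'Alice' in profile      # False: 'Alice' is a value, not a key

# del removes a key (it raises KeyError if the key is missing)
del profile['age']

# pop() removes a key and returns its value; a default avoids KeyError
name = profile.pop('name')           # 'Alice'
missing = profile.pop('city', None)  # None, no error

# Square-bracket access on a missing key raises KeyError
try:
    profile['age']
except KeyError as e:
    error_key = e.args[0]           # 'age'

print(has_age, has_alice, name, missing, profile)  # True False Alice None {}
```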

1.2 Iterating over a Dictionary and Common Methods

# Iterate over keys
for k in d:
    print(k, d[k])
# Iterate over (key, value) pairs
for k, v in d.items():
    print(k, v)
# Get all keys, values and items
print(list(d.keys()))
print(list(d.values()))
print(list(d.items()))

Note: Keys must be unique and immutable (hashable). Since Python 3.7, dictionaries are guaranteed to preserve insertion order.
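A short sketch of the rules in the note: a repeated key keeps only its last value, iteration follows insertion order (Python 3.7+), and tuples, being immutable, are valid keys. The names `order` and `grid` are illustrative:

```python
# A repeated key in a literal: the last value wins
d = {'a': 1, 'b': 2, 'a': 3}
print(d)  # {'a': 3, 'b': 2}

# Insertion order is preserved when iterating (Python 3.7+)
order = {}
order['z'] = 1
order['x'] = 2
order['y'] = 3
print(list(order))  # ['z', 'x', 'y']

# Tuples are immutable, so they can be keys (e.g. grid coordinates)
grid = {(0, 0): 'start', (2, 3): 'goal'}
print(grid[(2, 3)])  # goal
```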

1.3 Advanced Dictionary Usage

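# Dictionary comprehension
squares = {x: x**2 for x in range(1, 6)}
print(squares)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# setdefault: adds the key with the given default only if the key is missing
d = {'name': 'Alice'}
d.setdefault('age', 18)
print(d)  # {'name': 'Alice', 'age': 18}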
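setdefault() also returns the value stored under the key (inserting the default first if needed), which makes it handy for grouping. A small sketch that groups words by their first letter; the word list is made up for illustration:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = {}
for w in words:
    # setdefault returns the existing list, or inserts [] and returns that
    groups.setdefault(w[0], []).append(w)
print(groups)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# An existing key is left unchanged, and its current value is returned
d = {'name': 'Alice'}
result = d.setdefault('name', 'Bob')
print(result, d)  # Alice {'name': 'Alice'}
```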

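Copying and merging dictionaries:
• Shallow copy: dict.copy()
• Merge: dict1.update(dict2) modifies dict1 in place, while {**dict1, **dict2} builds a new dictionary. In both cases the value from dict2 wins for duplicate keys.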

# Merge dictionaries; for the duplicate key 'b', the value from d2 wins
d1 = {'a': 1, 'b': 2}
d2 = {'b': 3, 'c': 4}
merged = {**d1, **d2}
print(merged)  # {'a': 1, 'b': 3, 'c': 4}
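update() and copy() are listed above but not shown. update() changes the dictionary in place, and copy() is shallow: nested objects such as lists are shared with the original. A sketch, using the standard-library copy.deepcopy() when a fully independent copy is needed (the names `original`, `shallow` and `deep` are illustrative):

```python
import copy

a = {'x': 1, 'y': 2}
a.update({'y': 20, 'z': 30})   # modifies a in place and returns None
print(a)  # {'x': 1, 'y': 20, 'z': 30}

original = {'name': 'Tom', 'scores': [90, 85]}
shallow = original.copy()
deep = copy.deepcopy(original)

shallow['scores'].append(70)   # the inner list is shared with original
print(original['scores'])      # [90, 85, 70]
print(deep['scores'])          # [90, 85]
```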

2. Dictionaries in Practice

2.1 Counting and Mapping

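# Count how many times each word appears
words = ['apple', 'banana', 'apple', 'orange']
count = {}
for w in words:
    count[w] = count.get(w, 0) + 1  # get() returns 0 the first time a word is seen
print(count)  # {'apple': 2, 'banana': 1, 'orange': 1}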
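For comparison, the standard library's collections.Counter (a dict subclass) does the same counting in one step and can also report the most frequent items. A sketch using the same word list:

```python
from collections import Counter

words = ['apple', 'banana', 'apple', 'orange']
count = Counter(words)
print(count['apple'])        # 2
print(count['grape'])        # 0: missing keys count as zero, no KeyError
print(count.most_common(1))  # [('apple', 2)]
```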

2.2 Nested Dictionaries and Lookup

# Store student records in a nested dictionary
students = {
    'Tom': {'age': 18, 'score': 90},
    'Jerry': {'age': 19, 'score': 85}
}
print(students['Tom']['score'])  # 90
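Indexing a nested dictionary with a missing key raises KeyError. One common pattern, sketched here, chains get() with an empty dict as the default; the student 'Spike' is hypothetical and deliberately absent:

```python
students = {
    'Tom': {'age': 18, 'score': 90},
    'Jerry': {'age': 19, 'score': 85}
}

# Missing outer key: get() returns {}, so the second get() still works
score = students.get('Spike', {}).get('score', 'no record')
print(score)  # no record

# Updating and iterating over a nested structure
students['Jerry']['score'] = 88
for name, info in students.items():
    print(f"{name}: age {info['age']}, score {info['score']}")
# Tom: age 18, score 90
# Jerry: age 19, score 88
```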

2.3 Converting Between Dictionaries and JSON

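import json
d = {'name': 'Alice', 'age': 20}
json_str = json.dumps(d)  # dict -> JSON string
print(json_str)  # {"name": "Alice", "age": 20}
print(json.loads(json_str))  # JSON string -> dict: {'name': 'Alice', 'age': 20}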
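Two details matter when converting: JSON object keys are always strings, so non-string keys do not survive a round trip unchanged, and json.dumps() escapes non-ASCII characters unless ensure_ascii=False is passed. A sketch (the dictionaries `scores` and `word` are illustrative):

```python
import json

scores = {1: 'first', 2: 'second'}
restored = json.loads(json.dumps(scores))
print(restored)  # {'1': 'first', '2': 'second'}: int keys became strings

word = {'apple': '苹果'}
print(json.dumps(word))                      # {"apple": "\u82f9\u679c"}
print(json.dumps(word, ensure_ascii=False))  # {"apple": "苹果"}
```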

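3. Set Basics

3.1 What Is a Set?

Definition: A set is an unordered, mutable Python data type that stores only unique elements (each element must be hashable). It supports mathematical set operations such as intersection and union.

# Creating sets
set1 = {1, 2, 3, 4}
set2 = set([1, 2, 2, 3, 3, 4])  # duplicates are removed automatically: {1, 2, 3, 4}
set3 = set('hello')  # one element per distinct character: 'h', 'e', 'l', 'o'
empty_set = set()  # {} would create an empty dict, not a set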
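The creation examples can be checked directly. Because sets are unordered, this sketch uses comparison and sorted() rather than relying on the printed order:

```python
set2 = set([1, 2, 2, 3, 3, 4])
set3 = set('hello')
print(set2 == {1, 2, 3, 4})  # True: duplicates were dropped
print(sorted(set3))          # ['e', 'h', 'l', 'o']: the repeated 'l' is kept once
print(len(set3))             # 4
```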

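Common operations:
• Add: add(x) for one element, update(iterable) for several
• Remove: remove(x) (raises KeyError if x is missing), discard(x) (does nothing if x is missing), pop() (removes and returns an arbitrary element), clear() (removes all elements)

# Adding and removing elements
set1.add(5)
set1.update([6, 7])
set1.remove(2)
set1.discard(10)  # 10 is not in the set, but discard() does not raise an error
set1.pop()  # removes an arbitrary element
set1.clear()  # set1 is now empty: set()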
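The removal methods differ in how they handle a missing element. A sketch that makes the difference visible (the flag names are illustrative):

```python
s = {1, 2, 3}
s.discard(10)            # silently does nothing
try:
    s.remove(10)         # raises KeyError for a missing element
except KeyError:
    remove_failed = True

x = s.pop()              # removes and returns an arbitrary element
print(x in {1, 2, 3}, len(s))  # True 2

s.clear()
try:
    s.pop()              # popping from an empty set raises KeyError
except KeyError:
    pop_failed = True
print(remove_failed, pop_failed)  # True True
```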

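3.2 Set Operations

set1 = {1, 2, 3, 4}
set2 = {3, 4, 5, 6}
# Intersection: elements in both sets
print(set1 & set2)  # {3, 4}
# Union: elements in either set
print(set1 | set2)  # {1, 2, 3, 4, 5, 6}
# Difference: elements in set1 but not in set2
print(set1 - set2)  # {1, 2}
# Symmetric difference: elements in exactly one of the two sets
print(set1 ^ set2)  # {1, 2, 5, 6}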
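Each operator also has a method form: intersection(), union(), difference() and symmetric_difference(). The methods accept any iterable, while the operators need sets on both sides. A sketch using a list as the second operand:

```python
set1 = {1, 2, 3, 4}
nums = [3, 4, 5, 6]   # a list, not a set

print(set1.intersection(nums))          # {3, 4}
print(set1.union(nums))                 # {1, 2, 3, 4, 5, 6}
print(set1.difference(nums))            # {1, 2}
print(set1.symmetric_difference(nums))  # {1, 2, 5, 6}

try:
    set1 & nums       # the operator form rejects a list
except TypeError:
    print('& needs a set on both sides')
```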

Subsets and supersets:
• issubset(), issuperset(), ==, !=

a = {1, 2}
b = {1, 2, 3}
print(a.issubset(b))  # True
print(b.issuperset(a))  # True
print(a == {2, 1})  # True
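Subset and superset tests also have operator forms: <= and >= correspond to issubset() and issuperset(), while < and > test for a proper subset or superset (the two sets must not be equal). A sketch, which also uses isdisjoint() to check for no common elements:

```python
a = {1, 2}
b = {1, 2, 3}
print(a <= b, a < b)         # True True
print(b <= b, b < b)         # True False: b is a subset, but not a proper subset, of itself
print(b >= a, b > a)         # True True
print(a.isdisjoint({5, 6}))  # True: no elements in common
```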

Immutable sets: frozenset creates an immutable set. Because it is hashable, it can be used as a dictionary key.

fs = frozenset({1, 2, 3})
dict_with_frozenset = {fs: 'immutable set'}
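A frozenset supports the same read-only operations as a set but has no add() or remove(), and because it is hashable it can also be an element of another set. A sketch, where `pairs` is an illustrative set of unordered pairs:

```python
fs = frozenset({1, 2, 3})
print(fs & {2, 3, 4})           # frozenset({2, 3})
print(hasattr(fs, 'add'))       # False: a frozenset cannot be modified

# A set of frozensets, e.g. unordered pairs
pairs = {frozenset({'Tom', 'Jerry'}), frozenset({'Jerry', 'Tom'})}
print(len(pairs))               # 1: both pairs contain the same elements
```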

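4. Sets in Practice

4.1 Deduplication and Fast Lookup

numbers = [1, 2, 2, 3, 3, 4]
unique_numbers = list(set(numbers))  # duplicates removed; the original order is not guaranteed
print(unique_numbers)

# Check whether an element is present
s = {'apple', 'banana', 'orange'}
print('apple' in s)  # True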
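Because list(set(...)) does not guarantee the original order, one common alternative when order matters relies on dictionaries preserving insertion order: dict.fromkeys() keeps the first occurrence of each value. A sketch, which also shows the usual membership pattern of building a set once and testing against it many times (`allowed` and `basket` are illustrative names):

```python
numbers = [3, 1, 3, 2, 1]
unique_in_order = list(dict.fromkeys(numbers))
print(unique_in_order)  # [3, 1, 2]

# Build the set once, then run many membership tests against it
allowed = {'apple', 'banana', 'orange'}
basket = ['apple', 'grape', 'orange']
kept = [item for item in basket if item in allowed]
print(kept)  # ['apple', 'orange']
```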

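4.2 Using Set Operations in Data Analysis

students_math = {'Alice', 'Bob', 'Charlie', 'David'}
students_physics = {'Bob', 'Charlie', 'Eve', 'Frank'}
# Students taking both math and physics
both = students_math & students_physics
print(both)  # {'Bob', 'Charlie'} (print order may vary)
# Students taking only math
only_math = students_math - students_physics
print(only_math)  # {'Alice', 'David'} (print order may vary)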
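The union and symmetric difference operators from section 3.2 answer the remaining questions in the same scenario. A sketch using the same two sets; sorted() is used only to get a predictable printed order:

```python
students_math = {'Alice', 'Bob', 'Charlie', 'David'}
students_physics = {'Bob', 'Charlie', 'Eve', 'Frank'}

# Students taking at least one of the two subjects
either = students_math | students_physics
# Students taking exactly one of the two subjects
exactly_one = students_math ^ students_physics

print(sorted(either))       # ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank']
print(sorted(exactly_one))  # ['Alice', 'David', 'Eve', 'Frank']
```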

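5. Common Errors and Debugging

• Using a mutable type (such as a list) as a dictionary key or set element raises TypeError
• An empty set must be created with set(); {} creates an empty dictionary
• Sets are unordered and dictionaries are accessed by key, so neither supports positional indexing such as s[0]

# Error examples
# invalid_dict = {[1,2]: 'list'}  # TypeError: a list cannot be a dict key
# set1 = {1, [2,3]}  # TypeError: a list cannot be a set element
empty_set = set()
empty_dict = {}
print(type(empty_set))  # <class 'set'>
print(type(empty_dict))  # <class 'dict'>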
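When a key needs to combine several values, a tuple (immutable) works where a list fails, and a set can be converted to a sorted list when positions are needed. A sketch that reproduces the TypeError and shows both fixes (variable names are illustrative):

```python
raised = False
try:
    bad = {[1, 2]: 'list'}
except TypeError as e:
    raised = True
    print(e)  # prints the "unhashable type" error message

ok = {(1, 2): 'tuple'}          # convert the list to a tuple first
print(ok[tuple([1, 2])])        # tuple

# Sets cannot be indexed; convert to a sorted list if positions are needed
s = {30, 10, 20}
print(sorted(s)[0])             # 10
```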

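6. Practical Examples

6.1 Word Frequency Counting and Data Cleaning

# Count how many times each word appears in a text
def word_frequency(text):
    words = text.lower().split()  # lowercase, then split on whitespace
    freq = {}
    for word in words:
        word = word.strip('.,!?;:')  # strip leading/trailing punctuation
        if word:  # skip tokens that were only punctuation
            freq[word] = freq.get(word, 0) + 1
    return freq

text = "Hello world! Hello Python. Python is great!"
print(word_frequency(text))
# {'hello': 2, 'world': 1, 'python': 2, 'is': 1, 'great': 1}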
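To list the most frequent words first, the dictionary returned by word_frequency() can be sorted by value. A sketch; word_frequency is repeated so the block runs on its own, and `top` is an illustrative name:

```python
def word_frequency(text):
    words = text.lower().split()
    freq = {}
    for word in words:
        word = word.strip('.,!?;:')
        if word:
            freq[word] = freq.get(word, 0) + 1
    return freq

freq = word_frequency("Hello world! Hello Python. Python is great!")
# Sort (word, count) pairs by count, highest first; ties keep insertion order
top = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
print(top[:2])  # [('hello', 2), ('python', 2)]
```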

6.2 An English-Chinese Dictionary

# A simple English-Chinese dictionary
dict_en_cn = {'apple': '苹果', 'banana': '香蕉', 'cat': '猫'}
word = 'apple'
print(f"Chinese for {word}: {dict_en_cn.get(word, 'not found')}")  # Chinese for apple: 苹果
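Since each Chinese word here maps back to exactly one English word, a dictionary comprehension can build the reverse (Chinese to English) lookup. A sketch; it assumes every value is unique, otherwise later entries would overwrite earlier ones:

```python
dict_en_cn = {'apple': '苹果', 'banana': '香蕉', 'cat': '猫'}
# Swap keys and values; assumes every value is unique
dict_cn_en = {cn: en for en, cn in dict_en_cn.items()}
print(dict_cn_en['猫'])  # cat

for word in ['banana', 'dog']:
    print(f"{word}: {dict_en_cn.get(word, 'not found')}")
# banana: 香蕉
# dog: not found
```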

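6.3 Deduplication and Intersection with Sets

# Elements the two lists share, and elements that appear in only one of them
list1 = [1, 2, 3, 4]
list2 = [3, 4, 5, 6]
common = set(list1) & set(list2)  # {3, 4}
diff = set(list1) ^ set(list2)  # {1, 2, 5, 6}
print(f"Common elements: {common}")
print(f"Elements in only one list: {diff}")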

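Study tip: Dictionaries and sets are among Python's most important data structures. Practice with them hands-on until their properties and common uses feel natural, and when you hit an error, learn to debug it and consult the documentation.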

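Questions to consider: How are dictionaries and sets implemented internally? Why are lookups in them so fast? In real-world development, how do you decide between a dictionary and a set?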