课程 23:搜索与索引
1. 基础概念与时间复杂度
1.1 什么是搜索与索引?
搜索指在数据结构中查找目标元素;索引是用某种映射或顺序结构为数据建立“目录”,以更快定位数据。
1.2 常见复杂度对比
- 线性查找:O(n)
- 二分查找(有序数据):O(log n)
- 哈希查找(dict/set):平均 O(1)
2. 线性查找
2.1 基本实现
def linear_search(lst, target):
for i, v in enumerate(lst):
if v == target:
return i
return -1
print(linear_search([3, 1, 4], 4)) # 2
2.2 适用场景与注意事项
适用于数据量小或数据无序且不常复用的场景。若频繁查询,考虑建立索引(排序+二分、dict/set)。
3. 二分查找
3.1 基本实现(迭代)
def binary_search(lst, target):
left, right = 0, len(lst) - 1
while left <= right:
mid = (left + right) // 2
if lst[mid] == target:
return mid
elif lst[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
arr = [1, 3, 4, 7, 9]
print(binary_search(arr, 4)) # 2
二分查找要求数据已排序;注意循环条件与边界更新,避免死循环或漏查。
3.2 查找第一个/最后一个出现位置
def first_occurrence(lst, target):
left, right, ans = 0, len(lst)-1, -1
while left <= right:
mid = (left + right) // 2
if lst[mid] >= target:
if lst[mid] == target:
ans = mid
right = mid - 1
else:
left = mid + 1
return ans
def last_occurrence(lst, target):
left, right, ans = 0, len(lst)-1, -1
while left <= right:
mid = (left + right) // 2
if lst[mid] <= target:
if lst[mid] == target:
ans = mid
left = mid + 1
else:
right = mid - 1
return ans
arr = [1,2,2,2,3]
print(first_occurrence(arr, 2), last_occurrence(arr, 2)) # 1 3
4. Python标准库:bisect
4.1 插入点与存在性判断
from bisect import bisect_left, bisect_right
a = [1, 2, 2, 2, 3]
x = 2
l = bisect_left(a, x) # 第一个不小于x的位置
r = bisect_right(a, x) # 第一个大于x的位置
exists = l < len(a) and a[l] == x
count = r - l
print(l, r, exists, count) # 1 4 True 3
4.2 有序插入与维护
from bisect import insort
arr = [1, 3, 4]
insort(arr, 2)
print(arr) # [1, 2, 3, 4]
bisect适合做区间查询、频次统计、自动补全前缀范围定位(与排序结合)。
5. 使用字典与集合实现O(1)查找
5.1 成员测试与映射
ids = {1001, 1002, 1003}
print(1002 in ids) # True
students = [
{"id": 1001, "name": "Alice"},
{"id": 1002, "name": "Bob"},
]
index_by_id = {s["id"]: s for s in students}
print(index_by_id[1002]["name"]) # Bob
5.2 构建倒排索引(简化版)
# term -> set(doc_id)
from collections import defaultdict
inverted = defaultdict(set)
docs = {
1: "hello world",
2: "hello python",
3: "world of code",
}
for doc_id, text in docs.items():
for term in text.split():
inverted[term].add(doc_id)
print(sorted(inverted["hello"])) # [1, 2]
6. 字符串搜索与子串查找
6.1 基本用法
s = "abracadabra"
print("cad" in s) # True
print(s.find("bra")) # 1(找不到返回-1)
print(s.index("bra")) # 1(找不到抛异常)
print(s.count("a")) # 5
print(s.startswith("abra")) # True
print(s.endswith("bra")) # True
6.2 正则表达式(入门)
import re
m = re.search(r"hello\\s+(world|python)", "say hello world")
if m:
print(m.group(0), m.group(1)) # hello world world