课程 23：搜索与索引

课程目录

1. 基础概念与时间复杂度
2. 线性查找
3. 二分查找
4. Python标准库：bisect
5. 使用字典与集合实现O(1)查找
6. 字符串搜索与子串查找

1. 基础概念与时间复杂度

1.1 什么是搜索与索引？

搜索指在数据结构中查找目标元素；索引是用某种映射或顺序结构为数据建立“目录”，以更快定位数据。

1.2 常见复杂度对比

线性查找：O(n)
二分查找（有序数据）：O(log n)
哈希查找（dict/set）：平均 O(1)

2. 线性查找

2.1 基本实现

def linear_search(lst, target):
    for i, v in enumerate(lst):
        if v == target:
            return i
    return -1

print(linear_search([3, 1, 4], 4))  # 2

2.2 适用场景与注意事项

适用于数据量小或数据无序且不常复用的场景。若频繁查询，考虑建立索引（排序+二分、dict/set）。

3. 二分查找

3.1 基本实现（迭代）

def binary_search(lst, target):
    left, right = 0, len(lst) - 1
    while left <= right:
        mid = (left + right) // 2
        if lst[mid] == target:
            return mid
        elif lst[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

arr = [1, 3, 4, 7, 9]
print(binary_search(arr, 4))  # 2

二分查找要求数据已排序；注意循环条件与边界更新，避免死循环或漏查。

3.2 查找第一个/最后一个出现位置

def first_occurrence(lst, target):
    left, right, ans = 0, len(lst)-1, -1
    while left <= right:
        mid = (left + right) // 2
        if lst[mid] >= target:
            if lst[mid] == target:
                ans = mid
            right = mid - 1
        else:
            left = mid + 1
    return ans

def last_occurrence(lst, target):
    left, right, ans = 0, len(lst)-1, -1
    while left <= right:
        mid = (left + right) // 2
        if lst[mid] <= target:
            if lst[mid] == target:
                ans = mid
            left = mid + 1
        else:
            right = mid - 1
    return ans

arr = [1,2,2,2,3]
print(first_occurrence(arr, 2), last_occurrence(arr, 2))  # 1 3

4. Python标准库：bisect

4.1 插入点与存在性判断

from bisect import bisect_left, bisect_right

a = [1, 2, 2, 2, 3]
x = 2
l = bisect_left(a, x)   # 第一个不小于x的位置
r = bisect_right(a, x)  # 第一个大于x的位置
exists = l < len(a) and a[l] == x
count = r - l
print(l, r, exists, count)  # 1 4 True 3

4.2 有序插入与维护

from bisect import insort
arr = [1, 3, 4]
insort(arr, 2)
print(arr)  # [1, 2, 3, 4]

bisect适合做区间查询、频次统计、自动补全前缀范围定位（与排序结合）。

5. 使用字典与集合实现O(1)查找

5.1 成员测试与映射

ids = {1001, 1002, 1003}
print(1002 in ids)  # True

students = [
    {"id": 1001, "name": "Alice"},
    {"id": 1002, "name": "Bob"},
]
index_by_id = {s["id"]: s for s in students}
print(index_by_id[1002]["name"])  # Bob

5.2 构建倒排索引（简化版）

# term -> set(doc_id)
from collections import defaultdict
inverted = defaultdict(set)
docs = {
    1: "hello world",
    2: "hello python",
    3: "world of code",
}
for doc_id, text in docs.items():
    for term in text.split():
        inverted[term].add(doc_id)
print(sorted(inverted["hello"]))  # [1, 2]

6. 字符串搜索与子串查找

6.1 基本用法

s = "abracadabra"
print("cad" in s)          # True
print(s.find("bra"))       # 1（找不到返回-1）
print(s.index("bra"))      # 1（找不到抛异常）
print(s.count("a"))        # 5
print(s.startswith("abra")) # True
print(s.endswith("bra"))    # True

6.2 正则表达式（入门）

import re
m = re.search(r"hello\\s+(world|python)", "say hello world")
if m:
    print(m.group(0), m.group(1))  # hello world world

拓展阅读：