Python备忘

主要用于记录一些高级使用技巧

发表于 2024/10/12 更新于 2025/03/13

『这是头图的描述』

作者 Diexie

21 分钟阅读

Python备忘

参考:

包管理

使用pip

使用pip+国内源安装python包

  
pip install <some-package> -i https://pypi.tuna.tsinghua.edu.cn/simple/ # 使用清华源
pip install <some-package> -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn # 使用清华源，同时忽略SSL证书

常用的国内源：

清华：https://pypi.tuna.tsinghua.edu.cn/simple/
阿里云：http://mirrors.aliyun.com/pypi/simple/
中国科技大学：https://pypi.mirrors.ustc.edu.cn/simple/
华中科技大学：http://pypi.hustunique.com/simple/
上海交通大学：https://mirror.sjtu.edu.cn/pypi/web/simple/
豆瓣：http://pypi.douban.com/simple/ # 可能已经无了

使用conda

待补充

使用venv

venv是Python自带的虚拟环境管理工具，可以用于创建虚拟环境，避免包冲突。

  
python -m venv <env-name> # 创建一个虚拟环境
source <env-name>/bin/activate # 激活虚拟环境
deactivate # 退出虚拟环境

根据上述命令，我们会在当前目录下创建一个名为<env-name>的虚拟环境，然后激活虚拟环境，此时我们的Python环境就切换到了虚拟环境中，可以安装包，运行程序等。

十分适合用于项目开发，可以避免包冲突，同时也可以方便地分享环境配置。

异步IO

1. 协程(Couroutine)

参考:

协程

视频推荐

官方文档

协程是一种用户态的轻量级线程，执行过程中，在子程序内部可中断，然后转而执行别的子程序，在适当的时候再返回来接着执行。协程适用于需要频繁等待的任务(如I/O,网络通信)

协程的调度完全由用户控制。协程拥有自己的寄存器上下文和栈，协程调度切换时，将寄存器上下文和栈保存到其他地方，在切回来的时候，恢复先前保存的寄存器上下文和栈。因此协程能保留上一次调用时的状态，每次过程重入时，就相当于进入上一次调用的状态。

Event Loop是一个消息循环，用于执行协程，同时维护一个任务队列，每次执行一个任务，然后切换到下一个任务。Event Loop在执行一个任务时，如果遇到await，则会挂起当前任务，执行下一个任务，直到下一个任务执行完毕，再切换回来继续执行当前任务。Event Loop同一时刻只能执行一个任务，但是可以通过asyncio.create_task()创建多个任务，加入到Event Loop队列中。

async：用于声明一个协程函数

await：用于将一个协程转换为一个任务(task)，并交付给Event Loop执行，同时挂起当前协程。如果协程有返回值，await会返回协程的返回值。

asyncio.run()：用于在异步状态下运行一个协程。

asyncio.crreate_task()：用于从一个协程创建一个任务(task)，并加入Event Loop中，相当于完成了await的部分操作(没有中断当前协程)。

asyncio.gather()：用于将多个任务合并为一个任务，同时等待所有任务执行完毕。

asyncio.sleep()：挂起当前任务，等待一段时间后再继续执行。用于模拟I/O等操作。

asyncio.wait()：用于等待多个任务执行完毕。

asyncio.get_event_loop()：获取一个事件循环。

关键概念：

事件循环(Event Loop)：用于执行协程，同时维护一个任务队列。
Couroutine(协程)：一种用户态的轻量级线程，执行过程中，在子程序内部可中断，然后转而执行别的子程序，在适当的时候再返回来接着执行。
Task(任务)：被用来“并行的”调度协程。
Future：表示一个异步操作的最终结果，用于获取协程的返回值，await会返回Future对象。

  
import asyncio

async def hello(): # 定义一个协程
    print("Hello, world!")
    r = await asyncio.sleep(1) # `await`用于挂起协程
    print("Hello, again!")

if __name__ == "__main__":
    loop = asyncio.get_event_loop() # 获取一个事件循环
    loop.run_until_complete(hello())
    loop.close()

一个更详细的例子:

  
import asyncio

async def say_after(delay, what): # 定义一个协程函数
    await asyncio.sleep(delay)
    print(what)
    return what

async def main(): # 定义一个协程
    task1 = asyncio.create_task(say_after(1, "hello")) # 1：将协程函数say_after转换为一个任务task1, 并加入到Event Loop中, 但由于Event Loop中在执行main()，所以task1不会立即执行
    task2 = asyncio.create_task(say_after(2, "world")) # 2：将协程函数say_after转换为一个任务task2, 并加入到Event Loop中, 但由于Event Loop中在执行main()，所以task2不会立即执行

    # 3：此时Event Loop中的情况：main()运行中 -> task1 -> task2
    print(f"started at {time.strftime('%X')}")
    task1_return = await task1 # await用于挂起main(), 并执行task1, 此时Event Loop中的情况：task1运行中 -> main() -> task2
    task2_return = await task2 # await用于挂起main(), 并执行task2, 此时Event Loop中的情况：task2运行中 -> main()
    print(f"finished at {time.strftime('%X')}")
    # 4：全部执行完毕，Event Loop中的情况：main()运行中 -> task1 -> task2

if __name__ == "__main__":
    asyncio.run(main()) # 0：程序入口，运行一个协程main()， 将main()加入到Event Loop中

实际实例:

  
import asyncio

async def wget(host):
    print(f"wget {host}...")
    # 连接80端口:
    reader, writer = await asyncio.open_connection(host, 80) # 密集型I/O操作，挂起当前协程 相当于await asyncio.sleep(1)
    # 发送HTTP请求:
    header = f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n"
    writer.write(header.encode("utf-8"))
    await writer.drain()

    # 读取HTTP响应:
    while True:
        line = await reader.readline()
        if line == b"\r\n":
            break
        print("%s header > %s" % (host, line.decode("utf-8").rstrip()))
    # Ignore the body, close the socket
    writer.close()
    await writer.wait_closed()
    print(f"Done {host}.")

async def main():
    await asyncio.gather(wget("www.sina.com.cn"), wget("www.sohu.com"), wget("www.163.com"))

asyncio.run(main())

高级特性

关于修饰器

  
# 基本修饰器 @wrapper
def wrapper(func):
    def inner(*args, **kwargs):
        print("Before call")
        result = func(*args, **kwargs)
        print("After call")
        return result
    return inner

@wrapper
def foo(*args, **kwargs):
    ...
# 等价于
wrapper(foo)(*args, **kwargs)

# 带参数的修饰器 @wrapper_with_args(arg1, arg2)
def wrapper_with_args(arg1, arg2):
    def decorator(func):
        def inner(*args, **kwargs):
            print(f"Before call with {arg1} and {arg2}")
            result = func(*args, **kwargs)
            print("After call")
            return result
        return inner
    return decorator

@wrapper_with_args("arg1", "arg2")
def foo(*args, **kwargs):
    ...
# 等价于
wrapper_with_args("arg1", "arg2")(foo)(*args, **kwargs)

# 类修饰器
class Wrapper:
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs): # 重写__call__方法
        print("Before call")
        result = self.func(*args, **kwargs)
        print("After call")
        return result

关于修饰器typing.overload

注意: 重载与多态的区别

由于python是动态类型语言，不支持函数重载，但是可以通过typing.overload实现函数重载。

  
from typing import overload

class A:
    @overload
    def foo(self, x: int) -> int:
        pass

    @overload
    def foo(self, x: str) -> str:
        pass

    def foo(self, x):
        if isinstance(x, int):
            return x + 1
        elif isinstance(x, str):
            return x + "!"
        else:
            raise TypeError("x must be int or str")

关于修饰器classmethod和staticmethod

区别：

@classmethod修饰的方法，第一个参数是类对象，通常命名为cls，可以通过类对象调用，也可以通过实例对象调用。该方法可以访问类属性，不能访问实例属性。

  
    class A:
        @classmethod
        def foo(cls):
            print(cls)
    A.foo() 
    # output: >>> <class '__main__.A'>
    # -------------------------
    # 常用于定义工厂方法:
    class B:
        def __init__(self, value):
            self.value = value
        @classmethod
        def create(cls, value):
            return cls(value)
    b = B.create(10)
    # -------------------------

@staticmethod修饰的方法，不需要传入类对象或实例对象，通常命名为self，可以通过类对象调用，也可以通过实例对象调用。该方法不能访问类属性，也不能访问实例属性。

  
    class A:
        @staticmethod
        def foo():
            print("Hello, world!")
    A.foo()
    # output: >>> Hello, world!

内置全局变量

vars()：以字典方式返回内置全局变量

  
#！/usr/bin/env python3

print(vars()) # 返回当前模块的属性和属性值

>>> {
    '__name__': '__main__',  # 模块名
    '__doc__': None, # 模块的文档字符串
    '__package__': None, # 包名
    '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f8b1b3b3b50>, # 模块加载器
    '__spec__': None, # 模块的规范
    '__annotations__': {}, # 类型注解
    '__builtins__': <module 'builtins' (built-in)>, # 内置函数
    '__file__': '/path/to/module.py', # 模块路径
    '__cached__': None # 导入模块的缓存路径
    }

__name__: 当模块被直接运行时，__name__的值为__main__，当模块被导入时，__name__的值为模块名。
__file__: 当模块被导入时，__file__的值为模块的路径，当模块被直接运行时，__file__的值为__main__。
1 2 import os print(os.path.abspath(__file__)) # /path/to/module.py

__doc__: 获取文件的文档字符串。

  
 def foo():
     """This is a function""" # __doc__的值为该字符串
     pass
    
 print(foo.__doc__) # This is a function

__all__: 控制模块导入时的行为，当from module import *时，只导入__all__中的变量。
__package__: 获取导入文件的路径，多层目录以点分割，注意：对当前文件返回None
__path__: 模块的路径。
__cached__: 缓存路径。

反射

反射是指程序可以访问、检测和修改它本身状态或行为的一种能力。

getattr(object, name[, default]): 获取对象的属性，如果属性不存在，返回默认值。
setattr(object, name, value): 设置对象的属性。
hasattr(object, name): 判断对象是否有指定的属性。
delattr(object, name): 删除对象的属性。
vars(object): 以字典方式返回对象的属性和属性值。
dir(object): 返回对象的属性列表。
type(object): 返回对象的类型。
isinstance(object, class): 判断对象是否是指定类的实例。
issubclass(class, classinfo): 判断类是否是另一个类的子类。
object.__dict__: 返回对象的属性字典。

包

用于整理一些高效/有用的包

importlib

内置模块

importlib模块提供了一些有用的函数，例如importlib.import_module可以动态导入模块。

  
import importlib

# 动态导入模块中的函数
module = importlib.import_module("os") # 等价于import os

print(module.__dict__) # 返回模块的属性和属性值

functools

内置模块

functools模块提供了一些有用的函数，例如functools.partial可以固定函数的部分参数，返回一个新的函数。

  
import functools

def add(a, b):
    return a + b

add_1 = functools.partial(add, 1)
print(add_1(2)) # 3

weakref

内置模块

关于深拷贝和浅拷贝：深拷贝和浅拷贝的区别在于，浅拷贝只拷贝对象的引用，而深拷贝会拷贝对象的所有内容。在Python赋值中，不明确区分深拷贝和浅拷贝，一般对静态数据类型(如int, str)进行赋值时，会进行深拷贝，对于动态数据类型(如list, dict)进行赋值时，会进行浅拷贝。

这个模块是我在翻torch源码时发现的，torch中的torch.utils.hooks模块中创建hook的handle对象时使用

weakref模块可以创建弱引用，弱引用不会增加对象的引用计数，当对象的引用计数为0时，对象会被销毁。

该模块可以用于解决循环引用问题，例如一个对象A引用了对象B，对象B又引用了对象A，这样两个对象的引用计数永远不会为0，导致内存泄漏。

  
import weakref

class A:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return str(self.value)

a = A(10)
d = weakref.WeakValueDictionary() # 创建一个弱引用字典
d["a"] = a # 将对象a加入字典
print(d["a"]) # 10
del a # 删除对象a
print(d["a"]) # None

dataclasses

内置模块

极为方便的数据类，用于创建不可变的数据类，可以自动实现__init__、__repr__、__eq__等方法。

  
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int
    z: int = 0

p = Point(1, 2)
print(p) # Point(x=1, y=2，z=0)

timm

timm是一个用于图像分类的模型库，提供了许多经典的模型，例如ResNet、EfficientNet等。

  
from timm.layers import DropPath, to_2tuple # 导入DropPath和to_2tuple

eniops

极为强大的爱因斯坦求和符号库，可以用于张量运算。

  
from einops import rearrange, reduce, repeat

omegaconf

一个强大的配置库，可以用于管理配置文件。不需要自己写配置文件解析器了~~~

  
from omegaconf import OmegaConf

contextlib

内置模块，用于创建上下文管理器。优雅地处理资源的分配和释放。

  
from contextlib import contextmanager, asynccontextmanager

@contextmanager # 创建上下文管理器
def open_file(file): # 定义一个文件生成器
    print(f"Open file {file}")
    yield file
    print(f"Close file {file}")

with open_file("test.txt") as f: # 使用上下文管理器
    print(f.read())

pydantic

一个强大的数据验证库，可以用于验证数据的合法性。

官方文档

  
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

user = User(name="Alice", age=20)
print(user.dict()) # {'name': 'Alice', 'age': 20}

hashlib

内置模块，用于加密算法。

  
import hashlib

# sha1加密

sha1 = hashlib.sha1()
sha1.update("Hello, world!".encode("utf-8"))
print(sha1.hexdigest()) # 2ef7bde608ce5404e97d5f042f95f89f1c232871

# md5加密

md5 = hashlib.md5()
md5.update("Hello, world!".encode("utf-8"))
print(md5.hexdigest()) # 3e25960a79dbc69b674cd4ec67a72c62

内置函数

一些不那么常用，但是很有用的内置函数

filter

内置函数，用于过滤序列。

  
li = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

odd_li = filter(lambda x: x % 2 == 1, li) # 第一个参数是一个函数，返回True或False，用于判断是否被filer; 第二个参数是一个将被filter的序列
print(list(odd_li)) # [1, 3, 5, 7, 9]

# 常见的，可以用于filter torch.parameters()↓
from torch import nn, optim

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 1))
for param in model[0].parameters(): # 固定第一层的参数
    param.requires_grad = False 

optimizer = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-3) # 只优化requires_grad=True的参数

杂项

str.encode()和bytes.decode()是互逆操作，str.encode()将字符串编码为字节，bytes.decode()将字节解码为字符串。

  
s = "Hello, world!"
b = s.encode("utf-8")
print(b) # b'Hello, world!'
print(b.decode("utf-8")) # Hello, world!

python的复杂列表推导式:

  
li = [i for i in range(5) for j in range(10) for k in range(20)]

# 等价于

li = []
for i in range(5):
    for j in range(10):
        for k in range(20):
            li.append(i)

关于运算符@: Python3.5新增的矩阵乘法运算符, torch.tensive @ torch.tensive 等价于torch.matmul(torch.tensive, torch.tensive)

关于torch的多维张量的乘法: 前几个维度可以当作batch维度，最后两个维度可以当作矩阵维度，这样可以实现batch矩阵乘法

  
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(3, 5, 4)
c = a @ b
print(c.size()) # torch.Size([3, 4, 4])

#-------------------------

a = torch.randn(3, 4, 5, 6)
b = torch.randn(3, 5, 6, 4)
c = a @ b
print(c.size()) # torch.Size([3, 4, 5, 4])

nn.Linear(in_features, out_features)可以处理多维的tensor，只要最后一个维度是in_features，前面的维度可以当作batch维度

  
import torch

a = torch.randn(3, 4, 5)
linear = torch.nn.Linear(5, 6)
b = linear(a)
print(b.size()) # torch.Size([3, 4, 6])

Python

语言 Python

本文由作者按照 CC BY 4.0 进行授权

包管理

使用pip

使用conda

使用venv

异步IO

1. 协程(Couroutine)

高级特性

关于修饰器

关于修饰器typing.overload

关于修饰器classmethod和staticmethod

内置全局变量

反射

包

importlib

functools

weakref

dataclasses

timm

eniops

omegaconf

contextlib

pydantic

hashlib

内置函数

filter

杂项

热门标签