Upload YBM code folder

feelsocode, 1 month ago
commit 55a98f42cd
100 files changed, with 1046 additions and 0 deletions
  1. BIN
      YBM/__pycache__/config.cpython-314.pyc
  2. BIN
      YBM/__pycache__/logger_config.cpython-314.pyc
  3. 522 0
      YBM/config.py
  4. 524 0
      YBM/count_nums.py
  5. BIN
      YBM/download_images/004b7fc2777544c3aaa486a572fb2296.png
  6. BIN
      YBM/download_images/0367db2deca441358dde875afe0089b2.png
  7. BIN
      YBM/download_images/04051c58ade14cdb972c981e4da2fb4d.png
  8. BIN
      YBM/download_images/0410b43563ce48ab9182eb08ce4a7db7.png
  9. BIN
      YBM/download_images/05fd84c0e87e4575adf49b919a4579b2.png
  10. BIN
      YBM/download_images/060f0fe436574b279997027cb63d105e.png
  11. BIN
      YBM/download_images/06b7f6254cd44115bfe498120dae26c6.png
  12. BIN
      YBM/download_images/070a1b5d328c404ca46ac51ee30e9a8b.png
  13. BIN
      YBM/download_images/07b3c778d61b44eaa8c25f9f9faad33a.png
  14. BIN
      YBM/download_images/09512e703b384ccdb9f823803f4d3c9c.png
  15. BIN
      YBM/download_images/09bcea3535a84184a81548d797baf173.png
  16. BIN
      YBM/download_images/0af26841cb154c31af86202c6cdd645b.png
  17. BIN
      YBM/download_images/0b23655ca0394e5eab9d013acb4a9c47.png
  18. BIN
      YBM/download_images/0c29fb4cf21f465ca3d0764f11b7545c.png
  19. BIN
      YBM/download_images/0c616f2ced6741388b3325d66948bd4d.png
  20. BIN
      YBM/download_images/0c82b67c17bb4005814d2ee4ca4a624a.png
  21. BIN
      YBM/download_images/0d4d96355d31456493de769b348b9d52.png
  22. BIN
      YBM/download_images/0de33dd7240847f89703055373d5be68.png
  23. BIN
      YBM/download_images/0e64205205b649afb8403171bb0860d1.png
  24. BIN
      YBM/download_images/0f560173422542d8a2bc8500f2633145.png
  25. BIN
      YBM/download_images/0f8d6994bf814fb1b6fc077a541b0bca.png
  26. BIN
      YBM/download_images/115042601ac54c53b3368d7e14f6e401.png
  27. BIN
      YBM/download_images/1185fd1b38fc4f9c9000abbe1e938e58.png
  28. BIN
      YBM/download_images/11a2977e623249ffbeb2b527c96424ba.png
  29. BIN
      YBM/download_images/11c400bd70394df4b2196368e2d50f5b.jpg
  30. BIN
      YBM/download_images/11d06a999703453b84535c0147e385d1.jpg
  31. BIN
      YBM/download_images/11f3113ccc5f4198a798c0f7db75a25d.png
  32. BIN
      YBM/download_images/12c5dc36632d4961a442112d2d5107e5.png
  33. BIN
      YBM/download_images/12f11aeba4254db1a24e38986d587ba0.png
  34. BIN
      YBM/download_images/1351a9a21d4242b2bb214f34b991ac26.png
  35. BIN
      YBM/download_images/13e1f84e71d64f1bb3398510a8b624c9.jpg
  36. BIN
      YBM/download_images/142a11835f704b7abbb040193085c0cd.png
  37. BIN
      YBM/download_images/14713067c2e64a80831e49dd64b81432.png
  38. BIN
      YBM/download_images/14a4f1e2a40a4c23b9150ce2a9846961.jpg
  39. BIN
      YBM/download_images/14bee1221f324240a11fab453e807f04.png
  40. BIN
      YBM/download_images/153ac0249d9d46929e3d2189cdd240d6.png
  41. BIN
      YBM/download_images/18e14100679b4f81b01b02cedb326bee.png
  42. BIN
      YBM/download_images/199b95af141a4ec2a927085164ca7a6b.png
  43. BIN
      YBM/download_images/1a1e26d6b6784e16996d32b59f9520c4.png
  44. BIN
      YBM/download_images/1ad8c83c47054fd29bfb8d67031648d9.png
  45. BIN
      YBM/download_images/1b8dec53c68b402090bc341a4cc42cbd.png
  46. BIN
      YBM/download_images/1ea1ff2f99b5498fba25ab62a633d7a9.jpg
  47. BIN
      YBM/download_images/20f89130b40c452b9e3f3195afa49693.png
  48. BIN
      YBM/download_images/2106dff69cbe44e2b80313c09c73729e.png
  49. BIN
      YBM/download_images/213f539506c74cf1ada4d30d75396e49.png
  50. BIN
      YBM/download_images/225f237b7bf14f9aa1493c9f11586b44.png
  51. BIN
      YBM/download_images/22b3f8a7e8a54c45b4d74053b7f44f57.png
  52. BIN
      YBM/download_images/22bf1813ed10453bac52cb927e0672e1.png
  53. BIN
      YBM/download_images/230f76ce6704421281c3996e3d994b64.png
  54. BIN
      YBM/download_images/283ecf93bc0646bc890b7608746833e4.png
  55. BIN
      YBM/download_images/298db84c13674573a49a924147cf69b2.png
  56. BIN
      YBM/download_images/29ae2cc8806d4a3c9755bbb4fdcc0af7.png
  57. BIN
      YBM/download_images/2ba0e5aadbea429f89ec5e044324d5cd.png
  58. BIN
      YBM/download_images/2c6dfc789f7e4a09b7c1a25258deba4d.png
  59. BIN
      YBM/download_images/2cb3e035fb374a1689d40298395dbe3e.png
  60. BIN
      YBM/download_images/2eb67b0f366e42209b0ef6c9c14e5562.png
  61. BIN
      YBM/download_images/2f2f71490d2044f1b60d808c552a340c.png
  62. BIN
      YBM/download_images/2fd97782097748fda3d43fb5b59002a4.png
  63. BIN
      YBM/download_images/2ff4a0b22bea4faf92486c9323044aed.png
  64. BIN
      YBM/download_images/306d0b9af99d4a549c22b95bbc888930.png
  65. BIN
      YBM/download_images/329fef992d564140a8f40933ec0fdb15.png
  66. BIN
      YBM/download_images/32b452c3f7b04f9cbb32266f017c125b.jpg
  67. BIN
      YBM/download_images/331679a15aff46e98c72008fd8ec9fef.jpg
  68. BIN
      YBM/download_images/3380d93039e84767a41728f385f3c79d.png
  69. BIN
      YBM/download_images/33eb3dacd9664eedad3ff8414fd669b0.png
  70. BIN
      YBM/download_images/349437609c274297993d134237e0276f.png
  71. BIN
      YBM/download_images/382467f6f533407aa99f2b3328603b0a.png
  72. BIN
      YBM/download_images/384341307b3f4b8a837168f3f4f01874.png
  73. BIN
      YBM/download_images/38540b983b3140a0b6bd8c186a0d85b1.png
  74. BIN
      YBM/download_images/3c83c55cc195482da362b5d496177c16.png
  75. BIN
      YBM/download_images/3cc2736bf65d49139227b2dfe1611466.png
  76. BIN
      YBM/download_images/3d4feb0ccdd1455f81aca1d3d3a167a8.jpg
  77. BIN
      YBM/download_images/3da67fb6066f44288c3f0e5d8ec7b28b.png
  78. BIN
      YBM/download_images/3e82864c27e9436b84eeeb7953cced2f.png
  79. BIN
      YBM/download_images/3f61b3e6a4ee47cc963bb38d29c35f96.png
  80. BIN
      YBM/download_images/3f892938f08043f1affed7027e2656e2.jpg
  81. BIN
      YBM/download_images/3fe3524015d84fad9cac200db851fcb8.png
  82. BIN
      YBM/download_images/41a39149b1d84a4d8dcfb76cdedee624.png
  83. BIN
      YBM/download_images/426da1cd31324331b61cd1d413322a13.png
  84. BIN
      YBM/download_images/42abad7ba09b449ab86ca998611b4fae.png
  85. BIN
      YBM/download_images/42d86856513c42c4a2dc7306c925fe95.png
  86. BIN
      YBM/download_images/4437ef3193334fa8b0e1b1b7207fbf40.png
  87. BIN
      YBM/download_images/45c8b15dde914bfc955ae3a472d44dc1.jpg
  88. BIN
      YBM/download_images/45ff72f7cba64de885b396fe5ba4bd95.png
  89. BIN
      YBM/download_images/479b36c2a18846e78d7e0cd24f617281.png
  90. BIN
      YBM/download_images/48655f52cbca49f8aa31cb260bf11f82.jpg
  91. BIN
      YBM/download_images/4950cf207ecc47d4bb5aed0805ae5d51.png
  92. BIN
      YBM/download_images/4ac6dcaaaab445469d21df594a552c74.png
  93. BIN
      YBM/download_images/4b53ba65819e4680954d71e9e7aabefd.png
  94. BIN
      YBM/download_images/4b53f9fdb1654f81aec71c49306d6a56.png
  95. BIN
      YBM/download_images/4b82eb5a793742f38b413bd88b12130f.png
  96. BIN
      YBM/download_images/4c8445ad9ef24783a860b42e06b37d51.png
  97. BIN
      YBM/download_images/4dcd29b329a248a7b6bba4bd70c24a04.png
  98. BIN
      YBM/download_images/4e24d6148eaa4ba4a25fc52dc339ac56.png
  99. BIN
      YBM/download_images/4e359e71e2a441858e135e2a9b20c50f.png
  100. BIN
      YBM/download_images/502e57f0ff7547459c9a0679fe53fe12.png

BIN
YBM/__pycache__/config.cpython-314.pyc


BIN
YBM/__pycache__/logger_config.cpython-314.pyc


+ 522 - 0
YBM/config.py

@@ -0,0 +1,522 @@
+# config.py - configuration for the 药帮忙 (YBM) data collector
+from datetime import datetime
+
+import pymysql
+from dotenv import load_dotenv
+import os
+import oss2
+from PIL import Image
+from logger_config import logger
+
+
+# Step 1: load the .env file (must run before any configuration is read)
+# load_dotenv() reads .env from the current directory by default; for another location, pass the path: load_dotenv("/path/to/.env")
+# load_dotenv()
+
+# MySQL configuration
+MYSQL_CONFIG = {
+    "host": "47.119.164.65",       # 本地MySQL地址
+    "port": 3306,              # 端口
+    "user": "test_c",            # 你的MySQL用户名
+    "password": "Dfwy@2025",    # 你的MySQL密码
+    "database": "test2",   # 已建好的数据库名
+    "charset": "utf8mb4"       # 字符集(避免中文乱码)
+}
+# MYSQL_CONFIG = {
+#     "host": os.getenv("MYSQL_HOST"),  # 读取.env中的MYSQL_HOST
+#     "user": os.getenv("MYSQL_USER"),
+#     "password": os.getenv("MYSQL_PASSWORD"),  # 敏感值从.env读取
+#     "database": os.getenv("MYSQL_DATABASE"),
+#     "port": int(os.getenv("MYSQL_PORT", 3306)),  # 可选配置:设置默认值3306,避免.env缺失时报错
+#     "charset": "utf8mb4"
+# }
+
+# Fuzzy-match a product URL against the database
+def fuzzy_match_product_url_in_db_mysql(product_url):
+    # Skip the query when there is nothing to match
+    if not product_url:
+        logger.warning("⚠️ product_url to match is empty; skipping the database query")
+        return None
+
+    # Escape \, % and _ so they are matched literally rather than as LIKE wildcards
+    escaped_product_url = (
+        product_url.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
+    )
+
+    conn = None
+    try:
+        conn = pymysql.connect(**MYSQL_CONFIG)
+        cursor = conn.cursor()
+
+        # Run the fuzzy query; %s is the driver placeholder, and wrapping the
+        # value in % ... % gives a "contains" match
+        sql = "SELECT * FROM ybm_drug_middle WHERE product_link LIKE %s"
+        match_value = f"%{escaped_product_url}%"
+        cursor.execute(sql, (match_value,))
+
+        # Fetch the first matching row (a tuple) and convert it to a dict keyed by column name
+        result = cursor.fetchone()
+        if result:
+            column_names = [desc[0] for desc in cursor.description]
+            return dict(zip(column_names, result))
+        return None  # no match
+
+    except Exception as e:
+        logger.error(f"❌ MySQL fuzzy match failed: {e}")
+        return None
+    finally:
+        # Always close the connection, whether or not the query succeeded
+        if conn:
+            conn.close()
+
+
+
+
+# ==================== Load products to search from the database ====================
+def get_search_keywords_from_db():
+    """Read scrape_name values from the database and build the SEARCH_KEYWORDS list."""
+    keywords = []
+    conn = None
+    cursor = None
+    try:
+        # Check that MYSQL_CONFIG has every required key
+        required_configs = ['host', 'user', 'password', 'database']
+        for cfg in required_configs:
+            if cfg not in MYSQL_CONFIG:
+                raise ValueError(f"MYSQL_CONFIG is missing required key: {cfg}")
+
+        # Open the database connection
+        conn = pymysql.connect(**MYSQL_CONFIG)
+        cursor = conn.cursor()
+        sql = 'SELECT scrape_name FROM ybm_scape_name_config WHERE status = 1'
+        cursor.execute(sql)
+
+        # Collect every non-empty scrape_name; NULL values are skipped
+        results = cursor.fetchall()
+        keywords = [row[0].strip() for row in results if row[0] and row[0].strip()]
+
+        print(f"Read {len(keywords)} keywords from the database (status=1)")
+    except Exception as e:
+        print(f"Failed to read keywords from the database: {e}")
+        # On failure, fall back to an empty list
+        keywords = []
+    finally:
+        print("Sample of the keywords read:")
+        print(keywords[:5])
+        # Close the cursor and connection, ignoring errors during cleanup
+        if cursor:
+            try:
+                cursor.close()
+            except Exception:
+                pass
+        if conn:
+            try:
+                conn.close()
+            except Exception:
+                pass
+
+    return keywords
+
+
+# ==================== 1. 核心业务配置 ====================
+# 搜索关键词列表
+SEARCH_KEYWORDS = get_search_keywords_from_db()
+# get_search_keywords_from_db()
+# ['999荆防颗粒','999 感冒灵颗粒']
+
+# [
+#     "999复方感冒灵颗粒",
+#     "999糠酸莫米松凝胶",
+#     "999感冒灵颗粒",
+#     "999皮炎平复方醋酸地塞米松乳膏",
+#     "三九胃泰颗粒",
+#     "顺峰康王酮康他索乳膏",
+#     "999强力枇杷露",
+#     "999小柴胡颗粒",
+#     "999板蓝根颗粒",
+#     "999抗病毒口服液",
+#     "温胃舒颗粒",
+#     "养胃舒颗粒",
+#     "999盐酸氨溴索口服溶液",
+#     "999蒲地蓝消炎片",
+#     "999速复康复方氨酚烷胺胶囊",
+#     "999咽炎片",
+#     "999小儿止咳糖浆",
+#     "999小儿感冒颗粒",
+#     "999小儿氨酚黄那敏颗粒",
+#     "999感冒清热颗粒",
+#     "999藿香正气合剂",
+#     "999皮炎平曲安奈德益康唑乳膏",
+#     "999必无忧盐酸特比萘芬凝胶",
+#     "999精装感冒灵颗粒",
+#     "999感冒灵胶囊",
+#     "999荆防颗粒",
+#     "999精氨酸布洛芬颗粒",
+#     "999盐酸特比萘芬喷雾剂",
+#     "999止咳枇杷糖浆",
+#     "999复方金银花颗粒",
+#     "999盐酸特比萘芬乳膏",
+#     "999复方板蓝根颗粒",
+#     "999布洛芬混悬液",
+#     "999布洛芬缓释胶囊",
+#     "999速复康磷酸奥司他韦胶囊",
+#     "999维生素EC颗粒",
+#     "999玉屏风口服液",
+#     "史达功右美沙芬愈创甘油醚糖浆",
+#     "999对乙酰氨基酚口服溶液",
+#     "999小儿感冒宁颗粒",
+#     "999葡萄糖酸锌口服溶液",
+#     "999黄芪精",
+#     "今维多赐多康牌蛋白粉",
+#     "999小儿咳喘灵颗粒",
+#     "999小儿咳喘灵口服液",
+#     "华润神鹿儿泻停颗粒",
+#     "999小儿咽扁颗粒",
+#     "999速复康铝碳酸镁咀嚼片",
+#     "999选平硝酸咪康唑乳膏",
+#     "三九胃泰胶囊",
+#     "999正天胶囊",
+#     "999正天丸",
+#     "壮骨关节胶囊",
+#     "999壮骨关节丸",
+#     "999银菊清咽颗粒",
+#     "999表虚感冒颗粒"
+# ]
+
+
+
+
+
+
+# MySQL表结构(确保和你建好的表一致,仅做校验用)
+# CREATE_TABLE_SQL = """
+# CREATE TABLE IF NOT EXISTS yjj_medicine_data (
+#     id INT AUTO_INCREMENT PRIMARY KEY COMMENT '自增主键',
+#     product_title VARCHAR(500) COMMENT '商品标题',
+#     product_url VARCHAR(1000) COMMENT '商品详情页链接',
+#     purchase_price DECIMAL(10,2) DEFAULT 0.00 COMMENT '采购价格',
+#     discount_price DECIMAL(10,2) DEFAULT 0.00 COMMENT '折扣价格',
+#     spec VARCHAR(200) DEFAULT '未知规格' COMMENT '规格',
+#     box_count INT DEFAULT 1 COMMENT '盒数',
+#     store_name VARCHAR(200) DEFAULT '未知店铺' COMMENT '店铺名称',
+#     company_name VARCHAR(200) DEFAULT '未知公司' COMMENT '公司名称',
+#     validity_date VARCHAR(100) DEFAULT '无有效期' COMMENT '有效日期',
+#     production_date VARCHAR(100) DEFAULT '无生产日期' COMMENT '生产日期',
+#     approval_number VARCHAR(100) DEFAULT '无批准文号' COMMENT '批准文号',
+#     keyword VARCHAR(100) DEFAULT '无搜素关键词' COMMENT '搜素关键词',
+#     collect_time DATETIME COMMENT '采集时间'
+# ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='药九九采集数据';
+# """
+
+
+
+
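+# ==================== 2. Anti-scraping countermeasure settings ====================
+# Random delay ranges (imitate the pauses of a human user)
+MIN_CLICK_DELAY = 1.5  # minimum seconds between clicks
+MAX_CLICK_DELAY = 3.5  # maximum seconds between clicks
+MIN_INPUT_DELAY = 0.1  # minimum delay per typed character
+MAX_INPUT_DELAY = 0.3  # maximum delay per typed character
+MIN_PAGE_DELAY = 2.0   # minimum seconds to wait after a page loads
+MAX_PAGE_DELAY = 4.0   # maximum seconds to wait after a page loads
+
+# Longer delay between keywords (longer than the per-product delay)
+MIN_KEYWORD_DELAY = 8.0
+MAX_KEYWORD_DELAY = 15.0
+
+# Scroll settings (400 px ± 50 px)
+SCROLL_TARGET_DISTANCE = 400  # target scroll distance (px)
+SCROLL_OFFSET_RANGE = 50       # random offset range (px)
+SCROLL_STEP = 50               # step per scroll (smaller is slower and more human-like)
+SCROLL_INTERVAL = 0.05         # pause between steps (seconds)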
+
+# ==================== 3. Cookie & 登录配置 ====================
+COOKIE_FILE_PATH = "ybm_cookies.json"  # Cookie保存路径
+# 需要登录后访问的验证页面(用于检测Cookie是否有效)
+LOGIN_VALIDATE_URL = "https://www.ybm100.com/new/"
+
+# 账号密码
+USERNAME = "18008650300"
+PASSWORD = "12345678"
+
+
+# USERNAME = "yjj112031"
+# PASSWORD = "123456"
+
+# 目标登录URL
+TARGET_LOGIN_URL = "https://www.ybm100.com/new/login"
+# "https://www.yyjzt.com/login?redirect=%2FgoodDetail%3FladderNum%26itemStoreId%3D124250306%26sourceProdetail%3D%252Fsearch%26is_store%3D0"
+
+# ==================== 4. 元素选择器配置 ====================
+# 基础选择器
+USERNAME_SELECTOR = "input[placeholder*=请输入账号]"
+PASSWORD_SELECTOR = "input[placeholder*=请输入密码]"
+LOGIN_BTN_SELECTOR = "button:has(span:text('登录'))"
+SEARCH_INPUT_SELECTOR = "input[placeholder*='药品名称/厂家名称']"
+SEARCH_INPUT_SELECTOR2 = "div.home-search-container-search-head"
+SEARCH_BTN_SELECTOR = 'div.home-search-container-search-head-btn[data-scmd="text-搜索"]'
+
+# 采集元素选择器(根据页面实际调整!)
+#这里得改
+PRODUCT_ITEM_SELECTOR = "div.product-list-item"         # 商品项容器
+
+PRODUCT_TITLE_SELECTOR = "div.product-name"        # 商品标题
+PRODUCT_PRICE_SELECTOR = "div.main-price"       # 商品价格
+PRODUCT_STORE_SELECTOR = 'div[data-v-382008f5].shop-name'  #店铺名称
+PRODUCT_COMPANY_SELECTOR = "div.product-manufacturer"            # 公司名称
+PRODUCT_VALIDITY_SELECTOR = "div.product-period"     # 有效期
+
+# ==================== 5. 等待时间配置(毫秒) ====================
+ELEMENT_TIMEOUT = 10000
+LOGIN_AFTER_CLICK = 5000
+SEARCH_BTN_TIMEOUT = 5000
+COLLECT_DELAY = 3000
+DETAIL_LOAD_TIMEOUT = 5000  # 点击商品后等待详情加载的时间
+
+# ==================== 6. 浏览器配置 ====================
+BROWSER_HEADLESS = False
+BROWSER_CHANNEL = "chrome"
+SLOW_MO_MIN = 50
+SLOW_MO_MAX = 100
+
+# ==================== 7. CSV配置 ====================
+CSV_HEADERS = [
+    "商品标题", "商品采购价格", "商品折扣价格", "规格", "盒数",
+    "店铺名称", "公司名称",
+    "有效日期", "生产日期", "批准文号", "采集时间"
+]  # 表头
+# 注:CSV_FILE_PATH 因包含动态时间戳,保留在主文件中定义
+
+#存放营业执照图片路径
+# cropped_screenshot_path =
+
+
+#百度OCR配置
+request_url_config = "https://aip.baidubce.com/rest/2.0/ocr/v1/business_license"
+
+AppKey_config = "tRK2RhyItCSh6BzyT4CNVXQa"
+AppSecret_config = "TDgKiPo94i2mOM1sDqOuDnlcK1bG66jh"
+token_url_config = 'https://aip.baidubce.com/oauth/2.0/token'
+
+
+
+
+
+# ---------------------- OSS 配置项 ----------------------
+OSS_ACCESS_KEY_ID = 'LTAI5tDwjfteBvivYN41r8sJ'
+OSS_ACCESS_KEY_SECRET = 'yowuOGi2nYYnrqGpO3qcz94C4brcPp'
+OSS_ENDPOINT = "oss-cn-shenzhen.aliyuncs.com"
+OSS_BUCKET_NAME = "zhijiayun-jiansuo"
+OSS_PREFIX = "scrape_data/"
+
+
+
+# 本地截图配置
+LOCAL_SCREENSHOT_DIR = "local_screenshots"  # 本地截图保存目录
+LOCAL_SCREENSHOT_NAME = None  # 自动生成文件名,无需手动指定
+LOCAL_CROPPED_DIR = "./local_cropped_screenshots"  # 裁剪后图片保存目录
+
+
+# 图片压缩配置
+IMAGE_COMPRESS_ENABLE = True  # 是否开启图片压缩(True=开启,False=关闭)
+IMAGE_COMPRESS_QUALITY = 30  # jpg/jpeg格式压缩质量(1-95,数值越大画质越好,文件越大,推荐80-90)
+IMAGE_COMPRESS_PNG_LEVEL = 9  # png格式压缩级别(0-9,数值越大压缩率越高,速度越慢,推荐5-7)
+
+
+# ---------------------- 工具函数 ----------------------
+def init_local_screenshot_dir():
+    """
+    初始化本地截图目录(如果不存在则创建)
+    """
+    if not os.path.exists(LOCAL_SCREENSHOT_DIR):
+        os.makedirs(LOCAL_SCREENSHOT_DIR)
+        print(f"本地截图目录【{LOCAL_SCREENSHOT_DIR}】创建成功")
+    else:
+        print(f"本地截图目录【{LOCAL_SCREENSHOT_DIR}】已存在")
+
+
+
+
+def init_oss_bucket():
+    """
+    初始化OSS Bucket对象,用于后续上传操作
+    """
+    try:
+        # 创建认证对象
+        auth = oss2.Auth(OSS_ACCESS_KEY_ID, OSS_ACCESS_KEY_SECRET)
+        bucket = oss2.Bucket(auth, OSS_ENDPOINT, OSS_BUCKET_NAME)
+        # 验证Bucket是否可访问(可选)
+        bucket.get_bucket_info()
+        print("OSS Bucket 初始化成功")
+        return bucket
+    except Exception as e:
+        print(f"OSS Bucket 初始化失败:{str(e)}")
+        raise
+
+
+
+
+def upload_local_screenshot_to_oss(bucket, local_file_path, oss_file_path=None):
+    """
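+    Upload a local screenshot file to OSS.
+    :param bucket: an initialized OSS Bucket object
+    :param local_file_path: path of the local screenshot file to upload
+    :param oss_file_path: object path in OSS (e.g. screenshots/20260130_100000_target_page.jpg); defaults to screenshots/<local file name>
+    :return: public URL of the uploaded OSS object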
+    """
+
+    # 1. 校验本地文件是否存在
+    if not os.path.exists(local_file_path):
+        raise FileNotFoundError(f"本地截图文件不存在:{local_file_path}")
+
+
+    # 2. 生成默认的OSS文件路径(如果用户未指定)
+    if not oss_file_path:
+        # 提取本地文件名作为OSS文件名,保持一致性
+        local_file_name = os.path.basename(local_file_path)
+        oss_file_path = f"screenshots/{local_file_name}"
+
+    try:
+         # 3. 上传本地文件到OSS(核心修改:使用put_object_from_file)
+        bucket.put_object_from_file(oss_file_path, local_file_path)
+
+        # 4. 构造OSS文件的公网访问链接
+        oss_file_url = f"https://{OSS_BUCKET_NAME}.{OSS_ENDPOINT}/{oss_file_path}"
+        print(f"本地截图上传OSS成功,访问链接:{oss_file_url}")
+        return oss_file_url
+
+    except Exception as e:
+        print(f"本地截图上传OSS失败:{str(e)}")
+        raise
+
+# ---------------------- 补全/修改:裁剪函数(新增完整裁剪+删原图逻辑) ----------------------
+def crop_local_screenshot(local_file_path, cropped_file_path=None, crop_region=None):
+    """
+    裁剪本地截图文件(完整实现:裁剪后图片压缩,裁剪+保存裁剪文件+删除原图)
+    :param local_file_path: 原始本地截图文件路径
+    :param cropped_file_path: 裁剪后图片的保存路径(可选)
+    :param crop_region: 裁剪区域(元组,格式:(left, upper, right, lower)),可选
+    :return: 裁剪后图片的本地路径
+    """
+    # 1. 校验原始文件是否存在
+    if not os.path.exists(local_file_path):
+        raise FileNotFoundError(f"原始截图文件不存在:{local_file_path}")
+
+    # 2. 初始化裁剪后文件目录(自动创建)(你的原有逻辑,保持不变)
+    os.makedirs(LOCAL_CROPPED_DIR, exist_ok=True)
+
+
+    # 3. 生成默认裁剪后文件路径(避免重名,带_cropped标识)
+    if not cropped_file_path:
+        file_name = os.path.basename(local_file_path)
+        file_name_no_ext, file_ext = os.path.splitext(file_name)
+        cropped_file_name = f"{file_name_no_ext}_cropped{file_ext}"
+        cropped_file_path = os.path.join(LOCAL_CROPPED_DIR, cropped_file_name)
+
+    with Image.open(local_file_path) as img:
+        img_width, img_height = img.size
+        print(f"获取截图尺寸:宽={img_width},高={img_height}")  # 打印尺寸,方便排查
+
+
+        if not crop_region:
+            left = 0
+            upper = 0
+            right = int(img_width)
+            lower = int(img_height * 0.3)
+            crop_region = (left, upper, right, lower)
+            print(f"No crop region given; defaulting to the top 30% of the image: {crop_region}")
+
+        # 4.2 新增:校验裁剪区域合法性(避免超出图片尺寸)
+        c_left, c_upper, c_right, c_lower = crop_region
+        if c_right > img_width or c_lower > img_height or c_left < 0 or c_upper < 0:
+            raise ValueError(f"裁剪区域超出图片尺寸!图片尺寸:({img_width}, {img_height}),裁剪区域:{crop_region}")
+
+
+        # 4.3 执行裁剪并保存裁剪后的图片
+        cropped_img = img.crop(crop_region)
+
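+        # 4.4 Compress and save the cropped image
+        file_ext = os.path.splitext(cropped_file_path)[1].lower()  # lowercase suffix, so .JPG/.Jpg also match
+        is_jpeg = file_ext in ['.jpg', '.jpeg']
+        if is_jpeg and cropped_img.mode in ('RGBA', 'LA', 'P'):
+            # JPEG cannot store alpha or palette data, so convert before saving
+            cropped_img = cropped_img.convert('RGB')
+        try:
+            if not IMAGE_COMPRESS_ENABLE:
+                cropped_img.save(cropped_file_path)
+                print(f"Compression disabled; cropped image saved to: {cropped_file_path}")
+            elif is_jpeg:
+                # JPG/JPEG: lossy quality compression
+                cropped_img.save(
+                    cropped_file_path,
+                    format='JPEG',
+                    quality=IMAGE_COMPRESS_QUALITY,  # from the compression settings above
+                    optimize=True,  # extra encoder pass for a smaller file
+                    progressive=True  # progressive JPEG, friendlier for web loading
+                )
+                print(f"JPG compressed and saved, quality {IMAGE_COMPRESS_QUALITY}: {cropped_file_path}")
+            elif file_ext == '.png':
+                # PNG: lossless, so only the zlib compression level applies
+                cropped_img.save(cropped_file_path, format='PNG', compress_level=IMAGE_COMPRESS_PNG_LEVEL)
+                print(f"PNG compressed and saved, level {IMAGE_COMPRESS_PNG_LEVEL}: {cropped_file_path}")
+            else:
+                # Other formats: let Pillow pick the encoder from the file suffix
+                cropped_img.save(cropped_file_path)
+                print(f"Cropped image saved without compression: {cropped_file_path}")
+        except Exception as e:
+            # Fallback: save with default settings so the rest of the pipeline can continue
+            cropped_img.save(cropped_file_path)
+            print(f"Compression failed, saved with default settings instead: {e}")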
+
+
+
+    # 5. 裁剪成功后,删除原始截图文件(带异常处理)
+    try:
+        if os.path.exists(cropped_file_path):  # 确保裁剪文件生成成功,再删原图
+            os.remove(local_file_path)
+            print(f"原始截图文件已删除:{local_file_path}")
+        else:
+            print(f"裁剪文件未生成,暂不删除原始截图:{local_file_path}")
+    except OSError as e:
+        print(f"删除原始截图文件失败(文件可能被占用):{str(e)}")
+
+    # 6. 返回裁剪+压缩后的文件路径
+    return cropped_file_path
+
+
+def screenshot_target_page_to_local_then_oss(target_page, local_file_path=None, oss_file_path=None, full_page=True, crop_region=None):
+    """
+    对target_page截图保存到本地→裁剪图片(删原图)→上传裁剪后的图片到OSS(修改后整合版)
+    :param target_page: Playwright的Page对象(已加载目标页面)
+    :param local_file_path: 本地截图文件的完整路径(可选)
+    :param oss_file_path: OSS上的文件路径(可选)
+    :param full_page: 是否截取全屏(True=全屏,False=当前可视区域)
+    :param crop_region: 自定义裁剪区域(元组:(left, upper, right, lower)),可选
+    :return: 裁剪后文件路径 + OSS文件访问链接
+    """
+    # 1. 初始化本地截图目录(不存在则创建,避免保存文件时报错)
+    os.makedirs(LOCAL_SCREENSHOT_DIR, exist_ok=True)
+
+    # 2. 生成默认的本地文件路径(如果用户未指定)
+    if not local_file_path:
+        current_time = datetime.now().strftime("%Y%m%d_%H%M%S")
+        local_file_name = f"{current_time}_target_page.jpg"
+        local_file_path = os.path.join(LOCAL_SCREENSHOT_DIR, local_file_name)
+
+    # 3. 对target_page截图并保存到本地(核心修改:指定path参数)
+    print(f"正在对target_page截图,将保存到:{local_file_path}")
+    target_page.screenshot(
+        path=local_file_path,  # 保存到本地文件的核心参数
+        full_page=full_page,   # 是否全屏截图
+        omit_background=False, # 是否忽略背景
+        timeout=10000          # 截图超时时间
+    )
+    print(f"本地截图保存成功")
+
+
+    # 4. 调用裁剪函数,处理原图(裁剪+删原图)
+    cropped_file_path = crop_local_screenshot(
+        local_file_path=local_file_path,
+        crop_region=crop_region
+    )
+
+
+    # 5. 初始化OSS Bucket
+    bucket = init_oss_bucket()
+
+    # 6. 修改:上传裁剪后的图片,而非原始截图
+    oss_file_url = upload_local_screenshot_to_oss(bucket, cropped_file_path, oss_file_path)
+
+    # 7. Return the cropped file path and the OSS URL for later use
+    return cropped_file_path, oss_file_url
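
The crop step in `crop_local_screenshot` defaults to the top 30% of the screenshot and rejects regions that fall outside the image. A minimal sketch of that logic as a pure function, separated from Pillow so it can be checked on its own, might look like this. The name `resolve_crop_region`, the `top_fraction` parameter, and the extra empty-region check are assumptions for illustration and are not part of the commit.

```python
from typing import Optional, Tuple

# (left, upper, right, lower), the box layout that PIL's Image.crop expects
Region = Tuple[int, int, int, int]


def resolve_crop_region(width: int, height: int,
                        region: Optional[Region] = None,
                        top_fraction: float = 0.3) -> Region:
    """Return a validated crop box; default to the top `top_fraction` of the image.

    Hypothetical helper: mirrors the default and bounds check in
    crop_local_screenshot, plus an assumed check for empty regions.
    """
    if region is None:
        region = (0, 0, width, int(height * top_fraction))
    left, upper, right, lower = region
    if left < 0 or upper < 0 or right > width or lower > height:
        raise ValueError(f"crop region {region} exceeds image size ({width}, {height})")
    if right <= left or lower <= upper:
        raise ValueError(f"crop region {region} is empty")
    return region
```

Called with only the image size, the function returns the default box; called with an explicit region, it returns that region unchanged once it has passed the bounds check.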

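The commented-out `MYSQL_CONFIG` block in `config.py` reads the connection settings from environment variables instead of literals in the source. A small sketch of that approach, using only the standard library, is shown below. The variable names `MYSQL_HOST`, `MYSQL_USER`, `MYSQL_PASSWORD`, `MYSQL_DATABASE` and `MYSQL_PORT` come from that commented-out block; the function name `load_mysql_config`, the `env` parameter, and the fail-fast check for missing values are assumptions for illustration.

```python
import os
from typing import Mapping, Optional

# Names taken from the commented-out block in config.py
REQUIRED_KEYS = ("MYSQL_HOST", "MYSQL_USER", "MYSQL_PASSWORD", "MYSQL_DATABASE")


def load_mysql_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build a pymysql-style config dict from environment variables.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict in tests.
    """
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        # Fail early rather than connecting with partial settings
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "host": env["MYSQL_HOST"],
        "port": int(env.get("MYSQL_PORT", "3306")),  # optional, defaults to 3306
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
        "database": env["MYSQL_DATABASE"],
        "charset": "utf8mb4",
    }
```

The returned dict has the same keys as `MYSQL_CONFIG`, so it could be passed to `pymysql.connect(**config)` in the same way, assuming the variables are set in the process environment or loaded beforehand with `load_dotenv()`.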
+ 524 - 0
YBM/count_nums.py

@@ -0,0 +1,524 @@
+from playwright.sync_api import sync_playwright, TimeoutError as PlaywrightTimeoutError
+import os
+import json
+import random
+from logger_config import logger
+from config import *
+import re
+COOKIE_FILE_PATH = "ybm_cookies.json"  # Cookie保存路径
+LOGIN_VALIDATE_URL = "https://www.ybm100.com/new/"
+TARGET_LOGIN_URL = "https://www.ybm100.com/new/login"
+
+
+def load_cookies(context, cookie_path=COOKIE_FILE_PATH):
+    """从本地JSON文件加载Cookie到浏览器上下文"""
+    if not os.path.exists(cookie_path):
+        # logger.warning(f" Cookie文件不存在:{cookie_path}")
+        return False
+    try:
+        with open(cookie_path, "r", encoding="utf-8") as f:
+            cookies = json.load(f)
+        context.add_cookies(cookies)
+        # logger.info(f"✅ 已从{cookie_path}加载Cookie")
+        return True
+    except Exception as e:
+        # logger.error(f" 加载Cookie失败:{e}")
+        return False
+
+
+
+def is_login(page):
+    """验证是否已登录(核心:检测登录态)"""
+    try:
+        # 访问需要登录的页面
+        page.goto(LOGIN_VALIDATE_URL, timeout=5000)
+        page.wait_for_load_state("networkidle")
+
+        # 检测是否跳转到登录页(URL包含login则未登录)
+        if "login" in page.url.lower():
+            # logger.warning(" Cookie失效,需要重新登录")
+            return False
+
+        # 可选:检测登录后的专属元素(比如用户名、个人中心等)
+        # if page.locator("用户中心选择器").count() > 0:
+        #     return True
+        # logger.info(" Cookie有效,已保持登录状态")
+        return True
+    except Exception as e:
+        # logger.error(f" 验证登录状态失败:{e}")
+        return False
+
+
+def popup_guard(page, tag=""):
+    """
+    全局弹窗/遮罩守卫:多步引导 + 关闭按钮 + 遮罩清理 + 恢复滚动
+    tag 仅用于日志区分调用位置
+    """
+    try:
+        # 给弹窗一点出现时间
+        page.wait_for_timeout(300)
+
+        # 1) 连续点“下一步/完成/我知道了/关闭”
+        for _ in range(6):
+            btn = page.locator(
+                "xpath=//button[normalize-space()='下一步' or normalize-space()='完成' or normalize-space()='我知道了' or normalize-space()='关闭']"
+            ).first
+            if btn.count() > 0 and btn.is_visible():
+                btn.click(timeout=1500)
+                page.wait_for_timeout(250)
+                continue
+
+            # 2) 常见的 close icon
+            close_btn = page.locator(
+                "css=.el-dialog__headerbtn, .el-message-box__headerbtn, .close, .icon-close, .el-icon-close"
+            ).first
+            if close_btn.count() > 0 and close_btn.is_visible():
+                close_btn.click(timeout=1200)
+                page.wait_for_timeout(250)
+                continue
+
+            break
+
+        # 3) 清遮罩 + 恢复滚动/交互
+        page.evaluate(r"""
+        () => {
+          // 第一步:精准清理已知的遮罩/弹窗类名(Element UI框架常用)
+          const selectors = [
+            '.v-modal', '.el-overlay', '.el-overlay-dialog', '.el-dialog__wrapper',
+            '.el-message-box__wrapper', '.el-loading-mask'
+          ];
+          selectors.forEach(sel => document.querySelectorAll(sel).forEach(e => e.remove()));
+
+          // 泛化兜底:近似全屏 + 高 z-index 的层直接屏蔽
+          const all = Array.from(document.querySelectorAll('body *'));
+          for (const el of all) {
+            const s = getComputedStyle(el); // 获取元素的实际样式(含CSS生效的样式)
+            const z = parseInt(s.zIndex || '0', 10);    // 取元素的层级(z-index),默认0
+            // 条件1:元素是固定/绝对定位(弹窗/遮罩常见定位方式)+ 层级≥1000(高优先级遮挡)+ 能拦截鼠标事件
+            if ((s.position === 'fixed' || s.position === 'absolute') && z >= 1000 && s.pointerEvents !== 'none') {
+              const r = el.getBoundingClientRect();     // 获取元素的尺寸和位置
+                // 条件2:元素宽度/高度≥屏幕80%(近似全屏遮罩)
+              const nearFull = r.width >= innerWidth * 0.8 && r.height >= innerHeight * 0.8;
+              if (nearFull) {
+                el.style.pointerEvents = 'none';    // 让元素不拦截鼠标点击
+                el.style.display = 'none';          // 隐藏元素
+              }
+            }
+          }
+        // 第三步:恢复页面滚动功能(弹窗常把页面设为不可滚动)
+          document.documentElement.style.overflow = 'auto';     // html标签恢复滚动
+          document.body.style.overflow = 'auto';    // body标签恢复滚动
+          document.body.classList.remove('el-popup-parent--hidden');  // 移除Element UI的滚动禁用类
+        }
+        """)
+
+        # logger.info("杀除弹窗成功")
+    except Exception:
+        pass
+
+SEARCH_INPUT_SELECTOR = "input[placeholder*='药品名称/厂家名称']"
+
+def pick_search_input(page):
+    """优先选可见且可用的搜索输入框;第一个不行就尝试第二个"""
+    inputs = page.locator(SEARCH_INPUT_SELECTOR)
+    cnt = inputs.count()
+
+    # 优先检查前两个(你说只有两个)
+    for i in range(min(cnt, 2)):
+        candidate = inputs.nth(i)
+        try:
+            candidate.wait_for(state="visible", timeout=1500)  # 小超时快速试探
+            if candidate.is_enabled():
+                return candidate
+        except PlaywrightTimeoutError:
+            continue
+
+    # 兜底:直接找任意可见的(避免命中 hidden 模板)
+    candidate = page.locator(f"{SEARCH_INPUT_SELECTOR}:visible").first
+    candidate.wait_for(state="visible", timeout=5000)
+    return candidate
+
+
+def type_slow(locator, text: str, min_delay=0.06, max_delay=0.18):
+    """逐字输入,模拟真人打字"""
+    for ch in text:
+        locator.type(ch, delay=int(random.uniform(min_delay, max_delay) * 1000))
+
+SEARCH_BTN_SELECTOR = 'div.home-search-container-search-head-btn[data-scmd="text-搜索"]'
+
+
+
+def force_close_popup(page):
+    """关闭新手引导/遮罩(多步:下一步/完成/我知道了),并兜底移除遮罩层"""
+    try:
+        # 1) 尝试连续点“下一步/完成/我知道了/关闭”
+        for _ in range(5):  # 最多点5次,足够覆盖多步引导
+            btn = page.locator(
+                "//button[normalize-space()='下一步' or normalize-space()='完成' or normalize-space()='我知道了' or normalize-space()='关闭']"
+            ).first
+
+            if btn.count() > 0 and btn.is_visible():
+                btn.click(timeout=1500)
+                page.wait_for_timeout(300)
+                continue
+
+            # 有些引导是右上角 X(如果存在就点)
+            close_icon = page.locator(
+                "xpath=//*[contains(@class,'close') or contains(@class,'el-icon-close') or name()='svg' or name()='i'][1]"
+            ).first
+            if close_icon.count() > 0 and close_icon.is_visible():
+                close_icon.click(timeout=1000)
+                page.wait_for_timeout(300)
+                continue
+
+            break
+
+        # 2) 兜底:移除常见遮罩层(element-ui / 通用 mask/overlay)
+        page.evaluate("""
+        () => {
+          const selectors = [
+            '.v-modal', '.el-overlay', '.el-overlay-dialog', '.el-dialog__wrapper',
+            '[class*="mask"]', '[class*="overlay"]', '[style*="z-index"]'
+          ];
+          for (const sel of selectors) {
+            document.querySelectorAll(sel).forEach(el => {
+              const s = window.getComputedStyle(el);
+              // Remove only overlay-like elements: fixed/absolute with a very high z-index
+              if ((s.position === 'fixed' || s.position === 'absolute') && parseInt(s.zIndex || '0', 10) >= 1000) {
+                el.remove();
+              }
+            });
+          }
+        }
+        """)
+    except Exception:
+        pass
+
+
+def kill_masks(page):
+    """
+    强制清理残留遮罩层/覆盖层,并恢复 body 可滚动、可点击状态
+    """
+    page.evaluate(r"""
+    () => {
+      const removed = [];
+      const hidden = [];
+
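+      // 1) Handle the known, common overlay classes first.
+      //    '.el-popup-parent--hidden' is deliberately absent: Element UI puts that class
+      //    on <body>, so removing matches would delete the page; step 4 strips the class instead.
+      const knownSelectors = [
+        '.v-modal',
+        '.el-overlay',
+        '.el-overlay-dialog',
+        '.el-dialog__wrapper',
+        '.el-message-box__wrapper',
+        '.el-loading-mask'
+      ];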
+
+      for (const sel of knownSelectors) {
+        document.querySelectorAll(sel).forEach(el => {
+          // v-modal / overlay 直接 remove 最省事
+          removed.push(sel);
+          el.remove();
+        });
+      }
+
+      // 2) 再做一次“泛化兜底”:全屏 fixed/absolute + 高 z-index 的覆盖层
+      //    注意:不要误删页面正常的固定导航,所以加上“近似全屏”的判断
+      const all = Array.from(document.querySelectorAll('body *'));
+      for (const el of all) {
+        const s = window.getComputedStyle(el);
+        if (!s) continue;
+
+        const z = parseInt(s.zIndex || '0', 10);
+        const pos = s.position;
+        const pe = s.pointerEvents;
+
+        if ((pos === 'fixed' || pos === 'absolute') && z >= 1000 && pe !== 'none') {
+          const r = el.getBoundingClientRect();
+          const nearFullScreen =
+            r.width >= window.innerWidth * 0.8 &&
+            r.height >= window.innerHeight * 0.8 &&
+            r.left <= window.innerWidth * 0.1 &&
+            r.top <= window.innerHeight * 0.1;
+
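+          // Typical masks have a semi-transparent background, or are transparent but intercept clicks
+          // (looksLikeMask is computed for reference only; the check below does not use it)
+          const bg = s.backgroundColor || '';
+          const looksLikeMask =
+            nearFullScreen && (bg.includes('rgba') || bg.includes('rgb') || s.opacity !== '1');
+
+          if (nearFullScreen) {
+            // Transparent or not: anything near full screen with a high z-index stops intercepting clicks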
+            el.style.pointerEvents = 'none';
+            el.style.display = 'none';
+            hidden.push(el.tagName + '.' + (el.className || ''));
+          }
+        }
+      }
+
+      // 3) Restore scrolling and interaction on body / html (many dialogs lock scrolling)
+      document.documentElement.style.overflow = 'auto';
+      document.body.style.overflow = 'auto';
+      document.body.style.position = 'static';
+      document.body.style.width = 'auto';
+      document.body.style.paddingRight = '0px';
+
+      // 4) Remove the scroll-lock class commonly added by Element UI
+      document.body.classList.remove('el-popup-parent--hidden');
+
+      return { removed, hiddenCount: hidden.length, hidden };
+    }
+    """)
+
+
+# ==================== Search operation ====================
+def search_operation(page, keyword, is_first_search: bool = True):
+    """
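+    Fill the search box and submit the search.
+    :param page: page object
+    :param keyword: search keyword
+    :param is_first_search: whether this is the first search (the first one opens a new page; later ones navigate in the same page)
+    :return: (detail_page, success flag)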
+    """
+    try:
+        # 1) Locate the search box
+        search_locator = page.locator(SEARCH_INPUT_SELECTOR)
+
+        # Wait for the search box to appear
+        search_locator.wait_for(timeout=ELEMENT_TIMEOUT)
+
+        # 2) Clear the search box (belt and braces: fill("") first, then select all and delete)
+        search_locator.click(force=True)  # focus
+        search_locator.fill("")
+        page.keyboard.press("Control+A")  # select all
+        page.keyboard.press("Backspace")  # delete the selection
+
+        # 3) Type the keyword character by character
+        type_slow(search_locator, keyword, min_delay=0.06, max_delay=0.18)
+        # search_locator.fill(keyword)
+        logger.info(f"📝 Search keyword entered: {keyword}")
+
+        # 4) Wait for the search button to be visible, then pause before clicking
+        btn = page.locator(SEARCH_BTN_SELECTOR)
+        btn.wait_for(state="visible", timeout=SEARCH_BTN_TIMEOUT)
+        page.wait_for_timeout(3000)
+
+        detail_page = page
+        if is_first_search:
+            # The first search opens a new tab: get the new page object
+            try:
+                # Start listening for the new page before clicking
+                with page.context.expect_page(timeout=60000) as new_page_info:
+                    btn.click()
+                # Grab the new page after the click
+                detail_page = new_page_info.value
+                detail_page.wait_for_load_state("networkidle", timeout=20000)
+            except PlaywrightTimeoutError:
+                logger.warning("   No new tab detected")
+                return None, False
+            except Exception as e:
+                logger.warning(f"   Error while waiting for the new tab: {e}")
+                return None, False
+        else:
+            btn.click()
+            # Wait for the same page to navigate and finish loading (no new tab to listen for)
+            page.wait_for_load_state("networkidle", timeout=20000)
+            logger.info("✅ Follow-up search: navigation finished in the same page")
+
+
+        # Click the first-time highlight message button if it is present
+        test_btn = detail_page.locator("div[data-v-c65c36bc].first-time-highlight-message-btn button")
+        btn_count = test_btn.count()
+        logger.info(f"✅ Matched element count: {btn_count}")
+
+        if btn_count > 0:
+            test_btn.wait_for(state="attached", timeout=5000)
+            test_btn.click()
+
+        force_close_popup(detail_page)
+        kill_masks(detail_page)
+        logger.info("✅ Search triggered")
+
+        return detail_page, True
+
+    except PlaywrightTimeoutError as e:
+        logger.error(f" Search failed: timed out locating an element - {str(e)}")
+        return None, False  # on failure return (None, False)
+    except Exception as e:
+        logger.error(f" Search error: {str(e)}")
+        return None, False  # on failure return (None, False)
+
+
+
+def main():
+    with sync_playwright() as p:
+        browser = p.chromium.launch(
+            headless=False,  # avoid headless mode (anti-bot: headless is easier to detect)
+            channel="chrome",  # use the installed Chrome build
+            slow_mo=random.randint(100, 300),  # global per-action delay (mimics a slow human)
+            args=[
+                "--disable-blink-features=AutomationControlled",  # hide the webdriver automation flag (the key one)
+                "--enable-automation=false",  # disable the automation marker
+                "--disable-infobars",  # disable info bars
+                "--remote-debugging-port=0",  # random debugging port
+                "--start-maximized",  # maximized window (mimics normal use)
+                "--disable-extensions",  # disable extensions (fewer fingerprints)
+                "--disable-plugins-discovery",  # disable plugin discovery
+                "--no-sandbox",  # run without the sandbox
+                "--disable-dev-shm-usage",  # avoid failures caused by limited shared memory
+                f"--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/{random.randint(110, 120)}.0.0.0 Safari/537.36"  # UA with a random Chrome version
+            ]
+        )
+        # Disguise the fingerprint when creating the context
+        context = browser.new_context(
+            locale="zh-CN",  # Chinese locale
+            timezone_id="Asia/Shanghai",  # Shanghai time zone
+            geolocation={"latitude": 31.230416, "longitude": 121.473701},  # simulated Shanghai location (optional)
+            permissions=["geolocation"],  # grant the geolocation permission
+            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
+            no_viewport=True,
+            # Key: hide automation traits
+            java_script_enabled=True,
+            bypass_csp=True,
+            # user_data_dir="./temp_user_data"  # simulate a real user data directory
+        )
+        input("...")
+        page = context.new_page()
+
+
+        # Key: remove the navigator.webdriver marker (core anti-bot step)
+        page.add_init_script("""
+            Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
+            Object.defineProperty(navigator, 'plugins', { get: () => [1, 2, 3] });  // fake plugins
+            Object.defineProperty(navigator, 'mimeTypes', { get: () => [1, 2, 3] });  // fake MIME types
+            window.chrome = { runtime: {}, loadTimes: () => ({}) };  // fuller Chrome emulation
+            delete window.navigator.languages;
+            window.navigator.languages = ['zh-CN', 'zh'];
+            // Wrap mousemove listeners (intended to mimic real mouse-move events)
+            (() => {
+                const originalAddEventListener = EventTarget.prototype.addEventListener;
+                EventTarget.prototype.addEventListener = function(type, listener) {
+                    if (type === 'mousemove') {
+                        return originalAddEventListener.call(this, type, (e) => {
+                            e._automation = undefined;
+                            listener(e);
+                        });
+                    }
+                    return originalAddEventListener.call(this, type, listener);
+                };
+            })();
+""")
+
+
+        try:
+            # ========== Core: cookie reuse ==========
+            # 1. Load local cookies
+            load_cookies(context)
+
+            # 2. Check the login state
+            if not is_login(page):
+                # 3. Cookies missing or expired: go to the login page
+                page.goto(TARGET_LOGIN_URL)
+                page.wait_for_load_state("networkidle")
+                # logger.info("🔑 Starting the login flow")
+
+                # Perform the login
+                # login_success = login_operation(page, USERNAME, PASSWORD)
+                # if not login_success:
+                #     logger.error(" Login failed, exiting")
+                #     return
+
+                # # 4. Save cookies after a successful login
+                # save_cookies(context)
+                # logger.info(" Logged in and saved cookies")
+
+            KEYWORDS = get_search_keywords_from_db()
+            # Run the searches
+            total_num = 0
+            detail_page = None
+            nums = 0
+            for kw in KEYWORDS:
+                popup_guard(page, "before_search")
+                if nums == 0:
+                    popup_guard(detail_page if detail_page else page, "before_search")  # page is the initial page object
+                    detail_page, search_success = search_operation(page, kw, is_first_search=True)
+                    nums += 1
+                else:
+                    if detail_page is None:
+                        logger.error(f" ❌ No usable search page, skipping '{kw}'")
+                        continue
+                    popup_guard(detail_page, "before_search")
+                    detail_page, search_success = search_operation(detail_page, kw, is_first_search=False)
+
+                if not search_success:
+                    print(f"❌ Search failed: {kw}")
+                    continue
+
+                if detail_page is None:
+                    break
+
+                popup_guard(detail_page, "after_search")
+
+                # Skip the keyword when the page reports no matching data
+                not_found_keywords = detail_page.locator("div.filter-panel-container-empty-text")
+                if not_found_keywords.count() > 0:
+                    logger.warning(f"⚠️ Keyword '{kw}' has no matching products; skipping it")
+                    continue
+
+                # Pagination total label: a span with class el-pagination__total
+                TARGET_SELECTOR = detail_page.locator('span.el-pagination__total')
+                total_count = 0  # ⚠️ reset for every keyword
+                if TARGET_SELECTOR.count() > 0:
+                    # Separate name so the `nums` search counter above is not overwritten with a string
+                    total_text = TARGET_SELECTOR.inner_text(timeout=5000).strip()
+                    print(total_text)
+                    match = re.search(r'\d+', total_text)
+                    if match:
+                        total_count = int(match.group())
+                        print(total_count)
+                else:
+                    item_boxes = detail_page.locator("div.product-list-item")
+                    total_count = item_boxes.count()
+                    print(f"[{kw}] No pagination; boxes on the current page: {total_count}")
+
+                total_num += total_count
+                print(f"Running total after keyword {kw}: {total_num} items")
+                page.wait_for_timeout(10000)
+            print(f"✅ Total items counted in this run: {total_num}")
+
+        except Exception as e:
+            print(f" Program error: {str(e)}")
+        finally:
+            browser.close()
+            print(" Browser closed, program finished")
+
+# ==================== Entry point ====================
+if __name__ == '__main__':
+    main()
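
Note on the counting step in `main()`: it takes the first integer found in the pagination label text and otherwise falls back to counting product boxes. A minimal sketch of just the parsing part follows, kept apart from the browser code so it can be checked without Playwright. `parse_total_count` is a hypothetical helper name that does not appear in the commit, and the sample label text in the usage lines is an assumption, since the real label wording is not shown in the diff.

```python
import re


def parse_total_count(label_text: str, fallback: int = 0) -> int:
    """Return the first integer found in a pagination label, else `fallback`.

    Hypothetical helper: mirrors the re.search(r'\d+', ...) step in main().
    """
    match = re.search(r"\d+", label_text.strip())
    return int(match.group()) if match else fallback


if __name__ == "__main__":
    # Assumed sample label; the real text of span.el-pagination__total may differ.
    print(parse_total_count("Total 128"))
    # No digits in the label: the caller-supplied fallback is returned.
    print(parse_total_count("no total shown", fallback=7))
```

Keeping the parsed text in its own variable, rather than reusing a loop counter's name, avoids the counter silently turning into a string after the first keyword.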

BIN
YBM/download_images/004b7fc2777544c3aaa486a572fb2296.png


BIN
YBM/download_images/0367db2deca441358dde875afe0089b2.png


BIN
YBM/download_images/04051c58ade14cdb972c981e4da2fb4d.png


BIN
YBM/download_images/0410b43563ce48ab9182eb08ce4a7db7.png


BIN
YBM/download_images/05fd84c0e87e4575adf49b919a4579b2.png


BIN
YBM/download_images/060f0fe436574b279997027cb63d105e.png


BIN
YBM/download_images/06b7f6254cd44115bfe498120dae26c6.png


BIN
YBM/download_images/070a1b5d328c404ca46ac51ee30e9a8b.png


BIN
YBM/download_images/07b3c778d61b44eaa8c25f9f9faad33a.png


BIN
YBM/download_images/09512e703b384ccdb9f823803f4d3c9c.png


BIN
YBM/download_images/09bcea3535a84184a81548d797baf173.png


BIN
YBM/download_images/0af26841cb154c31af86202c6cdd645b.png


BIN
YBM/download_images/0b23655ca0394e5eab9d013acb4a9c47.png


BIN
YBM/download_images/0c29fb4cf21f465ca3d0764f11b7545c.png


BIN
YBM/download_images/0c616f2ced6741388b3325d66948bd4d.png


BIN
YBM/download_images/0c82b67c17bb4005814d2ee4ca4a624a.png


BIN
YBM/download_images/0d4d96355d31456493de769b348b9d52.png


BIN
YBM/download_images/0de33dd7240847f89703055373d5be68.png


BIN
YBM/download_images/0e64205205b649afb8403171bb0860d1.png


BIN
YBM/download_images/0f560173422542d8a2bc8500f2633145.png


BIN
YBM/download_images/0f8d6994bf814fb1b6fc077a541b0bca.png


BIN
YBM/download_images/115042601ac54c53b3368d7e14f6e401.png


BIN
YBM/download_images/1185fd1b38fc4f9c9000abbe1e938e58.png


BIN
YBM/download_images/11a2977e623249ffbeb2b527c96424ba.png


BIN
YBM/download_images/11c400bd70394df4b2196368e2d50f5b.jpg


BIN
YBM/download_images/11d06a999703453b84535c0147e385d1.jpg


BIN
YBM/download_images/11f3113ccc5f4198a798c0f7db75a25d.png


BIN
YBM/download_images/12c5dc36632d4961a442112d2d5107e5.png


BIN
YBM/download_images/12f11aeba4254db1a24e38986d587ba0.png


BIN
YBM/download_images/1351a9a21d4242b2bb214f34b991ac26.png


BIN
YBM/download_images/13e1f84e71d64f1bb3398510a8b624c9.jpg


BIN
YBM/download_images/142a11835f704b7abbb040193085c0cd.png


BIN
YBM/download_images/14713067c2e64a80831e49dd64b81432.png


BIN
YBM/download_images/14a4f1e2a40a4c23b9150ce2a9846961.jpg


BIN
YBM/download_images/14bee1221f324240a11fab453e807f04.png


BIN
YBM/download_images/153ac0249d9d46929e3d2189cdd240d6.png


BIN
YBM/download_images/18e14100679b4f81b01b02cedb326bee.png


BIN
YBM/download_images/199b95af141a4ec2a927085164ca7a6b.png


BIN
YBM/download_images/1a1e26d6b6784e16996d32b59f9520c4.png


BIN
YBM/download_images/1ad8c83c47054fd29bfb8d67031648d9.png


BIN
YBM/download_images/1b8dec53c68b402090bc341a4cc42cbd.png


BIN
YBM/download_images/1ea1ff2f99b5498fba25ab62a633d7a9.jpg


BIN
YBM/download_images/20f89130b40c452b9e3f3195afa49693.png


BIN
YBM/download_images/2106dff69cbe44e2b80313c09c73729e.png


BIN
YBM/download_images/213f539506c74cf1ada4d30d75396e49.png


BIN
YBM/download_images/225f237b7bf14f9aa1493c9f11586b44.png


BIN
YBM/download_images/22b3f8a7e8a54c45b4d74053b7f44f57.png


BIN
YBM/download_images/22bf1813ed10453bac52cb927e0672e1.png


BIN
YBM/download_images/230f76ce6704421281c3996e3d994b64.png


BIN
YBM/download_images/283ecf93bc0646bc890b7608746833e4.png


BIN
YBM/download_images/298db84c13674573a49a924147cf69b2.png


BIN
YBM/download_images/29ae2cc8806d4a3c9755bbb4fdcc0af7.png


BIN
YBM/download_images/2ba0e5aadbea429f89ec5e044324d5cd.png


BIN
YBM/download_images/2c6dfc789f7e4a09b7c1a25258deba4d.png


BIN
YBM/download_images/2cb3e035fb374a1689d40298395dbe3e.png


BIN
YBM/download_images/2eb67b0f366e42209b0ef6c9c14e5562.png


BIN
YBM/download_images/2f2f71490d2044f1b60d808c552a340c.png


BIN
YBM/download_images/2fd97782097748fda3d43fb5b59002a4.png


BIN
YBM/download_images/2ff4a0b22bea4faf92486c9323044aed.png


BIN
YBM/download_images/306d0b9af99d4a549c22b95bbc888930.png


BIN
YBM/download_images/329fef992d564140a8f40933ec0fdb15.png


BIN
YBM/download_images/32b452c3f7b04f9cbb32266f017c125b.jpg


BIN
YBM/download_images/331679a15aff46e98c72008fd8ec9fef.jpg


BIN
YBM/download_images/3380d93039e84767a41728f385f3c79d.png


BIN
YBM/download_images/33eb3dacd9664eedad3ff8414fd669b0.png


BIN
YBM/download_images/349437609c274297993d134237e0276f.png


BIN
YBM/download_images/382467f6f533407aa99f2b3328603b0a.png


BIN
YBM/download_images/384341307b3f4b8a837168f3f4f01874.png


BIN
YBM/download_images/38540b983b3140a0b6bd8c186a0d85b1.png


BIN
YBM/download_images/3c83c55cc195482da362b5d496177c16.png


BIN
YBM/download_images/3cc2736bf65d49139227b2dfe1611466.png


BIN
YBM/download_images/3d4feb0ccdd1455f81aca1d3d3a167a8.jpg


BIN
YBM/download_images/3da67fb6066f44288c3f0e5d8ec7b28b.png


BIN
YBM/download_images/3e82864c27e9436b84eeeb7953cced2f.png


BIN
YBM/download_images/3f61b3e6a4ee47cc963bb38d29c35f96.png


BIN
YBM/download_images/3f892938f08043f1affed7027e2656e2.jpg


BIN
YBM/download_images/3fe3524015d84fad9cac200db851fcb8.png


BIN
YBM/download_images/41a39149b1d84a4d8dcfb76cdedee624.png


BIN
YBM/download_images/426da1cd31324331b61cd1d413322a13.png


BIN
YBM/download_images/42abad7ba09b449ab86ca998611b4fae.png


BIN
YBM/download_images/42d86856513c42c4a2dc7306c925fe95.png


BIN
YBM/download_images/4437ef3193334fa8b0e1b1b7207fbf40.png


BIN
YBM/download_images/45c8b15dde914bfc955ae3a472d44dc1.jpg


BIN
YBM/download_images/45ff72f7cba64de885b396fe5ba4bd95.png


BIN
YBM/download_images/479b36c2a18846e78d7e0cd24f617281.png


BIN
YBM/download_images/48655f52cbca49f8aa31cb260bf11f82.jpg


BIN
YBM/download_images/4950cf207ecc47d4bb5aed0805ae5d51.png


BIN
YBM/download_images/4ac6dcaaaab445469d21df594a552c74.png


BIN
YBM/download_images/4b53ba65819e4680954d71e9e7aabefd.png


BIN
YBM/download_images/4b53f9fdb1654f81aec71c49306d6a56.png


BIN
YBM/download_images/4b82eb5a793742f38b413bd88b12130f.png


BIN
YBM/download_images/4c8445ad9ef24783a860b42e06b37d51.png


BIN
YBM/download_images/4dcd29b329a248a7b6bba4bd70c24a04.png


BIN
YBM/download_images/4e24d6148eaa4ba4a25fc52dc339ac56.png


BIN
YBM/download_images/4e359e71e2a441858e135e2a9b20c50f.png


BIN
YBM/download_images/502e57f0ff7547459c9a0679fe53fe12.png


Some files were not shown because too many files changed in this diff