Python code that calls the Gemini API to automate computer operation from natural language
Code language: python
Category: Other
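Description: Python code that calls the Gemini API to operate a computer from natural-language instructions. It uses a grid mode: a grid is drawn on the screenshot (for example A1, B1, C1... A2, B2...), and the AI returns the ID of the grid cell it wants to click (for example "C4"). The Python code then converts that simple ID into precise screen coordinates. This greatly simplifies the AI's task by turning a difficult coordinate-regression problem into a simple classification problem, which significantly improves click accuracy and reliability.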
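The ID-to-coordinate step described above can be sketched as follows. This is a minimal illustration, not the author's full implementation: it assumes a 20 x 20 grid, single-letter column names, and that the click should land in the centre of the cell. The function name `cell_id_to_center` and the `grid_size` parameter are introduced here for illustration only.

```python
from typing import Optional, Tuple


def cell_id_to_center(cell_id: str, screen_width: int, screen_height: int,
                      grid_size: int = 20) -> Optional[Tuple[int, int]]:
    """Map a cell ID such as "C4" to the pixel at the centre of that cell.

    Assumption: columns are single letters (A, B, C, ...) and rows are
    1-based numbers, so the grid can have at most 26 columns.
    """
    if not cell_id or len(cell_id) < 2:
        return None
    col_char, row_str = cell_id[0].upper(), cell_id[1:]
    try:
        row = int(row_str) - 1          # "4" -> row index 3
    except ValueError:
        return None                     # row part was not a number
    col = ord(col_char) - ord("A")      # "C" -> column index 2
    if not (0 <= col < grid_size and 0 <= row < grid_size):
        return None                     # ID lies outside the grid
    cell_width = screen_width / grid_size
    cell_height = screen_height / grid_size
    # Adding 0.5 moves from the cell's top-left corner to its centre.
    return int((col + 0.5) * cell_width), int((row + 0.5) * cell_height)


print(cell_id_to_center("C4", 1920, 1080))  # (240, 189)
print(cell_id_to_center("U1", 1920, 1080))  # None (column U is outside a 20-column grid)
```

On a 1920 x 1080 screen each cell is 96 x 54 pixels, so the model only has to pick one of 400 labels instead of predicting two pixel values.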
Tags: python, gemini api, natural language, automation, computer control, code
The following is a partial code preview; the complete code can be downloaded or opened in the bfwstudio webide.
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import os
import json
import time
import google.generativeai as genai
import pyautogui
from PIL import Image, ImageDraw, ImageFont
import io  # <--- make sure io is imported

# --- Configuration ---
GRID_SIZE = 20

# --- Helper functions (unchanged) ---
def get_grid_cell_id(col, row):
    # Column index 0 -> "A", 1 -> "B", ...; rows are numbered from 1.
    col_char = chr(ord('A') + col)
    return f"{col_char}{row + 1}"

def draw_grid_on_image(image: Image.Image) -> Image.Image:
    draw = ImageDraw.Draw(image)
    width, height = image.size
    cell_width = width / GRID_SIZE
    cell_height = height / GRID_SIZE
    try:
        font = ImageFont.truetype("arial.ttf", 12)
    except IOError:
        # Fall back to Pillow's built-in font if arial.ttf is unavailable.
        font = ImageFont.load_default()
    # Draw the interior vertical and horizontal grid lines.
    for i in range(1, GRID_SIZE):
        draw.line([(i * cell_width, 0), (i * cell_width, height)], fill="red", width=1)
        draw.line([(0, i * cell_height), (width, i * cell_height)], fill="red", width=1)
    # Label every cell with its ID near the cell's top-left corner.
    for row in range(GRID_SIZE):
        for col in range(GRID_SIZE):
            cell_id = get_grid_cell_id(col, row)
            text_position = (col * cell_width + 2, row * cell_height + 2)
            draw.text(text_position, cell_id, fill="red", font=font)
    return image

def grid_cell_to_coords(cell_id: str, screen_width: int, screen_height: int) -> tuple[int, int] | None:
    if not cell_id or len(cell_id) < 2:
        return None
    col_char, row_str = cell_id[0].upper(), cell_id[1:]
    try:
        col, row = ord(col_char) - ord('A'), int(row_str) - 1
        if not (0 <= col < GRID_SIZE and 0 <= row < GRID_SIZE):
            return None
        cell_width, cell_height = screen_width / GRID_SIZE, screen_height / GRID_SIZE
        return int((col + 0.5) * cell_width), int((row + 0.5) * cell_h.........
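The preview imports `json` but cuts off before the model's reply is handled, so how the original code parses that reply is not shown. As one hypothetical way to get from reply text to a cell ID that `grid_cell_to_coords` could consume, the sketch below assumes the model was asked to answer with a JSON object containing a `"cell"` key, and falls back to scanning free text for something that looks like a cell ID. The `"cell"` key, the function name `extract_cell_id`, and the `grid_size` parameter are assumptions made for this example, not part of the original code or of the Gemini API.

```python
import json
import re
from typing import Optional

# One letter followed by one or two digits, e.g. "C4" or "t20".
CELL_ID = re.compile(r"\b([A-Za-z])(\d{1,2})\b")


def extract_cell_id(reply_text: str, grid_size: int = 20) -> Optional[str]:
    """Pull a valid grid cell ID out of a model reply, or return None.

    Assumption: the reply is either JSON such as {"cell": "C4"} (possibly
    wrapped in extra text) or plain text that mentions the cell ID.
    """
    candidate = reply_text
    # Try the outermost {...} span first, which tolerates surrounding text.
    start, end = reply_text.find("{"), reply_text.rfind("}")
    if start != -1 and end > start:
        try:
            data = json.loads(reply_text[start:end + 1])
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict) and "cell" in data:
            candidate = str(data["cell"])
    match = CELL_ID.search(candidate)
    if match is None:
        return None
    col = ord(match.group(1).upper()) - ord("A")
    row = int(match.group(2)) - 1
    # Reject IDs that fall outside the grid instead of clicking blindly.
    if 0 <= col < grid_size and 0 <= row < grid_size:
        return f"{match.group(1).upper()}{row + 1}"
    return None


print(extract_cell_id('{"action": "click", "cell": "C4"}'))  # C4
print(extract_cell_id("Click C4 please"))                    # C4
print(extract_cell_id('{"cell": "Z9"}'))                     # None
```

Validating the ID before acting on it matters here because the result is eventually turned into a real mouse click; returning `None` lets the caller skip the action or ask the model again.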