Python code that calls the Gemini API to operate a computer from natural-language instructions

Code language: python

Category: Other

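Code description: Python code that calls the Gemini API to operate a computer automatically from natural-language instructions. It uses a grid approach: a labeled grid (for example A1, B1, C1... A2, B2...) is drawn on a screenshot, and the AI returns the ID of the grid cell it wants to click (for example "C4"). The Python code then converts that simple ID into screen coordinates, namely the center of the chosen cell. This greatly simplifies the AI's task: a hard coordinate-regression problem becomes a simple classification problem, which makes clicks considerably more accurate and reliable.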
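Because the model only has to name a cell, its reply can be checked before anything is clicked. The following is a minimal sketch of that validation step, not part of the original code: the helper names `is_valid_cell` and `extract_cell_id`, the regular expression, and the assumed JSON reply shape `{"cell": "C4"}` are all hypothetical.

```python
import json
import re
from typing import Optional

# Hypothetical pattern: one column letter followed by a one- or two-digit row.
_CELL_RE = re.compile(r"\b([A-Za-z])(\d{1,2})\b")


def is_valid_cell(cell_id: str, grid_size: int = 20) -> bool:
    """Return True if cell_id names a cell inside a grid_size x grid_size grid."""
    match = _CELL_RE.fullmatch(cell_id.strip())
    if match is None:
        return False
    col = ord(match.group(1).upper()) - ord("A")
    row = int(match.group(2)) - 1
    return 0 <= col < grid_size and 0 <= row < grid_size


def extract_cell_id(reply_text: str, grid_size: int = 20) -> Optional[str]:
    """Pull the first valid cell ID out of a model reply, or return None."""
    candidates = []
    # Assumed reply format: a JSON object with a "cell" field.
    try:
        data = json.loads(reply_text)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict) and isinstance(data.get("cell"), str):
        candidates.append(data["cell"])
    # Fallback: scan free text for anything that looks like a cell ID.
    candidates.extend(m.group(0) for m in _CELL_RE.finditer(reply_text))
    for candidate in candidates:
        if is_valid_cell(candidate, grid_size):
            return candidate.strip().upper()
    return None
```

Rejecting out-of-range IDs at this point means a malformed reply results in no click at all rather than a click at an arbitrary position.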

Code tags: python, gemini api, natural language, automation, computer control, code

The following is a partial code preview; for the full code, click download or open it in the bfwstudio webide.

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import os
import json
import time
import google.generativeai as genai
import pyautogui
from PIL import Image, ImageDraw, ImageFont
import io  # make sure io is imported

# --- Configuration ---
GRID_SIZE = 20  # number of grid cells along each axis

# --- Helper functions (unchanged) ---

def get_grid_cell_id(col, row):
    # Column index 0 maps to 'A' and row index 0 maps to 1, e.g. (2, 3) -> "C4".
    col_char = chr(ord('A') + col)
    return f"{col_char}{row + 1}"

def draw_grid_on_image(image: Image.Image) -> Image.Image:
    # Draws directly on the image that is passed in and returns it.
    draw = ImageDraw.Draw(image)
    width, height = image.size
    cell_width = width / GRID_SIZE
    cell_height = height / GRID_SIZE
    # Fall back to Pillow's built-in font if arial.ttf is not available.
    try:
        font = ImageFont.truetype("arial.ttf", 12)
    except IOError:
        font = ImageFont.load_default()
    # Interior grid lines: one vertical and one horizontal line per step.
    for i in range(1, GRID_SIZE):
        draw.line([(i * cell_width, 0), (i * cell_width, height)], fill="red", width=1)
        draw.line([(0, i * cell_height), (width, i * cell_height)], fill="red", width=1)
    # Label every cell with its ID near the top-left corner.
    for row in range(GRID_SIZE):
        for col in range(GRID_SIZE):
            cell_id = get_grid_cell_id(col, row)
            text_position = (col * cell_width + 2, row * cell_height + 2)
            draw.text(text_position, cell_id, fill="red", font=font)
    return image

def grid_cell_to_coords(cell_id: str, screen_width: int, screen_height: int) -> tuple[int, int] | None:
    if not cell_id or len(cell_id) < 2: return None
    col_char, row_str = cell_id[0].upper(), cell_id[1:]
    try:
        col, row = ord(col_char) - ord('A'), int(row_str) - 1
        if not (0 <= col < GRID_SIZE and 0 <= row < GRID_SIZE): return None
        cell_width, cell_height = screen_width / GRID_SIZE, screen_height / GRID_SIZE
        return int((col + 0.5) * cell_width), int((row + 0.5) * cell_h......... (preview truncated here; the full code can be downloaded after logging in)
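
The preview cuts off partway through `grid_cell_to_coords`. The following is a minimal sketch of how that function might be completed, not the author's full code. It assumes the truncated identifier is `cell_height` and that a non-numeric row is handled by returning `None`; the `except` clause is not visible in the preview, so that part is a guess.

```python
from typing import Optional, Tuple

GRID_SIZE = 20  # same value as in the configuration section of the preview


def grid_cell_to_coords(cell_id: str, screen_width: int,
                        screen_height: int) -> Optional[Tuple[int, int]]:
    """Convert a cell ID such as "C4" to the pixel at the center of that cell."""
    if not cell_id or len(cell_id) < 2:
        return None
    col_char, row_str = cell_id[0].upper(), cell_id[1:]
    try:
        col, row = ord(col_char) - ord('A'), int(row_str) - 1
    except ValueError:
        # Assumed handling: a non-numeric row such as "Cx" yields None.
        return None
    if not (0 <= col < GRID_SIZE and 0 <= row < GRID_SIZE):
        return None
    cell_width, cell_height = screen_width / GRID_SIZE, screen_height / GRID_SIZE
    # Adding 0.5 moves from the cell's top-left corner to its center.
    return int((col + 0.5) * cell_width), int((row + 0.5) * cell_height)


# On a 1920x1080 screen each cell is 96x54 pixels, so "C4"
# (column index 2, row index 3) has its center at (2.5 * 96, 3.5 * 54).
print(grid_cell_to_coords("C4", 1920, 1080))  # (240, 189)
```

Clicking the cell center rather than a corner leaves the widest margin for error when the target control does not fill the whole cell.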
