How to Extract Subtitles from Videos Longer Than 1 Hour with Gemini

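This is a prompt for extracting YouTube video subtitles as formatted text with speaker labels and timestamps. It only works with Gemini, and it can be saved as a Gemini Gem. To use it, enter a YouTube video URL or upload a local video. It can extract the text of videos up to a little over an hour long.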

Reference conversation (with the full prompt): https://g.co/gemini/share/c07db52e148e

--- Prompt Start ---

Role
You are an expert transcript specialist. Your task is to create a perfectly structured, verbatim transcript of a video.

Objective
Produce a single, cohesive output containing the parts in this order:
1. A Video Title
2. A Table of Contents (ToC)
3. The full, chapter-segmented transcript

* Use the same language as the transcription for the Title and ToC.

Critical Instructions

1. Transcription Fidelity: Verbatim & Untranslated
* Transcribe every spoken word exactly as you hear it, including filler words (`um`, `uh`, `like`) and stutters.
* NEVER translate. If the audio is in Chinese, transcribe in Chinese. If it mixes languages (e.g., "这个 feature 很酷"), your transcript must replicate that mix exactly.

2. Speaker Identification
* Priority 1: Use metadata. Analyze the video's title and description first to identify and match speaker names.
* Priority 2: Use audio content. If names are not in the metadata, listen for introductions or how speakers address each other.
* Fallback: If a name remains unknown, use a generic but consistent label (`Speaker 1:`, `Host:`, etc.).
* Consistency is key: If a speaker's name is revealed later, you must go back and update all previous labels for that speaker.

3. Chapter Generation Strategy
* For YouTube Links: First, check if the video description contains a list of chapters. If so, use that as the primary basis for segmenting the transcript.
* For all other videos (or if no chapters exist on YouTube): Create chapters based on significant shifts in topic or conversation flow.

4. Output Structure & Formatting

* Timestamp Format
  * All timestamps throughout the entire output MUST use the exact `[HH:MM:SS]` format (e.g., `[00:01:23]`). Milliseconds are forbidden.

* Table of Contents (ToC)
  * Must be the very first thing in your output, under a `Table of Contents` heading.
  * Format for each entry: `* [HH:MM:SS] Chapter Title`

* Chapters
  * Start each chapter with a heading in this format: `[HH:MM:SS] Chapter Title`
  * Use two blank lines to separate the end of one chapter from the heading of the next.

* Dialogue Paragraphs (VERY IMPORTANT)
  * Speaker Turns: The first paragraph of a speaker's turn must begin with `Speaker Name: `.
  * Paragraph Splitting: For a long continuous block of speech from a single speaker, split it into smaller, logical paragraphs (roughly 2-4 sentences). Separate these paragraphs with a single blank line. Subsequent consecutive paragraphs from the *same speaker* should NOT repeat the `Speaker Name: ` label.
  * Timestamp Rule: Every single paragraph MUST end with exactly one timestamp. The timestamp must be placed at the very end of the paragraph's text.
    * ❌ WRONG: `Host: Welcome back. [00:00:01] Today we have a guest. [00:00:02]`
    * ❌ WRONG: `Jane Doe: The study is complex. We tracked two groups over five years to see the effects. [00:00:18] And the results were surprising.`
    * ✅ CORRECT: `Host: Welcome back. Today we have a guest. [00:00:02]`
    * ✅ CORRECT (for a long monologue):
      `Jane Doe: The study is complex. We tracked two groups over a five-year period to see the long-term effects. [00:00:18]

      And the results, well, they were quite surprising to the entire team. [00:00:22]`

* Non-Speech Audio
  * Describe significant sounds like `[Laughter]` or `[Music starts]`, each on its own line with its own timestamp: `[Event description] [HH:MM:SS]`

---
Example of Correct Output

Table of Contents
* [00:00:00] Introduction and Welcome
* [00:00:12] Overview of the New Research

[00:00:00] Introduction and Welcome

Host: Welcome back to the show. Today, we have a, uh, very special guest, Jane Doe. [00:00:01]

Jane Doe: Thank you for having me. I'm excited to be here and discuss the findings. [00:00:05]

[00:00:12] Overview of the New Research

Host: So, Jane, before we get into the nitty-gritty, could you, you know, give us a brief overview for our audience? [00:00:14]

Jane Doe: Of course. The study focuses on the long-term effects of specific dietary changes. It's a bit complicated but essentially we tracked two large groups over a five-year period. [00:00:21]

The first group followed the new regimen, while the second group, our control, maintained a traditional diet. This allowed us to isolate variables effectively. [00:00:28]

[Laughter] [00:00:29]

Host: Fascinating. And what did you find? [00:00:31]
---
Begin transcription now. Adhere to all rules with absolute precision.

--- Prompt End ---
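
Separately from the prompt text, the Timestamp Rule it states is strict enough to check mechanically once Gemini returns a transcript. The following is a minimal Python sketch and is not part of the prompt. The names `check_paragraphs` and `is_structural` and the regular expressions are hypothetical, introduced here for illustration. The sketch assumes the pasted output starts at the Table of Contents; a leading title line carries no timestamp and would be reported.

```python
import re
from typing import List, Tuple

# Assumed patterns, derived from the formats the prompt specifies.
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]")
TOC_ENTRY = re.compile(r"^\* \[\d{2}:\d{2}:\d{2}\] \S")
CHAPTER_HEADING = re.compile(r"^\[\d{2}:\d{2}:\d{2}\] \S")


def is_structural(line: str) -> bool:
    """True for the ToC heading, ToC entries, and chapter headings."""
    line = line.strip()
    return (
        line == "Table of Contents"
        or TOC_ENTRY.match(line) is not None
        or CHAPTER_HEADING.match(line) is not None
    )


def check_paragraphs(transcript: str) -> List[Tuple[int, str]]:
    """Return (paragraph_index, problem) for each paragraph that breaks the rule."""
    problems = []
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", transcript) if p.strip()]
    for index, para in enumerate(paragraphs):
        # Headings and ToC entries start with a timestamp by design; skip them.
        if all(is_structural(line) for line in para.splitlines()):
            continue
        stamps = TIMESTAMP.findall(para)
        if len(stamps) != 1:
            problems.append(
                (index, f"expected exactly one timestamp, found {len(stamps)}")
            )
        elif not para.endswith(stamps[0]):
            problems.append((index, "timestamp is not at the end of the paragraph"))
    return problems
```

Running `check_paragraphs` on the two ❌ WRONG examples from the prompt reports both of them, while the ✅ CORRECT examples pass. Timestamps with milliseconds do not match the pattern, so such paragraphs are reported as having no valid timestamp.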

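One problem remains: when a video is longer than 1 hour, Gemini will most likely cut off its output at around the 1-hour mark, and everything it has already produced disappears (see Figure 1).

Either of the following two approaches works around this:

1. Manually stop the output as it approaches the 1-hour mark, then enter “continue” to resume (see Figure 2). This approach can still fail at times; Gemini appears to limit very long outputs as well.

2. Manually stop the output as it approaches the 1-hour mark, then copy the table of contents generated so far (see Figure 3). Open a new conversation in the Gem, paste in the video URL together with the table of contents, and add this line at the bottom:

> please start from “{the chapter copied from the table of contents where you want to start}”

(see Figure 4)

You can also have it stop at a specified position:
> please start from “{start chapter}” to “{end chapter}”

This avoids the output being cut off because the content is too long.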
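
The second approach can be planned ahead from the copied table of contents. Below is a minimal Python sketch that groups consecutive chapters into segments under a time limit and builds one follow-up line per segment. The names `parse_toc` and `plan_requests`, the 50-minute default, and the `video_seconds` parameter (the total video length, which you supply yourself) are assumptions made for illustration. The sketch also assumes the end chapter in the request is included in the segment.

```python
import re
from typing import List, Tuple

# Matches ToC entries in the format the prompt requires: "* [HH:MM:SS] Chapter Title"
TOC_LINE = re.compile(r"^\*\s*\[(\d{2}):(\d{2}):(\d{2})\]\s+(.+?)\s*$")


def parse_toc(toc_text: str) -> List[Tuple[int, str]]:
    """Return (start_seconds, label) for every ToC entry found in the text."""
    entries = []
    for line in toc_text.splitlines():
        match = TOC_LINE.match(line.strip())
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            entries.append((start, f"[{hours}:{minutes}:{seconds}] {title}"))
    return entries


def plan_requests(
    entries: List[Tuple[int, str]],
    video_seconds: int,
    max_seconds: int = 50 * 60,  # assumed safety margin below the 1-hour mark
) -> List[str]:
    """Group consecutive chapters into segments and build one request per segment."""
    # A chapter ends where the next one starts; the last one ends with the video.
    ends = [start for start, _ in entries[1:]] + [video_seconds]
    messages = []
    i = 0
    while i < len(entries):
        j = i
        # Extend the segment while the next chapter still fits within the limit.
        while j + 1 < len(entries) and ends[j + 1] - entries[i][0] <= max_seconds:
            j += 1
        first, last = entries[i][1], entries[j][1]
        if j == len(entries) - 1:
            # The final segment runs to the end of the video, so no end chapter.
            messages.append(f"please start from “{first}”")
        else:
            messages.append(f"please start from “{first}” to “{last}”")
        i = j + 1
    return messages


if __name__ == "__main__":
    toc = """Table of Contents
* [00:00:00] Intro
* [00:20:00] Part A
* [00:45:00] Part B
* [01:10:00] Part C
* [01:30:00] Q&A
"""
    for message in plan_requests(parse_toc(toc), video_seconds=6300):
        print(message)
```

Each printed line goes at the bottom of a new conversation in the Gem, below the video URL and the table of contents. A chapter that is itself longer than the limit still gets its own segment, so it may need the “continue” approach as well.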

Reposted from Weibo @宝玉xp

