The voice memo’s BFF: how to make Speech2Text easy with Machine Learning

Programming Diary 2024-08-07 19:00:00

by Rafael Belchior

Do you think recording voice memos is inconvenient because you have to transcribe them? Do you waste your precious voice memos because you never write them down? Do you feel like you are not unlocking the full potential of what you record?

Yeah, that sucks.

I’m a Computer Science master’s student. Since all work and no play makes me a dull boy, I decided to invest some time in doing something different. Where? In the student group I belong to, by interviewing a professor.

I talked to Professor Rui Henriques, a teaching assistant at Técnico Lisboa and researcher at INESC-ID. He is an expert in Data Mining and Bioinformatics. The 20-minute interview turned into an almost hour-long conversation.

Rui is not only a brilliant academic but also a very honest, cheerful and easygoing person, which made the interview very easy. I learned a lot while talking to him, and I’m sure you can too. The interview will be online soon enough.

Anyway, I had a problem and a need. I wanted to save time by not having to transcribe the whole interview. The idea was to invest only twenty to sixty minutes in order to skyrocket performance when it comes to transcribing. This is not limited to interviews, of course. You can transcribe audio notes taken from several sources like classes, writing notes, thoughts, your shopping list, or your most philosophical pieces.

So, how do we do that?

I’m also lecturing on IT Infrastructure Management and Administration at Técnico Lisboa. In classes, we have used Google Cloud Engine. I remembered a service called Google Speech-To-Text, which we could use in this case. And no, Google is not paying me to write this.

So, how do we turn a 55-minute interview into easily editable text? How do we reduce our effort and focus on what matters?

By the way, to make the most of this method, please cut background noise and try to record with a loud, clear voice.

Step 1: Installing the required software

I use Vagrant to manage virtual machines. The advantage is that the virtual machine gives you the environment you need to use the Speech-To-Text service. In this article, I show step by step how to configure these tools (read it up to the section “The Experiment”). If you prefer to do this on your local machine, go directly to the third step.

Step 2: Start the virtual machine

Now, open your console and run:

$ vagrant up --provision && vagrant ssh

The virtual machine is booting, installing all the required dependencies. This may take a while.

Wait a bit. Done. Nice. Kudos to you!

Step 3: Getting the support files

Fork this repository containing the support files and then clone it to your computer. Put it in the folder that is being synced with your guest machine.

Step 4: Creating an account at Google Cloud Engine

You can request a free credit ($300) for this experiment. After creating the account, go to Google Console. Create a project. You can name it “easy-interview” if you are confident enough. You should see something like this:

After that, go to “APIs & Services” to activate the API we need to get the job done.

Click on “Create Credentials”. Choose “Cloud Speech API”. On “Are you planning to use this API with App Engine or Compute Engine?” say “No”. On step 2, “Create a service account”, name the service account “transcribing”. The role is Project => Owner. Key type: JSON.

By now, you should have downloaded a file called “file.txt”. It contains the credentials you need to use the service. Rename the file to “terraform-credentials.json”. Copy it to the folder containing the support files. As that folder is synced with your virtual machine, you will have access to those files from the guest machine. Now, run:

$ gcloud auth login

Follow the instructions. Authenticate yourself following the link that is shown. Now, analyze the request.json file:

{
  "config": {
      "encoding": "FLAC",
      "sampleRateHertz": 16000,
      "languageCode": "en-US",
      "enableWordTimeOffsets": false
  },
  "audio": {
      "uri": "gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}

Make sure to tune the parameters to fit your case. Beware that there are limitations on the encodings you can use. If your file is in a format other than FLAC or WAV, you will need to convert it. You can convert audio files with Audacity, a free, open-source audio editor. After converting the audio, you have to upload it to Google Cloud Storage. For that, you have to create a bucket.
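If your recording is a WAV file, one quick way to pick the right value for sampleRateHertz is to read it from the file header. The sketch below uses Python's standard wave module. The function name describe_wav and the file name interview.wav are placeholders of mine, and it only handles uncompressed WAV, not FLAC.

```python
import wave


def describe_wav(path):
    """Read the WAV header fields that matter for the request config."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sampleRateHertz": rate,
            "channels": wav.getnchannels(),
            "bitsPerSample": wav.getsampwidth() * 8,
            "durationSeconds": wav.getnframes() / rate,
        }


# Hypothetical usage (interview.wav is a placeholder file name):
# print(describe_wav("interview.wav"))
```

The value reported as sampleRateHertz is the one to copy into the request file, so that the config matches the audio you upload.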

The settings may be:

After that, upload your file to the bucket. On the Bucket menu, you should be able to see the URI associated with your file. The format is gs://BUCKET/FILE.EXTENSION. Take that URI and put it in the file my-request.json, replacing the existing one.

Your file should look something like this:

{
  "config": {
      "encoding": "FLAC",
      "sampleRateHertz": 16000,
      "languageCode": "pt-PT",
      "enableWordTimeOffsets": false
  },
  "audio": {
      "uri": "gs://easy-interview/interview.flac"
  }
}
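If you would rather generate the request file than edit it by hand, a minimal Python sketch like this should do. The helper names build_request and write_request are hypothetical (they are not part of the support files), and the defaults simply mirror the example above.

```python
import json


def build_request(gcs_uri, language_code="en-US",
                  sample_rate_hertz=16000, encoding="FLAC"):
    """Build the request body as a dict, given a gs://BUCKET/FILE.EXTENSION URI."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a URI of the form gs://BUCKET/FILE.EXTENSION")
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            "enableWordTimeOffsets": False,
        },
        "audio": {"uri": gcs_uri},
    }


def write_request(path, request):
    """Save the request dict as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(request, handle, indent=2)


# Hypothetical usage:
# write_request("my-request.json",
#               build_request("gs://easy-interview/interview.flac", "pt-PT"))
```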

Before we use the API, we need to load the credentials. Run the script load-credentials.sh to load them:

$ source load-credentials.sh

This sets the GOOGLE_APPLICATION_CREDENTIALS environment variable. Next, to test if the connection is successful, run:

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:recognize \
    -d @test-request.json

You should be able to see a response with some transcribed text. Note that we used test-request.json, which is just for testing purposes. Now, to make the call with your data, run:

$ curl -s -H "Content-Type: application/json" \
    -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    https://speech.googleapis.com/v1/speech:longrunningrecognize \
    -d @my-request.json >> name.out

If you inspect name.out (for example, by running more name.out), you will see that the response contains a field called name. That name identifies the operation that was created to handle the request. Now you have to wait a bit until the operation completes. Run (replace NAME with your operation’s name):

$ curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
    -H "Content-Type: application/json; charset=utf-8" \
    "https://speech.googleapis.com/v1/operations/NAME" >> result.out
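Copying the operation name by hand is error-prone, so here is a small Python sketch that reads it from the response saved in name.out and builds the URL used in the command above. The helper name operation_url is my own invention, and the sketch assumes name.out holds a single JSON response.

```python
import json

# Base URL taken from the polling command above.
OPERATIONS_URL = "https://speech.googleapis.com/v1/operations/"


def operation_url(response_text):
    """Return the polling URL for the operation named in the response."""
    name = json.loads(response_text)["name"]
    return OPERATIONS_URL + name


# Hypothetical usage:
# with open("name.out", encoding="utf-8") as handle:
#     print(operation_url(handle.read()))
```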

Until the operation finishes, your result.out will contain something similar to this:

{
  "name": "8254262642733152416",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 33,
    "startTime": "2018-12-08T01:15:08.969852Z",
    "lastUpdateTime": "2018-12-08T01:19:25.105683Z"
  }
}
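Since the polling command appends to result.out with >>, running it several times leaves several JSON objects in the same file. A rough Python sketch for reading the most recent one and reporting its status could look like this. The helper names are hypothetical, and it assumes a finished operation carries a done field set to true, as long-running operations in Google's APIs do.

```python
import json


def last_json_object(text):
    """Return the last JSON value in text holding one or more concatenated values."""
    decoder = json.JSONDecoder()
    position, last = 0, None
    while position < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            break
        last, position = decoder.raw_decode(text, position)
    return last


def progress(operation):
    """Summarize the state of a long-running operation."""
    if operation.get("done"):
        return "done"
    percent = operation.get("metadata", {}).get("progressPercent", 0)
    return f"{percent}% complete"


# Hypothetical usage:
# with open("result.out", encoding="utf-8") as handle:
#     print(progress(last_json_object(handle.read())))
```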

For a 60 MB file, encoded with FLAC, it took about 12 minutes. You will have a file called results.out with your precious content. It will be on your host machine as well. I’ve written a very simple Python script that parses results.out. The script redirects the output to a file named results-parsed.out. To execute it, run:

$ python parse.py
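For reference, here is a minimal sketch of what such a parser might look like. It is not the parse.py from the support files, just an illustration that assumes the finished operation stores its text under response, results, alternatives, transcript, and that takes the first alternative of each result.

```python
import json


def extract_transcript(operation):
    """Join the first transcript alternative of every result, one per line."""
    results = operation.get("response", {}).get("results", [])
    lines = []
    for result in results:
        alternatives = result.get("alternatives", [])
        if alternatives:
            lines.append(alternatives[0].get("transcript", "").strip())
    return "\n".join(line for line in lines if line)


# Hypothetical usage (file names are placeholders):
# with open("result.out", encoding="utf-8") as source:
#     text = extract_transcript(json.load(source))
# with open("transcript.txt", "w", encoding="utf-8") as target:
#     target.write(text)
```

An operation that has not finished yet has no response field, so the function returns an empty string in that case.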

If you don’t like the results, tune the parameters and try again.

Enjoy your content! You are done. To finish this experiment, exit the machine:

$ exit

Now, stop the virtual machine:

$ vagrant halt

Don’t forget to delete the files that you uploaded to Google Cloud.

Well done!

Well, this took me several hours to write, but at least I didn’t have to transcribe the whole interview.

Bottom line

First, I would ❤️ to hear your opinion! Do you record lots of voice memos? Do you find this procedure useful? Do you have a different one?

If you liked this article, please click the clap button on the left. Do you have a friend or family member who would benefit from this solution? Share this article!

Keep Rocking

Entrepreneurship

Top 8 lessons I’ve learned in European Innovation Academy 2017
Imagine you are seeing the opportunity to improve yourself at every level. Would you take it?
blog.startuppulse.net

DevOps101 ☄️

DevOps101 - Improve Your Workflow! First Steps on Vagrant
And make clients and developers happier.
hackernoon.com

DevOps101 - Infrastructure as Code With Vagrant
And deploying a simple IT infrastructure (Two LAMP web servers and a client machine).
hackernoon.com

Blockchain For Students ⛓️

Blockchain For Students 101 - The Basics (Part 1)
Are you ready to dig deep into this life-changing technology?
hackernoon.com

Translated from: https://www.freecodecamp.org/news/the-voice-memos-bff-speech-to-text-powered-by-machine-learning-1dbc7a6c65f1/
