Tag Archives: JSON

Transcribe audio with Google Cloud speech-to-text api

I had a few audio files of an interview done with a late relative that I wanted to have Google transcribe for me. I wanted to supply an audio file and have it spit out the results. There are many ways to do this but I went with using the Google Cloud Platfrom speech-to-text API.

First I signed up for a GCP free trial via https://cloud.google.com/speech-to-text/ For my usage, it will remain free as 0-60 minutes of transcription per month is not charged: https://cloud.google.com/speech-to-text/pricing

Next, I needed to create GCP storage bucket as audio more than 10 minutes long cannot reliably be transcribed via the “uploading local file” option. I did this following the documentation at https://cloud.google.com/storage/docs/creating-buckets which walks you through going to their storage browser and creating a new bucket. From that screen I uploaded my audio files (FLAC in my case.)

Then I needed to create API credentials to use. I did this by going speech API console’s credentials tab and creating a service account, then saving the key to my working directory on my local computer.

Also on said computer I installed google-cloud-sdk (on Arch Linux in my case, it was as simple as yay -S google-cloud-sdk)

With service account json file downloaded & google-cloud-sdk installed I exported the GCP service account credentials into my BASH environment like so

export GOOGLE_APPLICATION_CREDENTIALS=NAME_OF_SERVICE_ACCOUNT_KEYFILE_DOWNLADED_EARLIER.json 

I created .json files following the format outlined in command line usage outlined in the quickstart documentation. I tweaked to add a line “model”: “video” to get the API to use the premium Video recognition set (as it was more accurate for this type of recording.) This is what my JSON file looked like:

{
  "config": {
      "encoding":"FLAC",
      "sampleRateHertz": 16000,
      "languageCode": "en-US",
      "enableWordTimeOffsets": false,
      "model": "video"

  },
  "audio": {
      "uri":"gs://googlestorarge-bucket-name/family-memories.flac"
  }
}

I then used CURL to send the transcription request to Google. This was my command:

curl -s -H "Content-Type: application/json" -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) https://speech.googleapis.com/v1/speech:longrunningrecognize -d @JSON_FILE_CREATED_ABOVE.json

If all goes well you will get something like this in response:

{
  "name": "4663803355627080910"
}

You can check the status of the transcription, which usually takes half the length of the audio file to do, by running this command:

curl -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) -H "Content-Type: application/json; charset=utf-8" "https://speech.googleapis.com/v1/operations/ID_NUMBER_ACQUIRED_ABOVE"

You will either get a percent progress, or if it’s done, the output of the transcription.

Success! It took some time to figure out but was still much better than manually transcribing the audio by hand.

JQ select specific value from array

I had some AWS ec2 JSON output that I needed to parse. I wanted to grab a specific value from an array and it proved to be tricky for a JSON noob like me. I finally found this site which was very helpful: https://garthkerr.com/search-json-array-jq/. In my case I wanted the value of a specific AWS EC2 tag.

The trick is to grab down to the Tags[] array, and then pipe that to a select command. If your tags have dots in them (as mine did) then make sure to quote the tag name. Then add the .Value to the end of the select statement. This is my query:

jq -r '.Reservations[].Instances[].Tags[] | select (.Key == "EC2.Tag.Name").Value' jsonfile.json

The above query grabs all the tags (an array of Key,Value lines), then searches the result for a specific key “EC2.Tag.Name” and returns the Value line associated with it.