13 – YouTube Data API – Captions – download function

This is article 13 of the YouTube API With PHP series.

Downloads a caption track of a video. The caption track is returned in its original format unless the request specifies a value for the tfmt parameter, and in its original language unless the request specifies a value for the tlang parameter. Since this call requires user authentication, it can only download caption tracks of videos that belong to you.

The request URL is:

 GET https://www.googleapis.com/youtube/v3/captions/(id)

Parameters

  • key (string) required. Your API key
  • id (string) required. This is added as a suffix to the GET URL and must be the ID of an existing caption track.
  • onBehalfOfContentOwner (string) optional. This is relevant only for YouTube Channel Partners, and a request that uses it must be authorized with user authentication. We will not be exploring this option.
  • tfmt (string) optional. The tfmt parameter specifies that the caption track should be returned in a specific format. If the parameter is not included in the request, the track is returned in its original format. Possible values are:
    • sbv – SubViewer subtitle
    • scc – Scenarist Closed Caption format
    • srt – SubRip subtitle
    • ttml – Timed Text Markup Language caption
    • vtt – Web Video Text Tracks caption
  • tlang (string) optional. The tlang parameter specifies that the API response should return a translation of the specified caption track. The parameter value is an ISO 639-1 two-letter language code that identifies the desired caption language. The translation is generated using machine translation, such as Google Translate.
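The parameters above can be combined into a request URL with a small helper. This is a sketch, not part of any Google client library: the function name buildCaptionDownloadUrl and its signature are hypothetical, and percent-encoding the caption ID as a path segment is an assumption of this sketch. It rejects tfmt values outside the five listed above and tlang values that are not two lowercase letters.

```php
<?php
// Hypothetical helper (not a Google library function): builds the
// captions.download request URL from the parameters described above.

// The five tfmt values listed in the parameter description.
const CAPTION_TFMT_VALUES = ['sbv', 'scc', 'srt', 'ttml', 'vtt'];

function buildCaptionDownloadUrl(string $captionId, string $apiKey, ?string $tfmt = null, ?string $tlang = null): string
{
    if ($captionId === '') {
        throw new InvalidArgumentException('A caption track id is required.');
    }

    $query = ['key' => $apiKey];

    if ($tfmt !== null) {
        if (!in_array($tfmt, CAPTION_TFMT_VALUES, true)) {
            throw new InvalidArgumentException("Unsupported tfmt value: $tfmt");
        }
        $query['tfmt'] = $tfmt;
    }

    if ($tlang !== null) {
        // Assumption: only plain two-letter codes such as "fr" or "de" are accepted here.
        if (preg_match('/^[a-z]{2}\z/', $tlang) !== 1) {
            throw new InvalidArgumentException("Expected a two-letter language code, got: $tlang");
        }
        $query['tlang'] = $tlang;
    }

    // The id is part of the path, so it is percent-encoded (assumption: this keeps
    // characters such as "/" or "=" from being misread as URL structure).
    return 'https://www.googleapis.com/youtube/v3/captions/'
        . rawurlencode($captionId)
        . '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}
```

For example, buildCaptionDownloadUrl('abc123', 'MY_API_KEY', 'vtt', 'fr') returns https://www.googleapis.com/youtube/v3/captions/abc123?key=MY_API_KEY&tfmt=vtt&tlang=fr. Validating tfmt locally simply surfaces typos before a request is spent on them.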

Response

On a successful call, the API returns a binary file (the caption track itself), which should be saved to local disk.

Here is sample code that downloads a caption track:

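<?php
    error_reporting(E_ALL ^ E_NOTICE ^ E_WARNING ^ E_DEPRECATED);
    set_time_limit(60 * 3);
    session_start();

    $clientId = "**";
    $clientSecret = "**-";
    $g_youtubeDataAPIKey = "**";

    $captionId = "STDMK4mG9ONQRc2kPO88VeQN1mlD15SHfV5I8hY1acQ=";

    // Remember this script so the login page can redirect back to it
    $_SESSION["code_id"] = $_SERVER["PHP_SELF"];

    // No access token yet: send the user through the OAuth login flow first
    if (empty($_SESSION["access_token"])) {
        header("Location: ../../init-login.php");
        exit;
    }

    $accessToken = $_SESSION["access_token"];

    // Build the request URL: the caption track id is part of the path
    $url = "https://www.googleapis.com/youtube/v3/captions/" . $captionId . "?key=" .
        $g_youtubeDataAPIKey;

    $curl = curl_init();
    curl_setopt_array($curl, array(
        // Send the OAuth access token in the header, not in the URL
        CURLOPT_HTTPHEADER => array('Authorization: Bearer ' . $accessToken),
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_URL => $url,
        CURLOPT_USERAGENT => 'YouTube API Tester',
        CURLOPT_SSL_VERIFYPEER => true,
        CURLOPT_SSL_VERIFYHOST => 2,      // also check the certificate's host name
        // CURLOPT_CAPATH expects a directory, so only the CA bundle file is set
        CURLOPT_CAINFO => "../../cert/cacert.pem",
        CURLOPT_FOLLOWLOCATION => true
    ));
    $resp = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    if ($resp === false) {
        echo "cURL error: " . curl_error($curl);
    } elseif ($status != 200) {
        echo "API error (HTTP $status): " . $resp;
    } else {
        // The body is the raw caption file; here we simply dump it
        var_dump($resp);
    }

    curl_close($curl);
?>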

Here is the output:

string(3265) "0:00:00.000,0:00:08.220 a few months back sometime in 2016 I've
0:00:06.120,0:00:10.800 made a video which showcased the
0:00:08.220,0:00:13.080 features of the web speech API the West
0:00:10.800,0:00:15.020 peach API is a technology made by google
0:00:13.080,0:00:18.600 which lets you do speech recognition
0:00:15.020,0:00:21.359 within your browser unfortunately even
0:00:18.600,0:00:23.490 as of now the only browser which fully
0:00:21.359,0:00:26.160 supports that specification is google
0:00:23.490,0:00:28.769 chrome the other browsers haven't really
0:00:26.160,0:00:30.449 got support for it so i suggest you have
0:00:28.769,0:00:32.489 a look at that video first in order to
0:00:30.449,0:00:34.590 get an idea of the capabilities of web
0:00:32.489,0:00:37.350 speech API the link is right below this
0:00:34.590,0:00:39.030 video so this time round i decided to
0:00:37.350,0:00:42.120 take that experiment a little further
0:00:39.030,0:00:44.940 and what i have here is a single web
0:00:42.120,0:00:48.000 page application which does real-time
0:00:44.940,0:00:50.280 translation so the processing pipeline
0:00:48.000,0:00:53.399 is very simple the first thing it does
0:00:50.280,0:00:56.160 is that it accepts speech using a
0:00:53.399,0:00:59.489 microphone and then translates the
0:00:56.160,0:01:01.699 speech into written text that text is
0:00:59.489,0:01:03.899 them fed into a translation api which
0:01:01.699,0:01:07.140 translates that text into another
0:01:03.899,0:01:09.479 language and then that translated takes
0:01:07.140,0:01:11.670 to spread into a text-to-speech engine
0:01:09.479,0:01:14.159 which then plays back that text as
0:01:11.670,0:01:15.390 reporters audio so you can speak in one
0:01:14.159,0:01:17.009 language and you can hear the
0:01:15.390,0:01:19.860 translation of the same thing in a
0:01:17.009,0:01:22.799 different language now since what i'm
0:01:19.860,0:01:25.439 using your motif c-suite engines and AP
0:01:22.799,0:01:27.000 is the machine translation is not really
0:01:25.439,0:01:29.930 hundred percent perfect and sometimes
0:01:27.000,0:01:33.270 the results can be quite funny but this
0:01:29.930,0:01:35.549 still forms the base for any kind of
0:01:33.270,0:01:38.820 translation application of software
0:01:35.549,0:01:40.020 which one might make and this can use as
0:01:38.820,0:01:41.880 a base to make something more
0:01:40.020,0:01:44.990 sophisticated so i'm just going to show
0:01:41.880,0:01:44.990 you a few examples here
0:01:47.100,0:01:54.240 what is your name and where you live you
0:01:51.930,0:01:57.409 will have a callous me why I shouldn't
0:01:54.240,0:02:02.909 even care where I usually say material
0:01:57.409,0:02:07.289 assiyah me here for this table do you
0:02:02.909,0:02:13.920 believe them do you have any money with
0:02:07.289,0:02:19.470 you muscle to kill every one degree to
0:02:13.920,0:02:21.750 be happy imma get between the phenomena
0:02:19.470,0:02:28.850 Tomas famous me your hundred your
0:02:21.750,0:02:28.850 amateur thank you for the kind words
0:02:29.780,0:02:39.080 necessary master Anahata name is a photo
0:02:33.330,0:02:39.080 that dress PS different people around "


We pass the caption track ID as part of the request URL. We also pass the OAuth access token in the Authorization header instead of the URL.

The call returns the contents of the caption file. In this example we simply dump the contents with var_dump(); ideally you would save them to a file. Since we have not specified a format via the tfmt parameter, the contents are returned in the original format in which the track was uploaded.
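A minimal sketch of saving the response instead of dumping it. The helper name saveCaptionTrack is hypothetical, and naming the file after the requested tfmt value (falling back to .txt when no format was requested, since the original format is not known in advance) is an assumption of this sketch, not something the API prescribes.

```php
<?php
// Hypothetical helper: saves a downloaded caption track to disk.
// Assumption: the file extension mirrors the tfmt value that was requested,
// falling back to "txt" when no tfmt was sent (the original format is unknown).
function saveCaptionTrack(string $body, string $directory, string $baseName, ?string $tfmt = null): string
{
    if ($body === '') {
        throw new RuntimeException('Refusing to save an empty caption track.');
    }

    $extension = $tfmt ?? 'txt';

    // basename() strips any directory parts so the file stays inside $directory
    $path = rtrim($directory, '/\\') . DIRECTORY_SEPARATOR . basename($baseName) . '.' . $extension;

    // file_put_contents() is binary-safe: the bytes are written exactly as received
    if (file_put_contents($path, $body) === false) {
        throw new RuntimeException("Could not write caption file: $path");
    }

    return $path;
}

// Usage with $resp from the sample above (assuming tfmt=vtt was requested):
// $savedTo = saveCaptionTrack($resp, __DIR__, 'caption-track', 'vtt');
```

Refusing an empty body keeps a failed or unconvertible download from silently producing an empty file on disk.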

To request a specific format, add the tfmt parameter to the API URL as tfmt=xxx, where xxx is one of the values listed under tfmt above. Since the sample URL already contains ?key=..., the parameter is appended as &tfmt=xxx.

Here is how the previous output looks with tfmt=vtt:

string(12175) "WEBVTT
Kind: captions
Language: en
Style:
::cue(c.colorCCCCCC) { color: rgb(204,204,204); }
::cue(c.colorE5E5E5) { color: rgb(229,229,229); }
##

00:00:00.000 --> 00:00:08.220 align:start position:19%
a<00:00:01.280> few<00:00:02.280> months<00:00:02.429> back<00:00:02.790> sometime<00:00:03.780> in<00:00:04.160> 2016<00:00:05.160> I've

00:00:06.120 --> 00:00:10.800 align:start position:19%
made<00:00:06.509> a<00:00:06.540> video<00:00:06.839> which<00:00:07.200> showcased<00:00:08.040> the

00:00:08.220 --> 00:00:13.080 align:start position:19%
features<00:00:08.580> of<00:00:08.610> the<00:00:08.910> web<00:00:09.090> speech<00:00:09.389> API<00:00:09.590> the<00:00:10.590> West

00:00:10.800 --> 00:00:15.020 align:start position:19%
peach<00:00:11.070> API<00:00:11.460> is<00:00:11.580> a<00:00:11.820> technology<00:00:12.210> made<00:00:12.690> by<00:00:12.870> google

00:00:13.080 --> 00:00:18.600 align:start position:19%
which<00:00:13.650> lets<00:00:13.950> you<00:00:14.099> do<00:00:14.280> speech<00:00:14.849> recognition

00:00:15.020 --> 00:00:21.359 align:start position:19%
within<00:00:16.020> your<00:00:16.260> browser<00:00:17.420> unfortunately


Note that specifying a tfmt value does not guarantee that content will be returned in that format. If the format is invalid or the track cannot be converted to it, no content is returned.
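Because an empty or error response is possible, it is worth checking the HTTP status before using the body. The sketch below uses a hypothetical helper, checkCaptionResponse, and assumes error bodies follow Google's usual JSON error shape ({"error": {"code": ..., "message": ...}}); any other body is reported verbatim.

```php
<?php
// Hypothetical helper: validates a captions.download response before it is used.
// Assumption: error bodies follow Google's usual JSON shape
// {"error": {"code": 403, "message": "..."}}; other bodies are reported verbatim.
function checkCaptionResponse(int $httpStatus, string $body): string
{
    if ($httpStatus === 200 && $body !== '') {
        return $body; // the raw caption file
    }

    if ($httpStatus === 200) {
        // Success status but no content, e.g. when a tfmt conversion yields nothing
        throw new RuntimeException('The API returned an empty caption track.');
    }

    $decoded = json_decode($body, true);
    $message = (is_array($decoded) && isset($decoded['error']['message']))
        ? (string) $decoded['error']['message']
        : $body;

    throw new RuntimeException("Caption download failed (HTTP $httpStatus): $message");
}

// Usage after curl_exec() and before curl_close():
// $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
// $captions = checkCaptionResponse($status, $resp === false ? '' : $resp);
```

Throwing an exception with the API's own message makes problems such as an expired token or insufficient permissions visible immediately instead of showing up as an empty or JSON-filled caption file.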

7 Comments

  1. plz help

    /opt/lampp/htdocs/yt-sub/init-login.php:75:string ‘{
    “error”: “invalid_grant”,
    “error_description”: “Token has been expired or revoked.”
    }
    ‘ (length=90)

    • /opt/lampp/htdocs/yt-sub/index.php:42:string ‘The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video order might not have enabled third-party contributions for this caption.’ (length=225)
