This is article 7 of the YouTube API With PHP series.
THE PROBLEM WITH MAXRESULTS
Everyone who starts working with the YouTube Data API runs into two problems early on, and then spends hours or even days figuring out how to deal with them.
Problem 1: 99% of the API calls that return multiple results will not let you retrieve more than 500 items, even if the API says there are more. For example, if you want to get all the videos in a channel, the API may report that there are 4,000 videos, but it will not let you retrieve the details of more than the first 500 (depending on the sort order). There are ways around this, which we will look into in further sections, but by and large you have to keep this limit in mind.
Problem 2: Most API calls that return multiple results carry a field called totalResults (inside pageInfo) which reports how many results exist in total for that call. This value keeps changing between successive calls and cannot be used for any reliable calculations. For example, if you execute a search API call, the first call may say 200 results, the call for the next page may say 4,500 results, and the third page may show a different value again.
It is good to know about these two issues early on; it will save you a lot of frustration later. Both problems can be overcome, as you will see in later sections.
HANDLING PAGING
One more basic concept to cover before we look into the actual API entities is how to page through results. Some API calls, like fetching a single video, return only one data entity. Others, like search, getting all the videos in a channel, or getting all the Activities of a user, return multiple items of data.
When an API call returns multiple results, for practical reasons the API will not let you retrieve more than 50 items at a time. The maxResults parameter sets the maximum number of results returned per call. It can be anything from 0 to 50; if it is not set, the default is 5.
So if a call has 200 results and you have not set maxResults, you will have to page through it 200 / 5 = 40 times. If you set maxResults to 50, you will have 200 / 50 = 4 pages of data to deal with.
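For example, a search request pinned to the 50-per-page maximum looks like this (CHANNEL_ID and YOUR_API_KEY are placeholders for your own values):

https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=CHANNEL_ID&maxResults=50&key=YOUR_API_KEY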
As you will see in later sections, API results include two fields, nextPageToken and prevPageToken, whenever the number of total results exceeds the number of fetched results. The tokens are unique strings like CDIQAA, CKYEEAA, etc.
So, to iterate through all the results, you read the value of nextPageToken from the first API call and pass it as the pageToken parameter in subsequent calls, until nextPageToken comes back null. When it is null, no more data is available.
The same goes for prevPageToken. This value is only available from the second page of results onwards. If you need to fetch the previous page of results, pass the prevPageToken value in the pageToken parameter of the API call. When you reach the first page, prevPageToken will be null.
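To make this concrete, here is roughly what the paging-related fields look like in a response (the token and count values here are only illustrative):

{
  "nextPageToken": "CDIQAA",
  "prevPageToken": "CAoQAQ",
  "pageInfo": {
    "totalResults": 553,
    "resultsPerPage": 50
  },
  "items": [ ... ]
}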
Here is some sample code which shows how the iteration is done. The logic is more or less the same whichever API entity is being called. Ignore the actual call details and the JSON handling; just focus on how the nextPageToken is handled.
<?php
error_reporting(E_ALL ^ E_NOTICE ^ E_WARNING ^ E_DEPRECATED);
set_time_limit(60 * 3);

$g_YouTubeDataAPIKey = "****";
$channelId = "UCddiUEpeqJcYeBxX1IVBKvQ";

// Helper: perform a GET request via cURL and return the raw response body.
function fetchUrl($url) {
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => 'YouTube API Tester',
        CURLOPT_SSL_VERIFYPEER => 1,
        CURLOPT_SSL_VERIFYHOST => 0,
        CURLOPT_CAINFO         => "../cert/cacert.pem",
        CURLOPT_FOLLOWLOCATION => TRUE
    ));
    $resp = curl_exec($curl);
    curl_close($curl);
    return $resp;
}

// First call: fetch the first page of activities for the channel.
$url = "https://www.googleapis.com/youtube/v3/activities?part=snippet,contentDetails,id"
     . "&channelId=" . $channelId
     . "&maxResults=50&key=" . $g_YouTubeDataAPIKey;
$resp = fetchUrl($url);

if ($resp) {
    echo("Found data. <br>");
    $page = 0; // counts the pages fetched via pageToken
    $json = json_decode($resp);
    if ($json) {
        echo("JSON decoded<br>");
        $nextPageToken = $json->nextPageToken;
        $total = $json->pageInfo->totalResults;
        $items = $json->items;
    } else {
        exit("Error. Could not parse JSON. " . json_last_error_msg());
    }

    foreach ($items as $item) {
        // process each item in the list
    }

    // Fetch the remaining pages using the nextPageToken.
    while ($nextPageToken != null) {
        echo("Fetching page " . (++$page) . " using pagetoken=" . $nextPageToken . "<br>");
        $url = "https://www.googleapis.com/youtube/v3/activities?part=snippet,contentDetails,id"
             . "&channelId=" . $channelId
             . "&maxResults=50&pageToken=" . $nextPageToken
             . "&key=" . $g_YouTubeDataAPIKey;
        $nextPageToken = null; // clear the current value
        $items = array();
        $resp = fetchUrl($url);
        if ($resp) {
            $json = json_decode($resp);
            if ($json) {
                $nextPageToken = $json->nextPageToken;
                $total = $json->pageInfo->totalResults;
                $items = $json->items;
            }
            foreach ($items as $item) {
                // process each item
            }
        } // if $resp
        // Sometimes the pagetoken is filled but no items are fetched;
        // in such a case stop the loop.
        if (count($items) == 0)
            $nextPageToken = null;
    } // while $nextPageToken
    echo("Finished");
} // if $resp
?>
Here is the output:
Found data.
JSON decoded
Fetching page 1 using pagetoken=CDIQAA
Fetching page 2 using pagetoken=CGQQAA
Fetching page 3 using pagetoken=CJYBEAA
Fetching page 4 using pagetoken=CMgBEAA
Fetching page 5 using pagetoken=CPoBEAA
Fetching page 6 using pagetoken=CKwCEAA
Fetching page 7 using pagetoken=CN4CEAA
Fetching page 8 using pagetoken=CJADEAA
Fetching page 9 using pagetoken=CMIDEAA
Fetching page 10 using pagetoken=CPQDEAA
Fetching page 11 using pagetoken=CKYEEAA
Finished
CROSSING THE 500 RESULTS LIMIT
As shown above, one of the biggest practical limitations when retrieving data is the 500 results limit of the YouTube API, which you will hit if you are working with a lot of data. Officially the limit still holds, but there is a way to go beyond it.
The key to this is the publishedBefore parameter and the order parameter passed in the API call. When you are doing a search, or perhaps retrieving the list of videos in a channel, specify that the results be sorted by date. Then use the publishedBefore (or publishedAfter) parameter to get more data once you cross 500 results.
Here is how the logic flows:
1. Make the first call, specifying order=date and maxResults=50.
2. As long as there is a nextPageToken available, keep making successive calls.
3. Get the date of the last result (this will have the earliest date) and make a new call with order=date and publishedBefore set to the date of the last result obtained.
4. Ignore any nextPageTokens obtained from here on.
5. Out of the results obtained, get the date of the earliest result and then repeat the call with order=date and publishedBefore set to that earliest date.
6. Repeat step 5 till you reach either one or zero results.
The above logic is not foolproof and has not been thoroughly tested for edge cases and missing data, but code based on it has been used to fetch thousands of videos from channels.
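Distilled down, the flow looks something like this (a sketch only; fetchPage() is a hypothetical helper that issues a search call with order=date plus the given parameters, skips null parameters, and returns the decoded JSON):

<?php
$dates = array();

// Steps 1-2: normal token paging until the API stops handing out tokens.
$pageToken = null;
do {
    $page = fetchPage($channelId, array("pageToken" => $pageToken));
    foreach ($page->items as $item)
        $dates[] = $item->snippet->publishedAt;
    $pageToken = isset($page->nextPageToken) ? $page->nextPageToken : null;
} while ($pageToken);

// Steps 3-6: slide a publishedBefore window back in time, one page per
// call, ignoring any page tokens the API returns.
while (count($dates) > 1) {
    sort($dates); // earliest date first
    $page = fetchPage($channelId, array("publishedBefore" => $dates[0]));
    $dates = array();
    foreach ($page->items as $item)
        $dates[] = $item->snippet->publishedAt;
}
?>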
The full sample code which uses this algorithm is given below:
<?php
error_reporting(E_ALL ^ E_NOTICE ^ E_WARNING ^ E_DEPRECATED);
set_time_limit(60 * 3);

$g_youtubeDataAPIKey = "**";
$channelId = "***";

// Helper: perform a GET request via cURL and return the raw response body.
function fetchUrl($url) {
    $curl = curl_init();
    curl_setopt_array($curl, array(
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => 'YouTube API Tester',
        CURLOPT_SSL_VERIFYPEER => 1,
        CURLOPT_SSL_VERIFYHOST => 0,
        CURLOPT_CAINFO         => "../cert/cacert.pem",
        CURLOPT_FOLLOWLOCATION => TRUE
    ));
    $resp = curl_exec($curl);
    curl_close($curl);
    return $resp;
}

// First level search: first page, sorted by date.
$url = "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=" . $channelId
     . "&maxResults=50&order=date&type=video&key=" . $g_youtubeDataAPIKey;
$resp = fetchUrl($url);

if ($resp) {
    $nextPageToken = null;
    $rawDate = null;
    $total = 0;
    $items = array();
    $json = json_decode($resp);
    if ($json) {
        $nextPageToken = $json->nextPageToken;
        $total = $json->pageInfo->totalResults;
        $items = $json->items;
    }
    echo("nextpage=" . $nextPageToken . "<br>total=" . $total . "<br>items=" . count($items));
    foreach ($items as $item) {
        $videoId    = $item->id->videoId;
        $videoTitle = $item->snippet->title;
        $videoDesc  = $item->snippet->description;
        $thumbnail  = $item->snippet->thumbnails->high->url;
        $rawDate    = $item->snippet->publishedAt;
        echo("<hr>" . $videoId . ", " . $videoTitle . " date :" . $rawDate);
    }

    // Second level search using the nextPageToken. This is where the API
    // stops us at around 500 results.
    echo("<br><h2>Second phase</h2><br>");
    while ($nextPageToken != null) {
        $url = "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=" . $channelId
             . "&maxResults=50&order=date&type=video"
             . "&pageToken=" . $nextPageToken
             . "&key=" . $g_youtubeDataAPIKey;
        $nextPageToken = null;
        $items = array();
        $resp = fetchUrl($url);
        if ($resp) {
            $json = json_decode($resp);
            if ($json) {
                $nextPageToken = $json->nextPageToken;
                $total = $json->pageInfo->totalResults;
                $items = $json->items;
            }
            echo("nextpage=" . $nextPageToken . "<br>total=" . $total . "<br>items=" . count($items));
            foreach ($items as $item) {
                $videoId    = $item->id->videoId;
                $videoTitle = $item->snippet->title;
                $videoDesc  = $item->snippet->description;
                $thumbnail  = $item->snippet->thumbnails->high->url;
                $rawDate    = $item->snippet->publishedAt;
                echo("<hr>" . $videoId . ", " . $videoTitle . " date :" . $rawDate);
            } // foreach
        } // if $resp
        // Sometimes the pagetoken is filled but no items are fetched;
        // in such a case stop the loop.
        if (count($items) == 0)
            $nextPageToken = null;
    } // while $nextPageToken

    // $rawDate now holds the date of the last (earliest) video fetched,
    // so we use it to go further back in time.
    // Third level search by date, if there were any search results.
    echo("<br><h2>Third phase</h2><br>");
    echo("<br>RawDate of earliest fetched video=" . $rawDate . "<br>");
    if ($rawDate != null) {
        $doSearch = true;
        $firstRawDate = $rawDate; // remember where the current window starts
        while ($doSearch) {
            // Any nextPageToken is deliberately ignored in this phase;
            // we page backwards via publishedBefore instead.
            $url = "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId=" . $channelId
                 . "&maxResults=50&order=date&type=video"
                 . "&publishedBefore=" . $rawDate
                 . "&key=" . $g_youtubeDataAPIKey;
            echo($url . "<br>");
            $items = array();
            $arrDates = array();
            $resp = fetchUrl($url);
            if ($resp) {
                $json = json_decode($resp);
                if ($json) {
                    $total = $json->pageInfo->totalResults;
                    $items = $json->items;
                }
                echo("total=" . $total . "<br>items=" . count($items) . "<br>");
                // We use an array to store the dates because the API does
                // not necessarily observe date sorting within the results.
                foreach ($items as $item) {
                    $videoId    = $item->id->videoId;
                    $videoTitle = $item->snippet->title;
                    $videoDesc  = $item->snippet->description;
                    $thumbnail  = $item->snippet->thumbnails->high->url;
                    $rawDate    = $item->snippet->publishedAt;
                    echo("<hr>" . $videoId . ", " . $videoTitle . " date:" . $rawDate . "<br>");
                    $arrDates[] = $rawDate; // store the date into our array
                } // foreach
            } // if $resp
            // Sort the dates to get the earliest one for the next window.
            if (count($arrDates) > 0) {
                sort($arrDates);
                $rawDate = $arrDates[0];
            }
            // Stop if the date window has not moved back (this also
            // catches failed calls where nothing was fetched).
            if ($rawDate == $firstRawDate)
                $doSearch = false;
            else
                echo("<br>Raw Date=" . $rawDate . "<br>");
            $firstRawDate = $rawDate; // remember the window for the next pass
            // Special end case: zero or one video in the results means we
            // have reached the end of the search.
            if (count($items) <= 1)
                $doSearch = false;
        } // while ($doSearch)
    } // if ($rawDate != null)
} // if $resp
?>
From the next section onwards, we are going to look into each Resource exposed by the YouTube Data API.
2023 DEC 25 ADDITION
It has been my experience, and others have also pointed out, that the code to fetch videos from a channel can sometimes skip videos for no logical reason. If you manually compare the videos visible on YouTube for a channel with the output from the API, sometimes videos will be missing. After more testing, I found that if videos are placed in different playlists, they do not all show up in the video listing of the channel. In that circumstance, the above code will not fetch all the videos, even if we cross the 500 video limit.
The only alternative solution is to:
- Fetch all the Playlists in the channel
- For each Playlist, get all the playlistItems, which contain the videos (see the sketch below)
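Here is a minimal PHP sketch of that two-step flow. It reuses the fetchUrl() helper from the earlier samples and is untested; for brevity, paging of the playlist list itself is omitted (a channel can have more than 50 playlists):

<?php
// Sketch: enumerate a channel's playlists, then walk each playlist's
// playlistItems to reach every video.
$url = "https://www.googleapis.com/youtube/v3/playlists?part=snippet,contentDetails"
     . "&channelId=" . $channelId . "&maxResults=50&key=" . $g_YouTubeDataAPIKey;
$json = json_decode(fetchUrl($url));
foreach ($json->items as $playlist) {
    $playlistId = $playlist->id;
    $pageToken = null;
    do {
        $url = "https://www.googleapis.com/youtube/v3/playlistItems?part=snippet"
             . "&playlistId=" . $playlistId . "&maxResults=50"
             . ($pageToken ? "&pageToken=" . $pageToken : "")
             . "&key=" . $g_YouTubeDataAPIKey;
        $page = json_decode(fetchUrl($url));
        foreach ($page->items as $item) {
            $videoId = $item->snippet->resourceId->videoId;
            // Process / de-duplicate each video here; the same video can
            // appear in more than one playlist.
        }
        $pageToken = isset($page->nextPageToken) ? $page->nextPageToken : null;
    } while ($pageToken);
}
?>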
For fuller implementations of this logic, you can refer to the sample code in the following blog posts:
Why is it that sometimes the pagetoken is filled but no items are fetched, so the code has to stop?
In PHP you can use array_filter to remove empty array entries.
@sadam that is a quirk of the YouTube API. I couldn't find any explanation.
I am testing this API and fetching 50 items at a time. I can get no more than 19 nextPageTokens for each query, which makes it 20*50 == 1000 maximum items in total. I tried multiple searches which have thousands of items.
@Gus
Normally YouTube will not return any nextPageToken after 500 results, as they claim that the relevancy of the results becomes bad after 500 results if you are using the API.
One trick around this is to use the filters 'publishedAfter' and 'publishedBefore' to break up your query into loops of queries. I already have sample code which does this to return thousands of results, but I never got around to posting the code on the blog.
I will do that soon.
I would appreciate it if you post your code samples covering the publishedAfter/publishedBefore approach.
@marioandmario I have added sample code to this post using publishedBefore and publishedAfter to show how to cross the 500 records limit.
Hi Amit, thank you so much for this post. I've been trying to extract the full list of videos in a certain channel (called "Tasty", basically a food and recipe channel).
This is the code I've used in Python, following your advice. I'm extracting the videos by looping in batches of 50 (without even using the "next token" mechanism); I'm just doing 47 different API calls and updating the BeforeDate parameter to, in theory, scan all the videos.
****CODE START
def fetch_AllVideos_by_channelid(TargetChannelID, OutputFilename, InitialDate, NumeroTandas):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)
    BeforeDate = InitialDate
    # The counter of this for loop must be set so that it walks through all the pages in batches of 50.
    for counter in range(1, NumeroTandas):
        print("####" + BeforeDate)
        res = youtube.search().list(part="snippet", channelId=TargetChannelID, order="date",
                                    type="video", maxResults="50", pageToken="",
                                    publishedBefore=BeforeDate).execute()
        for myvideo in res['items']:
            myVideoTitle = myvideo["snippet"]["title"]
            myVideoDate = myvideo["snippet"]["publishedAt"]
            myVideoId = myvideo["id"]["videoId"]
            print(myVideoTitle + "#" + myVideoId + "#" + myVideoDate)
            BeforeDate = myVideoDate
            with open(OutputFilename, 'a', encoding="utf-8") as file_object:
                file_object.write(myVideoTitle + "#" + myVideoId + "#" + myVideoDate + "\n")

if __name__ == '__main__':
    fetch_AllVideos_by_channelid('UCJFp8uSYCjXOMnkUyb3CQ3Q', 'C:\\Users\\Peter\\Desktop\\Extracted Videos.txt', "2018-10-01T21:00:01.000Z", 48)
****CODE END
The weird thing is that using this method I can extract around 1,300 videos approx. But if I go to the page "https://commentpicker.com/youtube-channel-id.php" and insert the channel ID of the channel, it tells me there are more than 2,000 videos available.
My question is, is there any reliable way of getting the total amount of videos in the channel?
@Pedro,
If you just want to know the total count of videos in a channel, then this API call does the trick:
https://www.googleapis.com/youtube/v3/channels?part=statistics&forUsername=VideoLecturesChannel&key={YOUR_API_KEY}
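In PHP that is just a couple of lines (a sketch, reusing the fetchUrl() helper from the samples in the post; note that videoCount comes back as a string):

$url = "https://www.googleapis.com/youtube/v3/channels?part=statistics"
     . "&forUsername=VideoLecturesChannel&key=" . $g_YouTubeDataAPIKey;
$json = json_decode(fetchUrl($url));
// statistics.videoCount is the channel's total public video count
echo("Total videos: " . $json->items[0]->statistics->videoCount);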
However, if you want to get the complete list of the videos in the channel, then your logic seems fine even though you are not using the next token. The channel has about 2,300 videos: http://mediawarrior.com/channel/UCJFp8uSYCjXOMnkUyb3CQ3Q
One reason why all the videos are not coming could be that not all of them are marked as public. Apart from that, you will have to manually check whether the API is skipping some videos. Do some checking against the videos in the channel to see which is the last video it fetches before the loop terminates. Is it reaching the first video, or is it stopping somewhere in between?
Hello again Amit,
First of all, thank you so much, I wouldn't have been able to solve this without you; documentation on YouTube's API is quite scarce in my view. And strangely enough it's quite tough to find code examples on the web (let alone in Python).
So the two things I have found:
1) There are such things as private (and unlisted) videos. These will show in the total count, but might not show when calling an API, depending on the function used, etc.
2) In my case the above amounted to 60 videos, but it didn't explain the great difference between what I was retrieving (around 1,300 videos) and the theoretical total (more than 2,400).
This is the deal: apparently the same video can belong to more than one playlist. If this happens, each (playlist, video) combination is counted independently, causing double-counting.
I've written the following code in Python. The first function tells you the different playlists within a given YouTube channel (and the number of videos contained in each one).
The second function loops through the different playlists, obtaining all the videos in each and returning for each video its Title, PlaylistID and VideoId.
This information can easily be pasted into Excel or similar, splitting the columns on the separator used ("#" or "@"), in order to filter unique videos, etc.
******START CODE***
from apiclient.discovery import build
from apiclient.errors import HttpError
from oauth2client.tools import argparser
import json

DEVELOPER_KEY = "{insert developer key here}"
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

def Playlists_by_ChannelID(channelToSearch):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)
    response = youtube.playlists().list(part="contentDetails,id,localizations,player,snippet,status",
                                        channelId=channelToSearch, maxResults=50).execute()
    for myPlaylist in response["items"]:
        myPlaylistId = myPlaylist["id"]
        myPlaylistTitle = myPlaylist["snippet"]["title"]
        myPlaylistCount = myPlaylist["contentDetails"]["itemCount"]
        print(myPlaylistTitle + "#" + myPlaylistId + "#" + "Number of videos " + str(myPlaylistCount))
        #print(myPlaylist)
    return (response)

def fetch_AllVideos_by_Through_Playlists(channelToSearch, OutputFilename):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)
    response = youtube.playlists().list(part="contentDetails,id,localizations,player,snippet,status",
                                        channelId=channelToSearch, maxResults=50).execute()
    for myPlaylist in response["items"]:
        myPlaylistId = myPlaylist["id"]
        myPlaylistTitle = myPlaylist["snippet"]["title"]
        myPlaylistCount = myPlaylist["contentDetails"]["itemCount"]
        MyListOfVideos = youtube.playlistItems().list(part="snippet", playlistId=myPlaylistId,
                                                      maxResults="50").execute()
        nextPageToken = MyListOfVideos.get('nextPageToken')
        while ('nextPageToken' in MyListOfVideos):
            nextPage = youtube.playlistItems().list(part="snippet", playlistId=myPlaylistId,
                                                    maxResults="50", pageToken=nextPageToken).execute()
            MyListOfVideos['items'] = MyListOfVideos['items'] + nextPage['items']
            if 'nextPageToken' not in nextPage:
                MyListOfVideos.pop('nextPageToken', None)
            else:
                nextPageToken = nextPage['nextPageToken']
        print("\n\n")
        for MyVideo in MyListOfVideos["items"]:
            myVideoTitle = MyVideo["snippet"]["title"]
            myVideoId = MyVideo["snippet"]["resourceId"]["videoId"]
            #print(MyVideo)
            print(myVideoTitle + "#" + myPlaylistId + "#" + myVideoId)
            with open(OutputFilename, "a", encoding="utf-8") as file_object:
                file_object.write(myVideoTitle + "@" + myPlaylistId + "@" + myVideoId + "\n")
    return (response)

if __name__ == "__main__":
    #Playlists_by_ChannelID("UCJFp8uSYCjXOMnkUyb3CQ3Q")
    fetch_AllVideos_by_Through_Playlists("UCJFp8uSYCjXOMnkUyb3CQ3Q", "C:\\Users\\Peter\\Desktop\\Extracted Videos.txt")
******END CODE******
@Pedro
Great stuff. The API has some quirks which can catch developers unawares. The only way out in such cases is to learn by experience.
Great article. Anyway, remember that each search call with the YouTube API costs 100 quota units, so you'll consume the whole free daily quota (10,000 units) in a blink. After that, you have to pay Google.
Yes, that is true. The free tier cannot really support real-time apps unless a certain amount of caching is done.
You have studied this in such detail! Thank you! The order, publishedAfter and publishedBefore params are the cool tools.