T236446: Cloud Services shared IP (static NAT for external communications) often rate limited by YouTube for video downloads
Status: Open, Stalled · Priority: Medium · Visibility: Public
Authored by Victor_Grigas · Oct 25 2019
Tags: video2commons (Backlog), Cloud-VPS (Unsorted), Upstream (Backlog), Tool-spacemedia (Backlog), cloud-services-team (Watching)

Description

I've been having a consistent problem with video2commons today:

"Error: An exception occurred: DownloadError: ERROR: bFbKgtZM9As: YouTube said: Unable to extract video data"

It doesn't seem to matter which video it is, or whether it's a CC-licensed video or a public-domain one.
See also:
https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2019/10#video2commons
https://github.com/ytdl-org/youtube-dl/issues/23638
https://github.com/ytdl-org/youtube-dl/search?p=2&q=HTTP+Error+429%3A+Too+Many+Requests&type=Issues

Related objects
Mentioned in: T242830 (Video2Commons Error 116 "Stale file handle"), T240414 (Request for Floating/Public IP address for WikiLoop), T236704 (Import of the youtube channel "Les possédés et leurs mondes")
Mentioned here: T254700 (Citoid requests for YouTube metadata is giving 429: too many requests HTTP error)
Duplicates merged here: T256672 (HTTP Error 429), T236705 (video2commons broken after import attempt)

Event Timeline

Fae (Oct 28 2019):
By coincidence I (using Faebot) have been trying to run my CDC video uploads from labs. The standard use of youtube-dl works directly from a terminal session, but when run on the grid engine I start getting

WARNING: unable to download video info webpage: HTTP Error 429: Too Many Requests

or the fatal (the YouTube ID is just a real example)

youtube_dl.utils.DownloadError: ERROR: fWET2kNwdn8: YouTube said: Unable to extract video data

The same DownloadError can mean that the video is blocked in that region, or removed as a copyright violation, but that is not the case for the CDC. The "Too Many Requests" might be a combination of the specific WMF IP address plus the rapid querying of several playlists. However, that's a bit odd considering that the code does work when not on the grid, unless the problem is that the IP addresses used by the grid hosts are being blocked by YouTube/Google while the IP addresses used via live sessions are not.
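Telling the retryable rate-limit case apart from a genuinely unavailable video matters for any retry logic. A crude classifier over youtube-dl's error text might look like this (a sketch only; the patterns are just the messages quoted above, and real youtube-dl output varies):

```python
def classify_ytdl_error(message: str) -> str:
    """Crudely bucket a youtube-dl error message for retry decisions.

    The patterns are taken from the errors seen in this task; treat this
    as a sketch, not an exhaustive mapping of youtube-dl's messages.
    """
    msg = message.lower()
    if "429" in msg or "too many requests" in msg:
        return "rate-limited"        # back off and retry later
    if "unable to extract video data" in msg:
        return "maybe-rate-limited"  # also raised for removed/region-blocked videos
    return "fatal"                   # do not retry

print(classify_ytdl_error("HTTP Error 429: Too Many Requests"))  # → rate-limited
```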
Note that I'm continuing to try from a command line, but as the recoding (mp4/mkv to webm) may take more than 12 hours for some videos, that means I'm locked out of running a terminal on labs while the project runs, and it's against the guidelines for how labs is supposed to be used by us volunteers... Anyone interested in checking the specific Python code can find it at /mnt/nfs/labstore-secondary-tools-project/faebot/pywikibot-core/scripts/Youtube_CDC2.py

zhuyifei1999 (Oct 28 2019):
In T236446#5611391, @Fae wrote: "However that's a bit odd considering that the code does work when not on the grid, unless the problem is that IP addresses used by the grid host are getting blocked by YouTube/Google while the IP addresses used via live sessions are not."
Bastions have floating public IPs, so they can open port 22 to the public and you can ssh in directly without a jump host. Grid exec nodes are behind a cloud-wide NAT and share a single public IP.

zhuyifei1999 merged a task: T236705 (video2commons broken after import attempt). — Oct 28 2019

Phamhi (Oct 29 2019):
@Fae, try running it in one of the Kubernetes Python shells:

    webservice --backend=kubernetes python shell
    ~/.virtualenvs/cdc/bin/python ~/pywikibot-core/pwb.py Youtube_CDC_remote

Fae (Oct 29 2019):
@Phamhi, good suggestion. I have not managed to get it to work so far: the Python script drops out without warning, even though I guess in theory the shell should behave in an identical way.

zhuyifei1999 (Oct 29 2019):
In T236446#5614742, @Phamhi wrote: "Try running it in one of the kubernetes python shells"
v2c runs from k8s and receives the same message.

bd808 renamed this task from "Consistent errors from Video2Commons from YouTube" to "Cloud Services shared IP (static NAT for external communications) often rate limited by YouTube for video downloads", triaged it as Medium priority, and added the Cloud-VPS and cloud-services-team (Kanban) projects. — Oct 29 2019

bd808 (Nov 1 2019):
The ideal solution is obviously getting the Cloud VPS NAT IP a higher quota upstream with YouTube, but maybe we can find a way to get some things working in advance of that. @zhuyifei1999: does v2c typically do the downloads on Toolforge, or are the instances in the video Cloud VPS project actually doing that work? If it is the latter, we could try a temporary solution of adding public IPv4 addresses to the video instances to spread the traffic across more IPs, which would hopefully give a larger quota from YouTube.

zhuyifei1999 (Nov 1 2019):
In T236446#5627250, @bd808 wrote: "Does v2c typically do the downloads on Toolforge, or are the instances in the video Cloud VPS project actually doing that work?"
Toolforge instances (k8s pods) fetch metadata; the encoding cluster does both metadata fetching and the actual downloading. The metadata-fetching part has already hit Error 429.

zhuyifei1999 (Nov 1 2019):
Looks like the rate limit is currently lifted :)

Fae (Nov 5 2019):
It seems impossible for me to use WMF cloud services to do the CDC video recoding. I have reverted to running an old Mac mini as a headless server, which has itself experienced the YouTube "too many requests" problem, but my understanding is that this gets lifted after a day or two anyway. If someone can explain how I can legitimately run an FFmpeg recoding job on webgrid and save the files on a WMF server, that would be useful, but this experience, including getting warnings about my work, has seriously discouraged me from relying on WMF cloud services in the future, primarily because of the massive waste of precious volunteer time it takes to keep testing and rewriting code to fit in with the ever-changing, non-specific, and hard-to-understand "requirements" of this environment, compared to simply hosting a script on my own ancient kit.

zhuyifei1999 (Nov 5 2019):
In T236446#5634970, @Fae wrote: "If someone can explain how I can legitimately run an FFmpeg recoding job on webgrid and save the files on a WMF server, that would be useful..."
Can you use video2commons? I don't know what CDC does, but I'd assume it is download + transcode + upload, which is the same as what v2c does in the backend, on special-purpose instances, unlike Toolforge's generic grid.

zhuyifei1999 (Nov 6 2019):
Looks like the rate limit is in effect again.

kaldari (Nov 7 2019):
I'll bring this up with the WMF partner folks.

Fae (Nov 7 2019, edited):
My experience running locally is that the YouTube IP block lasts around 2½ days. I can queue my processing and let my programme keep testing the connection every few hours, but it's not reasonable for the average Commons user to see nothing happening for that long.

Update: it turns out that YouTube has escalating IP blocks. The second time around for the CDC uploads it was a 5-day block, so it's fair to presume that any IP will rapidly become unusable.
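Those two data points (roughly 2.5 days, then 5 days) suggest a doubling schedule. A client that wants to probe no earlier than the block is likely to lift could estimate the wait like this (a sketch; anything beyond the second strike is pure extrapolation):

```python
def block_wait_hours(strike: int, base_hours: float = 60.0) -> float:
    """Estimated wait (in hours) before retrying after the Nth IP block.

    Assumes the block length roughly doubles per strike, extrapolated
    from the two observed blocks (~60 h, then ~120 h). Guesswork past n=2.
    """
    if strike < 1:
        raise ValueError("strike counts from 1")
    return base_hours * (2 ** (strike - 1))

print(block_wait_hours(1))  # 60.0  (~2.5 days)
print(block_wait_hours(2))  # 120.0 (~5 days)
```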
Either someone can work out what the maximum bandwidth/access thresholds are for pulling video information, so that any tool can stay just under them, or, if YouTube stays incommunicado, we have to consider the site hostile for any tool that performs this task.

Victor_Grigas (Dec 9 2019):
Is anyone working on this?

Kizule (Dec 9 2019):
In T236446#5722331, @Victor_Grigas wrote: "Is anyone working on this?"
The task is assigned to @zhuyifei1999, so he should respond :)

zhuyifei1999 reassigned this task from zhuyifei1999 to Matanya (Dec 9 2019):
I was informed by @Matanya last Wednesday that Google will respond to us in 1–2 weeks.

zhuyifei1999 changed the task status from Open to Stalled (Dec 11 2019):
They think it's a ToS violation... so this is going to be difficult.

Victor_Grigas (Dec 11 2019):
Damn, maybe get WMF Legal to talk to them? I don't know how you can have CC-licensed videos on a site and think it's a ToS violation for them to be downloaded. Sounds like a knee-jerk reaction (to legitimate copyright infringement) to me.

Fae (Dec 11 2019):
In the meantime, it would be really useful to find a definition of the throttle limits for the service. If we are given a service-level guide, like "20 video information queries in an hour", then at least it may be possible to manage our own queue and avoid IP blocks by staying within it, or to reliably farm out the queue if that is an acceptable practice.

zhuyifei1999 (Dec 18 2019):
Google isn't responding (they probably don't have the incentive to); going to wait a few more days. If it stays like this, I'm going to do a massive overhaul of how v2c downloads from YouTube. Sneak peek: slimerjs + x11vnc.

bd808 added a project: Upstream. — Jan 17 2020

Chicocvenancio (Jan 28 2020):
In T236446#5750583, @zhuyifei1999 wrote: "If it stays like this, I'm gonna get a massive overhaul to how v2c download from YouTube."
Need help with this, @zhuyifei1999? @Sturm called my attention to this task yesterday, and indeed it seems like a very significant disruption that Google is imposing on us here. @Xinbenlv, is there anything you can do to help us here? I want to avoid using technical measures to make it seem as though our traffic comes from other sources, but I am starting to consider that solution as a viable option. Will that be a necessary step here?
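Staying under a published figure like the hypothetical "20 video information queries in an hour" mentioned above would be straightforward to enforce client-side with a sliding-window limiter (a sketch; the limit and window are illustrative, not anything YouTube has ever stated):

```python
import time

class QueryQuota:
    """Sliding-window limiter for a hypothetical per-IP quota such as
    "20 video information queries in an hour". The numbers are purely
    illustrative; YouTube has never published an actual limit."""

    def __init__(self, limit=20, window_s=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._stamps = []  # times of queries still inside the window

    def try_acquire(self):
        """Record one query if the quota allows it; return False otherwise."""
        now = self.clock()
        self._stamps = [t for t in self._stamps if now - t < self.window_s]
        if len(self._stamps) >= self.limit:
            return False
        self._stamps.append(now)
        return True
```

A tool would call try_acquire() before each metadata fetch and sleep (or re-queue the job) whenever it returns False.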
Is there anything you need from us to possibly expedite communications on Google's side?

Varnent (Feb 14 2020):
Hello all, thank you for your comments and insights with respect to the recent suspension of the API access that the Video2Commons tool was using. We share your desire to address this issue. The Wikimedia Foundation's Partnerships team, which is responsible for maintaining our long-term relationships with entities like Google/Alphabet/YouTube, has met with our contacts at YouTube to discuss this issue. We have not worked out any specifics, but they are indeed interested in working with us on a resolution. As soon as we have more information and details to share, we will make them available here. We will continue to talk with our engineering teams about a long-term solution which will hopefully allow for a more streamlined way to upload Creative Commons-licensed videos to Commons. Thank you all for your interest and your patience while we work to inform our contacts at YouTube of the importance of this issue and explore a way forward. The Foundation respects the position many of you have taken, and agrees that any resolution with YouTube should capture both the spirit and the stated intent of Creative Commons licenses.

Lionel_Scheepmans (Feb 29 2020, edited):
@Varnent, the last time I was on YouTube I saw that they have removed the possibility of publishing videos under any free license other than CC0. YouTube and Google are certainly anticipating possible paid access to the Wikidata and Commons APIs; they read the Wikimedia 2020–2030 strategy like everyone else. I don't know if anyone shares my point of view, but I have the feeling that we are on the way to losing the vision of the free-software movement, and maybe the soul of the original WWW, of which the Wikimedia movement is one of the most significant survivors. The CC0 license (on Wikidata and Commons) is already a godsend for all the companies that hold a monopoly on users of online services: with CC0, they can use the work of Wikimedia project volunteers, without legal problems, to build new copyrighted services. The free-software movement and the Wikimedia movement are missing a tool to accomplish their mission: a CC-SA license. Only copyleft can ensure that a Wikimedia contributor will never have to use a Google account, and thus become the company's commercial product, in order to make effective use of the body of work that they and other digital workers have provided. Here is the missing license, retired in 2004 for lack of demand: https://creativecommons.org/licenses/sa/1.0/

Victor_Grigas (Apr 9 2020):
Just curious if there is an update here? No rush of course!

Don-vip added a project: Tool-spacemedia. — May 18 2020

Don-vip (May 18 2020):
I'm also facing this issue in my tool. It runs on Toolforge/k8s to detect new CC videos published by Arianespace using the YouTube API, and faces HTTP 429 errors too.

Mvolz (Jun 9 2020):
This is now affecting Citoid too; we're unable to get metadata from the page to cite YouTube videos in references :/ (T254700)

Chicocvenancio merged a task: T256672 (HTTP Error 429). — Jun 29 2020

bd808 removed Matanya as the assignee of this task (Oct 2 2020):
In T236446#5883909, @Varnent wrote: "As soon as we have more information and details to share, we will make them available here."
This message from @Yael-weissburg on T254700 (Citoid requests for YouTube metadata is giving 429: too many requests HTTP error) was intended for this bug report:

In T254700#6510988, @Yael-weissburg wrote: Hello all, Yael here, Director of Strategic Partnerships at the Foundation. First, I want to apologize for the painfully long time it has taken me, or anyone on my team, to follow up on @Varnent's previous message. At first, the reason for the delay was ongoing (initially hopeful, productive-seeming) conversations with YouTube. More recently, the reason was 2020 life getting in the way and me dropping the ball on communicating back to you all. Unfortunately, my update is not what I would have wanted to share. Despite ongoing conversations (involving folks from Partnerships, Product, and Legal at both organizations), we were not able to reach any resolution in our discussions with YouTube about this issue, and, unfortunately, I do not expect any changes from them in the future. I'm personally disappointed by this, as I feel we offered them a potential way to work closely with the movement and support our mission with little risk or downside to their business. Unfortunately, they have chosen not to prioritize this at the moment. I'm sorry I don't have happier news, and thank you to all of you who continued to try to find a solution. Again, my apologies that I haven't been more communicative about this (and thanks to @bd808 for continuing to nudge me). Please don't hesitate to reach out to me directly at yweissburg@wikimedia.org if you have any questions. Cheers, Yael

Parzeus (May 20 2021):
Hey, just passing by to say that this problem seems to be occurring yet again. I don't know where we stand on the relationship with YouTube, or whether there is some way to resolve this issue, but it had been working for the last few weeks, and since the start of this week the error has been occurring again.

Lionel_Scheepmans (Sep 16 2021):
I've just tried to use video2commons one more time, and it still doesn't work (see this screenshot). Just a question to @Yael-weissburg, Wikimedia Foundation staff, and the other participants in this topic.
Why is the Wikimedia Foundation implementing Wikimedia Enterprise, a commercial project to help big tech companies make easier, for-profit use of Wikimedia projects' content, when these same companies won't collaborate with us to grow our own content in a fair way, as with the video2commons tool? @Varnent, @Don-vip, @Mvolz, do you have any idea? Sometimes I wonder if the Foundation is aware of the dangerous game it plays with big tech companies...

LWyatt (Sep 17 2021):
@Lionel_Scheepmans, I am not sure what relevance the Meta RfC that you opened (and which was recently closed by community consensus) about the existence of the Wikimedia Enterprise project has to this discussion.
"Sometimes I wonder if the foundation is aware of the dangerous game it plays with big tech companies..." — this argument appears to be a fallacy, connecting this specific technical topic to a separate topic which has its own independent [valid] considerations. It is a truism that the companies you're referring to are already using Wikimedia projects' content for commercial profit, and that they have that right under our free licenses. Furthermore, they will continue to do so independently of whether an API built for the speed/volume needs of commercial organisations is created. The argument that Wikimedia should not work with "big tech companies" is moot, since they are already using our content and we (as this Phabricator ticket evidences) already want to use theirs. The outcome of their paying for the [optional] Enterprise API would be that Wikimedia would no longer be financially subsidising "big tech's" business model; instead, they would be financially supporting our movement. But these are quasi-philosophical issues independent of the specific technical concern of this ticket. As you have previously raised your concerns about Enterprise on the project's Meta talk page (and in the aforementioned RfC), I invite you to add any new comments to those threads.

Yael-weissburg (Sep 17 2021):
Hi all, unfortunately, as I noted in an email to Jos Damen on October 19th when he flagged this for me again, there's not much we can do here from a leverage perspective. I'm copying my email to Jos below. I'll let @LWyatt's comments on Enterprise stand. Let me know if you have any questions, and I'm sorry this isn't something that we can change; I really wish we could. Best, Yael

"Hi Jos, thanks for reaching out. Unfortunately, as you have noted, YouTube has indicated that they won't work with us to make an exception for Video2Commons. As I noted on the Phab ticket, this was deeply disappointing to me, and I think they made the wrong call (both ethically and from a business/PR perspective). In consultation with our legal counsel, we have made clear to YouTube that if the community chooses to fight this battle, we will not support YouTube's position. I'm not sure what effect your public advocacy will have, to be honest, but as the person responsible for managing the overall relationship with YouTube from WMF, I support your efforts. I would have used this as another prompt to go back to YouTube and make the case, but having done so three times I realize that the org-to-org negotiation is not going to bear fruit. Happy to connect further if you'd find it useful. Best, Yael"

Lionel_Scheepmans (Dec 8 2021):
@LWyatt, you're right: the unfair attitude of Google is not a technical issue, and the debate has to take place outside Phabricator. Also, I've spent 30 minutes talking with Lane. It was very interesting and reassuring about the spirit of the Wikimedia Enterprise project. Keep in mind that I'm not against this project, just worried about the movement entering the world of commercial dealings, which can sometimes be unfair, as we see with the YouTube API, and is certainly far away from the wiki philosophy. Thanks, @Yael-weissburg, for your feedback. I'll see later how I can continue, maybe with an article somewhere that could touch Google's public image. Thanks to both of you!

fnegri moved this task from Kanban to Watching on the cloud-services-team board. — Jan 18 2023

Chicocvenancio (Dec 30 2023):
FYI, Google has started to block downloads from video2commons again, since around 2023-12-29 22:30.

Yann (Dec 31 2023):
In T236446#9428102, @Chicocvenancio wrote: "FYI google has started to block downloads from video2commons again since around 2023-12-29 22:30."
FYI, trying to download https://www.youtube.com/watch?v=ecQWZWpwZVw locally, Youtube Downloader HD (https://www.youtubedownloaderhd.com/download.html) doesn't work, but yt-dlp (https://github.com/yt-dlp/yt-dlp) does. Does YT block V2C by IP or by software ID, or both, or something else?

Chicocvenancio (Jan 2 2024):
In T236446#9428663, @Yann wrote: "Does YT block V2C by IP or by software ID, or both, or something else?"
It is not entirely clear. Passing good YouTube auth cookies in Cloud VPS does seem to work, so it is not a full block of the IP right now, but it is likely that an account would be restricted as well if we started using one.

Yann (Feb 28 2024):
It seems transfer from YT works again, e.g. today: https://commons.wikimedia.org/wiki/File:Fake_News_in_the_18th_Century_-_Collection_in_Focus_-_British_Library.webm
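For reference, the cookie-based workaround mentioned above maps onto a small option change in youtube-dl/yt-dlp. A sketch (cookiefile is the real option name in both projects; the helper itself is hypothetical):

```python
from typing import Optional

def ytdl_opts(cookie_jar: Optional[str] = None) -> dict:
    """Build a youtube-dl/yt-dlp options dict, optionally authenticating
    with exported browser cookies (a Netscape-format cookies.txt file).

    Per the caveat above, an account used this way may itself end up
    restricted, so the anonymous path stays the default.
    """
    opts = {"quiet": True, "skip_download": True}  # metadata-only probe
    if cookie_jar is not None:
        opts["cookiefile"] = cookie_jar
    return opts
```

The resulting dict would be passed to YoutubeDL(opts); whether YouTube keys its blocks on IP, client fingerprint, or account remains the open question in this task.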
I've been having a consistent problem with video2commons today:
"Error: An exception occurred: DownloadError: ERROR: bFbKgtZM9As: YouTube said: Unable to extract video data"
Doesn't seem to matter which video it is, if it's a cc-licensed video or a public domain one.
By coincidence I (using Faebot) have been trying to run my CDC videos uploads from labs. The standard use of youtube-dl works directly from a terminal session, but when run on the grid engine I start getting
WARNING: unable to download video info webpage: HTTP Error 429: Too Many Requests
or the fatal (the youtube id is just a real example)
youtube_dl.utils.DownloadError: ERROR: fWET2kNwdn8: YouTube said: Unable to extract video data
The same 'DownloadError' can mean that the video is blocked in that region, or removed as a copyvio, but that is not the case for the CDC.
The 'Too Many Requests' might be a combination of the specific WMF IP address plus the rapid querying of several playlists. However that's a bit odd considering that the code does work when not on the grid, unless the problem is that IP addresses used by the grid host are getting blocked by YouTube/Google while the IP addresses used via live sessions are not.
Note that I'm continuing to try from a command line, but as the recoding (mp4/mkv to webm) may take >12 hours for some videos, that's means I'm locked out of running a terminal on labs while the project runs, plus it's against the guidelines of how labs is supposed to be used by us volunteers...
Anyone interested in checking the specific Python code can find it on /mnt/nfs/labstore-secondary-tools-project/faebot/pywikibot-core/scripts/Youtube_CDC2.py
In T236446#5611391, @Fae wrote: However that's a bit odd considering that the code does work when not on the grid, unless the problem is that IP addresses used by the grid host are getting blocked by YouTube/Google while the IP addresses used via live sessions are not.
However that's a bit odd considering that the code does work when not on the grid, unless the problem is that IP addresses used by the grid host are getting blocked by YouTube/Google while the IP addresses used via live sessions are not.
Bastions have floating public IPs so it could open port 22 to the public and you could ssh in directly without a jump host. Grid exec nodes are behind a cloud-wide NAT and share a single public IP.
@Fae .... Try running it in one of the kubernetes python shell
webservice --backend=kubernetes python shell ~/.virtualenvs/cdc/bin/python ~/pywikibot-core/pwb.py Youtube_CDC_remote
@Phamhi good suggestion. Have not managed to get it to work so far. The Python script drops out without warning, even though I guess in theory the shell should behave in an identical way.
In T236446#5614742, @Phamhi wrote: @Fae .... Try running it in one of the kubernetes python shell
v2c runs from k8s and receives the same message.
The ideal solution is obviously getting the Cloud VPS NAT IP a higher quota upstream with YouTube, but maybe we can find a way to get some things working in advance of that.
@zhuyifei1999 Does v2c typically do the downloads on Toolforge, or are the instances in the video Cloud VPS project actually doing that work? If it is the latter, we could try a temporary solution of adding public IPv4 addresses to the video instances to spread across more IPs which would hopefully give a larger quota from YouTube.
In T236446#5627250, @bd808 wrote: @zhuyifei1999 Does v2c typically do the downloads on Toolforge, or are the instances in the video Cloud VPS project actually doing that work? If it is the latter, we could try a temporary solution of adding public IPv4 addresses to the video instances to spread across more IPs which would hopefully give a larger quota from YouTube.
Toolforge instances (k8s pods) fetch metadata, the encoding cluster does both metadata fetching and actual downloading. The fetch metadata part already hit Error 429.
Looks like the rate limit is currently lifted :)
It seems impossible for me to use WMF cloud services to do the CDC video recoding. I have reverted to running an old mac mini as a headless server, which itself has experienced the YouTube "too many requests" problem, but my understanding is that this gets lifted after a day or two anyway.
If someone can explain how I can legitimately run an FFmpeg recoding job on webgrid and save the files on a WMF server, that would be useful. But this experience, including getting warnings about my work, has seriously discouraged me from relying on WMF cloud services in the future, primarily because of the massive waste of precious volunteer time it takes to keep testing and rewriting code to fit the ever-changing, non-specific, and hard-to-understand "requirements" of this environment, compared to simply hosting a script on my own ancient kit.
In T236446#5634970, @Fae wrote: If someone can explain how I can legitimately run an FFmpeg recoding job on webgrid and save the files on a WMF server, that would be useful. But this experience, including getting warnings about my work, has seriously discouraged me from relying on WMF cloud services in the future, primarily because of the massive waste of precious volunteer time it takes to keep testing and rewriting code to fit the ever-changing, non-specific, and hard-to-understand "requirements" of this environment, compared to simply hosting a script on my own ancient kit.
Can you use video2commons? I don't know what CDC does, but I'd assume that is the download + transcode + upload, which is the same as what v2c does in the backend, which is on special-purpose instances, unlike toolforge's generic grid.
In T236446#5637051, @zhuyifei1999 wrote: Can you use video2commons? I don't know what CDC does, but I'd assume that is the download + transcode + upload, which is the same as what v2c does in the backend, which is on special-purpose instances, unlike toolforge's generic grid.
Looks like the rate limit is in effect again.
I'll bring this up with the WMF partner folks.
My experience running locally is that the YouTube IP block lasts around 2½ days. I can queue my processing, and let my programme keep testing the connection every few hours, but it's not reasonable for the average Commons user to see nothing happening for that long.
Update: It turns out that YouTube has escalating IP blocks. The second time around for the CDC uploads was a 5 day block, so it's fair to presume that any IP will rapidly become unusable. Either someone can work out what the maximum bandwidth/access thresholds are for pulling video information so any tool can stay just under that, or if YouTube is incommunicado, we have to consider the site hostile for any tool that can do this task.
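With escalating blocks like this, the least a client can do is back off hard on a 429 instead of continuing to hammer the shared IP. A minimal sketch of that idea in Python — the `RateLimitedError` name, the `fetch` callable, and the retry parameters are all illustrative, not part of youtube-dl or v2c:

```python
import time


class RateLimitedError(Exception):
    """Raised when the remote end answers HTTP 429: Too Many Requests."""


def fetch_with_backoff(fetch, retries=5, base_delay=60, sleep=time.sleep):
    """Call fetch(), retrying on RateLimitedError with exponentially
    growing waits (60 s, 120 s, 240 s, ...). Re-raises the error once
    `retries` attempts have been exhausted."""
    for attempt in range(retries):
        try:
            return fetch()
        except RateLimitedError:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

This does nothing against a multi-day IP ban that is already in place, but it keeps a queued job from turning a short throttle into a longer one.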
Is anyone working on this?
In T236446#5722331, @Victor_Grigas wrote: Is anyone working on this?
Task is assigned to @zhuyifei1999 so he should respond :)
I was informed by @Matanya last Wednesday that Google will respond to us in 1-2 weeks.
They think it's a ToS violation... so... this gotta be difficult.
Damn, maybe get WMF legal to talk to them? I don't know how you can have cc-licensed videos on a site and think it's a ToS violation for them to be downloaded. Sounds like a knee-jerk reaction (treating it as if it were actual copyright infringement) to me.
In the meantime, it would be really useful to find a definition of the throttle limits for the service. If we are given a service level guide, like "20 video information queries in an hour", then at least it may be possible to manage our own queue and avoid IP blocks if we stay within it, or reliably farm out the queue if that is an acceptable practice.
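If YouTube ever published a service-level figure like that, staying under it client-side would be straightforward. A minimal sketch of a rolling-window limiter, assuming a hypothetical quota of 20 metadata queries per hour (the real thresholds are unknown):

```python
import time
from collections import deque


class RateLimiter:
    """Block in acquire() so that at most `max_requests` calls happen
    within any rolling window of `period` seconds."""

    def __init__(self, max_requests, period, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.period = period
        self.clock = clock    # injectable for testing
        self.sleep = sleep    # injectable for testing
        self.calls = deque()  # timestamps of recent requests

    def acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_requests:
            # Wait until the oldest call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
        self.calls.append(now)


# Hypothetical quota: 20 video-information queries per hour.
limiter = RateLimiter(max_requests=20, period=3600)
```

Each tool would call `limiter.acquire()` before every query; the catch, of course, is that the quota applies to the shared NAT IP, so a per-process limiter only helps if every tool behind that IP cooperates.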
Google isn't responding (they probably don't have the incentive to), so I'm gonna wait a few more days. If it stays like this, I'm gonna do a massive overhaul of how v2c downloads from YouTube. Sneak peek: slimerjs + x11vnc
In T236446#5750583, @zhuyifei1999 wrote: Google isn't responding (they probably don't have the incentive to), so I'm gonna wait a few more days. If it stays like this, I'm gonna do a massive overhaul of how v2c downloads from YouTube. Sneak peek: slimerjs + x11vnc
Need help with this, @zhuyifei1999? @Sturm called my attention to this task yesterday and indeed it seems like a very significant disruption that Google is imposing on us here.
@Xinbenlv is there anything you can do to help us here? I want to avoid using technical measures to make it seem as though our traffic comes from other sources, but I am starting to consider that solution as a viable option. Will that be a necessary step here? Is there anything you need from us to possibly expedite communications on Google's side?
Hello all,
Thank you for your comments and insights with respect to the recent suspension of API access that the Video2Commons tool was using. We share your desire to address this issue. The Wikimedia Foundation's Partnerships team, which is responsible for maintaining our long-term relationships with entities like Google/Alphabet/YouTube, has met with our contacts at YouTube to discuss this issue. We have not worked out any specifics, but they are indeed interested in working with us on a resolution.
As soon as we have more information and details to share, we will make them available here. We will continue to talk with our engineering teams on a long-term solution which will hopefully allow for a more streamlined way to upload Creative Commons license videos to Commons.
Thank you all for your interest and your patience while we work to inform our contacts at YouTube of the importance of this issue and explore a way forward. The Foundation respects the position many of you have taken, and agrees that any resolution with YouTube should capture both the spirit and stated intent of Creative Commons licenses.
@Varnent,
The last time I was on YouTube I saw that they have removed the possibility of publishing videos under any free license other than CC0. YouTube and Google are certainly anticipating possible paid access to the Wikidata and Commons APIs. They read the Wikimedia 2020-2030 strategy like everyone else...
I don't know if anyone shares my point of view, but I have the feeling that we're on the way to losing the vision of the free software movement, and maybe the soul of the initial WWW, of which the Wikimedia movement is one of the most significant survivors... The CC0 license (on Wikidata and Commons) is already a godsend for all the companies that already have a monopoly on the users of online services. With CC0, they can use the work of Wikimedia projects' volunteers, without legal problems, to make new copyrighted services.
The free software movement and the Wikimedia movement lack a tool to accomplish their mission: a CC-SA license. Only copyleft can ensure that a Wikimedia contributor will never have to use a Google account, and thus become the company's commercial product, in order to effectively use the amount of work that they and other digital workers have provided.
Here is the missing license, retired in 2004 for lack of demand: https://creativecommons.org/licenses/sa/1.0/
Just curious if there is an update here? No rush of course!
I'm also facing this issue in my tool. It runs on Toolforge/k8s to detect new CC videos published by Arianespace, using YouTube API, and faces HTTP 429 errors too.
This is now affecting citoid too, we're unable to get metadata from the page to cite youtube videos in references :/ T254700
In T236446#5883909, @Varnent wrote: As soon as we have more information and details to share, we will make them available here. We will continue to talk with our engineering teams on a long-term solution which will hopefully allow for a more streamlined way to upload Creative Commons license videos to Commons.
This message from @Yael-weissburg on T254700: Citoid requests for YouTube metadata is giving 429: too many requests HTTP error was intended for this bug report:
In T254700#6510988, @Yael-weissburg wrote:
Hello All,
Yael here, Director of Strategic Partnerships at the Foundation.
First, I want to apologize for the painfully long time it has taken me or anyone on my team to follow up from @Varnent's previous message. At first, the reason for the delay was ongoing (initially hopeful, productive-seeming) conversations with YouTube. More recently, the reason was... 2020 life getting in the way and me dropping the ball on communicating back to you all.
Unfortunately, my update is not what I would have wanted to share. Despite ongoing conversations (involving folks from Partnerships, Product and Legal from both organizations), we were not able to reach any resolution in our discussions with YouTube about this issue, and, unfortunately, I do not expect any changes from them coming in the future.
I'm personally disappointed about this, as I feel we offered them a potential way to work closely with the movement and support our mission with little risk or downside to their business. Unfortunately, they have chosen not to prioritize this at the moment.
I'm sorry I don't have happier news, and thank you to all of you who continued to try to find a solution. Again, my apologies that I haven't been more communicative about this (and thanks to @bd808 for continuing to nudge me).
Please don't hesitate to reach out to me directly at yweissburg@wikimedia.org if you have any questions.
Cheers,
Yael
Hey, just passing by to say that this problem seems to be occurring yet again. I don't know where we stand on the relationship with YouTube or whether there is some way to resolve this issue, but it had been working for the last few weeks, and since the start of this week the error started occurring again.
I've just tried to use video2commons one more time, and it still doesn't work... (see this screenshot)
Just a question to @Yael-weissburg, Wikimedia Foundation staff, and the other participants in this topic: why is the Wikimedia Foundation implementing Wikimedia Enterprise, a commercial project to help big tech companies use Wikimedia projects' content more easily for profit, when these same companies won't collaborate with us to grow our own content in a fair way through tools like video2commons?
@Varnent, @Don-vip, @Mvolz, do you have an idea? Sometimes I wonder if the Foundation is aware of the dangerous game it plays with big tech companies...
@Lionel_Scheepmans I am not sure what relevance the Meta RfC that you opened (and which was recently closed by community consensus) about the existence of the Wikimedia Enterprise project has to this discussion.
Sometimes I wonder if the foundation is aware of the dangerous game it plays with big tech companies...
This argument appears to be a fallacy - connecting this specific technical topic to a separate topic which has independent [valid] considerations. It is a truism that the companies you're referring to are already using Wikimedia projects' content for commercial profit, and that they have that right under our free licenses. Furthermore, they will continue to do so, independently of whether an API built for the speed/volume needs of commercial organisations is created.
The argument that Wikimedia should not work with 'big tech companies' is moot - since they are already using our content and we (as is evidenced by this Phab. ticket) are already wanting to use theirs. The outcome of their paying for the [optional] Enterprise API would be that Wikimedia would no longer be financially subsidizing "big tech's" business model - instead, they would be financially supporting our movement.
But these are quasi-philosophical issues independent of the specific technical concern of this ticket. As you have previously raised your concerns about Enterprise on the project's Meta talk page (and in the aforementioned RfC), I invite you to add any new comments to those threads.
Hi All,
Unfortunately, as I noted in an email to Jos Damen on October 19th when he flagged this for me again, there's nothing much we can do here from a leverage perspective. I'm copying my email to Jos below. I'll let @LWyatt's comments on Enterprise stand.
Let me know if you have any questions, and I'm sorry this isn't something that we can change - I really wish we could.
Best,
Hi Jos,
Thanks for reaching out. Unfortunately, as you have noted, YouTube has indicated that they won't work with us to make an exception for Video2Commons. As I noted on the Phab ticket, this was deeply disappointing to me, and I think they made the wrong call (both ethically and from a business / PR perspective). In consultation with our legal counsel, we have made clear to YouTube that if the community chooses to fight this battle, we will not support YouTube's position.
I'm not sure what effect your public advocacy will have, to be honest, but as the person responsible for managing the overall relationship with YouTube from WMF, I support your efforts. I would have used this as another prompt to go back to YouTube and make the case, but having done so three times I realize that the org-to-org negotiation is not going to bear fruit.
Happy to connect further if you'd find it useful.
@LWyatt you're right. The unfair attitude of Google is not a technical issue, and the debate has to take place outside Phabricator. Also, I've spent 30 minutes talking with Lane. It was very interesting and reassuring about the spirit of the Wikimedia Enterprise project. Keep in mind that I'm not against this project, just worried about the movement entering the world of commercial affairs, which can be unfair at times, as we see with the YouTube API, and which is certainly far away from the wiki philosophy.
Thanks, @Yael-weissburg, for your feedback. Let's see later how I can continue, maybe with an article somewhere that could affect Google's public image. Thanks to both of you!
FYI google has started to block downloads from video2commons again since around 2023-12-29 22:30.
In T236446#9428102, @Chicocvenancio wrote: FYI google has started to block downloads from video2commons again since around 2023-12-29 22:30.
FYI trying to download https://www.youtube.com/watch?v=ecQWZWpwZVw locally, Youtube Downloader HD ( https://www.youtubedownloaderhd.com/download.html ) doesn't work, but yt-dlp ( https://github.com/yt-dlp/yt-dlp ) does work. Does YT block V2C by IP or by software ID, or both, or something else?
In T236446#9428663, @Yann wrote: FYI trying to download https://www.youtube.com/watch?v=ecQWZWpwZVw locally, Youtube Downloader HD ( https://www.youtubedownloaderhd.com/download.html ) doesn't work, but yt-dlp ( https://github.com/yt-dlp/yt-dlp ) does work. Does YT block V2C by IP or by software ID, or both, or something else?
It is not entirely clear. Passing good YouTube auth cookies in Cloud VPS does seem to work, so it is not a full block of the IP right now, but it is likely that an account would be restricted as well if we started using one.
It seems transfer from YT works again, i.e. today https://commons.wikimedia.org/wiki/File:Fake_News_in_the_18th_Century_-_Collection_in_Focus_-_British_Library.webm