Temporary Disabled. :) please Go back ⚓ T299417 Normalize templatelinks table www.fgks.org » Address: [go: up one dir, main page] Include Form Remove Scripts Accept Cookies Show Images Show Referer Rotate13 Base64 Strip Meta Strip Title Session Cookies Page MenuHomePhabricatorSearchConfigure Global SearchLog InCreate Task Maniphest T299417 Normalize templatelinks tableClosed, ResolvedPublicActionsEdit TaskEdit Related Tasks...Create SubtaskEdit Parent TasksEdit SubtasksMerge Duplicates InClose As DuplicateEdit Related Objects...Edit CommitsEdit MocksSubscribeMute NotificationsProtect as security issueAward TokenFlag For LaterAssigned ToLadsgroupAuthored ByLadsgroupJan 18 2022, 3:58 PM2022-01-18 15:58:32 (UTC+0)TagsDBA (Done)Performance-Team (Radar) (Watching)MW-1.39-notesMW-1.40-notes (1.40.0-wmf.2; 2022-09-19)User-notice-archive (Backlog)Patch-For-ReviewMW-1.43-notes (1.43.0-wmf.8; 2024-06-04)Referenced FilesNoneSubscribersAklapperAntiCompositeNumberArielGlennBrandonXLFCrypticdoctaxonIKhitronView All 22 SubscribersDescriptionAfter T299416: Normalize link tables: Create linktarget table is done. To learn how to update your queries, see T299417#7814637.DetailsSubjectRepoBranchLines +/-Remove TemplateLinksSchemaMigrationStage configmediawiki/coremaster+36 -88 Introduce pruneUnusedLinkTargetRows maint scriptmediawiki/coremaster+104 -0Introduce pruneUnusedLinkTargetRows maint scriptmediawiki/coreREL1_39+104 -0schema: Drop tl_title and tl_namespace fields from templatelinksmediawiki/coreREL1_39+172 -69 schema: Drop tl_title and tl_namespace fields from templatelinksmediawiki/coremaster+172 -69Customize query in gerritRelated ObjectsSearch...Task GraphMentionsStatusSubtypeAssignedTask OpenNoneT300222 Implement normalizing MediaWiki link tables ResolvedLadsgroupT299417 Normalize templatelinks table ResolvedLadsgroupT299418 Add tl_target_id as foreign key to lt_id to templatelinks ResolvedLadsgroupT299419 Write code for enabling compat writes for templatelinks and title ResolvedLadsgroupT299420 Turn on write both in beta cluster for templatelinks normalization ResolvedLadsgroupT299421 Turn on write both in production for templatelinks normalization ResolvedLadsgroupT299423 Write maintenance script for backfilling tl_target_id ResolvedLadsgroupT299424 Run maintenance script backfilling tl_target_id Resolved• MarosteguiT300775 Add tl_target_id column to templatelinks Resolved• MarosteguiT301313 db2076, db2074 and db1123 crashed while altering templatelinks table Resolved• MarosteguiT301848 Check for compressed templatelinks tables Resolved• MarosteguiT301850 Switchover s3 master (db1157 -> db1123) Resolved• MarosteguiT302222 Check and fix compressed mismatched tables ResolvedLadsgroupT304780 Write code for enabling compat read for templatelinks and linktarget ResolvedLadsgroupT305064 Make linktarget table visible on cloud wiki replicas ResolvedLadsgroupT306673 Turn on read new for templatelinks on beta and production ResolvedLadsgroupT306674 Add support for write new for templatelinks migration ResolvedLadsgroupT308207 ApiQueryInfo::getProtectionInfo is slow on normalized templatelinks ResolvedLadsgroupT312863 Schema change to change primary key of templatelinks Resolved• MarosteguiT317614 Switchover codfw s1 master (db2103 -> db2112) ResolvedLadsgroupT312865 Turn off writing to the old columns of templatelinks in beta and production ResolvedLadsgroupT314041 Drop old templatelinks columns and indexes ResolvedLadsgroupT314711 Add support for links migration to namespaceDupes.phpMentioned In T300222: Implement normalizing MediaWiki link tablesT329842: Some jobs in refreshLinksPrioritized seems to repeat themselves for everT330382: Upgrading from 1.35.8 to 1.39.2 using PostgreSQL fails, index "tl_namespace" does not existT318823: Update tools following templatelinks normalizationR2060:703408bd7891: Fix queries to use linktargetT317258: Replica templatelinks table is broken for some sitesT317172: Can't see the statistics of wikipedia page editsT316297: Prepare cleanupInvalidDbKeys.php for new linktarget tableT312666: Remove duplication in externallinks tableT38316: Set "Add pages I edit to my watchlist" and "Add pages I create to my watchlist" to true by default on Wikimedia wikis (only for new users)T297633: <Tech Initiative> Improving Databases in MediaWikiT299947: Normalize pagelinks table Mentioned Here T320314: Update query for finding categories on Commons with Wikidata links but no Wikidata InfoboxT315063: Include linktarget data in public dumpsT299416: Normalize link tables: Create linktarget table Event TimelineThere are a very large number of changes, so older changes are hidden. Show Older ChangesRhinosF1 subscribed.Mar 26 2022, 4:31 PM2022-03-26 16:31:52 (UTC+0)Ladsgroup added a comment.Mar 27 2022, 2:56 AM2022-03-27 02:56:53 (UTC+0)Comment ActionsSteps that don't have a subticket yet: Turn on read new in beta cluster Turn on read new in production Schema change to make tl_namespace and tl_title nullable Deploying the schema change Write code stopping to write on the old columns via config Turn off writing to the old columns in beta and then production Schema change to drop old columns Deploying the schema change AntiCompositeNumber subscribed.Mar 29 2022, 1:38 PM2022-03-29 13:38:55 (UTC+0)IKhitron subscribed.Edited · Mar 29 2022, 1:48 PM2022-03-29 13:48:42 (UTC+0)Comment ActionsCould you ellaborate, please, how will the new scheme interpret the red templates transclusion, so that quarry could recognize them?MusikAnimal subscribed.Mar 29 2022, 2:02 PM2022-03-29 14:02:20 (UTC+0)Ladsgroup added a comment.Mar 29 2022, 2:18 PM2022-03-29 14:18:21 (UTC+0)Comment ActionsI'm not 100% sure I understood your usecase but if a query to templatelinks used to look like this: SELECT tl_from, tl_namespace, tl_title from templatelinks where tl_from = 5974403; It will change to: SELECT tl_from, lt_namespace, lt_title from templatelinks join linktarget on lt_target_id = lt_id where tl_from = 5974403; It seems the new table has not been replicated to the cloud replicas (I'll check and fix it soon). If this doesn't answer your question, can you elaborate more? possibly giving the query you run usually.Superyetkin subscribed.Mar 29 2022, 5:03 PM2022-03-29 17:03:28 (UTC+0)IKhitron added a comment.Edited · Mar 29 2022, 5:38 PM2022-03-29 17:38:46 (UTC+0)Comment Actions In T299417#7814637, @Ladsgroup wrote: I'm not 100% sure I understood your usecase but if a query to templatelinks used to look like this: SELECT tl_from, tl_namespace, tl_title from templatelinks where tl_from = 5974403; It will change to: SELECT tl_from, lt_namespace, lt_title from templatelinks join linktarget on lt_target_id = lt_id where tl_from = 5974403; It seems the new table has not been replicated to the cloud replicas (I'll check and fix it soon). If this doesn't answer your question, can you elaborate more? possibly giving the query you run usually. Thanks. It doesn't. What happens when the template does not exist? (For example, here.)Mormegil subscribed.Mar 30 2022, 9:34 AM2022-03-30 09:34:41 (UTC+0)Ladsgroup added a comment.Mar 30 2022, 2:08 PM2022-03-30 14:08:14 (UTC+0)Comment ActionsNothing will change when it comes to existence of a template. For example, you have this query: select page_namespace, page_title, tl_title from page inner join templatelinks where page_namespace in (0, 4, 6, 8, 10, 12, 14, 100) and page_id = tl_from and tl_namespace != 8 and not ((page_namespace = 100 and (tl_namespace <> 10 or tl_title like "הידעת_מדינות_%")) or (page_namespace = 8 and page_title like "%.js") or (page_namespace = 4 and (page_title like "%OTRS%" or page_title like "%בקשות_לבדיקה%" or page_title like "%דלפק_ייעוץ%" or page_title like "%הכה_את_המומחה%" or page_title like "%זכויות_יוצרים%" or page_title like "%יומן_מחיקות%" or page_title like "%סדנה_לגרפיקה%" or (page_title like "%המלצות_קודמות%" and not page_title in ("ערכים_מומלצים/המלצות_קודמות/אפריל_2016", "ערכים_מומלצים/המלצות_קודמות/מרץ_2016", "ערכים_מומלצים/המלצות_קודמות/מאי_2016")) or page_title like "%אולם_דיונים%" or page_title like "%עבודות_ויקידמיות%" or page_title like "%ארכיון%" or page_title like "%סקר_ויקיפדיה%" or page_title like "%איך_להקים_אסם%" or page_title like "%מיזמי_ויקיפדיה/נבחרי_ציבור_פוטנציאליים%" or page_title like "%הומור%" or page_title like "%רשימת%" or page_title like "%חשבון_נפש%" or page_title like "%סופשבוע_ללא_ערכים%" or page_title like "%ויקימניה%" or page_title like "%תחרות_צילום%" or page_title like "%מתקפת_איכות%" or page_title like "%מזנון%" or page_title like "%מועדונים%" or page_title like "%מפגשים%" or page_title like "%תבנית%" or page_title like "%שער_לילדים%" or page_title like "%שיתופי_פעולה%" or page_title like "%מיזמי_ויקיפדיה%" or page_title like "%בוט_החלפות%" or page_title like "הידעת?/סדרה__/%" or page_title in ("הידעת?/2018/אוגוסט", "חדשות", "הידעת?/2018/ספטמבר", "הידעת?/ממרץ_2011", "ארגז_חול", "הידעת?", "כיכר_העיר", "הידעת?/המתנה") or page_title = "ויקימדיה_ישראל/תמונה_נבחרת/גלריה" and page_namespace = 4 and tl_namespace = 4 and tl_title like "ויקימדיה_ישראל/תמונה_נבחרת/%")) or exists (select * from page where page_namespace = tl_namespace and page_title = tl_title)) In order to make the change easier, move the join condition to ON clause instead of WHERE: select page_namespace, page_title, tl_title from page inner join templatelinks ON page_id = tl_from where page_namespace in (0, 4, 6, 8, 10, 12, 14, 100) and tl_namespace != 8 and not ((page_namespace = 100 and (tl_namespace <> 10 or tl_title like "הידעת_מדינות_%")) or (page_namespace = 8 and page_title like "%.js") or (page_namespace = 4 and (page_title like "%OTRS%" or page_title like "%בקשות_לבדיקה%" or page_title like "%דלפק_ייעוץ%" or page_title like "%הכה_את_המומחה%" or page_title like "%זכויות_יוצרים%" or page_title like "%יומן_מחיקות%" or page_title like "%סדנה_לגרפיקה%" or (page_title like "%המלצות_קודמות%" and not page_title in ("ערכים_מומלצים/המלצות_קודמות/אפריל_2016", "ערכים_מומלצים/המלצות_קודמות/מרץ_2016", "ערכים_מומלצים/המלצות_קודמות/מאי_2016")) or page_title like "%אולם_דיונים%" or page_title like "%עבודות_ויקידמיות%" or page_title like "%ארכיון%" or page_title like "%סקר_ויקיפדיה%" or page_title like "%איך_להקים_אסם%" or page_title like "%מיזמי_ויקיפדיה/נבחרי_ציבור_פוטנציאליים%" or page_title like "%הומור%" or page_title like "%רשימת%" or page_title like "%חשבון_נפש%" or page_title like "%סופשבוע_ללא_ערכים%" or page_title like "%ויקימניה%" or page_title like "%תחרות_צילום%" or page_title like "%מתקפת_איכות%" or page_title like "%מזנון%" or page_title like "%מועדונים%" or page_title like "%מפגשים%" or page_title like "%תבנית%" or page_title like "%שער_לילדים%" or page_title like "%שיתופי_פעולה%" or page_title like "%מיזמי_ויקיפדיה%" or page_title like "%בוט_החלפות%" or page_title like "הידעת?/סדרה__/%" or page_title in ("הידעת?/2018/אוגוסט", "חדשות", "הידעת?/2018/ספטמבר", "הידעת?/ממרץ_2011", "ארגז_חול", "הידעת?", "כיכר_העיר", "הידעת?/המתנה") or page_title = "ויקימדיה_ישראל/תמונה_נבחרת/גלריה" and page_namespace = 4 and tl_namespace = 4 and tl_title like "ויקימדיה_ישראל/תמונה_נבחרת/%")) or exists (select * from page where page_namespace = tl_namespace and page_title = tl_title)) Note that actually the last part of your query (the subquery) is wrong as it doesn't add templatelinks. And then change things: select page_namespace, page_title, lt_title from page inner join templatelinks ON page_id = tl_from join linktarget on tl_target_id = lt_id where page_namespace in (0, 4, 6, 8, 10, 12, 14, 100) and lt_namespace != 8 and not ((page_namespace = 100 and (lt_namespace <> 10 or lt_title like "הידעת_מדינות_%")) or (page_namespace = 8 and page_title like "%.js") or (page_namespace = 4 and (page_title like "%OTRS%" or page_title like "%בקשות_לבדיקה%" or page_title like "%דלפק_ייעוץ%" or page_title like "%הכה_את_המומחה%" or page_title like "%זכויות_יוצרים%" or page_title like "%יומן_מחיקות%" or page_title like "%סדנה_לגרפיקה%" or (page_title like "%המלצות_קודמות%" and not page_title in ("ערכים_מומלצים/המלצות_קודמות/אפריל_2016", "ערכים_מומלצים/המלצות_קודמות/מרץ_2016", "ערכים_מומלצים/המלצות_קודמות/מאי_2016")) or page_title like "%אולם_דיונים%" or page_title like "%עבודות_ויקידמיות%" or page_title like "%ארכיון%" or page_title like "%סקר_ויקיפדיה%" or page_title like "%איך_להקים_אסם%" or page_title like "%מיזמי_ויקיפדיה/נבחרי_ציבור_פוטנציאליים%" or page_title like "%הומור%" or page_title like "%רשימת%" or page_title like "%חשבון_נפש%" or page_title like "%סופשבוע_ללא_ערכים%" or page_title like "%ויקימניה%" or page_title like "%תחרות_צילום%" or page_title like "%מתקפת_איכות%" or page_title like "%מזנון%" or page_title like "%מועדונים%" or page_title like "%מפגשים%" or page_title like "%תבנית%" or page_title like "%שער_לילדים%" or page_title like "%שיתופי_פעולה%" or page_title like "%מיזמי_ויקיפדיה%" or page_title like "%בוט_החלפות%" or page_title like "הידעת?/סדרה__/%" or page_title in ("הידעת?/2018/אוגוסט", "חדשות", "הידעת?/2018/ספטמבר", "הידעת?/ממרץ_2011", "ארגז_חול", "הידעת?", "כיכר_העיר", "הידעת?/המתנה") or page_title = "ויקימדיה_ישראל/תמונה_נבחרת/גלריה" and page_namespace = 4 and lt_namespace = 4 and lt_title like "ויקימדיה_ישראל/תמונה_נבחרת/%")) or exists (select * from page join linktarget where page_namespace = lt_namespace and page_title = lt_title))IKhitron added a comment.Mar 30 2022, 2:12 PM2022-03-30 14:12:46 (UTC+0)Comment ActionsHow can it work if linktarget does not have any data about non-existing templates? What will be the id in templatelinks and where will be the tempkate name stored?Ladsgroup added a comment.Mar 30 2022, 2:15 PM2022-03-30 14:15:35 (UTC+0)Comment ActionsHow can it work if linktarget does not have any data about non-existing templates? That's not true. It doesn't care if the page exist or not and store it anyway, it can exist or not. That's exactly one of two reasons we created a new table instead of just using page_id.IKhitron added a comment.Mar 30 2022, 2:17 PM2022-03-30 14:17:52 (UTC+0)Comment ActionsI see. Thanks for your help.Zabe subscribed.Apr 2 2022, 7:11 PM2022-04-02 19:11:03 (UTC+0)Ladsgroup closed subtask T304780: Write code for enabling compat read for templatelinks and linktarget as Resolved.Apr 25 2022, 5:16 PM2022-04-25 17:16:46 (UTC+0)Ladsgroup closed subtask T305064: Make linktarget table visible on cloud wiki replicas as Resolved.May 4 2022, 10:26 PM2022-05-04 22:26:28 (UTC+0)Ladsgroup closed subtask T308207: ApiQueryInfo::getProtectionInfo is slow on normalized templatelinks as Resolved.May 16 2022, 11:12 AM2022-05-16 11:12:37 (UTC+0)• Marostegui closed subtask T300775: Add tl_target_id column to templatelinks as Resolved.May 20 2022, 5:08 AM2022-05-20 05:08:08 (UTC+0)Ladsgroup closed subtask T299421: Turn on write both in production for templatelinks normalization as Resolved.May 23 2022, 6:16 AM2022-05-23 06:16:46 (UTC+0)Krinkle edited projects, added Performance-Team (Radar); removed Platform Engineering.May 25 2022, 10:00 PM2022-05-25 22:00:16 (UTC+0)Krinkle moved this task from Limbo to Watching on the Performance-Team (Radar) board.Universal_Omega subscribed.Jun 23 2022, 5:10 PM2022-06-23 17:10:49 (UTC+0)Ladsgroup mentioned this in T312666: Remove duplication in externallinks table.Jul 8 2022, 6:40 PM2022-07-08 18:40:24 (UTC+0)Ladsgroup closed subtask T306674: Add support for write new for templatelinks migration as Resolved.Jul 12 2022, 3:10 PM2022-07-12 15:10:18 (UTC+0)Ladsgroup added a comment.Jul 12 2022, 5:30 PM2022-07-12 17:30:41 (UTC+0)Comment ActionsI dropped the column (and indexes) in fawiki in beta cluster: root@deployment-db08:/srv/sqldata/fawiki# ls -Ssh | grep -i templatelinks 9.1M templatelinks.ibd 4.0K templatelinks.frm root@deployment-db08:/srv/sqldata/fawiki# ls -Ssh | grep -i templatelinks 528K templatelinks.ibd 4.0K templatelinks.frm 5.7% of its original size ^^Ladsgroup added a comment.Jul 12 2022, 5:33 PM2022-07-12 17:33:24 (UTC+0)Comment ActionsFWIW, linktarget is 180K: root@deployment-db08:/srv/sqldata/fawiki# ls -Ssh | grep -i linktarget 180K linktarget.ibd 4.0K linktarget.frmLadsgroup added a comment.Jul 14 2022, 11:08 AM2022-07-14 11:08:32 (UTC+0)Comment ActionsRemoved it in all of beta cluster now which removed 1GB from beta cluster: ladsgroup@deployment-db08:~$ df -h Filesystem Size Used Avail Use% Mounted on udev 7.9G 0 7.9G 0% /dev tmpfs 1.6G 177M 1.4G 12% /run /dev/sda2 19G 4.7G 13G 27% / tmpfs 7.9G 0 7.9G 0% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup /dev/mapper/vd-second--local--disk 111G 37G 68G 36% /srv tmpfs 1.6G 0 1.6G 0% /run/user/0 tmpfs 1.6G 0 1.6G 0% /run/user/3182 ladsgroup@deployment-db08:~$ df -h Filesystem Size Used Avail Use% Mounted on udev 7.9G 0 7.9G 0% /dev tmpfs 1.6G 177M 1.4G 12% /run /dev/sda2 19G 4.7G 13G 27% / tmpfs 7.9G 0 7.9G 0% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup /dev/mapper/vd-second--local--disk 111G 37G 69G 35% /srv tmpfs 1.6G 0 1.6G 0% /run/user/0 tmpfs 1.6G 0 1.6G 0% /run/user/3182Ladsgroup added a comment.Jul 28 2022, 3:20 PM2022-07-28 15:20:17 (UTC+0)Comment ActionsIt is now dropped from testwiki in production and it went from 60MB to 20MB there. Given its number of rows and with linear extrapolation we get ~120GB for commons and cebwiki (they are currently on 500GB and 300GB respectively), this means just fixing cebwiki and commonswiki would free 600GB. If you add some other similar wikis like arzwiki, you'd get around a TB of clean up, probably more.• Marostegui subscribed.Jul 28 2022, 3:41 PM2022-07-28 15:41:03 (UTC+0)Comment ActionsWow!!!!!! Great work <3Ladsgroup added a comment.Aug 10 2022, 8:43 PM2022-08-10 20:43:15 (UTC+0)Comment ActionsDropping on s5 now. Started with dbstore1003:3315 and will continue next week (letting it stay there for a while in case writes happen to the old fields). Impact report: Before the second schema change, largest wikis of s5: 217G cebwiki/templatelinks.ibd 8.5G dewiki/templatelinks.ibd 6.6G srwiki/templatelinks.ibd 1.9G shwiki/templatelinks.ibd 1.5G mgwiktionary/templatelinks.ibd In total: 235.5GB After: 95G cebwiki/templatelinks.ibd 3.0G dewiki/templatelinks.ibd 2.5G srwiki/templatelinks.ibd 797M shwiki/templatelinks.ibd 729M mgwiktionary/templatelinks.ibd In total: 102.0GB, 43% of the original size. Note that this is the second schema change, the first one runs optimization on it which already reduced the total size of the db by 13% and 7% in s5 and s4 respectively. The second has done a whooping 21% redaction in size (from 620GB to 490GB) on top of the previous redaction and will definitely trigger an alert in backups. It has such an impact that dbstore1003 while holding three sections (and we just dropped this on one section only) has reached its lowest storage utilization in the past year meaning all the growth in size in three sections (s1, s5 and s7) over a year has been already offset by this change in s5.doctaxon subscribed.Aug 19 2022, 6:29 AM2022-08-19 06:29:46 (UTC+0)doctaxon added a comment.Aug 19 2022, 6:39 AM2022-08-19 06:39:46 (UTC+0)Comment ActionsCan you estimate please, when templatelinks column tl_namespace is available again?RhinosF1 added a comment.Aug 19 2022, 6:41 AM2022-08-19 06:41:29 (UTC+0)Comment Actions In T299417#8168022, @doctaxon wrote: Can you estimate please, when templatelinks column tl_namespace is available again? Never. The column is being removed.Ladsgroup added a comment.Aug 19 2022, 6:42 AM2022-08-19 06:42:44 (UTC+0)Comment Actions In T299417#8168022, @doctaxon wrote: Can you estimate please, when templatelinks column tl_namespace is available again? It won't be. You need to join your query with linktarget. e.g. select tl_from, tl_namespace, tl_title from templatelinks where ....; Must become select tl_from, lt_namespace, lt_title from templatelinks join linktarget on tl_target_id = lt_id where ....; HTHdoctaxon added a comment.Edited · Aug 19 2022, 6:46 AM2022-08-19 06:46:13 (UTC+0)Comment ActionsThank, but it's a little bit confusing: So I have to change the query: SELECT page_title, page_namespace FROM page, templatelinks WHERE tl_from = page_id AND page_namespace IN (0,100) AND tl_from_namespace IN (0,100) AND tl_namespace = 10 AND tl_title = 'Review'; But how to do? Am Fr., 19. Aug. 2022 um 08:42 Uhr schrieb Ladsgroup < no-reply@phabricator.wikimedia.org>: Ladsgroup added a comment. View Task https://phabricator.wikimedia.org/T299417 In T299417#8168022 https://phabricator.wikimedia.org/T299417#8168022, @doctaxon https://phabricator.wikimedia.org/p/doctaxon/ wrote: Can you estimate please, when templatelinks column tl_namespace is available again? It won't be. You need to join your query with linktarget. e.g. select tl_from, tl_namespace, tl_title from templatelinks where ....; Must become select tl_from, lt_namespace, lt_title from templatelinks join linktarget on tl_target_id = lt_id where ....; HTH *TASK DETAIL* https://phabricator.wikimedia.org/T299417 *EMAIL PREFERENCES* https://phabricator.wikimedia.org/settings/panel/emailpreferences/ *To: *Ladsgroup *Cc: *doctaxon, Marostegui, Universal_Omega, Zabe, Mormegil, Superyetkin, MusikAnimal, IKhitron, AntiCompositeNumber, RhinosF1, ArielGlenn, Izno, Aklapper, Ladsgroup, Devnull, LSobanski, Hazizibinmahdi, Iflorez, Vali.matei, Minhnv-2809, Jay8g, KrenairLadsgroup added a comment.Aug 19 2022, 6:49 AM2022-08-19 06:49:27 (UTC+0)Comment ActionsThis query: SELECT page_title, page_namespace FROM page, templatelinks WHERE tl_from = page_id AND page_namespace IN (0,100) AND tl_from_namespace IN (0,100) AND tl_namespace = 10 AND tl_title = 'Review'; Must become this: SELECT page_title, page_namespace FROM page join templatelinks on tl_from = page_id join linktarget on tl_target_id = lt_id AND page_namespace IN (0,100) AND tl_from_namespace IN (0,100) AND lt_namespace = 10 AND lt_title = 'Review'; Tested and works fine.doctaxon added a comment.Aug 19 2022, 6:56 AM2022-08-19 06:56:35 (UTC+0)Comment Actionschecked, thank you very much and thanks for your great database changing works. It's very helpful.Umherirrender mentioned this in T316297: Prepare cleanupInvalidDbKeys.php for new linktarget table.Aug 25 2022, 9:41 PM2022-08-25 21:41:04 (UTC+0)Ladsgroup closed subtask T299424: Run maintenance script backfilling tl_target_id as Resolved.Sep 2 2022, 9:24 AM2022-09-02 09:24:46 (UTC+0)Ladsgroup closed subtask T306673: Turn on read new for templatelinks on beta and production as Resolved.Sep 5 2022, 7:35 AM2022-09-05 07:35:31 (UTC+0)gerritbot added a comment.Sep 6 2022, 9:29 AM2022-09-06 09:29:23 (UTC+0)Comment ActionsChange 830114 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani): [mediawiki/core@master] schema: Drop tl_title and tl_namespace fields from templatelinks https://gerrit.wikimedia.org/r/830114gerritbot added a project: Patch-For-Review.Sep 6 2022, 9:29 AM2022-09-06 09:29:23 (UTC+0)gerritbot added a comment.Sep 6 2022, 5:39 PM2022-09-06 17:39:33 (UTC+0)Comment ActionsChange 830219 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani): [mediawiki/core@REL1_39] schema: Drop tl_title and tl_namespace fields from templatelinks https://gerrit.wikimedia.org/r/830219gerritbot added a comment.Sep 6 2022, 6:29 PM2022-09-06 18:29:56 (UTC+0)Comment ActionsChange 830114 merged by jenkins-bot: [mediawiki/core@master] schema: Drop tl_title and tl_namespace fields from templatelinks https://gerrit.wikimedia.org/r/830114ReleaseTaggerBot added a project: MW-1.40-notes (1.40.0-wmf.1; 2022-09-12).Sep 6 2022, 7:00 PM2022-09-06 19:00:33 (UTC+0)gerritbot added a comment.Sep 7 2022, 7:14 AM2022-09-07 07:14:07 (UTC+0)Comment ActionsChange 830219 abandoned by Ladsgroup: [mediawiki/core@REL1_39] schema: Drop tl_title and tl_namespace fields from templatelinks Reason: let's try again. https://gerrit.wikimedia.org/r/830219gerritbot added a comment.Sep 7 2022, 7:14 AM2022-09-07 07:14:39 (UTC+0)Comment ActionsChange 830219 restored by Ladsgroup: [mediawiki/core@REL1_39] schema: Drop tl_title and tl_namespace fields from templatelinks https://gerrit.wikimedia.org/r/830219gerritbot added a comment.Sep 7 2022, 2:22 PM2022-09-07 14:22:11 (UTC+0)Comment ActionsChange 830636 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani): [mediawiki/core@master] Introduce pruneUnusedLinkTargetRows maint script https://gerrit.wikimedia.org/r/830636MusikAnimal mentioned this in T317172: Can't see the statistics of wikipedia page edits.Sep 7 2022, 2:22 PM2022-09-07 14:22:51 (UTC+0)JJMC89 mentioned this in T317258: Replica templatelinks table is broken for some sites.Sep 7 2022, 10:33 PM2022-09-07 22:33:14 (UTC+0)BrandonXLF subscribed.Sep 8 2022, 8:38 AM2022-09-08 08:38:30 (UTC+0)gerritbot added a comment.Sep 8 2022, 7:12 PM2022-09-08 19:12:24 (UTC+0)Comment ActionsChange 830219 merged by Ladsgroup: [mediawiki/core@REL1_39] schema: Drop tl_title and tl_namespace fields from templatelinks https://gerrit.wikimedia.org/r/830219ReleaseTaggerBot added a project: MW-1.39-notes.Sep 8 2022, 8:00 PM2022-09-08 20:00:46 (UTC+0)Legoktm updated the task description. (Show Details)Sep 13 2022, 5:00 AM2022-09-13 05:00:53 (UTC+0)Legoktm mentioned this in R2060:703408bd7891: Fix queries to use linktarget.Sep 13 2022, 5:49 AM2022-09-13 05:49:12 (UTC+0)gerritbot added a comment.Sep 14 2022, 8:18 AM2022-09-14 08:18:00 (UTC+0)Comment ActionsChange 832157 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani): [operations/mediawiki-config@master] Stop writing to the old templatelinks columns of enwiki https://gerrit.wikimedia.org/r/832157Ladsgroup closed subtask T312865: Turn off writing to the old columns of templatelinks in beta and production as Resolved.Sep 14 2022, 8:34 AM2022-09-14 08:34:12 (UTC+0)Ladsgroup closed subtask T312863: Schema change to change primary key of templatelinks as Resolved.PeterBowman subscribed.Sep 14 2022, 9:43 AM2022-09-14 09:43:42 (UTC+0)Comment ActionsAny SQL query involving the templatelinks table (e.g. select *) on eswiktionary and plwiktionary (group1) now returns: ERROR 1356 (HY000): View 'eswiktionary_p.templatelinks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use themLadsgroup added a comment.Sep 14 2022, 9:47 AM2022-09-14 09:47:34 (UTC+0)Comment ActionsYeah, I'm trying to run the maintain views there but there are queries that are stuck. I need to depool them which takes a bit.Ladsgroup added a comment.Sep 14 2022, 10:03 AM2022-09-14 10:03:14 (UTC+0)Comment ActionsIt should be fixed now.PeterBowman added a comment.Sep 14 2022, 10:31 AM2022-09-14 10:31:53 (UTC+0)Comment ActionsI can confirm that, thank you!Cryptic subscribed.Sep 15 2022, 12:07 PM2022-09-15 12:07:54 (UTC+0)gerritbot added a comment.Sep 16 2022, 7:50 AM2022-09-16 07:50:45 (UTC+0)Comment ActionsChange 830636 merged by jenkins-bot: [mediawiki/core@master] Introduce pruneUnusedLinkTargetRows maint script https://gerrit.wikimedia.org/r/830636gerritbot added a comment.Sep 16 2022, 7:51 AM2022-09-16 07:51:21 (UTC+0)Comment ActionsChange 832563 had a related patch set uploaded (by Jforrester; author: Amir Sarabadani): [mediawiki/core@REL1_39] Introduce pruneUnusedLinkTargetRows maint script https://gerrit.wikimedia.org/r/832563ReleaseTaggerBot edited projects, added MW-1.40-notes (1.40.0-wmf.2; 2022-09-19); removed MW-1.40-notes (1.40.0-wmf.1; 2022-09-12).Sep 16 2022, 8:00 AM2022-09-16 08:00:40 (UTC+0)gerritbot added a comment.Sep 16 2022, 9:06 AM2022-09-16 09:06:04 (UTC+0)Comment ActionsChange 832563 merged by Jforrester: [mediawiki/core@REL1_39] Introduce pruneUnusedLinkTargetRows maint script https://gerrit.wikimedia.org/r/832563Ladsgroup edited projects, added User-notice; removed Patch-For-Review, User-Ladsgroup.Sep 16 2022, 10:20 AM2022-09-16 10:20:45 (UTC+0)Comment ActionsWe have been dropping the old columns which now affects users. While this was announced a while ago (plus regular updates afterwards) but let's add a line in tech news about this. Something like: Two fields of tl_namespace and tl_title in templatelinks table are now being dropped. Queries relying on these fields need to change to use the new normalization field called tl_target_id. See T299417 for more information. This is part of normalization of links tables [1] (edit mercilessly) I also wrote a document explaining why we need to do normalization: https://www.mediawiki.org/wiki/User:ASarabadani_(WMF)/Database_for_devs_toolkit/Concepts/Normalization Maybe that'd be useful.Quiddity moved this task from To Triage to In current Tech/News draft on the User-notice board.Sep 16 2022, 7:27 PM2022-09-16 19:27:11 (UTC+0)valerio.bozzolan subscribed.Sep 20 2022, 7:42 AM2022-09-20 07:42:41 (UTC+0)Quiddity moved this task from In current Tech/News draft to Already announced/Archive on the User-notice board.Sep 22 2022, 5:52 PM2022-09-22 17:52:39 (UTC+0)Orlodrim subscribed.Sep 24 2022, 10:34 PM2022-09-24 22:34:29 (UTC+0)Comment ActionsIs the new linktarget table publicly available somewhere? My bot regularly updates maintenance lists (https://fr.wikipedia.org/wiki/Projet:Mod%C3%A8le/Maintenance/Listes) based on the template dump of frwiki (https://dumps.wikimedia.org/frwiki/latest/frwiki-latest-templatelinks.sql.gz). After this update, I don't see a way to reconstruct the fields that were present before, because I don't know where to find the data to resolve target_ids.Vahurzpu subscribed.Sep 24 2022, 11:25 PM2022-09-24 23:25:22 (UTC+0)Comment Actions@Orlodrim: see T315063; the table will be dumped, but some required work hasn't been done yet. In the meantime, the table is available in the Toolforge database replicas.MusikAnimal mentioned this in T318823: Update tools following templatelinks normalization.Sep 28 2022, 2:57 PM2022-09-28 14:57:35 (UTC+0)Steenth subscribed.Oct 2 2022, 1:47 PM2022-10-02 13:47:23 (UTC+0)Mike_Peel subscribed.Oct 8 2022, 10:25 AM2022-10-08 10:25:02 (UTC+0)Comment ActionsHi, the change also affected Pi bot's deployment of Wikidata Infoboxes on Commons, help rewriting the query would be appreciated at T320314!Ladsgroup closed this task as Resolved.Dec 8 2022, 3:39 AM2022-12-08 03:39:27 (UTC+0)Ladsgroup closed subtask T314041: Drop old templatelinks columns and indexes as Resolved.Ladsgroup closed subtask T314711: Add support for links migration to namespaceDupes.php as Resolved.Ladsgroup moved this task from Blocked to Done on the DBA board.Comment ActionsThis is doneMaintenance_bot edited projects, added User-notice-archive; removed User-notice.Dec 18 2022, 4:30 AM2022-12-18 04:30:39 (UTC+0)Aklapper mentioned this in T330382: Upgrading from 1.35.8 to 1.39.2 using PostgreSQL fails, index "tl_namespace" does not exist.Feb 23 2023, 10:32 AM2023-02-23 10:32:18 (UTC+0)Krinkle mentioned this in T329842: Some jobs in refreshLinksPrioritized seems to repeat themselves for ever.Apr 13 2023, 6:26 AM2023-04-13 06:26:00 (UTC+0)Jdforrester-WMF mentioned this in T300222: Implement normalizing MediaWiki link tables.Jul 31 2023, 12:45 PM2023-07-31 12:45:00 (UTC+0)gerritbot added a comment.May 15 2024, 8:05 PM2024-05-15 20:05:40 (UTC+0)Comment ActionsChange #1032027 had a related patch set uploaded (by Umherirrender; author: Umherirrender): [mediawiki/core@master] Remove TemplateLinksSchemaMigrationStage config https://gerrit.wikimedia.org/r/1032027gerritbot added a project: Patch-For-Review.May 15 2024, 8:05 PM2024-05-15 20:05:41 (UTC+0)gerritbot added a comment.May 28 2024, 11:45 AM2024-05-28 11:45:53 (UTC+0)Comment ActionsChange #1032027 merged by jenkins-bot: [mediawiki/core@master] Remove TemplateLinksSchemaMigrationStage config https://gerrit.wikimedia.org/r/1032027ReleaseTaggerBot added a project: MW-1.43-notes (1.43.0-wmf.8; 2024-06-04).May 28 2024, 1:01 PM2024-05-28 13:01:41 (UTC+0)Log In to Comment
After T299416: Normalize link tables: Create linktarget table is done. To learn how to update your queries, see T299417#7814637.
Steps that don't have a subticket yet:
Could you ellaborate, please, how will the new scheme interpret the red templates transclusion, so that quarry could recognize them?
I'm not 100% sure I understood your usecase but if a query to templatelinks used to look like this:
SELECT tl_from, tl_namespace, tl_title from templatelinks where tl_from = 5974403;
It will change to:
SELECT tl_from, lt_namespace, lt_title from templatelinks join linktarget on lt_target_id = lt_id where tl_from = 5974403;
It seems the new table has not been replicated to the cloud replicas (I'll check and fix it soon).
If this doesn't answer your question, can you elaborate more? possibly giving the query you run usually.
In T299417#7814637, @Ladsgroup wrote: I'm not 100% sure I understood your usecase but if a query to templatelinks used to look like this: SELECT tl_from, tl_namespace, tl_title from templatelinks where tl_from = 5974403; It will change to: SELECT tl_from, lt_namespace, lt_title from templatelinks join linktarget on lt_target_id = lt_id where tl_from = 5974403; It seems the new table has not been replicated to the cloud replicas (I'll check and fix it soon). If this doesn't answer your question, can you elaborate more? possibly giving the query you run usually.
Thanks. It doesn't. What happens when the template does not exist? (For example, here.)
Nothing will change when it comes to existence of a template. For example, you have this query:
select page_namespace, page_title, tl_title from page inner join templatelinks where page_namespace in (0, 4, 6, 8, 10, 12, 14, 100) and page_id = tl_from and tl_namespace != 8 and not ((page_namespace = 100 and (tl_namespace <> 10 or tl_title like "הידעת_מדינות_%")) or (page_namespace = 8 and page_title like "%.js") or (page_namespace = 4 and (page_title like "%OTRS%" or page_title like "%בקשות_לבדיקה%" or page_title like "%דלפק_ייעוץ%" or page_title like "%הכה_את_המומחה%" or page_title like "%זכויות_יוצרים%" or page_title like "%יומן_מחיקות%" or page_title like "%סדנה_לגרפיקה%" or (page_title like "%המלצות_קודמות%" and not page_title in ("ערכים_מומלצים/המלצות_קודמות/אפריל_2016", "ערכים_מומלצים/המלצות_קודמות/מרץ_2016", "ערכים_מומלצים/המלצות_קודמות/מאי_2016")) or page_title like "%אולם_דיונים%" or page_title like "%עבודות_ויקידמיות%" or page_title like "%ארכיון%" or page_title like "%סקר_ויקיפדיה%" or page_title like "%איך_להקים_אסם%" or page_title like "%מיזמי_ויקיפדיה/נבחרי_ציבור_פוטנציאליים%" or page_title like "%הומור%" or page_title like "%רשימת%" or page_title like "%חשבון_נפש%" or page_title like "%סופשבוע_ללא_ערכים%" or page_title like "%ויקימניה%" or page_title like "%תחרות_צילום%" or page_title like "%מתקפת_איכות%" or page_title like "%מזנון%" or page_title like "%מועדונים%" or page_title like "%מפגשים%" or page_title like "%תבנית%" or page_title like "%שער_לילדים%" or page_title like "%שיתופי_פעולה%" or page_title like "%מיזמי_ויקיפדיה%" or page_title like "%בוט_החלפות%" or page_title like "הידעת?/סדרה__/%" or page_title in ("הידעת?/2018/אוגוסט", "חדשות", "הידעת?/2018/ספטמבר", "הידעת?/ממרץ_2011", "ארגז_חול", "הידעת?", "כיכר_העיר", "הידעת?/המתנה") or page_title = "ויקימדיה_ישראל/תמונה_נבחרת/גלריה" and page_namespace = 4 and tl_namespace = 4 and tl_title like "ויקימדיה_ישראל/תמונה_נבחרת/%")) or exists (select * from page where page_namespace = tl_namespace and page_title = tl_title))
In order to make the change easier, move the join condition to ON clause instead of WHERE:
select page_namespace, page_title, tl_title from page inner join templatelinks ON page_id = tl_from where page_namespace in (0, 4, 6, 8, 10, 12, 14, 100) and tl_namespace != 8 and not ((page_namespace = 100 and (tl_namespace <> 10 or tl_title like "הידעת_מדינות_%")) or (page_namespace = 8 and page_title like "%.js") or (page_namespace = 4 and (page_title like "%OTRS%" or page_title like "%בקשות_לבדיקה%" or page_title like "%דלפק_ייעוץ%" or page_title like "%הכה_את_המומחה%" or page_title like "%זכויות_יוצרים%" or page_title like "%יומן_מחיקות%" or page_title like "%סדנה_לגרפיקה%" or (page_title like "%המלצות_קודמות%" and not page_title in ("ערכים_מומלצים/המלצות_קודמות/אפריל_2016", "ערכים_מומלצים/המלצות_קודמות/מרץ_2016", "ערכים_מומלצים/המלצות_קודמות/מאי_2016")) or page_title like "%אולם_דיונים%" or page_title like "%עבודות_ויקידמיות%" or page_title like "%ארכיון%" or page_title like "%סקר_ויקיפדיה%" or page_title like "%איך_להקים_אסם%" or page_title like "%מיזמי_ויקיפדיה/נבחרי_ציבור_פוטנציאליים%" or page_title like "%הומור%" or page_title like "%רשימת%" or page_title like "%חשבון_נפש%" or page_title like "%סופשבוע_ללא_ערכים%" or page_title like "%ויקימניה%" or page_title like "%תחרות_צילום%" or page_title like "%מתקפת_איכות%" or page_title like "%מזנון%" or page_title like "%מועדונים%" or page_title like "%מפגשים%" or page_title like "%תבנית%" or page_title like "%שער_לילדים%" or page_title like "%שיתופי_פעולה%" or page_title like "%מיזמי_ויקיפדיה%" or page_title like "%בוט_החלפות%" or page_title like "הידעת?/סדרה__/%" or page_title in ("הידעת?/2018/אוגוסט", "חדשות", "הידעת?/2018/ספטמבר", "הידעת?/ממרץ_2011", "ארגז_חול", "הידעת?", "כיכר_העיר", "הידעת?/המתנה") or page_title = "ויקימדיה_ישראל/תמונה_נבחרת/גלריה" and page_namespace = 4 and tl_namespace = 4 and tl_title like "ויקימדיה_ישראל/תמונה_נבחרת/%")) or exists (select * from page where page_namespace = tl_namespace and page_title = tl_title))
Note that actually the last part of your query (the subquery) is wrong as it doesn't add templatelinks.
And then change things:
select page_namespace, page_title, lt_title from page inner join templatelinks ON page_id = tl_from join linktarget on tl_target_id = lt_id where page_namespace in (0, 4, 6, 8, 10, 12, 14, 100) and lt_namespace != 8 and not ((page_namespace = 100 and (lt_namespace <> 10 or lt_title like "הידעת_מדינות_%")) or (page_namespace = 8 and page_title like "%.js") or (page_namespace = 4 and (page_title like "%OTRS%" or page_title like "%בקשות_לבדיקה%" or page_title like "%דלפק_ייעוץ%" or page_title like "%הכה_את_המומחה%" or page_title like "%זכויות_יוצרים%" or page_title like "%יומן_מחיקות%" or page_title like "%סדנה_לגרפיקה%" or (page_title like "%המלצות_קודמות%" and not page_title in ("ערכים_מומלצים/המלצות_קודמות/אפריל_2016", "ערכים_מומלצים/המלצות_קודמות/מרץ_2016", "ערכים_מומלצים/המלצות_קודמות/מאי_2016")) or page_title like "%אולם_דיונים%" or page_title like "%עבודות_ויקידמיות%" or page_title like "%ארכיון%" or page_title like "%סקר_ויקיפדיה%" or page_title like "%איך_להקים_אסם%" or page_title like "%מיזמי_ויקיפדיה/נבחרי_ציבור_פוטנציאליים%" or page_title like "%הומור%" or page_title like "%רשימת%" or page_title like "%חשבון_נפש%" or page_title like "%סופשבוע_ללא_ערכים%" or page_title like "%ויקימניה%" or page_title like "%תחרות_צילום%" or page_title like "%מתקפת_איכות%" or page_title like "%מזנון%" or page_title like "%מועדונים%" or page_title like "%מפגשים%" or page_title like "%תבנית%" or page_title like "%שער_לילדים%" or page_title like "%שיתופי_פעולה%" or page_title like "%מיזמי_ויקיפדיה%" or page_title like "%בוט_החלפות%" or page_title like "הידעת?/סדרה__/%" or page_title in ("הידעת?/2018/אוגוסט", "חדשות", "הידעת?/2018/ספטמבר", "הידעת?/ממרץ_2011", "ארגז_חול", "הידעת?", "כיכר_העיר", "הידעת?/המתנה") or page_title = "ויקימדיה_ישראל/תמונה_נבחרת/גלריה" and page_namespace = 4 and lt_namespace = 4 and lt_title like "ויקימדיה_ישראל/תמונה_נבחרת/%")) or exists (select * from page join linktarget where page_namespace = lt_namespace and page_title = lt_title))
How can it work if linktarget does not have any data about non-existing templates? What will be the id in templatelinks and where will be the tempkate name stored?
How can it work if linktarget does not have any data about non-existing templates?
That's not true. It doesn't care if the page exist or not and store it anyway, it can exist or not. That's exactly one of two reasons we created a new table instead of just using page_id.
I see. Thanks for your help.
I dropped the column (and indexes) in fawiki in beta cluster:
root@deployment-db08:/srv/sqldata/fawiki# ls -Ssh | grep -i templatelinks 9.1M templatelinks.ibd 4.0K templatelinks.frm root@deployment-db08:/srv/sqldata/fawiki# ls -Ssh | grep -i templatelinks 528K templatelinks.ibd 4.0K templatelinks.frm
5.7% of its original size ^^
FWIW, linktarget is 180K:
root@deployment-db08:/srv/sqldata/fawiki# ls -Ssh | grep -i linktarget 180K linktarget.ibd 4.0K linktarget.frm
Removed it in all of beta cluster now which removed 1GB from beta cluster:
ladsgroup@deployment-db08:~$ df -h Filesystem Size Used Avail Use% Mounted on udev 7.9G 0 7.9G 0% /dev tmpfs 1.6G 177M 1.4G 12% /run /dev/sda2 19G 4.7G 13G 27% / tmpfs 7.9G 0 7.9G 0% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup /dev/mapper/vd-second--local--disk 111G 37G 68G 36% /srv tmpfs 1.6G 0 1.6G 0% /run/user/0 tmpfs 1.6G 0 1.6G 0% /run/user/3182 ladsgroup@deployment-db08:~$ df -h Filesystem Size Used Avail Use% Mounted on udev 7.9G 0 7.9G 0% /dev tmpfs 1.6G 177M 1.4G 12% /run /dev/sda2 19G 4.7G 13G 27% / tmpfs 7.9G 0 7.9G 0% /dev/shm tmpfs 5.0M 0 5.0M 0% /run/lock tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup /dev/mapper/vd-second--local--disk 111G 37G 69G 35% /srv tmpfs 1.6G 0 1.6G 0% /run/user/0 tmpfs 1.6G 0 1.6G 0% /run/user/3182
It is now dropped from testwiki in production and it went from 60MB to 20MB there. Given its number of rows and with linear extrapolation we get ~120GB for commons and cebwiki (they are currently on 500GB and 300GB respectively), this means just fixing cebwiki and commonswiki would free 600GB. If you add some other similar wikis like arzwiki, you'd get around a TB of clean up, probably more.
Wow!!!!!! Great work <3
Dropping on s5 now. Started with dbstore1003:3315 and will continue next week (letting it stay there for a while in case writes happen to the old fields).
Impact report: Before the second schema change, largest wikis of s5:
217G cebwiki/templatelinks.ibd 8.5G dewiki/templatelinks.ibd 6.6G srwiki/templatelinks.ibd 1.9G shwiki/templatelinks.ibd 1.5G mgwiktionary/templatelinks.ibd
In total: 235.5GB
After:
95G cebwiki/templatelinks.ibd 3.0G dewiki/templatelinks.ibd 2.5G srwiki/templatelinks.ibd 797M shwiki/templatelinks.ibd 729M mgwiktionary/templatelinks.ibd
In total: 102.0GB, 43% of the original size. Note that this is the second schema change, the first one runs optimization on it which already reduced the total size of the db by 13% and 7% in s5 and s4 respectively. The second has done a whooping 21% redaction in size (from 620GB to 490GB) on top of the previous redaction and will definitely trigger an alert in backups.
It has such an impact that dbstore1003 while holding three sections (and we just dropped this on one section only) has reached its lowest storage utilization in the past year meaning all the growth in size in three sections (s1, s5 and s7) over a year has been already offset by this change in s5.
Can you estimate please, when templatelinks column tl_namespace is available again?
In T299417#8168022, @doctaxon wrote: Can you estimate please, when templatelinks column tl_namespace is available again?
Never. The column is being removed.
It won't be. You need to join your query with linktarget.
e.g.
select tl_from, tl_namespace, tl_title from templatelinks where ....;
Must become
select tl_from, lt_namespace, lt_title from templatelinks join linktarget on tl_target_id = lt_id where ....;
HTH
Thank, but it's a little bit confusing:
So I have to change the query:
SELECT page_title, page_namespace FROM page, templatelinks WHERE tl_from = page_id AND page_namespace IN (0,100) AND tl_from_namespace IN (0,100) AND tl_namespace = 10 AND tl_title = 'Review';
But how to do?
Am Fr., 19. Aug. 2022 um 08:42 Uhr schrieb Ladsgroup < no-reply@phabricator.wikimedia.org>:
Ladsgroup added a comment. View Task https://phabricator.wikimedia.org/T299417 In T299417#8168022 https://phabricator.wikimedia.org/T299417#8168022, @doctaxon https://phabricator.wikimedia.org/p/doctaxon/ wrote: Can you estimate please, when templatelinks column tl_namespace is available again? It won't be. You need to join your query with linktarget. e.g. select tl_from, tl_namespace, tl_title from templatelinks where ....; Must become select tl_from, lt_namespace, lt_title from templatelinks join linktarget on tl_target_id = lt_id where ....; HTH *TASK DETAIL* https://phabricator.wikimedia.org/T299417 *EMAIL PREFERENCES* https://phabricator.wikimedia.org/settings/panel/emailpreferences/ *To: *Ladsgroup *Cc: *doctaxon, Marostegui, Universal_Omega, Zabe, Mormegil, Superyetkin, MusikAnimal, IKhitron, AntiCompositeNumber, RhinosF1, ArielGlenn, Izno, Aklapper, Ladsgroup, Devnull, LSobanski, Hazizibinmahdi, Iflorez, Vali.matei, Minhnv-2809, Jay8g, Krenair
Ladsgroup added a comment. View Task https://phabricator.wikimedia.org/T299417
In T299417#8168022 https://phabricator.wikimedia.org/T299417#8168022, @doctaxon https://phabricator.wikimedia.org/p/doctaxon/ wrote:
*TASK DETAIL* https://phabricator.wikimedia.org/T299417
*EMAIL PREFERENCES* https://phabricator.wikimedia.org/settings/panel/emailpreferences/
*To: *Ladsgroup *Cc: *doctaxon, Marostegui, Universal_Omega, Zabe, Mormegil, Superyetkin, MusikAnimal, IKhitron, AntiCompositeNumber, RhinosF1, ArielGlenn, Izno, Aklapper, Ladsgroup, Devnull, LSobanski, Hazizibinmahdi, Iflorez, Vali.matei, Minhnv-2809, Jay8g, Krenair
This query:
Must become this:
SELECT page_title, page_namespace FROM page join templatelinks on tl_from = page_id join linktarget on tl_target_id = lt_id AND page_namespace IN (0,100) AND tl_from_namespace IN (0,100) AND lt_namespace = 10 AND lt_title = 'Review';
Tested and works fine.
checked, thank you very much
and thanks for your great database changing works. It's very helpful.
Change 830114 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@master] schema: Drop tl_title and tl_namespace fields from templatelinks
https://gerrit.wikimedia.org/r/830114
Change 830219 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@REL1_39] schema: Drop tl_title and tl_namespace fields from templatelinks
https://gerrit.wikimedia.org/r/830219
Change 830114 merged by jenkins-bot:
Change 830219 abandoned by Ladsgroup:
Reason:
let's try again.
Change 830219 restored by Ladsgroup:
Change 830636 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[mediawiki/core@master] Introduce pruneUnusedLinkTargetRows maint script
https://gerrit.wikimedia.org/r/830636
Change 830219 merged by Ladsgroup:
Change 832157 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):
[operations/mediawiki-config@master] Stop writing to the old templatelinks columns of enwiki
https://gerrit.wikimedia.org/r/832157
Any SQL query involving the templatelinks table (e.g. select *) on eswiktionary and plwiktionary (group1) now returns:
ERROR 1356 (HY000): View 'eswiktionary_p.templatelinks' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them
Yeah, I'm trying to run the maintain views there but there are queries that are stuck. I need to depool them which takes a bit.
It should be fixed now.
I can confirm that, thank you!
Change 830636 merged by jenkins-bot:
Change 832563 had a related patch set uploaded (by Jforrester; author: Amir Sarabadani):
[mediawiki/core@REL1_39] Introduce pruneUnusedLinkTargetRows maint script
https://gerrit.wikimedia.org/r/832563
Change 832563 merged by Jforrester:
We have been dropping the old columns which now affects users. While this was announced a while ago (plus regular updates afterwards) but let's add a line in tech news about this.
Something like:
Two fields of tl_namespace and tl_title in templatelinks table are now being dropped. Queries relying on these fields need to change to use the new normalization field called tl_target_id. See T299417 for more information. This is part of normalization of links tables [1]
(edit mercilessly)
I also wrote a document explaining why we need to do normalization: https://www.mediawiki.org/wiki/User:ASarabadani_(WMF)/Database_for_devs_toolkit/Concepts/Normalization Maybe that'd be useful.
Is the new linktarget table publicly available somewhere? My bot regularly updates maintenance lists (https://fr.wikipedia.org/wiki/Projet:Mod%C3%A8le/Maintenance/Listes) based on the template dump of frwiki (https://dumps.wikimedia.org/frwiki/latest/frwiki-latest-templatelinks.sql.gz). After this update, I don't see a way to reconstruct the fields that were present before, because I don't know where to find the data to resolve target_ids.
@Orlodrim: see T315063; the table will be dumped, but some required work hasn't been done yet. In the meantime, the table is available in the Toolforge database replicas.
Hi, the change also affected Pi bot's deployment of Wikidata Infoboxes on Commons, help rewriting the query would be appreciated at T320314!
This is done
Change #1032027 had a related patch set uploaded (by Umherirrender; author: Umherirrender):
[mediawiki/core@master] Remove TemplateLinksSchemaMigrationStage config
https://gerrit.wikimedia.org/r/1032027
Change #1032027 merged by jenkins-bot: