User Details
- User Since
- Jun 21 2021, 2:34 PM (158 w, 3 d)
- Availability
- Available
- LDAP User
- TChin
- MediaWiki User
- TChin (WMF) [ Global Accounts ]
Fri, Jun 14
I don't think so. The image suggestion work on Flink never progressed past the original ticket.
Tue, Jun 11
Getting rid of service-scaffold-node also means we should get rid of servicelib-node, since it was created for service-scaffold-node (this is why service-scaffold-node depends on packages that don't exist; the project was never finished).
Jun 3 2024
May 29 2024
Don't forget that any CI that has a production deployment pipeline needs the repo to be added to trusted runners and also have their tags protected (Slack thread on protecting tags)
This might be harder than I thought. Creating a dummy Google account to act as the receiver seems off the table. All of Google's APIs require OAuth or some manual way for the user to sign in. There is no way to make a pure bot account, and also no good way to automate login without being slapped by a ban.
May 28 2024
May 21 2024
Sounds good to me. service-scaffold-node was started to turn service-template-node into a group of libraries, and it's basically superseded by my effort to replace service-runner (T360924), which is mostly complete.
May 15 2024
Would be nice to get a confirmation for archiving node-rdkafka-statsd since it'll progress T349118
May 13 2024
Apr 17 2024
Has this project been discussed across the WMF/Community?
It would be great if there were an RFC process, but there have at least been discussions about what to do with service-runner, and this project is on the radar of the entirety of Data Platform Engineering, some people on the MW engineering team, and the language team. It was also posted on Slack in #engineering-all to give people a heads-up, just in case another team was working on something similar. If there's one thing I'm sure of, it's that the consensus is we need a replacement, whether or not this is it.
Apr 5 2024
The config store repo runs CI checks for JSON Schema correctness and validates config values against its JSON Schema. The Datasets Config service repo has dockerized CI using Kokkuri and Blubber.
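To make the idea concrete, here's a minimal stdlib-only sketch of the kind of "validate config values against a schema" check such CI might run. The schema and config fields here are hypothetical, and real CI would use a full JSON Schema validator rather than this toy type check.

```python
import json

# Hypothetical schema fragment, for illustration only.
schema = {
    "type": "object",
    "required": ["stream_name", "retention_days"],
    "properties": {
        "stream_name": {"type": "string"},
        "retention_days": {"type": "integer"},
    },
}

TYPE_MAP = {"string": str, "integer": int, "object": dict}

def check_config(config: dict, schema: dict) -> list:
    """Return a list of validation errors (empty means valid)."""
    errors = []
    for key in schema.get("required", []):
        if key not in config:
            errors.append(f"missing required key: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key in config and not isinstance(config[key], TYPE_MAP[sub["type"]]):
            errors.append(f"{key}: expected {sub['type']}")
    return errors

# A config file with a type error: retention_days is a string, not an integer.
config = json.loads('{"stream_name": "aqs_hourly", "retention_days": "90"}')
print(check_config(config, schema))  # ['retention_days: expected integer']
```

In CI, a non-empty error list would fail the build before the config ever reaches the service.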
Mar 26 2024
Feb 27 2024
If it's to a point where we even need to use a new name, might as well break everything. I'd love to join in on the fun
Feb 11 2024
Looking at the logs, this seems to coincide with the redaction patch to eventstreams, but looking at the code I'm having a hard time finding where a memory leak could've been introduced... What's more confusing is that it's just 1 or 2 pods hitting the limit.
Jan 30 2024
Jan 22 2024
Using lz4 compression works, but checking it with parquet-tools doesn't: I see something like "compression: UNKNOWN (space_saved: -25%)". Seems to be a known issue.
Jan 5 2024
INSERT OVERWRITE with PARTITION also doesn't work anymore because Iceberg uses hidden partitioning, so I had to enable Spark's dynamic overwrite mode.
https://iceberg.apache.org/docs/latest/spark-writes/#insert-overwrite
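For reference, the dynamic overwrite mode mentioned above is controlled by a standard Spark setting (shown here as the bare conf key; where exactly it was set for this job isn't shown in these notes):

```
spark.sql.sources.partitionOverwriteMode=dynamic
```

With this set, INSERT OVERWRITE replaces only the partitions that the incoming data touches, rather than requiring an explicit PARTITION clause.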
TIL when setting the compression codec to snappy, Iceberg doesn't end the files in hdfs with .snappy.parquet. I had to check if the format was correct using parquet-tools.
Dec 19 2023
Tested whether the COALESCE hints still work in Iceberg by creating 2 tables and filling them with/without the hint. It still seems to work.
Dec 18 2023
Dec 16 2023
Tested on a stat machine with
CREATE EXTERNAL TABLE IF NOT EXISTS `aqs_hourly` (
  `cache_status`  string    COMMENT 'Cache status',
  `http_status`   string    COMMENT 'HTTP status of response',
  `http_method`   string    COMMENT 'HTTP method of request',
  `response_size` bigint    COMMENT 'Response size',
  `uri_host`      string    COMMENT 'Host of request',
  `uri_path`      string    COMMENT 'Path of request',
  `request_count` bigint    COMMENT 'Number of requests',
  `hour`          timestamp COMMENT 'The aggregated hour. Covers from minute 00 to 59'
)
USING ICEBERG
PARTITIONED BY (days(hour));
And
spark3-sql --master yarn \
  --executor-memory 8G --executor-cores 4 --driver-memory 2G \
  --conf spark.dynamicAllocation.maxExecutors=64 \
  -f aqs_hourly_iceberg.hql \
  -d source_table=wmf.webrequest \
  -d webrequest_source=text \
  -d destination_table=tchin.aqs_hourly \
  -d coalesce_partitions=1 \
  -d year=2023 \
  -d month=12 \
  -d day=3 \
  -d hour=0
Dec 14 2023
Dec 11 2023
Dec 2 2023
Nov 14 2023
I think the per-image quota should probably be increased. I tested building a few projects locally and a project with NodeJS and 0 dependencies results in a built image that's 805.58 MB. One with only VueJS as a dependency bumps it up to 858.13 MB. I'm probably not going to be the last one who needs more than 200 MB of working space :/
Nov 13 2023
Example error:
step-export: 2023-11-13T05:41:56.835942824Z ERROR: failed to export: failed to write image to the following tags: [tools-harbor.wmcloud.org/tool-dpe-alerts-dashboard/tool-dpe-alerts-dashboard:latest: PATCH https://tools-harbor.wmcloud.org/v2/tool-dpe-alerts-dashboard/tool-dpe-alerts-dashboard/blobs/uploads/b62dd944-4fad-4ee8-b900-8409f7860d6c?_state=REDACTED: unexpected status code 413 Request Entity Too Large: <html>
step-export: 2023-11-13T05:41:56.835973012Z <head><title>413 Request Entity Too Large</title></head>
step-export: 2023-11-13T05:41:56.835976984Z <body>
step-export: 2023-11-13T05:41:56.835979969Z <center><h1>413 Request Entity Too Large</h1></center>
step-export: 2023-11-13T05:41:56.835983468Z <hr><center>nginx/1.18.0</center>
step-export: 2023-11-13T05:41:56.836002364Z </body>
step-export: 2023-11-13T05:41:56.836005027Z </html>
step-export: 2023-11-13T05:41:56.836008032Z ]
step-export:
step-results: 2023-11-13T05:41:57.433667715Z 2023/11/13 05:41:57 Skipping step because a previous step failed
Oct 26 2023
Oct 11 2023
If we do introduce something, we should use JSDoc3 and follow what's happening on this ticket T138401
Oct 3 2023
Sep 29 2023
DeliveryGuarantee.AT_LEAST_ONCE: The sink will wait for all outstanding records in the Kafka buffers to be acknowledged by the Kafka producer on a checkpoint. No messages will be lost in case of any issue with the Kafka brokers but messages may be duplicated when Flink restarts because Flink reprocesses old input records.
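A toy illustration (plain Python, not Flink code) of why at-least-once delivery can duplicate records: after a restart, the producer resumes from the last checkpointed offset, so anything delivered after that checkpoint but before the crash gets sent again.

```python
# Toy simulation of at-least-once delivery: records past the last
# checkpoint are re-sent after a restart, so the sink may see
# duplicates but never loses a record.
def deliver(records, checkpointed_upto, sink):
    """Resume sending from the last checkpointed offset."""
    for record in records[checkpointed_upto:]:
        sink.append(record)

records = ["a", "b", "c"]
sink = []
deliver(records, 0, sink)  # first run delivers all three, then crashes...
deliver(records, 2, sink)  # ...but the checkpoint only covered "a" and "b"
print(sink)                # ['a', 'b', 'c', 'c'] -- "c" is duplicated
```

Exactly-once would instead require transactional writes so that the re-sent "c" is never visible twice downstream.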
Sep 28 2023
Unaligned checkpoints didn't work. Maybe it's because data is being moved around to new brokers and Kafka is too overloaded.
@bking Gabriele is currently on sick leave but yes let's try incrementing the helm chart version
Sep 19 2023
Aug 31 2023
Associated GitHub PR: https://github.com/wikimedia/jsonschema-tools/pull/48
Aug 29 2023
Seems like in jsonschema-tools the enums are only validated through ajv, and its strict union type checking allows null, so we'll have to implement the check ourselves.
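A sketch of the kind of manual check hinted at above, assuming the goal is to flag enums that contain JSON null when the declared type doesn't allow it. The walker and the sample schema shape are illustrative, not the actual jsonschema-tools implementation.

```python
def find_null_enum_violations(schema, path=""):
    """Walk a (simplified) JSON Schema dict and flag enums containing
    JSON null where the declared type doesn't include "null"."""
    violations = []
    if isinstance(schema, dict):
        enum = schema.get("enum")
        declared = schema.get("type")
        types = declared if isinstance(declared, list) else [declared]
        if enum is not None and None in enum \
                and declared is not None and "null" not in types:
            violations.append(path or "<root>")
        for key, value in schema.items():
            violations += find_null_enum_violations(value, f"{path}/{key}")
    return violations

# A string-typed field whose enum sneaks in null (JSON null -> Python None):
bad = {"properties": {"data_type": {"type": "string",
                                    "enum": ["number", "string", None]}}}
print(find_null_enum_violations(bad))  # ['/properties/data_type']
```

A check like this could run alongside the existing ajv validation and fail the schema on any reported path.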
Aug 28 2023
While adding a workaround to T344235, I noticed that additionalProperties isn't very well represented in DataHub.
"custom_data": {
  "additionalProperties": {
    "properties": {
      "data_type": {
        "type": "string",
        "enum": ["number", "string", "boolean", "null"]
      }
    }
  },
  "propertyNames": {
    "maxLength": 255,
    "minLength": 1,
    "pattern": "^[$a-z]+[a-z0-9_]*$"
  }
}
It just shows up in DataHub as a Struct with no defined nested fields (which I guess makes sense, but is not helpful).
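For reference, the propertyNames constraints in the fragment above can be checked directly; a minimal stdlib sketch (the sample key names are made up):

```python
import re

# Constraints copied from the custom_data propertyNames fragment above.
NAME_PATTERN = re.compile(r"^[$a-z]+[a-z0-9_]*$")

def valid_property_name(name: str) -> bool:
    """True if the key satisfies the propertyNames rules: 1-255 chars,
    starting with $ or a lowercase letter, then lowercase/digits/underscore."""
    return 1 <= len(name) <= 255 and NAME_PATTERN.match(name) is not None

print(valid_property_name("page_id"))  # True
print(valid_property_name("PageId"))   # False: uppercase not allowed
print(valid_property_name("$schema"))  # True: leading $ is permitted
```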
Aug 22 2023
From the recent meeting:
- Event Streams will be the name of the platform
- Streams are upstream to Kafka topics
After experimenting a lot, I have a Datahub transformer for Kafka that generates an Event Streams platform, adds description, schema, and path. However, I don't know if it should be a transformer since it's doing a bit more than just transforming.
Aug 18 2023
Aug 16 2023
Since Datahub has the concept of platforms, I think the best way forward is to have a separate platform called Event Streams, where the datasets under it are the streams defined in the stream config. We can then keep the Kafka platform for all the individual Kafka topics. We can attach a transform to the current Kafka ingestion recipe that attaches the schemas to the individual topics where supported, and at the same time inserts the streams into the Event Streams platform. This way we can have the schemas on both the stream and its topics.
Jul 28 2023
Jul 12 2023
On the wiki for schema guidelines there's a blanket statement that all modifications should be backwards compatible. I assume this doesn't apply to major version changes, so I'll note that.
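To make the distinction concrete, here's a toy compatibility check under the usual JSON Schema evolution rules (assumed here, not quoted from the wiki): within a major version a new schema may add optional fields, but making a field required or dropping a declared field is a breaking change that warrants a major bump. The schema fragments are hypothetical.

```python
def is_backwards_compatible(old: dict, new: dict) -> bool:
    """Toy check: events valid under `old` must stay valid under `new`.

    Adding optional properties is fine; adding required fields or
    dropping previously-declared ones is a breaking (major) change.
    """
    old_props = set(old.get("properties", {}))
    new_props = set(new.get("properties", {}))
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))
    return new_required <= old_required and old_props <= new_props

v1   = {"properties": {"id": {}, "dt": {}},             "required": ["id"]}
v1_1 = {"properties": {"id": {}, "dt": {}, "tags": {}}, "required": ["id"]}
v2   = {"properties": {"id": {}, "dt": {}},             "required": ["id", "dt"]}

print(is_backwards_compatible(v1, v1_1))  # True: added optional "tags"
print(is_backwards_compatible(v1, v2))    # False: "dt" became required
```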
Jul 10 2023
Jul 6 2023
Jun 21 2023
I could try taking a crack at it
Jun 20 2023
Jun 14 2023
Jun 13 2023
Jun 8 2023
Jun 6 2023
Jun 5 2023
Is there a benefit to doing this in Blubber, though?
The cookiecutter template also does this via a post-generation hook
https://gitlab.wikimedia.org/repos/data-engineering/eventutilities-python/-/blob/main/cookiecutter-event-pipeline/hooks/post_gen_project.py
Ok so just recounting my experiments: