#wikimedia-office: TechCom RFC meeting
Meeting started by RoanKattouw at 21:03:40 UTC
(full logs).
Meeting summary
- Job queue issues (RoanKattouw, 21:03:47)
- https://www.mediawiki.org/wiki/User:Daniel_Kinzler_(WMDE)/Job_Queue
(RoanKattouw,
21:03:54)
- https://etherpad.wikimedia.org/p/JobQueue-ircmeeting
(DanielK_WMDE__,
21:04:01)
- <mobrovac> the scheduling in eventbus
will be per job type, in order of ingress (DanielK_WMDE__,
21:12:55)
- <TimStarling> having fairness of
scheduling between wikis was a deliberate design decision, IMHO
important and useful (DanielK_WMDE__,
21:13:49)
- <Krinkle> The end use case that should
remain is that if a wiki is dormant and I schedule 1 job there, it
should run nearly instantly no matter what. (DanielK_WMDE__,
21:15:10)
- <Krinkle> _joe_: we should confirm then
if the problem is the "wasting of time" on subjectively unimportant
jobs, or the waste on cycles checking/switching wikis. The former
might be a hard sell. (DanielK_WMDE__,
21:16:28)
- if there are a lot of jobs being queued from a
given wiki, it makes sense to defer those jobs for a while so that
deduplication can take effect (DanielK_WMDE__,
21:18:57)
- <brion> queue size as a raw number is
pretty meta... lag is human-scaled :) (DanielK_WMDE__,
21:24:55)
- <DanielK_WMDE__> mobrovac: a LRU list of
job signatures recently seen locally, kept in memory. that would be
quick-and-dirty dedupe before push. (DanielK_WMDE__,
21:25:08)
- <MaxSem|grrrr> why instead of deduping
not just use page_touched to skip [RefreshLinksJobs] quickly?
(DanielK_WMDE__,
21:27:26)
- <Krinkle> I believe it is spending most
time waiting for replag. A job queue write is not complete until
after we wait for all slaves to have replicated the write.
(DanielK_WMDE__,
21:30:42)
- <Platonides> from a wiki user POV, it
should be possible to see the backlog for a wiki, so that you could
know "all changes made before <two weeks> have taken effect"
or "wait two weeks for all transclusions to update" (legoktm,
21:30:52)
- waiting for replication makes sense. DB
throughput is a hard limit on job execution, and should be. batching
can improve that. but batching kills deduplication (DanielK_WMDE__,
21:32:32)
- maybe the runner could keep track of the avg
execution time per job type, and consider that value for scheduling
fairness. so a large job on one wiki would count for many small jobs
on another wiki (DanielK_WMDE__,
21:36:30)
- https://github.com/wikimedia/budgeteer
(gwicke,
21:36:51)
- <TimStarling> for third parties I think
we should do like wordpress and have a cron.php which you hit from
cron with curl (DanielK_WMDE__,
21:41:28)
- Stock MediaWiki job runner
(maintenance/runJobs.php) invokes JobRunner class directly, not over
HTTP. For cache and config consistency, we should consider
standardising on Special:RunJobs over http. (Krinkle,
21:44:15)
- MediaWiki by default will run one job per web
request
<https://www.mediawiki.org/wiki/Manual:$wgJobRunRate>
(legoktm,
21:45:23)
- [with kafka] there is a combination of
concurrency limiting, and cost-based rate limiting; with cost being
typically dominated by execution cost (DanielK_WMDE__,
21:47:40)
- <Krinkle> It sounds like the new stack
performs an HTTP call to MediaWiki/rpc for each individual job,
whereas the current model does it per batch (wiki+job type+batch
limits) (DanielK_WMDE__,
21:48:19)
- <_joe_> So regarding deduplication, I am
unsure how effective it is, because there is actually no way to tell
right now <_joe_> I don't think it would be hard to compute
those data <_joe_> we just don't (DanielK_WMDE__,
21:57:42)
- 14:54:00 <TimStarling> my test wiki now
unconditionally queues 6 jobs per edit, it used to be between 0 and
1 (RoanKattouw,
21:58:15)
- https://grafana.wikimedia.org/dashboard/db/job-queue-rate?panelId=7&fullscreen&orgId=1
(gwicke,
22:00:30)
- https://github.com/wikimedia/mediawiki/commit/cb7c910ba72bdf4c2c2f5fa7e7dd307f98e5138e
(Krinkle,
22:03:05)
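The quick-and-dirty dedupe idea noted at 21:25:08 (an in-memory LRU list of job signatures recently seen locally, consulted before a push) could look roughly like the sketch below. It is only an illustration: the class name RecentJobFilter, the signature format (a SHA-1 of the job type plus JSON-encoded parameters), and the capacity are assumptions, not anything from MediaWiki or from the meeting.

```python
from collections import OrderedDict
import hashlib
import json


class RecentJobFilter:
    """Hypothetical helper: remembers the last `capacity` job signatures."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._seen = OrderedDict()  # signature -> True, oldest entry first

    @staticmethod
    def signature(job_type, params):
        # Assumed signature format: hash of the job type and sorted parameters.
        payload = json.dumps([job_type, params], sort_keys=True)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()

    def should_push(self, job_type, params):
        """Return False if an identical job was seen recently, True otherwise."""
        sig = self.signature(job_type, params)
        if sig in self._seen:
            # Duplicate: refresh its position so it stays in the list longer.
            self._seen.move_to_end(sig)
            return False
        self._seen[sig] = True
        if len(self._seen) > self.capacity:
            # Evict the least recently seen signature.
            self._seen.popitem(last=False)
        return True
```

A caller would check should_push() before enqueueing and drop the job when it returns False. Because the list is local to one process, this only catches duplicates produced by the same process; it does not replace deduplication in the queue itself.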
Meeting ended at 22:05:14 UTC
(full logs).
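The suggestion noted at 21:36:30 (track the average execution time per job type and use it for scheduling fairness, so that one large job counts for many small ones) can be illustrated with a toy scheduler. Everything here is an assumption made for the sketch: the class name CostWeightedScheduler, the moving-average smoothing factor, the default cost for unknown job types, and the alphabetical tie-break between wikis. It is not how the existing job runner or the Kafka-based stack works.

```python
from collections import defaultdict, deque


class CostWeightedScheduler:
    """Hypothetical scheduler: serves the wiki with the least charged cost."""

    def __init__(self, default_cost=1.0, smoothing=0.2):
        self.default_cost = default_cost      # cost assumed for unseen job types
        self.smoothing = smoothing            # weight of the newest measurement
        self.avg_cost = {}                    # job type -> average runtime (s)
        self.charged = defaultdict(float)     # wiki -> estimated cost served
        self.queues = defaultdict(deque)      # wiki -> pending job types (FIFO)

    def enqueue(self, wiki, job_type):
        self.queues[wiki].append(job_type)

    def record_runtime(self, job_type, seconds):
        """Fold a measured runtime into the per-type moving average."""
        old = self.avg_cost.get(job_type)
        if old is None:
            self.avg_cost[job_type] = seconds
        else:
            a = self.smoothing
            self.avg_cost[job_type] = (1 - a) * old + a * seconds

    def next_job(self):
        """Pick the next (wiki, job_type), or None if nothing is pending."""
        candidates = [w for w, q in self.queues.items() if q]
        if not candidates:
            return None
        # Least-charged wiki first; ties are broken by wiki name.
        wiki = min(candidates, key=lambda w: (self.charged[w], w))
        job_type = self.queues[wiki].popleft()
        self.charged[wiki] += self.avg_cost.get(job_type, self.default_cost)
        return wiki, job_type
```

With this accounting, a wiki that has just run one job averaging 10 seconds waits while another wiki runs about ten 1-second jobs. A wiki with nothing charged is served first, which fits the "dormant wiki, one job, runs nearly instantly" use case raised at 21:15:10. A real implementation would also need to decay or reset the charged totals over time, which this sketch leaves out.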
Action items
- (none)
People present (lines said)
- Krinkle (87)
- DanielK_WMDE__ (84)
- _joe_ (78)
- gwicke (60)
- TimStarling (36)
- brion (22)
- RoanKattouw (19)
- Platonides (13)
- mobrovac (13)
- Pchelolo (6)
- legoktm (5)
- wm-labs-meetbot (3)
- MaxSem|grrrr (3)
- bd808 (3)
- Scott_WUaS (2)
- Amir1 (2)
- SMalyshev (2)
- stashbot (1)
- no_justification (1)
- volans (1)
- kaldari (1)
Generated by MeetBot 0.1.4.