Description
For a wiki with a significant number of users, saving a page is slow due to an inefficient implementation of getSubscribers. This is similar to the problem described in MoinMoinBugs/GetSubscribersPerformanceProblem, but it doesn't address the other cause of slowness in the getSubscribers function.
Steps to reproduce
- Have a few thousand user accounts
- truss/strace the web process
- Save a change to a page, and notice that every user file is scanned on every page save.
On this page, go to Info > General Page Infos and notice that it takes 5-7 seconds for the page to load. Most of the time is spend getting subscriber information.
- After implementing a workaround (described below), page saves and "General Page Infos" load significantly faster.
Example
(no example)
Component selection
- This happens in the getSubscribers function of Page.py since it iterates through every user file.
Details
MoinMoin Version |
1.8.3 |
OS and Version |
Solaris 10 |
Python Version |
2.5 |
Server Setup |
apache + mod_wsgi |
Server Details |
|
Language you are using the wiki in (set in the browser/UserPreferences) |
en |
Workaround
Here's a patch to cache page subscribers. This significantly speeds up page saves and the general info page.
Page.py
1 --- Page.py 28 Apr 2009 16:08:04 -0000 1.8
2 +++ Page.py 6 May 2009 05:46:14 -0000
3 @@ -813,7 +813,7 @@
4
5 return link
6
7 - def getSubscribers(self, request, **kw):
8 + def getSubscribersNotCached(self, request, **kw): # original function renamed
9 """ Get all subscribers of this page.
10
11 @param request: the request object
12 @@ -876,6 +876,106 @@
13
14 return subscriber_list
15
16 + # cached version of getSubscribers
17 + def getSubscribers(self, request, **kw):
18 + """ Get all subscribers of this page.
19 +
20 + @param request: the request object
21 + @keyword include_self: if 1, include current user (default: 0)
22 + @keyword return_users: if 1, return user instances (default: 0)
23 + @rtype: dict
24 + @return: lists of subscribed email addresses in a dict by language key
25 + """
26 + include_self = kw.get('include_self', self.include_self)
27 + return_users = kw.get('return_users', 0)
28 +
29 + request.clock.start('getsubscribers')
30 + # extract categories of this page
31 + pageList = self.getCategories(request)
32 +
33 + # add current page name for list matching
34 + pageList.append(self.page_name)
35 +
36 + arena = 'user'
37 + key = 'page_sub'
38 +
39 + # get or create cache file
40 + page_sub = {}
41 + cache = caching.CacheEntry(request, arena, key, scope='wiki', use_pickle=True)
42 + if cache.exists():
43 + page_sub = cache.content()
44 + else:
45 + #build a cache if it doesn't exist
46 + userlist = user.getUserList(request)
47 + for uid in userlist:
48 + subscriber = user.User(request, uid)
49 + # we don't care about storing entries for users without any page subscriptions
50 + if subscriber.subscribed_pages:
51 + page_sub[subscriber.id] = {
52 + 'name': subscriber.name,
53 + 'email': subscriber.email,
54 + 'subscribed_pages': subscriber.subscribed_pages
55 + }
56 + cache.update(page_sub)
57 +
58 + if self.cfg.SecurityPolicy:
59 + UserPerms = self.cfg.SecurityPolicy
60 + else:
61 + from MoinMoin.security import Default as UserPerms
62 +
63 + # get email addresses of the all wiki user which have a profile stored;
64 + # add the address only if the user has subscribed to the page and
65 + # the user is not the current editor
66 + userlist = page_sub
67 + subscriber_list = {}
68 +
69 + pages = pageList[:]
70 + if request.cfg.interwikiname:
71 + pages += ["%s:%s" % (request.cfg.interwikiname, pagename) for pagename in pageList]
72 + # Create text for regular expression search
73 + text = '\n'.join(pages)
74 +
75 + for uid in userlist.keys():
76 + if uid == request.user.id and not include_self:
77 + continue # no self notification
78 +
79 + isSubscribed = False
80 +
81 + # This is a bit wrong if return_users=1 (which implies that the caller will process
82 + # user attributes and may, for example choose to send an SMS)
83 + # So it _should_ be "not (subscriber.email and return_users)" but that breaks at the moment.
84 + if not userlist[uid]['email']:
85 + continue # skip empty email addresses
86 +
87 + # now check patters for actual match
88 + for pattern in userlist[uid]['subscribed_pages']:
89 + if pattern in pages:
90 + isSubscribed = True
91 + try:
92 + pattern = re.compile(r'^%s$' % pattern, re.M)
93 + except re.error:
94 + continue
95 + if pattern.search(text):
96 + isSubscribed = True
97 +
98 + # only if subscribed, then read user info from file
99 + if isSubscribed:
100 + subscriber = user.User(request, uid)
101 +
102 + if not UserPerms(subscriber).read(self.page_name):
103 + continue
104 +
105 + lang = subscriber.language or request.cfg.language_default
106 + if not lang in subscriber_list:
107 + subscriber_list[lang] = []
108 + if return_users:
109 + subscriber_list[lang].append(subscriber)
110 + else:
111 + subscriber_list[lang].append(subscriber.email)
112 +
113 + request.clock.stop('getsubscribers')
114 + return subscriber_list
115 +
116 def parse_processing_instructions(self):
117 """ Parse page text and extract processing instructions,
118 return a dict of PIs and the non-PI rest of the body.
user.py
1 --- user.py 27 Apr 2009 23:29:15 -0000 1.1.1.5
2 +++ user.py 6 May 2009 05:48:58 -0000
3 @@ -596,6 +596,9 @@
4 event = events.UserCreatedEvent(self._request, self)
5 events.send_event(event)
6
7 + # update page subscriber's cache after saving user preferences
8 + self.updatePageSubCache()
9 +
10 # -----------------------------------------------------------------
11 # Time and date formatting
12
13 @@ -783,6 +786,34 @@
14 self.save()
15 return not self.isSubscribedTo([pagename])
16
17 + # update page subscribers cache
18 + def updatePageSubCache(self):
19 + """ When a user changes his preferences, we update the
20 + page subscriber's cache
21 + """
22 +
23 + arena = 'user'
24 + key = 'page_sub'
25 +
26 + page_sub = {}
27 + cache = caching.CacheEntry(self._request, arena, key, scope='wiki', use_pickle=True)
28 + if not cache.exists():
29 + return # if no cache file, just don't do anything
30 +
31 + page_sub = cache.content()
32 +
33 + # we don't care about storing entries for users without any page subscriptions
34 + if self.subscribed_pages:
35 + page_sub[self.id] = {
36 + 'name': self.name,
37 + 'email': self.email,
38 + 'subscribed_pages': self.subscribed_pages
39 + }
40 + elif page_sub.get(self.id):
41 + del page_sub[self.id]
42 +
43 + cache.update(page_sub)
44 +
45 # -----------------------------------------------------------------
46 # Quicklinks
I know that this duplicates some code in other places, but I wanted to make the diff as self contained as possible.
Discussion
I reviewed the patch (didn't try it yet) and it looks quite good (for the goal of touching not too much code, but as you noted, it duplicates some code and that should be solved later, also we likely need more tests).
What's maybe missing (see user.User.isSubscribedTo) is the .valid check - it has to be checked when .valid is True.
So:
- can you check the .valid stuff?
- check the patch with 1.8 and 1.9 repo code?
Attached an updated patch after feedback from Thomas that it could do with some locking - see subscribercache.patch
User Feedback?
Who is using this patch? Can you tell about your results with it?
Used by GNOME (http://live.gnome.org/). We have a large amount of users and MoinMoin is unworkable without it. We actually got offers to fully handle the switch to another Wiki software (this is pretty complicated due to the size of our wiki).
- see the 2 unanswered questions from above (discussion section).
- it is rather unlikely that moin 1.8 will see any "feature changes"
- for moin 1.9, only well-tested and maintained stuff will be acceptable
- moin2 has a sqlalchemy-based index for some stuff and likely will also use it for users.
We'd like to use this in the Debian Wiki (http://wiki.debian.org/), as we're seeing the same problem there.
Plan
- Priority:
- Assigned to:
- Status:
CategoryMoinMoinBug CategoryFeaturePatched CategoryMoinMoinPatch
