Description

For a wiki with a significant number of users, saving a page is slow due to an inefficient implementation of getSubscribers. This is similar to the problem described in MoinMoinBugs/GetSubscribersPerformanceProblem, but it doesn't address the other cause of slowness in the getSubscribers function.

Steps to reproduce

  1. Have a few thousand user accounts
  2. truss/strace the web process
  3. Save a change to a page, and notice that every user file is scanned on every page save.
  4. On this page, go to Info > General Page Infos and notice that it takes 5-7 seconds for the page to load. Most of the time is spend getting subscriber information.

  5. After implementing a workaround (described below), page saves and "General Page Infos" load significantly faster.

Example

(no example)

Component selection

Details

MoinMoin Version

1.8.3

OS and Version

Solaris 10

Python Version

2.5

Server Setup

apache + mod_wsgi

Server Details

Language you are using the wiki in (set in the browser/UserPreferences)

en

Workaround

Here's a patch to cache page subscribers. This significantly speeds up page saves and the general info page.

Page.py

   1 --- Page.py     28 Apr 2009 16:08:04 -0000      1.8
   2 +++ Page.py     6 May 2009 05:46:14 -0000
   3 @@ -813,7 +813,7 @@
   4 
   5          return link
   6 
   7 -    def getSubscribers(self, request, **kw):
   8 +    def getSubscribersNotCached(self, request, **kw):  # original function renamed
   9          """ Get all subscribers of this page.
  10 
  11          @param request: the request object
  12 @@ -876,6 +876,106 @@
  13 
  14          return subscriber_list
  15 
  16 +    # cached version of getSubscribers
  17 +    def getSubscribers(self, request, **kw):
  18 +        """ Get all subscribers of this page.
  19 +
  20 +        @param request: the request object
  21 +        @keyword include_self: if 1, include current user (default: 0)
  22 +        @keyword return_users: if 1, return user instances (default: 0)
  23 +        @rtype: dict
  24 +        @return: lists of subscribed email addresses in a dict by language key
  25 +        """
  26 +        include_self = kw.get('include_self', self.include_self)
  27 +        return_users = kw.get('return_users', 0)
  28 +
  29 +        request.clock.start('getsubscribers')
  30 +        # extract categories of this page
  31 +        pageList = self.getCategories(request)
  32 +
  33 +        # add current page name for list matching
  34 +        pageList.append(self.page_name)
  35 +
  36 +        arena = 'user'
  37 +        key = 'page_sub'
  38 +
  39 +        # get or create cache file
  40 +        page_sub = {}
  41 +        cache = caching.CacheEntry(request, arena, key, scope='wiki', use_pickle=True)
  42 +        if cache.exists():
  43 +            page_sub = cache.content()
  44 +        else:
  45 +            #build a cache if it doesn't exist
  46 +            userlist = user.getUserList(request)
  47 +            for uid in userlist:
  48 +                subscriber = user.User(request, uid)
  49 +                # we don't care about storing entries for users without any page subscriptions
  50 +                if subscriber.subscribed_pages:
  51 +                    page_sub[subscriber.id] = {
  52 +                        'name': subscriber.name,
  53 +                        'email': subscriber.email,
  54 +                        'subscribed_pages': subscriber.subscribed_pages
  55 +                    }
  56 +            cache.update(page_sub)
  57 +
  58 +        if self.cfg.SecurityPolicy:
  59 +            UserPerms = self.cfg.SecurityPolicy
  60 +        else:
  61 +            from MoinMoin.security import Default as UserPerms
  62 +
  63 +        # get email addresses of the all wiki user which have a profile stored;
  64 +        # add the address only if the user has subscribed to the page and
  65 +        # the user is not the current editor
  66 +        userlist = page_sub
  67 +        subscriber_list = {}
  68 +
  69 +        pages = pageList[:]
  70 +        if request.cfg.interwikiname:
  71 +            pages += ["%s:%s" % (request.cfg.interwikiname, pagename) for pagename in pageList]
  72 +        # Create text for regular expression search
  73 +        text = '\n'.join(pages)
  74 +
  75 +        for uid in userlist.keys():
  76 +            if uid == request.user.id and not include_self:
  77 +                continue # no self notification
  78 +
  79 +            isSubscribed = False
  80 +
  81 +            # This is a bit wrong if return_users=1 (which implies that the caller will process
  82 +            # user attributes and may, for example choose to send an SMS)
  83 +            # So it _should_ be "not (subscriber.email and return_users)" but that breaks at the moment.
  84 +            if not userlist[uid]['email']:
  85 +                continue # skip empty email addresses
  86 +
  87 +            # now check patters for actual match
  88 +            for pattern in userlist[uid]['subscribed_pages']:
  89 +                if pattern in pages:
  90 +                    isSubscribed = True
  91 +                try:
  92 +                    pattern = re.compile(r'^%s$' % pattern, re.M)
  93 +                except re.error:
  94 +                    continue
  95 +                if pattern.search(text):
  96 +                    isSubscribed = True
  97 +
  98 +            # only if subscribed, then read user info from file
  99 +            if isSubscribed:
 100 +                subscriber = user.User(request, uid)
 101 +
 102 +                if not UserPerms(subscriber).read(self.page_name):
 103 +                    continue
 104 +
 105 +                lang = subscriber.language or request.cfg.language_default
 106 +                if not lang in subscriber_list:
 107 +                    subscriber_list[lang] = []
 108 +                if return_users:
 109 +                    subscriber_list[lang].append(subscriber)
 110 +                else:
 111 +                    subscriber_list[lang].append(subscriber.email)
 112 +
 113 +        request.clock.stop('getsubscribers')
 114 +        return subscriber_list
 115 +
 116      def parse_processing_instructions(self):
 117          """ Parse page text and extract processing instructions,
 118              return a dict of PIs and the non-PI rest of the body.

user.py

   1 --- user.py     27 Apr 2009 23:29:15 -0000      1.1.1.5
   2 +++ user.py     6 May 2009 05:48:58 -0000
   3 @@ -596,6 +596,9 @@
   4              event = events.UserCreatedEvent(self._request, self)
   5              events.send_event(event)
   6 
   7 +        # update page subscriber's cache after saving user preferences
   8 +        self.updatePageSubCache()
   9 +
  10      # -----------------------------------------------------------------
  11      # Time and date formatting
  12 
  13 @@ -783,6 +786,34 @@
  14              self.save()
  15          return not self.isSubscribedTo([pagename])
  16 
  17 +    # update page subscribers cache
  18 +    def updatePageSubCache(self):
  19 +        """ When a user changes his preferences, we update the
  20 +        page subscriber's cache
  21 +        """
  22 +
  23 +        arena = 'user'
  24 +        key = 'page_sub'
  25 +
  26 +        page_sub = {}
  27 +        cache = caching.CacheEntry(self._request, arena, key, scope='wiki', use_pickle=True)
  28 +        if not cache.exists():
  29 +            return  # if no cache file, just don't do anything
  30 +
  31 +        page_sub = cache.content()
  32 +
  33 +        # we don't care about storing entries for users without any page subscriptions
  34 +        if self.subscribed_pages:
  35 +            page_sub[self.id] = {
  36 +                'name': self.name,
  37 +                'email': self.email,
  38 +                'subscribed_pages': self.subscribed_pages
  39 +            }
  40 +        elif page_sub.get(self.id):
  41 +            del page_sub[self.id]
  42 +
  43 +        cache.update(page_sub)
  44 +
  45      # -----------------------------------------------------------------
  46      # Quicklinks

I know that this duplicates some code in other places, but I wanted to make the diff as self contained as possible.

Discussion

I reviewed the patch (didn't try it yet) and it looks quite good (for the goal of touching not too much code, but as you noted, it duplicates some code and that should be solved later, also we likely need more tests).

What's maybe missing (see user.User.isSubscribedTo) is the .valid check - it has to be checked when .valid is True.

So:

Attached an updated patch after feedback from Thomas that it could do with some locking - see subscribercache.patch

User Feedback?

Who is using this patch? Can you tell about your results with it?

Plan


CategoryMoinMoinBug CategoryFeaturePatched CategoryMoinMoinPatch

MoinMoin: MoinMoinBugs/GetSubscribersSlow (last edited 2012-04-29 18:28:02 by lump)