Description

I hit a UnicodeDecodeError in the xmlrpc_mergeDiff function in MoinMoin/xmlrpc/__init__.py.

The error is raised in the following code:

# generate the new page revision by applying the diff
newcontents = patch(basepage.get_raw_body_str(), decompress(str(diff)))
#print "Diff against %r" % basepage.get_raw_body_str()

# write page
try:
    currentpage.saveText(newcontents.decode("utf-8"), last_remote_rev or 0, comment=comment)
except PageEditor.Unchanged: # could happen in case of both wiki's pages being equal
    pass
except PageEditor.EditConflict:
    return LASTREV_INVALID

The binary patch() can probably break UTF-8 characters. My wiki is in Russian, so pages contain multibyte characters. As a result, SyncPages does not work.
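
To illustrate the suspicion above, here is a minimal sketch of how a byte-level patch can split a multibyte character. It is written for current Python rather than the 2.6 used in this report, and apply_byte_patch is a hypothetical stand-in: it is not MoinMoin's patch() and does not use its diff format. It only shows that a byte offset that is harmless in the base the diff was made against can land inside a two-byte Cyrillic character in a different base.

```python
# Toy byte-level patch; NOT MoinMoin's patch() or its diff format.

def apply_byte_patch(base, offset, length, replacement):
    """Replace `length` bytes of `base`, starting at `offset`, with `replacement`."""
    return base[:offset] + replacement + base[offset + length:]


def is_valid_utf8(data):
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Base the diff was computed against (ASCII: one byte per character).
expected_base = u"Wiki page A".encode("utf-8")
# Base actually present on the other side (Cyrillic: two bytes per character).
actual_base = u"Привет вики".encode("utf-8")

# Hypothetical patch: "replace the 4 bytes at offset 5 with b'site'".
patched_ok = apply_byte_patch(expected_base, 5, 4, b"site")
patched_bad = apply_byte_patch(actual_base, 5, 4, b"site")

print(is_valid_utf8(patched_ok))   # the intended base stays decodable
print(is_valid_utf8(patched_bad))  # offset 5 falls inside a 2-byte character
```

With ASCII-only pages the same mismatch would decode without complaint and simply yield wrong text, which matches the silent corruption described in this report.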

Steps to reproduce

  1. Make page A in a remote wiki.
    •     Revision 1.1
  2. Do sync
  3. Take full cold backup of the remote wiki.
  4. Edit page A in the remote wiki.
    •     Revision 1.1
          Revision 1.2
  5. Do sync
  6. Restore the remote wiki from backup.
  7. Edit page A in the remote wiki. It will get the same revision number 2, but will have different content.
    •     Revision 1.1
          Revision 2.2
  8. Edit page A in the remote wiki one more time to trigger synchronization.
    •     Revision 1.1
          Revision 2.2
          Revision 2.3
  9. Do sync.
  10. Local page A gets wrong content
    •     Revision 1.1
          Revision 1.2 <-- ERROR: should be Revision 2.2
          Revision 2.3

Sorry, no stack trace for these steps, but the page is corrupted. If the page contained multibyte characters, the bad merge could produce an invalid character, and that would raise an exception with a stack trace.
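
The reproduction steps can be modelled with a toy revision store. This is only a sketch: ToyWiki and its methods are hypothetical and have nothing to do with MoinMoin's storage code. It shows that after a restore from backup, the same revision number can refer to different content than the one that was synchronized.

```python
import copy


class ToyWiki(object):
    """Hypothetical single-page wiki: a list of revision texts."""

    def __init__(self):
        self.revisions = []  # index 0 holds revision 1

    def edit(self, text):
        """Store a new revision and return its revision number."""
        self.revisions.append(text)
        return len(self.revisions)

    def get(self, rev):
        """Return the text of revision number `rev`."""
        return self.revisions[rev - 1]


remote = ToyWiki()
remote.edit("Revision 1.1")                              # step 1
backup = copy.deepcopy(remote)                           # step 3
synced_rev = remote.edit("Revision 1.1\nRevision 1.2")   # step 4
synced_text = remote.get(synced_rev)                     # step 5: content the local wiki received
remote = copy.deepcopy(backup)                           # step 6
reused_rev = remote.edit("Revision 1.1\nRevision 2.2")   # step 7

print(synced_rev == reused_rev)                 # same revision number
print(synced_text == remote.get(reused_rev))    # but not the same content
```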

Example

Component selection

Details

MoinMoin Version

1.9.3

OS and Version

Ubuntu 10.10

Python Version

2.6.6

Server Setup

Standalone

Server Details

Language you are using the wiki in (set in the browser/UserPreferences)

Russian (ru_RU)

Workaround

Unknown

Discussion

2010-11-07

Hit another UnicodeDecodeError, with the following stack trace:

...
File "***/MoinMoin/wsgiapp.py", line 195, in handle_action
  handler(context.page.page_name, context)
File "***/MoinMoin/action/SyncPages.py", line 519, in execute
  ActionClass(pagename, request).render()
File "***/MoinMoin/action/SyncPages.py", line 220, in render
  self.sync(params, local, remote)
File "***/MoinMoin/action/SyncPages.py", line 515, in sync
  rpc_aggregator.scheduler(remote.create_multicall_object, handle_page, m_pages, 8, remote.prepare_multicall)
File "***/MoinMoin/util/rpc_aggregator.py", line 73, in scheduler
  call = gen.fetch_call()
File "***/MoinMoin/util/rpc_aggregator.py", line 32, in fetch_call
  next_item = self._gen.next()
File "***/MoinMoin/action/SyncPages.py", line 442, in run
  remote_contents_unicode = remote_contents.decode("utf-8")
File "/usr/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)

I think it is the same error, but it occurred on the local side instead of the remote one. The debugger shows that SyncPages.py tries to reconstruct the current page contents from the local base and the remote diff, while the local base revision and the remote base revision have different contents. The result is garbage. Most such merges may complete without errors. I was lucky that the patch broke a UTF-8 character, so I got a chance to notice the page corruption.

I suppose the cause of the issue is frequent wiki backup/restore (same as in MoinMoinBugs/1.9WikiSyncCorruptedSynctags). After a restore, the page revision number may step back. Then, after some edits, the page reaches the same revision number it had at sync time, but probably with different content. A subsequent sync will blindly use this different revision in place of the old, lost revision and construct an invalid diff.
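
One conceivable guard against this, sketched here only as an assumption and not as how SyncPages or the sync protocol works: send a digest of the base content along with the diff, and refuse to patch when the receiving side's base does not match it. The names content_digest, checked_merge and BaseMismatch are hypothetical, and the example uses current Python.

```python
import hashlib


class BaseMismatch(Exception):
    """Raised when the local base is not the content the diff was made against."""


def content_digest(raw_body):
    """Hex SHA-1 digest of a page body given as bytes."""
    return hashlib.sha1(raw_body).hexdigest()


def checked_merge(local_base, expected_digest, apply_diff):
    """Call apply_diff(local_base) only if local_base matches expected_digest."""
    if content_digest(local_base) != expected_digest:
        raise BaseMismatch("base revision content differs; refusing to patch")
    return apply_diff(local_base)


# Revision 2 as the restored remote wiki sees it, and as the local wiki stored it.
remote_base = u"Revision 1.1\nRevision 2.2\n".encode("utf-8")
local_base = u"Revision 1.1\nRevision 1.2\n".encode("utf-8")
diff_base_digest = content_digest(remote_base)  # would travel with the diff


def append_line(body):
    """Stand-in for applying the remote diff (adds the 2.3 edit)."""
    return body + b"Revision 2.3\n"


try:
    checked_merge(local_base, diff_base_digest, append_line)
except BaseMismatch as err:
    print("merge refused:", err)
```

A check like this would turn silent corruption into an explicit failure; what to do after the failure (for example, falling back to a full-page transfer) is a separate design question.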

Will try to make a test case.

I can't currently reproduce the merge issue but half of the issue is the same as MoinMoinBugs/1.9WikiSyncCorruptedSynctags -- ReimarBauer 2012-02-05 12:24:23

Plan


CategoryMoinMoinBug

MoinMoin: MoinMoinBugs/1.9WikiSyncUnicodeDecodeErrorInMergeDiff (last edited 2012-02-05 12:24:23 by ReimarBauer)