Boris Bobrov

E-Mail

<me AT bbreton DOT org>

Alternate Contact

<breton AT ARGH cynicmansion ARGH DOT ru>, xmpp:breton@jabber.ru, xmpp:breton@unstable.nl

Homepage/blog/wiki URL
none yet
Country (born / living in)
Uzbekistan
Academic experience
Second-year student at Moscow State University, Tashkent branch, Faculty of Applied Mathematics and Informatics
Your current occupation
Tashkent, Uzbekistan
Software projects you have already participated in
  • FOSS: https://github.com/bretonium/ - nothing big though

  • Closed source: a large catalog site for a local company in Python/Django - about 11k LOC including HTML templates, mostly done single-handedly, with AJAX and filtering of items based on admin-defined parameters; plus a couple of smaller sites (2-3k lines each).

Experience Level

Experience with google summer of code
none
Experience in coding in general
started 2009
Experience in Python coding
started 2010
Experience in HTML
started 2007
Experience in CSS
started 2007
Experience in Javascript
started 2008
Your favorite programming language(s), best first
I don't know yet, though I like Lisp more and more.
Tools you use for development
For everything: vim, git, Google. For Python: pdb. For HTML/CSS/JS: Firefox + Firebug.
Did you already do full day work (8h/5d) over some weeks on some software project yet?
N (that project was more freelance-like - time planning by myself, etc.).
If not, is your motivation good enough that you think you can do that for MoinMoin?
Y.

Project(s) you apply for and your ideas for them

From GoogleSoc2012/InitialProjectIdeas:

Branching, merging and syncing

moin2 revision metadata has some foundations for the following operations, but they need to be implemented:
 * support multiple heads, multiple parents
 * merge heads, meta and data
 * user interface
 * transfer changes between wikis, requires: network api
 * offline editor wrapper

Branches

Currently, when two revisions of "Item0" are edited at the same time (without loss of generality the idea applies to more than two revisions), the one that gets saved last (called "Item2" below) becomes "current". The other one (called "Item1") gets saved with the same PARENTID, but before "Item2".

I plan to show the last user that someone else edited this Item at the same time and that these two versions are not merged. Until he (or someone else) merges these versions, a notification will be shown at the top of the article, which will still display the "Item0" revision.

TODO: decide whether to allow the user to edit the article before he has merged it. +0 for allowing it from me.
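
To make this more concrete, here is a minimal sketch of how two revisions saved with the same PARENTID form two unmerged heads. The dict-based store and all the names are assumptions for illustration, not actual moin2 code.

def heads(revisions):
    """Return the revisions that are not the parent of any other revision."""
    all_parents = set()
    for rev in revisions.values():
        all_parents.update(rev['parents'])
    return [rev for revid, rev in revisions.items() if revid not in all_parents]

revisions = {
    'rev0': {'parents': [], 'name': 'Item0'},
    'rev1': {'parents': ['rev0'], 'name': 'Item1'},  # saved first
    'rev2': {'parents': ['rev0'], 'name': 'Item2'},  # saved last, becomes "current"
}

unmerged = heads(revisions)
if len(unmerged) > 1:
    # this is the situation in which the "not merged yet" notification would be shown
    print('unmerged heads: %s' % sorted(rev['name'] for rev in unmerged))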

Another use of branches will be separating a "reviewed and shown" version from the "current" version, which can be convenient for organizations that want to premoderate content. This will require the possibility to forbid editing a branch. I am afraid I will not have enough time to do that, so I will not include it in my proposal and will only do it if I have spare time at the end of the work.

Long-living branches are required for network synchronization, see below.

I cannot yet say definitely how these branches should be displayed to the user. One possible way is coloring the rows of the same branch on +history; another is indenting the tables of the same branch.

Merging

When the user presses "merge", he will be taken to a page where the two different versions are shown. In moin2 all(?) objects are Items, so we have to define a merge method for each kind of Item (they currently live in items/__init__.py). Templates will be rendered in a similar way as for showing or editing an item (_render_data or do_modify, though I prefer _render_data: it is the "controller" and Item is the "model", and the model shouldn't know anything about how it will be represented).

After the user presses "save", a new version with multiple PARENTIDs gets saved. If someone tries to revert the change to Item1 or Item2, Item0 will become "current" and a message about the required merge will be shown. Cases with a non-existent parent also need to be handled.
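
A rough sketch of what the meta of such a merge revision could look like; the helper and the exact key names besides ITEMID/PARENTID are assumptions, not the real storage API. PARENTID is assumed to hold a list of parent revids.

def make_merge_revision(itemid, head1_revid, head2_revid, merged_data):
    """Build a revision whose PARENTID lists both heads, marking them as merged."""
    meta = {
        'ITEMID': itemid,
        'PARENTID': [head1_revid, head2_revid],  # multiple parents = merge revision
    }
    return {'meta': meta, 'data': merged_data}

merge_rev = make_merge_revision('item0-id', 'rev1', 'rev2',
                                'text of Item1 and Item2 as merged by the user')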

Network and syncing

In moin1 a specially crafted page was used to enumerate the pages to sync. For moin2 I suggest staying with this approach, but creating a special item type for listing the pages to sync, with a custom form for editing it. The custom type will be defined in items/__init__.py.
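
A minimal sketch of such a type; SyncList and its methods are hypothetical names for illustration, not existing moin2 classes. The item data is simply one item name per line, which the custom edit form would present.

class SyncList(object):
    """Hypothetical item type whose data is a list of item names to sync."""

    def __init__(self, names=None):
        self.names = list(names or [])

    @classmethod
    def parse(cls, text):
        # one item name per line; blank lines and '#' comments are ignored
        names = [line.strip() for line in text.splitlines()
                 if line.strip() and not line.lstrip().startswith('#')]
        return cls(names)

    def serialize(self):
        return '\n'.join(self.names)

sync_list = SyncList.parse("HomePage\nHelpContents\n# not synced\n")
assert sync_list.names == ['HomePage', 'HelpContents']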

A POST request is sent to the remote server with the names of the pages and the current revisions to retrieve, and with user credentials (if required). For each name in the list the server will recursively get the Item, and also each name from its "itemtransclusions".

The synchronization will be two-way, i.e. the user will be able to "push" data to the remote server and to "pull" data from it.

The process of pull (for example) can be described as follows:

while queue1:
    1. The client sends a POST request to the remote server to get revision information; the POST contains login and password;
    2. The server responds with the list of revisions of the item;
    3. The client compares the revids with its own list and finds the diff;
    4. All the revids from the diff are put into queue2.

while queue2:
    1. For each revid in queue2, get the item (using the format defined in storage/middleware/serialization.py), sending login and password in the POST. Possible alternative variants: "get by list" and "get all items with revid > this".
    2. Add each name from its transclusions and links to queue1 if it is not in the "already retrieved" list.

# for push, login and password must also be supplied as arguments of the POST request

Example of queue1 initialization:

queue1 = []  # or a Queue.Queue() if retrieval becomes multithreaded
queue2 = []
queue1.append('ItemName')
while queue1 or queue2:
    do_stuff_with_queue1()
    do_stuff_with_queue2()
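
A rough sketch of what the two queue-processing steps could do, with requests-style HTTP calls; the /+net/ URLs, the JSON field names and the helper functions are assumptions made up for illustration - the real wire format would be the one from serialization.py.

import requests

REMOTE = 'http://remote.example.org'       # assumed remote wiki URL
LOGIN, PASSWORD = 'user', 'secret'
queue1, queue2 = ['ItemName'], []          # the queues from the initialization example above
already_retrieved = set(['ItemName'])

def local_revids(name):
    return set()                           # placeholder: look up revids in our own storage

def store_locally(rev):
    pass                                   # placeholder: deserialization via serialization.py

def do_stuff_with_queue1():
    # steps 1-4 of the pull: ask the remote wiki for revision info,
    # then queue the revids we do not have yet
    if not queue1:
        return
    name = queue1.pop(0)
    resp = requests.post(REMOTE + '/+net/+history/' + name,
                         data={'login': LOGIN, 'password': PASSWORD})
    for revid in set(resp.json()['revids']) - local_revids(name):
        queue2.append((name, revid))

def do_stuff_with_queue2():
    # fetch one revision and queue every linked or transcluded item name
    if not queue2:
        return
    name, revid = queue2.pop(0)
    resp = requests.post(REMOTE + '/+net/+get/' + name,
                         data={'revid': revid, 'login': LOGIN, 'password': PASSWORD})
    rev = resp.json()
    store_locally(rev)
    for linked in rev['meta'].get('itemlinks', []) + rev['meta'].get('itemtransclusions', []):
        if linked not in already_retrieved:
            already_retrieved.add(linked)
            queue1.append(linked)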

"get all items with revid > this" can be done similar to '/+history/<itemname:item_name>'.

"get all items with revid > this" is one of possible options to code. It's highly unlikely that I will code that, because we can get problems when something gets changed in the past revisions and we shall have to retrieve a lot of unnecessary revisions.

Options:

All the retrieving could be done in different threads, as assumed in the code above; however, I will not implement parallel retrieving as part of GSoC 2012 and will do simple single-process retrieving instead.

After sync, items will be "fast-forward merged" (in git terms) if their ITEMID and PARENTID match; otherwise a request for merging will appear (see "notifications" and "branches" above). An option for forced replacement will be available; if it is checked, no merge request will appear anywhere and items will simply be replaced.
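
A minimal sketch of the fast-forward check over plain dict metadata (not the actual storage API), assuming PARENTID holds a list of parent revids as in the merge sketch above: the incoming revision can simply replace the local one if the local current revision is among its ancestors, otherwise a merge request is raised.

def can_fast_forward(local_current, remote_rev, remote_revs_by_id):
    """True if local_current is an ancestor of remote_rev for the same ITEMID."""
    if local_current['ITEMID'] != remote_rev['ITEMID']:
        return False
    seen, stack = set(), [remote_rev]
    while stack:
        rev = stack.pop()
        if rev['REVID'] == local_current['REVID']:
            return True
        for parent_id in rev.get('PARENTID', []):
            if parent_id not in seen:
                seen.add(parent_id)
                stack.append(remote_revs_by_id[parent_id])
    return False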

Network API (deprecated)

The idea below was criticized for not being general and for requiring a lot of copy-paste, but I will leave it here temporarily. The actual idea is in the chapter labelled "actual" below.

The idea is to implement the network API by slightly modifying the views. Currently they look like this:

@frontend.route('/+prefix/<param:param>', methods=xxx)
def view(param):
    do_stuff_with_param()
    item = Item.get_item(param)
    item = Item.render(item)
    return render_template('template.html', item=item)

I suggest having something like this:

@frontend.route('/+prefix/<param:param>', methods=xxx, defaults=dict(net=False))
@frontend.route('/+net/+prefix/<param:param>', methods=xxx, defaults=dict(net=True))
def view(param, net):
    do_stuff_with_param()
    item = Item.get_item(param)  # [0]
    if not net:
        item = Item.render(item)
        return render_template('template.html', item=item)
    else:
        item = Item.serialize(item)
        return item

The code above is very abstract and will be applied to forms too.

[0]: the item here is not rendered yet and is in the internal moin mimetype. This can be changed by passing parameters to this view.

The result of .serialize(item) will be JSON like this:

{
    "page": '+modify',
    "forms":
    [
        {
            'name': 'NAME',
            'fields': ['field1', 'field2', 'field3']  # with indication of requiredness etc.
        }
    ],
    "item": '<serialized item with serialize_rev from serialization.py>'
}

Why this approach? It will allow reusing existing forms and modifying forms without changing the views. Serialization will be exactly the same for all mimetypes - no need to add support for each kind of Item.

Network API (actual)

The idea is to implement the network API by modifying render_template (or by doing something like render_template = serialize_items). In fact, render_template itself will not be modified; instead, an additional layer between get_item() and render_template() will be added.

Now:

def render_template(template, item, form=None, nav=None):
    return do_stuff()

@frontend.route('/+prefix/<param:param>', methods=xxx)
def view(param):
    do_stuff_with_param()
    item = Item.get_item(param)
    item = Item.render(item)
    return render_template('template.html', item=item)

I suggest having something like this:

def render_template(template, item, **whatever):
    return do_stuff(item, whatever)

def process_soup(htmlize, serialize=default_serializer, **kwargs):
    if flaskg.is_remote:
        return serialize(**kwargs)  # maybe kwargs will be modified somehow
    else:
        return htmlize(**kwargs)

@frontend.route('/+prefix/<param:param>', methods=xxx, defaults=dict(net=False))
@frontend.route('/+net/+prefix/<param:param>', methods=xxx, defaults=dict(net=True))
def view(param, net):
    def htmlize_for_view1(item, form, nav):
        do_stuff1(item)
        do_stuff2(form)
        do_stuff3(nav)
        return render_template('template.html', item, form=form, nav=nav, new_arg=new_arg)

    do_stuff_with_param()
    item_soup = Item.get_item_in_default_mimetype(param)  # [0]
    return process_soup(htmlize_for_view1, item=item_soup, form=some_form, nav=nav_data)

The result of serialize(soup) will be JSON like this:

{
    "page": '+modify',
    "forms":
    [
        {
            'name': 'NAME',
            'fields': ['field1', 'field2', 'field3']  # with indication of requiredness etc.
        }
    ],
    "item": '<serialized item with serialize_rev from serialization.py>'
}

Why this approach? It will allow reusing existing forms and modifying forms without changing the views. Serialization will be exactly the same for all mimetypes - no need to add support for each kind of Item.
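
For illustration, a hedged sketch of how an offline client could consume such a response; the /+net/ URL, the HTTP method and the JSON fields just follow the example above and are not a finished API.

import requests

resp = requests.get('http://wiki.example.org/+net/+modify/SomeItem')
payload = resp.json()

# rebuild the edit form from the serialized form description
for form in payload['forms']:
    print('form %s with fields: %s' % (form['name'], ', '.join(form['fields'])))

# the item itself is in the serialization.py wire format
serialized_item = payload['item']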

There are a number of problems with this solution.

  1. Existing code. Sometimes it does not allow getting the form and the not-yet-rendered item in one place, and the relevant logic is smeared over a number of places.
  2. Time. I missed this part unintentionally during planning and now realize that I can't squeeze it into my timeline.

Comments

Not sure why you are suggesting an API based on the "views" level; my idea was to just expose the storage API. Also, keep in mind that some items could be rather large, so the "just put everything into JSON" approach might not work.

The existing storage API only allows us to retrieve an item - not to create, modify, revert, delete or destroy it - and we need these features for the offline client. Or should offline client authors have to figure out by themselves what to POST/GET?

Also, the meta of all revisions is required by the offline client for history/reverting - the meta, but not the actual text/data.

I just don't want two things (views and network API) to require changes whenever something in the model changes - for example, when a new field is added to a form or a field (name) is renamed. Also, edit forms are currently not defined in one place - can we be sure they have the same field names?

About large items: for them we can pass a URL in the JSON from which the offline client can fetch the data. We can do so for all non-text items, or pass such a URL for ALL items - even text ones.
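
For example, the serialized response for a large item could carry only a URL instead of the inlined data; the field names and the URL below are just an illustration.

# sketch of a response for a large item: the data is not inlined
response = {
    "page": '+show',
    "item": {
        "meta": '<meta serialized with serialize_rev from serialization.py>',
        # the offline client fetches the (possibly huge) data separately
        "data_url": 'http://wiki.example.org/+get/BigImage.png',
    },
}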

So, the main question is: how much will the offline client author need to know about "how moin2 does it", and how change-prone should the API be?

Offline client wrapper

Reuse the code from above in its "push" variant. Code from serialization.py will be used for transferring items.
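
A very rough sketch of what such a wrapper could look like; pull() and push() stand for the transfer code sketched in the sync section, and their exact signatures are assumptions.

import os
import subprocess
import tempfile

def edit_offline(item_name, pull, push):
    """Pull an item, open its text in $EDITOR, push the result back.

    pull(name) -> text and push(name, text) are the transfer functions
    from the sync section above.
    """
    fd, path = tempfile.mkstemp(suffix='.txt')
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(pull(item_name))
        subprocess.call([os.environ.get('EDITOR', 'vi'), path])
        with open(path) as f:
            push(item_name, f.read())
    finally:
        os.remove(path)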

Other obligations

Schedule (deprecated)

  1. 21/04 - 20/05 - Bonding period - learning the code, experimenting with it, trying to create my own mimetype, testing moin1's behaviour on syncing/merging, discussing some options with mentors.
  2. 21/05 - 27/05 - Branches, multiple parents
  3. 28/05 - 03/06 - Base merging, interface for merging
  4. 04/06 - 10/06 - "Unmerged" notifications, reverts
  5. (exams) 11/06 - 20/06 - Merge methods and templates for text
  6. (exams) 21/06 - 01/07 - Merge methods and templates for binary and targz
  7. 02/07 - 08/07 - Custom type for enumerating the list of pages to sync, plus all required methods to display it
  8. 09/07 - 15/07 - Adaptation of merging for synchronization, both directions
  9. 16/07 - (!)29/07 - Server part of pull/push
  10. 30/07 - 05/08 - Client part of pull/push + templates # will reuse a lot of code from the server part, such as merging and transclusions/links lookup
  11. 06/08 - 12/08 - Simplified views for the offline client, mostly reusing code from push/pull
  12. 13/08 - 19/08 - Fixing bugs/UI on all items above, docs, howtos, missing tests

Messages

Are you still working on your application? The stuff sounds interesting, but many TODO still. -- ThomasWaldmann 2012-04-03 08:02:38

How do you mean that precisely?

queue1 = []  # or a Queue.Queue() if retrieval becomes multithreaded
queue2 = []
queue1.append('ItemName')
while queue1 or queue2:
    do_stuff_with_queue1()
    do_stuff_with_queue2()

Similar to '/+history/<itemname:item_name>'.

It is only one of the possible options. It is highly unlikely that I will code it, because we would get problems when something is changed in past revisions and we would have to retrieve a lot of unnecessary revisions.

Is it really necessary to change the serialization code to add login/password?

Hmm, right, not necessary at all - they can be sent as arguments with POST. Fixed that part.

What's the difference between transclusions and links when considering sync?

Items can be linked VERY heavily; we would have to sync practically the whole wiki in some situations - look at the number of links on http://en.wikipedia.org/wiki/New_Zealand. Though I've thought about that, and an option of "depth" could be added: 0 - no linked items are retrieved, 1 - linked items are retrieved, etc. If transcluded items have links too, transclusions count as 0-level items, and if 1 is set as the depth, the transclusions' links will be retrieved too.

How many exams do you have in the exam period?

I have 5 exams; 4 are math-related and therefore important.

Note: please don't put answers here, but just change your application text so it is clearer / has the answers.
