Making a plan about doing 2.0 in small steps
Maybe doing 2.0 in small steps is the way to reach it faster. We are just collecting items now and will sort them into the right sequence later. Each section should contain a limited amount of work and should begin and end with a usable state. Some stuff is already partly done, but may need cleanup/refactoring.
Introduce mimetype parsers / formatters
implement a wikiutil.MimeType class. As the standard mimetypes are a bit inconsistent, we use our own "sanitized" mimetypes internally.
Use some internal mapping from #format xxx to sanitized mimetype.
depending on page mimetype, choose parser. rename parsers so that mimetype.module_name() == parser_module name.
"show" (default) emits parsed, formatted output within the content area
"raw" just emits mimetype header and raw content (kind of "null" parser), no wiki "frame"
do the same for output formatters, using some &mimetype=xxx (default text/html) url arg and quote(mimetype) == formatter_module (action=show&mimetype=xxx replaces action=format&mimetype=xxx)
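As a rough illustration of the mimetype handling described above, here is a minimal sketch. It is not the actual wikiutil.MimeType; the format mapping entries, the `from_format` helper and the exact module naming rule (replace every non-alphanumeric character with an underscore) are assumptions made for this example.

```python
import re


class MimeType:
    """Hypothetical sketch of a sanitized mimetype, not the real wikiutil.MimeType."""

    # Assumed mapping from "#format xxx" names to sanitized mimetypes.
    FORMAT_TO_MIMETYPE = {
        "wiki": "text/moin-wiki",
        "python": "text/x-python",
        "plain": "text/plain",
    }

    def __init__(self, mimestr):
        # Normalize case and whitespace, then split into major/minor type.
        major, _, minor = mimestr.strip().lower().partition("/")
        if not major or not minor:
            raise ValueError("not a mimetype: %r" % mimestr)
        self.major = major
        self.minor = minor

    @classmethod
    def from_format(cls, format_name):
        """Map a "#format xxx" name to a MimeType, falling back to text/plain."""
        return cls(cls.FORMAT_TO_MIMETYPE.get(format_name.lower(), "text/plain"))

    def __str__(self):
        return "%s/%s" % (self.major, self.minor)

    def module_name(self):
        """Return a name usable as a parser module name (assumed naming rule)."""
        return re.sub(r"[^a-z0-9]", "_", str(self))
```

With this naming rule, a parser for text/moin-wiki pages would live in a module called text_moin_wiki, so choosing the parser becomes a plain import by name.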
Storage
Cleanup / making it easier
removing hierarchical page support (rootpage, getPagePath & Co) will simplify stuff; we don't use it anyway. The following 2 items don't need to be done if we either keep the current storage code in Page (esp. getPagePath and everything needed for it) until we really don't need it any more, or fix all the code referencing it:
- make cache storage independent of page storage
- make attachment storage independent of page storage
User storage changes
- make user data stuff easier by handling it with existing code and in the same way as pages (might degrade performance temporarily, but makes modularization easier):
move trail to session object
move user RC bookmarks to user profile
Before moving anything, I strongly suggest getting the API right first, because the API should be abstract and the caller should therefore not care how the data is stored. This is not a problem IMHO for the user data. -- AlexanderSchremmer 2006-09-02 21:26:13
After those steps, we shouldn't have to care about users and attachments for a while.
Modularize
Write some short abstract classes (AbstractItem, AbstractItemRevision, ...)
- This is a design step - it is what this whole idea is about. The result needs to be generic so that the methods can be implemented by other "storage providers" as well.
- Move current storage code into subclasses of those abstract classes
- The data/underlay stuff should be generalized and go into a generic "Layering" layer.
- Write some methods of the Page class to call that new storage class
- Fix the attachments action to work with the new storage code
Result after these steps: the API has proven that it is abstract enough to work for this legacy storage format. Implementing the new system should be rather trivial (no "workarounds", no old code) now. Note that we have a working MoinMoin now that has only lost compatibility with code that uses very internal Page functions (the transition is easy, no barriers for users and developers).
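To make the modularization steps above more concrete, here is a minimal sketch of what such abstract classes might look like. Only the names AbstractItem and AbstractItemRevision come from the plan; all method names and the in-memory provider are assumptions for illustration, not a proposed final API.

```python
from abc import ABC, abstractmethod


class AbstractItemRevision(ABC):
    """One revision of an item: content plus metadata."""

    @abstractmethod
    def get_data(self):
        """Return the revision content as bytes."""

    @abstractmethod
    def get_meta(self):
        """Return the revision metadata as a dict."""


class AbstractItem(ABC):
    """A named item with a sequence of revisions."""

    @abstractmethod
    def list_revisions(self):
        """Return the existing revision numbers in ascending order."""

    @abstractmethod
    def get_revision(self, revno):
        """Return the AbstractItemRevision with the given number."""

    @abstractmethod
    def add_revision(self, data, meta):
        """Store a new revision and return its number."""


class MemoryItemRevision(AbstractItemRevision):
    """Toy storage provider: keeps everything in memory."""

    def __init__(self, data, meta):
        self._data = data
        self._meta = dict(meta)

    def get_data(self):
        return self._data

    def get_meta(self):
        return dict(self._meta)


class MemoryItem(AbstractItem):
    def __init__(self, name):
        self.name = name
        self._revisions = []

    def list_revisions(self):
        return list(range(1, len(self._revisions) + 1))

    def get_revision(self, revno):
        if not 1 <= revno <= len(self._revisions):
            raise KeyError(revno)
        return self._revisions[revno - 1]

    def add_revision(self, data, meta):
        self._revisions.append(MemoryItemRevision(data, meta))
        return len(self._revisions)
```

The legacy storage code would become another pair of subclasses next to the in-memory one, and callers such as the Page class would only use the abstract methods.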
Now it is time to think about the new storage format - this is not about changing, though. It is about implementing the classes, because the actual change can be done by a generic converter that just copies all item revisions without knowing any internals (never write any structural migration scripts again :-)).
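A generic converter of that kind could be as small as the following sketch. The store interface used here (`list_items`, `list_revisions`, `get_revision`, `add_revision`) and the `DictStore` class are hypothetical stand-ins; the point is only that the converter calls the API and never looks at files or directories.

```python
def convert(source, target):
    """Copy every revision of every item from source to target.

    Only the (assumed) storage API is used, so source and target
    may use completely different on-disk formats.
    Returns the number of revisions copied.
    """
    copied = 0
    for name in source.list_items():
        for revno in source.list_revisions(name):
            data, meta = source.get_revision(name, revno)
            target.add_revision(name, data, meta)
            copied += 1
    return copied


class DictStore:
    """Toy store used to demonstrate the converter."""

    def __init__(self):
        self._items = {}

    def list_items(self):
        return sorted(self._items)

    def list_revisions(self, name):
        return list(range(1, len(self._items[name]) + 1))

    def get_revision(self, name, revno):
        return self._items[name][revno - 1]

    def add_revision(self, name, data, meta):
        self._items.setdefault(name, []).append((data, dict(meta)))
```

Because revisions are copied in ascending order, the target ends up with the same revision numbering as the source.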
Change layout
- change storage code to use revision directories instead of files
- it would be possible to keep "files" and "revision directories" as two plugins in the short term
- have one specific revision directory "current/"
- have a "{00000001,00000002,...,current}/data" file with the revision's content
have a "{00000001,00000002,...,current}/meta" file with "rev: <integer>" in it, so we can easily find out the revision of the current stuff
- keep attachments directory for now
- write converter for 1.5.x storage to new storage API (as it calls just the API, the target format can be anything)
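The revision directory layout above can be sketched as follows. This is only an illustration under assumptions: revision directories are named with eight zero-padded digits as in the list above, "current/" is written as a plain copy of the newest revision (it could just as well be a link or a rename target), and the function names are made up.

```python
import os


def write_revision(item_dir, revno, content):
    """Write content as revision revno and mirror it into current/."""
    for dirname in ("%08d" % revno, "current"):
        rev_dir = os.path.join(item_dir, dirname)
        os.makedirs(rev_dir, exist_ok=True)
        with open(os.path.join(rev_dir, "data"), "wb") as f:
            f.write(content)
        # The meta file records the revision number as "rev: <integer>".
        with open(os.path.join(rev_dir, "meta"), "w", encoding="utf-8") as f:
            f.write("rev: %d\n" % revno)


def current_revision(item_dir):
    """Find the current revision number by reading current/meta."""
    with open(os.path.join(item_dir, "current", "meta"), encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep and key.strip() == "rev":
                return int(value)
    raise ValueError("no rev entry in current/meta")
```

Reading current/meta avoids listing the item directory and sorting revision names just to learn which revision is the newest.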
Add meta
Page content meta
move stuff from page's current CONTENT (page mimetype, acls, language, ...) into <revisiondir>/meta (see also MetaDataInMoin)
- provide simple editing for metadata (action=editmeta, use plain text editor) or alternatively merge metadata and page content into editor data and split after editing again
- do not touch edit-log here, it also has attachments stuff in it!
Search / TitleIndex / WordIndex
- restrict search in items to some hardcoded set of mimetypes (default text/moin-wiki or text/* to search pages only) because we introduce binary items into item namespace in the next section
xapian can search for specific mimetypes
restrict TitleIndex / WordIndex to same mimetypes
- maybe have text/moin-wiki.help and text/moin-wiki.system (or similar) for help and system pages so we can do the filtering needed for them in a similar way
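The hardcoded mimetype restriction could look roughly like this. The constant and function names are hypothetical, and using shell-style wildcard matching from the standard fnmatch module is just one simple way to support patterns such as text/*.

```python
from fnmatch import fnmatchcase

# Assumed default: search pages only, i.e. anything of major type "text".
SEARCHABLE_MIMETYPES = ("text/*",)


def is_searchable(mimetype, patterns=SEARCHABLE_MIMETYPES):
    """Return True if an item of this mimetype should be searched or indexed."""
    return any(fnmatchcase(mimetype, pattern) for pattern in patterns)


def filter_items(items, patterns=SEARCHABLE_MIMETYPES):
    """Keep only (name, mimetype) pairs that pass the mimetype restriction."""
    return [(name, mt) for name, mt in items if is_searchable(mt, patterns)]
```

The same filter could be shared by search, TitleIndex and WordIndex so that binary items stay out of all three.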
Items
- implement "empty item" base class (offer file upload, creation by editor)
- implement "application/octet-stream item" (+ download, display some file infos)
- implement "text/* item" (+ text editor link, ...)
- implement "text/moin-wiki item" (+ gui editor link, ...)
Implement drag'n'drop in the UI, in addition to the file uploader or as a replacement for it
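One possible way to organize the item classes listed above is a small class hierarchy plus a lookup by mimetype. Everything here is an assumption for illustration: the class names, the inheritance chain, the action names and the rule that the most specific pattern is listed first.

```python
from fnmatch import fnmatchcase


class EmptyItem:
    """Item without content yet: offer file upload and creation by editor."""
    actions = ("upload", "create")


class OctetStreamItem(EmptyItem):
    """Generic binary content: offer download and some file infos."""
    actions = ("download", "info")


class TextItem(OctetStreamItem):
    """Any text/* content: additionally offer a text editor link."""
    actions = OctetStreamItem.actions + ("edit-text",)


class MoinWikiItem(TextItem):
    """text/moin-wiki content: additionally offer a gui editor link."""
    actions = TextItem.actions + ("edit-gui",)


# Most specific pattern first; the first match wins.
ITEM_CLASSES = [
    ("text/moin-wiki", MoinWikiItem),
    ("text/*", TextItem),
    ("*/*", OctetStreamItem),
]


def item_class_for(mimetype):
    """Pick the item class for a mimetype; None means the item has no content yet."""
    if mimetype is None:
        return EmptyItem
    for pattern, cls in ITEM_CLASSES:
        if fnmatchcase(mimetype, pattern):
            return cls
    return OctetStreamItem
```

Further classes, for example for image/* content, would be added by inserting another entry ahead of the generic fallback.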
Replacing attachment code
fix underscore ambiguity (attachments often have both _ and blanks) - TODO: write conversion script for data_dir content
- implement "image/* item" (+ display image, display image infos)
- implement "application/twikidraw item" (subclass image/* item, offer java editing)
- add "attachment:" and "inline:" emulation using items and other compat. code
- or provide conversion script to new markup
- write migration script moving attachments into subitems
edit-log to meta
- split the local edit-log stored in the pagedir into appropriate entries stored in each revision's meta file
until above works, keep the global edit-log as is for RecentChanges
now care about global edit-log (in 1.5 we have information duplicated there because it seemed more efficient to process by RecentChanges action)
- alternatively, the logfile for RC could have just some pointer (pagename and revision of the changed page) appended at the end.
- the other items (editor, ip/hostname, comment, ...) could be looked up in the local meta data of that page revision.
this is no performance regression because we have to open the page metadata anyway to check for ACLs and existence of the page.
This would make the global edit-log much smaller.
- alternatively, we could completely drop the global edit-log and put those pointers into a cache
- the cache would contain a pickle of an ordered sequence of (timestamp, pagename, revision) tuples
- every day or week could be a separate cache file, making purging (or just not having to load) old stuff easy
- reading the cache would be very fast
- updating the cache would be fast and low-risk (unlike the global edit-log)
- (re)building this cache would be slow because we have to iterate over all pages and for each page over all revisions created in the interesting time frame
- having the stuff in the cache would be better if we extend RC to multiple levels some day (like sub-wikis), because we don't need duplicated authoritative data at each level then
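The cache variant described above (a pickle of ordered (timestamp, pagename, revision) tuples, one file per day) might be sketched like this. File naming, function names and the write-to-temp-then-rename update are assumptions for this example; locking between concurrent writers is deliberately left out.

```python
import os
import pickle
import time


def cache_path(cache_dir, timestamp):
    """One cache file per UTC day, so old days can be purged or simply not loaded."""
    day = time.strftime("%Y-%m-%d", time.gmtime(timestamp))
    return os.path.join(cache_dir, "rc-%s.pickle" % day)


def load_day(cache_dir, timestamp):
    """Return the ordered list of (timestamp, pagename, revision) for that day."""
    path = cache_path(cache_dir, timestamp)
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)


def append_change(cache_dir, timestamp, pagename, revision):
    """Add one pointer to the day's cache file and keep the list ordered."""
    entries = load_day(cache_dir, timestamp)
    entries.append((timestamp, pagename, revision))
    entries.sort()
    path = cache_path(cache_dir, timestamp)
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(entries, f)
    # Replace the old file in one step so readers never see a half-written cache.
    os.replace(tmp_path, path)
```

RecentChanges would then load only the files for the days it displays and look up editor, comment and so on in the metadata of each referenced page revision.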
Search
- implement some "advanced search" UI for being able to give the set of mimetypes to search through
partly done for single mimetypes
- the set that was hardcoded (see above) shall become the default selection there
Later
Change layout to be hierarchical
Converting to a hierarchical storage layout is completely optional (in case we see that the itemlist gets too long); alternatively, it would be possible to have some high-level code doing mass renames.
I strongly suggest the latter because I do not see any advantages in having such a hierarchical scheme (it is slower in the average case, more complex, you reach path limits more easily, ...). Putting a loop into the rename function of the storage layer is not really a big problem -- AlexanderSchremmer 2006-04-20 21:41:04
- +: it does an atomic rename of an item with all sub-items with zero implementation cost for that
- This is not true on Windows, you will need the reader-writer-pattern there or use busy waiting.
- +: it will speed up pagelist operations as sub items won't be listed on upper level (and we will have lots of attachments as sub-items then!)
- OK, this might be a valid reason.
- 0: when using "/sub/" path for subitem storage (instead of "(2f)" now), the page name limit will be 1 byte less per level, that shouldn't be a reason against it (we could even use "/_/" for using less bytes)
- -: it is more complex, as it isn't flat storage any more
In case we do it in the hierarchical way:
- store subitems in a directory below the parent item (has to be done at some time between first layout change and converting attachments to subitems - or attachments will lose connection to parent item when that is renamed).
- acl processing could be changed to work by hierarchically traversing the items (this is optional)
- when doing this, ACL settings for wiki virtual root page inherit from config ACLs
- we could even think about inheriting all config items (optional, can be done later)
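Hierarchical ACL processing could work along the lines of the following sketch. The rule used here (the nearest ancestor with an ACL wins, and the config ACL applies when nothing is found up to the virtual root) is only one possible interpretation; the function name and the "/" separator for sub-items are assumptions as well.

```python
def effective_acl(item_name, item_acls, config_acl):
    """Return the ACL that applies to item_name.

    item_acls maps item names to ACL strings for items that define one.
    The lookup walks from the item itself up through its parents; if no
    item on the path defines an ACL, the config ACL is inherited.
    """
    parts = item_name.split("/")
    while parts:
        acl = item_acls.get("/".join(parts))
        if acl is not None:
            return acl
        parts.pop()  # go one level up
    return config_acl
```

For example, with an ACL defined only on "Project", the sub-item "Project/Docs/Page" would pick up the ACL of "Project", while an unrelated top-level item falls back to the config ACL.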
Use YAML
We could use YAML for:
- structured automatically processable content (use text/x-yaml mimetype for them)
- could replace *Dict and *Group pages long term
- lots of other uses maybe
- item metadata (check compatibility with ACL syntax)
- user profile
- either in a separate storage place, or:
move user quicklinks from user profile to UserName/QuickLinks, protect by ACLs
move user profiles to page storage UserName/ProfileData (use WikiDict?), protect by ACLs, ... (if too slow, use caching)
