= CodeBlockColorizer =

This page is about syntax highlighting, colorizing pre sections and inlining attachments with parsing (i.e. colorization). It is based on Taesu Pyo's !BaseParser.

= Integration in moin--main--1.3 =

 TLA Branch:: [[http://sky.rhein-zeitung.de/tla/ograf@bitart.de--2004-local|ograf@bitart.de--2004-local]]/moin--colorize--1.3
 Live Examples:: [[http://fp2.sky.rhein-zeitung.de/moin.fcg/FrontPage|Index]]<<BR>>[[http://fp2.sky.rhein-zeitung.de/moin.fcg/ColorizeTest|ColorizeTest]]<<BR>>[[http://fp2.sky.rhein-zeitung.de/moin.fcg/FormatTest|FormatTest]]<<BR>>[[http://fp2.sky.rhein-zeitung.de/moin.fcg/InlineTest|InlineTest]]

TODO (only in the branch):

 * adapt CSS style names to those of XEmacs font-lock-mode
 * make some nice CSS for the default themes
 * add a heading_shift argument to Include?

MERGED (to moin--main--1.3):

 * (./) backward compatibility for `#!python`
 * (./) add !CodeBlockColorizer
 * (./) add Taesu Pyo's !BaseParser
 * (./) change the whole thing to use `<span>` and CSS to format the code
 * (./) Languages like Pascal need an ignore-case switch in !BaseParser
 * (./) !JavaScript switchable line numbers (default is no numbers)
 * (./) enable numbers from the beginning, optional `start` and `step` numbering parameters
 * (./) add extra arguments to the #format PI. Everything after the first word (the parser module to import) gets passed as the `format_args` keyword argument to the `__init__` method of the Parser class. [[http://fp2.sky.rhein-zeitung.de/wiki/FormatTest|FormatTest]]
 * (./) extend parsers so they know what files they can handle (for inline:)
 * (./) finishing touches on CSS
 * (./) add `code_area`, `code_line` and `code_token` formatter methods to base
 * (./) change parsers to parse attributes with wikiutil.parseAttributes
  * accepted attributes: `start`=Number, `step`=Number, `numbers`=`on`|`off`
 * (./) if numbering is set to `off`, don't show the numbers initially but give the links to activate them
 * (./) add new startContent and endContent methods to formatters to get rid of the content div for text_plain
 * (./) fix text_plain list rendering (here? or in moin--main--1.3?)
 * (./) added default `Parser.extensions` handling: just use the string `'*'` to mark the parser as fallback handler
 * (./) cache extension to parser mapping in request.cfg (this is currently no caching)
 * (./) added starshine code_area CSS
 * (./) fixed cplusplus Preprc parsing and multiline spanning syntax display
 * (./) added extra parsers attribute value `numbers`=`on`|`off`|`disable`
  * disable does not show any numbers or JS numbering links
 * (./) make div IDs unique
  * (./) fixes MoinMoinBugs/ContentDivProblems
 * (./) macro improvements
  * (./) !FootNote (allow wiki markup)
  * (./) Include (MoinMoinBugs/IncludeNotCacheAware)
  * (./) !TableOfContents (MoinMoinBugs/TableOfContentsIgnoresIncludedHeadings)
 * (./) escaping needs to be done in the formatter
 * (./) fix content div top/bottom anchors (they are open)
 * (./) heading IDs for included pages sometimes do not match (not unique for auto-headers)
 * (./) fix back links of recursive Includes. See MoinMoinBugs/RecursiveIncludeBacktoIsWrong
 * /!\ recursive or multiple Includes will screw up a !TableOfContents. We don't support this at this time. see below.
 * (./) fix MoinMoinBugs/TableOfContentsBreakOnExtraSpaces
 * (./) fixed code_area linenumber switch ID problem
 * (./) fixed test_parser_wiki (did not reset request before parse)
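
The `start`, `step` and `numbers` attributes listed above can be illustrated with a small standalone sketch. It does not use `wikiutil.parseAttributes`; the helper names `parse_code_area_args` and `line_numbers` and the default values are assumptions made for this example only, not the branch's code.

```python
import shlex

# Values accepted for the `numbers` attribute, as listed above.
NUMBERS_MODES = ("on", "off", "disable")


def parse_code_area_args(format_args):
    """Turn a string like 'start=10 step=5 numbers=off' into a settings dict.

    Hypothetical helper: the defaults below are assumptions for illustration.
    """
    settings = {"start": 1, "step": 1, "numbers": "on"}
    for token in shlex.split(format_args or ""):
        key, sep, value = token.partition("=")
        if not sep:
            continue  # ignore words that are not key=value pairs
        key = key.lower()
        if key in ("start", "step"):
            try:
                settings[key] = int(value)
            except ValueError:
                pass  # keep the default when the value is not a number
        elif key == "numbers" and value.lower() in NUMBERS_MODES:
            settings["numbers"] = value.lower()
    return settings


def line_numbers(settings, line_count):
    """Return the numbers to emit for `line_count` lines of code.

    'disable' emits nothing at all; 'on' and 'off' both emit numbers,
    because 'off' only hides them until the reader activates them.
    """
    if settings["numbers"] == "disable":
        return []
    return [settings["start"] + i * settings["step"] for i in range(line_count)]
```

With `numbers=off` the numbers are still generated so that the activation links have something to show; only `disable` suppresses them entirely.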

Questions & Thoughts:

 * the current change adds an `extensions` list attribute to the Parser class. This enables the inline code to pick a parser for a file extension. Using `{``{``{` and `#format Colorize` lets you select a parser with extra arguments. But something like the !VimColor parser, which can handle a huge number of syntaxes, needs some fallback config (use this if nothing else will handle the file). Or not?
  * following approach sounds nice:
   1. check if the `inline:` statement specifies a parser to use (`inline:FILENAME:PARSER`)
   2. if not, try to detect the parser by extension
   3. if not, try to use a configured `default_inline_parser`
   4. error (currently inline is ignored in that case, but we should show that something failed)
    * what about displaying a link to the attachment

= Alternate Highlighting =

Taesu Pyo's !BaseParser from ParserMarket is a possibility (and is currently used in the branch). But it is kind of hard labor to do all those parsers for all those formats out there. So why not use some existing syntax highlighting engine or definition?

== Reuse highlighting from some other project ==

 * write a compiler/converter/interpreter for vim's syntax files. This would give instant 500+ syntax highlightings...
  * vim's pattern (regex) syntax is different from the python re syntax. [[http://larc.ee.nthu.edu.tw/~cthuang/vim/files/vim-regex/vim-regex.htm|Here]] is a comparison to perl re syntax.
   * the two are quite different. not easy to convert for all possible operators/patterns. I'll concentrate on getting CBC using Taesu Pyo's implementation done, a vim syntax parser/formatter can be added later -- OliverGraf <<DateTime(2004-04-25T14:33:25Z)>>
  * currently porting Text::!VimColor to python -- OliverGraf <<DateTime(2004-04-26T06:18:58Z)>>
   * basically working. finishing touches tomorrow -- OliverGraf <<DateTime(2004-04-26T21:10:24Z)>>
   * oh, yes, I should work on this again... -- OliverGraf <<DateTime(2004-07-22T09:40:35Z)>>
    * [[http://fp2.sky.rhein-zeitung.de/moin.fcg/VimColorTest|VimColorTest]] :) -- OliverGraf <<DateTime(2004-07-22T11:57:37Z)>>
 * [[http://www.gnu.org/software/src-highlite/source-highlight.html|GNU source-highlighter]] external program, fast, not many syntax defs, hard to customize
 * [[http://sourceforge.net/projects/silvercity/|SilverCity]] it supports 20+ languages.

= CSS for syntax coloring =

Here is a module to manage the css styles for syntax coloring: [[attachment:css.py]]

  ''The example classes inside the python are the classes and subclasses vim uses. They are by no means a fixed thing and only to visualize the intended structure. See below for a discussion about what CSS classes moin should support for code areas.''

The classes are based on Oliver's version, made by joining the X/Emacs and Vim definitions, but it's very bad as is. A lot of the styles are useless and we have to group them in a way that makes more sense, like putting all the stuff that is usually displayed in the same color into one group.

We can duplicate the behavior of mature applications like Emacs and Vim, or try to improve and simplify. I know that I use only 4-5 colors and I don't need most of the fine-grained control that those apps have. Code with too many harsh colors simply does not look good and is hard to read. Less is more in this case.

The benefit of a simple structure is that we can add many simple colorizers that use only the main classes, both for parsing the code and for formatting. Basic support for a lot of languages is better than a great colorizer for a language that you don't need.

 ''That's true. And it's the same way vim implements it (I'm no vim user or fanatic, just in case someone asks, I'm all for XEmacs ;) ). Vim has a set of generic colorization classes. All syntax colorizers implement their own syntax 'names', but also provide a mapping to the basic classes, so you do fine configuring just these. I have now managed to get the vim colorizer to output both names, which looks like:''
 {{{
<pre>
<span class="LineNumber"></span><span class="Type diffNewFile">--- orig/MoinMoin/request.py</span>
<span class="LineNumber"></span><span class="Type diffFile">+++ mod/MoinMoin/request.py</span>
<span class="LineNumber"></span><span class="Statement diffLine">@@ -1141,7 +1141,7 @@</span>
<span class="LineNumber"></span><span class="Text"> </span>
<span class="LineNumber"></span><span class="Text">             @param header: string, containing valid HTTP header.</span>
<span class="LineNumber"></span><span class="Text">         """</span>
<span class="LineNumber"></span><span class="Special diffRemoved">-        key, value = header.split(':',1)</span>

<span class="LineNumber"></span><span class="Identifier diffAdded">+        key, value = header.encode(config.charset).split(':',1)</span>
<span class="LineNumber"></span><span class="Text">         value = value.lstrip()</span>
<span class="LineNumber"></span><span class="Text">         if key.lower() == 'content-type':</span>
<span class="LineNumber"></span><span class="Text">             # save content-type for http_headers</span>
<span class="LineNumber"></span>
<span class="LineNumber"></span>
<span class="LineNumber"></span>
</pre>
}}}
 ''As you can see the '''diff''' syntax has the special names `diffRemoved` and `diffAdded` which get mapped to `Special` and `Identifier` color classes, just to get the display different.''
 
 ''I think this is a very good way to implement this, because it makes it usable for everyone using just the basic colorization classes. But if you are a ruby programmer (I don't know if ruby syntax uses specials, it's just an example) you could add some extra CSS to make use of those special features, without the need to change the code.'' -- OliverGraf <<DateTime(2004-07-25T06:32:18Z)>>
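
The two-name output described above can be sketched in a few lines. The mapping is read off the sample diff output shown earlier; the function name `code_token` is borrowed from the formatter method mentioned on this page, but its signature here is an assumption for illustration, not the branch's actual API.

```python
from html import escape

# Syntax-specific names mapped to the generic classes, as seen in the
# sample diff output above.
DIFF_TO_GENERIC = {
    "diffNewFile": "Type",
    "diffFile": "Type",
    "diffLine": "Statement",
    "diffRemoved": "Special",
    "diffAdded": "Identifier",
}


def code_token(text, syntax_name=None, mapping=DIFF_TO_GENERIC):
    """Wrap `text` in a span carrying the generic and the specific class.

    Hypothetical signature. Tokens without a syntax name, or with a name
    missing from the mapping, fall back to the generic class 'Text'.
    """
    if syntax_name is None:
        classes = "Text"
    else:
        classes = "%s %s" % (mapping.get(syntax_name, "Text"), syntax_name)
    # Escaping is done here, in the formatting step, not by the caller.
    return '<span class="%s">%s</span>' % (classes, escape(text))
```

A stylesheet that only knows `Type`, `Statement`, `Special` and `Identifier` colors everything; one that also defines `diffAdded` can override just that case.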

Maybe something like this - feel free to improve this list:

First, the default group, which uses the default font and color of code fragments:

 (!) I prefer not to color these items, and to color only the most important items, like language keywords. I think it looks better and is easier to read. Because these are marked with a class, anyone will be able to customize their wiki CSS by adding rules for these classes. If you think that one of these should be in the "names" group in moin's default coloring, please move it to the names group.

 These are not subclasses of a "default" class, since they inherit the default style from the code div/table/pre item
 * operators - +, -, &, {, (
 * function - like in `def myFunction():`
 * class - like in `class MyClass:`
 * variable like in `total = 25; avg = total / num`
 * reference - I guess this means pointer to something like &var in C?

Below are classes and their sub classes that use different colors:

 * comment (I use dark green for these)
  * I don't see how we can add ignore/todo and the like; in many formats they are all just comments: Todo/todo/To do/ etc.

 * literal - data that you define in the program (I use dark magenta for some of these)
  * float - 1.0 (float, double, etc.)
  * integer - 1 or 'c' in C (int, long, long long etc.)
  * boolean - like True
  * string - "like this str" or u'like this unicode'
  * docstring - it can be under comment, but it's both a comment and program data

 * names - names used in the language and standard libraries
  * keyword - like class, def, print, if, while, is, not (in Python); html, body (in HTML) etc.
  * exception - built-in exceptions like ValueError - it's very handy and prevents errors like UnpicklingError...
  * builtin - names that are not part of the language, but are part of the standard libraries, like range, min, max etc.

 * preprocessor - this is relevant only to compiled code
  * include
  * define - #def, #undef etc.
  * macro - is this not the same as define?
  * conditionals - #ifdef etc.

 * debug - I'm not sure if these are relevant for non interactive syntax coloring
  * error
  * warning

 * diff - this is very important, so we can view patches inlined in pages.
  * added/new - lines with +
  * removed/old - lines with -

 * meta - except line numbers, I don't know if any of these is relevant 
  * line numbering
  * invisible-characters - like \r, \n, spaces, tabs etc.
  * breakpoints - I don't know if that is relevant
  * bookmarks - same as breakpoints
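
To show how such a grouping keeps the stylesheet small, here is a minimal sketch that emits one CSS rule per group, so every subclass inherits the group color unless a wiki overrides it. The `.codearea` container selector, the function name `build_css` and all colors other than the dark green and dark magenta mentioned above are assumptions; this is not the attached css.py.

```python
# Groups and subclasses taken from the list above (colored groups only).
GROUPS = {
    "comment": [],
    "literal": ["float", "integer", "boolean", "string", "docstring"],
    "names": ["keyword", "exception", "builtin"],
    "preprocessor": ["include", "define", "macro", "conditionals"],
}

# Only 'comment' and 'literal' colors come from the text; the rest are
# placeholders chosen for this example.
COLORS = {
    "comment": "darkgreen",
    "literal": "darkmagenta",
    "names": "darkblue",
    "preprocessor": "darkred",
}


def build_css(groups, colors, container=".codearea"):
    """Return one CSS rule per group that covers the group and its subclasses."""
    rules = []
    for group, subclasses in groups.items():
        color = colors.get(group)
        if color is None:
            continue  # groups without a color keep the default code style
        selectors = ["%s .%s" % (container, name) for name in [group] + subclasses]
        rules.append("%s { color: %s; }" % (", ".join(selectors), color))
    return "\n".join(rules)
```

A simple colorizer then only needs to emit the group names, while a more detailed one can emit the subclass names and still get sensible default colors.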

Here is an example python code using default color scheme based on this list: [[attachment:color-test.html]]

= TOC & Include =

== Problems with multiple or recursive Includes ==

Recursive and multiple Includes are a pain in the a**. They can never work with caching (in the case of full page includes) because the second include will get the same heading IDs as the first. Solution: prefix IDs with the content ID (it's unique by default in this branch) and disable caching for ALL includes.

'''To make this more clear:''' there is only one cached copy of a page. This is the whole page, so from/to includes can't use the cache. If a page includes another page multiple times, at least the second include will use the cached copy, including all IDs (because they are passed as arguments to the formatter). So double includes always break IDs if caching is used.

'''A possible solution is''' to put ID generation into the formatter. The text_python formatter has to uniquify the IDs, so the cached copy will always output unique IDs. But this will make everything harder for Include & TOC, because they have to use the same method to generate the hrefs to the headings...
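
A minimal sketch of what ID generation inside the formatter might look like, assuming one registry object per rendered request. The class name `IdRegistry` and the `-2`, `-3` suffix scheme are hypothetical and not taken from the branch.

```python
class IdRegistry:
    """Hand out IDs that are unique within one rendered page (hypothetical)."""

    def __init__(self):
        self._used = set()

    def unique(self, base):
        """Return `base` the first time, then `base-2`, `base-3`, ..."""
        candidate = base
        n = 1
        # Loop so that a generated suffix never collides with a literal ID.
        while candidate in self._used:
            n += 1
            candidate = "%s-%d" % (base, n)
        self._used.add(candidate)
        return candidate
```

Because the IDs are computed at output time rather than stored in the cached copy, a page included twice gets distinct heading IDs. The price is the one named above: TOC and Include have to ask the same registry, in the same order, to build matching hrefs.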