"""
MoinMoin - LookupPagesAndSort Macro
A line-oriented search macro over multiple pages, with sorting

@copyright: Jonas Smedegaard <dr@jones.dk>
@license: GPL

Based heavily on SearchInPagesAndSort
by Pascal Bauermeister <pascal DOT bauermeister AT hispeed DOT ch>

Updates:
  * [v0.3.4.3] Jonas Wed Nov 23 15:22:27 CET 2005
    * Ignore empty LookupText values (continue loop rather than break).
    * Adjust a variable name for a smaller SearchInPagesAndSort diff.
    * Allow caching of the page (let's see if it causes any trouble).
    * Correct references to name of script itself.
    * Replace examples with ones that make better sense.
    * New feature: Pages can now be a group-page lookup: Prepend a "+".

  * [v0.3.4.2] Jonas Fri Nov 18 17:27:01 CET 2005
    * Replace SearchString (regexp) with LookupString (dict)
    * Simplify heading_text and keyval loops (always one per page now)
    * Decode UTF-8 input in regexp

  * [v0.3.4.1] Jonas Fri Nov 18 17:03:26 CET 2005
    * Add dict lookup. Syntax: @PN?Definition@
    * Drop NbSubs and MoreSubsText support

  * [v0.3.4] Pascal Sat Mar  5 17:53:08 CET 2005
    * MoinMoin 1.3.x _and_ 1.2.x compatible
    * Added arguments: Format, HeaderFormat and FormatSort
    
  * [v0.3.3] Pascal
    * Fixed a security hole (eval used for arguments parsing)
    * Added argument: ExcludePages=regex

  * [v0.3.2] Pascal
    * Use StringIO instead of cStringIO, for unicode compatibility

  * [v0.3.1] Pascal Sat Nov  6 16:03:01 CET 2004
    * Added NoText, RawText, NbSubs and MoreSubsText arguments

  * [v0.3.1] Pascal Mon Aug 30 21:27:36 CEST 2004
    * Corrected bug: did not work well with multiple pages hit.
      Bug reported by Craig Johnson.
      It worked in 0.2.x because one bug corrected another one...
    * If args are not a kw list (e.g. old macro form) inserts usage in html
      page (brutal, but we really don't want to support the old form any more)

  * [v0.3.0] Pascal Wed Aug 18 15:39:54 CEST 2004
    * macro arguments are now passed as a list of KEYWORD=VALUE
    * ACL is handled
    * new options: Reverse and NoHeader

  * [v0.2.4] Pascal Mon Jul 19 23:40:54 CEST 2004
    * Comparisons to None use the 'is' and 'is not' operator (nicer)
    * Use get() for dict lookup w/ default value
    * Do not quote args and retry to compile if they are not valid regexes
    * Corrected usage samples in the comment below

  * [v0.2.3] Pascal Sun Jul 18 13:45:46 CEST 2004
    Avoid endless recursion when matching page contains this macro

  * [v0.2.2] Fri Jul 16 14:43:23 CEST 2004
    * Use Request.redirect(). Thanks to Craig Johnson <cpjohnson AT edcon DOT
      co DOT za>
      and Thomas Waldmann <tw DASH public AT g m x DOT d e>.
    * No more unused imports.
    * Catch only expected exceptions.

  * [v0.2.1] Mon Jun  7 11:54:52 CEST 2004
    * options: links, heading
    * works now with MoinMoin Release 1.2 too

  * [v0.1.1] Wed Oct 29 14:48:02 CET 2003
    works with MoinMoin Release 1.1 [Revision 1.173] and Python 2.3.2

  * [v0.1.0] 2003/04/24 10:32:04
    Original version

----

Usage:
  [[ LookupPagesAndSort ]]
  [[ LookupPagesAndSort (KEYWORD=VALUE [, ...] ) ]]

Lookup 'lookuptext' dict definitions in pages matching 'pages' regex, and
sort the found lines (=hits) in this order:
  1) substring of the hit matching 'sortkey'; group same matches of
     'sortkey' by a header
  2) substring of the hit matching 'lookuptext'
  3) the hit itself

If no arguments are given, the usage is inserted in the HTML result.
Possible keywords:

  Help           = 0, 1, 2         Displays 1:short or 2:full help in the page.
                                   Default: 0 (i.e. no help).

  Pages          = 'PAGES REGEX'   Pages in which the text is sought. If
                   or              empty (default), search the current page
                   '+PageGroup'    and default 'NoLinks' to 1. If starting with
                                   "+", use the members of that group page
                                   instead of matching a regex.
                                   Default: empty (i.e. current page).

  ExcludePages   = 'PAGES REGEX'   Exclude these pages (i.e. remove these pages
                                   from the list collected by 'Pages').
                                   Default: empty (i.e. don't exclude any).

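  LookupText     = 'TEXT DICT'     Dict key whose definition is looked up in
                                   each matching page.
                                   Mandatory!

  DictPage       = 'PAGE NAME'     Dict page to read definitions from. If it
                                   starts with "/", it is taken as a subpage
                                   of each matching page.
                                   Default: empty (i.e. the matching page).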

  SortKey        = 'TEXT REGEX'    Criterion to sort matching lines (=hits).
                                   Default: empty (i.e. no sorting).

  Heading        = 'TEXT REGEX'    Follow each hit by the text matching the
                                   regex that precedes the hit in its source
                                   page.
                                   Default: empty (i.e. no headings).

  UnassignedText = 'WIKI TEXT'     Header for hits not matching the sort key.
                                   Default: '[unassigned]'.

  Reverse        = 0 or 1          Reverse-sort the hits.
                                   Default: 0 (i.e. forward sort).

  RawText        = 0 or 1          Do not format found text.
                                   Default: 0 (i.e. formatted).

  Format         = 'STRING'        Explicitly format the output using this
                                   string, which can contain wiki formatting
                                   as well as these tokens:
                                     @KT@ : text matching 'SortKey'
                                     @LT@ : text matching 'LookupText'
                                     @FT@ : line of text
                                     @PN@ : page name
                                     @PN?Key@ : definition of 'Key' in the dict
                                                page used for the hit
                                     @HT@ : heading text
                                     @@   : the '@' character
                                     \\n  : newline (of wiki source text).

                                   Each token can contain a regex acting as
                                   a filter for displaying the value, e.g:
                                     @FT:{[123]}@      displays the prio smiley

                                   Multiple groups can be defined, in which
                                   case the text matching them will be
                                   displayed, e.g:
                                     @FT:{[123]}(.*)@  displays text after prio

                                   Default: '' (i.e. auto-formatting).

  HeaderFormat   = 'STRING'        If specified, use this instead of 'Format'
                                   for headers.
                                   Default: '' (i.e. do not display headers).

  FormatSort     = 0 or 1          If 1, sort the output generated by 'Format'
                                   (if 'Reverse' is 1, reverse-sort). If 0,
                                   leave the output sorted by the 'SortKey'
                                   criterion (if specified).
                                   Default: 0 (i.e. no sorting).

  NoHeader       = 0 or 1          Disable showing the headers as subtitles.
                                   Default: 0 (i.e. show headers).

  NoLinks        = 0 or 1          Disable following each hit by a link to its
                                   page.
                                   Default: 0 (i.e. show links) or 1 if
                                   'Pages' is omitted.

  NoPageText     = 'HTML TEXT'     Text displayed if no page matches 'Pages'.
                                   Default: an error message w/ Page regex

  NoText         = 0 or 1          Disables showing the found text.
                                   Default: 0 (i.e. show found text).

Keywords can also be given in upper or lower case, or abbreviated.
Example: LookupText, lookuptext, LOOKUPTEXT, lt, LT, Pages, p, etc.
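
The accepted spellings can be sketched as follows. This is a simplified,
standalone illustration of the rule (literal name, all lower case, all upper
case, and the abbreviation built from the non-lowercase letters); the helper
name accepted_names is hypothetical and is not part of the macro.

```python
import string

def accepted_names(spec):
    # Abbreviation: capitalize the first letter, then drop every
    # lowercase ASCII letter, e.g. 'LookupText' gives 'LT'.
    capitalized = spec[0].upper() + spec[1:]
    abbrev = "".join(c for c in capitalized
                     if c not in string.ascii_lowercase)
    # Spellings in the order they are tried.
    return [spec, spec.lower(), spec.upper(), abbrev, abbrev.lower()]

print(accepted_names("LookupText"))
# ['LookupText', 'lookuptext', 'LOOKUPTEXT', 'LT', 'lt']
```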

----

Sample 1:

  Given a page named 'AnInterestingBook':
        = A rather interesting Book =
        == Bibliographical facts ==
         Title:: A rather interesting Book
         Author:: A. Man
         Publisher:: Cool Publishing Corp.
        == Comments ==
        I really think that this book is worth a read.

        I'd even wanna lend out my copy if needed!

        == Status ==
         Owner:: Jonas Smedegaard
         Availability:: Lend out to Jack the Ripper

  ...and a page named 'AnotherInterestingBook':
        = Another interesting Book =
        == Bibliographical facts ==
         Title:: Another interesting Book
         Author:: A. Man
         Publisher:: Cool Publishing Corp.
        == Comments ==
        This is the sequel to AnInterestingBook - also worth a read.

        == Status ==
         Owner:: Jonas Smedegaard
         Availability:: Available - call me if interested in lending it

  ...and a page named 'AnotherBoringBook':
        = A boring Book =
        == Bibliographical facts ==
         Title:: A pretty boring Book
         Author:: Some Fool
         Publisher:: Lousy Publishing Corp.
        == Comments ==
        Don't waste time on this book.

        I was stupid enough to buy it once, but won't even lend it out!

        == Status ==
         Owner:: Jonas Smedegaard
         Availability::

  ...and the wiki setup to include books as dict pages:
        page_dict_regex =  u'[a-z0-9](Book|Dict)$'

  ...using the macro in a page named 'BookOverview' like this:
        = Known books =
        [[LookupPagesAndSort(pages=".*Book$", lookuptext="Title")]]

        = Book availability =
        [[LookupPagesAndSort(pages=".*Book$", lookuptext="Availability")]]

  ...will give this output (note: _text_ are links):
        Known books
          * A. Man
            * A rather interesting Book _AnInterestingBook_
            * Another interesting Book _AnotherInterestingBook_
          * Some Fool
            * A pretty boring Book _AnotherBoringBook_

        Book availability
          * Lend out to Jack the Ripper _AnInterestingBook_
          * Available - call me if interested in lending it _AnotherInterestingBook_


Sample 2:

  Given a page /MyDict containing:
        == Contact info ==
         FirstName:: Jonas
         FullName:: Jonas "dr. Jones" Smedegaard
         Phone:: +45 40843136
         Email:: dr@jones.dk
        == Photo gallery ==
         PhotoThumbnail:: http://dr.jones.dk/images/me/kp_bricks_thumb.jpg
         PhotoPortrait:: http://dr.jones.dk/images/me/kp_bricks.jpg

  ...the following macro call in another page:
        [[LookupPagesAndSort(pages="+WikiEditorsGroup", DictPage="/MyDict", LookupText="Email", Format=" * @PN?PhotoThumbnail@ [mailto:@PN?Email@ @PN?FirstName@]\\n")]]

  ...will produce a list of images and email references for me and all other editors.
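
How such a Format string is expanded can be sketched as follows. This is a
simplified illustration that handles only plain tokens (no '?' dict lookups
and no ':' regex filters); the names TOKEN_RX and expand are hypothetical and
are not part of the macro.

```python
import re

# Pairs of (literal text, @token@); the token may be empty ('@@').
TOKEN_RX = re.compile(r'([^@]*?)(@[^@]*?@)')

def expand(fmt, values):
    # values maps token names such as 'LT' or 'PN' to their text.
    table = dict(values)
    table[""] = "@"   # '@@' yields a literal '@'
    table["-"] = ""   # sentinel token, expands to nothing
    out = []
    # Appending '@-@' lets the regex also capture the text that
    # follows the last real token.
    for text, token in TOKEN_RX.findall(fmt + "@-@"):
        out.append(text)
        name = token.strip("@")
        # Unknown token names are emitted unchanged.
        out.append(table.get(name, name))
    return "".join(out)

print(expand(" * @LT@ (@PN@)", {"LT": "A. Man", "PN": "AnInterestingBook"}))
#  * A. Man (AnInterestingBook)
```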
"""

# Imports
import re, sys, StringIO, urllib
from string import ascii_lowercase, maketrans
from MoinMoin import config, wikiutil, version
from MoinMoin.Page import Page
from MoinMoin.parser import wiki

before_1_3 = version.release < '1.3'

#Dependencies = ["time"] # macro cannot be cached

_recursions = 0
FAKETRANS = maketrans ("","")


class _Error (Exception):
    pass


def execute (macro, text, args_re=None):

    global _recursions
    if _recursions: return ''

    _recursions += 1
    try:
        try:
            return _execute (macro, text)
        except _Error, msg:
            return """
        <p><strong class="error">
        Error: macro LookupPagesAndSort: %s</strong> </p>
        """ % msg
    finally:
        # always release the guard, even on an unexpected exception,
        # so that later calls are not silently suppressed
        _recursions -= 1


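def _delparam (keyword, params):
    value = params [keyword]
    del params [keyword]
    # only byte strings need decoding; numeric values (e.g. Help=1)
    # have no decode() method and are passed through unchanged
    if isinstance (value, str): value = value.decode ("UTF-8")
    return value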


def _param_get (params, spec, default):

    """Returns the value for a parameter, if specified with one of
    several acceptable keyword names, or returns its default value if
    it is missing from the macro call. If the parameter is specified,
    it is removed from the list, so that remaining params can be
    signalled as unknown"""

    # param name is literal ?
    if params.has_key (spec): return _delparam (spec, params)

    # param name is all lower or all upper ?
    lspec = spec.lower ()
    if params.has_key (lspec): return _delparam (lspec, params)
    uspec = spec.upper ()
    if params.has_key (uspec): return _delparam (uspec, params)

    # param name is abbreviated ?
    cspec = spec [0].upper () + spec [1:] # capitalize 1st letter
    cspec = cspec.translate (FAKETRANS, ascii_lowercase)
    if params.has_key (cspec): return _delparam (cspec, params)
    cspec = cspec.lower ()
    if params.has_key (cspec): return _delparam (cspec, params)

    # nope: return default value
    return default


def _usage (full = False):

    """Returns the interesting part of the module's doc"""

    if full: return __doc__

    lines = __doc__.replace ('\\n', '\\\\n'). splitlines ()
    start = 0
    end = len (lines)
    for i in range (end):
        if lines [i].strip ().lower () == "usage:":
            start = i
            break
    for i in range (start, end):
        if lines [i].startswith ('--'):
            end = i
            break
    return '\n'.join (lines [start:end])


def _re_compile (text, name):
    try:
        return re.compile (text, re.IGNORECASE)
    except Exception, msg:
        raise _Error ("%s for regex argument %s: '%s'" % (msg, name, text))


last_request_h = None
last_pages_list = []

def _get_all_pages (request):
    global last_request_h
    global last_pages_list
    request_h = hash (request)
    if request_h != last_request_h:
        if before_1_3: all_pages = wikiutil.getPageList (config.text_dir)
        else: all_pages = request.rootpage.getPageList()
        last_request_h = request_h
        last_pages_list = all_pages
    return last_pages_list


# The "raison d'etre" of this module
def _execute (macro, text):

    result = ""

    # new args syntax
    try:
        params = eval ("(lambda **opts: opts)(%s)" % text,
                       {'__builtins__': []}, {})
    except Exception, msg:
        raise _Error ("""<pre>malformed arguments list:
        %s<br>cause:
        %s
        </pre>
        <br> usage:
        <pre>%s</pre>
        """ % (text, msg, _usage () ) )

    arg_text            = _param_get (params, 'LookupText',   None)
    arg_pages           = _param_get (params, 'Pages',        '')
    arg_excl_pages      = _param_get (params, 'ExcludePages', '')
    arg_dict            = _param_get (params, 'DictPage',     '')
    arg_key             = _param_get (params, 'SortKey',      None)

    opt_heading         = _param_get (params, 'Heading',      None)
    opt_unassigned_text = _param_get (params, 'UnassignedText',
                                      "[unassigned]")
    opt_reverse         = _param_get (params, 'Reverse',      False)
    opt_rawtext         = _param_get (params, 'RawText',      False)

    opt_format          = _param_get (params, 'Format',       '')
    opt_headerformat    = _param_get (params, 'HeaderFormat', '')
    opt_formatsort      = _param_get (params, 'FormatSort',   0)
    
    def_nolinks         = (1,0) [len (arg_pages)>0]
    opt_nolinks         = _param_get (params, 'NoLinks',      def_nolinks)
    opt_noheader        = _param_get (params, 'NoHeader',     False)
    opt_notext          = _param_get (params, 'NoText',       False)
    opt_nopage          = _param_get (params, 'NoPageText',   None)
    opt_help            = _param_get (params, 'Help',         0)

    # help ?
    if opt_help:
        return """
        <p>
        Macro LookupPagesAndSort usage:
        <pre>%s</pre></p>
        """ % _usage (opt_help==2)

    # check the args a little bit
    if len (params):
        raise _Error ("""unknown argument(s): %s
        <br> usage:
        <pre>%s</pre>
        """ % (`params.keys ()`, _usage () ) )

    if arg_text is None:
        raise _Error ("missing 'lookuptext' argument")

    # empty page means this page; subpage are also handled
    if len (arg_pages) == 0 or arg_pages.startswith ('/'):
        arg_pages = macro.formatter.page.page_name + arg_pages

    # get a list of pages matching the PageRegex
    all_pages = _get_all_pages (macro.request)
    if arg_pages [0]=="+":
        hits = macro.request.dicts.members(arg_pages [1:])
    else:
        pages_re = _re_compile (arg_pages, 'Pages')
        hits = filter (pages_re.search, all_pages)
    if arg_excl_pages:
        excl_pages_re = _re_compile (arg_excl_pages, 'ExcludePages')
        hits = filter (lambda hit: not excl_pages_re.search (hit), hits)

    if before_1_3:
        # check ACL now (since we may end up with no pages)
        if config.acl_enabled:
            me = macro.request.user.name
            def _check_page (page_name):
                page = Page (page_name) # too bad we must instantiate...
                return page.getACL ().may (macro.request, me, "read")
            hits = filter (_check_page, hits)

    # sort pages, check if we have pages
    if len (hits) == 0:
        if opt_nopage: return "%s" % opt_nopage
        else:
            raise _Error ("no page matching '%s'!" % arg_pages)
    else: hits.sort ()

    if arg_key is not None:
        key_re = _re_compile (arg_key, 'SortKey')

    if opt_heading is not None:
        heading_re = _re_compile (opt_heading, 'Heading')

    # we will collect matching lines in each matching page
    all_matches = []

    # treat each found page
    for page_name in hits:
        heading_text = ""

        # Set dict page to use for lookups
        if len (arg_dict) == 0 or arg_dict.startswith ('/'):
            dict_name = page_name + arg_dict
        else:
            dict_name = arg_dict

        # lookup text
        lookuptext = macro.request.dicts.dict(dict_name).get(arg_text,'')
        if not lookuptext: continue

        # text is found; now search for heading
        if opt_heading is not None:
            heading_match = heading_re.search (lookuptext)
            if heading_match:
                heading_text = heading_match.group (0)

        # find the sort key
        keyval = ""
        if arg_key is not None:
            keymatch = key_re.search (lookuptext)
            if keymatch:
                keyval = keymatch.group (0)
            else:
                keyval = opt_unassigned_text

        # store info
        item = []
        item.append (keyval)                          # key text
        item.append (lookuptext)                      # lookup text
        item.append (page_name)                       # page name
        item.append (dict_name)                       # dict name
        item.append (heading_text)                    # heading
        all_matches.append (item)

    # all pages handled

    # prepare some formatting text
    bullet_list_open = macro.formatter.bullet_list (1)
    bullet_list_close = macro.formatter.bullet_list (0)
    listitem_open = macro.formatter.listitem (1)
    listitem_close = macro.formatter.listitem (0)

    # now sort and format records
    if not opt_notext: all_matches.sort ()
    if opt_reverse: all_matches.reverse ()

    # explicitly formatted output
    if opt_format:
        block = ""
        last_keytext = None
        rx = re.compile (r'([^@]*?)(@[^@]*?@)')
        pairs = re.findall (rx, opt_format+"@-@")
        if opt_headerformat: hpairs = re.findall (rx, opt_headerformat+"@-@")
        else: hpairs = None
        rx2d = {}
        for item in all_matches:
            keytext, text, pagename, dict_name, heading_text = item
            if keytext == last_keytext: plist = (pairs,)
            elif hpairs: plist = (hpairs, pairs)
            else: plist = (pairs,)
            last_keytext = keytext
            for p in plist:
                for txt, token in p:
                    txt = txt.replace ("\\n", "\n")
                    if not token: continue
                    token = token.strip ("@")
                    block += txt
                    rx2 = None
                    if len (token)>2 and token [2]=="?":
                        #FIXME: only cut out dict name here, and lookup after (non-hardcoded!) pagename is expanded
                        token = macro.request.dicts.dict(dict_name).get(token [3:],'')
                    # 'elif': a value just looked up via '?' must not be
                    # re-parsed as a 'token:regex' filter
                    elif len (token)>2 and token [2]==":":
                        token, rx2 = token [:2], token [3:]
                        if not rx2d.has_key (rx2): rx2d [rx2] = \
                           re.compile (rx2)
                        rx2 = rx2d [rx2]
                    token = token.replace ("\\n", "\n")
                    d = { "KT": keytext,      "LT": text,
                          "PN": pagename,     "HT": heading_text,
                          "":   "@",
                          "-":  "",
                          }
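                    if rx2:
                        tx = d.get (token, None)
                        if tx:
                            # join the groups of the first match; no match
                            # gives an empty string, not an IndexError
                            found = map ("".join, re.findall (rx2, tx))
                            if found: tx = found [0]
                            else: tx = ""
                        else: tx = token
                        block += tx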
                    else:
                        block += d.get (token, token)
        if opt_formatsort:
            lines = block.split ("\n")
            lines.sort ()
            if opt_reverse: lines.reverse ()
            block = "\n".join (lines)
        result += "\n%s\n" % _format (block, macro.request, macro.formatter)

    # auto-formatted output treat records for output
    else:
        head_count = 0
        result = result+"\n" + bullet_list_open
        keyval = ""
        last_pagename = ""

        for item in all_matches:
            keytext, text, pagename, dict_name, heading_text = item

            if opt_notext:
                text_fmtted = ""
                if last_pagename == pagename: continue
                else: last_pagename = pagename
            elif opt_rawtext:
                text_fmtted = wikiutil.escape (text)
            else:
                # parse the text  (in wiki source format) and make HTML,
                # after diverting sys.stdout to a string
                text_fmtted = _format (text, macro.request, macro.formatter)
                text_fmtted = text_fmtted.strip (' ') # preserve newlines

                # empty text => drop this item
                if len (text_fmtted)==0: continue

            # insert heading  (only if not yet done)
            if not opt_noheader \
               and arg_key is not None \
               and keytext != keyval:
                # this is a new heading
                keyval = keytext
                if head_count:
                    result = result+"\n    " + bullet_list_close
                    result = result+"\n  " + listitem_close
                head_count = head_count +1
                result = result+"\n  " + listitem_open
                result = result+ _format (keyval,
                                          macro.request, macro.formatter)
                result = result+"\n    " + bullet_list_open

            # correct the text format (berk)
            if text_fmtted.startswith ("\n<p>"):
                text_fmtted = text_fmtted [4:]
            if text_fmtted.endswith ("</p>\n"):
                text_fmtted = text_fmtted [:-5]
                text_trailer = "\n</p>\n"
            else: text_trailer = ""

            # insert formatted text
            result = result+"\n      " + listitem_open
            result = result + text_fmtted
            if not opt_nolinks:
                result = result + "&nbsp;&nbsp;&nbsp;<font size=-1>"
                if arg_text:
                    if before_1_3:
                        pageurl = '%s?action=highlight&value=%s' % (
                            pagename,
                            urllib.quote_plus (re.escape (text)))
                    else:
                        pageurl = '%s?highlight=%s' % (
                            pagename,
                            urllib.quote_plus (re.escape (text)))

                else: pageurl = wikiutil.quoteWikiname (pagename)
                link_text = wikiutil.link_tag (macro.request,
                                               pageurl, pagename)

                result = result + link_text
                result = result + "</font>"
            if opt_heading is not None:
                result = result + "&nbsp;&nbsp;&nbsp;<font size=-1>"
                result = result + heading_text
                result = result + "</font>"

            result = result + text_trailer + "\n      " + listitem_close

    # all items done, close  (hopefully) gracefully
    if not opt_format:
        if head_count:
            result = result+"\n      " + listitem_close
            result = result+"\n    " + bullet_list_close
        if not opt_noheader and arg_key is not None:
            result = result+"\n  " + listitem_close
        result = result+"\n" + bullet_list_close

    # done
    return result

def _format (src_text, request, formatter):
    # parse the text (in wiki source format) and make HTML,
    # after diverting the request output to a string
    str_out = StringIO.StringIO ()      # create str to collect output
    request.redirect (str_out)          # divert output to that string
    try:
        # parse this line
        wiki.Parser (src_text, request).format (formatter)
    finally:
        request.redirect ()             # restore output, even on error
    return str_out.getvalue ()          # return what was generated
