Sunday, June 22, 2008

Threading in mailnews

I was cleaning up my WHATWG mailing list folder (a task which mostly involves looking at the subject of a message and deciding whether or not I care to keep it) when I started thinking about how threading interacts with it. If you haven't subscribed to this mailing list (and I doubt most readers have), the main WHATWG author (Hixie) writes a single message which is a reply to several messages at once.

A brief aside, if I may: for completely unrelated reasons (responding to a new bug which turned out to be a dupe; stupid me for checking validity before checking for duplicates), I was perusing RFC 2822, specifically the In-Reply-To field (§ 3.6.4). Interestingly enough, the case of a message having multiple parents is quite well-defined in the spec (and Hixie violates the spec on this point). I did a brief check of the code on this point, and the code will handle the theoretically correct case fine (using In-Reply-To in lieu of References, which is not quite correct, but works for the purposes of threading).
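To make the multiple-parent case concrete: the spec says that a reply to several messages lists all of the parents' Message-IDs in In-Reply-To. Below is a minimal C++ sketch of pulling those IDs out of a header value. It is not the mailnews code; ExtractMessageIds is a made-up name, and the parser deliberately ignores the comments and quoting that the full RFC 2822 grammar allows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects every "<...>" token from an In-Reply-To (or References) value.
// Assumption: IDs are plain angle-bracketed tokens; comments and quoted
// strings from the full RFC 2822 grammar are not handled.
std::vector<std::string> ExtractMessageIds(const std::string& headerValue) {
  std::vector<std::string> ids;
  std::string::size_type pos = 0;
  while ((pos = headerValue.find('<', pos)) != std::string::npos) {
    std::string::size_type end = headerValue.find('>', pos);
    if (end == std::string::npos) break;  // unterminated ID: stop parsing
    ids.push_back(headerValue.substr(pos, end - pos + 1));
    pos = end + 1;
  }
  return ids;
}
```

A header value with two IDs yields two parents, which is exactly the situation a single-parent thread tree cannot represent directly.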

Anyways, the thing that caught me the most was that I often cared more about Hixie's catch-all reply than the earlier message to which the reply had been attached. In essence, I wished for the ability to reroot threads. I thought a little more, and listed other threading enhancements I wanted. But there already is a mammoth chart of threaded view issues—see bug 236849 for a sublist of many of these.

At the core of threading, one can distinguish several levels. The most basic is none at all; this is represented by turning threaded view off. Second is relying on subject: with this method one can tell only that two messages are related, but not which is a reply to the other. Third is typical threading, relying on In-Reply-To and References, which works well. Fourth is what I like to think of as über-threading: parsing the message text to find the quoted replies and using that to determine the parent of a message. Fifth, and highest, is the ability to redefine threading as the user sees fit. Note that most of these are orthogonal, so one can combine the middle three (subject, headers, and quoted text) to determine a message's parent.
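As a rough illustration of how these levels might be combined, here is a C++ sketch that tries a user override first, then the headers, then the subject. Every name in it (ExampleMessage, ChooseParent, LinkKind) is invented for the example, quote parsing is left out, and none of this reflects how mailnews actually does it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct ExampleMessage {
  std::string id;
  std::string subject;
  std::vector<std::string> references;  // oldest ancestor first
  std::vector<std::string> inReplyTo;
};

enum class LinkKind { None, Subject, Header, UserChoice };

struct ParentLink {
  LinkKind kind;
  std::string id;  // parent id; for Subject, merely a related message
};

// Strips any number of leading "Re:" prefixes (either case) and spaces.
std::string BaseSubject(std::string s) {
  for (;;) {
    while (!s.empty() && s.front() == ' ') s.erase(0, 1);
    if (s.size() >= 3 && (s[0] == 'R' || s[0] == 'r') &&
        (s[1] == 'e' || s[1] == 'E') && s[2] == ':') {
      s.erase(0, 3);
    } else {
      return s;
    }
  }
}

ParentLink ChooseParent(const ExampleMessage& msg,
                        const std::map<std::string, ExampleMessage>& known,
                        const std::map<std::string, std::string>& userParents) {
  // Highest level: an explicit user choice overrides everything else.
  auto chosen = userParents.find(msg.id);
  if (chosen != userParents.end()) return {LinkKind::UserChoice, chosen->second};
  // Header threading: nearest known ancestor in References, then In-Reply-To.
  for (auto it = msg.references.rbegin(); it != msg.references.rend(); ++it)
    if (known.count(*it)) return {LinkKind::Header, *it};
  for (const auto& id : msg.inReplyTo)
    if (known.count(id)) return {LinkKind::Header, id};
  // Subject threading: tells us the messages are related, not who replied.
  const std::string base = BaseSubject(msg.subject);
  for (const auto& entry : known)
    if (entry.first != msg.id && BaseSubject(entry.second.subject) == base)
      return {LinkKind::Subject, entry.first};
  return {LinkKind::None, ""};
}
```

Returning the kind of link alongside the id keeps the subject case honest: a caller can group such messages together without pretending to know which one is the parent.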

The utility of redefining your own threading is hard to overstate. How many of you have received email where people blithely hit "Reply to All" to start an unrelated new message, while others on the same thread legitimately use the reply features? I myself have one thread like that comprising 20 different real threads. Other times you hit cases where someone is on a borked client (*cough*Yahoo!*cough*), or someone changes the thread subject, or there is a confluence of mailing lists, forwarding, and replies (four threads where one is warranted, again in my inbox).

There are touchier areas with respect to threading. For example, the notion of subthreads is powerful (there are RFEs to implement practically every "Apply xxx to whole thread" as also an "Apply xxx to subthread"), but it is a pain in the backend, not least because we have some other bugs inducing loops into the thread hierarchy there. Similarly, the question of what to do with multiple parenting (both how to represent it and how to generate it) can be touchy on the UX end. A final thorn I would like to specifically direct your attention to is the idea of dummy thread headers, as referred to in jwz's algorithm, the seminal work on the matter (ignore his anti-NS 4 rant, however; he lives in the glory days of NS 2).
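For readers who have not met dummy headers before, here is a simplified C++ sketch of the idea, loosely modeled on the container table in jwz's write-up. It is not his full algorithm (for one, it never re-parents a message that already has a parent), and the names ThreadTable and Container are chosen only for this example. A container whose message never arrived is a dummy header, and the link step refuses any edge that would introduce a loop.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

struct Container {
  std::string id;
  bool hasMessage = false;  // false means this is a dummy header
  Container* parent = nullptr;
  std::vector<Container*> children;
};

class ThreadTable {
 public:
  // Finds the container for an id, creating an empty (dummy) one if needed.
  Container* Get(const std::string& id) {
    std::unique_ptr<Container>& slot = table_[id];
    if (!slot) {
      slot.reset(new Container());
      slot->id = id;
    }
    return slot.get();
  }

  // Records a message and links the chain named by its References header.
  void AddMessage(const std::string& id,
                  const std::vector<std::string>& references) {
    Container* self = Get(id);
    self->hasMessage = true;
    Container* prev = nullptr;
    for (const std::string& ref : references) {
      Container* c = Get(ref);
      if (prev) Link(prev, c);
      prev = c;
    }
    if (prev) Link(prev, self);
  }

 private:
  // True if `ancestor` is `node` itself or is reachable by walking up from it.
  static bool IsAncestor(const Container* ancestor, const Container* node) {
    for (const Container* c = node; c; c = c->parent)
      if (c == ancestor) return true;
    return false;
  }

  // Links child under parent unless it already has a parent or the new
  // edge would create a loop in the hierarchy.
  static void Link(Container* parent, Container* child) {
    if (child->parent || IsAncestor(child, parent)) return;
    child->parent = parent;
    parent->children.push_back(child);
  }

  std::map<std::string, std::unique_ptr<Container>> table_;
};
```

If message "c" arrives citing "a" and "b", both ancestors get dummy containers until (and unless) the real messages show up, so the thread keeps its shape in the meantime.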

On the other hand, don't expect me to implement any of these improvements soon, nor anyone else for that matter. I merely wanted to express my opinions as Thunderbird drivers debate UI on a higher level, with what seems to be a tendency to ignore some of the finer aspects of good message threading. Ah well....

Wednesday, June 11, 2008

Documentation in Mozilla

Having worked in the depths of poorly documented, just plain undocumented, or, worse yet, misdocumented code, I have started taking some initiatives on documenting code. Working with db48x, we have improved some of Mozilla's documentation framework (achievable with make documentation). I'm still polishing the fine edges of bug 433206, but what's in there should be sufficient to make spiffy documentation. The other important component of the fix comes from doxygen bug 535379, a simple fix that handles Mozilla's IDLs better.

There's still more to go. There should probably be an official documentation guide for Mozilla, or at least for its components. Someone patching up SVG and dot in doxygen would be helpful, especially the annoying URI mistake.

But the important part is how to document code. At the moment, the class list is provided in a 5-by-several-hundred-line table containing every IDL file and all exported headers. Wondering about how to do some IO foo, but don't know where to look? Right now, your only choice is to go through this entire list, guessing at names that would produce the right magic. Ideally, the documentation would include separate modules that make querying easier. Before I make a commitment, though, I need to investigate how namespaces interact with doxygen for best results.
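One way such modules could look is doxygen's grouping commands, @defgroup and @ingroup. The sketch below only shows the mechanism; the group name example_io and both functions are hypothetical and are not taken from the Mozilla tree.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

/**
 * @defgroup example_io Example I/O helpers
 *
 * Hypothetical module that collects stream-related helpers on one page
 * instead of leaving them scattered through the class list.
 * @{
 */

/** Returns the number of bytes needed to store @p text. */
inline std::size_t ExampleByteLength(const std::string& text) {
  return text.size();
}

/** @} */

/**
 * Returns true if @p text would fit in a buffer of @p capacity bytes.
 *
 * This function lives outside the braces above and joins the group by name.
 * @ingroup example_io
 */
inline bool ExampleFits(const std::string& text, std::size_t capacity) {
  return ExampleByteLength(text) <= capacity;
}
```

Both functions end up on the same module page, whichever header they were declared in, which is the sort of lookup by topic that the flat class list lacks.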

So, the important question is basic documentation. Doxygen's manual is a good starting point, but I'll brush up on the basics. A documentation comment is opened by, alternatively, /**, /*!, ///, or //!. The variants with a trailing < (/**<, /*!<, ///<, and //!<) are used for post-documentation, that is, a comment placed after the member it describes. A comment consists of a brief description (one sentence, ended by a period), followed by potentially several paragraphs of almost-HTML (not all HTML tags are supported). Interspersed, though typically at the end, are doxygen tags, denoted by your preference of @ or \ (the majority of code uses @, just to warn you).
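Here is a small C++ sketch of those comment forms side by side. The struct and functions are invented for illustration. Whether the first sentence of a /** block is picked up as the brief description depends on doxygen's configuration (the JAVADOC_AUTOBRIEF option), so treat that part as an assumption about the setup.

```cpp
#include <cassert>
#include <string>

/**
 * Describes one folder in a hypothetical mail store.
 *
 * Everything after the first sentence is the detailed description. It can
 * run to several paragraphs and use a subset of HTML, such as <b>bold</b>
 * or <code>code</code>.
 */
struct ExampleFolder {
  std::string name;  ///< Display name; the < marks a comment placed after its member.
  int unread;        /**< Number of unread messages. */
};

/*!
 * Reports whether a folder has unread messages.
 *
 * This Qt-style block is equivalent to the JavaDoc-style block above.
 */
inline bool HasUnread(const ExampleFolder& folder) { return folder.unread > 0; }

/// Returns the folder name, or "(unnamed)" if the name is empty.
inline std::string DisplayName(const ExampleFolder& folder) {
  return folder.name.empty() ? std::string("(unnamed)") : folder.name;
}

//! Marks every message in the folder as read.
inline void MarkAllRead(ExampleFolder& folder) { folder.unread = 0; }
```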

To describe all the tags would be arduous and pointless. Common ones are exception (the nsresult values), note, param, and return. The see tag should probably be used more often as well. Links to other documentation can be generated by providing the fully-qualified member name, e.g. nsMsgFolderFlags::Directory. Code can be further grouped by using the name tag and @{...@}. The latter signifies a group; one can also distribute a single comment across all the members of a group using this format (doxygen's DISTRIBUTE_GROUP_DOC option).
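A short C++ sketch of these tags in use follows. Everything in it is a stand-in: ExampleResult plays the role of nsresult, and the ExampleFolderFlags values are made up and are not the real nsMsgFolderFlags constants.

```cpp
#include <cassert>
#include <cstdint>

/// Hypothetical stand-in for nsresult; real code would use the XPCOM type.
using ExampleResult = std::uint32_t;
constexpr ExampleResult EXAMPLE_OK = 0;
constexpr ExampleResult EXAMPLE_ERROR_INVALID_ARG = 1;

/// Hypothetical folder flags; the values are invented for this example.
namespace ExampleFolderFlags {
/// @name Folder kinds
/// @{
constexpr std::uint32_t Directory = 0x0004;  ///< Folder contains subfolders.
constexpr std::uint32_t Inbox = 0x1000;      ///< Folder receives new mail.
/// @}
}  // namespace ExampleFolderFlags

/**
 * Sets a flag in a folder's flag word.
 *
 * @param aFlags Flag word to modify in place.
 * @param aFlag  Flag to set, e.g. ExampleFolderFlags::Directory.
 * @return EXAMPLE_OK on success.
 * @exception EXAMPLE_ERROR_INVALID_ARG if @p aFlags is null.
 * @note Existing flags are left untouched.
 * @see ExampleFolderFlags
 */
inline ExampleResult SetFlag(std::uint32_t* aFlags, std::uint32_t aFlag) {
  if (!aFlags) return EXAMPLE_ERROR_INVALID_ARG;
  *aFlags |= aFlag;
  return EXAMPLE_OK;
}
```

The qualified name in the @param line is what doxygen turns into a link, and the @name block makes the two constants appear under a shared "Folder kinds" heading.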

More advanced documentation that might be helpful: lists (you can use HTML tags, or -, -#, and indentation to represent unordered lists, ordered lists, and nested lists, in that order). Formulas can be specified in LaTeX format if you really need them. Message sequence charts can also be generated, as can generic dot diagrams, in addition to the ones doxygen generates for you. But the documentation pages can never be better than the sources from which they are derived...
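To show what these look like in a comment, here is a C++ sketch with a made-up function, ScoreThread, whose documentation uses both list styles, a formula, a dot graph, and a message sequence chart. The diagram blocks only render if doxygen can find the external tools (Graphviz for dot, mscgen for the chart), which is an assumption about the build setup.

```cpp
#include <cassert>

/**
 * Computes a hypothetical relevance score for a thread.
 *
 * What counts towards the score:
 * - unread messages, which count double
 *   - a nested item is simply indented under its parent
 * - read messages, which count once
 *
 * Steps taken:
 * -# double the unread count
 * -# add the read count
 *
 * As a formula: @f$ s = 2u + r @f$
 *
 * @dot
 * digraph example { unread -> score; read -> score; }
 * @enddot
 *
 * @msc
 *   caller, scorer;
 *   caller->scorer [label="ScoreThread()"];
 *   caller<-scorer [label="score"];
 * @endmsc
 *
 * @param unread Number of unread messages (u in the formula).
 * @param read   Number of read messages (r in the formula).
 * @return The score s.
 */
inline int ScoreThread(int unread, int read) { return 2 * unread + read; }
```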