@q file: notes.w@> @q% Copyright Dave Bone 1998 - 2015@> @q% /*@> @q% This Source Code Form is subject to the terms of the Mozilla Public@> @q% License, v. 2.0. If a copy of the MPL was not distributed with this@> @q% file, You can obtain one at http://mozilla.org/MPL/2.0/.@> @q% */@> @** Notes to myself ... Decisions.\fbreak
@*2 Evaluate if extern ``C'' should be used in |Set element compare functor|.\fbreak Cuz it's a closed system, there is no need to make the C++ functor global for other languages. So remove ``C''.
@*2 Cleanup from failed parallel parse.\fbreak As the local parallel parse does not affect the parser requesting parallelism, there is no save/reset action needed on its token stream variable |current_token__| and position. So remove the paranoia code.
@*2 Verify if all successful threads consume a token even if it's{...}.\fbreak just a remapper on the current token. For example, in the Pascal translator the lookahead token might need a re-verification by the symbol table across all scopes. So call the thread that tries the remapping and returns the result, be it the same token or a remapped one. Now what is the result in terms of processing the token stream and the new lookahead? I got it from the grapevine that... yup. As per normal --- consumption takes place!
@*2 Manual arbitrator: how does it work?.\fbreak It's a proxy just returning the 1st token in the accept queue. |AR_for_manual_thread_spawning| is a canned proxy arbitrator for this purpose. There is no judging code. It's a teflon special --- nothing sticks to it; just pass back the first item in the queue. The |spawn_thread_manually| function sets up this default. Corrected |call_arbitrator|, which originally jammed the first parm into the accept queue. Now the given arbitrator is called for both types of threads: normal and manual. Though the arbitrator function is a single procedure for that state configuration, it must service all the nested threads with this configuration. I still use the msg as a parameter for calling the function. It makes things simpler and consistent: a generic parameter is passed that needs casting to its real self. Note: the arbitrator is not multi-threaded as there is only 1 copy of itself, but it is re-entrant. So when two or more competing nested threads require its services, I leave it up to the operating system to deal with parallelism. It probably throttles back to a single process, but how many situations are there that use nested competing parses of the same grammatical expressions?
@*2 |Ccm_to_ar| message needed?.\fbreak I ask the question in light of the fact that an arbitrator is a global procedure and not a thread. Yes it is needed, as it contains the info to arbitrate. Like what? The cm providing the accept queue for review. Should the parm be a message type? No, but it keeps it simple, Dave.
@*2 Why (CHARP) instead of |Cparse_record| definition in{...}.\fbreak the |reduce_rhs_of_rule| function? Well back in time, u got it, Microsoft's compiler was a-honking. So if you look at the generated code for a concrete |reduce_rhs_of_rule|, you'll see how it games itself down thru the stack equating the subrule's parameters in LIFO (sketched below). Does it still hold this quirk? Don't know until I retry. At the moment, I have too many other things to complete. Well I'm biting the ??? to make things faster. Rewrote the stack and corrected for speed the emitted code of the rhs subrules. Eliminate (CHARP):\fbreak 3 Oct. 2005\fbreak
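To picture the LIFO equating (a minimal sketch with made-up names; |from_top| and friends are not Yacco2's real API):\fbreak

  #include <vector>
  #include <cstddef>

  // Hypothetical stand-ins only; not Yacco2's real types or API.
  struct Sym { int id; };
  struct Stack_record { Sym* sym; };

  struct Parser_sketch {
    std::vector<Stack_record> stk;           // bottom ... top of parse stack
    Stack_record* from_top(std::size_t n) {  // 0 = top of stack
      return &stk[stk.size() - 1 - n];
    }
  };

  // Subrule E -> E '+' T: the 3 rhs parameters equated down the stack, LIFO.
  void reduce_rhs_of_rule_sketch(Parser_sketch* p) {
    Sym* t    = p->from_top(0)->sym;         // T,   top of stack
    Sym* plus = p->from_top(1)->sym;         // '+'
    Sym* e    = p->from_top(2)->sym;         // E,   deepest rhs symbol
    (void)t; (void)plus; (void)e;            // syntax directed code uses these
  }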
Added rule recycling to speed up parsing due to the rule's birth-run-delete cycle.\fbreak June 2007\fbreak
@*2 Why nil ptr test in |T_11|?.\fbreak Originally some symbols pushed onto the stack were zeroed out to protect from abort cleanups etc. This situation does not exist anymore. So rid it, ghostbusters.
@*2 Clean up parallel parse in control monitor instead of grammar{...}.\fbreak requesting parse. It's just cleaner and closer to the action. Here are my original thoughts. Some housekeeping is done. The cleanup is to pop the \PARshift{} symbol from the attempted parallel parses. It could have been done in the control monitor, which was the creator of this, but I felt that spreading this cleanup to the control monitor was potentially spreading the mess.\fbreak \fbreak Dictum: keep the cleanup of an effect as close as possible to its cause. Is this an Occam?
@*2 Conversion of control monitor and parallel parse code.\fbreak This is the injection code included into the outputted grammar modules from Yacco2. Conversion cleaned up dregs from cm handling of a ${\vert?\vert}$ dynamic code request. A thought of minimal value where there are better means to cope with this type of situation. Now what is this situation? How do you cope with a parsing situation like the syntax directed code that needs parsing? There is no assigned set of grammars to properly parse the \CPLUSPLUS/ code. So, do a dynamic parse looking for a dynamically calculated lookahead token to stop the parse-by-character situation. Now the good stuff. The |cweb| version worked the first time both in the control monitors and the parallel grammar threads. Let the applause begin.
@*2 Why is there an abort attribute in the parse stack record?.\fbreak If there is a symbol on the parse stack with `affected by an abort parse' turned on, the cleanup of an aborted parse will delete the symbol like an ``auto delete'' when it pops the parse stack.
@*2 Make all yacco2's types, structures, etc. housed within yacco2's namespace.\fbreak The `INT' type is also used by Microsoft. So, add the `yacco2::' qualifier to all definitions and implementations. This way there is no conflict of interest when porting to other environments. Correct also the implementations to be qualified by namespace yacco2. There are 2 ways to do this. Firstly, be explicit per implementation. Secondly, enrobe the implementations with a namespace `${\{}$' ... `${\}}$' construct. To each their own ... you'll see both approaches depending on my mood; a small illustration follows after the next note. For the moment, files |wcm_core.h| and |wpp_core.h| are not explicitly qualified by yacco2::. This allows the old current code that uses this to be compiled until the |cweb| version is completely finished. The current system does not include everything within the yacco2 namespace.
@*2 Make enclosure of namespace yacco2 explicit in implementation part of code.\fbreak Eliminates assumptions. |@| and |@| bracket the code to be housed within yacco2's namespace. All implementation code contains this start/stop definition. The code |wcm_code.h|, |wpp_core.h|, |war_begin_code.h|, and |war_end_code.h| are just that --- snippets --- and so are contained within another implementation. They still use |@|.
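To illustrate the two qualifying styles (generic C++ with hypothetical member names, not lifted from the yacco2 sources):\fbreak

  // Hypothetical declarations standing in for yacco2's real ones.
  namespace yacco2 {
    typedef int INT;                 // no clash with Microsoft's INT anymore
    struct Parser { void advance(); void retreat(); };
  }

  // Style 1: explicit yacco2:: qualification per implementation.
  void yacco2::Parser::advance() { /* ... */ }

  // Style 2: enrobe the implementations with a namespace construct.
  namespace yacco2 {
    void Parser::retreat() { /* ... */ }
  }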
@*2 The old version of terminal enumeration:.\fbreak The terminal alphabet is represented by the integers, both positive and negative. Both errors and regular terminals are open-ended in their expansion capabilities as they are the left and right end points in the terminal enumeration scheme. Error terminals grow towards minus infinity while the regular terminals expand positively. The balance or pivot point of the terminal alphabet is `eog', which starts the meta terminals. Meta terminals are indicators of parsing situations like end-of-token-stream reached, parallel parsing to take place, and the different wild type shifts. None of these meta-terminals are found within the input language being parsed. The |Base_enum_of_T| parameter of `fsm' is the starting point of the enumerated terminals. Due to the current enumeration scheme, its value is required to map a terminal's enumeration id into a set's co-ordinates. This is a bit of a hack as each grammar contains this starting point. The hack comes about from an out-of-sync condition when new errors affecting this start point have been defined and all grammars have not been recompiled and passed thru Yacco2's linker. The consequence is that the parser, when run, will have strange things happen because of the wrong enumeration mapping to the terminals buried in the old finite automaton's tables. Trust me, I'm the guinea pig. Regenerate all the grammars. Raw characters represent the mapping from the 8-bit ASCII character into its raw character terminal. Error terminals are internally generated situations produced by the parsing grammars manufactured by the grammar writer. They indicate the appropriate faulty situation detected. Regular terminals are composites. They get created by the grammars from streams of other raw character terminals or composite terminals. They are evolutionary and come into existence from various passes made on the token streams: lexical to syntactic to semantic. Reason to change:\fbreak Why this type of mapping instead of the positive integers? Reality is there is no difference apart from using the range of numbers and how they expand. Both meta and raw character terminals are constant in size. It is the other two types that expand or evolve as one is developing the language recognizer. Either way of enumerating the terminals, when an error or a new regular terminal is created, all the grammars need to be regenerated due to the change in the lookahead sets. Hindsight critiques the start seed buried in the grammar's finite state automaton definition. So get rid of it! The better design is to enumerate from 0. This eliminates the mapping from the negative space into the positive space of the set co-ordinates. Take 2: Here is the new mapping: meta-terminals, raw characters, errors, and finally the regular terminals. There is no need to map into the positive space before calculating a terminal's lookahead set co-ordinates. Just use the enumerated value to translate to the set's partition and element!
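The arithmetic in sketch form (assuming 8-bit set partitions; the real partition width is whatever Yacco2's set implementation uses):\fbreak

  // Sketch: 0-based T enumeration straight to set co-ordinates.
  struct Set_coords { int partition; unsigned char element; };

  Set_coords t_enum_to_coords(int enumerate_id) {
    Set_coords c;
    c.partition = enumerate_id / 8;                          // which partition
    c.element   = (unsigned char)(1u << (enumerate_id % 8)); // bit within it
    return c;
  }
  // Old scheme: first shift out of negative space, id - Base_enum_of_T,
  // with the seed buried in each grammar's fsm (the hack now gone).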
@*2 Tree token template container.\fbreak Well let's try passing references instead of pointers. I hope that the compilers are kinder to me within the threaded environment. This certainly saves a lot of constraint checking. 14 Oct. 2004.
@*2 Add in Yacco2 arbitration requiring code on the possibility of{...}.\fbreak 2 or more terminals in the accept queue with no arbitration taking place. That is, it defaults to the first terminal in the queue. The compilation check requires the checking of their first sets for the common prefix condition. At the moment, this does not take place due to the yacco2 / linker loop. Yacco2's linker generates the transitive first sets for the threads that call other threads. So this check is a post condition beyond the compiler/compiler. At present, Yacco2 issues a warning: use at your own risk. At runtime, there still needs to be a look-over-your-shoulder throw condition. This will be implemented in the arbitration code. This is done --- 26 Oct. 2004 in the Yacco2 generator. There is an optimization done before the throw code is appended to the arbitration thread:\fbreak \ptindent{1) more than 1 thread must be dispatched --- thread with a name: NULL name bypassed} \ptindent{2) no arbitration code supplied by the grammar writer}
@*2 Rework of thread management.\fbreak At present it is spread among the global implementation of independent methods, the table of spawned threads, and the worker thread record structure.
@*2 To check: does stop msg have wait/reply mechanism?.\fbreak In the shutdown? No.
@*2 Change tree container to a specialized version of |tok_can|.\fbreak This makes things more consistent. Now, all u see are specialization containers. So why did u not do it in the beginning? This container was an afterthought. It was written to support a Pascal translator to re-target a preprocessed Pascal variant using Oregon Software's compiler to DEC aka Compaq aka HP Pascal. As there were special extensions to the Oregon Pascal, a complete front end compiler was needed to build a source tree of the program so that the source code could be morphed. There was lots of sinning going on. Well the outcome was this family of tree walkers and container. So what! Why did u not write a template specialization? Probably too deep into getting it done without the thought as to whether it had any generalization. The other containers using string and ifstream did specialize but... 11 Nov. 2004. Now to correct the grammars that use the old container |tok_can_ast|.
@*2 Eliminate the control monitor.\fbreak The middleman is too expensive as a thread due to the current threading model. This helps in optimizing the run performance of Yacco2. To do this meant moving all the responsibilities of the control monitor into the grammar requesting parallelism. This plumbing is within |Parser|. Part of the demolition meant throwing out the messages between the various components --- pp between cm between th. Now the message is the medium, or is it the |Parser|? The requesting |Parser| just passes itself to the grammar threads. It contains the pertinent token stream variable: token and position (current values) within the stream, and all the token containers --- supplier, producer, recycling bin, and the error container (refuge shelter). Also removed was the distinction between the containers --- parallel versus monolithic. As parallel grammars just graft onto the current token scene, there is no need to make the distinction except at their start-up time, which grabs their containers' addresses from the spawning parser. They are just readers of the tokens and not writers. Now what about error tokens? They should not be added to the error queue but should be passed back to the calling grammars within the |Caccept_parse| object. The arbitrator of the calling grammar determines what should be done. If u need to add to it then use the guard dog approach, or is it the drake? ``i get no respect'', so choose your mutex before doing your thing.\fbreak Done 23 Nov. 2004. Performance gain: 30 percent.
@*2 Eliminate |pp_support__| as a thread optimization.\fbreak All info is now contained in |Parser|. Depending on how the thread is started --- monolithic or parallel --- the appropriate parse containers are imported either thru the constructor or via the passed parameter; a schematic sketch follows.
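Schematically (hypothetical shapes only; the real |Parser| carries much more):\fbreak

  #include <vector>
  #include <cstddef>

  // Hypothetical stand-ins for the token containers and Parser.
  struct Tok {};
  typedef std::vector<Tok*> Tok_can_sketch;

  struct Parser_sketch {
    Tok_can_sketch* supplier;   // token stream being read
    Tok_can_sketch* producer;   // tokens built by this pass
    Tok_can_sketch* recycle;    // recycling bin
    Tok_can_sketch* errors;     // error container (refuge shelter)
    std::size_t     pos;        // current position within the stream

    // Monolithic start: containers imported thru the constructor.
    Parser_sketch(Tok_can_sketch* s, Tok_can_sketch* p,
                  Tok_can_sketch* r, Tok_can_sketch* e)
      : supplier(s), producer(p), recycle(r), errors(e), pos(0) {}

    // Parallel start: just graft onto the spawning parser's token scene.
    explicit Parser_sketch(const Parser_sketch& requester)
      : supplier(requester.supplier), producer(requester.producer),
        recycle(requester.recycle), errors(requester.errors),
        pos(requester.pos) {}
  };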
@*2 Another thread optimization.\fbreak If only 1 parallel thread is asked to perform, one does not need to acquire / release the lock of the requesting grammar to report success or failure. Look, I'm trying to make threading closer to recursive descent in performance. Date: 3 Dec. 2004. Well I'm crawling out of the swamp... darwinism? If there is just one thread to be run, why not call it as a recursive descent procedure instead of the thread route? We'll see what the cost of thread modulation is against the procedure call approach and its object creation / destruction overhead. Take 200.1... 9 percent run improvement of procedure call over threads.
@*2 a N * 2.\fbreak Reducing the number of times that the token container is read does miracles. Now let's look at my myopia. There was a single pass, call it P1, to break up the character stream into line segments, followed by the lexical pass called P3. Why? I was lazy and wanted all downstream tokens to be properly tagged in file number - line number pairings. Why lazy? The P1 pass ensured that the tokens were properly GPSed. I did not have to deal with the vagaries of ``how is a line delimited?''. It was handled in one place: the ``eol'' thread, and could be retargeted to other dealings. The logic is hardwired for now to the ``line-feed'' definition based on ASCII encoding. By combining the 2 passes (P1 + P3), the number of reads on an N character stream is halved. Now let's look at the raw character to symbol translation. Again, this is a 2-traverse mechanism that reads a file's characters and translates them into symbols. It should have been a just-in-time read like the tree traversals. Each character request fetches the character from the file and then calls the character translator to do the cosmetic makeover. This definitely improves the ``file include'' process. This is a reduction from 37 seconds to 15 seconds. Not bad: a 2.something zinger. Now for the overhead of raw characters to symbol objects. Judging from the cursor winking, this could be another 10 second improvement. Wait and see... Ladies and gentlemen, and the winner is ... 37 seconds down to ??? Maestro, the envelope please. 15 seconds! A 22 percent improvement against the 100 second starting point but 2.something faster against the 37 seconds. Slimfast ain't got nothing on us. As the song says --- looking for xxx in all the wrong places. Now what about the cost of symbol creates and std::map usage in the thread library and the garbage collector? I'll see what I can do. I must approach the recursive descent speed zone or this thought experiment on parallel parsing is just that --- relegated to the empirical sidelines. A second string something, and excuse the pun.
@*2 Remove |unique_id__| from |CAbs_lr1_sym|.\fbreak Its original purpose was a birthing number to give a count to the number of symbols produced and as a partial order. Never used, so out damn thoughts! Dieting and speed are in.
@*2 Okay guys Yacco2 is starting to smoke.\fbreak Here's another improvement. Firstly I was looking in the wrong places: string copy was thought to be a major cause but it turns out that it's a minor overhead. Globalization of the character storage is good at the cost of safety but not a really really big stopper. So here's the scoop: first set evaluation goes thru INDIVIDUALLY each potential thread contained in the state's configuration list.\fbreak If there are many potential threads to-be-run assessed on a per-character basis --- ouch. All one has to do is gather the threads into a consolidation thread so that only one attempted pass is made, on the first set of the consolidation thread; a sketch of the idea follows at the end of this note. Yacco2's linker consolidates this first set of referenced threads. If the threads are orthogonal to one another (there is no common prefix), then the single first test lowers the cholesterol levels. With this insight, now to modify the grammars like: pass3, lint, syntax directed code gatherer, etc. Jan. 1/2005. Well this had limited improvement. Not what was expected, so see |Global Parallel table entry| where it explains how Yacco2's linker became involved. Jan. 6/2005. Speed improvement --- ???.
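The shape of the win (a sketch; |First_set| and friends are made up here):\fbreak

  #include <vector>
  #include <cstddef>

  struct First_set {                      // hypothetical, kept trivial
    std::vector<char> members;
    bool contains(char c) const {
      for (std::size_t i = 0; i < members.size(); ++i)
        if (members[i] == c) return true;
      return false;
    }
  };
  struct Thread_entry { First_set first; /* thread to launch */ };

  // Before: each potential thread probed INDIVIDUALLY, per character (ouch).
  int candidates(const std::vector<Thread_entry>& cfg, char c) {
    int n = 0;
    for (std::size_t i = 0; i < cfg.size(); ++i)
      if (cfg[i].first.contains(c)) ++n;
    return n;
  }

  // After: one probe against the linker-consolidated first set decides
  // whether the consolidation thread gets tried at all.
  bool try_consolidation(const First_set& consolidated, char c) {
    return consolidated.contains(c);      // the single first test
  }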
@*2 Slim down the |CAbs_lr1_sym| space.\fbreak This is the base component to all other symbols. Originally I had associated the parser across all symbols: Terminals and Rules. This fattened the space by 4 bytes. With a shrinking of some variables to short integer and unionizing the rule's variables, I brought down the space bloat from 36 bytes to... 24 bytes. So what? Well, this allows more raw characters to be stored in a prefixed array rather than a template container.\fbreak 3 Jan. 2005.
@*2 Grammar as a logic sequencer: Allow no token containers.\fbreak What type of improvement is this? By passing in pointers to the parser, does this not open up more programming mistakes? It could, but hear my reasons please. This lets the grammar writer program grammars as logic sequencers using epsilon rules and related syntax directed code. If the writer is very creative, behavioural terminals could be defined and put into a token container for parsing: each to their own. See |enumerate_T_alphabet.lex| as an example of this use.\fbreak 15 Aug. 2005
@*2 Logic bug: same accept token added to accept queue more than once.\fbreak Help the needy: the grammar has launched multiple threads and these threads have returned the same token. This condition is caught when the number of accept tokens in the queue is not equal to the number of threads reporting success. The needy? Well, I was caught with this logic bug. See |Arbitrator code generator| where the logic check resides.\fbreak 13 Dec. 2005
@*2 Porting of |cweb| code.\fbreak Make sure the @@i include construct uses quoted file names. Without the quotes, the Mac version of |cweave| has a slight stammer. The Microsoft flavour works.\fbreak See |Generated finite state automaton macros| for more stumblings from within. The C macro definition workaround works, but the references to the macros are not placed into the Index.\fbreak 16 Dec. 2005
@*2 |cweave| \CPLUSPLUS/ code.\fbreak Removed the ending semi-colon from the |RSVP| macro to have |cweave| print out these types of token macros on their own line. So make sure u add a ``;'' following their use.\fbreak 8 Jan. 2006
@*2 |failed| directive added in the |fsm| construct.\fbreak I felt the grammar writer should be given a last chance to deal with failed parses. Why? For example, my |yacco2_lcl_option| needs to deal with options having multiple letters. Now how do u program these options whose prefix is faulty? For example, option -err has -e and -er as potential options but they are in error. One could explode the combinatorial code within a grammar to deal with each evolving prefix, or force the calling thread to handle the failed thread with some form of epsilon in the grammar code. This is crude, so why not field a returned error terminal? To do this I needed a last-chance directive to be tried in the |parallel_parse_unsuccessful| procedure.
For the moment, it is only supported in a thread grammar. Possibly I'll look at the monolithic grammar and what it means in particular for error correction.\fbreak 8 Mar. 2006\fbreak \fbreak Verified that the |failed| directive works in a |monolithic| grammar. 2 thumbs up for consistency. Just make sure that a ``failed'' directive within a monolithic grammar places the Error T in the ``Error queue'' via the |ADD_TOKEN_TO_ERROR_QUEUE_FSM| macro and not the |RSVP_FSM| macro: that one places the error into the ``accept queue'', which is wrong.\fbreak 15 Jun. 2014
@*2 More token info for tracing.\fbreak Added to the token trace macros the GPS of the source. This allows one to see where within the source things are occurring. \fbreak 22 Mar. 2006
@*2 Added to the |CAbs_lr1_sym| definition a ``who created'' GPS.\fbreak Comes in handy when errors are thrown, but from where? Errors are directed to the source file with no fingering as to which grammar generated them. So it's up to the grammar writer to tell it as it is. Now the |O2_err_hdlr| grammar can spread the word so to speak... if it is available. See |set_who_created|, |who_file|, and |who_line_no|.\fbreak 22 Mar. 2006
@*2 Rewrote |tok_can| due to global functor firing.\fbreak Originally I had the filter mechanism within the |tok_can| container. This led to the functor being fired by the advance routine regardless of whether the tree node was rejected or not. Why the oversight? I did this to quickly knock off the tree container. Now it's in the tree walker where it should be. This way the functor only gets fired if the tree node fetched is accepted by the filter or there is no filter.\fbreak 17 Apr. 2006
@*2 Adjusted array of ``[]'' declaration.\fbreak Originally I defined arrays of unknown size as type variable-name[]. Porting to Sun did not like this. So my dilemma was ``how to define a base table structure for each table for threads, shifting, reducing, etc.?''. The emitted cpp tables were explicitly sized in their definitions for the ``bsearch'' function to act on, but my generic search code was open-ended, having no knowledge of each table's size. \fbreak Solution:\fbreak Create a base definition of only 1 entry:\fbreak \listing{"/usr/local/yacco2/diagrams/array_def.txt"} \fbreak 22 Dec. 2006
@*2 More porting issues dealing with threads and syncing signals.\fbreak When there was only 1 thread requested to run, I optimized out the mutex acquire / release cycle and left the Caller parser and the Called thread to complete their launch cycle by a) the Caller parser going into a wait state via |pthread_cond_wait| and b) the Called thread signaling the Caller parser via |pthread_cond_signal|. What happens when:\fbreak A calls only 1 thread B and B completes before A puts itself into a Wait stupor? I.e., B will be signalling A to wake up. It depends on the Pthread implementation. Some will queue the signal until the wait happens and then pass it back immediately to the Caller, while Sun drops the signal and so ..... hear the zzzzzs from the sleeping beast and the anxiety from the compiler writer while waiting and waiting....\fbreak \fbreak Conclusion: Remove the optimization and just use proper acquire / release hygiene to deal with syncing between friends. As procedure calls are slower than thread calls due to ``oo'' variable initialization and destructor cleanup, I'll just remove completely the conditional |THREAD_VS_PROC_CALL__|. My tracing works VERY WELL to diagnose this problem. Hear hear.
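The hygiene in a nutshell: the standard Pthread predicate-guarded wait (generic code, not lifted from the library):\fbreak

  #include <pthread.h>

  static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
  static int thread_done    = 0;         // the predicate

  void* called_thread_B(void* arg) {     // B: the called thread
    /* ... do the parallel parse ... */
    pthread_mutex_lock(&mu);
    thread_done = 1;                     // record the event under the lock
    pthread_cond_signal(&cv);            // now the signal cannot be dropped
    pthread_mutex_unlock(&mu);
    return arg;
  }

  void caller_A_waits(void) {            // A: the caller parser
    pthread_mutex_lock(&mu);
    while (!thread_done)                 // even if B finished before the wait,
      pthread_cond_wait(&cv, &mu);       // the predicate keeps A from sleeping forever
    pthread_mutex_unlock(&mu);
  }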
\fbreak \fbreak Dregs of past thoughts:\fbreak |THREAD_VS_PROC_CALL__| thread versus procedure call performance.\fbreak {\bf It must be defined as it is a preprocessor conditional symbol}! There is a cost of calling a thread versus a procedure call. What that cost is, is the reason for this symbol. When there is only one thread to be launched, this becomes a procedure call instead of a thread. Where I'm the doubting Thomas: is the cost of objects birthing and dying greater than having a thread start up and be put on reserve for other calls? |THREAD_VS_PROC_CALL__| of 0 calls threads and 1 calls procedures. The winner is procedure-call by 9 percent. NOT ANY MORE! It's threads cuz of oo's overhead in those damn objects and their rites of passage.\fbreak \fbreak 16 Jan. 2007
@*2 Changed back to passing Parser as a pointer for tracing purposes.\fbreak When the going gets debugged, it's a hell-of-a-lot better to see what the pointer is pointing to in the debug session rather than just an address. Maybe a weakness in the Sun Studio debugger, but so what. This will allow me to see if I'm clobbering memory by the data per parser environment.\fbreak 29 June 2007
@*2 Some more optimizations.\fbreak The grammar suite takes 1:50 minutes. Now to improve.
@*3 1) precalculate a compressed set key from a terminal's enumerate id.\fbreak This eliminates, every time a reduce takes place, mapping the terminal's enumerate id into a compressed set key format so that the lookahead set can be searched. It's a tradeoff of space for speed. Adjusted |CAbs_lr1_sym| to contain and manufacture the compressed set key. The performance improvement is approximately 20\% --- 35 seconds on the grammar suite.
@*3 2) eliminate passing shift's element enumerate value.\fbreak Split |find_shift_entry| into 2 contexts:\fbreak \ptindent{1) current T context} \ptindent{2) Rule or returned T from parallelism context} \fbreak The 2 routines are |find_R_or_paralleled_T_shift_entry| and |find_cur_T_shift_entry|. 5 seconds improvement on the grammar suite.
@*3 3) eliminating the |tok_can| reader mutex --- nope.\fbreak Well here's the scoop. The |tok_can| templates are ``just in time'' (jit) in accessing their contents. What does this mean? For example, the |tok_can| container is a wrapper to access the raw characters of a file, returning each raw character transformed into a raw character token placed into its secondary container for possible reuse. If the read request has the token in its internal container --- a container inside a wrapper container --- then it returns it via the inside template container's operator[xxx]. Now for the ``jit'': if the [xxx] request is not inside its internal container, |tok_can| calls the ifstream object to fetch the next character. So far so good, but put this into a multithreaded context where there are 2 or more cpus running at the same time. Now the |tok_can| ifstream object becomes a critical region. What is the critical region part? Its subscript. Even though my |get_next_token| request is reader only against the |tok_can<>| container, this container itself is a reader/writer depending on the context --- a reader if it has the request squirrelled away in its token container, but a writer when it does not contain the request and must access the ifstream object. An optimization test was conducted: no ``jit'' character accessing by the |tok_can<>| (all the characters were read at open time before any read requests were done) versus the ``jit'' with a guarded mutex. Though the winner was no ``jit'' by only 3 seconds over 80 compiles, it was not worth the gain over a slightly unsafe attitude. I would have needed to adjust all |tok_can| variants to remove the ``jit'' unsafe condition. \fbreak August 2007
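For the record, the guarded ``jit'' shape that was being measured (a hypothetical container; the real |tok_can| differs):\fbreak

  #include <fstream>
  #include <vector>
  #include <pthread.h>
  #include <cstddef>

  struct Raw_can_sketch {                   // hypothetical jit wrapper
    std::ifstream    in;
    std::vector<int> toks;                  // characters already squirrelled away
    pthread_mutex_t  mu;
    Raw_can_sketch(const char* f) : in(f) { pthread_mutex_init(&mu, 0); }

    int fetch(std::size_t idx) {            // reader or writer, depending on idx
      pthread_mutex_lock(&mu);              // the ifstream is the critical region
      while (toks.size() <= idx) {
        int ch = in.get();                  // jit: fetch the next character
        if (ch < 0) break;                  // end of stream
        toks.push_back(ch);
      }
      int t = idx < toks.size() ? toks[idx] : -1;   // -1: past end of stream
      pthread_mutex_unlock(&mu);
      return t;
    }
  };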
@*4 Elimination of reader mutex for optimization reasons.\fbreak The Ides of nagging made me do it for speed. So mutex control has been eliminated from the ``jit'' containers that are now not ``jit''. These template containers now do a double read across their input, as the cost of the read mutex is tooooo slow: 3 seconds / 80 compiles. I'm putting into my subconscious the problem of finding a better silicon / hardware solution to critical region control. I'll have a look at the overhead using Sun's ``dtrace'' facility, not only for mutex overhead but also for other optimizations that can be done to \O2 to approach top-down parsing speeds --- i.e., \O2{}batch versus \O2: \O2 is approximately 4 times slower. Don't know if this is an accumulation of c++ and templates etc. against a bare bones \O2{}batch ``c'' language approach? \fbreak Sept. 2007
@*4 Parallel thread reduction should be lr(0).\fbreak Here's the scoop: if a thread's lookahead boundary is a superset of what should follow, the returned lookahead token could be in error. As \O2's reduce operation looks to find its boundary dependent on the faulty lookahead, guess what: it throws an error due to the lookahead token not being found in the reduce table of the calling grammar. So create a new |find_parallel_reduce| procedure that just returns the first |Reduce_entry| to complete the reduce. It effectively is lr(0): no concern for the following token context! Now the error can be dealt with by programming the shift operation within the grammar using either \ALLshift{} or \INVshift{} to capture the faulty parse point and to report a specific error against the GPS of the returned lookahead token. \fbreak Oct. 2007
@*4 Make |accept_queue| more efficient.\fbreak Make it a fixed array of local |Caccept_parse| for 2 reasons:\fbreak \ptindent{1) eliminate the new / use / delete cycle: malloc is too slow} \ptindent{2) don't need a map but just a sequential queue} This gives a 13 percent improvement; a sketch appears below, after the next note. \fbreak Nov. 2007
@*4 Use Procedure call when only 1 thread needs to be run.\fbreak The mutex / thread paraphernalia is tooooo slow compared to a procedure call. This thought had been nagging me since my 1st \o2 compiler written by recursive descent. It became my benchmark that thread parsing was measured against. Yes, I'm aware of the bottom-up optimization by Ullman but I'm not there yet in digesting the optimized requirements to lower the push / pop overhead by consolidation of subrules and their syntax directed code that need some form of sequential sequencer when the consolidation consequence must get exercised. Now why come back to this subject anyway? Those nagging optimization muses! I eliminated the mutex controls due to threads and my critical regions; there is a 1:1 activity taking place whereby the calling of the procedure by the requesting grammar passes the right to the called procedure to enter its critical region when needed without the paranoia of duality destructive conditions. By making the |Parser| and its evil grammar fsm twin global and by mallocing them within the called procedure, the overhead should be lessened. Maestro, the envelope please. And the winner is: 25\% faster. How was this measured? On my Apple laptop, where running times of threads-only against the hybrid approach were taken using the |o2grammars.bat| script. \fbreak Dec. 28, 2007
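Back to the |accept_queue| note above. The flavour of it (bound and fields invented for the sketch):\fbreak

  // Sketch of the fixed-array accept queue: no malloc, no map,
  // just a sequential queue. The bound and fields are invented.
  const int MAX_ACCEPT_PARSES = 32;                // assumed limit, not Yacco2's

  struct Caccept_parse_sketch { int la_token_id; /* ... */ };

  struct Accept_queue_sketch {
    Caccept_parse_sketch items[MAX_ACCEPT_PARSES]; // local array, reused per parse
    int cnt;
    Accept_queue_sketch() : cnt(0) {}
    void add(const Caccept_parse_sketch& a) { items[cnt++] = a; }
    Caccept_parse_sketch& first() { return items[0]; } // the proxy arbitrator's pick
    void reset() { cnt = 0; }                      // no new / use / delete cycle
  };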
@*4 Thread's start-up attributes for stack size and system scope?.\fbreak I played with the |pthread_attr_setstacksize| and |pthread_attr_setscope| attributes to possibly improve speed and fat deposits. Well the |pthread_attr_setscope| setting of |PTHREAD_SCOPE_SYSTEM| made things worse as this was an aggregate of all things considered. Procedure calls of threads by threads made the run environment too sensitive to this unknown size mix. The result can produce a |SIGSEGV|. Experimenting by increasing the |stack size| delayed the problem but bloated the run size. As always the cure was easy: just remove this fiddling and default to the runtime attributes of the local |pthread| implementation. On Sun Solaris, the stack size for all threads is 1 megabyte --- more than enough. \fbreak Apr. 2008
@*2 Error detection within a grammar: new \QUEshift{} symbol introduced.\fbreak \QUEshift{} was created to handle questionable situations like error detection points within a grammar. It can be expressed as a normal shift terminal or within the returned T of a \PARshift{} thread expression. As the lookahead symbol is questionable, using the \ALLshift{} or \INVshift{} symbol to handle error detection has one weakness: its subrule reduce operation depends on the lookahead set, and the current T might not be in this LA set. Consequently the reduction could possibly not action. Introducing the new symbol draws the reader's eye to the error point within the grammar. The reduce is an lr(0) context, which means no dependency on the current symbol, and so the subrule always reduces! This allows the grammar writer to coerce the parser's behaviour by the subrule's reducing syntax directed code.\fbreak Warning:\fbreak The current token is {\bf not advanced}, so perpetual motion on the same token spot could occur if one is using the \QUEshift{} to act like an \ALLshift. |@| has been created to detect and stop the parse process. So be warned.\fbreak June 2008
@*2 Speed wonderful speed in ``Oliver Twist'' and not William Burroughs.\fbreak Well the rule recycling works now. No more new(s)... Just recycle them grammar rules. The envelope please ... 25\% speed improvement, from 32 to 24 seconds against all them grammars. As time shrinks there seems to be an asymptotic return on performance improvements. But this one is good; no, really very good. I'm only 4--5 seconds away from the recursive descent benchmark. It's malloc! and its mutual exclusion that is very very expensive, as shown by the following ``dtrace'' output.\fbreak \INDENT{.5in}{0 57766 lmutex\_lock:entry} \INDENT{.6in}{libc.so.1`lmutex\_lock} \INDENT{.6in}{libc.so.1`malloc$+$0x25} \INDENT{.6in}{libCrun.so.1`void*operator new(unsigned long)$+$0x2e} \INDENT{.6in}{o2`void NS\_o2\_sdc::Co2\_sdc::reduce\_rhs\_of\_rule(...*)$+$0x282} The above trace also brought out my sloppiness in proper code emissions per grammar's |reduce_rhs_of_rule| routine. I never stored the newed rule, so each time the grammar was run the used rules were recreated --- uck. \fbreak \fbreak Dec. 2008
@*2 Improve dumped data when Shift T not found in parse table.\fbreak See |@.Err Can't find symbol to shift i...@>| where it is thrown. Though this is a grammar writer's lack of error catching in his grammar, at least dump out the info on T: its enumerated id and literal. Now the info dump contains the grammar in question, its current parse state, and the T details. Why isn't it using an Error class T and \O2's generic error queue dump facility?
Cuz this is below the user's language: remember, this is a generic interface without any knowledge of what's being built on top of it. And I didn't want to force yet another canned set of T definitions like lr constant and rc. \fbreak \fbreak Feb. 2009
@*2 VMS spits core dumps when its thread stack is exceeded.\fbreak Ahh, recursion is sometimes divine, but not when the stack limit is exceeded, looking like a runaway recursion call where A recurses on itself without any stop-recursion detection. So U must increase the |VMS_PTHREAD_STACK_SIZE__| symbol in the |yacco2_compile_symbols.h| file and rebuild the \O2 library. The allocated thread stack size was 128k before the Pascal translator started to choke due to better symbol table management that increased the |pas_variable| grammar run size when dynamically creating the statement variable's symbol table components. Double ugh, but this is reality. \fbreak \fbreak Feb. 2009
@*2 Caught by your short and curly --- local variables in grammar rule.\fbreak The short of it is the recycling of rules: new once, reuse forever. The consequence is the rule gets recycled, and if u have not reinitialized the variable, aka an array or table, then the past dregs of invocation will haunt u. Either create the variable in the ``fsm'' grammar construct or reinitialize it in the rule's construct directive. Better yet, do it in the rule's ``op'' directive before the variable is being used. Do u really want the curly part? Of course not, so where did it grab u, Dave? Grammar |la_express| to calculate the lookahead expression. Rule reuse happens on ``+'', ``-'' expressions: eolr - ".". \fbreak \fbreak Feb. 2009
@*2 Add a complete trace on fetching a T when symbol functor in use.\fbreak When the |tble_lkup_type| token fetching in its various forms attempts to remap the raw T, I just traced the fetched T before the potential remapping took place. If the symbol table functor is in place and turned on, then the after-remap attempt is now also traced. This was highlighted when I wrote a Pascal translator with syntax directed symbol table scope handling, and my myopic test was the problem as I put an externally defined function within a local procedure. Boy, my misfits never cease to entertain. This seems to be my problem where the original test item was faulty. I guess u could say my grammars should have caught this faux-pas, but they were not written to catch all sins but to remap one correct Pascal program into another correct Pascal variant. Some error reporting is being done, but the more others use it, the more retrofitting of error reporting is taking place. More for the weary when problems prevail. \fbreak \fbreak Feb. 2009
@*2 Add right recursion support for rule recycling.\fbreak Well, how did I treat this? I detected full rule-use consumption and outputted a message to the grammar writer that all the allocated rules were in use and exited with a message. Please see grammar |rules_use_cnt.lex| as to how it counts the number of rules in a left recursion scenario. Well this was not good, as right recursion has its place in parsing though it hits hard on the parse stack. How so? Before the rule can be reduced it keeps pushing, aka shifting, until its lookahead boundary is met. So if the parse exceeds the fixed stack size it will still honk with an abrupt message and quick stage exit. Staying within the stack allocation is fine. See |MAX_LR_STK_ITEMS| as to the parse stack allocation: adjust accordingly. \fbreak \fbreak Feb.
2009
@*2 Changed input order of T Vocabulary --- exchanged T with Error T.\fbreak Why the change? This allows the grammar writer to write independent compiler/grammar combos --- e.g., front end lexing of Unicode --- so that the front end creates the external token container for the other compiler/parser combo to digest. Currently all token containers are memory-only, template derived. With this change the parser/grammar(s) T Vocabulary now appends the Errors at the end of the T Vocabulary enumeration scheme. The second parser/grammars combo must include the first T definitions in its own T Vocabulary in the exact order defined by the first parser. From there it can build its own T Vocabulary of additional Tes and Error symbols. Another way is to remap the enumerated ids of the first parser's tokens into the ordering scheme of the second parser. Use the token read functor associated with a read token container to remap Tes at read time; a sketch of such a functor follows at the end of this note. It could just change the ``enumerate\_id'' value of the old token into the current parser's T Vocabulary mapping. It could also create a new token, but this itself is overkill unless one is remapping the token into another different token type: for example, remapping an ``identifier'' token into a keyword by use of a symbol table lookup. \fbreak Caveat.\fbreak Currently the \O2 library has globally defined symbols that get resolved at linker time. So one cannot run multiply defined independent threads of parsers without each having exclusive use of \O2. \O2's implementation contains multiple independent parsers sharing the same \O2 library and only 1 superset of Tes defined for all parse stages. For example, the command line to \O2 gets parsed by its own grammars and their outputted tokens become downstream fodder for the suite of grammars used to parse the inputted grammar file. \fbreak \fbreak There is still work to be done to consolidate \O2's external symbols into a structure containing indirect pointers to these symbols that are currently resolved by the linker (ld). 1st thought:\fbreak 1) have a local structure initialized to these pointers.\fbreak 2) register this structure of pointers with the runtime library of \O2 before any parsing begins. \fbreak 3) each independent parser can run in its own thread. \fbreak 2nd thought:\fbreak 1) use a fork process where the token containers are passed somehow as input to the subprocess that fills its booty. This thought is similar to the spawning of a grammar as a thread or its optimized procedure call. \fbreak \fbreak May. 2009
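A sketch of such a read-time remap functor (all names invented):\fbreak

  // Remap a first parser's T enumerate id into the second parser's
  // T Vocabulary ordering at read time. Names are invented.
  struct Tok_sketch { int enumerate_id; };

  struct Remap_read_functor {
    const int* first_to_second;             // translation table, built offline
    explicit Remap_read_functor(const int* tbl) : first_to_second(tbl) {}
    void operator()(Tok_sketch* t) const {  // fired per fetched T
      t->enumerate_id = first_to_second[t->enumerate_id];
    }
  };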
@*2 Tree container is out-of-sorts from self modifying trees.\fbreak Well it's back to just-in-time (JIT) reading of the tree |tok_can| as the following example outlines why:\fbreak Given a grammar that reads a specific T type like ``call-stmt'', and u want to change its younger brother to a different T. What happens during the parse? The current T is shifted onto the parse stack and the lookahead T is fetched, becoming the current token. This LA T will be a ``call-stmt'' possibly used to reduce the shifted T ``rhs'' subrule. The problem is the container has the unmodified reference of the lookahead T. Now within the grammar's syntax-directed-code u process the younger brother nodes in which u changed some of the tree's content. If u are unlucky, the LA T's id gets changed. Irrational behaviour could occur: the parser doesn't reduce properly, or possibly, as the T type is different from the parse stack frame entry of ``call-stmt'', it acts like an uninitialized object having random behaviour.\fbreak \fbreak So what can one do? I corrected the |tok_can| container to JIT reading of its Tes and implemented the |remove| method that pops the last entry from the container. If u are modifying the T type of the tree, i.e., replacing the tree node's content with another T type, now the grammar writer must add syntax-directed-code to remove the LA T from the container, re-align the current token position to the shifted T position, and do a ``get\_next\_token'' to fetch the proper LA T, thus maintaining the integrity of the parser. All this sounds like a lot of work, but here is an example of such coding:\fbreak An example:\fbreak \fbreak \let\setuplistinghook = \linenumberedlisting \listing{"/usr/local/yacco2/diagrams/treemodify.txt"} \fbreak \fbreak The code above is taken from a grammar's ``rule'' syntax-directed-code. The rule has a reference to the parser environment and doesn't have to go thru the ``fsm'' route to get at the token supplier. Lines 5--6 get the tree token container from the parser and cast it to a tree container. Lines 7--8 remove the last T from the container and re-align the parser's current token position to the shifted T position. Note: All token containers have subscripted token access starting from 0. Line 9 fetches the new LA T for the parser to continue merrily along its way. There are other ways to re-align the LA T: Please see |@|. All this for dynamic modifying of trees: good stuff! \fbreak \fbreak May 2009\fbreak
@*2 Multiple Reader/Writer improvement to supplier container.\fbreak Historics: JIT fetching of tokens from an ``ifstream'' container demanded locking when the request was not in the container. Consider 2 parallel threads A and B competing, where their read requests to the container are simultaneous: A on cpu 1 and B on cpu 2, and their requests are not in the container. The critical region becomes the physical i/o to the ``ifstream'' object when the request was not within the container. So what did I do? Experiment 1 was to remove the JIT attitude and read all the ``ifstream'' characters into the container at file open time. Now the container becomes read-only with no need for locking. So the ``ifstream'' issue is solved, but what about a tree container with T filtering? It is a JIT container that requires locking protection, as u do not want to walk the complete tree filling it up before the first read request. Also consider a self modifying tree. What? The Pascal translator required the following:\fbreak The HP ``delete'' call statement had to be removed and replaced with a raised signal variable so that its future close statement could deal with it using a ``delete disposition'' clause within a modified close. This future close tree node was morphed into a conditional subtree dealing with the ``to delete or not to delete'' issue. Without the JIT attitude the tree walker has remnants from before the tree surgery. The container could contain items that are no longer valid due to this modification. \fbreak \fbreak Back to the JIT and a quick overview of mutual exclusion.\fbreak When a writer is introduced, locking protection is required if there is more than one simultaneous accessor to the container. If there are only readers, JIT still demands writing to the container before the read request can be satisfied.
No lock protection is required when only one suitor is active. Within the parsing environment, all threads are co-operative and must houseclean when completing their task, even though they might abort. By keeping a reader/writer count against the container and per parser, the supplier container lock usage can be optimized according to the simultaneous number of accessors. \fbreak \fbreak What about the other containers: recycle-bin, error, and producer? Do they require lock protection? Yes they do when they are being filled, and yes when they are acting as a supplier container. As they are more infrequently used, I leave the locking mechanism with the ``add\_token\_to\_xxx'' procedures where xxx is one of ``error\_queue'', ``producer'', or ``recycle\_bin''. For occasional back-door T adding to the supplier, the ``add\_token\_to\_supplier'' procedure is lock optimized on simultaneous accessors as the supplier container maintains its suitor count. \fbreak \fbreak June 2009\fbreak
@*2 Removed |grammar_stk_state_no__| from the |CAbs_lr1_sym| definition.\fbreak The original thought was to capture the parse stack number at time of T creation for error tracing. The thought was half-baked: what happens when a T is created outside of the parsing environment --- no parse stack? So out, half-baked! If the grammar writer needs this information, it can be programmed explicitly by adding the appropriate attributes to the error T being logged. \fbreak \fbreak June 2009\fbreak
@*2 Note on what's in the token container and its size.\fbreak The ``end-of-grammar'' condition signaled by the |PTR_LR1_eog__| T is not an element of the container. Why? It acts as a conditional, being only-the-lonely, as only the Tes in the token stream are contained. So u are warned. If u are testing the token container for size --- for example, u walked a tree container with filtering and u are testing whether the 2 Tes and the ``end-of-grammar'' condition are there --- u should test the container's size for 2 elements and not 3. Why all this verbiage? Whispers to myself. \fbreak \fbreak June 2009\fbreak
@*2 Sets: Sequential versus binary search optimization.\fbreak Well, what is the break-over point for when to use a sequential search on an ordered table versus a binary search? This question came up when I wanted to improve set handling: aka the shift and reduce operations within the fsa state. Try to paper out the result! I finally wrote a simple program to gather stats on the break-over point. Surprisingly it was 72 elements. The test used a table of elements having a multiple of 3 as 1*3, 2*3, etc. The population went from 1 to 128 elements, and for each element in the table, a spanned search key of +, -, and = the element key was done. This was run against each search type to find out the break-over point on instruction costs. Now all state searches have a dual strategy tested against the |SEQ_SRCH_VS_BIN_SRCH_LIMIT| constant as to what search type to use. \fbreak \fbreak July 2009\fbreak
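The dual strategy in miniature (a generic sketch over an ordered integer table; the real tables hold typed entries):\fbreak

  #include <cstddef>

  const std::size_t SEQ_SRCH_VS_BIN_SRCH_LIMIT = 72;  // measured break-over point

  // Returns the index of key within the ordered table, or -1 if absent.
  int find_entry(const int* tbl, std::size_t no_entries, int key) {
    if (no_entries <= SEQ_SRCH_VS_BIN_SRCH_LIMIT) {   // small table: sequential wins
      for (std::size_t i = 0; i < no_entries; ++i) {
        if (tbl[i] == key) return (int)i;
        if (tbl[i] > key)  return -1;                 // ordered: walked past it
      }
      return -1;
    }
    std::size_t lo = 0, hi = no_entries;              // large table: binary search
    while (lo < hi) {
      std::size_t mid = lo + (hi - lo) / 2;
      if      (tbl[mid] < key) lo = mid + 1;
      else if (tbl[mid] > key) hi = mid;
      else return (int)mid;
    }
    return -1;
  }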
@*2 Change T containers' subscripting to unsigned integer, or my subtle stupidities.\fbreak Why the change from signed to unsigned integers for size, subscripting? Depending on the stl template library, there will be unresolved references to methods like ``size'' that return unsigned.\fbreak \fbreak Stupidity number 1: overloading the subscript range: subscript \LTsign{} 0 \derives{} have not accessed container for T, before first time access, etc. U get the notion. Due to this, ``first-time-accessed'' and ``end-of-container-reached'' attributes were needed. Tree walking with filtering needed special attention in the ``do I already have a T in the container?'' and ``end-of-tree-reached'' cases. That is, a request could be asked to fetch a specific T after the ``end-of-tree'' has already been reached.\fbreak \fbreak 2nd stupidity: not commenting / documenting that a Parser expects that the T has already been fetched before it requests it. This showed up in my haha fine-tuning of my logic on tree containers and the discrete logic grammars getting nada input: dead end T. \fbreak \fbreak Cost of my overloading: about 8 hours of work to ferret out these subtleties. I know it's rather simple, but this is my twilight zone of stupidity. \fbreak \fbreak Nov. 2009\fbreak
@*2 Porting to Microsoft: Visual Studio 8.\fbreak Some not-so-happy comments on the 32 bit console application:\fbreak 1) They got it wrong when it comes to the C runtime (CRT) and their different calling types: \_\_cdecl, \_\_stdcall, and how their libraries, static or dynamic, were built. The threaded library needs \_\_stdcall, while the main program needs \_\_cdecl. Each library draws from its own memory pool depending on what library type u are using. So build everything using \_\_cdecl and fine-tune the call to ``\_beginthreadex'' with \_\_stdcall.\fbreak 2) U better choose the right type of multi-threading: ``/MT'' or ``/MD'' or Klack-klack-klack? Well, trial-by-error discovered ``/MT'' is the right one and not their chosen default.\fbreak 3) Forums are thin on quality but lots of verbiage on multi-threading: Try looking up exit code (255).\fbreak 4) U better use ``/force:multiple'' to allow all those common c++ rtns to coalesce.\fbreak 5) Last, their Release libraries don't work! It blows up before the program's ``main'' is entered. So the port has the porky version, but it works!\fbreak \fbreak Alas, poor fool for thinking they improved on this from Visual Studio 5 to 8. It was trial-by-the-blind using the various combinations to get it going. Better cosmetic documents but of the same software quality ilk. Well my tea reading is this: circa 2003 the move was to CLR / C sharp development, leaving as-is the 32 bit console application code. Let the street hawkers spin their new tales of enchantment to follow them. Anyway the port is done, but tooth gnashing ain't fun. \fbreak \fbreak Nov. 2009\fbreak
@*2 Mutexing the containers.\fbreak A review:\fbreak 1) All containers start with one owner. Therefore the 1st fetch is safe.\fbreak 2) All sequential reads from a container are safe.\fbreak 3) After a T is delivered from its container, the container checks to see if the request was for its last T inside it. If so, the container will do a future request by itself and not by the consumer. That is, it is pushing the race condition ahead to maintain safety for the consumer.\fbreak 4) This future read I call lookahead. It contains the mutex mechanism to protect against 2 or more suitors. So what happens when 2 consumers request the same last T? Well, there could be 2 potential lookaheads attempted. Only 1 lookahead T is added to the T pool. What happens if the lookahead request hits the end-of-T-stream? The mutex protection checks for this. \fbreak \fbreak Nov. 2009\fbreak
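Points 3) and 4) in sketch form (a hypothetical supplier; the stub source simply ends the stream):\fbreak

  #include <vector>
  #include <pthread.h>
  #include <cstddef>

  struct Supplier_sketch {                  // hypothetical supplier container
    std::vector<int> toks;
    pthread_mutex_t  mu;
    bool             eos;                   // end-of-T-stream reached
    Supplier_sketch() : eos(false) { pthread_mutex_init(&mu, 0); }

    int next_from_source() { return -1; }   // stub: -1 = end-of-T-stream

    int fetch(std::size_t idx) {            // sequential reads stay safe
      int t = toks[idx];
      if (idx + 1 == toks.size() && !eos) { // delivered its last contained T?
        pthread_mutex_lock(&mu);            // 2 suitors may both attempt this...
        if (idx + 1 == toks.size() && !eos) {  // ...but only 1 lookahead is added
          int la = next_from_source();      // the future read: the lookahead
          if (la < 0) eos = true; else toks.push_back(la);
        }
        pthread_mutex_unlock(&mu);
      }
      return t;
    }
  };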
@*2 Some refinements to source file/line tracings.\fbreak External file print sourcing improved; added source file/line to dynamic tracing. Cleaned up ``Generated finite state automaton macros'' from ``c type macros'' back to cweb macros.\fbreak See the |EXTERNAL_GPSing| and |FILE_LINE| macros with appropriate comments. \fbreak \fbreak Jun. 2014\fbreak