@q file: notes.w@> @q% Copyright Dave Bone 1998 - 2015@> @q% /*@> @q% This Source Code Form is subject to the terms of the Mozilla Public@> @q% License, v. 2.0. If a copy of the MPL was not distributed with this@> @q% file, You can obtain one at http://mozilla.org/MPL/2.0/.@> @q% */@> @** Notes to myself ... Decisions.\fbreak
@*2 Evaluate if extern ``C'' should be used in |Set element compare functor|.\fbreak Cuz it's a closed system, there is no need to make the C++ functor global for other languages. So remove ``C''.
@*2 Cleanup from failed parallel parse.\fbreak As the local parallel parse does not affect the parser requesting parallelism, there is no save/reset action needed on its token stream variable |current_token__| and position. So remove the paranoia code.
@*2 Verify if all successful threads consume a token even if it's{...}.\fbreak just a remapper on the current token. For example, in the Pascal translator the lookahead token might need a re-verification by the symbol table across all scopes. So call the thread that tries the remapping and returns the result, be it the same token or a remapped one. Now what is the result in terms of processing the token stream and the new lookahead? I got it from the grapevine that... yup. As per normal --- consumption takes place!
@*2 Manual arbitrator: how does it work?.\fbreak It's a proxy just returning the 1st token in the accept queue. |AR_for_manual_thread_spawning| is a canned proxy arbitrator for this purpose. There is no judging code. It's a teflon special --- nothing sticks to it; just pass back the first item in the queue. The |spawn_thread_manually| function sets up this default. Corrected |call_arbitrator|, which originally jammed the first parm into the accept queue. Now the given arbitrator is called for both types of threads: normal and manual. Though the arbitrator function is a single procedure for that state configuration, it must service all the nested threads with this configuration. I still use the msg as a parameter for calling the function. It makes things simpler and consistent: a generic parameter is passed that needs casting to its real self. Note: the arbitrator is not multi-threaded as there is only 1 copy of itself, but it is re-entrant. So when two or more competing nested threads require its services, I leave it up to the operating system to deal with parallelism. It probably throttles back to a single process, but how many situations are there that use nested competing parses of the same grammatical expressions?
@*2 |Ccm_to_ar| message needed?.\fbreak I ask the question in light of the fact that an arbitrator is a global procedure and not a thread. Yes it is needed, as it contains the info to arbitrate. Like what? The cm providing the accept queue for review. Should the parm be a message type? No, but it keeps it simple, Dave.
@*2 Why (CHARP) instead of |Cparse_record| definition in{...}.\fbreak the |reduce_rhs_of_rule| function? Well back in time, u got it, Microsoft's compiler was a-honking. So if you look at the generated code for a concrete |reduce_rhs_of_rule|, you'll see how it games itself down thru the stack equating the subrule's parameters in LIFO (sketched below). Does it still hold this quirk? Don't know until I retry. At the moment, I have too many other things to complete. Well I'm biting the ??? to make things faster. Rewrote the stack and corrected for speed the emitted code of the rhs subrules. Eliminate (CHARP):\fbreak 3 Oct. 2005\fbreak
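To picture the LIFO equating (a minimal sketch with made-up names; |from_top| and friends are not Yacco2's real API):\fbreak

  #include <vector>
  #include <cstddef>

  // Hypothetical stand-ins only; not Yacco2's real types or API.
  struct Sym { int id; };
  struct Stack_record { Sym* sym; };

  struct Parser_sketch {
    std::vector<Stack_record> stk;           // bottom ... top of parse stack
    Stack_record* from_top(std::size_t n) {  // 0 = top of stack
      return &stk[stk.size() - 1 - n];
    }
  };

  // Subrule E -> E '+' T: the 3 rhs parameters equated down the stack, LIFO.
  void reduce_rhs_of_rule_sketch(Parser_sketch* p) {
    Sym* t    = p->from_top(0)->sym;         // T,   top of stack
    Sym* plus = p->from_top(1)->sym;         // '+'
    Sym* e    = p->from_top(2)->sym;         // E,   deepest rhs symbol
    (void)t; (void)plus; (void)e;            // syntax directed code uses these
  }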
Added rule recycling to speed up parsing due to the rule's birth-run-delete cycle.\fbreak June 2007\fbreak
@*2 Why nil ptr test in |T_11|?.\fbreak Originally some symbols pushed onto the stack were zeroed out to protect from abort cleanups etc. This situation does not exist anymore. So rid it, ghostbusters.
@*2 Clean up parallel parse in control monitor instead of grammar{...}.\fbreak requesting parse. It's just cleaner and closer to the action. Here are my original thoughts. Some housekeeping is done. The cleanup is to pop the \PARshift{} symbol from the attempted parallel parses. It could have been done in the control monitor, which was the creator of this, but I felt that spreading this cleanup to the control monitor was potentially spreading the mess.\fbreak \fbreak Dictum: keep the cleanup of an effect as close as possible to its cause. Is this an Occam?
@*2 Conversion of control monitor and parallel parse code.\fbreak This is the injection code included into the outputted grammar modules from Yacco2. Conversion cleaned up dregs from cm handling of a ${\vert?\vert}$ dynamic code request. A thought of minimal value where there are better means to cope with this type of situation. Now what is this situation? How do you cope with a parsing situation like the syntax directed code that needs parsing? There is no assigned set of grammars to properly parse the \CPLUSPLUS/ code. So, do a dynamic parse looking for a dynamically calculated lookahead token to stop the parse-by-character situation. Now the good stuff. The |cweb| version worked the first time both in the control monitors and the parallel grammar threads. Let the applause begin.
@*2 Why is there an abort attribute in the parse stack record?.\fbreak If there is a symbol on the parse stack with `affected by an abort parse' turned on, the cleanup of an aborted parse will delete the symbol like an ``auto delete'' when it pops the parse stack.
@*2 Make all yacco2's types, structures, etc. housed within yacco2's namespace.\fbreak The `INT' type is also used by Microsoft. So, add the `yacco2::' qualifier to all definitions and implementations. This way there is no conflict of interest when porting to other environments. Correct also the implementations to be qualified by namespace yacco2. There are 2 ways to do this. Firstly, be explicit per implementation. Secondly, enrobe the implementations with a namespace `${\{}$' ... `${\}}$' construct. To each their own ... you'll see both approaches depending on my mood; a small illustration follows after the next note. For the moment, files |wcm_core.h| and |wpp_core.h| are not explicitly qualified by yacco2::. This allows the old current code that uses this to be compiled until the |cweb| version is completely finished. The current system does not include everything within the yacco2 namespace.
@*2 Make enclosure of namespace yacco2 explicit in implementation part of code.\fbreak Eliminates assumptions. |@| and |@| bracket the code to be housed within yacco2's namespace. All implementation code contains this start/stop definition. The code |wcm_code.h|, |wpp_core.h|, |war_begin_code.h|, and |war_end_code.h| are just that --- snippets --- and so are contained within another implementation. They still use |@|.
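To illustrate the two qualifying styles (generic C++ with hypothetical member names, not lifted from the yacco2 sources):\fbreak

  // Hypothetical declarations standing in for yacco2's real ones.
  namespace yacco2 {
    typedef int INT;                 // no clash with Microsoft's INT anymore
    struct Parser { void advance(); void retreat(); };
  }

  // Style 1: explicit yacco2:: qualification per implementation.
  void yacco2::Parser::advance() { /* ... */ }

  // Style 2: enrobe the implementations with a namespace construct.
  namespace yacco2 {
    void Parser::retreat() { /* ... */ }
  }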
@*2 The old version of terminal enumeration:.\fbreak The terminal alphabet is represented by the integers, both positive and negative. Both errors and regular terminals are open-ended in their expansion capabilities as they are the left and right end points in the terminal enumeration scheme. Error terminals grow towards minus infinity while the regular terminals expand positively. The balance or pivot point of the terminal alphabet is `eog', which starts the meta terminals. Meta terminals are indicators of parsing situations like end-of-token-stream reached, parallel parsing to take place, and the different wild type shifts. None of these meta-terminals are found within the input language being parsed. The |Base_enum_of_T| parameter of `fsm' is the starting point of the enumerated terminals. Due to the current enumeration scheme, its value is required to map a terminal's enumeration id into a set's co-ordinates. This is a bit of a hack as each grammar contains this starting point. The hack comes about from an out-of-sync condition when new errors affecting this start point have been defined and all grammars have not been recompiled and passed thru Yacco2's linker. The consequence is that the parser, when run, will have strange things happen because of the wrong enumeration mapping to the terminals buried in the old finite automaton's tables. Trust me, I'm the guinea pig. Regenerate all the grammars. Raw characters represent the mapping from the 8-bit ASCII character into its raw character terminal. Error terminals are internally generated situations produced by the parsing grammars manufactured by the grammar writer. They indicate the appropriate faulty situation detected. Regular terminals are composites. They get created by the grammars from streams of other raw character terminals or composite terminals. They are evolutionary and come into existence from various passes made on the token streams: lexical to syntactic to semantic. Reason to change:\fbreak Why this type of mapping instead of the positive integers? Reality is there is no difference apart from using the range of numbers and how they expand. Both meta and raw character terminals are constant in size. It is the other two types that expand or evolve as one is developing the language recognizer. Either way of enumerating the terminals, when an error or a new regular terminal is created, all the grammars need to be regenerated due to the change in the lookahead sets. Hindsight critiques the start seed buried in the grammar's finite state automaton definition. So get rid of it! The better design is to enumerate from 0. This eliminates the mapping from the negative space into the positive space of the set co-ordinates. Take 2: Here is the new mapping: meta-terminals, raw characters, errors, and finally the regular terminals. There is no need to map into the positive space before calculating a terminal's lookahead set co-ordinates. Just use the enumerated value to translate to the set's partition and element!
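The arithmetic in sketch form (assuming 8-bit set partitions; the real partition width is whatever Yacco2's set implementation uses):\fbreak

  // Sketch: 0-based T enumeration straight to set co-ordinates.
  struct Set_coords { int partition; unsigned char element; };

  Set_coords t_enum_to_coords(int enumerate_id) {
    Set_coords c;
    c.partition = enumerate_id / 8;                          // which partition
    c.element   = (unsigned char)(1u << (enumerate_id % 8)); // bit within it
    return c;
  }
  // Old scheme: first shift out of negative space, id - Base_enum_of_T,
  // with the seed buried in each grammar's fsm (the hack now gone).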
@*2 Tree token template container.\fbreak Well let's try passing references instead of pointers. I hope that the compilers are kinder to me within the threaded environment. This certainly saves a lot of constraint checking. 14 Oct. 2004.
@*2 Add in Yacco2 arbitration requiring code on the possibility of{...}.\fbreak 2 or more terminals in the accept queue with no arbitration taking place. That is, it defaults to the first terminal in the queue. The compilation check requires the checking of their first sets for the common prefix condition. At the moment, this does not take place due to the yacco2 / linker loop. Yacco2's linker generates the transitive first sets for the threads that call other threads. So this check is a post condition beyond the compiler/compiler. At present, Yacco2 issues a warning: use at your own risk. At runtime, there still needs to be a look-over-your-shoulder throw condition. This will be implemented in the arbitration code. This is done --- 26 Oct. 2004 in the Yacco2 generator. There is an optimization done before the throw code is appended to the arbitration thread:\fbreak \ptindent{1) more than 1 thread must be dispatched --- thread with a name: NULL name bypassed} \ptindent{2) no arbitration code supplied by the grammar writer}
@*2 Rework of thread management.\fbreak At present it is spread among the global implementation of independent methods, the table of spawned threads, and the worker thread record structure.
@*2 To check: does stop msg have wait/reply mechanism?.\fbreak In the shutdown? No.
@*2 Change tree container to a specialized version of |tok_can|.\fbreak This makes things more consistent. Now, all u see are specialization containers. So why did u not do it in the beginning? This container was an afterthought. It was written to support a Pascal translator to re-target a preprocessed Pascal variant using Oregon Software's compiler to DEC aka Compaq aka HP Pascal. As there were special extensions to the Oregon Pascal, a complete front end compiler was needed to build a source tree of the program so that the source code could be morphed. There was lots of sinning going on. Well the outcome was this family of tree walkers and container. So what! Why did u not write a template specialization? Probably too deep into getting it done without the thought as to whether it had any generalization. The other containers using string and ifstream did specialize but... 11 Nov. 2004. Now to correct the grammars that use the old container |tok_can_ast|.
@*2 Eliminate the control monitor.\fbreak The middleman is too expensive as a thread due to the current threading model. This helps in optimizing the run performance of Yacco2. To do this meant moving all the responsibilities of the control monitor into the grammar requesting parallelism. This plumbing is within |Parser|. Part of the demolition meant throwing out the messages between the various components --- pp between cm between th. Now the message is the medium, or is it the |Parser|? The requesting |Parser| just passes itself to the grammar threads. It contains the pertinent token stream variable: token and position (current values) within the stream, and all the token containers --- supplier, producer, recycling bin, and the error container (refuge shelter). Also removed was the distinction between the containers --- parallel versus monolithic. As parallel grammars just graft onto the current token scene, there is no need to make the distinction except at their start-up time, which grabs their containers' addresses from the spawning parser. They are just readers of the tokens and not writers. Now what about error tokens? They should not be added to the error queue but should be passed back to the calling grammars within the |Caccept_parse| object. The arbitrator of the calling grammar determines what should be done. If u need to add to it then use the guard dog approach, or is it the drake? ``i get no respect'', so choose your mutex before doing your thing.\fbreak Done 23 Nov. 2004. Performance gain: 30 percent.
@*2 Eliminate |pp_support__| as a thread optimization.\fbreak All info is now contained in |Parser|. Depending on how the thread is started --- monolithic or parallel --- the appropriate parse containers are imported either thru the constructor or via the passed parameter; a schematic sketch follows.
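Schematically (hypothetical shapes only; the real |Parser| carries much more):\fbreak

  #include <vector>
  #include <cstddef>

  // Hypothetical stand-ins for the token containers and Parser.
  struct Tok {};
  typedef std::vector<Tok*> Tok_can_sketch;

  struct Parser_sketch {
    Tok_can_sketch* supplier;   // token stream being read
    Tok_can_sketch* producer;   // tokens built by this pass
    Tok_can_sketch* recycle;    // recycling bin
    Tok_can_sketch* errors;     // error container (refuge shelter)
    std::size_t     pos;        // current position within the stream

    // Monolithic start: containers imported thru the constructor.
    Parser_sketch(Tok_can_sketch* s, Tok_can_sketch* p,
                  Tok_can_sketch* r, Tok_can_sketch* e)
      : supplier(s), producer(p), recycle(r), errors(e), pos(0) {}

    // Parallel start: just graft onto the spawning parser's token scene.
    explicit Parser_sketch(const Parser_sketch& requester)
      : supplier(requester.supplier), producer(requester.producer),
        recycle(requester.recycle), errors(requester.errors),
        pos(requester.pos) {}
  };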
@*2 Another thread optimization.\fbreak If only 1 parallel thread is asked to perform, one does not need to acquire / release the lock of the requesting grammar to report success or failure. Look, I'm trying to make threading closer to recursive descent in performance. Date: 3 Dec. 2004. Well I'm crawling out of the swamp... darwinism? If there is just one thread to be run, why not call it as a recursive descent procedure instead of the thread route? We'll see what the cost of thread modulation is against the procedure call approach and its object creation / destruction overhead. Take 200.1... 9 percent run improvement of procedure call over threads.
@*2 a N * 2.\fbreak Reducing the number of times that the token container is read does miracles. Now let's look at my myopia. There was a single pass, call it P1, to break up the character stream into line segments, followed by the lexical pass called P3. Why? I was lazy and wanted all downstream tokens to be properly tagged in file number - line number pairings. Why lazy? The P1 pass ensured that the tokens were properly GPSed. I did not have to deal with the vagaries of ``how is a line delimited?''. It was handled in one place: the ``eol'' thread, and could be retargeted to other dealings. The logic is hardwired for now to the ``line-feed'' definition based on ASCII encoding. By combining the 2 passes (P1 + P3), the number of reads on an N character stream is halved. Now let's look at the raw character to symbol translation. Again, this is a 2-traverse mechanism that reads a file's characters and translates them into symbols. It should have been a just-in-time read like the tree traversals. Each character request fetches the character from the file and then calls the character translator to do the cosmetic makeover. This definitely improves the ``file include'' process. This is a reduction from 37 seconds to 15 seconds. Not bad: a 2.something zinger. Now for the overhead of raw characters to symbol objects. Judging from the cursor winking, this could be another 10 second improvement. Wait and see... Ladies and gentlemen, and the winner is ... 37 seconds down to ??? Maestro, the envelope please. 15 seconds! A 22 percent improvement against the 100 second starting point but 2.something faster against the 37 seconds. Slimfast ain't got nothing on us. As the song says --- looking for xxx in all the wrong places. Now what about the cost of symbol creates and std::map usage in the thread library and the garbage collector? I'll see what I can do. I must approach the recursive descent speed zone or this thought experiment on parallel parsing is just that --- relegated to the empirical sidelines. A second string something, and excuse the pun.
@*2 Remove |unique_id__| from |CAbs_lr1_sym|.\fbreak Its original purpose was a birthing number to give a count to the number of symbols produced and as a partial order. Never used, so out damn thoughts! Dieting and speed are in.
@*2 Okay guys Yacco2 is starting to smoke.\fbreak Here's another improvement. Firstly I was looking in the wrong places: string copy was thought to be a major cause but it turns out that it's a minor overhead. Globalization of the character storage is good at the cost of safety but not a really really big stopper. So here's the scoop: first set evaluation goes thru INDIVIDUALLY each potential thread contained in the state's configuration list.\fbreak If there are many potential threads to-be-run assessed on a per-character basis --- ouch. All one has to do is gather the threads into a consolidation thread so that only one attempted pass is made, on the first set of the consolidation thread; a sketch of the idea follows at the end of this note. Yacco2's linker consolidates this first set of referenced threads. If the threads are orthogonal to one another (there is no common prefix), then the single first test lowers the cholesterol levels. With this insight, now to modify the grammars like: pass3, lint, syntax directed code gatherer, etc. Jan. 1/2005. Well this had limited improvement. Not what was expected, so see |Global Parallel table entry| where it explains how Yacco2's linker became involved. Jan. 6/2005. Speed improvement --- ???.
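The shape of the win (a sketch; |First_set| and friends are made up here):\fbreak

  #include <vector>
  #include <cstddef>

  struct First_set {                      // hypothetical, kept trivial
    std::vector<char> members;
    bool contains(char c) const {
      for (std::size_t i = 0; i < members.size(); ++i)
        if (members[i] == c) return true;
      return false;
    }
  };
  struct Thread_entry { First_set first; /* thread to launch */ };

  // Before: each potential thread probed INDIVIDUALLY, per character (ouch).
  int candidates(const std::vector<Thread_entry>& cfg, char c) {
    int n = 0;
    for (std::size_t i = 0; i < cfg.size(); ++i)
      if (cfg[i].first.contains(c)) ++n;
    return n;
  }

  // After: one probe against the linker-consolidated first set decides
  // whether the consolidation thread gets tried at all.
  bool try_consolidation(const First_set& consolidated, char c) {
    return consolidated.contains(c);      // the single first test
  }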
@*2 Slim down the |CAbs_lr1_sym| space.\fbreak This is the base component to all other symbols. Originally I had associated the parser across all symbols: Terminals and Rules. This fattened the space by 4 bytes. With a shrinking of some variables to short integer and unionizing the rule's variables, I brought down the space bloat from 36 bytes to... 24 bytes. So what? Well, this allows more raw characters to be stored in a prefixed array rather than a template container.\fbreak 3 Jan. 2005.
@*2 Grammar as a logic sequencer: Allow no token containers.\fbreak What type of improvement is this? By passing in pointers to the parser, does this not open up more programming mistakes? It could, but hear my reasons please. This lets the grammar writer program grammars as logic sequencers using epsilon rules and related syntax directed code. If the writer is very creative, behavioural terminals could be defined and put into a token container for parsing: each to their own. See |enumerate_T_alphabet.lex| as an example of this use.\fbreak 15 Aug. 2005
@*2 Logic bug: same accept token added to accept queue more than once.\fbreak Help the needy: the grammar has launched multiple threads and these threads have returned the same token. This condition is caught when the number of accept tokens in the queue is not equal to the number of threads reporting success. The needy? Well, I was caught with this logic bug. See |Arbitrator code generator| where the logic check resides.\fbreak 13 Dec. 2005
@*2 Porting of |cweb| code.\fbreak Make sure the @@i include construct uses quoted file names. Without the quotes, the Mac version of |cweave| has a slight stammer. The Microsoft flavour works.\fbreak See |Generated finite state automaton macros| for more stumblings from within. The C macro definition workaround works, but the references to the macros are not placed into the Index.\fbreak 16 Dec. 2005
@*2 |cweave| \CPLUSPLUS/ code.\fbreak Removed the ending semi-colon from the |RSVP| macro to have |cweave| print out these types of token macros on their own line. So make sure u add a ``;'' following their use.\fbreak 8 Jan. 2006
@*2 |failed| directive added in the |fsm| construct.\fbreak I felt the grammar writer should be given a last chance to deal with failed parses. Why? For example, my |yacco2_lcl_option| needs to deal with options having multiple letters. Now how do u program these options whose prefix is faulty? For example, option -err has -e and -er as potential options but they are in error. One could explode the combinatorial code within a grammar to deal with each evolving prefix, or force the calling thread to handle the failed thread with some form of epsilon in the grammar code. This is crude, so why not field a returned error terminal? To do this I needed a last-chance directive to be tried in the |parallel_parse_unsuccessful| procedure.
For the moment, it is only supported in a thread grammar. Possibly I'll look at the monolithic grammar and what it means in particular for error correction.\fbreak 8 Mar. 2006\fbreak \fbreak Verified that the |failed| directive works in a |monolithic| grammar. 2 thumbs up for consistency. Just make sure that a ``failed'' directive within a monolithic grammar places the Error T in the ``Error queue'' via the |ADD_TOKEN_TO_ERROR_QUEUE_FSM| macro and not the |RSVP_FSM| macro: that one places the error into the ``accept queue'', which is wrong.\fbreak 15 Jun. 2014
@*2 More token info for tracing.\fbreak Added to the token trace macros the GPS of the source. This allows one to see where within the source things are occurring. \fbreak 22 Mar. 2006
@*2 Added to the |CAbs_lr1_sym| definition a ``who created'' GPS.\fbreak Comes in handy when errors are thrown, but from where? Errors are directed to the source file with no fingering as to which grammar generated them. So it's up to the grammar writer to tell it as it is. Now the |O2_err_hdlr| grammar can spread the word so to speak... if it is available. See |set_who_created|, |who_file|, and |who_line_no|.\fbreak 22 Mar. 2006
@*2 Rewrote |tok_can| due to global functor firing.\fbreak Originally I had the filter mechanism within the |tok_can| container. This led to the functor being fired by the advance routine regardless of whether the tree node was rejected or not. Why the oversight? I did this to quickly knock off the tree container. Now it's in the tree walker where it should be. This way the functor only gets fired if the tree node fetched is accepted by the filter or there is no filter.\fbreak 17 Apr. 2006
@*2 Adjusted array of ``[]'' declaration.\fbreak Originally I defined arrays of unknown size as type variable-name[]. Porting to Sun did not like this. So my dilemma was ``how to define a base table structure for each table for threads, shifting, reducing, etc.?''. The emitted cpp tables were explicitly sized in their definitions for the ``bsearch'' function to act on, but my generic search code was open-ended, having no knowledge of each table's size. \fbreak Solution:\fbreak Create a base definition of only 1 entry:\fbreak \listing{"/usr/local/yacco2/diagrams/array_def.txt"} \fbreak 22 Dec. 2006
@*2 More porting issues dealing with threads and syncing signals.\fbreak When there was only 1 thread requested to run, I optimized out the mutex acquire / release cycle and left the Caller parser and the Called thread to complete their launch cycle by a) the Caller parser going into a wait state via |pthread_cond_wait| and b) the Called thread signaling the Caller parser via |pthread_cond_signal|. What happens when:\fbreak A calls only 1 thread B and B completes before A puts itself into a Wait stupor? I.e., B will be signalling A to wake up. It depends on the Pthread implementation. Some will queue the signal until the wait happens and then pass it back immediately to the Caller, while Sun drops the signal and so ..... hear the zzzzzs from the sleeping beast and the anxiety from the compiler writer while waiting and waiting....\fbreak \fbreak Conclusion: Remove the optimization and just use proper acquire / release hygiene to deal with syncing between friends. As procedure calls are slower than thread calls due to ``oo'' variable initialization and destructor cleanup, I'll just remove completely the conditional |THREAD_VS_PROC_CALL__|. My tracing works VERY WELL to diagnose this problem. Hear hear.
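The hygiene in a nutshell: the standard Pthread predicate-guarded wait (generic code, not lifted from the library):\fbreak

  #include <pthread.h>

  static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
  static int thread_done    = 0;         // the predicate

  void* called_thread_B(void* arg) {     // B: the called thread
    /* ... do the parallel parse ... */
    pthread_mutex_lock(&mu);
    thread_done = 1;                     // record the event under the lock
    pthread_cond_signal(&cv);            // now the signal cannot be dropped
    pthread_mutex_unlock(&mu);
    return arg;
  }

  void caller_A_waits(void) {            // A: the caller parser
    pthread_mutex_lock(&mu);
    while (!thread_done)                 // even if B finished before the wait,
      pthread_cond_wait(&cv, &mu);       // the predicate keeps A from sleeping forever
    pthread_mutex_unlock(&mu);
  }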
\fbreak \fbreak Dregs of past thoughts:\fbreak |THREAD_VS_PROC_CALL__| thread versus procedure call performance.\fbreak {\bf It must be defined as it is a preprocessor conditional symbol}! There is a cost of calling a thread versus a procedure call. What that cost is, is the reason for this symbol. When there is only one thread to be launched, this becomes a procedure call instead of a thread. Where I'm the doubting Thomas: is the cost of objects birthing and dying greater than having a thread start up and be put on reserve for other calls? |THREAD_VS_PROC_CALL__| of 0 calls threads and 1 calls procedures. The winner is procedure-call by 9 percent. NOT ANY MORE! It's threads cuz of oo's overhead in those damn objects and their rites of passage.\fbreak \fbreak 16 Jan. 2007
@*2 Changed back to passing Parser as a pointer for tracing purposes.\fbreak When the going gets debugged, it's a hell-of-a-lot better to see what the pointer is pointing to in the debug session rather than just an address. Maybe a weakness in the Sun Studio debugger, but so what. This will allow me to see if I'm clobbering memory by the data per parser environment.\fbreak 29 June 2007
@*2 Some more optimizations.\fbreak The grammar suite takes 1:50 minutes. Now to improve.
@*3 1) precalculate a compressed set key from a terminal's enumerate id.\fbreak This eliminates, every time a reduce takes place, mapping the terminal's enumerate id into a compressed set key format so that the lookahead set can be searched. It's a tradeoff of space for speed. Adjusted |CAbs_lr1_sym| to contain and manufacture the compressed set key. The performance improvement is approximately 20\% --- 35 seconds on the grammar suite.
@*3 2) eliminate passing shift's element enumerate value.\fbreak Split |find_shift_entry| into 2 contexts:\fbreak \ptindent{1) current T context} \ptindent{2) Rule or returned T from parallelism context} \fbreak The 2 routines are |find_R_or_paralleled_T_shift_entry| and |find_cur_T_shift_entry|. 5 seconds improvement on the grammar suite.
@*3 3) eliminating the |tok_can| reader mutex --- nope.\fbreak Well here's the scoop. The |tok_can| templates are ``just in time'' (jit) in accessing their contents. What does this mean? For example, the |tok_can| container is a wrapper to access the raw characters of a file, returning each raw character transformed into a raw character token placed into its secondary container for possible reuse. If the read request has the token in its internal container --- a container inside a wrapper container --- then it returns it via the inside template container's operator[xxx]. Now for the ``jit'': if the [xxx] request is not inside its internal container, |tok_can| calls the ifstream object to fetch the next character. So far so good, but put this into a multithreaded context where there are 2 or more cpus running at the same time. Now the |tok_can| ifstream object becomes a critical region. What is the critical region part? Its subscript. Even though my |get_next_token| request is reader only against the |tok_can<>| container, this container itself is a reader/writer depending on the context --- a reader if it has the request squirrelled away in its token container, but a writer when it does not contain the request and must access the ifstream object. An optimization test was conducted: no ``jit'' character accessing by the |tok_can<>| (all the characters were read at open time before any read requests were done) versus the ``jit'' with a guarded mutex. Though the winner was no ``jit'' by only 3 seconds over 80 compiles, it was not worth the gain over a slightly unsafe attitude. I would have needed to adjust all |tok_can| variants to remove the ``jit'' unsafe condition. \fbreak August 2007
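For the record, the guarded ``jit'' shape that was being measured (a hypothetical container; the real |tok_can| differs):\fbreak

  #include <fstream>
  #include <vector>
  #include <pthread.h>
  #include <cstddef>

  struct Raw_can_sketch {                   // hypothetical jit wrapper
    std::ifstream    in;
    std::vector<int> toks;                  // characters already squirrelled away
    pthread_mutex_t  mu;
    Raw_can_sketch(const char* f) : in(f) { pthread_mutex_init(&mu, 0); }

    int fetch(std::size_t idx) {            // reader or writer, depending on idx
      pthread_mutex_lock(&mu);              // the ifstream is the critical region
      while (toks.size() <= idx) {
        int ch = in.get();                  // jit: fetch the next character
        if (ch < 0) break;                  // end of stream
        toks.push_back(ch);
      }
      int t = idx < toks.size() ? toks[idx] : -1;   // -1: past end of stream
      pthread_mutex_unlock(&mu);
      return t;
    }
  };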
@*4 Elimination of reader mutex for optimization reasons.\fbreak The Ides of nagging made me do it for speed. So mutex control has been eliminated from the ``jit'' containers that are now not ``jit''. These template containers now do a double read across their input, as the cost of the read mutex is tooooo slow: 3 seconds / 80 compiles. I'm putting into my subconscious the problem of finding a better silicon / hardware solution to critical region control. I'll have a look at the overhead using Sun's ``dtrace'' facility, not only for mutex overhead but also for other optimizations that can be done to \O2 to approach top-down parsing speeds --- i.e., \O2{}batch versus \O2: \O2 is approximately 4 times slower. Don't know if this is an accumulation of c++ and templates etc. against a bare bones \O2{}batch ``c'' language approach? \fbreak Sept. 2007
@*4 Parallel thread reduction should be lr(0).\fbreak Here's the scoop: if a thread's lookahead boundary is a superset of what should follow, the returned lookahead token could be in error. As \O2's reduce operation looks to find its boundary dependent on the faulty lookahead, guess what: it throws an error due to the lookahead token not being found in the reduce table of the calling grammar. So create a new |find_parallel_reduce| procedure that just returns the first |Reduce_entry| to complete the reduce. It effectively is lr(0): no concern for the following token context! Now the error can be dealt with by programming the shift operation within the grammar using either \ALLshift{} or \INVshift{} to capture the faulty parse point and to report a specific error against the GPS of the returned lookahead token. \fbreak Oct. 2007
@*4 Make |accept_queue| more efficient.\fbreak Make it a fixed array of local |Caccept_parse| for 2 reasons:\fbreak \ptindent{1) eliminate the new / use / delete cycle: malloc is too slow} \ptindent{2) don't need a map but just a sequential queue} This gives a 13 percent improvement; a sketch appears below, after the next note. \fbreak Nov. 2007
@*4 Use Procedure call when only 1 thread needs to be run.\fbreak The mutex / thread paraphernalia is tooooo slow compared to a procedure call. This thought had been nagging me since my 1st \o2 compiler written by recursive descent. It became my benchmark that thread parsing was measured against. Yes, I'm aware of the bottom-up optimization by Ullman but I'm not there yet in digesting the optimized requirements to lower the push / pop overhead by consolidation of subrules and their syntax directed code that need some form of sequential sequencer when the consolidation consequence must get exercised. Now why come back to this subject anyway? Those nagging optimization muses! I eliminated the mutex controls due to threads and my critical regions; there is a 1:1 activity taking place whereby the calling of the procedure by the requesting grammar passes the right to the called procedure to enter its critical region when needed without the paranoia of duality destructive conditions. By making the |Parser| and its evil grammar fsm twin global and by mallocing them within the called procedure, the overhead should be lessened. Maestro, the envelope please. And the winner is: 25\% faster. How was this measured? On my Apple laptop, where running times of threads-only against the hybrid approach were taken using the |o2grammars.bat| script. \fbreak Dec. 28, 2007
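Back to the |accept_queue| note above. The flavour of it (bound and fields invented for the sketch):\fbreak

  // Sketch of the fixed-array accept queue: no malloc, no map,
  // just a sequential queue. The bound and fields are invented.
  const int MAX_ACCEPT_PARSES = 32;                // assumed limit, not Yacco2's

  struct Caccept_parse_sketch { int la_token_id; /* ... */ };

  struct Accept_queue_sketch {
    Caccept_parse_sketch items[MAX_ACCEPT_PARSES]; // local array, reused per parse
    int cnt;
    Accept_queue_sketch() : cnt(0) {}
    void add(const Caccept_parse_sketch& a) { items[cnt++] = a; }
    Caccept_parse_sketch& first() { return items[0]; } // the proxy arbitrator's pick
    void reset() { cnt = 0; }                      // no new / use / delete cycle
  };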
@*4 Thread's start-up attributes for stack size and system scope?.\fbreak I played with the |pthread_attr_setstacksize| and |pthread_attr_setscope| attributes to possibly improve speed and fat deposits. Well the |pthread_attr_setscope| setting of |PTHREAD_SCOPE_SYSTEM| made things worse as this was an aggregate of all things considered. Procedure calls of threads by threads made the run environment too sensitive to this unknown size mix. The result can produce a |SIGSEGV|. Experimenting by increasing the |stack size| delayed the problem but bloated the run size. As always the cure was easy: just remove this fiddling and default to the runtime attributes of the local |pthread| implementation. On Sun Solaris, the stack size for all threads is 1 megabyte --- more than enough. \fbreak Apr. 2008
@*2 Error detection within a grammar: new \QUEshift{} symbol introduced.\fbreak \QUEshift{} was created to handle questionable situations like error detection points within a grammar. It can be expressed as a normal shift terminal or within the returned T of a \PARshift{} thread expression. As the lookahead symbol is questionable, using the \ALLshift{} or \INVshift{} symbol to handle error detection has one weakness: its subrule reduce operation depends on the lookahead set, and the current T might not be in this LA set. Consequently the reduction could possibly not action. Introducing the new symbol draws the reader's eye to the error point within the grammar. The reduce is an lr(0) context, which means no dependency on the current symbol, and so the subrule always reduces! This allows the grammar writer to coerce the parser's behaviour by the subrule's reducing syntax directed code.\fbreak Warning:\fbreak The current token is {\bf not advanced}, so perpetual motion on the same token spot could occur if one is using the \QUEshift{} to act like an \ALLshift. |@| has been created to detect and stop the parse process. So be warned.\fbreak June 2008
@*2 Speed wonderful speed in ``Oliver Twist'' and not William Burroughs.\fbreak Well the rule recycling works now. No more new(s)... Just recycle them grammar rules. The envelope please ... 25\% speed improvement, from 32 to 24 seconds against all them grammars. As time shrinks there seems to be an asymptotic return on performance improvements. But this one is good; no, really very good. I'm only 4--5 seconds away from the recursive descent benchmark. It's malloc! and its mutual exclusion that is very very expensive, as shown by the following ``dtrace'' output.\fbreak \INDENT{.5in}{0 57766 lmutex\_lock:entry} \INDENT{.6in}{libc.so.1`lmutex\_lock} \INDENT{.6in}{libc.so.1`malloc$+$0x25} \INDENT{.6in}{libCrun.so.1`void*operator new(unsigned long)$+$0x2e} \INDENT{.6in}{o2`void NS\_o2\_sdc::Co2\_sdc::reduce\_rhs\_of\_rule(...*)$+$0x282} The above trace also brought out my sloppiness in proper code emissions per grammar's |reduce_rhs_of_rule| routine. I never stored the newed rule, so each time the grammar was run the used rules were recreated --- uck. \fbreak \fbreak Dec. 2008
@*2 Improve dumped data when Shift T not found in parse table.\fbreak See |@.Err Can't find symbol to shift i...@>| where it is thrown. Though this is a grammar writer's lack of error catching in his grammar, at least dump out the info on T: its enumerated id and literal. Now the info dump contains the grammar in question, its current parse state, and the T details. Why isn't it using an Error class T and \O2's generic error queue dump facility?
Cuz this is below the user's language: remember, this is a generic interface without any knowledge of what's being built on top of it. And I didn't want to force yet another canned set of T definitions like lr constant and rc. \fbreak \fbreak Feb. 2009
@*2 VMS spits core dumps when its thread stack is exceeded.\fbreak Ahh, recursion is sometimes divine, but not when the stack limit is exceeded, looking like a runaway recursion call where A recurses on itself without any stop-recursion detection. So U must increase the |VMS_PTHREAD_STACK_SIZE__| symbol in the |yacco2_compile_symbols.h| file and rebuild the \O2 library. The allocated thread stack size was 128k before the Pascal translator started to choke due to better symbol table management that increased the |pas_variable| grammar run size when dynamically creating the statement variable's symbol table components. Double ugh, but this is reality. \fbreak \fbreak Feb. 2009
@*2 Caught by your short and curly --- local variables in grammar rule.\fbreak The short of it is the recycling of rules: new once, reuse forever. The consequence is the rule gets recycled, and if u have not reinitialized the variable, aka an array or table, then the past dregs of invocation will haunt u. Either create the variable in the ``fsm'' grammar construct or reinitialize it in the rule's construct directive. Better yet, do it in the rule's ``op'' directive before the variable is being used. Do u really want the curly part? Of course not, so where did it grab u, Dave? Grammar |la_express| to calculate the lookahead expression. Rule reuse happens on ``+'', ``-'' expressions: eolr - ".". \fbreak \fbreak Feb. 2009
@*2 Add a complete trace on fetching a T when symbol functor in use.\fbreak When the |tble_lkup_type| token fetching in its various forms attempts to remap the raw T, I just traced the fetched T before the potential remapping took place. If the symbol table functor is in place and turned on, then the after-remap attempt is now also traced. This was highlighted when I wrote a Pascal translator with syntax directed symbol table scope handling, and my myopic test was the problem as I put an externally defined function within a local procedure. Boy, my misfits never cease to entertain. This seems to be my problem where the original test item was faulty. I guess u could say my grammars should have caught this faux-pas, but they were not written to catch all sins but to remap one correct Pascal program into another correct Pascal variant. Some error reporting is being done, but the more others use it, the more retrofitting of error reporting is taking place. More for the weary when problems prevail. \fbreak \fbreak Feb. 2009
@*2 Add right recursion support for rule recycling.\fbreak Well, how did I treat this? I detected full rule-use consumption and outputted a message to the grammar writer that all the allocated rules were in use and exited with a message. Please see grammar |rules_use_cnt.lex| as to how it counts the number of rules in a left recursion scenario. Well this was not good, as right recursion has its place in parsing though it hits hard on the parse stack. How so? Before the rule can be reduced it keeps pushing, aka shifting, until its lookahead boundary is met. So if the parse exceeds the fixed stack size it will still honk with an abrupt message and quick stage exit. Staying within the stack allocation is fine. See |MAX_LR_STK_ITEMS| as to the parse stack allocation: adjust accordingly. \fbreak \fbreak Feb.
2009
@*2 Changed input order of T Vocabulary --- exchanged T with Error T.\fbreak Why the change? This allows the grammar writer to write independent compiler/grammar combos --- e.g., front end lexing of Unicode --- so that the front end creates the external token container for the other compiler/parser combo to digest. Currently all token containers are memory-only, template derived. With this change the parser/grammar(s) T Vocabulary now appends the Errors at the end of the T Vocabulary enumeration scheme. The second parser/grammars combo must include the first T definitions in its own T Vocabulary in the exact order defined by the first parser. From there it can build its own T Vocabulary of additional Tes and Error symbols. Another way is to remap the enumerated ids of the first parser's tokens into the ordering scheme of the second parser. Use the token read functor associated with a read token container to remap Tes at read time; a sketch of such a functor follows at the end of this note. It could just change the ``enumerate\_id'' value of the old token into the current parser's T Vocabulary mapping. It could also create a new token, but this itself is overkill unless one is remapping the token into another different token type: for example, remapping an ``identifier'' token into a keyword by use of a symbol table lookup. \fbreak Caveat.\fbreak Currently the \O2 library has globally defined symbols that get resolved at linker time. So one cannot run multiply defined independent threads of parsers without each having exclusive use of \O2. \O2's implementation contains multiple independent parsers sharing the same \O2 library and only 1 superset of Tes defined for all parse stages. For example, the command line to \O2 gets parsed by its own grammars and their outputted tokens become downstream fodder for the suite of grammars used to parse the inputted grammar file. \fbreak \fbreak There is still work to be done to consolidate \O2's external symbols into a structure containing indirect pointers to these symbols that are currently resolved by the linker (ld). 1st thought:\fbreak 1) have a local structure initialized to these pointers.\fbreak 2) register this structure of pointers with the runtime library of \O2 before any parsing begins. \fbreak 3) each independent parser can run in its own thread. \fbreak 2nd thought:\fbreak 1) use a fork process where the token containers are passed somehow as input to the subprocess that fills its booty. This thought is similar to the spawning of a grammar as a thread or its optimized procedure call. \fbreak \fbreak May. 2009
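A sketch of such a read-time remap functor (all names invented):\fbreak

  // Remap a first parser's T enumerate id into the second parser's
  // T Vocabulary ordering at read time. Names are invented.
  struct Tok_sketch { int enumerate_id; };

  struct Remap_read_functor {
    const int* first_to_second;             // translation table, built offline
    explicit Remap_read_functor(const int* tbl) : first_to_second(tbl) {}
    void operator()(Tok_sketch* t) const {  // fired per fetched T
      t->enumerate_id = first_to_second[t->enumerate_id];
    }
  };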
@*2 Tree container is out-of-sorts from self modifying trees.\fbreak Well it's back to just-in-time (JIT) reading of the tree |tok_can| as the following example outlines why:\fbreak Given a grammar that reads a specific T type like ``call-stmt'', and u want to change its younger brother to a different T. What happens during the parse? The current T is shifted onto the parse stack and the lookahead T is fetched, becoming the current token. This LA T will be a ``call-stmt'' possibly used to reduce the shifted T ``rhs'' subrule. The problem is the container has the unmodified reference of the lookahead T. Now within the grammar's syntax-directed-code u process the younger brother nodes in which u changed some of the tree's content. If u are unlucky, the LA T's id gets changed. Irrational behaviour could occur: the parser doesn't reduce properly, or possibly, as the T type is different from the parse stack frame entry of ``call-stmt'', it acts like an uninitialized object having random behaviour.\fbreak \fbreak So what can one do? I corrected the |tok_can| container to JIT reading of its Tes and implemented the |remove| method that pops the last entry from the container. If u are modifying the T type of the tree, i.e., replacing the tree node's content with another T type, now the grammar writer must add syntax-directed-code to remove the LA T from the container, re-align the current token position to the shifted T position, and do a ``get\_next\_token'' to fetch the proper LA T, thus maintaining the integrity of the parser. All this sounds like a lot of work, but here is an example of such coding:\fbreak An example:\fbreak \fbreak \let\setuplistinghook = \linenumberedlisting \listing{"/usr/local/yacco2/diagrams/treemodify.txt"} \fbreak \fbreak The code above is taken from a grammar's ``rule'' syntax-directed-code. The rule has a reference to the parser environment and doesn't have to go thru the ``fsm'' route to get at the token supplier. Lines 5--6 get the tree token container from the parser and cast it to a tree container. Lines 7--8 remove the last T from the container and re-align the parser's current token position to the shifted T position. Note: All token containers have subscripted token access starting from 0. Line 9 fetches the new LA T for the parser to continue merrily along its way. There are other ways to re-align the LA T: Please see |@|. All this for dynamic modifying of trees: good stuff! \fbreak \fbreak May 2009\fbreak
@*2 Multiple Reader/Writer improvement to supplier container.\fbreak Historics: JIT fetching of tokens from an ``ifstream'' container demanded locking when the request was not in the container. Consider 2 parallel threads A and B competing, where their read requests to the container are simultaneous: A on cpu 1 and B on cpu 2, and their requests are not in the container. The critical region becomes the physical i/o to the ``ifstream'' object when the request was not within the container. So what did I do? Experiment 1 was to remove the JIT attitude and read all the ``ifstream'' characters into the container at file open time. Now the container becomes read-only with no need for locking. So the ``ifstream'' issue is solved, but what about a tree container with T filtering? It is a JIT container that requires locking protection, as u do not want to walk the complete tree filling it up before the first read request. Also consider a self modifying tree. What? The Pascal translator required the following:\fbreak The HP ``delete'' call statement had to be removed and replaced with a raised signal variable so that its future close statement could deal with it using a ``delete disposition'' clause within a modified close. This future close tree node was morphed into a conditional subtree dealing with the ``to delete or not to delete'' issue. Without the JIT attitude the tree walker has remnants from before the tree surgery. The container could contain items that are no longer valid due to this modification. \fbreak \fbreak Back to the JIT and a quick overview of mutual exclusion.\fbreak When a writer is introduced, locking protection is required if there is more than one simultaneous accessor to the container. If there are only readers, JIT still demands writing to the container before the read request can be satisfied.
No lock protection is required when only one suitor is active. Within the parsing environment, all threads are co-operative and must houseclean when completing their task, even though they might abort. By keeping a reader/writer count against the container and per parser, the supplier container lock usage can be optimized according to the simultaneous number of accessors. \fbreak \fbreak What about the other containers: recycle-bin, error, and producer? Do they require lock protection? Yes they do when they are being filled, and yes when they are acting as a supplier container. As they are more infrequently used, I leave the locking mechanism with the ``add\_token\_to\_xxx'' procedures where xxx is one of ``error\_queue'', ``producer'', or ``recycle\_bin''. For occasional back-door T adding to the supplier, the ``add\_token\_to\_supplier'' procedure is lock optimized on simultaneous accessors as the supplier container maintains its suitor count. \fbreak \fbreak June 2009\fbreak
@*2 Removed |grammar_stk_state_no__| from the |CAbs_lr1_sym| definition.\fbreak The original thought was to capture the parse stack number at time of T creation for error tracing. The thought was half-baked: what happens when a T is created outside of the parsing environment --- no parse stack? So out, half-baked! If the grammar writer needs this information, it can be programmed explicitly by adding the appropriate attributes to the error T being logged. \fbreak \fbreak June 2009\fbreak
@*2 Note on what's in the token container and its size.\fbreak The ``end-of-grammar'' condition signaled by the |PTR_LR1_eog__| T is not an element of the container. Why? It acts as a conditional, being only-the-lonely, as only the Tes in the token stream are contained. So u are warned. If u are testing the token container for size --- for example, u walked a tree container with filtering and u are testing whether the 2 Tes and the ``end-of-grammar'' condition are there --- u should test the container's size for 2 elements and not 3. Why all this verbiage? Whispers to myself. \fbreak \fbreak June 2009\fbreak
@*2 Sets: Sequential versus binary search optimization.\fbreak Well, what is the break-over point for when to use a sequential search on an ordered table versus a binary search? This question came up when I wanted to improve set handling: aka the shift and reduce operations within the fsa state. Try to paper out the result! I finally wrote a simple program to gather stats on the break-over point. Surprisingly it was 72 elements. The test used a table of elements having a multiple of 3 as 1*3, 2*3, etc. The population went from 1 to 128 elements, and for each element in the table, a spanned search key of +, -, and = the element key was done. This was run against each search type to find out the break-over point on instruction costs. Now all state searches have a dual strategy tested against the |SEQ_SRCH_VS_BIN_SRCH_LIMIT| constant as to what search type to use. \fbreak \fbreak July 2009\fbreak
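The dual strategy in miniature (a generic sketch over an ordered integer table; the real tables hold typed entries):\fbreak

  #include <cstddef>

  const std::size_t SEQ_SRCH_VS_BIN_SRCH_LIMIT = 72;  // measured break-over point

  // Returns the index of key within the ordered table, or -1 if absent.
  int find_entry(const int* tbl, std::size_t no_entries, int key) {
    if (no_entries <= SEQ_SRCH_VS_BIN_SRCH_LIMIT) {   // small table: sequential wins
      for (std::size_t i = 0; i < no_entries; ++i) {
        if (tbl[i] == key) return (int)i;
        if (tbl[i] > key)  return -1;                 // ordered: walked past it
      }
      return -1;
    }
    std::size_t lo = 0, hi = no_entries;              // large table: binary search
    while (lo < hi) {
      std::size_t mid = lo + (hi - lo) / 2;
      if      (tbl[mid] < key) lo = mid + 1;
      else if (tbl[mid] > key) hi = mid;
      else return (int)mid;
    }
    return -1;
  }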
@*2 Change T containers' subscripting to unsigned integer, or my subtle stupidities.\fbreak Why the change from signed to unsigned integers for size, subscripting? Depending on the stl template library, there will be unresolved references to methods like ``size'' that return unsigned.\fbreak \fbreak Stupidity number 1: overloading the subscript range: subscript \LTsign{} 0 \derives{} have not accessed container for T, before first time access, etc. U get the notion. Due to this, ``first-time-accessed'' and ``end-of-container-reached'' attributes were needed. Tree walking with filtering needed special attention in the ``do I already have a T in the container?'' and ``end-of-tree-reached'' cases. That is, a request could be asked to fetch a specific T after the ``end-of-tree'' has already been reached.\fbreak \fbreak 2nd stupidity: not commenting / documenting that a Parser expects that the T has already been fetched before it requests it. This showed up in my haha fine-tuning of my logic on tree containers and the discrete logic grammars getting nada input: dead end T. \fbreak \fbreak Cost of my overloading: about 8 hours of work to ferret out these subtleties. I know it's rather simple, but this is my twilight zone of stupidity. \fbreak \fbreak Nov. 2009\fbreak
@*2 Porting to Microsoft: Visual Studio 8.\fbreak Some not-so-happy comments on the 32 bit console application:\fbreak 1) They got it wrong when it comes to the C runtime (CRT) and their different calling types: \_\_cdecl, \_\_stdcall, and how their libraries, static or dynamic, were built. The threaded library needs \_\_stdcall, while the main program needs \_\_cdecl. Each library draws from its own memory pool depending on what library type u are using. So build everything using \_\_cdecl and fine-tune the call to ``\_beginthreadex'' with \_\_stdcall.\fbreak 2) U better choose the right type of multi-threading: ``/MT'' or ``/MD'' or Klack-klack-klack? Well, trial-by-error discovered ``/MT'' is the right one and not their chosen default.\fbreak 3) Forums are thin on quality but lots of verbiage on multi-threading: Try looking up exit code (255).\fbreak 4) U better use ``/force:multiple'' to allow all those common c++ rtns to coalesce.\fbreak 5) Last, their Release libraries don't work! It blows up before the program's ``main'' is entered. So the port has the porky version, but it works!\fbreak \fbreak Alas, poor fool for thinking they improved on this from Visual Studio 5 to 8. It was trial-by-the-blind using the various combinations to get it going. Better cosmetic documents but of the same software quality ilk. Well my tea reading is this: circa 2003 the move was to CLR / C sharp development, leaving as-is the 32 bit console application code. Let the street hawkers spin their new tales of enchantment to follow them. Anyway the port is done, but tooth gnashing ain't fun. \fbreak \fbreak Nov. 2009\fbreak
@*2 Mutexing the containers.\fbreak A review:\fbreak 1) All containers start with one owner. Therefore the 1st fetch is safe.\fbreak 2) All sequential reads from a container are safe.\fbreak 3) After a T is delivered from its container, the container checks to see if the request was for its last T inside it. If so, the container will do a future request by itself and not by the consumer. That is, it is pushing the race condition ahead to maintain safety for the consumer.\fbreak 4) This future read I call lookahead. It contains the mutex mechanism to protect against 2 or more suitors. So what happens when 2 consumers request the same last T? Well, there could be 2 potential lookaheads attempted. Only 1 lookahead T is added to the T pool. What happens if the lookahead request hits the end-of-T-stream? The mutex protection checks for this. \fbreak \fbreak Nov. 2009\fbreak
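Points 3) and 4) in sketch form (a hypothetical supplier; the stub source simply ends the stream):\fbreak

  #include <vector>
  #include <pthread.h>
  #include <cstddef>

  struct Supplier_sketch {                  // hypothetical supplier container
    std::vector<int> toks;
    pthread_mutex_t  mu;
    bool             eos;                   // end-of-T-stream reached
    Supplier_sketch() : eos(false) { pthread_mutex_init(&mu, 0); }

    int next_from_source() { return -1; }   // stub: -1 = end-of-T-stream

    int fetch(std::size_t idx) {            // sequential reads stay safe
      int t = toks[idx];
      if (idx + 1 == toks.size() && !eos) { // delivered its last contained T?
        pthread_mutex_lock(&mu);            // 2 suitors may both attempt this...
        if (idx + 1 == toks.size() && !eos) {  // ...but only 1 lookahead is added
          int la = next_from_source();      // the future read: the lookahead
          if (la < 0) eos = true; else toks.push_back(la);
        }
        pthread_mutex_unlock(&mu);
      }
      return t;
    }
  };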
@*2 Some refinements to source file/line tracings.\fbreak External file print sourcing improved; added source file/line to dynamic tracing. Cleaned up ``Generated finite state automaton macros'' from ``c type macros'' back to cweb macros.\fbreak See the |EXTERNAL_GPSing| and |FILE_LINE| macros with appropriate comments. \fbreak \fbreak Jun. 2014\fbreak