An Open Framework for Clustering

Alan Robertson - IBM Linux Technology Center

ABSTRACT

High-Availability is one of the features most commonly identified as necessary for Linux™ to be considered "enterprise-ready". High-Availability (HA) systems provide increased service availability through clustering techniques. Clustering is also used to create High-Performance Clusters (HPC). These two kinds of clusters have many things in common, but little has been done to unify them.

In the case of HA clusters, several open source high-availability projects have been created. These projects were created independently for complex historical reasons, not because of political, philosophical, or licensing differences. Because of this, they originally shared no code at all. This reduced the benefits of the open source model, which both encourages and benefits from the sharing of common components. However, many of them need a component for resetting cluster members. Such a component was created with the specific intent of being common across open HA systems. It was successful beyond all expectations and has become standard across most open source HA systems, and all of the projects involved have benefited from this commonality. Although this component is currently used primarily in HA clusters, HPC systems also need this capability.

In light of this success, the author began to search for ways to extend these benefits across a broader set of cluster infrastructure components. Towards this end, we have created an architecture for an open cluster framework for both HA and HPC systems, and have begun to implement it.
Although this framework has its origins in High-Availability systems, it is engineered to be applicable to high-performance clusters as well. This paper provides some background on open source clustering, defines this cluster framework, outlines its goals, describes its key design elements, and details progress in implementing them.

Introduction

High-Availability failover techniques are commonly viewed as critical components of an operating system's enterprise capabilities. Linux is no exception in this regard, and it is well understood that these capabilities are essential for its wide acceptance in the enterprise server arena.

As a result, several different open source High-Availability projects have come into being. For the most part, these projects are separate for historical reasons, not for philosophical, political, or licensing reasons. Each has its own user community, which it faithfully supports and which expects to continue to receive support. The projects are willing to share components and software, but find it difficult to do so because they have no common infrastructure or assumptions that would enable such sharing. The STONITH (Shoot The Other Node In The Head) node-reset module is an exception, since it was designed from the beginning to be a common component. It is the success of this common component that prompted the work described here.

Every cluster requires a common set of capabilities. These capabilities are often implemented in different ways, but fundamentally serve the same purposes.

If these components fit into a common component framework, they could be assembled into many different cluster systems, each of which would fulfill different requirements in its own way.
For example, one could imagine a small, embedded cluster system with minimal capabilities, a large heavy-duty clustering system, and many in between, all assembled from components that fit into this framework.

In many respects, this creates a new class of configurable clustering systems, capable of meeting many different classes of needs well. Each user of the framework could select components from the project's framework set and assemble them into clusters that meet the particular needs of their own customers. In many ways, this provides the best of the proprietary and open worlds: it allows competitors to differentiate themselves from each other, while letting them share infrastructure for those components that do not have to be customized to meet the unique needs of their customer set. This maximizes both the benefits of the open development model and the opportunities for competition in serving unique market and technology niches.

It also provides an ideal vehicle for research into cluster systems, since it allows researchers to concentrate on their area of interest and simply use the framework components that they need for their research but that lie outside their area of concentration.

Background

As mentioned earlier, several different open source clustering systems were developed independently. They are, however, all licensed under the GNU GPL. The author was at one time involved with the development of two of them, and it became clear that both needed a Linux implementation of a reset mechanism. So a class of loadable modules was created to provide one. This proved to be quite successful, and has received contributions from several different HA projects, not just the original two.
The author actually wrote only one of the modules; around a half-dozen more have been written and contributed by several different organizations.

This allows all the projects to avoid re-inventing the wheel, and maximizes the benefits of the open source model, since the "eyeballs" and developers of the community are spread across fewer lines of code, giving a better and more full-featured result. Although this subsystem was small and its goals modest, it was remarkably successful, and has become the de facto Linux standard for resetting cluster nodes.

Several things contributed to making it so strikingly successful. These include:

- Clean design. The design of the modules was clean, simple, easy to understand, and independent of the surrounding software. This made it an obvious choice to adopt, and easy to integrate into various cluster systems.
- Loadable modules (plugins). Since any given installation typically uses only one reset mechanism at a time, a module loading system made for cleaner system interactions and has minimized system bloat. It has also led to a largely object-oriented approach to these modules, which enhanced the clarity of the design.

However, there is one notable weakness in the current STONITH system: the configuration of STONITH objects is awkward and largely ill-suited to user-friendly configuration through a GUI. This results from a design decision to use an unstructured string to represent the data that configures STONITH objects. The data required to configure STONITH objects (and indeed most objects) needs richer semantics than an unstructured string provides if its structure is to be properly communicated to a GUI or other configuration system.

In addition to the lessons learned from the STONITH libraries, another attribute of the heartbeat software influenced this design as well.
In heartbeat, all messages are sent as name-value pairs, similar to the UNIX shell environment. As explained in [ROB00], this helps significantly with both portability across machine types and version-management interactions within the cluster. However, it is limited in that it cannot handle more complex structured data such as lists. As a result, it is not powerful enough to serve as a unifying mechanism in a general cluster framework, nor for all clients to use for their control messaging.

Creating A Framework

The term "framework" was chosen instead of "design" for this project. A framework here means a collection of common infrastructure components and APIs that permit one to create a cluster system out of components that fit in with (or conform to) the framework.

In my view, these APIs and infrastructure should be highly neutral (or agnostic) towards all of the following:

- Processor architectures, operating systems, environments, and versions.
- Programming languages.
- Cluster implementation strategies (shared storage, shared-nothing, etc.).
- HA versus HPC clustering.

It is important to note that the infrastructure itself should not implement any cluster capabilities; instead, it should provide a context for creating them. This is what allows the framework to be agnostic with respect to cluster implementation strategies. Note that some of the infrastructure may interact closely with, or require, certain cluster components. For example, it might use or require basic communication services.
These interactions are acceptable (and often necessary), but can increase the size, weight, and complexity of the smallest possible working cluster node.

Although the framework itself will maintain this strict neutrality, particular implementations of cluster components need not. For example, it is acceptable to implement a cluster component that uses a driver module available only for Linux on PowerPC platforms. What is important is that the API be designed so that some form of its function can be performed in every target environment.

Some cluster components may be usable only in a master/slave environment, while others may assume an arrangement of equal peers. Such diversity is sometimes necessary in the implementation of components. However, these implementation choices must not show through in the APIs that define a component's interactions with the other cluster elements.

Goals

The goals of this framework have been heavily influenced by the experiences described earlier:

- Encourage sharing of components, and increase the number of shared subsystems beyond the STONITH subsystem.
- Allow each of the various HA projects to use components from the framework without requiring them to abandon their current customer set.
- Allow the components to be adopted individually, with minimal expectations regarding the remainder of the framework.
- Keep the design of individual component interfaces clean and "value-neutral". Make sure the APIs do not assume things about the rest of the cluster system and its implementation. In particular, avoid taking sides on implementation methods and techniques (such as shared-storage versus shared-nothing clusters).
- Make extensive use of loadable modules for framework components, and provide a single, general module loading environment.
  This is essential given the desire to allow solutions to be assembled from sets of components that come from a diverse range of environments.
- Provide a common infrastructure for configuring modules that does not require adding new code to the GUI to configure new types of plugins. I use the term self-configuring plugins for this property.
- Use a simple ASCII-based marshalling/demarshalling method that allows for arbitrarily complex hierarchical structures. An XML subset based on XML-RPC is currently our preferred choice. This provides the advantages attributed to name-value pairs, yet also allows much more complex structured data to be sent, at the cost of larger and more complex code for encoding and decoding messages. This complexity is potentially formidable. If XML is used for all messages, it will become necessary to lock the XML encoding/decoding libraries in memory. Most libraries that support full XML are many times larger than the entire heartbeat cluster system; some are fully 10 times larger. Locking such a large piece of code into memory is undesirable. However, a simple subset of XML can easily be parsed in a few dozen kilobytes of code.
- All core software will be written in C, not C++ or an interpreted language. Heartbeat, which is written in C, is small and lightweight, and has been extremely stable, with virtually no memory leaks or other stability issues.

Infrastructure Components

Several infrastructure components of this framework need to be completed before work on the cluster components themselves can take place in full force.

Marshaling / demarshaling

This subsystem takes data structures in memory and translates them into a string format for transmission to another process or for storing in a file; conversely, it takes such a string and translates it back into data structures.
The initial implementation is oriented towards translating the Glib collection data structures to and from XML, but the design will allow encoders/decoders for any kind of data structure, and any marshaling/demarshaling method, to be plugged into the infrastructure. Formats that (like XML) preserve the version-compatibility features of name-value pairs are preferred.

The XML decoding (parsing) code will work only with a restricted subset of XML. In some ways, XML is a bit like Perl: "There's more than one way to express it." Our subset is grammatically simple and offers fewer ways of expressing the same information, yet it is powerful enough to represent any kind of data that can be structured hierarchically (as XML requires).

Because the Glib collection data structures allow elements of arbitrary type, each element must carry a type wrapper so that a reader can tell whether a list element is a string, an associative array (hash table), or another list. The term "wrapped structures" will be used here for this kind of structure.

The examples that follow illustrate one possible encoding of the data: the representation used by XML-RPC, which is currently a favored alternative. Below is an example of a Glib GList of strings encoded in XML-RPC format:

    <methodCall>
      <methodName>subsystem/method</methodName>
      <params>
        <param>
          <value><array><data>
            <value><string>first</string></value>
            <value><string>second</string></value>
            <value><string>last</string></value>
          </data></array></value>
        </param>
      </params>
    </methodCall>

Below is an example of a Glib GHashTable associative array with strings for keys and values.
This example is also encoded using the XML-RPC format:

    <methodCall>
      <methodName>subsystem/method2</methodName>
      <params>
        <param>
          <value><struct>
            <member>
              <name>linux-ha</name>
              <value><string>linux-ha.org/</string></value>
            </member>
            <member>
              <name>failsafe</name>
              <value><string>oss.sgi.com/projects/failsafe</string></value>
            </member>
            <member>
              <name>kimberlite</name>
              <value><string>oss.mclinux.com/projects/kimberlite/</string></value>
            </member>
          </struct></value>
        </param>
      </params>
    </methodCall>

This marshaling/demarshaling will be implemented as a loadable module, allowing choice in these matters, including non-XML representations.

Module Loading

This subsystem has the job of loading and unloading modules, and registering and unregistering plugins.

The terms module and plugin are often used interchangeably, but they have distinct meanings in this document.

In this paper, we use the term module to mean a shared library that can be loaded at run time and invoked. From the point of view of the module loading software, all modules are basically alike, and are treated identically.

The term plugin is used here to denote a set of capabilities that a module registers with the PluginHandler for its particular plugin type. This set of capabilities is the same for every plugin of a given plugin type, but differs from the capabilities of other plugin types. It is these capabilities (or APIs) that define the components of the system.

There is only one built-in plugin type, the PluginHandler plugin type, which registers PluginHandlers. Each PluginHandler then registers itself as the handler for those types of plugins it is prepared to manage. For example, if a module implements a STONITH plugin, it registers itself with the PluginHandler that manages STONITH plugins. If this PluginHandler is not already loaded, it is loaded automatically.
If no PluginHandler exists for the plugin's type, registration of the plugin fails.

Each loadable plugin exports only those functions defined by its API, because the module loading system implements explicit interface exporting.

Self-Configuration

Many plugins will require configuration for proper operation, and most of them will use the self-configuration API to obtain their configuration information. This API allows a plugin to describe the information it needs to a user interface program, which collects that information and provides it back to the plugin so that the plugin can be properly instantiated. Combined with the basic plugin capabilities, this allows powerful sets of self-configuring objects to be added to the system without writing new user interface software.

The self-configuration API provides the following basic capabilities:

- Configuration metadata query
- Configuration default query
- Construct object with configuration
- Current configuration query
- Modify object configuration

Each of these capabilities will be discussed in turn.

Configuration Metadata query

This is the most complex and interesting of the capabilities. It returns a set of metadata describing the information that must be provided to configure an object of the type in question.

At the top level, the metadata is structured as a list of fields, each of which has a variable number of attributes.

For example, any given field will have some of these attributes:

- fieldname: the internal name of the field
- label: the user-visible label for the field.
  It is returned according to the requested locale.
- isarray: true if the field is an array field
- class: simple or struct. Simple is the norm; struct indicates that the field is a structured (composite) field, whose component fields are described by its fields attribute.
- basictype: the lowest-level (most primitive) type of the field. Examples of basic types are string, boolean, integer, and enumeration.
- specialtype: the most semantically rich type for the field
- length: the number of displayable characters allowed in the field
- regex: a regular expression for validating the field
- short_text: a short explanation of the field, suitable for popping up automatically. It is provided for the requested locale.
- long_text: a longer explanation of the field, suitable for bringing up on demand.
  It is provided for the requested locale.
- minval: the minimum allowable value for the field
- maxval: the maximum allowable value for the field
- enumset: a list of all possible values the field is allowed to have
- fields: an array of metadata fields making up the fields of a structured value

As an example, a field named IP that holds an IPv4 address might have the following attributes:

    (fieldname, "IP")
    (label, "IP address")
    (basictype, "string")
    (specialtype, "IPv4")
    (length, 15)
    (regex, "^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$")
    (short_text, "IP address of power switch")
    (long_text, "Enter the IP address assigned to the BayTech power switch")

The purpose of having two different type attributes (basictype and specialtype) is to allow configuration programs to configure modules that use a newer version of the API specification than the GUI implements. The expectation is that if the GUI does not recognize the high-level type, it will fall back to the primitive type and the other attributes (such as length and regex), so that it can still do the best job it can.

In addition to the simple types described above, a data element can be an array of simple types or an array of structures. If an item is an array, the following attributes will also be supplied:

- minelem: the minimum number of elements allowed in the array
- maxelem: the maximum number of elements allowed in the array

If an item has a class of struct, only the fieldname, label, isarray, class, short_text, long_text, and fields attributes will be used.
If a struct item is also an array (that is, an array of structures), the array attributes (minelem and maxelem) will be used as well.

Remote Procedure Calls

In a cluster, there is a need for a higher-level communication paradigm than simply sending messages. Applications have many such paradigms available to them, as described under the Application-Level APIs. However, the internal components of the cluster also need to communicate with each other, with administration utilities, and so on.

The framework will use several related forms of remote procedure call to provide this structure. We still want all the version-compatibility features that XML provides, so we will use the data formats described by the XML-RPC specification, adapted for use in a cluster. Three forms of RPC will be supported:

- "Normal" RPC: a single request sent to a single node in the cluster, with a mandatory response carrying a completion result, as described in the XML-RPC specification. Rather than relying on HTTP for transport as the specification describes, we will use the cluster's basic communication services.
- "Multicast" RPC: a single request is sent to a set of machines in the cluster (or to the whole cluster) for each to interpret.
- "Normal" RPC through a separately authenticated external communication channel, to a process that may be outside the cluster. This will enable capabilities such as remote administration and monitoring.

For RPC within the cluster, some kinds of calls will not require a mandatory response with a completion result. This extension is essential for performing certain kinds of cluster operations, notably leader election.

These few simple extensions will allow cluster components and cluster-aware applications to use a simple RPC paradigm for their operations.
This RPC paradigm will provide a simple conceptual model to use as the basis for implementing higher-level APIs and services.

Local Client-Server API

This local client-server API provides authenticated access to whatever services cluster components wish to provide to other local applications. It provides client registration and authentication services, and integrates with the basic RPC services described above. It enables the building of APIs through which processes can receive services from cluster components.

Cluster System Components

General API goals

Designing good APIs is probably the most difficult and most underestimated activity in the project. It is the author's experience that there are very few really good APIs.

The following goals are common to the design of all the APIs implemented by all the components:

- Implementation hiding: APIs should generally not require any particular method of implementation, nor unnecessarily reveal or restrict properties of the components that implement them or of other APIs.
- Generality: the API itself should reflect the nature of the concept it embodies, not the restrictions of a particular implementation.
- Simplicity: each API should be as simple as it can be while providing the needed function, but no simpler. This parallels the famous remark about theories attributed to Albert Einstein: a theory should be as simple as it can be, but no simpler.
- Data (structure) hiding: APIs should not reveal the contents of data structures to their users unnecessarily.
- Object orientation: most APIs should reflect natural and obvious system concepts and objects, and the operations that are useful to perform on them.
  If a set of interfaces does not, it may not deserve to be elevated to a documented API.
* Decomposition - sometimes an object should be decomposed into two layers, so that the bottom layer can be easily understood and implemented. For example, in the system reset feature, two layers help the implementation: one that directly reflects the hardware capabilities, and one that reflects the needs of the cluster system.
* Request identifiers - some API functions can easily be changed from several different requests with the same parameters into a single request with a function code (opcode). This makes the API extensible and easy to add to in the future.
* Binary encoding - APIs should avoid encoding information into bits in the interface. Of course, component implementations are free to do this as much as they wish.
* Language/machine independence - APIs should be independent of OS, language and machine type, and the API itself should not assume a homogeneous cluster configuration.

A few additional thoughts on writing good interfaces can be found in the libtool manual: http://www.gnu.org/software/libtool/manual.html#Library%20tips

Initialization / Configuration

Basic Communication

The basic communication services will provide guaranteed packet delivery and packet content authentication to cluster modules, along with basic node status services. In the normal course of events, packets are encoded by the encoding plugin and signed using an authentication plugin before being sent over the wire. The initial communication plugin module will likely be based on the heartbeat code.

Authentication Services

The cluster requires authentication services that can digitally sign packets and also authenticate them.
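The encode-then-sign path described under Basic Communication, together with the sign and authenticate operations above, can be sketched as follows. This is a minimal, hypothetical illustration: ZlibEncoder stands in for an encoding plugin (compression only, no encryption), HmacAuthenticator stands in for an authentication plugin using an HMAC over a shared cluster key, and a Python list plays the part of the wire. The choice of zlib and HMAC-SHA256 is an assumption for illustration, not necessarily what heartbeat's code does, and guaranteed delivery (retransmission, ordering) is not modelled.

```python
import hashlib
import hmac
import zlib


class ZlibEncoder:
    """Hypothetical encoding plugin: compression only, no encryption."""

    def encode(self, data):
        return zlib.compress(data)

    def decode(self, data):
        return zlib.decompress(data)


class HmacAuthenticator:
    """Hypothetical authentication plugin keyed by a shared cluster secret."""

    def __init__(self, key, digest="sha256"):
        self.key = key
        self.digest = digest
        self.tag_len = hashlib.new(digest).digest_size

    def _tag(self, data):
        return hmac.new(self.key, data, self.digest).digest()

    def sign(self, data):
        # Prepend a keyed digest of the (already encoded) packet content.
        return self._tag(data) + data

    def authenticate(self, packet):
        # Return the signed content, or None if the digest does not match.
        tag, data = packet[:self.tag_len], packet[self.tag_len:]
        if not hmac.compare_digest(tag, self._tag(data)):
            return None
        return data


def send_packet(message, encoder, auth, wire):
    """Sender: encode first, then sign, then put the packet on the wire."""
    wire.append(auth.sign(encoder.encode(message)))


def receive_packet(wire, encoder, auth):
    """Receiver: authenticate first (dropping bad packets), then decode."""
    data = auth.authenticate(wire.pop(0))
    return None if data is None else encoder.decode(data)


if __name__ == "__main__":
    enc, auth, wire = ZlibEncoder(), HmacAuthenticator(b"cluster secret"), []
    send_packet(b"node1: status=up", enc, auth, wire)
    print(receive_packet(wire, enc, auth))  # b'node1: status=up'

    packet = auth.sign(enc.encode(b"reset node2"))
    forged = packet[:-1] + bytes([packet[-1] ^ 0x01])  # flip one bit
    print(receive_packet([forged], enc, auth))  # None
```

Checking the signature before decoding means a receiver never decompresses a packet it has not authenticated; in this sketch, packets that fail the check are simply dropped.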
Several authentication-service plugins will likely be based on heartbeat's authentication code.

Marshalling/Demarshalling services

Packet Encoding (compression/encryption) services

Membership Services

Group Services

Cluster Management

Resource Management

Resource Monitoring

Application-Level APIs

GUI Configuration / Monitoring

Higher-Level cluster APIs

Several different higher-level cluster APIs will be available for cluster-aware applications to use. These will likely include:

* Ordered messaging - guarantees that every member of the cluster receives the messages in a message stream in the same order.
* Barrier services - guarantees that every member of the cluster has acknowledged arriving at a barrier before any are allowed to pass it.
* Transactions - guarantees that the entire cluster performs a transaction together or not at all. Variations include 2-phase commit transactions and n-phase transactions.
* RPC - client-level cluster RPC will also be provided.

Implementation Plan

The current thinking about the general plan of attack for completing the implementation of this framework is as follows:

1. Implement infrastructure components.
2. Convert STONITH modules to self-configuring plugins.
3. Implement the FailSafe (RHINO) interface to the self-configuration API.
4. Tune and adjust the infrastructure as a result of the experience gained above.
   This will act as a proof-of-concept for the framework infrastructure.
5. Define APIs for other components, and implement them, starting with communication and cluster membership.

Implementation Status

At this writing, the module-loading infrastructure and the XML encoding/decoding are well underway, and should be completed before the final copy of this paper is submitted. The design of the self-configuring object system is also underway; it should be completed and its implementation begun (hopefully with a lot more done as well) before the final copy of this paper is submitted.

Future Plans

Talk here about the plan to involve the community: briefly discuss the project plan, highlight the URL and the to-do list on the web, explain how it is open to everyone, and say that we hope to get lots of people participating in the project.

Conclusions

Acknowledgments

Colorado School of Mines, the XML students, and lots and lots of other folks... David Brower initially advocated RPC as a basic paradigm and helped clarify the issues surrounding basic communication and the RPC paradigm. Also thanks to Rusty of SGI.

Everything from here to the end is not right yet ;-) This is mostly text from my last year's ALS paper.

To Learn More

The Linux-HA web site can be found at [Rob01]. Heartbeat can be downloaded (in source or RPM format) from the Linux-HA web site download page at http://linux-ha.org/download/. Information on subscribing to the various Linux-HA mailing lists can be found on the contact page at http://linux-ha.org/contact/. The Linux FailSafe project is described in detail in [Vas00].

References

[Milz99] Milz, Harald: "The Linux High Availability HOWTO".
   http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
[Phi98] Pfister, Gregory F.: "In Search of Clusters", 2nd Edition. Prentice Hall PTR, 1998.
[Ball00] Ballinger, Rusty: "The RHINO GUI Infrastructure". http://oss.sgi.com/projects/rhino/
[Rob01] Robertson, A. L.: "The High-Availability Linux Project". http://linux-ha.org/
[Twe00] Tweedie, S. C.: "Barrier Operations". http://linux-ha.org/PhaseII/WhitePapers/sct/barrier.txt
[Vas00] Vasa, M.: "The Linux Fail Safe Project". http://oss.sgi.com/projects/failsafe/

Need to add references for XML-RPC, for OMS plugins, Kimberlite, Glib, etc.