; = Dc@s dZdklZdZdZdZdZdZdkl Z l Z dk Z dk Z dkZe i d e_d efd YZeieZd fd YZdefdYZdeefdYZdeefdYZdefdYZdZdZdee fdYZdefdYZdefdYZdefdYZd efd!YZ d"efd#YZ!d$efd%YZ"d&efd'YZ#e$d(jo-dk%Z%ee%i&i'Z(e(i)GHndS()s/Beautiful Soup Elixir and Tonic "The Screen-Scraper's Friend" v2.1.1 http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup parses arbitrarily invalid XML- or HTML-like substance into a tree representation. It provides methods and Pythonic idioms that make it easy to search and modify the tree. A well-formed XML/HTML document will yield a well-formed data structure. An ill-formed XML/HTML document will yield a correspondingly ill-formed data structure. If your document is only locally well-formed, you can use this library to find and process the well-formed part of it. The BeautifulSoup class has heuristics for obtaining a sensible parse tree in the face of common HTML errors. Beautiful Soup has no external dependencies. It works with Python 2.2 and up. Beautiful Soup defines classes for four different parsing strategies: * BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific language that kind of looks like XML. * BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid or invalid. * ICantBelieveItsBeautifulSoup, for parsing valid but bizarre HTML that trips up BeautifulSoup. * BeautifulSOAP, for making it easier to parse XML documents that use lots of subelements containing a single string, where you'd prefer they put that string into an attribute (such as SOAP messages). You can subclass BeautifulStoneSoup or BeautifulSoup to create a parsing strategy specific to an XML schema or a particular bizarre HTML document. Typically your subclass would just override SELF_CLOSING_TAGS and/or NESTABLE_TAGS. 
(s generatorss*Leonard Richardson (leonardr@segfault.org)s2.1.1s$Date: 2004/10/18 00:14:20 $s*Copyright (c) 2004-2005 Leonard RichardsonsPSF(s SGMLParsersSGMLParseErrorNs[a-zA-Z][-_.:a-zA-Z0-9]*sNullTypecBshtZdZdZdZdZdZdZdZdZ dZ d Z d Z RS( sSimilar to NoneType with a corresponding singleton instance 'Null' that, unlike None, accepts any message and returns itself. Examples: >>> Null("send", "a", "message")("and one more", ... "and what you get still") is Null True cCstSdS(N(sNull(scls((s./Weather/BeautifulSoup.pys__new__DscOstSdS(N(sNull(sselfsargsskwargs((s./Weather/BeautifulSoup.pys__call__EscCstSdS(N(sNull(sselfsattr((s./Weather/BeautifulSoup.pys __getattr__GscCstSdS(N(sNull(sselfsitem((s./Weather/BeautifulSoup.pys __getitem__HscCsdS(N((sselfsattrsvalue((s./Weather/BeautifulSoup.pys __setattr__IscCsdS(N((sselfsitemsvalue((s./Weather/BeautifulSoup.pys __setitem__JscCsdSdS(Ni((sself((s./Weather/BeautifulSoup.pys__len__KscCstgSdS(N(siter(sself((s./Weather/BeautifulSoup.pys__iter__NscCstSdS(N(sFalse(sselfsitem((s./Weather/BeautifulSoup.pys __contains__OscCsdSdS(NsNull((sself((s./Weather/BeautifulSoup.pys__repr__Ps( s__name__s __module__s__doc__s__new__s__call__s __getattr__s __getitem__s __setattr__s __setitem__s__len__s__iter__s __contains__s__repr__(((s./Weather/BeautifulSoup.pysNullType9s          s PageElementcBs@tZdZeedZehedZeZeheedZehedZ e Z eheedZ ehedZ eheedZ e ZehedZeZeheed Zehd ZeZehed Zd Zd ZdZdZdZdZdZdZRS(seContains the navigational information for some part of the page (either a tag or a piece of text)cCsk||_||_t|_t|_t|_|io |iio#|iid|_||i_ndS(sNSets up the initial relations between this element and other elements.iN(sparentsselfsprevioussNullsnextspreviousSiblings nextSiblingscontents(sselfsparentsprevious((s./Weather/BeautifulSoup.pyssetupWs     cCs|i|i|||SdS(sjReturns the first item that matches the given criteria and appears after this Tag in the document.N(sselfs_firsts 
fetchNextsnamesattrsstext(sselfsnamesattrsstext((s./Weather/BeautifulSoup.pysfindNextcscCs |i|||||iSdS(sdReturns all items that match the given criteria and appear before after Tag in the document.N(sselfs_fetchsnamesattrsstextslimits nextGenerator(sselfsnamesattrsstextslimit((s./Weather/BeautifulSoup.pys fetchNextiscCs|i|i|||SdS(s{Returns the closest sibling to this Tag that matches the given criteria and appears after this Tag in the document.N(sselfs_firstsfetchNextSiblingssnamesattrsstext(sselfsnamesattrsstext((s./Weather/BeautifulSoup.pysfindNextSiblingnscCs |i|||||iSdS(sqReturns the siblings of this Tag that match the given criteria and appear after this Tag in the document.N(sselfs_fetchsnamesattrsstextslimitsnextSiblingGenerator(sselfsnamesattrsstextslimit((s./Weather/BeautifulSoup.pysfetchNextSiblingstscCs|i|i|||SdS(skReturns the first item that matches the given criteria and appears before this Tag in the document.N(sselfs_firsts fetchPrevioussnamesattrsstext(sselfsnamesattrsstext((s./Weather/BeautifulSoup.pys findPreviousyscCs |i|||||iSdS(scReturns all items that match the given criteria and appear before this Tag in the document.N(sselfs_fetchsnamesattrsstextslimitspreviousGenerator(sselfsnamesattrsstextslimit((s./Weather/BeautifulSoup.pys fetchPrevious~scCs|i|i|||SdS(s|Returns the closest sibling to this Tag that matches the given criteria and appears before this Tag in the document.N(sselfs_firstsfetchPreviousSiblingssnamesattrsstext(sselfsnamesattrsstext((s./Weather/BeautifulSoup.pysfindPreviousSiblingscCs |i|||||iSdS(srReturns the siblings of this Tag that match the given criteria and appear before this Tag in the document.N(sselfs_fetchsnamesattrsstextslimitspreviousSiblingGenerator(sselfsnamesattrsstextslimit((s./Weather/BeautifulSoup.pysfetchPreviousSiblingsscCs8t}|i||d}|o|d}n|SdS(sOReturns the closest parent of this Tag that matches the given criteria.iiN(sNullsrsselfs fetchParentssnamesattrssl(sselfsnamesattrsslsr((s./Weather/BeautifulSoup.pys 
findParents cCs |i||t||iSdS(sFReturns the parents of this Tag that match the given criteria.N(sselfs_fetchsnamesattrssNoneslimitsparentGenerator(sselfsnamesattrsslimit((s./Weather/BeautifulSoup.pys fetchParentsscCs8t}||||d}|o|d}n|SdS(Nii(sNullsrsmethodsnamesattrsstextsl(sselfsmethodsnamesattrsstextslsr((s./Weather/BeautifulSoup.pys_firsts cCswt|d ohd|<}ng}|} x9to1y| i} Wntj oPnXt } t | t o| o| p|i| |oft} xH|iD]:\} }| i| }|i|| o t} PqqW| o | } qqq3n)|o!|i| |o | } q3n| o0|i| |ot||joPqjq6q6W|SdS(s8Iterates over a generator looking for things that match.sitemssclassN(shasattrsattrssresultss generatorsgsTruesnextsis StopIterationsNonesfounds isinstancesTagstextsnamesselfs_matchessmatchsitemssattrs matchAgainstsgetschecksFalsesappendslimitslen(sselfsnamesattrsstextslimits generatorsresultsschecks matchAgainstsmatchsattrsgsisfound((s./Weather/BeautifulSoup.pys_fetchs@    ccs&|}x|o|i}|Vq WdS(N(sselfsisnext(sselfsi((s./Weather/BeautifulSoup.pys nextGenerators  ccs&|}x|o|i}|Vq WdS(N(sselfsis nextSibling(sselfsi((s./Weather/BeautifulSoup.pysnextSiblingGenerators  ccs&|}x|o|i}|Vq WdS(N(sselfsisprevious(sselfsi((s./Weather/BeautifulSoup.pyspreviousGenerators  ccs&|}x|o|i}|Vq WdS(N(sselfsispreviousSibling(sselfsi((s./Weather/BeautifulSoup.pyspreviousSiblingGenerators  ccs&|}x|o|i}|Vq WdS(N(sselfsisparent(sselfsi((s./Weather/BeautifulSoup.pysparentGenerators  cCs,t|ot|t oDx9|D]1}t|to|i||ot Sq%q%Wt Snt |o||Snt|to |i }nt|t  ot|}nt|do|i|Snt|o||jSnt|do|i|Snt||jSdS(Nsmatchsitems(sisListschunks isinstancesTagstags NavigableTextsselfs_matchess howToMatchsTruesFalsescallablesnames basestringsstrshasattrssearchshas_key(sselfschunks howToMatchstag((s./Weather/BeautifulSoup.pys_matchess&#    (s__name__s __module__s__doc__sNullssetupsNonesfindNexts firstNexts fetchNextsfindNextSiblingsfirstNextSiblingsfetchNextSiblingss findPreviouss fetchPreviouss firstPrevioussfindPreviousSiblingsfirstPreviousSiblingsfetchPreviousSiblingss findParents firstParents fetchParentss_firsts_fetchs 
nextGeneratorsnextSiblingGeneratorspreviousGeneratorspreviousSiblingGeneratorsparentGenerators_matches(((s./Weather/BeautifulSoup.pys PageElementSs2    #     s NavigableTextcBstZdZRS(NcCs2|djo|Sntd|ii|fdS(s7For backwards compatibility, text.string gives you textsstrings!'%s' object has no attribute '%s'N(sattrsselfsAttributeErrors __class__s__name__(sselfsattr((s./Weather/BeautifulSoup.pys __getattr__ s (s__name__s __module__s __getattr__(((s./Weather/BeautifulSoup.pys NavigableTextssNavigableStringcBstZRS(N(s__name__s __module__(((s./Weather/BeautifulSoup.pysNavigableStringssNavigableUnicodeStringcBstZRS(N(s__name__s __module__(((s./Weather/BeautifulSoup.pysNavigableUnicodeStringssTagcBsLtZdZeeedZedZdZdZdZ dZ dZ dZ d Z d Zd Zd Zd ZdZdZeedZedZeedZedZeedZeheedZeZeheeedZeZdZdZdZ dZ!dZ"RS(s=Represents a found HTML tag with its attributes and contents.cCsO||_|tjo g}n||_g|_|i||t|_ dS(sBasic constructor.N( snamesselfsattrssNonescontentsssetupsparentsprevioussFalseshidden(sselfsnamesattrssparentsprevious((s./Weather/BeautifulSoup.pys__init__s     cCs|ii||SdS(sReturns the value of the 'key' attribute for the tag, or the value given for 'default' if it doesn't have that attribute.N(sselfs _getAttrMapsgetskeysdefault(sselfskeysdefault((s./Weather/BeautifulSoup.pysget$scCs|i|SdS(sqtag[key] returns the value of the 'key' attribute for the tag, and throws an exception if it's not there.N(sselfs _getAttrMapskey(sselfskey((s./Weather/BeautifulSoup.pys __getitem__*scCst|iSdS(s0Iterating over a tag iterates over its contents.N(sitersselfscontents(sself((s./Weather/BeautifulSoup.pys__iter__/scCst|iSdS(s:The length of a tag is the length of its list of contents.N(slensselfscontents(sself((s./Weather/BeautifulSoup.pys__len__3scCs||ijSdS(N(sxsselfscontents(sselfsx((s./Weather/BeautifulSoup.pys __contains__7scCstSdS(s-A tag is non-None even if it has no contents.N(sTrue(sself((s./Weather/BeautifulSoup.pys __nonzero__:scCs|i||i|s  cCshxa|iD]V}|d|jo|ii|n|i|ii|o|i|=q q WdS(s;Deleting 
tag[key] deletes all 'key' attributes for the tag.iN(sselfsattrssitemskeysremoves _getAttrMapsattrMapshas_key(sselfskeysitem((s./Weather/BeautifulSoup.pys __delitem__Ls  cOst|i||SdS(sCalling a tag like a function is the same as calling its fetch() method. Eg. tag('a') returns a list of all the A tags found within this tag.N(sapplysselfsfetchsargsskwargs(sselfsargsskwargs((s./Weather/BeautifulSoup.pys__call__WscCsst|djo|idt|djo|i|d Sn(|iddjo|i|SndS(NisTagis__i(slenstagsrfindsselfsfirstsfind(sselfstag((s./Weather/BeautifulSoup.pys __getattr__]s3cCst|d p^t|d pMt|d p<|i|ijp)|i|ijpt|t|jotSnxCtdt|iD])}|i||i|jotSqqWt SdS(sReturns true iff this tag has the same name, the same attributes, and the same contents (recursively) as the given tag. NOTE: right now this will return false if two tags have the same attributes in a different order. Should this be fixed?snamesattrsscontentsiN( shasattrsothersselfsnamesattrsslensFalsesrangescontentssisTrue(sselfsothersi((s./Weather/BeautifulSoup.pys__eq__csr cCs||j SdS(sZReturns true iff this tag is not identical to the other tag, as defined in __eq__.N(sselfsother(sselfsother((s./Weather/BeautifulSoup.pys__ne__pscCst|SdS(sRenders this tag as a string.N(sstrsself(sself((s./Weather/BeautifulSoup.pys__repr__uscCs|idSdS(Ni(sselfs__str__(sself((s./Weather/BeautifulSoup.pys __unicode__yscCsg} |io5x2|iD]#\} }| id| |fqWnd} d}|io d} nd|i}t } |t jo#|} |i o| d7} qn|i | d|} |odd|}n|i o | }ng}d}| oddi| }n|o|i|n|id |i|| f|i| |o |t jo|i|n|i|di|}t|tij}|o| ot|}n%|o |tjot|}n|Sd S( sReturns a string or Unicode representation of this tag and its contents. 
NOTE: since Python's HTML parser consumes whitespace, this method is not certain to reproduce the whitespace present in the original string.s%s="%s"ss /sis needUnicodes %ss s<%s%s%s>N(sattrssselfskeysvalsappendsclosescloseTags isSelfClosingsnamesNonesindentIncrementsshowStructureIndentshiddensrenderContentss needUnicodescontentssspacesssattributeStringsjoinstypestypess UnicodeTypes isUnicodesunicodesFalsesstr(sselfs needUnicodesshowStructureIndentsvalsspacesattributeStringscloseTagsss isUnicodesindentIncrementsattrsskeysclosescontents((s./Weather/BeautifulSoup.pys__str__|sN           cCs|i|dtSdS(NsshowStructureIndent(sselfs__str__s needUnicodesTrue(sselfs needUnicode((s./Weather/BeautifulSoup.pysprettifyscCsg}x|D]}t}t|tpt|ti jot |}nQt|t o|i |i ||n$|ot |}n t|}|oA|tjo#|ddjo|d }qn|i |q q Wdi|SdS(sIRenders the contents of this tag as a (possibly Unicode) string.is sN(sssselfscsNonestexts isinstancesNavigableUnicodeStringstypestypess UnicodeTypesunicodesTagsappends__str__s needUnicodesshowStructureIndentsstrsjoin(sselfsshowStructureIndents needUnicodescstextss((s./Weather/BeautifulSoup.pysrenderContentss$&  cCs|id|d|SdS(sConvenience method to retrieve the first piece of text matching the given criteria. 'text' can be a string, a regular expression object, a callable that takes a string and returns whether or not the string 'matches', etc.s recursivestextN(sselfsfirsts recursivestext(sselfstexts recursive((s./Weather/BeautifulSoup.pys firstTextscCs |id|d|d|SdS(sConvenience method to retrieve all pieces of text matching the given criteria. 
'text' can be a string, a regular expression object, a callable that takes a string and returns whether or not the string 'matches', etc.s recursivestextslimitN(sselfsfetchs recursivestextslimit(sselfstexts recursiveslimit((s./Weather/BeautifulSoup.pys fetchTextscCs>t}|i||||d}|o|d}n|SdS(sLReturn only the first child of this Tag matching the given criteria.iiN( sNullsrsselfsfetchsnamesattrss recursivestextsl(sselfsnamesattrss recursivestextslsr((s./Weather/BeautifulSoup.pysfirsts cCs;|i}| o |i}n|i|||||SdS(sExtracts a list of Tag objects that match the given criteria. You can specify the name of the Tag and any attributes you want the Tag to have. The value of a key-value pair in the 'attrs' map can be a string, a list of strings, a regular expression object, or a callable that takes a string and returns whether or not the string matches for some custom definition of 'matches'. The same is true of the tag name.N( sselfsrecursiveChildGenerators generators recursiveschildGenerators_fetchsnamesattrsstextslimit(sselfsnamesattrss recursivestextslimits generator((s./Weather/BeautifulSoup.pysfetchs   cCs|itijSdS(sReturns true iff this is a self-closing tag as defined in the HTML standard. TODO: This is specific to BeautifulSoup and its subclasses, but it's used by __str__N(sselfsnames BeautifulSoupsSELF_CLOSING_TAGS(sself((s./Weather/BeautifulSoup.pys isSelfClosingscCs|ii|dS(s2Appends the given tag to the contents of this tag.N(sselfscontentssappendstag(sselfstag((s./Weather/BeautifulSoup.pysappendscCsPt|d o4h|_x(|iD]\}}||i|d?fei d@dAfei dBdCfgZ dDZ e e e dEZdFZdGZdHZdIZdJZdKZdLZe dMZdNZdOdPZdQZdRZdSZdTZdUZdVZdWZdXZ RS(YsdThis class contains the basic parser and fetch code. It defines a parser that knows nothing about tag behavior except for the following: You can't close a tag without closing all the tags it encloses. That is, "" actually means "". [Another possible explanation is "", but since this class defines no SELF_CLOSING_TAGS, it will never use that explanation.] 
This class is useful for parsing XML or made-up markup languages, or when BeautifulSoup makes an assumption counter to what you were expecting.ss€ss ss‚ssƒss„ss…ss†ss‡ss⁁ss%ssŠss<ssŒss?ssZssss‘ss’ss“ss”ss•ss–ss—ss˜ss™ssšss>ssœssszssŸs (<[^<>]*)/>cCs|}|iddS(Nis />(sxsgroup(s.0sx((s./Weather/BeautifulSoup.pysrss]*)>cCs|}d|iddS(Ns(sxsgroup(s.0sx((s./Weather/BeautifulSoup.pystss([-])cCs|}tii|idS(Ni(sxsBeautifulStoneSoupsMS_CHARSsgetsgroup(s.0sx((s./Weather/BeautifulSoup.pysvss [document]cCsti||i|o t| o |i}n||_ti|g|_d|_ |i t |do|i }n|o|i|n|o|indS(sInitialize this as the 'root tag' and feed in any text to the parser. NOTE about avoidParserProblems: sgmllib will process most bad HTML, and BeautifulSoup has tricks for dealing with some HTML that kills sgmllib, but Beautiful Soup can nonetheless choke or lose data if your data uses self-closing tags or declarations incorrectly. By default, Beautiful Soup sanitizes its input to avoid the vast majority of these problems. The problems are relatively rare, even in bad HTML, so feel free to pass in False to avoidParserProblems if they don't apply to you, and you'll get better performance. The only reason I have this turned on by default is so I don't get so many tech support questions. The two most common instances of invalid HTML that will choke sgmllib are fixed by the default parser massage techniques:
(No space between name of closing tag and tag close) (Extraneous whitespace in declaration) You can pass in a custom list of (RE object, replace method) tuples to get Beautiful Soup to scrub your input the way you want.isreadN(sTags__init__sselfs ROOT_TAG_NAMEsavoidParserProblemssisListsPARSER_MASSAGEs SGMLParsers quoteStackshiddensresetshasattrstextsreadsfeedsinitialTextIsEverythingsdone(sselfstextsavoidParserProblemssinitialTextIsEverything((s./Weather/BeautifulSoup.pys__init__{s      cCs|iddjp)|iddjp|iddjoti||Sn1|iddjoti||SntdS(sThis method routes method call requests to either the SGMLParser superclass or the Tag superclass, depending on the method name.sstart_isend_sdo_s__N(s methodNamesfinds SGMLParsers __getattr__sselfsTagsAttributeError(sselfs methodName((s./Weather/BeautifulSoup.pys __getattr__s BcCsN|io0x-|iD]\}}|i||}qWnti||dS(N(sselfsavoidParserProblemssfixsmssubstexts SGMLParsersfeed(sselfstextsmsfix((s./Weather/BeautifulSoup.pysfeeds   cCs6|ix%|ii|ijo|iq WdS(s^Called when you're done parsing, so that the unclosed tags can be correctly processed.N(sselfsendDatas currentTagsnames ROOT_TAG_NAMEspopTag(sself((s./Weather/BeautifulSoup.pysdones  cCs9ti|g|_t|_g|_|i|dS(N(s SGMLParsersresetsselfs currentDatasNones currentTagstagStackspushTag(sself((s./Weather/BeautifulSoup.pysresets     cCs|ii}t|iidjot|iidto|iid|i_ n|io|id|_n|iSdS(Niii( sselfstagStackspopstagslens currentTagscontentss isinstances NavigableTextsstring(sselfstag((s./Weather/BeautifulSoup.pyspopTags 3 cCsB|io|ii|n|ii||id|_dS(Ni(sselfs currentTagsappendstagstagStack(sselfstag((s./Weather/BeautifulSoup.pyspushTags cCsdi|i}|o|i o!d|jo d}qHd}nt}t|tijo t }n||}|i |i |i |i o||i _n||_ |i ii|ng|_dS(Nss s (sjoinsselfs currentDatasstripsNavigableStringscstypestypess UnicodeTypesNavigableUnicodeStringsossetups currentTagsprevioussnextscontentssappend(sselfscs currentDataso((s./Weather/BeautifulSoup.pysendDatas        cCs||ijodSnd}t}xVtt|idddD]5}||i|ijot|i|}PqDqDW| o|d}nx#td|D]}|i }qW|SdS(sPops the tag stack 
up to and including the most recent instance of the given tag. If inclusivePop is false, pops the tag stack up to but *not* including the most recent instqance of the given tag.Niii( snamesselfs ROOT_TAG_NAMEsnumPopssNones mostRecentTagsrangeslenstagStacksis inclusivePopspopTag(sselfsnames inclusivePopsnumPopssis mostRecentTag((s./Weather/BeautifulSoup.pys _popToTags   c Cs!|ii|}|tj}|ii|}t}t }xt t|idddD]}|i|}| p |i|jo| o |}Pn|tjo |i|jp'|tjo|o|ii|io|i}t}Pn|i}q\W|o|i||ndS(s We need to pop up to the previous tag of this type, unless one of this tag's nesting reset triggers comes between this tag and the previous tag of this type, OR unless this tag is a generic nesting trigger and another generic nesting trigger comes between this tag and the previous tag of this type. Examples:

FooBar

should pop to 'p', not 'b'.

FooBar

should pop to 'table', not 'p'.

Foo

Bar

should pop to 'tr', not 'p'.

FooBar

should pop to 'p', not 'b'.

    • *
    • * should pop to 'ul', not the first 'li'.
  • ** should pop to 'table', not the first 'tr' tag should implicitly close the previous tag within the same
    ** should pop to 'tr', not the first 'td' iiiN(sselfs NESTABLE_TAGSsgetsnamesnestingResetTriggerssNones isNestablesRESET_NESTING_TAGSshas_keysisResetNestingspopTosTrues inclusivesrangeslenstagStacksispsFalsesparents _popToTag( sselfsnamesps inclusivesis isNestablespopTosnestingResetTriggerssisResetNesting((s./Weather/BeautifulSoup.pys _smartPops&    G  icCs|io:ditd|}|id||fdSn|i||ij o| o|i |nt |||i |i }|i o||i _n||_ |i||p ||ijo|in||ijo|ii|d|_ndS(NscCs|\}}d||fS(Ns %s="%s"(sxsy(s.0sxsy((s./Weather/BeautifulSoup.pys7ss<%s%s>i(sselfs quoteStacksjoinsmapsattrss handle_datasnamesendDatasSELF_CLOSING_TAGSs selfClosings _smartPopsTags currentTagspreviousstagsnextspushTagspopTags QUOTE_TAGSsappendsliteral(sselfsnamesattrss selfClosingstag((s./Weather/BeautifulSoup.pysunknown_starttag2s"     cCs|io|id|jo|id|dSn|i|i||io|id|jo)|iit|idj|_ndS(Nisi( sselfs quoteStacksnames handle_datasendDatas _popToTagspopslensliteral(sselfsname((s./Weather/BeautifulSoup.pysunknown_endtagIs   cCs|ii|dS(N(sselfs currentDatasappendsdata(sselfsdata((s./Weather/BeautifulSoup.pys handle_dataUscCs|id|dS(s0Propagate processing instructions right through.sN(sselfs handle_datastext(sselfstext((s./Weather/BeautifulSoup.pys handle_piXscCs|id|dS(s!Propagate comments right through.s N(sselfs handle_datastext(sselfstext((s./Weather/BeautifulSoup.pyshandle_comment\scCs|id|dS(s"Propagate char refs right through.s&#%s;N(sselfs handle_datasref(sselfsref((s./Weather/BeautifulSoup.pyshandle_charref`scCs|id|dS(s$Propagate entity refs right through.s&%s;N(sselfs handle_datasref(sselfsref((s./Weather/BeautifulSoup.pyshandle_entityrefdscCs|id|dS(s.Propagate DOCTYPEs and the like right through.sN(sselfs handle_datasdata(sselfsdata((s./Weather/BeautifulSoup.pys handle_declhscCst}|i||d!djo^|iid|}|djot|i}n|i|i|d|!|d}nWyt i ||}Wn=t j o1|i|}|i||t|}nX|SdS(s^Treat a bogus SGML declaration as raw data. 
Treat a CDATA declaration as regular data.i s iiN( sNonesjsselfsrawdatasisfindskslens handle_datas SGMLParsersparse_declarationsSGMLParseErrorstoHandle(sselfsisksjstoHandle((s./Weather/BeautifulSoup.pysparse_declarationls   (!s__name__s __module__s__doc__sSELF_CLOSING_TAGSs NESTABLE_TAGSsRESET_NESTING_TAGSs QUOTE_TAGSsMS_CHARSsrescompilesPARSER_MASSAGEs ROOT_TAG_NAMEsNonesTrues__init__s __getattr__sfeedsdonesresetspopTagspushTagsendDatas _popToTags _smartPopsunknown_starttagsunknown_endtags handle_datas handle_pishandle_commentshandle_charrefshandle_entityrefs handle_declsparse_declaration(((s./Weather/BeautifulSoup.pysBeautifulStoneSoup7s8 '!E+        0       s BeautifulSoupc BsCtZdZeeddddddddd g Zhd e tag should implicitly close the previous

    tag.

    Para1

    Para2 should be transformed into:

    Para1

    Para2 Some tags can be nested arbitrarily. For instance, the occurrence of a

    tag should _not_ implicitly close the previous
    tag. Alice said:
    Bob said:
    Blah should NOT be transformed into: Alice said:
    Bob said:
    Blah Some tags can be nested, but the nesting is reset by the interposition of other tags. For instance, a
    , but not close a tag in another table.
    BlahBlah should be transformed into:
    BlahBlah but, Blah
    Blah should NOT be transformed into Blah
    Blah Differing assumptions about tag nesting rules are a major source of problems with the BeautifulSoup class. If BeautifulSoup is not treating as nestable a tag your page author treats as nestable, try ICantBelieveItsBeautifulSoup before writing your own subclass.sbrshrsinputsimgsmetasspacerslinksframesbasesscriptsspansfontsqsobjectsbdossubssupscenters blockquotesdivsfieldsetsinssdelsolsulslisdlsddsdtstablestrstbodystfootstheadstdsthsaddresssformspspresnoscript(s__name__s __module__s__doc__s buildTagMapsNonesSELF_CLOSING_TAGSs QUOTE_TAGSsNESTABLE_INLINE_TAGSsNESTABLE_BLOCK_TAGSsNESTABLE_LIST_TAGSsNESTABLE_TABLE_TAGSsNON_NESTABLE_BLOCK_TAGSsRESET_NESTING_TAGSs NESTABLE_TAGS(((s./Weather/BeautifulSoup.pys BeautifulSoups .*H<   sICantBelieveItsBeautifulSoupcBshtZdZddddddddd d d d d ddddgZdgZegeieeZRS(sThe BeautifulSoup class is oriented towards skipping over common HTML errors like unclosed tags. However, sometimes it makes errors of its own. For instance, consider this fragment: FooBar This is perfectly valid (if bizarre) HTML. However, the BeautifulSoup class will implicitly close the first b tag when it encounters the second 'b'. It will think the author wrote "FooBar", and didn't close the first 'b' tag, because there's no real-world reason to bold something that's already bold. When it encounters '' it will close two more 'b' tags, for a grand total of three tags closed instead of two. This can throw off the rest of your document structure. The same is true of a number of other tags, listed below. It's much more common for someone to forget to close (eg.) a 'b' tag than to actually use nested 'b' tags, and the BeautifulSoup class handles the common case. This class handles the not-co-common case: where you can't believe someone wrote what they did, but it's valid HTML and BeautifulSoup screwed up by assuming it wouldn't be. 
If this doesn't do what you need, try subclassing this class or BeautifulSoup, and providing your own list of NESTABLE_TAGS.semsbigsissmallsttsabbrsacronymsstrongscitescodesdfnskbdssampsvarsbsnoscript(s__name__s __module__s__doc__s*I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGSs)I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGSs buildTagMaps BeautifulSoups NESTABLE_TAGS(((s./Weather/BeautifulSoup.pysICantBelieveItsBeautifulSoups 9  s BeautifulSOAPcBstZdZdZRS(sThis class will push a tag with only a single string child into the tag's parent as an attribute. The attribute's name is the tag name, and the value is the string child. An example should give the flavor of the change: baz => baz You can then access fooTag['bar'] instead of fooTag.barTag.string. This is, of course, useful for scraping structures that tend to use subelements instead of attributes, such as SOAP messages. Note that it modifies its input, so don't print the modified version out. I'm not sure how many people really want to use this class; let me know if you do. Mainly I like the name.cCst|idjo|id}|id}|it|toAt|idjo+t|idt o|i i |i  o|id||i