Source for file XPath.class.php
* +======================================================================================================+
* | A php class for searching an XML document using XPath, and making modifications using a DOM
* | style API. Does not require the DOM XML PHP library.
* +======================================================================================================+
* | - "What SQL is for a relational database, XPath is for an XML document." -- Sam Blum
* | - "The primary purpose of XPath is to address parts of an XML document. In support of this
* | primary purpose, it also provides basic facilities for manipulating it." -- W3C
* | XPath in action and a very nice intro is under:
* | http://www.zvon.org/xxl/XPathTutorial/General/examples.html
* | Specs can be found under:
* | http://www.w3.org/TR/xpath W3C XPath Recommendation
* | http://www.w3.org/TR/xpath20 W3C XPath 2.0 Recommendation
* | NOTE: Most of the XPath spec has been implemented, but not all. Usually this should not be a
* | problem, as the missing parts are either rarely used or simpler to do with PHP itself.
* +------------------------------------------------------------------------------------------------------+
* | Requires PHP version 4.0.5 and up
* +------------------------------------------------------------------------------------------------------+
* | Nigel Swinson <nigelswinson@users.sourceforge.net>
* | Started around 2001-07, saved phpxml from near death and renamed to Php.XPath
* | Restructured XPath code to stay in line with XPath spec.
* | Sam Blum <bs_php@infeer.com>
* | Started around 2001-09 1st major restruct (V2.0) and testbench initiator.
* | 2nd (V3.0) major rewrite in 2002-02
* | Daniel Allen <bigredlinux@yahoo.com>
* | Started around 2001-10 working to make Php.XPath adhere to specs
* | Main Former Author: Michael P. Mehl <mpm@phpxml.org>
* | Initial creator of V 1.0. Stopped activities around 2001-03
* +------------------------------------------------------------------------------------------------------+
* | The class is split into 3 main objects. To keep usability easy all 3
* | objects are in this file (but may be split into 3 files in the future).
* |   +-------------+
* |   |  XPathBase  | XPathBase holds general and debugging functions.
* |   +------+------+
* |          v
* |   +-------------+ XPathEngine is the implementation of the W3C XPath spec. It contains the
* |   | XPathEngine | XML-import (parser), -export and can handle xPathQueries. It's a fully
* |   +------+------+ functional class but has no functions to modify the XML-document (see following).
* |          v
* |   +-------------+
* |   |    XPath    | XPath extends the functionality with actions to modify the XML-document.
* |   +-------------+ We tried to implement a DOM-like interface.
* +------------------------------------------------------------------------------------------------------+
* | Scroll to the end of this PHP file and you will find a short code sample to get you started
* +------------------------------------------------------------------------------------------------------+
* | To understand how to use the functions and to pass the right parameters, read the following:
* | Document: (full node tree, XML-tree)
* | After an XML source has been imported and parsed, it's stored as a tree of nodes, sometimes
* | referred to as the 'document'.
* | AbsoluteXPath: (xPath, xPathSet)
* | An absolute XPath is a string. It 'points' to *one* node in the XML-document. We use the
* | term 'absolute' to emphasise that it is not an xPath-query (see XPathQuery). A valid xPath
* | has a form like '/AAA[1]/BBB[2]/CCC[1]'. Usually functions that require a node (see Node)
* | will also accept an abs. XPath.
* | Node: (node, nodeSet, node-tree)
* | Some functions require or return a node (or a whole node-tree). Nodes are only used with the
* | XPath-interface and have an internal structure. Every node in an XML document has a unique
* | corresponding abs. xPath. That's why public functions that accept a node will usually also
* | accept an abs. xPath (a string) 'pointing' to an existing node (see AbsoluteXPath).
* | XPathQuery: (xquery, query)
* | An xPath-query is a string that is matched against the XML-document. The result of the match
* | is an xPathSet (vector of xPaths). It's always possible to pass a single absoluteXPath
* | instead of an xPath-query. A valid xPathQuery could look like this:
* | '//XXX/*[contains(., "foo")]/..' (See the link in 'What Is XPath' to learn more).
* +------------------------------------------------------------------------------------------------------+
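* | A minimal sketch of the difference between the two string kinds described above. The
* | function name isAbsoluteXPath() is hypothetical and not part of Php.XPath; it only checks
* | for the '/AAA[1]/BBB[2]' shape (every step a tag name plus a one-based index) and assumes
* | plain element names. The library itself may accept or normalise other forms.

```php
<?php
// Hypothetical helper (not part of Php.XPath): TRUE if $path has the shape of an
// absolute XPath such as '/AAA[1]/BBB[2]', i.e. every step is 'name[index]'.
function isAbsoluteXPath(string $path): bool
{
    return preg_match('#^(?:/[A-Za-z_][\w.\-:]*\[[1-9]\d*\])+$#', $path) === 1;
}

var_dump(isAbsoluteXPath('/AAA[1]/BBB[2]/CCC[1]'));          // points at one node
var_dump(isAbsoluteXPath('//XXX/*[contains(., "foo")]/..')); // a query, not an absolute XPath
```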
* | A central role of the package is how the XML-data is stored. The whole data is in a node-tree.
* | A node can be seen as the equivalent of a tag in the XML source with some extra info.
* | For instance the following XML
* | <AAA foo="x">***<BBB/><CCC/>**<BBB/>*</AAA>
* | Would produce the following node-tree:
* |              'super-root'        <-- $nodeRoot (Very handy)
* |                    |
* |   'depth' 0      AAA[1]          <-- top node. The 'textParts' of this node would be
* |                /   |   \            'textParts' => array('***','','**','*')
* |   'depth' 1 BBB[1] CCC[1] BBB[2]    (NOTE: Is always size of child nodes+1)
* | The node itself is a structure designed mainly to be used in connection with the interface of PHP.XPath.
* | That means it's possible for functions to return a sub-node-tree that can be used as input of another
* | PHP.XPath function.
* | The main structure of a node is:
* | 'name' => '', # The tag name. E.g. In <FOO bar="aaa"/> it would be 'FOO'
* | 'attributes' => array(), # The attributes of the tag E.g. In <FOO bar="aaa"/> it would be array('bar'=>'aaa')
* | 'textParts' => array(), # Array of text parts surrounding the children E.g. <FOO>aa<A>bb<B/>cc</A>dd</FOO> -> array('aa','dd') for FOO and array('bb','cc') for A
* | 'childNodes' => array(), # Array of references (pointers) to child nodes.
* | For optimisation reasons some additional data is stored in the node too:
* | 'parentNode' => NULL # Reference (pointer) to the parent node (or NULL if it's 'super root')
* | 'depth' => 0, # The tag depth (or tree level) starting with the root tag at 0.
* | 'pos' => 0, # Is the zero-based position this node has in the parent's 'childNodes'-list.
* | 'contextPos' => 1, # Is the one-based position this node has when counting only sibling tags with the same name
* | 'xpath' => '' # Is the abs. XPath to this node.
* | 'generated_id'=> '' # The id returned for this node by generate-id() (attribute and text nodes not supported)
* | Every node in the tree has an absolute XPath, e.g. '/AAA[1]/BBB[2]'. The $nodeIndex is a hash array
* | of all the nodes in the node-tree. The key used is the absolute XPath (a string).
* +------------------------------------------------------------------------------------------------------+
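* | The sketch below illustrates how 'pos', 'contextPos' and 'xpath' relate for the children of
* | the <AAA> example above, and how an index keyed by absolute XPath could look. The helper
* | describeChildren() is hypothetical and is not the parser used by this class; the real nodes
* | also carry 'childNodes' and 'parentNode' references, which are left out here.

```php
<?php
// Hypothetical helper (not part of Php.XPath): derives 'pos', 'contextPos' and
// 'xpath' for the children of one node from their tag names, in document order.
function describeChildren(string $parentXPath, array $childNames): array
{
    $seen = [];      // tag name => how many siblings with that name were seen so far
    $children = [];
    foreach (array_values($childNames) as $pos => $name) {
        $seen[$name] = ($seen[$name] ?? 0) + 1;
        $children[] = [
            'name'       => $name,
            'pos'        => $pos,          // zero-based index in the parent's 'childNodes'
            'contextPos' => $seen[$name],  // one-based index among same-named siblings
            'xpath'      => $parentXPath . '/' . $name . '[' . $seen[$name] . ']',
        ];
    }
    return $children;
}

// <AAA foo="x">***<BBB/><CCC/>**<BBB/>*</AAA>
$children  = describeChildren('/AAA[1]', ['BBB', 'CCC', 'BBB']);
$textParts = ['***', '', '**', '*'];   // always one more entry than there are children

// A flat lookup keyed by absolute XPath, in the spirit of $nodeIndex.
$nodeIndex = [];
foreach ($children as $child) {
    $nodeIndex[$child['xpath']] = $child;
}
print_r(array_keys($nodeIndex));
```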
* | The contents of this file are subject to the Mozilla Public License Version 1.1 (the "License");
* | you may not use this file except in compliance with the License. You may obtain a copy of the
* | License at http://www.mozilla.org/MPL/
* | Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY
* | OF ANY KIND, either express or implied. See the License for the specific language governing
* | rights and limitations under the License.
* | The Original Code is <phpXML/>.
* | The Initial Developer of the Original Code is Michael P. Mehl. Portions created by Michael
* | P. Mehl are Copyright (C) 2001 Michael P. Mehl. All Rights Reserved.
* | Contributor(s): N.Swinson / S.Blum / D.Allen
* | Alternatively, the contents of this file may be used under the terms of either of the GNU
* | General Public License Version 2 or later (the "GPL"), or the GNU Lesser General Public
* | License Version 2.1 or later (the "LGPL"), in which case the provisions of the GPL or the
* | LGPL License are applicable instead of those above. If you wish to allow use of your version
* | of this file only under the terms of the GPL or the LGPL License and not to allow others to
* | use your version of this file under the MPL, indicate your decision by deleting the
* | provisions above and replace them with the notice and other provisions required by the
* | GPL or the LGPL License. If you do not delete the provisions above, a recipient may use
* | your version of this file under either the MPL, the GPL or the LGPL License.
* +======================================================================================================+
* @author S.Blum / N.Swinson / D.Allen / (P.Mehl)
* @link http://sourceforge.net/projects/phpxpath/
* @CVS $Id: XPath.class.php,v 1.9 2005/11/16 17:26:05 bigmichi1 Exp $
// Include guard, protects the file from being included twice
if (defined($ConstantName)) return;
define($ConstantName,1, TRUE);
/************************************************************************************************
* ===============================================================================================
* X P a t h B a s e - Class
* ===============================================================================================
************************************************************************************************/
// As debugging of the xml parse is spread across several functions, we need to make this a member.
// do we want to do profiling?
// Used to help navigate through the begin/end debug calls
//'_evaluatePrimaryExpr',
# $this->bDebugXmlParse = TRUE;
$this->properties['verboseLevel'] = 1; // 0=silent, 1 and above produce verbose output (an echo to screen).
if (!isSet($_ENV)) { // Note: $_ENV introduced in 4.1.0. In earlier versions, use $HTTP_ENV_VARS.
$_ENV = $GLOBALS['HTTP_ENV_VARS'];
// Windows 95/98 do not support file locking. Detect the OS (operating system) and set
// properties['OS_supports_flock'] to FALSE if Win 95/98 is detected.
// This will suppress the file locking error reported by Win 98 users when exportToFile() is called.
// May have to add more OSs to the list in future (Macs?).
// ### Note that it's only the FAT and NFS file systems that are really a problem. NTFS and
// the latest php libs do support flock()
$_ENV['OS'] = isSet($_ENV['OS']) ? $_ENV['OS'] : 'Unknown OS';
// should catch Mac OS X compatible environment
if (!empty($_SERVER['SERVER_SOFTWARE']) && preg_match('/Darwin/', $_SERVER['SERVER_SOFTWARE'])) {
$this->properties['OS_supports_flock'] = FALSE;
$this->properties['OS_supports_flock'] = TRUE;
* Resets the object so it's able to take a new XML string/file
* Constructing objects is slow. If you can, reuse ones that you have used already
* by using this reset() function.
//-----------------------------------------------------------------------------------------
// XPathBase ------ Helpers ------
//-----------------------------------------------------------------------------------------
* This method checks that the brackets in a term are balanced and correctly matched
* @param $term (string) String to check.
* @return (bool) TRUE: OK / FALSE: KO
$bracketMisscount = $bracketMissmatsh = FALSE;
for ($i = 0; $i < $leng; $i++) {
$stack[$brackets] = $term[$i];
$bracketMisscount = TRUE;
if ($stack[$brackets] != '(') {
$bracketMissmatsh = TRUE;
$bracketMisscount = TRUE;
if ($stack[$brackets] != '[') {
$bracketMissmatsh = TRUE;
// Check whether we had a valid number of brackets.
if ($brackets != 0) $bracketMisscount = TRUE;
if ($bracketMisscount || $bracketMissmatsh) {
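// A minimal, self-contained sketch of the bracket check described above, written for
// PHP 7+ rather than PHP 4. The name bracketsBalanced() is hypothetical; it uses a stack
// of open brackets in the same spirit as the fragment above, but it is not the class's
// own method. It assumes brackets inside quoted literals do not need special handling.

```php
<?php
// Hypothetical helper: TRUE if every '(' and '[' in $term is closed by the
// matching ')' or ']' in the right order, FALSE otherwise.
function bracketsBalanced(string $term): bool
{
    $pairs  = [')' => '(', ']' => '['];
    $stack  = [];                       // open brackets still waiting for their close
    $length = strlen($term);
    for ($i = 0; $i < $length; $i++) {
        $char = $term[$i];
        if ($char === '(' || $char === '[') {
            $stack[] = $char;
        } elseif (isset($pairs[$char])) {
            // array_pop() yields NULL on an empty stack, which never equals a bracket,
            // so both a miscount and a mismatch end up here.
            if (array_pop($stack) !== $pairs[$char]) {
                return false;
            }
        }
    }
    return count($stack) === 0;         // leftovers mean an unclosed bracket
}

var_dump(bracketsBalanced('/AAA[position()=1]/BBB'));
var_dump(bracketsBalanced('/AAA[position(]=1)'));
```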
* Looks for a string within another string -- BUT the search-string must be located *outside* of any brackets.
* This method looks for a string within another string. Brackets in the
* string the method is looking through will be respected, which means that
* only if the string the method is looking for is located outside of
* brackets, the search will be successful.
* @param $term (string) String in which the search shall take place.
* @param $expression (string) String to search for.
* @return (int) This method returns -1 if no string was found,
* otherwise the offset at which the string was found.
$bracketCounter = 0; // Record where we are in the brackets.
$exprLeng = strlen($expression);
for ($i = 0; $i < $leng; $i++) {
if ($char == '(' || $char == '[') {
elseif ($char == ')' || $char == ']') {
if ($bracketCounter == 0) {
// Check whether we can find the expression at this index.
if (substr($term, $i, $exprLeng) == $expression) return $i;
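// The search described above can be sketched as a single pass with a depth counter.
// This is an illustration under assumptions, not the class's method: the name
// searchOutsideBrackets() is hypothetical, the code targets PHP 7+, and it assumes the
// brackets in $term are already balanced.

```php
<?php
// Hypothetical helper: offset of the first occurrence of $expression in $term
// that lies outside all () and [] groups, or -1 if there is none.
function searchOutsideBrackets(string $term, string $expression): int
{
    $depth      = 0;                    // how many brackets currently surround us
    $length     = strlen($term);
    $exprLength = strlen($expression);
    for ($i = 0; $i < $length; $i++) {
        $char = $term[$i];
        if ($char === '(' || $char === '[') {
            $depth++;
        } elseif ($char === ')' || $char === ']') {
            $depth--;
        }
        if ($depth === 0 && substr($term, $i, $exprLength) === $expression) {
            return $i;
        }
    }
    return -1;
}

// The ' or ' inside the predicate is skipped; the one after it is found.
var_dump(searchOutsideBrackets('AAA[BBB or CCC] or DDD', ' or '));
```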
* Split a string by a separator-string -- BUT the separator-string must be located *outside* of any brackets.
* Returns an array of strings, each of which is a substring of string formed
* by splitting it on boundaries formed by the string separator.
* @param $separator (string) String that should be searched.
* @param $term (string) String in which the search shall take place.
* @return (array) see above
// Note that it doesn't make sense for $separator to itself contain (,),[ or ],
// but as this is a private function we should be ok.
$bracketCounter = 0; // Record where we are in the brackets.
// Check if any separator is in the term
$sepLeng = strlen($separator);
if (strpos($term, $separator) === FALSE) { // no separator found so end now
// Make a substitute separator out of 'unused chars'.
// Now determine the first bracket '(' or '['.
} elseif ($tmp2 === FALSE) {
$startAt = min($tmp1, $tmp2);
// Get prefix string part before the first bracket.
$preStr = substr($term, 0, $startAt);
// Substitute separator in prefix string.
$preStr = str_replace($separator, $substituteSep, $preStr);
// Now get the rest-string (postfix string)
$postStr = substr($term, $startAt);
// Go all the way through the rest-string.
for ($i = 0; $i < $strLeng; $i++) {
// Spot (,),[,] and modify our bracket counter. Note there is an
// assumption here that you don't have a string(with[mis)matched]brackets.
// This should be ok as the dodgy string will be detected elsewhere.
if ($char == '(' || $char == '[') {
elseif ($char == ')' || $char == ']') {
// If no brackets surround us check for separator
if ($bracketCounter == 0) {
// Check whether we can find the expression starting at this index.
if ((substr($postStr, $i, $sepLeng) == $separator)) {
// Substitute the found separator
for ($j = 0; $j < $sepLeng; $j++) {
$postStr[$i + $j] = $substituteSep[$j];
// Now explode using the substitute separator as key.
$resultArr = explode($substituteSep, $preStr . $postStr);
} while (FALSE); // End try block
// Return the results that we found. May be an array with 1 entry.
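// A sketch of a bracket-aware split. Note that it swaps in a simpler technique than the
// code above: instead of replacing top-level separators with a substitute separator and
// then calling explode(), it collects the pieces directly while tracking bracket depth.
// The name explodeOutsideBrackets() is hypothetical, the code targets PHP 7+, and it
// assumes balanced brackets and a separator that contains no brackets itself.

```php
<?php
// Hypothetical helper: like explode(), but separators inside () or [] are ignored.
function explodeOutsideBrackets(string $separator, string $term): array
{
    $sepLength = strlen($separator);
    if ($sepLength === 0) {
        return [$term];                 // nothing to split on
    }
    $parts   = [];
    $current = '';                      // the piece being collected
    $depth   = 0;
    $length  = strlen($term);
    for ($i = 0; $i < $length; ) {
        $char = $term[$i];
        if ($char === '(' || $char === '[') {
            $depth++;
        } elseif ($char === ')' || $char === ']') {
            $depth--;
        }
        if ($depth === 0 && substr($term, $i, $sepLength) === $separator) {
            $parts[] = $current;        // top-level separator: close the current piece
            $current = '';
            $i += $sepLength;
            continue;
        }
        $current .= $char;
        $i++;
    }
    $parts[] = $current;
    return $parts;
}

print_r(explodeOutsideBrackets('/', '/AAA[1]/BBB[CCC/DDD]/EEE'));
```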
* Split a string at its groups, i.e. bracketed expressions
* Returns an array of strings which, when concatenated together, would produce the original
* string. E.g. a(b)cde(f)(g) would map to:
* array ('a', '(b)', 'cde', '(f)', '(g)')
* @param $string (string) The string to process
* @param $open (string) The substring for the open of a group
* @param $close (string) The substring for the close of a group
* @return (array) The parsed string, see above
// Note that it doesn't make sense for $separator to itself contain (,),[ or ],
// but as this is a private function we should be ok.
// Check if we have both an open and a close tag
if (empty($open) and empty($close)) { // no separator found so end now
while (!empty($string)) {
// Now determine the first bracket '(' or '['.
$openPos = strpos($string, $open);
$closePos = strpos($string, $close);
if ($openPos === FALSE || $closePos === FALSE) {
// Oh, no more groups to be found then. Quit
if ($openPos > $closePos) {
// Malformed string, dump the rest and quit.
// Get prefix string part before the first bracket.
$preStr = substr($string, 0, $openPos);
// This is the first string that will go in our output
// Skip over what we've processed, including the open char
// Find the next open char and adjust our close char
//echo "close: $closePos\nopen: $openPos\n\n";
$closePos -= $openPos + 1;
$openPos = strpos($string, $open);
//echo "close: $closePos\nopen: $openPos\n\n";
// While we have found nesting...
while ($openPos && $closePos && ($closePos > $openPos)) {
// Find another close pos after the one we are looking at
$closePos = strpos($string, $close, $closePos + 1);
$openPos = strpos($string, $open, $openPos + 1);
//echo "close: $closePos\nopen: $openPos\n\n";
// If we now have a close pos, then it's the end of the group.
if ($closePos === FALSE) {
// We didn't... so bail dumping what was left
$resultArr[] = $open . $string;
// We did, so we can extract the group
$resultArr[] = $open . substr($string, 0, $closePos + 1);
// Skip what we have processed
$string = substr($string, $closePos + 1);
} while (FALSE); // End try block
// Return the results that we found. May be an array with 1 entry.
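// A sketch of the group split described above, using a depth counter instead of the
// strpos() bookkeeping in the original. The name splitGroups() is hypothetical and the
// code targets PHP 7+. It assumes single-character open and close markers; the original
// accepts substrings. Unbalanced input is passed through as a trailing piece.

```php
<?php
// Hypothetical helper: splits $string into top-level groups and the text between them.
// Concatenating the returned pieces gives back the original string.
function splitGroups(string $string, string $open = '(', string $close = ')'): array
{
    $result  = [];
    $current = '';
    $depth   = 0;
    $length  = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $char = $string[$i];
        if ($char === $open) {
            if ($depth === 0 && $current !== '') {
                $result[] = $current;   // flush the plain text before the group
                $current  = '';
            }
            $depth++;
            $current .= $char;
        } elseif ($char === $close && $depth > 0) {
            $depth--;
            $current .= $char;
            if ($depth === 0) {
                $result[] = $current;   // a complete top-level group
                $current  = '';
            }
        } else {
            $current .= $char;
        }
    }
    if ($current !== '') {
        $result[] = $current;           // trailing text or an unclosed group
    }
    return $result;
}

print_r(splitGroups('a(b)cde(f)(g)'));
```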
* Retrieves a substring before a delimiter.
* This method retrieves everything from a string before a given delimiter,
* not including the delimiter.
* @param $string (string) String, from which the substring should be extracted.
* @param $delimiter (string) String containing the delimiter to use.
* @return (string) Substring from the original string before the delimiter.
function _prestr(&$string, $delimiter, $offset=0) {
$offset = ($offset < 0) ? 0 : $offset;
$pos = strpos($string, $delimiter, $offset);
if ($pos === FALSE) return $string; else return substr($string, 0, $pos);
* Retrieves a substring after a delimiter.
* This method retrieves everything from a string after a given delimiter,
* not including the delimiter.
* @param $string (string) String, from which the substring should be extracted.
* @param $delimiter (string) String containing the delimiter to use.
* @return (string) Substring from the original string after the delimiter.
function _afterstr($string, $delimiter, $offset=0) {
$offset = ($offset < 0) ? 0 : $offset;
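// A standalone sketch of the two substring helpers documented above, for PHP 7+. The
// names beforeDelimiter() and afterDelimiter() are hypothetical. Returning the whole
// string when the delimiter is missing follows _prestr() above; returning an empty
// string in afterDelimiter() is an assumption of this sketch, since the body of
// _afterstr() is not shown here. A non-empty delimiter is assumed.

```php
<?php
// Hypothetical helper: everything before the first $delimiter found at or after $offset.
function beforeDelimiter(string $string, string $delimiter, int $offset = 0): string
{
    $pos = strpos($string, $delimiter, max(0, $offset));
    return $pos === false ? $string : substr($string, 0, $pos);
}

// Hypothetical helper: everything after the first $delimiter found at or after $offset.
function afterDelimiter(string $string, string $delimiter, int $offset = 0): string
{
    $pos = strpos($string, $delimiter, max(0, $offset));
    return $pos === false ? '' : substr($string, $pos + strlen($delimiter));
}

var_dump(beforeDelimiter('child::BBB', '::'));
var_dump(afterDelimiter('child::BBB', '::'));
```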
//-----------------------------------------------------------------------------------------
// XPathBase ------ Debug Stuff ------
//-----------------------------------------------------------------------------------------
* Alter the verbose (error) level reporting.
* Pass an int: >0 to turn on, 0 to turn off. The higher the number, the
* higher the level of verbosity. By default, the class has a verbose level of 1.
* @param $levelOfVerbosity (int) default is 1 = on
if ($levelOfVerbosity === TRUE) {
} elseif ($levelOfVerbosity === FALSE) {
$level = $levelOfVerbosity;
// Store the normalised level rather than the raw argument, so TRUE/FALSE end up as 1/0.
if ($level >= 0) $this->properties['verboseLevel'] = $level;
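// A small sketch of the level handling described above, as a pure function for PHP 7+.
// The name normalizeVerboseLevel() and the $currentLevel parameter are hypothetical; the
// class itself keeps the level in $this->properties['verboseLevel']. Using is_numeric()
// for the non-boolean case is an assumption of this sketch.

```php
<?php
// Hypothetical helper: maps TRUE/FALSE/numeric input to an integer verbose level.
// Input that cannot be interpreted (or is negative) leaves the current level unchanged.
function normalizeVerboseLevel($levelOfVerbosity, int $currentLevel = 1): int
{
    $level = -1;                                // -1 means "not understood"
    if ($levelOfVerbosity === true) {
        $level = 1;
    } elseif ($levelOfVerbosity === false) {
        $level = 0;
    } elseif (is_numeric($levelOfVerbosity)) {
        $level = (int) $levelOfVerbosity;
    }
    return $level >= 0 ? $level : $currentLevel;
}

var_dump(normalizeVerboseLevel(true));
var_dump(normalizeVerboseLevel(false));
var_dump(normalizeVerboseLevel(3));
```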
* Returns the last error message that occurred.
* @return string (may be empty if there was no error at all)
* @see _setLastError(), _lastError