PHP
extends Tokenizer
in package
Table of Contents
- $endScopeTokens : array<string|int, mixed>
- A list of tokens that end the scope.
- $ignoredLines : array<string|int, mixed>
- A list of lines being ignored due to error suppression comments.
- $knownLengths : array<int, int>
- Known lengths of tokens.
- $scopeOpeners : array<string|int, mixed>
- A list of tokens that are allowed to open a scope.
- $config : Config
- The config data for the run.
- $eolChar : string
- The EOL char used in the content.
- $numTokens : int
- The number of tokens in the tokens array.
- $tokens : array<string|int, mixed>
- A token-based representation of the content.
- $tstringContexts : array<string|int, mixed>
- Contexts in which keywords should always be tokenized as T_STRING.
- $resolveTokenCache : array<string|int, mixed>
- A cache of different token types, resolved into arrays.
- __construct() : void
- Initialise and run the tokenizer.
- getTokens() : array<string|int, mixed>
- Gets the array of tokens.
- replaceTabsInToken() : void
- Replaces tabs in original token content with spaces.
- resolveSimpleToken() : array<string|int, mixed>
- Converts simple tokens into a format that conforms to complex tokens produced by token_get_all().
- standardiseToken() : array<string|int, mixed>
- Takes a token produced from token_get_all() and produces a more uniform token.
- isMinifiedContent() : bool
- Checks the content to see if it looks minified.
- processAdditional() : void
- Performs additional processing after main tokenizing.
- tokenize() : array<string|int, mixed>
- Creates an array of tokens when given some PHP code.
- createAttributesNestingMap() : void
- Creates a map for the attributes tokens that surround other tokens.
- createLevelMap() : void
- Constructs the level map.
- createParenthesisNestingMap() : void
- Creates a map for the parenthesis tokens that surround other tokens.
- createPositionMap() : void
- Sets token position information.
- createScopeMap() : void
- Creates a scope map of tokens that open scopes.
- createTokenMap() : void
- Creates a map of brackets positions.
- findCloser() : int|null
- Finds a "closer" token (for example, a closing parenthesis or square bracket), handling parenthesis balancing while searching for the closing token.
- parsePhpAttribute() : array<string|int, mixed>|null
- PHP 8 attributes parser for PHP < 8. Handles single-line and multiline attributes.
- recurseScopeMap() : int
- Recurses through the scope openers to build a scope map.
Properties
$endScopeTokens
A list of tokens that end the scope.
public
array<string|int, mixed>
$endScopeTokens
= [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF, T_ENDFOR => T_ENDFOR, T_ENDFOREACH => T_ENDFOREACH, T_ENDWHILE => T_ENDWHILE, T_ENDSWITCH => T_ENDSWITCH, T_ENDDECLARE => T_ENDDECLARE, T_BREAK => T_BREAK, T_END_HEREDOC => T_END_HEREDOC, T_END_NOWDOC => T_END_NOWDOC]
This array is just a unique collection of the end tokens from the scopeOpeners array. The data is duplicated here to save time during parsing of the file.
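The relationship between the two arrays can be sketched as follows. This is an illustration only: the opener list is a trimmed, hypothetical stand-in, string names replace the real token constants so the snippet runs without PHP_CodeSniffer, and it does not claim to reproduce the exact contents of $endScopeTokens.

```php
<?php
// Hypothetical, trimmed-down stand-in for $scopeOpeners (only the 'end' lists).
$scopeOpeners = [
    'T_IF'    => ['end' => ['T_CLOSE_CURLY_BRACKET' => 'T_CLOSE_CURLY_BRACKET', 'T_ENDIF' => 'T_ENDIF']],
    'T_FOR'   => ['end' => ['T_CLOSE_CURLY_BRACKET' => 'T_CLOSE_CURLY_BRACKET', 'T_ENDFOR' => 'T_ENDFOR']],
    'T_WHILE' => ['end' => ['T_CLOSE_CURLY_BRACKET' => 'T_CLOSE_CURLY_BRACKET', 'T_ENDWHILE' => 'T_ENDWHILE']],
];

// Collect every end token once. The array union operator keeps the first
// occurrence of each key, so duplicates disappear.
$endScopeTokens = [];
foreach ($scopeOpeners as $opener) {
    $endScopeTokens += $opener['end'];
}

// One isset() now answers "does this token end a scope?" without walking
// the nested opener definitions.
$endsScope = isset($endScopeTokens['T_ENDFOR']);
```

Precomputing the flat lookup trades a little memory for a constant-time check on every token, which is the time saving the description refers to.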
$ignoredLines
A list of lines being ignored due to error suppression comments.
public
array<string|int, mixed>
$ignoredLines
= []
$knownLengths
Known lengths of tokens.
public
array<int, int>
$knownLengths
= [T_ABSTRACT => 8, T_AND_EQUAL => 2, T_ARRAY => 5, T_AS => 2, T_BOOLEAN_AND => 2, T_BOOLEAN_OR => 2, T_BREAK => 5, T_CALLABLE => 8, T_CASE => 4, T_CATCH => 5, T_CLASS => 5, T_CLASS_C => 9, T_CLONE => 5, T_CONCAT_EQUAL => 2, T_CONST => 5, T_CONTINUE => 8, T_CURLY_OPEN => 2, T_DEC => 2, T_DECLARE => 7, T_DEFAULT => 7, T_DIR => 7, T_DIV_EQUAL => 2, T_DO => 2, T_DOLLAR_OPEN_CURLY_BRACES => 2, T_DOUBLE_ARROW => 2, T_DOUBLE_COLON => 2, T_ECHO => 4, T_ELLIPSIS => 3, T_ELSE => 4, T_ELSEIF => 6, T_EMPTY => 5, T_ENDDECLARE => 10, T_ENDFOR => 6, T_ENDFOREACH => 10, T_ENDIF => 5, T_ENDSWITCH => 9, T_ENDWHILE => 8, T_EVAL => 4, T_EXTENDS => 7, T_FILE => 8, T_FINAL => 5, T_FINALLY => 7, T_FN => 2, T_FOR => 3, T_FOREACH => 7, T_FUNCTION => 8, T_FUNC_C => 12, T_GLOBAL => 6, T_GOTO => 4, T_HALT_COMPILER => 15, T_IF => 2, T_IMPLEMENTS => 10, T_INC => 2, T_INCLUDE => 7, T_INCLUDE_ONCE => 12, T_INSTANCEOF => 10, T_INSTEADOF => 9, T_INTERFACE => 9, T_ISSET => 5, T_IS_EQUAL => 2, T_IS_GREATER_OR_EQUAL => 2, T_IS_IDENTICAL => 3, T_IS_NOT_EQUAL => 2, T_IS_NOT_IDENTICAL => 3, T_IS_SMALLER_OR_EQUAL => 2, T_LINE => 8, T_LIST => 4, T_LOGICAL_AND => 3, T_LOGICAL_OR => 2, T_LOGICAL_XOR => 3, T_MATCH => 5, T_MATCH_ARROW => 2, T_MATCH_DEFAULT => 7, T_METHOD_C => 10, T_MINUS_EQUAL => 2, T_POW_EQUAL => 3, T_MOD_EQUAL => 2, T_MUL_EQUAL => 2, T_NAMESPACE => 9, T_NS_C => 13, T_NS_SEPARATOR => 1, T_NEW => 3, T_NULLSAFE_OBJECT_OPERATOR => 3, T_OBJECT_OPERATOR => 2, T_OPEN_TAG_WITH_ECHO => 3, T_OR_EQUAL => 2, T_PLUS_EQUAL => 2, T_PRINT => 5, T_PRIVATE => 7, T_PUBLIC => 6, T_PROTECTED => 9, T_REQUIRE => 7, T_REQUIRE_ONCE => 12, T_RETURN => 6, T_STATIC => 6, T_SWITCH => 6, T_THROW => 5, T_TRAIT => 5, T_TRAIT_C => 9, T_TRY => 3, T_UNSET => 5, T_USE => 3, T_VAR => 3, T_WHILE => 5, T_XOR_EQUAL => 2, T_YIELD => 5, T_OPEN_CURLY_BRACKET => 1, T_CLOSE_CURLY_BRACKET => 1, T_OPEN_SQUARE_BRACKET => 1, T_CLOSE_SQUARE_BRACKET => 1, T_OPEN_PARENTHESIS => 1, T_CLOSE_PARENTHESIS => 1, T_COLON => 1, T_STRING_CONCAT => 1, 
T_INLINE_THEN => 1, T_INLINE_ELSE => 1, T_NULLABLE => 1, T_NULL => 4, T_FALSE => 5, T_TRUE => 4, T_SEMICOLON => 1, T_EQUAL => 1, T_MULTIPLY => 1, T_DIVIDE => 1, T_PLUS => 1, T_MINUS => 1, T_MODULUS => 1, T_POW => 2, T_SPACESHIP => 3, T_COALESCE => 2, T_COALESCE_EQUAL => 3, T_BITWISE_AND => 1, T_BITWISE_OR => 1, T_BITWISE_XOR => 1, T_SL => 2, T_SR => 2, T_SL_EQUAL => 3, T_SR_EQUAL => 3, T_GREATER_THAN => 1, T_LESS_THAN => 1, T_BOOLEAN_NOT => 1, T_SELF => 4, T_PARENT => 6, T_COMMA => 1, T_THIS => 4, T_CLOSURE => 8, T_BACKTICK => 1, T_OPEN_SHORT_ARRAY => 1, T_CLOSE_SHORT_ARRAY => 1, T_TYPE_UNION => 1]
$scopeOpeners
A list of tokens that are allowed to open a scope.
public
array<string|int, mixed>
$scopeOpeners
= [T_IF => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF, T_ELSE => T_ELSE, T_ELSEIF => T_ELSEIF], 'strict' => false, 'shared' => false, 'with' => [T_ELSE => T_ELSE, T_ELSEIF => T_ELSEIF]], T_TRY => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_CATCH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_FINALLY => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_ELSE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF], 'strict' => false, 'shared' => false, 'with' => [T_IF => T_IF, T_ELSEIF => T_ELSEIF]], T_ELSEIF => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDIF => T_ENDIF, T_ELSE => T_ELSE, T_ELSEIF => T_ELSEIF], 'strict' => false, 'shared' => false, 'with' => [T_IF => T_IF, T_ELSE => T_ELSE]], T_FOR => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDFOR => T_ENDFOR], 'strict' => false, 'shared' => false, 'with' => []], T_FOREACH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDFOREACH => T_ENDFOREACH], 'strict' => false, 'shared' => false, 'with' => []], T_INTERFACE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' 
=> true, 'shared' => false, 'with' => []], T_FUNCTION => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_CLASS => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_TRAIT => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_USE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => false, 'shared' => false, 'with' => []], T_DECLARE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDDECLARE => T_ENDDECLARE], 'strict' => false, 'shared' => false, 'with' => []], T_NAMESPACE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => false, 'shared' => false, 'with' => []], T_WHILE => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDWHILE => T_ENDWHILE], 'strict' => false, 'shared' => false, 'with' => []], T_DO => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_SWITCH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET, T_COLON => T_COLON], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET, T_ENDSWITCH => T_ENDSWITCH], 'strict' => true, 'shared' => false, 'with' => []], T_CASE => ['start' => [T_COLON => T_COLON, T_SEMICOLON => T_SEMICOLON], 'end' => [T_BREAK => T_BREAK, T_RETURN => T_RETURN, T_CONTINUE => T_CONTINUE, T_THROW => 
T_THROW, T_EXIT => T_EXIT], 'strict' => true, 'shared' => true, 'with' => [T_DEFAULT => T_DEFAULT, T_CASE => T_CASE, T_SWITCH => T_SWITCH]], T_DEFAULT => ['start' => [T_COLON => T_COLON, T_SEMICOLON => T_SEMICOLON], 'end' => [T_BREAK => T_BREAK, T_RETURN => T_RETURN, T_CONTINUE => T_CONTINUE, T_THROW => T_THROW, T_EXIT => T_EXIT], 'strict' => true, 'shared' => true, 'with' => [T_CASE => T_CASE, T_SWITCH => T_SWITCH]], T_MATCH => ['start' => [T_OPEN_CURLY_BRACKET => T_OPEN_CURLY_BRACKET], 'end' => [T_CLOSE_CURLY_BRACKET => T_CLOSE_CURLY_BRACKET], 'strict' => true, 'shared' => false, 'with' => []], T_START_HEREDOC => ['start' => [T_START_HEREDOC => T_START_HEREDOC], 'end' => [T_END_HEREDOC => T_END_HEREDOC], 'strict' => true, 'shared' => false, 'with' => []], T_START_NOWDOC => ['start' => [T_START_NOWDOC => T_START_NOWDOC], 'end' => [T_END_NOWDOC => T_END_NOWDOC], 'strict' => true, 'shared' => false, 'with' => []]]
This array also contains information about which tokens the scope opener uses to open and close the scope, whether the token strictly requires an opener, whether the token can share a scope closer, and which tokens it can share it with. An example of a token that shares a scope closer is a CASE scope.
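As a rough illustration of how such an entry might be queried, the sketch below copies the shape of two entries from the default value above. String names stand in for the token constants, and both helper functions are hypothetical; they are not part of the class.

```php
<?php
// Two entries in the documented shape, with strings in place of token constants.
$scopeOpeners = [
    'T_TRY' => [
        'start'  => ['T_OPEN_CURLY_BRACKET' => 'T_OPEN_CURLY_BRACKET'],
        'end'    => ['T_CLOSE_CURLY_BRACKET' => 'T_CLOSE_CURLY_BRACKET'],
        'strict' => true,
        'shared' => false,
        'with'   => [],
    ],
    'T_CASE' => [
        'start'  => ['T_COLON' => 'T_COLON', 'T_SEMICOLON' => 'T_SEMICOLON'],
        'end'    => ['T_BREAK' => 'T_BREAK', 'T_RETURN' => 'T_RETURN'],
        'strict' => true,
        'shared' => true,
        'with'   => ['T_DEFAULT' => 'T_DEFAULT', 'T_CASE' => 'T_CASE', 'T_SWITCH' => 'T_SWITCH'],
    ],
];

// Hypothetical helper: may $candidate open a scope owned by $owner?
function canOpenScope(array $scopeOpeners, string $owner, string $candidate): bool
{
    return isset($scopeOpeners[$owner]['start'][$candidate]);
}

// Hypothetical helper: may $owner share its scope closer with $other?
function sharesCloserWith(array $scopeOpeners, string $owner, string $other): bool
{
    return isset($scopeOpeners[$owner])
        && $scopeOpeners[$owner]['shared'] === true
        && isset($scopeOpeners[$owner]['with'][$other]);
}
```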
$config
The config data for the run.
protected
Config
$config
= null
$eolChar
The EOL char used in the content.
protected
string
$eolChar
= []
$numTokens
The number of tokens in the tokens array.
protected
int
$numTokens
= 0
$tokens
A token-based representation of the content.
protected
array<string|int, mixed>
$tokens
= []
$tstringContexts
Contexts in which keywords should always be tokenized as T_STRING.
protected
array<string|int, mixed>
$tstringContexts
= [T_OBJECT_OPERATOR => true, T_NULLSAFE_OBJECT_OPERATOR => true, T_FUNCTION => true, T_CLASS => true, T_INTERFACE => true, T_TRAIT => true, T_EXTENDS => true, T_IMPLEMENTS => true, T_ATTRIBUTE => true, T_NEW => true, T_CONST => true, T_NS_SEPARATOR => true, T_USE => true, T_NAMESPACE => true, T_PAAMAYIM_NEKUDOTAYIM => true]
$resolveTokenCache
A cache of different token types, resolved into arrays.
private
static array<string|int, mixed>
$resolveTokenCache
= []
Methods
__construct()
Initialise and run the tokenizer.
public
__construct(string $content, mixed $config[, string $eolChar = '\n' ]) : void
Parameters
- $content : string
-
The content to tokenize.
- $config : mixed
- $eolChar : string = '\n'
-
The EOL char used in the content.
Return values
void
getTokens()
Gets the array of tokens.
public
getTokens() : array<string|int, mixed>
Return values
array<string|int, mixed>
replaceTabsInToken()
Replaces tabs in original token content with spaces.
public
replaceTabsInToken(array<string|int, mixed> &$token[, string $prefix = ' ' ][, string $padding = ' ' ][, int $tabWidth = null ]) : void
Each tab can represent between 1 and $config->tabWidth spaces, so this cannot be a straight string replace. The original content is placed into an orig_content index and the new token length is also set in the length index.
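The reason a plain string replace is not enough can be seen in a minimal tab-stop sketch. The function below is hypothetical and is not the PHP_CodeSniffer implementation: it ignores the $prefix and $padding arguments, counts bytes rather than multi-byte characters, and assumes a 1-based starting column.

```php
<?php
// Hypothetical helper: expand tabs to spaces, honouring tab stops.
// $startColumn is the 1-based column at which $content begins.
function expandTabs(string $content, int $tabWidth, int $startColumn = 1): string
{
    $result = '';
    $column = $startColumn;
    $length = strlen($content);

    for ($i = 0; $i < $length; $i++) {
        $char = $content[$i];
        if ($char !== "\t") {
            $result .= $char;
            $column++;
            continue;
        }

        // Distance to the next tab stop: between 1 and $tabWidth columns.
        $spaces  = $tabWidth - (($column - 1) % $tabWidth);
        $result .= str_repeat(' ', $spaces);
        $column += $spaces;
    }

    return $result;
}
```

With a tab width of 4, a tab in column 1 becomes four spaces, while a tab that follows three characters becomes a single space.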
Parameters
- $token : array<string|int, mixed>
-
The token to replace tabs inside.
- $prefix : string = ' '
-
The character to use to represent the start of a tab.
- $padding : string = ' '
-
The character to use to represent the end of a tab.
- $tabWidth : int = null
-
The number of spaces each tab represents.
Return values
void
resolveSimpleToken()
Converts simple tokens into a format that conforms to complex tokens produced by token_get_all().
public
static resolveSimpleToken(string $token) : array<string|int, mixed>
Simple tokens are tokens that are not in array form when produced from token_get_all().
Parameters
- $token : string
-
The simple token to convert.
Return values
array<string|int, mixed> —The new token in array format.
standardiseToken()
Takes a token produced from token_get_all() and produces a more uniform token.
public
static standardiseToken(string|array<string|int, mixed> $token) : array<string|int, mixed>
Parameters
- $token : string|array<string|int, mixed>
-
The token to convert.
Return values
array<string|int, mixed> —The new token.
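The difference between simple and complex tokens, and what a uniform shape could look like, can be sketched with PHP's own token_get_all(). The function name, the 'SIMPLE' placeholder type, and the key names are assumptions made for this example; the real methods resolve simple tokens to specific token types.

```php
<?php
/**
 * Hypothetical normaliser: accepts either form returned by token_get_all()
 * and always returns an array with the same keys.
 *
 * @param string|array $token
 */
function toUniformToken($token): array
{
    if (is_array($token)) {
        // Complex token: [id, text, line].
        return [
            'code'    => $token[0],
            'type'    => token_name($token[0]),
            'content' => $token[1],
        ];
    }

    // Simple token: a bare string such as ';' or '('.
    return [
        'code'    => $token,
        'type'    => 'SIMPLE', // placeholder; not a real token type
        'content' => $token,
    ];
}

// The trailing ';' is returned by token_get_all() as a plain string.
$tokens = array_map('toUniformToken', token_get_all('<?php echo 1;'));
```

After normalisation, later passes can read `$token['content']` without first checking whether the token is a string or an array.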
isMinifiedContent()
Checks the content to see if it looks minified.
protected
isMinifiedContent(string $content[, string $eolChar = '\n' ]) : bool
Parameters
- $content : string
-
The content to tokenize.
- $eolChar : string = '\n'
-
The EOL char used in the content.
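One plausible way to make such a check is to compare the average line length against a threshold. The sketch below is a hypothetical heuristic: the function name and both thresholds are assumptions, not values taken from this page.

```php
<?php
// Hypothetical heuristic: long content with very long average lines
// is treated as minified.
function looksMinified(string $content, string $eolChar = "\n"): bool
{
    $length = strlen($content);
    if ($length <= 10000) {
        // Assumed cut-off: small files are never flagged.
        return false;
    }

    $lines = substr_count($content, $eolChar) + 1;

    // Assumed threshold: more than 100 bytes per line on average.
    return ($length / $lines) > 100;
}
```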
Return values
bool
processAdditional()
Performs additional processing after main tokenizing.
protected
processAdditional() : void
This additional processing checks for CASE statements that use curly braces as scope openers and closers, and turns some T_FUNCTION tokens into T_CLOSURE when they are not standard function definitions. It also detects short array syntax and converts those square brackets into new tokens, corrects some usage of the static and class keywords, and assigns tokens to function return types.
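To give a feel for one of these steps, the sketch below separates named functions from anonymous ones using raw token_get_all() output. It is a simplified, hypothetical helper: it only looks for a T_STRING name before the opening parenthesis and does not cover every case the real method handles.

```php
<?php
// Hypothetical helper: return the indexes of `function` keywords that are
// not followed by a name, i.e. candidates for a closure token.
function findAnonymousFunctions(string $code): array
{
    $tokens = token_get_all($code);
    $found  = [];

    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_FUNCTION) {
            continue;
        }

        // Walk forward to the opening parenthesis; a T_STRING seen on the
        // way is treated as the function name.
        $named = false;
        for ($j = $i + 1; isset($tokens[$j]); $j++) {
            if ($tokens[$j] === '(') {
                break;
            }
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $named = true;
                break;
            }
        }

        if (!$named) {
            $found[] = $i;
        }
    }

    return $found;
}
```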
Return values
void
tokenize()
Creates an array of tokens when given some PHP code.
protected
tokenize(string $string) : array<string|int, mixed>
Starts by using token_get_all() but does a lot of extra processing to insert information about the context of the token.
Parameters
- $string : string
-
The string to tokenize.
Return values
array<string|int, mixed>
createAttributesNestingMap()
Creates a map for the attributes tokens that surround other tokens.
private
createAttributesNestingMap() : void
Return values
void
createLevelMap()
Constructs the level map.
private
createLevelMap() : void
The level map adds a 'level' index to each token, which indicates the depth of the token within a set of scope blocks. It also adds a 'conditions' index, which is an array of the scope conditions that opened each of the scopes, with position 0 being the first scope opener.
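The idea of a per-token depth can be sketched with curly braces alone. The helper below is hypothetical and deliberately narrow: it omits the 'conditions' index, ignores alternative syntax such as endif, and does not handle braces inside interpolated strings.

```php
<?php
// Hypothetical sketch: attach a nesting level to each token, derived only
// from '{' and '}'.
function addLevels(string $code): array
{
    $level = 0;
    $out   = [];

    foreach (token_get_all($code) as $token) {
        $content = is_array($token) ? $token[1] : $token;

        if ($content === '}') {
            // In this sketch the closer sits at the same level as its opener.
            $level--;
        }

        $out[] = ['content' => $content, 'level' => $level];

        if ($content === '{') {
            // Everything after the opener is one level deeper.
            $level++;
        }
    }

    return $out;
}
```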
Return values
void
createParenthesisNestingMap()
Creates a map for the parenthesis tokens that surround other tokens.
private
createParenthesisNestingMap() : void
Return values
void
createPositionMap()
Sets token position information.
private
createPositionMap() : void
Can also convert tabs into spaces. Each tab can represent between 1 and $width spaces, so this cannot be a straight string replace.
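Leaving tab handling aside, position information can be derived by walking the token contents in order and tracking the current line and column. The sketch below is hypothetical: the function name and the 'line', 'column', and 'length' keys are assumptions made for the example, and lengths are counted in bytes.

```php
<?php
// Hypothetical sketch: assign a 1-based line and column to each token content.
function addPositions(array $contents, string $eolChar = "\n"): array
{
    $line   = 1;
    $column = 1;
    $out    = [];

    foreach ($contents as $content) {
        $out[] = [
            'content' => $content,
            'line'    => $line,
            'column'  => $column,
            'length'  => strlen($content),
        ];

        $eols = substr_count($content, $eolChar);
        if ($eols > 0) {
            $line += $eols;
            // The column restarts after the last EOL inside this token.
            $column = strlen($content) - strrpos($content, $eolChar) - strlen($eolChar) + 1;
        } else {
            $column += strlen($content);
        }
    }

    return $out;
}
```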
Return values
void
createScopeMap()
Creates a scope map of tokens that open scopes.
private
createScopeMap() : void
Return values
void
createTokenMap()
Creates a map of brackets positions.
private
createTokenMap() : void
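A bracket map of this kind is commonly built with a stack. The sketch below shows that general technique on a flat list of token contents; the function name and the returned opener-to-closer shape are assumptions, and it expects balanced input.

```php
<?php
// Hypothetical sketch: pair each opening bracket with its closer.
// Returns an array of opener index => closer index.
function mapBrackets(array $contents): array
{
    $pairs = ['(' => ')', '[' => ']', '{' => '}'];
    $stack = [];
    $map   = [];

    foreach ($contents as $i => $content) {
        if (isset($pairs[$content])) {
            // Remember where this opener was seen.
            $stack[] = $i;
        } elseif (in_array($content, $pairs, true) && $stack !== []) {
            // A closer matches the most recent unmatched opener.
            $opener = array_pop($stack);
            if ($pairs[$contents[$opener]] === $content) {
                $map[$opener] = $i;
            }
        }
    }

    return $map;
}
```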
Return values
void
findCloser()
Finds a "closer" token (for example, a closing parenthesis or square bracket), handling parenthesis balancing while searching for the closing token.
private
findCloser(array<string|int, mixed> &$tokens, int $start, string|array<string|int, string> $openerTokens, string $closerChar) : int|null
Parameters
- $tokens : array<string|int, mixed>
-
The list of tokens to iterate over while searching for the closing token (as returned by token_get_all()).
- $start : int
-
The starting position
- $openerTokens : string|array<string|int, string>
-
The opening character
- $closerChar : string
-
The closing character
Return values
int|null —The position of the closing token, if found. NULL otherwise.
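The balancing idea can be sketched with a depth counter. The function below is a hypothetical illustration rather than the private method itself: it assumes $start points just past an opener that is already open, and it accepts either raw token_get_all() entries or plain strings.

```php
<?php
// Hypothetical sketch: find the closer that balances an already-open opener.
function findMatchingCloser(array $tokens, int $start, array $openers, string $closer): ?int
{
    $depth = 1; // the opener before $start is assumed to be open
    $count = count($tokens);

    for ($i = $start; $i < $count; $i++) {
        $text = is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];

        if (in_array($text, $openers, true)) {
            // A nested opener needs its own closer first.
            $depth++;
        } elseif ($text === $closer) {
            $depth--;
            if ($depth === 0) {
                return $i;
            }
        }
    }

    // Ran out of tokens without balancing.
    return null;
}
```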
parsePhpAttribute()
PHP 8 attributes parser for PHP < 8. Handles single-line and multiline attributes.
private
parsePhpAttribute(array<string|int, mixed> &$tokens, int $stackPtr) : array<string|int, mixed>|null
Parameters
- $tokens : array<string|int, mixed>
-
The original array of tokens (as returned by token_get_all)
- $stackPtr : int
-
The current position in the token array.
Return values
array<string|int, mixed>|null —The array of parsed attribute tokens
recurseScopeMap()
Recurses through the scope openers to build a scope map.
private
recurseScopeMap(int $stackPtr[, int $depth = 1 ], int &$ignore) : int
Parameters
- $stackPtr : int
-
The position in the stack of the token that opened the scope (e.g. an IF token or FOR token).
- $depth : int = 1
-
How many scope levels down we are.
- $ignore : int
-
How many curly braces we are ignoring.
Return values
int —The position in the stack that closed the scope.