Scalar Literals and Variables
Values and Variable Definitions in Perl
PREREQUISITES — You should already…
- have read Getting Started with Perl;
- know how to write and run simple Perl scripts.
Concepts
Programs use values of different types and characteristics. Perl has very few types. It is not strongly-typed, in that it does minimal compile-time type checking, and performs implicit conversions at runtime.
Type Concepts
In Perl’s terminology, there are only three types: scalars, arrays and hashes. That is not very helpful when considering the differences between numbers and strings, for example. In Perl’s view, these are different kinds of scalar values, which can be stored in variables of type scalar, array, or hash.
Type Sigils
Variables in Perl do not have an absolute, or pre-defined, preference for the kind of data you store in them. Their names incorporate their type by virtue of a sigil in the form of a prefix character: ‘$’ for scalars, ‘@’ for arrays, and ‘%’ for hashes (dictionaries). The sigil is part of the variable and cannot be changed. One consequence of this arrangement is that ‘$name’, ‘@name’ and ‘%name’ are three different variables.
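As a minimal sketch of this point (the name ‘item’ and its values are purely illustrative), the following defines a scalar, an array and a hash that share a name yet remain independent variables:

```perl
use v5.16;

my $item = 'scalar value';            # scalar: holds a single value.
my @item = ('first', 'second');       # array: an ordered list of values.
my %item = (key => 'hash value');     # hash: key/value pairs.

say $item;                            #=> `scalar value`.
say $item[1];                         #=> `second` (element of `@item`).
say $item{key};                       #=> `hash value` (element of `%item`).
```

Note how the element accesses use the ‘$’ sigil, because a single element is a scalar; the brackets decide which variable is meant.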
Name variables distinctly; do not depend on sigils as the only differentiation.
Arrays and Lists
Although Perl has an array type, the term is only applied to variables. An array can be created with a list of values, often called a literal list. Wherever documentation refers to ‘LIST’, you can use either a literal list, or an array variable.
Type Context
Many of Perl’s functions and operators expect a certain type as argument or operand. The position where a scalar, or list (literal array) is expected, is called a context, i.e., list context, or scalar context.
Certain operations and functions will behave differently when used in a list context, as opposed to a scalar context. This is always documented, and we will point this out where appropriate. At this point, we just want to make you aware of the concept and the terminology around it.
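To make the terminology concrete, here is a small sketch (the array ‘@colours’ is hypothetical sample data) in which the same array behaves differently depending on the context of the assignment:

```perl
use v5.16;

my @colours = ('red', 'green', 'blue');   # hypothetical sample data.

my @copy    = @colours;               # list context: every element is copied.
my $count   = @colours;               # scalar context: the number of elements.
my ($first) = @colours;               # list context: first element assigned.

say "@copy";                          #=> `red green blue`.
say $count;                           #=> `3`.
say $first;                           #=> `red`.
```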
Although not officially a “context”, some operators need logical (boolean) values as one or more operands. There are no ‘true’ or ‘false’ constants in Perl, which means that values are treated logically, depending on certain rules.
References in General
Perl references are not often introduced to beginners, but they will eventually become useful. We take this opportunity merely to introduce the topic, and establish some terminology.
References are scalar values, and abstractly are addresses, or “pointers” (if you are familiar with that term). The ‘ref(‹expr›)’ function can be used to determine the type of an ‹expr›ession, as long as it is a reference expression. This suggests that a reference can store the location of (or “a reference to”) any kind of Perl object:
SCALAR CODE LVALUE VSTRING
ARRAY REF FORMAT Regexp
HASH GLOB IO
Since the above list is from Perl’s documentation, it would suggest that Perl has more than three types. And yes, it does, but formalism in open-source documentation cannot always be rigorously effected. Scalars, arrays and hashes are by far the most common types, and all that many programmers need to be comfortable with.
Type Conversion
Type conversions are implicit. When an operator expects a number, it will convert the expression used as operand to a number automatically. There are no cast, or type conversion, operators in Perl. The closest is the ‘int’ function, which truncates any decimal parts from a number (no rounding).
One consequence of this design choice is that the comparison operators for numbers and strings are different. For example, ‘==’ tests for equality on numbers, but ‘eq’ tests for equality between strings. Operators therefore drive implicit type conversion: they will automatically convert operands as required.
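The following sketch (with a hypothetical variable ‘$text’) shows how the operator, rather than the value, decides whether a number or a string is wanted:

```perl
use v5.16;

my $text = '10.0';                    # a string that looks like a number.

say 'numerically equal'    if $text == 10;    # `==` converts '10.0' to 10.
say 'not equal as strings' if $text ne '10';  # `ne` compares characters.
say $text + 5;                        #=> `15` (string used as a number).
say $text . 5;                        #=> `10.05` (number used as a string).
```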
Undefined Values
It is possible for a value to be undefined. This may arise from the result of an operation or function call. This value is represented by the ‘undef’ function, and its result can be assigned to a variable. To test for an ‘undef’ value, use the ‘defined’ function; this is the only way to distinguish an ‘undef’ value from other “false” values.
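A short sketch, using hypothetical variable names, of how ‘defined’ separates ‘undef’ from other false values:

```perl
use v5.16;

my $unset;                            # declared, but holds `undef`.
my $empty = '';                       # false, yet defined.
my $zero  = 0;                        # false, yet defined.

for my $value ($unset, $empty, $zero) {
    say defined($value) ? 'defined' : 'undefined';
}
#=> `undefined`, then `defined`, then `defined`.

$zero = undef;                        # assign the result of `undef`.
say 'now undefined' unless defined $zero;
```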
Truthiness
Since Perl does not have a boolean type, it must treat expressions as either ‘true’ or ‘false’, in any context where it needs to treat values as logical. The treatment is based on certain criteria. We list the conditions under which an expression will be treated as ‘false’.
Definition — Truthiness
Only the following values are ever treated as false: if the expression…
- results in the number ‘0’, or ‘0.0’.
- is an empty string: ‘''’, or ‘""’.
- is the string: ‘'0'’, or ‘"0"’.
- is an empty list: ‘()’.
- results in ‘undef’.
All other values will be treated as true.
Operators like the comparison operators, which must report whether some condition is ‘true’ or ‘false’, will return ‘1’ for ‘true’, and for ‘false’ a special empty string that also counts as ‘0’ when used as a number. The ‘&&’ and ‘||’ logical operators, on the other hand, return the value of the last operand evaluated.
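As an illustration (all variable names here are hypothetical), this sketch stores the results of comparison and logical operators so that they can be inspected:

```perl
use v5.16;

my $equal   = (2 + 2 == 4);           # a true result.
my $unequal = (2 + 2 == 5);           # a false result.

say $equal;                           #=> `1`.
say $unequal + 0;                     #=> `0` (the false result used as a number).

my $name = '' || 'anonymous';         # `||` yields the first true operand.
my $both = 'left' && 'right';         # `&&` yields the last operand if all are true.
say $name;                            #=> `anonymous`.
say $both;                            #=> `right`.
```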
#!/usr/bin/env perl
use v5.16; use utf8;
my ($i, @tests) = (0, 0, 0.0, "", "0", "0.0", undef, (), 1.2, 3);
for (@tests) {
    ++$i;
    say "#$i: |$_| is 'true'" if ($_);
    say "#$i: |$_| is 'false'" if (!$_);
}
Do not worry too much about the ‘for’ loop, or the ‘if’ statement modifier, for now. The code above will iterate through all the elements in the ‘@tests’ array, and successively store the current element in the default variable: ‘$_’. This variable is tested for “truthiness” during each iteration. Note that the empty list ‘()’ in the initialiser contributes no element at all, and that the string ‘"0.0"’ is treated as true.
IMPORTANT — Default Variable $_
Many Perl constructs, functions and statements require a value clause or argument. If none is provided, they will simply use the ‘$_’ “default variable”. If you are not aware of this, some code may not make sense. For example, ‘print;’ actually results in ‘print $_;’.
Whenever Perl documentation refers to a “Boolean” value as returned by operators and functions, it depends entirely on the relevant operator or function exactly what form the value will take. What you can be sure of, is that the value returned, will be treatable as true or false.
Constants
Very little introductory Perl material ever mentions the ‘constant’ module. The concept of symbolic constants in programming seems to be past its prime. However, if you want to be “old school”, and value the concept of symbolic constants (values with a name), you can use the following as a pattern, without deep understanding. The pattern is very simple.
Syntax: — Symbolic Constants
use constant ‹NAME› => ‹expr›;
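The ‹expr›ession can be a string, a number, or a calculation. This pattern is meant for scalar results; it cannot be used to create array constants (the module can also define list constants, which are lists rather than arrays, but we do not cover them here).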
Consider a situation where you want to have your own constant for π (pi). As a convention, you may decide on some name consisting of only capital letters, e.g. ‘PI’. Then you can create your constant with the syntax above, giving ‘PI’ as the ‹NAME› and a suitable value as the ‹expr›.
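One possible way to write this is sketched below. Computing the value with ‘atan2’ is our assumption here; a numeric literal such as ‘3.14159’ would serve equally well, and ‘$radius’ is a hypothetical variable:

```perl
use v5.16;
use constant PI => 4 * atan2(1, 1);   # π computed from the arctangent of 1.

my $radius = 2;                       # hypothetical radius for illustration.
say PI;                               # prints the value of π.
say PI * $radius ** 2;                # area of a circle with that radius.
```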
From this point on, you can use ‘PI’ just like any other literal. It has no sigil, which means it will not be interpolated, but string concatenation can be used: ‘"ABC".PI."DEF"’.
Literals
Literals are values without names; sometimes called “magic numbers” (assuming they are numbers, otherwise they are strings or other magic values). Literals have a syntax notation, which determines what value will eventually end up in memory.
Numeric Literals
There is no distinction in memory between integer types and floating point types — in Perl, there are just “numbers”. Notation-wise, you may distinguish, and certain operators will disregard (truncate) decimals, but their type will remain the same.
Numeric Notation
Numbers are by default considered to be in base 10 (decimal) format. They may optionally be prefixed with a ‘+’ (plus) or ‘-’ (minus). If the number contains no decimal point, we may call it an “integer literal”. Underscores may be placed at strategic positions in any numeric literal to make long numbers easier to read, as in ‘1_000_000’.
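A few legal, base 10, integer literals are sketched here (the values themselves are arbitrary):

```perl
use v5.16;

my $plain   = 123;                    # plain decimal integer.
my $signed  = -123;                   # explicit minus sign.
my $grouped = 1_234_567;              # underscores are ignored.

say $plain;                           #=> `123`.
say +123;                             #=> `123` (explicit plus sign).
say $signed;                          #=> `-123`.
say $grouped;                         #=> `1234567`.
```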
A base other than decimal can be used. For octal, you merely start the integer literal with a ‘0’ digit. The following digits are then treated as base 8, meaning that the largest legal digit will be ‘7’, so ‘0678’ is illegal (since ‘8’ is not an octal digit). Octal integer literals denote very different numbers from decimal literals with the same digits; ‘0123’, for example, is 83 in decimal.
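The sketch below uses only octal integer literals (chosen arbitrarily), with the decimal values that they represent:

```perl
use v5.16;

my $small  = 0123;                    # octal: 1×64 + 2×8 + 3.
my $sevens = 0777;                    # octal: 7×64 + 7×8 + 7.
my $spaced = 01_000;                  # octal with an underscore: 1×512.

say $small;                           #=> `83`.
say $sevens;                          #=> `511`.
say $spaced;                          #=> `512`.
```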
Hexadecimal integer literals are indicated with a leading ‘0x’, where the ‘x’ may also be a capital letter. As before, you can use underscores at any point.
And finally, you can use the ‘0b’ prefix for binary (base 2) numbers. The ‘b’ may be a capital letter ‘B’. All the available notations can be used to represent the same value, such as the decimal value ‘1234’.
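As a sketch, each literal below is a different spelling of the decimal value 1234, so every line prints ‘1234’:

```perl
use v5.16;

say 1234;                             # decimal.
say 1_234;                            # decimal with an underscore.
say 02322;                            # octal.
say 0x4D2;                            # hexadecimal, uppercase digits.
say 0x4d2;                            # hexadecimal, lowercase digits.
say 0b100_1101_0010;                  # binary with underscores.
```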
Floating point literals contain a decimal point, which may or may not be followed by a digit. This is fixed-point notation: ‘123.456’. Alternatively, regardless of the presence of a decimal point, exponential notation can be used (1.23456×10²): ‘1.23456e2’. The ‘e’ can also be a capital ‘E’. The exponent can be explicitly positive: ‘1.23456e+2’, or negative: ‘123.456e-2’.
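The notations above can be tried out with a sketch like this one (the variable names are hypothetical):

```perl
use v5.16;

my $fixed    = 123.456;               # fixed-point notation.
my $exp      = 1.23456e2;             # same value, exponential notation.
my $positive = 1.23456E+2;            # capital `E`, explicit positive exponent.
my $negative = 123.456e-2;            # negative exponent: 1.23456.

say $fixed;                           #=> `123.456`.
say $exp;                             #=> `123.456`.
say $positive;                        #=> `123.456`.
say $negative;                        #=> `1.23456`.
```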
Basic Numeric Operations
Regardless of numeric literal base notation, all numerical operations produce Perl numbers as represented internally. Some operators will disregard the fractional part (decimals).
Arithmetic. As one may expect, Perl provides the traditional arithmetic operators: ‘+’ (addition), ‘-’ (subtraction), ‘*’ (multiplication) and ‘/’ (division). There is no concept of integer division; the result is always a Perl number, which may have decimals, so ‘say 123/50;’ will print ‘2.46’. You can use the ‘int’ function to truncate the decimals: ‘say int(123/50);’ will print ‘2’.
Remainder. To get the remainder (modulus) of a division, you can use the modulus operator: ‘%’. It truncates any decimals before the operation. The code ‘say 123%50;’ will thus print ‘23’.
To the Power. Numbers can be raised to a power with the ‘**’ operator, which has high precedence. So, the equation πr² can be expressed in Perl as ‘PI * $radius ** 2’, given the appropriate variable and constant.
Bitwise. If you have need for bitwise operations, these mirror those of C: ‘&’ (bitwise AND), ‘|’ (bitwise OR), ‘^’ (bitwise XOR), ‘~’ (bitwise complement or negation). They will all disregard any decimals present in operands. You also have the ‘<<’ (bitwise shift left), and ‘>>’ (bitwise shift right) operators at your disposal.
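The operators in this section can be tried out with a sketch such as the following; the operands are arbitrary:

```perl
use v5.16;

say 123 / 50;                         #=> `2.46`.
say int(123 / 50);                    #=> `2`.
say 123 % 50;                         #=> `23`.
say 2 ** 10;                          #=> `1024`.
say 2 * 3 ** 2;                       #=> `18` (`**` binds tighter than `*`).

say 0b1100 & 0b1010;                  #=> `8`  (binary 1000).
say 0b1100 | 0b1010;                  #=> `14` (binary 1110).
say 0b1100 ^ 0b1010;                  #=> `6`  (binary 0110).
say 1 << 4;                           #=> `16`.
say 256 >> 2;                         #=> `64`.
```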
Numeric Output
Simply passing a numeric value to either the ‘say’ function, or the ‘print’ function, will output it in decimal, with up to about 15 significant digits. If the value has no decimal digits, it will not print any.
The only way to control numerical output appearance and base is with the ‘printf’ function (the ‘sprintf’ function works the same, except it returns a formatted string). We will only show a couple of common patterns here:
use v5.16; $, = ", ";
printf "%d, %f\n", 12.999, 12.999; #<-print as decimal integer, and
# same value as floating point.
printf "%x, %04X\n", 123, 123; #<-hex with lowercase digits, and
# same value with leading `0`s.
printf "%d, %o, %x, %08b\n", #<-print same value in decimal,
123, 123, 123, 123; # octal, hex, and binary.
printf "%.2f, %12.2f\n", #<-float with 2 decimals and same
12.345, 12.345; # value right aligned (12 wide).
printf "%12.2f, %-12.2f\n", #<-float with 2 decimals, right
12.345, 12.345; # and left aligned, (12 wide).
The ‘%’ (percentage) character is the start of a format specification. It represents a placeholder, which will be replaced by the corresponding argument following the initial format string. Any other text is output verbatim (unless string interpolation is used, which is a Perl operation, and nothing to do with ‘printf’).
Syntax — Print Format Specifier
%[‹width›][.‹prec›]‹spec›
- ‹width› total output width; fill with spaces on left; if negative, fill with spaces on right.
- ‹prec› precision: number of decimal digits, rounded; only valid on floating point.
- ‹spec› format specifier; see examples and documentation.
Both the ‹width› and ‹prec›ision parts can be an asterisk (‘*’), in which case, for each asterisk, an additional integer parameter must be passed, which will be used for the ‹width› or ‹prec›ision.
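To actually output a percentage sign, use two consecutive signs: ‘%%’. As far as ‘printf’ is concerned, it does not matter whether you use double quotes for the formatting string literal, or single quotes, although escape sequences such as ‘\n’ are only processed in double quotes. The format string can be a variable.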
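A brief sketch of the asterisk and ‘%%’ forms, with arbitrary values:

```perl
use v5.16;

printf "%*d|\n",  6, 42;              #=> `    42|` (width 6 from argument).
printf "%-*d|\n", 6, 42;              #=> `42    |` (left aligned).
printf "%.*f\n",  3, 3.14159;         #=> `3.142` (precision 3 from argument).
printf "%d%%\n",  50;                 #=> `50%`.
```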
Rounding
A common beginner’s question involves rounding of floating point values, in particular the rounding to a set number of decimals. There is no ‘round’ function. You can use math functions from the core ‘POSIX’ module, in particular ‘floor()’ and ‘ceil()’, to convert down or up to the closest integer respectively. However, here is a pattern that rounds a floating point number to any number of decimal digits:
Pattern — Rounding to N Decimal Digits
- ‹var1› = 1.0 * sprintf("%.‹N›f", ‹var2›);
- ‹var1› can be the same variable as ‹var2›.
As an example, consider using the above pattern to round ‘$num’ to ‘2’ decimal digits.
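A sketch of such a snippet follows; the starting value of ‘$num’ is our assumption, chosen for illustration:

```perl
use v5.16;

my $num = 123.45678;                  # hypothetical value to round.
$num = 1.0 * sprintf("%.2f", $num);   # round to 2 decimal digits.
say $num;                             #=> `123.46`.
```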
The output will be: ‘123.46’. The ‘1.0 * …’ part exists solely to convert the result back to a number, since the ‘sprintf’ function returns a string. This part is completely optional.
String Literals
To represent a sequence of characters as fixed, constant values, many languages including Perl provide string literals, normally enclosed in either double quotes (‘"⋯"’), or single quotes (‘'⋯'’). In Perl, you can use either, but double-quoted string literals are special, in that they are parsed and processed at runtime, just like an expression containing operators.
Perl has no “character literal” syntax. To represent a single character, use a string containing only one character.
String Literal Notation
The simplest and most efficient form for a string literal is one enclosed in single quotes. The only special sequences inside a single-quoted string literal are ‘\'’, to represent a single quote inside the string, and ‘\\’, to represent a backslash. Escaping the single quote is not necessary in double-quoted strings, but they too must use ‘\"’ to get a double quote character into a string.
use v5.16; use utf8; $, = ' ';
say 'ABC', "DEF"; #=> `ABC DEF`
say 'G\'I\'J', "K'L'M"; #=> `G'I'J K'L'M`
say "G\"I\"J", 'K"L"M'; #=> `G"I"J K"L"M`
Both single-quoted and double-quoted literal strings can enclose nothing. This is called an “empty string”, and when treated as logical, will be treated as “false”. The length of an empty string is therefore ‘0’. The backslash inside a literal string is called an “escape character”, and is the start of an “escape sequence”.
Although it is really not recommended for use, you should probably know that string literals may span multiple lines. This will embed newlines in the string, but will mess with indentation, and is thus not considered good programming practice. The following example will output 3 separate lines because of the embedded newlines.
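A sketch of such a literal is shown here; the text itself and the variable ‘$text’ are arbitrary choices of ours:

```perl
use v5.16;

my $text = 'ABCDEF
GHIJKLMNOP
QRSTUVWXYZ';                          # the literal ends on the third line.

say $text;                            # prints three separate lines.
```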
Any indentation you apply before the ‘GH…’ and ‘QR…’ lines will become spaces in the string. This also works for double-quoted string literals.
Here Strings / Documents
In Perl, long strings that span several lines can also be created with a dedicated syntax, as an alternative to multi-line quoted string literals. They are called “here documents” (heredocs) or “here strings”. You can use a here document anywhere a string is expected: you can assign it to a variable, pass it to a function, or pass it to ‘say’ or ‘print’.
Syntax — Here Strings / Documents
<<‹IDENT›;   notice trailing semicolon.
⋯⋯⋯⋯⋯   string, line(s) treated as double-quoted string literal.
‹IDENT›   termination. notice absence of semicolon.
- ‹IDENT› typically, an all-capitals identifier.
By default ‘<<‹IDENT›;’ is treated as ‘<<"‹IDENT›";’, although you can explicitly write it like that. The quotes must not be repeated on the terminating ‹IDENT›. Using ‘<<'‹IDENT›';’ will result in the here document being treated as a single-quoted string literal, so interpolation and escape sequences will not work.
The trailing ‹IDENT› must be on a line by itself, without leading spaces, and no trailing semicolon.
use v5.16; $, = ", ";
say '-' x 40;
print <<MSG;
This is some arbitrary long string
with embedded newlines, treated as
a double-quoted string, so that in-
terpolation will also work here.
MSG
say '-' x 40;
print <<"MSG";
This is some arbitrary long string
with embedded newlines, EXPLICITLY
a double-quoted string, so that in-
terpolation will also work here.
MSG
say '-' x 40;
print <<'MSG';
This is some arbitrary long string
with embedded newlines, treated as
a SINGLE-quoted string, so that in-
terpolation will not work here.
MSG
say '-' x 40;
Notice that the leading spaces are part of the resulting string. It is not much better than a string literal spanning many lines. You can also use here documents like this:
print (<< "MSG");
this is some text and
here we have more text
and the last line here.
MSG
print "Done\n";
It is a little bit weird, we must admit. But that is Perl for you. Have fun; amuse your bosses and colleagues; they will appreciate your insight and mastery.
Escape Sequences
As we have seen before, the backslash inside string literals is treated as special: in this context, it is the escape character, starting an escape sequence. Apart from ‘\'’ and ‘\\’ in single-quoted string literals, escape sequences have meaning only inside double-quoted string literals.
Following a single backslash, you have several syntax options:
- ‘\x‹hex›’ ‹hex› is a 2-digit hexadecimal character code in the range ‘00’…‘FF’.
- ‘\x{‹hex›}’ ‹hex› is a hexadecimal Unicode character code point.
- ‘\N{‹name›}’ ‹name› is a Unicode character name.
- ‘\N{U+‹hex›}’ ‹hex› is a hexadecimal Unicode character code point.
- ‘\‹oct›’ ‹oct› is a 3-digit octal character code.
- ‘\o{‹oct›}’ ‹oct› is an N-digit octal character code.
- ‘\c‹C›’ control ‹C›haracter; ‹C› must be one of: ‘A’…‘Z’, ‘@’, ‘[’, ‘]’, ‘^’, ‘_’, ‘?’.
- ‘\e’ (escape), ‘\t’ (tab), ‘\n’ (newline), ‘\r’ (carriage return), ‘\f’ (form feed), ‘\a’ (alarm).
- ‘\l’ lowercase next character.
- ‘\u’ uppercase next character.
- ‘\L’ lowercase all characters until next ‘\E’.
- ‘\U’ uppercase all characters until next ‘\E’.
- ‘\Q’ quote special characters until next ‘\E’.
- ‘\E’ end conversion sequence from ‘\L’, ‘\U’, or ‘\Q’.
The conversion sequences are only useful when combined with string interpolation below.
To represent a single backslash, you have to use two: ‘\\’.
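Several of these sequences are exercised in the sketch below; the strings are arbitrary and kept to ASCII so that no output encoding needs to be set:

```perl
use v5.16;

say "\x41\x{42}\103\o{104}";          #=> `ABCD` (hex, hex, octal, octal).
say "\N{U+45}";                       #=> `E` (code point given in hex).
say "col1\tcol2";                     # a tab between the two words.
say "C:\\temp";                       #=> `C:\temp`.
say 'C:\temp';                        #=> `C:\temp` (no `\t` in single quotes).
say "\uperl \LLOUD\E \Uquiet\E";      #=> `Perl loud QUIET`.
```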
String Interpolation
Only double-quoted string literals support variable interpolation, and the special escape sequences. Variables can be interpolated (expanded in the string), by simply referencing them: ‘"⋯$var⋯"’. If there is potential for ambiguity, enclose the variable’s name in curly braces: ‘"⋯${var}⋯"’.
use v5.16;
my ($more) = ('Some more words.');
say <<MSG1;
This string can contain embedded newlines, but otherwise
will act like a double-quoted string, so interpolation
will work as expected. \U$more\E
MSG1
say "Message complete. $more";
☆ The name ‘MSG1’ in this example is just an arbitrary name you choose to indicate the start and end of the here document. Note that there cannot be a trailing ‘;’ after the terminating ‘MSG1’. If you insert indentation before the lines, those spaces will be part of the string on each line.
This string can contain embedded newlines, but otherwise
will act like a double-quoted string, so interpolation
will work as expected. SOME MORE WORDS.
Message complete. Some more words.
Basic String Operations
Strings are concatenated (joined) with the ‘.’ (period character) operator. If either of its two operands is not a string, it will be converted to a string.
The ‘x’ (yes, the ‘x’ character) is a string repetition operator. The left-hand operand must be a string, or convertible to a string. The right-hand operand must be an integer expression (decimals are discarded), and can be a literal, variable, or an expression containing other operators, but do take precedence into account.
use v5.16; $, = ", ";
say '-.' x 10; #=> `-.-.-.-.-.-.-.-.-.-.`.
say 12 x 10; #=> `12121212121212121212`.
say 'X~' x (5 * 2); #=> `X~X~X~X~X~X~X~X~X~X~`.
my $x = ('A' . 'B') x 10; #=> `ABABABABABABABABABAB`.
say $x;
foreach my $i (1..4) {                #=> `####`.
    say '#' x ($i * 4);               #=> `########`.
}                                     #=> `############`.
                                      #=> `################`.
String operations include the ‘index’ function (find first substring within another), and for finding the last match, the ‘rindex’ function. The most powerful is probably the ‘substr’ function, since it can also appear on the left of assignment, which means it can be used to insert, delete and replace parts of a string. The ‘sprintf’ function can format strings.
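A small sketch of these functions at work, on a hypothetical string ‘$text’:

```perl
use v5.16;

my $text = 'one-two-one';             # hypothetical sample string.

say index($text, 'one');              #=> `0` (first match).
say rindex($text, 'one');             #=> `8` (last match).
say index($text, 'six');              #=> `-1` (not found).
say substr($text, 4, 3);              #=> `two` (3 characters from offset 4).
say sprintf('%-5s|', 'ab');           #=> `ab   |` (padded to 5 wide).
```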
You can also represent a string with one character from a code point with the ‘chr’ function, and get the code point (ordinal) of the first character in a string with the ‘ord’ function.
String Output
The ‘printf’ function also provides for the ‘%s’ (string), and ‘%c’ (character) format specifiers. The ‘%s’ specifier will format its argument as a string, while the ‘%c’ specifier requires an integer value; this value must be the code point of a character.
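A call of the following shape illustrates both specifiers. The exact format string and arguments are our assumption, chosen to be consistent with the explanation that follows:

```perl
use v5.16;

# One string, one code point, and one number formatted as a string.
printf "%s, %c, %s\n", 'ABC', 65, 0x7B;   #=> `ABC, A, 123`.
```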
The ‘%c’ was replaced with ‘A’, because the code point for the letter ‘A’ is ‘65’. The second ‘%s’ caused the ‘0x7B’ (‘123’ in decimal) to be converted to a string, and then printed. You can verify that the code point for ‘A’ is ‘65’ with the ‘ord’ function (returning a number), or manually perform the task of ‘%c’ with the ‘chr’ function (which returns a string).
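Both checks can be sketched in a few lines:

```perl
use v5.16;

say ord('A');                         #=> `65`.
say chr(65);                          #=> `A`.
say chr(ord('A') + 1);                #=> `B` (the next code point).
```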
UTF-8
While Perl is fully Unicode compatible, you may sometimes get a “Wide character…” message during output. That is because the ‘STDOUT’ and ‘STDERR’ file handles are not initially set to UTF-8 encoding. You can stop the warnings by adding ‘no warnings 'utf8';’ at the top of your scripts (not recommended), or you can set the file handle encoding:
no warnings 'utf8'; # optional (not recommended).
binmode STDOUT, ":encoding(utf8)"; # set UTF-8 encoding for `STDOUT`.
binmode STDERR, ":encoding(utf8)"; # set UTF-8 encoding for `STDERR`.
NOTE — Text Encoding
You can also use ‘":encoding(UTF-8)"’, in which case Perl will perform UTF-8 validity checking on the input and output strings. This is generally preferable, but you can use ‘utf8’ as the encoding name if you feel that your text is properly encoded, and you do not want the overhead.
You can check out the Unicode Introduction at perldoc, and/or the Unicode FAQ.
Generally speaking, you should not use Unicode for variable and function names in Perl, just to avoid potential problems. But your source file should be UTF-8 encoded, as good convention. To tell Perl that you are using a UTF-8 source file, add ‘use utf8;’ at the top of your scripts.
UTF-8 is useful and ubiquitous. By all means, embrace it and use it. All modern POSIX terminals support it as default. The Windows Command Prompt Console supports it to some degree with the “change codepage” command (‘chcp 65001’). XML and HTML use UTF-8. Decent editors support UTF-8. The list goes on.
Here is a code snippet from Getting Started with Perl that works portably across most, if not all operating systems that Perl supports and where UTF-8 terminals are used:
use v5.16; #<-minimum version.
use utf8; #<-*source* in UTF-8.
use open ':std', ':encoding(UTF-8)'; #<-all std. I/O in UTF-8.
#= Windows-specific code. Will be ignored on other systems.
#
if ($^O eq 'MSWin32') {
    use if $^O eq 'MSWin32', 'Win32::Console';
    Win32::Console::OutputCP(65001);
    system(''); #<-enable vt escape sequences processing.
}
With the above code you are not reliant on the user setting the Console code page to ‘65001’; the script does that itself. As a bonus, it ensures that VT100/ANSI terminal escape sequences are processed by the Console.
Variables
Literals by themselves are of little use; at some point we would like to store values in variables. Traditionally, you only had to assign a value to a non-existent variable, and it would be automatically defined. This can cause some difficult bugs, so it is strongly discouraged, even if it is still the default behaviour for legacy scripts.
Because of our recommendation to ‘use strict;’, or implicitly ‘use v5.16;’ as minimum, you will have to define your variables before you can use them. You can initialise them at the same time as you define them.
Scalar variables can store single values of any kind: number, string, reference, etc. They can be modified to store any other kind of value at any time. You can start by storing a number in a variable, and later store a string in the same variable; Perl will not mind.
Variables can be interpolated in double-quoted string literals, and here documents / strings, as we mentioned before.
Variable Definitions
Although Perl has a number of ways to define variables, the most common and useful are ‘my’ and ‘our’. The latter is only useful when you start writing modules, so we will ignore it for now; which leaves the ‘my’ function (as Perl documentation calls it). Use of ‘local’ should be avoided in the short term, and should not be used even when you understand it. Fair warning: ‘local’ does not work as you would expect, or even as the name suggests; especially if you have experience with other programming languages.
To define a single scalar variable is simple: ‘my $‹ident›;’. It can optionally be initialised at the same time: ‘my $‹ident› = ‹expr›;’. The ‹ident›ifier cannot start with a digit; only an underscore or an alphabetic character is allowed as the first character. The characters that follow may include digits.
Syntax — Scalar Variable Definitions
- ‘my $‹ident›;’ define scalar variable ‹ident› containing ‘undef’.
- ‘my ($‹ident›);’ same as above, thus parentheses are optional.
- ‘my $‹ident› = ‹expr›;’ define & initialise scalar variable ‹ident›.
- ‘my ($‹ident›) = (‹expr›);’ same as above; parentheses thus optional.
- ‘my ($‹ident1›, $‹ident2›, …);’ define arbitrary number of variables, all containing ‘undef’.
- ‘my ($‹ident1›, $‹ident2›, …) = (‹expr1›, ‹expr2›, …);’ define & initialise arbitrary number of variables; too few initialisers will cause ‘undef’ to be placed in remaining variables.
Parentheses are not optional when defining several variables at the same time.
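The forms above might be used as in this sketch, where all the variable names are hypothetical:

```perl
use v5.16;

my $count;                            # holds `undef`.
my $name = 'Perl';                    # defined and initialised.
my ($x, $y, $z) = (1, 2);             # `$z` gets `undef` (too few initialisers).

say defined $count ? 'defined' : 'undef';   #=> `undef`.
say "$name, $x, $y";                  #=> `Perl, 1, 2`.
say defined $z ? 'defined' : 'undef'; #=> `undef`.
```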
Scope
The term scope refers to an area inside a program where a user-defined identifier (like a variable name) is visible. If a variable is visible, we say it is “in scope”; if not visible, we say it is “out of scope”. Scopes can nest, and are introduced by curly-brace delimited blocks.
Every Perl script has a scope that is global to that script. Any block will create a nested scope. Inside a block you can have further blocks. For humans to keep track of the nesting, it is crucial that consistent indentation is used.
Identifiers defined on a higher level, are visible in all nested blocks. A name can be redefined in a nested block, in which case it will hide the higher-level identifier with the same name, until the end of the nested block.
use v5.16; use utf8;
my $a = 123;
say "\$a = $a"; #=> `$a = 123`.
#= A block, which is a local scope. Variables defined inside, are
# only visible inside the block. Higher-level variables are still
# visible inside the block.
{
    say "\$a = $a";                   #=> `$a = 123`.
    $a = 234;                         #<-change higher-level `$a`.
    my $b = 345;
    say "\$b = $b";                   #=> `$b = 345`.
    my $a = 456;                      #<-hide/shadow higher `$a`.
    say "\$a = $a";                   #=> `$a = 456`.
}                                     #<-`$b` and new `$a` not
                                      # visible after this block.
say "\$a = $a"; #=> `$a = 234`.
When a nested variable is defined with the same name as a variable in a higher level, the new variable hides or shadows the original. Until this new name goes out of scope, the original value cannot be accessed.
Variable Assignment
Apart from the ‘my’ prefix, the same syntax as above can be used to assign the results of expressions to existing variables, even multiple variables at the same time. In multiple assignments, the left-hand variable list can contain ‘undef’, in which case the corresponding right-hand expression will be ignored.
my ($a, $b, $c); #<-all `undef`.
$a = 123; #<-`$a`<-`123`.
($b, $c) = (456); #<-`$b`<-`456`; `$c`<-`undef`.
($a, $b) = ($b, $a); #<-swap `$a` and `$b`.
($a, undef, $c) = (11, 22, 33); #<-`$a`<-`11`; `$c`<-`33`.
($a, $b, $c) = (11, undef, 33); #<-as above; `$b`<-`undef`.
say $a, $b = 22, $c, my $d = 44; #<-`$b`<-`22`; new `$d`<-`44`.
Although not shown in the example code above, it is not illegal to have too many expressions in the right-hand list; the unused ones will simply be discarded. Notice two further points:
- Assignment produces a result, because it is an operator, and we can pass its result to ‘say’.
- Variables can be defined in a surprising variety of locations, as witnessed above.
Although this is more a quality of operators, and in particular the assignment operator, multiple variables can be given the same value with a simple pattern. Assuming the variables above still exist, we can set them all to ‘0’ with a chained assignment: ‘$a = $b = $c = 0;’.
This works because the assignment operator associates with its operands from right to left, so it is executed as if you parenthesised the expression as ‘$a = ($b = ($c = 0));’.
This is a common idiom, and well-defined behaviour, so you are welcome to employ it liberally (without the superfluous parentheses, of course).
String Modifications
Parts of a string variable can be extracted with the ‘substr’ function, which can also modify parts when appearing on the left of assignment.
use v5.16; use utf8;
my $str = "ABC-DEF-GHI-JKL-MNO-PQR-STU-VWX-YZ";
say $str;
my $ndx = index $str, "JKL"; #<-find position of `JKL`.
(substr $str, $ndx-1, 4) = ""; #<-delete `-JKL`.
say $str;
(substr $str, $ndx, 0) = "#######-"; #<-insert some hashes.
say $str;
substr($str, $ndx, 8) = ""; #<-delete those hashes.
say $str;
substr($str, 0, 4) = ""; #<-delete `ABC-`.
say $str;
ABC-DEF-GHI-JKL-MNO-PQR-STU-VWX-YZ
ABC-DEF-GHI-MNO-PQR-STU-VWX-YZ
ABC-DEF-GHI-#######-MNO-PQR-STU-VWX-YZ
ABC-DEF-GHI-MNO-PQR-STU-VWX-YZ
DEF-GHI-MNO-PQR-STU-VWX-YZ
Unlike some other languages, you cannot subscript a string to obtain single characters. You will have to use ‘substr’.
Variable References
A reference is like an address, in that it indicates the location of a value. This means that you may have two or more references to the same location. Accessing the value at that location indirectly via the reference will be an alias for the value stored there.
To take the reference of a variable, the backslash (‘\’) is used as prefix operator.
my $var = 123; #<-location `$var` stores `123`.
my $ref = \$var; #<-location `$ref` stores location `$var`.
my $rrf = \$ref; #<-location `$rrf` stores location `$ref`.
Given the above variable definitions, we can access the values the references represent, by using an extra ‘$’ sigil (because the values are all scalars):
use v5.16; $, = ", ";
say "\$var = $var"; #=> `$var = 123`.
say "\$\$ref = $$ref"; #=> `$$ref = 123`.
say "\$\$\$rrf = $$$rrf"; #=> `$$$rrf = 123`.
say ref($ref), ref($rrf), ref($$rrf); #=> `SCALAR, REF, SCALAR`.
As we can see from the last line, a ‘REF’ (reference) is a special type in Perl, but still a “kind of scalar” value.
Without further Perl knowledge, like arrays, hashes, subroutines and object-oriented programming, references are of little use. Do take note of the fact that assigning to an indirection changes the original value. Again, assuming the above variables are still in effect:
use v5.16;
$$ref = 234;
say "\$var = $var"; #=> `$var = 234`.
$$$rrf = 345;
say "\$var = $var"; #=> `$var = 345`.
So, ‘$$ref’ represents the exact same location and value as ‘$var’, and is in all respects an alias for ‘$var’. Just like ‘$$rrf’ is an alias for ‘$ref’, and thus ‘$$$rrf’ is an alias for ‘$var’ as well.
Variable Summary
Variables must be defined before use; we can initialise them during the definition, or later assign new values to them at any time. The contents of variables can be expanded in double-quoted string literals, using interpolation. The location where a variable is defined determines its scope.
You can assign ‘undef’ to a variable, or use ‘undef $‹ident›’ to “clear” the current value stored in a variable.
Parts of a string variable can be modified with the powerful ‘substr’ function (which can also extract parts).
The ‘readline’ function, or the equivalent ‘<STDIN>’ operator, can be used to read strings from standard input, which can then be saved in a variable. When the read is the only condition of a ‘while’ loop, as in ‘while (<STDIN>) {⋯}’, each line is stored in the default variable ‘$_’; elsewhere, the result is discarded unless you assign it. Passing no arguments to the ‘print’ or ‘say’ functions will print the contents of ‘$_’. Lines read from input are often trimmed with the ‘chomp’ function.
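The sketch below shows both ways of reading a line. As an assumption for the sake of a self-contained example, an in-memory file handle (‘$in’, opened on the hypothetical string ‘$input’) stands in for ‘STDIN’; with real input you would read from ‘<STDIN>’ instead:

```perl
use v5.16;

# Assumption: an in-memory file handle stands in for STDIN, so the
# sketch runs without keyboard input.
my $input = "first line\nsecond line\n";
open my $in, '<', \$input or die "open failed: $!";

my $line = <$in>;                     # read one line, newline included.
chomp $line;                          # remove the trailing newline.
say "[$line]";                        #=> `[first line]`.

while (<$in>) {                       # here the line lands in `$_`.
    chomp;                            # `chomp` defaults to `$_`.
    say "[$_]";                       #=> `[second line]`.
}
close $in;
```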
2017-12-22: Edited. [jjc]
2017-12-19: Created. [brx]