vespaclient/src/perl/PERL_BEST_PRACTISES


To try and make the perl tools good and consistent, here is a list of best
practises used within the modules.

(Whether they are best can of course be debated, but what's listed is what is
currently used.)

1. Always use strict and warnings first thing.

A lot of constructs are legal in perl for backward compatibility and ease of
writing one-liners. However, these constructs are a frequent source of bugs in
real code. All modules and binaries should use strict and warnings to ensure
that these checks are enabled. (A unit test in the module greps the source to
ensure this.) Thus, pretty much the first thing in all perl files should be:

  use strict;
  use warnings;
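A quick way to see what the two pragmas catch is the sketch below. It compiles
an undeclared variable under strict through a string eval, so the error can be
inspected instead of aborting the script, and it collects a warning through a
local $SIG{__WARN__} handler. The variable names are made up for the
illustration.

```perl
use strict;
use warnings;

# 'use strict' makes an undeclared global a compile-time error. Compiling
# the snippet with a string eval lets us look at the error instead of dying.
my $compiled = eval 'use strict; $undeclaredCounter = 1; 1';
my $strictError = $compiled ? '' : $@;
print "strict rejected the snippet: $strictError";

# 'use warnings' reports using an undefined value in a concatenation.
# A local __WARN__ handler collects the warning instead of printing it.
my @caughtWarnings;
{
    local $SIG{__WARN__} = sub { push @caughtWarnings, $_[0] };
    my $name;                          # never assigned, so undef
    my $greeting = 'Hello ' . $name;   # triggers "Use of uninitialized value"
}
print "warnings caught: ", scalar(@caughtWarnings), "\n";
```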

2. Use perl modules.

We want to group functionality into multiple files in perl too. A perl module is
just another perl file with a .pm extension, which minimally can look something
like this:

Yahoo/Vespa/VespaModel.pm:

  package Yahoo::Vespa::VespaModel;

  use strict;
  use warnings;

  my %CACHED_MODEL; # Prevent multiple fetches by caching results

  return 1;

  sub get {
    ...
  }

Yahoo/Vespa/Bin/MyBinary.pl:

  use strict;
  use warnings;
  use Yahoo::Vespa::VespaModel;

  my $model = Yahoo::Vespa::VespaModel::get();
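To try the layout end to end without a source checkout, the sketch below
writes the example module into a temporary library directory and loads it.
The body of get() is a hypothetical stand-in that only caches a hash; it does
not fetch anything. Since 'use' runs at compile time, before the file exists,
the sketch adds the directory to @INC and calls require at run time instead.

```perl
use strict;
use warnings;
use File::Path qw( make_path );
use File::Temp qw( tempdir );

# Create Yahoo/Vespa/VespaModel.pm under a temporary library root.
my $libRoot = tempdir( CLEANUP => 1 );
make_path("$libRoot/Yahoo/Vespa");
open(my $fh, '>', "$libRoot/Yahoo/Vespa/VespaModel.pm") or die "open: $!";
print $fh <<'END_MODULE';
package Yahoo::Vespa::VespaModel;

use strict;
use warnings;

my %CACHED_MODEL; # Prevent multiple fetches by caching results

return 1;

sub get {
    $CACHED_MODEL{'model'} //= { 'fetchedAt' => time() };
    return $CACHED_MODEL{'model'};
}
END_MODULE
close($fh) or die "close: $!";

# Make the library root searchable, then load the module.
unshift @INC, $libRoot;
require Yahoo::Vespa::VespaModel;

my $model = Yahoo::Vespa::VespaModel::get();
my $again = Yahoo::Vespa::VespaModel::get();
print "same cached model: ", ($model == $again ? "yes" : "no"), "\n";
```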

2a. Module install locations.

Perl utilities are installed under $VESPA_HOME/lib/perl5/site_perl.

2b. Aliasing namespace.

Perl's namespace handling is not that great. It's not like C++, where code in
the storage::api namespace can address something in the storage::lib namespace
as lib::foo, or refer to other names in its own namespace without any prefix.
Thus, if the user of the VespaModel module above were Yahoo::Vespa::MyLib, it
would still have to address VespaModel by its full path by default.

It is possible to create aliases in Perl to help with this. Using an alias, the
MyBinary.pl code above could look like:

  ...
  use Yahoo::Vespa::VespaModel;

  BEGIN {
    *VespaModel:: = *Yahoo::Vespa::VespaModel:: ;
  }

  my $model = VespaModel::get();

The alias declaration doesn't look very pretty, but it can be helpful to get
code looking simple.
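The aliasing can be tried in a single file. In the sketch below the module is
declared inline instead of being loaded from Yahoo/Vespa/VespaModel.pm, and its
get() body is a hypothetical placeholder. The BEGIN block is the same glob
assignment as above.

```perl
use strict;
use warnings;

# Inline stand-in for Yahoo/Vespa/VespaModel.pm; the get() body is
# hypothetical.
package Yahoo::Vespa::VespaModel;
sub get { return { 'source' => 'config server' }; }

package main;

# Make the VespaModel:: symbol table an alias for Yahoo::Vespa::VespaModel::.
# Running it in BEGIN means the alias exists while the rest of the file is
# compiled.
BEGIN {
    *VespaModel:: = *Yahoo::Vespa::VespaModel:: ;
}

my $model = VespaModel::get();
print "model source: $model->{'source'}\n";
```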

2b (continued). Exporting members into the caller's namespace.

Another alternative to long prefixed names or aliasing is to export names into
the caller's namespace. A module can do this along these lines:

Yahoo/Vespa/VespaModel.pm:

  package Yahoo::Vespa::VespaModel;

  use strict;
  use warnings;

  BEGIN {
      use base 'Exporter';
      our @EXPORT = qw( getVespaModel );
      our @EXPORT_OK = qw( otherFunction );
  }

  my %CACHED_MODEL;

  return 1;

  sub getVespaModel {
    ...
  }
  sub otherFunction {
    ...
  }

Yahoo/Vespa/Bin/MyBinary.pl:

  use strict;
  use warnings;
  use Yahoo::Vespa::VespaModel;

  my $model = getVespaModel();

In this example, the getVespaModel function is imported by default, while
otherFunction is not, but can be included optionally. You can specify what to
import by adding arguments to the use statement:

  use Yahoo::Vespa::VespaModel;                     # Import defaults
  use Yahoo::Vespa::VespaModel ();                  # Import nothing
  use Yahoo::Vespa::VespaModel qw( otherFunction ); # Import otherFunction only

(qw(...) is a quote-like operator that builds a list from a whitespace
separated string. Writing qw( foo bar ) is equivalent to writing
('foo', 'bar').)

You can also export/import variables, but then you need to prefix the names
with their sigil, as in "our @EXPORT = qw( $number @list %hash );".

Note that you should export as few functions as possible, as they can clash
with names used in the caller. Also, the names you do export should be fairly
descriptive to reduce the chance of this happening, since an exported name
does not carry the module name along as context. Thus, if you don't export you
can for instance use Json::encode, but if you do export you likely need to call
the function encodeJson or similar instead.
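The effect of the three import forms can be checked in one file. In this
sketch the module is declared inline and marked as already loaded through
%INC, so use does not look for a .pm file on disk. The caller package names
and the function bodies are hypothetical.

```perl
use strict;
use warnings;

package Yahoo::Vespa::VespaModel;

BEGIN {
    use base 'Exporter';
    our @EXPORT = qw( getVespaModel );
    our @EXPORT_OK = qw( otherFunction );
    # Record the module as loaded, so that the 'use' lines below do not
    # look for Yahoo/Vespa/VespaModel.pm on disk.
    $INC{'Yahoo/Vespa/VespaModel.pm'} = __FILE__;
}

sub getVespaModel { return 'model'; }    # Hypothetical bodies
sub otherFunction { return 'other'; }

# Three caller packages, each importing a different set of names.
package DefaultCaller;
use Yahoo::Vespa::VespaModel;                     # Import defaults

package NothingCaller;
use Yahoo::Vespa::VespaModel ();                  # Import nothing

package ExplicitCaller;
use Yahoo::Vespa::VespaModel qw( otherFunction ); # Import otherFunction only

package main;

# can() reports which functions ended up in each caller's namespace.
for my $caller (qw( DefaultCaller NothingCaller ExplicitCaller )) {
    my @imported = grep { $caller->can($_) } qw( getVespaModel otherFunction );
    print "$caller: @imported\n";
}
```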

2c. Prefer private variables (my instead of our).

Variables declared with 'my' at file level are private to the module, so you
know outsiders can't alter them. This makes debugging easier, as there are
fewer ways their values can change.
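The difference is easy to demonstrate. In this sketch a bare block plays the
role of the module file (a file-level 'my' is scoped to its file the same way),
and the Counter package and its functions are hypothetical.

```perl
use strict;
use warnings;

{
    package Counter;                 # Hypothetical module

    our $SHARED_COUNT = 0;           # Package variable: $Counter::SHARED_COUNT
    my $PRIVATE_COUNT = 0;           # Lexical: only code in this block sees it

    sub increment { ++$SHARED_COUNT; ++$PRIVATE_COUNT; return; }
    sub privateCount { return $PRIVATE_COUNT; }
}

Counter::increment();
Counter::increment();

# Any outside code can overwrite the 'our' variable...
$Counter::SHARED_COUNT = 100;

# ...while the 'my' variable only changes through the module's own functions.
print "shared: $Counter::SHARED_COUNT, private: ", Counter::privateCount(), "\n";
```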

2d. Prefer calling functions or using exported variables rather than
referencing a module's global variables from the outside.

Under 'use strict', a fully qualified name such as $Some::Module::NAME is always
accepted, even if the module never declares NAME or declares it with 'my'. At
most you get a "used only once" warning if the name appears just once in the
program; otherwise you silently read an unrelated, undefined package variable.
It is thus better to refer to imported variables or call a function, so that
Perl reports an error when the name no longer exists.
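A small sketch of the difference, with a hypothetical Settings package that
keeps its value in a private variable. Reading it through a qualified name
quietly gives undef, while a misspelled function name dies at run time.

```perl
use strict;
use warnings;

{
    package Settings;                # Hypothetical module
    my $TIMEOUT = 30;                # Private, so no $Settings::TIMEOUT exists
    sub getTimeout { return $TIMEOUT; }
}

# A fully qualified name is exempt from 'use strict', so this compiles even
# though Settings never declares a package variable called TIMEOUT. Perl may
# warn that the name is used only once, but the program still runs and the
# value is silently undef.
my $viaVariable = $Settings::TIMEOUT;

# Going through a function gives the real value...
my $viaFunction = Settings::getTimeout();

# ...and a misspelled or removed function dies instead of returning undef.
my $error = eval { Settings::getTimeOut(); 1 } ? '' : $@;

print "via variable: ", (defined $viaVariable ? $viaVariable : 'undef'), "\n";
print "via function: $viaFunction\n";
print "missing function: $error";
```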

2e. Put all function declarations at the bottom.

When a perl module is loaded, the code within it is run. If that code does not
return a true value, the module fails to load. Thus, traditionally, perl
modules often end with 1; (which has the same effect as return 1;) to ensure
this. However, this means you have to read through the entire module to find
all the code that runs at load time.

By calling exit(...) in the main program, and return 1; in modules, before the
function declarations, it is easier for the reader to see that you haven't
hidden other code between the function declarations. (Unless you've hacked it
into a BEGIN{} block to force it to run before everything else.)
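For a main program, the convention can look like the sketch below. The main
and greet functions and the @GREETED state are hypothetical; the point is only
that everything executed at startup sits above the exit() call.

```perl
use strict;
use warnings;

# Module-level state is declared first (illustrative only).
my @GREETED;

# All code that runs at startup sits here and ends in an explicit exit(),
# so a reader knows nothing further down is executed at load time.
exit(main(@ARGV));

# Only subroutine declarations below this point.
sub main { # (Names...) -> ExitCode
    my @names = @_ ? @_ : ('world');
    greet($_) foreach @names;
    return 0;
}

sub greet { # (Name)
    my ($name) = @_;
    push @GREETED, $name;
    print "Hello, $name\n";
    return;
}
```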

2f. Make it easy to reinitialize in unit tests.

By putting initialization steps in a separate init function, rather than doing
them on load, unit tests can easily call it to reinitialize the module between
tests. This also separates the declarations of what exists from the
initialization, so it is easier to see which variables there are.
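A minimal sketch of the pattern, assuming a hypothetical
Yahoo::Vespa::NodeCounter module. A bare block stands in for the .pm file, so
there is no return 1; here, but the layout is the same: declarations, a call to
init(), then the functions.

```perl
use strict;
use warnings;

{
    package Yahoo::Vespa::NodeCounter;   # Hypothetical module

    # Declarations: the state this module keeps.
    my %SEEN_NODES;
    my $TOTAL;

    # Load-time code: just call init(). In a .pm file, 'return 1;' would
    # follow here, with the functions below it.
    init();

    sub init { # Reset all module state; unit tests can call this too
        %SEEN_NODES = ();
        $TOTAL = 0;
        return;
    }

    sub record { # (NodeName)
        my ($node) = @_;
        ++$SEEN_NODES{$node};
        ++$TOTAL;
        return;
    }

    sub total { return $TOTAL; }
}

# A test can record some data, check it, then reset before the next case.
Yahoo::Vespa::NodeCounter::record('node0');
Yahoo::Vespa::NodeCounter::record('node1');
my $beforeReset = Yahoo::Vespa::NodeCounter::total();
Yahoo::Vespa::NodeCounter::init();
my $afterReset = Yahoo::Vespa::NodeCounter::total();
print "before reset: $beforeReset, after reset: $afterReset\n";
```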

3. Confess instead of die.

The typical perl assert is use of the 'die' function, as in:

  defined $foo or die "We expected 'foo' to be defined here";

The Utils package contains a confess function to use instead (it wraps an
external dependency). It does the same as 'die', but also adds a stack trace,
so that when it is triggered it is much easier to find the culprit.
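The Utils wrapper itself is not shown here. Assuming it wraps Carp's confess
(an assumption; the text only says it wraps an external dependency), the
extra information can be seen by calling Carp::confess directly. The
parsePort and loadConfig functions are hypothetical.

```perl
use strict;
use warnings;
use Carp qw( confess );

sub parsePort { # (String) -> Int
    my ($value) = @_;
    defined $value or confess "We expected 'value' to be defined here";
    return int($value);
}

sub loadConfig { # (String) -> Int
    my ($rawPort) = @_;
    return parsePort($rawPort);
}

# Catch the failure so the message can be inspected rather than aborting.
# Unlike die, the message lists every call that led to the failure.
my $error = eval { loadConfig(undef); 1 } ? '' : $@;
print $error;
```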

4. Do not call exit() in libraries.

We want to be able to unit test all types of functions, including functionality
that makes the application abort and exit. The Utils package defines an
exitApplication function that is mocked in unit tests. Assertion-type exits
through die/confess can also be caught in unit tests.
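The real Utils interface is not reproduced here. As a sketch, with MyUtils and
MyLibrary as hypothetical stand-ins, a test can temporarily replace the exit
wrapper using local, so that an attempted exit turns into a die that eval can
catch. This relies on the library calling the wrapper by its fully qualified
name; a copy imported into another namespace would not see the replacement.

```perl
use strict;
use warnings;

package MyUtils;                # Hypothetical stand-in for Utils

sub exitApplication { # (ExitCode)
    my ($code) = @_;
    exit($code);
}

package MyLibrary;              # Hypothetical library code

sub requireHost { # (Host) -> Host
    my ($host) = @_;
    # Call the wrapper, never exit() directly, so tests can intercept it.
    MyUtils::exitApplication(1) unless defined $host;
    return $host;
}

package main;

# In a unit test, swap the wrapper for one that records the code and dies.
# The original is restored when the block ends.
my $exitCode;
{
    no warnings 'redefine';
    local *MyUtils::exitApplication = sub {
        $exitCode = $_[0];
        die "exitApplication called\n";
    };
    eval { MyLibrary::requireHost(undef) };
}
print "library tried to exit with code $exitCode\n";
```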

5. Code conventions.

  - Module-level variables in upper case, with words divided by underscores.
  - Camel case function names.
  - Four-space indentation.

6. Naming function arguments.

In perl, calling a function is just calling a subroutine with a list of
arguments, which the subroutine sees as @_. Using @_ directly makes the code
hard to read. Naming the arguments makes it a bit easier:

  sub getVespaModel { # (ConfigServerHost, ConfigServerPort)
    return Json::parse(Http::get("http://$_[0]:$_[1]/foo"));
  }

  sub getVespaModel { # (ConfigServerHost, ConfigServerPort) -> ObjTree
    my ($host, $port) = @_;
    return Json::parse(Http::get("http://$host:$port/foo"));
  }

In the latter example it is easier to read the code.

I usually add the argument comment to function declarations so that they look
better with vim folding. When I fold functions in vim, the second
getVespaModel above shows up as:

+-- 4 lines: sub getVespaModel (ConfigServerHost, ConfigServerPort) -> ObjTree

Using such a convention it is thus easier to read the code, as you may be able
to see all your other function declarations while working on the function you
have expanded.
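Copying the arguments into named variables also protects the caller: the
elements of @_ are aliases to the caller's variables, so assigning to $_[0]
changes the variable that was passed in, while assigning to a named copy does
not. A small sketch with hypothetical function names:

```perl
use strict;
use warnings;

sub lowerCaseInPlace { # (Host)  Assigning to $_[0] writes to the caller's variable
    $_[0] = lc $_[0];
    return;
}

sub lowerCased { # (Host) -> Host  Works on a named copy
    my ($host) = @_;
    $host = lc $host;
    return $host;
}

my $host = 'MyHost.Example.COM';
my $lowered = lowerCased($host);
my $afterCopy = $host;            # Unchanged by lowerCased()
lowerCaseInPlace($host);          # Changes $host itself
print "copy: $lowered, original after copy: $afterCopy, after in-place: $host\n";
```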

6b. Functions with many arguments.

If you create functions with loads of parameters, you can end up with a messy
function, and a hard time adjusting all its callers when you want to extend
it. In such cases you may pass a hash reference to name the arguments, so that
their order no longer matters:

  sub getVespaModel { # ({ host, port }) -> ObjTree
    my $args = $_[0];
    return Json::parse(Http::get("http://$$args{'host'}:$$args{'port'}/foo"));
  }

  getVespaModel({ 'host' => 'myhost', 'port' => 80 });

Using this trick, you can have defaults for various arguments that callers who
don't care about them can leave out, rather than having to pass undef at many
positions to keep the parameters in the right order.

Note, however, that this looks a bit messier in the function itself, and it
makes it more important to document which arguments are handled and which
ones are mandatory. I prefer to keep argument lists short instead.
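The defaults can be handled by listing them before the caller's hash when
building a local hash, since later keys override earlier ones. A sketch;
configServerUrl and its default values are hypothetical, not part of the
tools:

```perl
use strict;
use warnings;

# Hypothetical helper: builds a config server URL from named arguments.
# 'host' is required; 'port' and 'path' have defaults picked for the example.
sub configServerUrl { # ({ host, [port], [path] }) -> Url
    my ($args) = @_;
    my %opts = (
        'port' => 8080,
        'path' => '/foo',
        %$args,               # Caller-supplied values override the defaults
    );
    defined $opts{'host'} or die "Argument 'host' is required\n";
    return "http://$opts{'host'}:$opts{'port'}$opts{'path'}";
}

my $defaultPort = configServerUrl({ 'host' => 'myhost' });
my $explicitPort = configServerUrl({ 'host' => 'myhost', 'port' => 80 });
print "$defaultPort\n$explicitPort\n";
```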

7. Constants

Sometimes you want to declare constants, valid flag values for instance. You
can of course just declare global variables, but you have no way of ensuring
that they never change, which can be confusing. To define constants you can do
the following:

  use constant MY_FLAG => 8;

This constant is referred to without the usual $ prefix too, so it is easy to
distinguish it from variables. These constants can also be exported, enabling
you to create function calls like:

  MyModule::foo("bar", OPTION_ARGH | OPTION_BAZ);

This of course pollutes the caller's namespace again, so callers have to
explicitly leave them out of the import if they would otherwise cause a name
clash.
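Flag constants like the ones above are typically powers of two, so they can be
combined with | and tested with &. A sketch where MyModule, foo() and the
OPTION_* values are hypothetical, and the caller uses the fully qualified names
instead of importing them:

```perl
use strict;
use warnings;

package MyModule;    # Hypothetical module

# Each flag is a separate bit, so flags can be combined with '|' and
# tested with '&'.
use constant OPTION_ARGH => 1;
use constant OPTION_BAZ  => 2;
use constant OPTION_QUUX => 4;

sub foo { # (Name, Flags) -> Description
    my ($name, $flags) = @_;
    my @set;
    push @set, 'ARGH' if $flags & OPTION_ARGH;
    push @set, 'BAZ'  if $flags & OPTION_BAZ;
    push @set, 'QUUX' if $flags & OPTION_QUUX;
    return "$name: " . join(',', @set);
}

package main;

# Without exporting, the caller uses the fully qualified constant names.
my $description = MyModule::foo("bar", MyModule::OPTION_ARGH | MyModule::OPTION_BAZ);
print "$description\n";
```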

8. Libraries not in search path

Sometimes people install perl libraries in non-default locations. If this is
temporary, you can work around it by adding the directory to the PERLLIB
environment variable when running the command. If it is permanent, the
recommended way to find the libraries is to add the directory to the search
path where you include them, as with the Yahoo installation of the JSON
library:

  use lib "$ENV{'VESPA_HOME'}/lib64/perl5/site_perl/5.14/";
  use JSON;
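Since 'use lib' runs at compile time, any path computed from the environment
must be ready by then. The sketch below does this in a BEGIN block; the
'/opt/vespa' fallback for an unset VESPA_HOME is an assumption made only for
the example.

```perl
use strict;
use warnings;

# The path is computed in BEGIN so it is available when 'use lib' runs.
# The '/opt/vespa' fallback is an assumption for this sketch.
my $LIB_DIR;
BEGIN {
    my $vespaHome = defined $ENV{'VESPA_HOME'} ? $ENV{'VESPA_HOME'} : '/opt/vespa';
    $LIB_DIR = "$vespaHome/lib64/perl5/site_perl/5.14";
}
use lib $LIB_DIR;

# Modules loaded with 'use' after this point are also searched for in
# $LIB_DIR.
print "Searching $LIB_DIR: ", ((grep { $_ eq $LIB_DIR } @INC) ? 'yes' : 'no'), "\n";
```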

9. Perl references

In perl you can create references to variables by prefixing a backslash '\'.

  my @foo ; my $listref = \@foo;
  my $var ; my $scalarref = \$var;
  my %bar ; my $hashref = \%bar;

You can also create references to anonymous arrays and hashes directly:

  my $listref = [ 1, 2, 4 ]; # [] instead of () to get ref instead of list.
  my $hashref = { 'foo' => 3, 'bar' => 'hmm' }; # {} instead of ()

To check what a variable is you can use the ref() function:

  ref($scalarref) eq 'SCALAR'
  ref($listref) eq 'ARRAY'
  ref($hashref) eq 'HASH'
  ref($var) eq ''   # Not a reference: ref() returns an empty string

To dereference a reference, wrap it in a block with the matching sigil:

  my @foo = @{ $listref };
  my %bar = %{ $hashref };
  my $scalar = ${ $scalarref };

If the expression inside the braces is a simple scalar variable, you can also
omit the braces:

  my $scalar = $$scalarref;
  my %bar = %$hashref;
  my $value = $$hashref{'foo'};

You can also dereference using the -> operator:

  my $value = $hashref->{'foo'};
  my $value2 = $listref->[3]; # Element at index 3, i.e. the fourth element

The -> operator is typically used when traversing object structures.
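The pieces above can be put together in one runnable sketch. It shows what
ref() returns, that changes made through a reference are visible in the
original variable, and how -> walks a nested structure. The data is made up
for the illustration.

```perl
use strict;
use warnings;

my @list = (1, 2, 4);
my %hash = ('foo' => 3, 'bar' => 'hmm');
my $var = 'text';

my $listref = \@list;
my $hashref = \%hash;
my $scalarref = \$var;

# ref() names the kind of reference, and returns '' for a plain scalar.
my @kinds = map { ref($_) } ($scalarref, $listref, $hashref, $var);

# A reference points at the original variable, so changes made through it
# show up in the variable itself.
push @$listref, 8;
$hashref->{'baz'} = 'new';

# Nested structures are built from references and walked with '->'.
my $tree = { 'nodes' => [ { 'name' => 'a' }, { 'name' => 'b' } ] };
my $secondName = $tree->{'nodes'}->[1]->{'name'};

print join(',', map { $_ eq '' ? "''" : $_ } @kinds), "\n";
print "list: @list, baz: $hash{'baz'}, second node: $secondName\n";
```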

10. Perl structs

Perl object programming requires some blessing and doesn't look that awesome,
so I mostly program in a functional style. However, at the bare minimum one
needs to be able to create some structs to contain data that isn't bare
primitives.

Perl's Class::Struct module implements a way to define structs in a simple
fashion, without needing to know how bless, module inheritance and so forth
work.

An example use case here is Yahoo::Vespa::ClusterState:

  use Class::Struct;

  struct( ClusterState => {
    globalState => '$',
    distributor => '%',
    storage => '%'
  });

  struct( Node => {
    group => '$',
    unit => 'State',
    generated => 'State',
    user => 'State'
  });

  struct( State => {
    state => '$',
    reason => '$',
    timestamp => '$',
    source => '$'
  });

# Some file using it.

  use Yahoo::Vespa::ClusterState;

  my $clusterState = ClusterState->new;
  $clusterState->globalState('UP');
  my $node = Node->new;
  $node->group('Foo');
  $clusterState->distributor('0', $node);

  ...

  my $group = $clusterState->distributor->{'0'}->group;
  my $nodetype = 'storage';
  my $storageGroup = $clusterState->$nodetype->{'0'}->group;

Some notes:
  - The struct names become global package names, so you don't need to worry
    about prefixing or aliasing, but be aware that they can collide with names
    used by the caller.
  - $, % or @ indicates whether the element holds a scalar, a hash or a list. A
    class name instead means the element holds an object of that class, such as
    another struct.
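A self-contained sketch using trimmed copies of the structs above. It shows
constructor initializers, the one-argument hash accessor, selecting an
accessor through a variable, and the type check on a class-typed element. The
field values are made up.

```perl
use strict;
use warnings;
use Class::Struct;

# Trimmed versions of the structs above.
struct( State => {
    state => '$',
    reason => '$',
});

struct( Node => {
    group => '$',
    user => 'State',
});

struct( ClusterState => {
    globalState => '$',
    distributor => '%',
    storage => '%',
});

# Constructors accept initial values as name => value pairs.
my $clusterState = ClusterState->new(globalState => 'UP');
my $node = Node->new(
    group => 'Foo',
    user => State->new(state => 'DOWN', reason => 'maintenance'),
);
$clusterState->storage('3', $node);   # Set hash element '3'

# With one argument, a hash accessor returns the element for that key.
my $group = $clusterState->storage('3')->group;

# The accessor name can come from a variable.
my $nodetype = 'storage';
my $userState = $clusterState->$nodetype('3')->user->state;

# A class-typed element refuses values that are not objects of that class.
my $rejected = eval { $node->user('not a State'); 1 } ? 0 : 1;

print "group: $group, user state: $userState, rejected: $rejected\n";
```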