Monday 31 August 2020

Mutant algorithms

The word "algorithm" has caused a storm in recent UK news. Due to COVID-19, school children were not able to sit their exams. This left 16- and 18-year-olds waiting to see how they would be assessed, with obvious implications for their academic or career futures. As you may know, the grades were awarded based on an "algorithm", which our Prime Minister later described as mutant. According to BBC News, he said a "mutant algorithm" caused the exam chaos. This raises the question: what does our PM think a mutant algorithm is?

The news in the UK has talked generally about the algorithm's inputs being course work and teachers' estimated grades. These are "mutated" (or adjusted) by the algorithm to take into account a school's performance over the last three years. This means schools whose pupils sometimes struggle are more likely to be down-graded. The precise details are buried in a 319-page report. Feel free to read it all and report back. TL;DR: private and public schools tend to get higher grades than government-run schools, so poorer pupils tended to get down-graded and richer pupils did not. Some form of mutation, or even perversion, perhaps of justice, but not in the algorithm.

Now, some algorithms do use mutation. In fact genetic algorithms rely on mutation to seek out new solutions to problems. This is guided by a fitness function, to check the "mutant algorithm" is doing what's required. You can test such algorithms to see what they do, and keep an eye on them as they run to check they are heading the right way. You frequently spend a long time tuning parameters to get better results. This, on the face of it, has nothing whatsoever to do with the "mutant algorithm" our PM was talking about. 
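
To be concrete about what mutation means there, here's a toy sketch in Python: flip the occasional bit and let a fitness function decide whether the mutant is an improvement. The bit-string problem and the numbers are made up for illustration.

import random

def fitness(bits):
    return sum(bits)                    # how many bits are switched on

bits = [random.randint(0, 1) for _ in range(20)]
for generation in range(200):
    # mutation: each bit has a small chance of being flipped
    mutant = [1 - b if random.random() < 0.05 else b for b in bits]
    if fitness(mutant) >= fitness(bits):
        bits = mutant                   # keep the mutant only if it's no worse
print(fitness(bits), "bits on out of", len(bits))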

There has also been a hint of a slur on the programmers who wrote the algorithm, suggesting the idea was good and proper but the naughty programmers took it upon themselves to do something completely different that got out of hand, like a Marvel movie. Think Magneto (naughty programmers) versus Charles Francis Xavier (sensible people, like our PM? Go figure). I am sick of programmer bashing and the general misunderstanding of algorithms.

Where a genetic algorithm uses mutation, or a Monte Carlo simulation uses random numbers as input, it is still possible to test that the algorithm is doing what you require. Programmers should never abdicate responsibility for what they have built. However, it is highly irresponsible of the news to allow propaganda and misrepresentations to flourish like this.

A while ago, the Imperial College model for COVID-19 was open-sourced. At the time many people raised bug reports against it. One rumour suggested that running it twice with the same seed for the pseudo-random numbers would produce different results. Now, that might be described as a "mutant algorithm", but we'd usually describe this as buggy code. I don't believe our PM has the technical know-how to spot buggy code, but I'm willing to help him out if he wants. I'm also willing to be interviewed by the BBC to explain some of these technical issues in more detail, if they are interested. Or I could find other technical people who could equally well help out.
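
The kind of check that would catch that is tiny. Here's a sketch, with a hypothetical run_model standing in for the real simulation:

import random

def run_model(seed):
    rng = random.Random(seed)
    return [rng.random() for _ in range(5)]    # stand-in for the real simulation

# the same seed should give exactly the same results, every time
assert run_model(42) == run_model(42)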

DM me.

https://twitter.com/fbuontempo


 



Sunday 29 March 2020

Can a decision tree tell us about wine categories?

I previously wrote an overview showing how decision trees work: http://buontempoconsulting.blogspot.com/2019/07/decision-trees-for-feature-selection.html

This time, let's build a decision tree with some data. There are many freely available data sets used to explore machine learning, such as the Iris dataset in the UCI repository.

So let's try another one: the so-called wine dataset. This has three types of wine, with 13 attributes. Though many blogs list the attributes, I have been unable to find out what these three mystery types of wine are. They are three different Italian cultivars, but I have no idea which.

Rather than concentrating on building a decision tree to accurately categorise the wine, giving us a way to predict the type of another wine based on some or all of the 13 attributes, let's build a tree and see what it says.

These data sets are so common, they can be loaded directly from many machine learning packages, such as the Python module sklearn. This also has a DecisionTreeClassifier.

So,

from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

data = load_wine()
X = data.data
y = data.target
estimator = DecisionTreeClassifier(max_depth=2)
estimator.fit(X, y)
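
As a quick sanity check that the shallow tree has learnt something, you can score it on the data it was trained on. Not a proper evaluation, but enough to spot a wiring mistake:

print(estimator.score(X, y))   # should be well above the roughly 0.4 you'd get by always guessing the biggest class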

We asked for a maximum depth of 2, otherwise it makes a tree as deep (or high) as required to end up with leaves that are "pure" (or as pure as possible), meaning each leaf holds wines of a single category. Limiting the depth means it won't get as deep, or wide. But the first few layers will still show us which attributes are used to split up the data.

I say, "show", but we need to see the tree it's made. There are various ways to do this, but I'll use this:

from sklearn import tree
from IPython.display import SVG
from graphviz import Source
from IPython.display import display

graph = Source(tree.export_graphviz(estimator, out_file=None
   , feature_names=data.feature_names, class_names=['0', '1', '2']
   , filled = True))
display(SVG(graph.pipe(format='svg')))

Unfortunately, I've had to stick with class names, i.e. wine categories, of 0, 1 and 2, because I have no idea what they really are.

This generates the following picture:


The first line at each node tells you the attribute and the cut-off point chosen. For example, any wine with proline less than or equal to 755 goes down the left branch. The gini index is the measure used to decide which attribute, or feature, to split on. If you look up the decision tree classifier, you'll find other measures to try. The samples line tells you how many wines reached that node. We start with 178 wines, 71 of them in class 1 and fewer in each of the other classes, so the first node reports class 1.
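
As an aside, the gini value reported at the root follows directly from the class counts, which are 59, 71 and 48 if I've remembered the wine data correctly:

counts = [59, 71, 48]
total = sum(counts)
print(1 - sum((c / total) ** 2 for c in counts))   # roughly 0.66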

For proline less than or equal to 755, we have 111 samples, still mostly in class 1. For proline greater than 755 we have 67 samples, mostly in class 0. These 67 samples can then be split on flavanoids. Anything less than 2.165 is class 2, according to this tree. Anything greater is class 0. We do have some class 2 wine down the left-most branch as well, though. Curious, I had a brief wander round the internet to read about flavanoids in wine.
Wikipedia says

In white wines the number of flavonoids is reduced due to the lesser contact with the skins that they receive during winemaking.

Is class 2 white wine? Who knows. It could be. The decision tree made this stand out far more clearly than looking directly at the input data.
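
You can check the flavanoid hunch directly from the loaded data. Carrying on from the code above, and assuming the sklearn copy of the data calls the feature 'flavanoids':

idx = list(data.feature_names).index('flavanoids')
for wine_class in range(3):
    print(wine_class, X[y == wine_class, idx].mean())
# if the tree is right, class 2 should come out with the lowest average flavanoids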


I've put the code in a gist if you want to play around with it:
https://gist.github.com/doctorlove/bf6e42658d5806a61669a844b885983b

I think I've included everything here though.



I was planning on giving this as a lightning talk at the ACCU conference, but since it was cancelled this year, because of COVID-19, I wrote this short blog instead. If you can figure out what the types of wine are, get in touch.




Thursday 19 September 2019

Swarm algorithms

I wrote a book about genetic algorithms and machine learning. You can buy it here.




Apart from genetic algorithms and other aspects of machine learning, it includes some swarm algorithms. Where a genetic algorithm mixes up potential solutions by merging some together and periodically mutating some values, swarm algorithms can be regarded as individual agents collaborating, each representing a solution to a problem. They can work together in various ways, giving rise to a variety of swarm algorithms.

The so-called particle swarm algorithm can be used to find optimal solutions to problems. It's commonly referred to as particle swarm optimisation, or PSO for short. PSO is often claimed to be based on the flocking behaviour of birds. Indeed, if you get the parameters right, you might see something similar to a flock of birds. PSO is similar to colony algorithms, which are also nature-inspired and also have agents collaborating to solve a problem.

Suppose you have some particles in a paper bag, say somewhere near the bottom. If they move about at random, some might get out of the bag in the end. If they follow each other, they might escape, but more likely than not, they'll hang round together in a gang. Provide a fitness function to encourage them, though, and they can learn, for some definition of learn. Each particle can assess where it is, and remember the better places. The whole swarm will have a global best too. To escape a paper bag, we want the particles to go up. By inspecting the current (x, y) position, the fitness score can be the y-value. The bigger, the better. For real world problems, there can be many more than two dimensions, and the fitness function will require some thought.

The algorithm is as follows:

    Choose n
    Initialize n particles randomly
    For a while:
        Update best global position
        Move particles
        Update each particle's best position and velocity

The particles' personal bests and the overall global best give the whole swarm a memory, of sorts. Initially, this is the starting position for each particle. In addition to the current position, each particle has a velocity, initialised with random numbers. Since we're doing this in two dimensions, the velocity has an x component and a y component. To move a particle, update each coordinate of its position by adding the velocity, v, in that direction:

    x_{t+1} = x_t + v_{x,t}
    y_{t+1} = y_t + v_{y,t}

Since the velocity starts at random, the particles move in various directions to begin with. The trick comes in when we update the velocity. There are several ways to do this. The standard way adds, for each particle, a proportion of the current velocity, kinda remembering where it was heading, plus a fraction of the distance to its personal best and a fraction of the distance to the global best. This gives each particle momentum along a trajectory, making it veer towards somewhere between its best spot and the global best spot. You'll need to pick the fractions. Using w, for weight, since we're doing a weighted sum, and c1 and c2 for the other proportions, we have:

    v_{x,t+1} = w·v_{x,t} + c1(p_{x,t} - x_t) + c2(g_{x,t} - x_t)

If you draw the particles moving around you will see them swarm, in this case out of the paper bag. 
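
Here's a minimal sketch of that loop in Python. The bag dimensions, the parameter values and the random scaling of each pull (which full PSO implementations usually include) are my own choices, so treat it as a starting point for tuning rather than the definitive version:

import random

W, C1, C2 = 0.7, 1.5, 1.5                 # weight and proportions - tune these
BAG_TOP = 10.0                             # particles start near the bottom of the bag

class Particle:
    def __init__(self):
        self.x, self.y = random.uniform(4.0, 6.0), random.uniform(0.0, 1.0)
        self.vx, self.vy = random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)
        self.best = (self.x, self.y)       # best position seen by this particle

def fitness(position):
    return position[1]                     # bigger y is better

particles = [Particle() for _ in range(20)]
global_best = max((p.best for p in particles), key=fitness)

for step in range(100):
    for p in particles:
        # move by the current velocity
        p.x += p.vx
        p.y += p.vy
        # update the memories: personal best, then global best
        if fitness((p.x, p.y)) > fitness(p.best):
            p.best = (p.x, p.y)
        if fitness(p.best) > fitness(global_best):
            global_best = p.best
        # steer towards the personal and global bests, keeping some momentum
        p.vx = W * p.vx + C1 * random.random() * (p.best[0] - p.x) + C2 * random.random() * (global_best[0] - p.x)
        p.vy = W * p.vy + C1 * random.random() * (p.best[1] - p.y) + C2 * random.random() * (global_best[1] - p.y)

escaped = sum(p.y > BAG_TOP for p in particles)
print(f"best height {global_best[1]:.2f}, {escaped} of {len(particles)} particles out of the bag")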

This is one of many ways to code your way out of a paper bag covered in my book. When a particle solves the problem, here by getting out of the bag, inspecting its x and y values gives you the solution. PSO can be used for a variety of numerical problems. It's usually described as a stochastic optimisation algorithm. That means it does something random (stochastic) to find the best (optimal) solution to a problem. You can read more here.






Tuesday 16 July 2019

Decision trees for feature selection

I asked twitter who is using decision trees and what for. Most were using them, unsurprisingly, to make decisions. It wasn't always clear how the trees themselves were built.

If you are armed with data, so that each row has some features and a category - either yes/no, or one of many classes - you can build a classifier from the data. There are various ways to decide how to split up the data. Nonetheless, each algorithm follows the same overall process. Start with a root node holding all the data, and add nodes holding part of the data.

Then

  1. Pick a feature
  2. Split the data set, some to the left branch, some to the other branch (or branches) depending on the value of the feature
  3. If all the data at a node is in the same category (or almost all in the same category), form a leaf node
  4. Continue until each node is a leaf node.



This is a bit like a sorting algorithm: in quick sort, you choose a pivot value and split the data down one branch or the other, until you have single points at nodes. Here we don't choose a pivot value but a feature. The choice of feature can be based on statistics, information theory or even made at random. At each step, you want to know whether all the items in one category tend to have the same value, or range of values, for a feature. Once you are done you have a tree (or flow chart) you can apply to new data. Each way to split has various pros and cons. You can even build several trees. A random forest builds lots of trees and they vote on the class of new, unseen data. You could build your own voting system, using a variety of tree induction techniques. This might avoid some specific problems, like the over-fitting some techniques are prone to.
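
To make the process above concrete, here's a rough sketch of the induction step using the gini measure. It assumes numeric features and small integer class labels, and it is deliberately simplistic rather than something you'd use in production:

import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # try every value of every feature as a cut-off, keep the purest weighted split
    best = None
    for feature in range(X.shape[1]):
        for value in np.unique(X[:, feature]):
            left = X[:, feature] <= value
            if left.all() or (~left).all():
                continue
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if best is None or score < best[0]:
                best = (score, feature, value)
    return best

def build(X, y, depth=0, max_depth=2):
    split = None if depth == max_depth or len(np.unique(y)) == 1 else best_split(X, y)
    if split is None:
        return {"leaf": int(np.bincount(y).argmax())}      # majority class
    _, feature, value = split
    mask = X[:, feature] <= value
    return {"feature": feature, "value": value,
            "left": build(X[mask], y[mask], depth + 1, max_depth),
            "right": build(X[~mask], y[~mask], depth + 1, max_depth)}

print(build(np.array([[1.0], [2.0], [3.0], [4.0]]), np.array([0, 0, 1, 1])))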

You can use decision tree induction in a variety of places, even if you don't want a full decision tree or rule set. A rule set is a tree written out as a sequence of if statements. For example,

If currency is USD then data goes missing.

If you are moving a data source from one provider to another, and some data goes missing, can you spot what the missing items have in common? You could do a bit of manual investigation, say using pivot tables and data filters in Excel. However, a decision tree might find common features far more quickly than you can. This is a form of feature selection - using an algorithm to find pertinent features. Find a decision tree implementation in a language you know, or write one yourself, and have an experiment.
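
Here's what that might look like with sklearn. The data and the column names are invented for the sake of the example, with the missing rows rigged to line up with the currency, but the shape of the investigation carries over:

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 500
is_usd = rng.integers(0, 2, n)              # 1 if the row's currency is USD
amount = rng.uniform(0, 1000, n)            # an unrelated feature
went_missing = is_usd                       # in this toy data, USD rows always go missing

X = np.column_stack([is_usd, amount])
tree = DecisionTreeClassifier(max_depth=2).fit(X, went_missing)
print(export_text(tree, feature_names=['is_usd', 'amount']))
# the first split should be on is_usd, pointing straight at the common factor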

My book, Genetic Algorithms and Machine Learning for Programmers, has a chapter explaining how to build one type of decision tree. Take a look. Lots of machine learning frameworks also have tools to help you build decision trees. Next time you want to know what certain things have in common, try a decision tree and see what you learn. Machine learning is often about humans learning, rather than the machines.








Friday 3 May 2019

Log driven development

Everyone knows attempting to figure out what's happening by resorting to print statements is desperation. Everyone knows you should use TDD, BDD or some other xDD to write code well, don't they? Well, I am desperate. I suggest PDD, print driven development: slow and error prone, but sometimes helpful.

I recently shoe-horned a new feature into a large existing code base. It's hard to make it run just a small piece of work, and I wanted to check that a config change created my new class via the dependency injection framework. Now, I could put in a breakpoint and wait a while for it to get hit, or not. Fortunately, the code writes logs. By adding a log line to the constructor, I could tell much more quickly that I'd got the config correct. Write the code, leave it running, have a look later. I am tempted to name this Log Driven Development.
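
The details will vary with the codebase, but the idea fits in a couple of lines. A sketch in Python, with made-up class and logger names:

import logging

logger = logging.getLogger("my_new_feature")

class MyNewFeature:
    def __init__(self, config):
        # one log line is enough to confirm the wiring actually created the class
        logger.info("MyNewFeature constructed with config %s", config)
        self.config = config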

On the one hand, adding the moral equivalent of print statements is a bad thing. Teedy Deigh mused on logging in Overload 126, saying

It is generally arbitrary and unsustainable and few people are ever clear what the exact requirements are, so it spreads like a cat meme.

And yet, if I add informative logging as things happen, I can see how the whole system hangs together. Jonathan Boccara mentioned finding logs when trying to understand an unfamiliar code base in his talk at the ACCU Conference. Existing logs can help you understand some code. Looking at this from a different direction raises the question: are my logs useful? As a first step, seeing a log entry when I wired in my class felt like a win. It told me what I wanted to know. I must make sure the logs I write out as my class starts doing something useful are themselves useful.

This is a step in the direction of a walking skeleton style approach - get things running end to end, outside-in, then fill in the details. I'm much more comfortable with a TDD approach, but keeping an eye on the whole system and how it hangs together is important. Building up the new feature by writing useful logs means the logs will be useful when it's released. LDD - log driven development. Why not?















Wednesday 17 April 2019

ACCUConf 2019

The ACCU conference happened in Bristol again this year. For the first time ever, I was at a workshop. In fact, I ran a workshop, with Chris Simons. We talked about Evolutionary Algorithms in practice. We gave a 90 minute talk later in the week, using the same Java framework (JCLEC), with slides here. High level summary: can your computer solve problems by making a few random guesses and iteratively improving, using crossover (merging together previous attempts) and mutation (nudging things up or down, or flipping bits)? The answer is yes: we managed to turn on all the bits in an array, code our way out of a paper bag (by firing virtual cannon balls based on an old Overload article), and finally, in the 90 minute workshop, we generated code for Fizz Buzz.

The full schedule is here. Having five tracks meant I missed lots, but talks will appear on the YouTube channel over time. I'll just give brief notes on a handful of talks I attended. The opening keynote was "Delivering software that is secure and usable - who’s job is it?" by M Angela Sasse. Angela called out StackOverflow as being functionally great but giving bad security advice, in contrast to the official manuals, where the security advice is great but the functional help is worse. This was based on measuring several developers attempting to use a software product. How can you actually measure security or usability? How are you currently measuring it? Mention was made of hard-to-follow security rules, which people tend to work around. Angela called for a way to reprogram the security experts. How good are they at conflict resolution? Do they have social marketing skills? Twitter devolved into quips about social engineering at that point. My final note says,
Programmers are tribal and seek approval. Try to trust and collaborate instead.

Next I'll mention "10 Techniques to Understand Code You Don’t Know" by Jonathan Boccara. He's written a book, which I've seen several people recommend. The 10 techniques fell into three groups: explore, speed read, and detail.
Exploring covered

  • using and finding the I/O frameworks, 
  • performing local analysis - getting the hang of one or two important functions, 
  • analysing call stacks to join the dots between modules. 

Speed reading covered

  • reading the end first - don't be put off by a long function, find the output or returns and worry about the rest later,
  • find frequent words, both count and span (total and lines with words)
  • filter on flow control - giving something like a table of contents for a book
  • scan for the main action - feel free to ignore catch blocks or elses, focus on the happy path

Finally, you sometimes need to start going into detail

  • try scratch refactoring
  • practice writing functions in the code
  • team up - strive for pair understanding

There was a discussion about flame graphs at the end, and he mentioned "How to read a book: the classic guide to intelligent reading" by Adler and Van Doren. This points out that you don't need to read a non-fiction book in order. Jump around, follow back links, jump straight in to what you want to learn. Very non-linear.

Next, I'll talk about "The anatomy of an exploit" by Patricia Aas. She started by mentioning the weird machine. You can see most programs as a finite state machine. An exploit jumps out of the finite states into other, unintended states. She looked at CWE-242, a list of potentially dangerous functions. CWE is the Common Weakness Enumeration, available online, listing things to avoid. Her talk pulled on things that might go wrong with gets or std::cin. Surprisingly, you get more warnings from C than C++. By disabling warnings, one at a time, we ended up with code to get a prompt. Once you have a shell on another machine, you can then do a variety of nefarious things. She covered loads of things including ASLR (address space layout randomisation), heap grooming and use after free. Security was a definite theme at this conference, and many developers understand far too little about it.

Herb Sutter gave Thursday's keynote on "De-fragmenting C++: Making exceptions more affordable and usable". He called out a divide between teams who can and who cannot use exceptions. Many libraries have a mix of exceptions and return codes. He said "Pick a lane". C++ is supposed to be zero overhead:

  • a feature only costs if it's used
  • it's better than coding it yourself

This is not true for exceptions. He considered the difference between program-recoverable and non-recoverable errors. What can you do about stack exhaustion, for example? Who do you report problems to? Humans or code? Exceptions are automatic (a good thing) and invisible (a bad thing). He sketched out ways we could make exceptions have zero overhead. Watch this space.

Anthony Williams then talked about async, executors and callbacks: "Here’s my number; call me, maybe. Callbacks in a multithreaded world". He called out a few things to be aware of. Does the order of your callbacks matter? Can you deregister them? He encouraged us to capture by value, rather than reference, unless we have a really good excuse.

At lunchtime, there was a book signing. I sold several copies of my book, "Genetic Algorithms and Machine Learning for Programmers". Three other authors, Anthony Williams, Ivan Cukic and Jonathan Boccara, were also selling books, but I didn't get a chance to go and talk to them. Thanks to ACCU for the chance to do this. I put mine in paper bags, and even wrote a receipt on one. The chapters in mine show how to code your way out of a paper bag, so it seemed sensible.



I gave a session with Chris Simons about how to teach your computer to code Fizz Buzz. We plan to write this up for ACCU's Overload magazine shortly.

On Friday, Paul Grenyer gave the keynote. He reminisced about people he'd met when he was an ACCU member, and all the things he'd done, some that worked and some that didn't, in Norwich, to grow the tech network. There's now a background discussion on accu-general and accu-members email lists about how to revive some things we used to do, and find new things to do, that will be valuable to the group. I'd love to see the mentored developers reboot.

Next, I went to "Interactive C++ : Meet Jupyter / Cling - The data scientist’s geeky younger sibling" by Neil Horlock. He talked about Code Club, and teaching people. This led nicely into using Cling/Jupyter to have notebooks for C++. Cling is an interactive (JITted) version of clang. I can't do his talk justice here. It was amazing. It managed to cope with templates, and a variety of things that blew my mind. He demonstrated using RISE to make RevealJS slides from a notebook, so I think I was watching a talk in a talk in a talk.

My notes have run out at this point. At the speakers' dinner we met EchoBorg. An actor (an echoborg) voiced the words of a chatbot. People volunteered to be interviewed to become an echoborg themselves. This set off cyberpunk-style sci-fi in my head. Again, I won't do it justice, but watching the conversation develop was incredibly interesting. Have a look at their websites:



I went to two talks on Saturday: "Windows Native API" by Roger Orr and
"Best practices when accessing Big Data or any other data!" by  Rosemary Francis. I was too tired to make notes by that point, and we left early, since we had a three hour drive home. Rog considered several ways to return 42 from a program and showed several steps that happen before main; something people don't always consider. He touched on security too. I didn't stay to the end of Rosemary's, but she was talking about tooling her company has developed to watch programs using big data and tracing bottlenecks. In my opinion, many data scientists make mistakes some programmers might avoid. Her first example was opening and closing a file in a loop. I wish machine learners and programmers could talk to each other more and help each other out.

I had a great conference, and the includecpp crew were there. I dipped in and out of the discord chat. It's lovely to see people supporting each other. Simple things, like chats about where to go for dinner.

Echoborg has left dystopian Sci-Fi short stories brewing in my head, and the Jupyter/cling talk left me with lots to explore. Thanks ACCUConf. Hope to be there next year.



Tuesday 12 March 2019

The ACCU's Overload magazine

ACCU is an organisation for programmers. Its original focus was C and C++, but now members use a variety of languages, talk about testing and process and how to keep learning. ACCU holds an annual conference in the UK, attended by people from around the world. There's even a YouTube channel of recorded talks from this.

As a member you get a discount for the conference, can volunteer to do book reviews, can participate in study groups, though these have been quiet lately, and get two magazines: the CVu members' magazine and Overload, which is open to anyone. There are also several local groups if you want to come along and meet us.

I've been a member for several years now. It's been a great networking opportunity and I have learnt so much from other members. I love the magazines, and by starting to write for them myself, I stepped up my game. I began by writing book reviews, then tried some of the Student Code Critiques in CVu. Eventually, I wrote an Overload article, pulling together a discussion on the accu-general mailing list about floating point numbers.

I took on the role of Overload editor in 2012. We welcome articles from non-members as well as members. They are peer reviewed, meaning the author gets feedback, questions and suggestions. A surprisingly high number of writers have gone on to write books, myself included. (I mentioned I wrote a book about genetic algorithms and machine learning, yes?)

If you have an article you'd like to get published, let me know. We do accept existing blog posts, but the review team might well ask for slight improvements. There are some submission instructions here.

We welcome established writers as well as new writers. If you've never written an article, give it a go. You can learn a lot by trying to write something up. For example, as you try to explain something you may find gaps in your knowledge and understanding. Questions and suggestions from the review team will make your article better.

I love the ACCU and am looking forward to this year's conference, in just under a month. If you can't make the conference, find a local group, or consider joining the organisation. Or submit an article for Overload. Get involved.


ACCU Home page