Sunday, 12 March 2017

Week 23: (6th March to 12th March 2017) or 'STATS by STARS'

If you spend a minute considering the most illustrious and notable roads that are stitched into the fabric of this great country, you may conclude that what makes them so eminent can seldom be traced back to the liquefied materials that gyrate in the drums of cement mixers. When the odious viscous mixture is swirling around under the watchful eye of an engineer, there is very little hint that what is about to be pasted on the ground will reach the pages of city guidebooks and construction degrees. Only the ideas which revolve in the mind of the architect can predict that. And even then, it's only ever a prediction.

Some rather brave ideas have been cemented into notable roads in recent times. Oxford Street is a spine to the country's most celebrated department stores whilst Swindon's Magic Roundabout (of roundabouts) whisks a hellish dose of confusion into millions of daily commutes. Of course, within this inventory, sit the jumbled strands of knitting yarn, often called Spaghetti Junction, which are draped in knots across the suburbs of Birmingham. Then there are roads which seem to feast on the merits of anonymity. They are the obscure, forgotten passages spun like spiders' webs across the landscape; hems to the fields, tunnels through the forests, meandering laces of mystery. They are the oldest of passage-ways, beaten by hooves, pressed by carts, flattened by tractors, nosed by dogs, and showered by centuries of weather. I prefer these lanes, and on such a lane I journeyed this week.

Aside from the occasional village signpost or community hall noticeboard, there was little in the way of evidence to suggest I was motoring through the Buckinghamshire countryside. Washed by the rain the night before, flat open fields were draped over the land as if they had just been ironed. A cluster of thatched roofs sat on the shoulders of farmyard cottages, and scaling the bricks were the usual floral mountaineers of Ivy and Virginia Creeper. Shelters gathered on grassy patches alongside the edge of roads waiting for buses. Crows sat on the telegraph lines, eavesdropping on village gossip. Travel along this same stretch of concrete and you will, as I did, skirt the border of Bedfordshire. You will, by following the road, attempt many times to enter the county, only to find yourself rejected and catapulted back into Buckinghamshire. And so the road concedes defeat and resigns itself to following the border around, and you have no choice but to join it on this peripheral journey. But with patience and optimism, you will eventually happen upon another road; an altogether more persevering route that some time ago punctured successfully through the Bedfordshire border and now leaves the gateway open for vehicular access. It is this road that I travelled upon, and it's this road that led me to Cranfield.

The story of Cranfield is a narrative that stretches across multiple volumes. Few universities hold diaries as interesting as those written by the chroniclers of this forgotten institution. I can only suggest that those interested in the intricate details consult the leaves of Bedfordshire's history books. In short, Cranfield University sits on an old RAF air base and the skeletons of those war-time years still reside to a certain extent. The Vincent Building sits under a disused aircraft hangar, whilst further across the campus, another hangar and runway are still in operation. If you take up the opportunity to go on a tour, you will inevitably happen upon a room collecting engines.


At the conclusion of the Second World War, the base was closed and the College of Aeronautics was set up as one of the UK's centres for the development of many aspects of aircraft research. In the convoluted way events tend to turn, Cranfield eventually adopted the responsibility of agricultural engineering too, which paved part of the way for it becoming an Institute of Technology. Shortly after that, the School of Management hopped aboard this wagon and what had been a technological institute now morphed into a University. And I should make this point here: a University exclusively for the minds of Postgraduate Students.

"There are many still around that miss the days we were an Institute of Technology", my tour guide pondered, gazing just over my shoulder at a British Airways plane parked on the runway. "We've only got Masters students here; it doesn't have the usual feel of the hundreds and thousands of 'undergrads' floating about...it has more of an institute feel about it."

Perhaps that is why I like it.


I have to make a confession. When I first learnt of Cranfield's existence, and journeyed there a few years ago, I left learning very little about its role in the war. That perhaps was the fault of my tour guide at the time, but I do remember the space devoted to agricultural science. I was reminded of the ornate and grandiose greenhouses which hold, I'm told, most of the state-of-the-art sensors in plant observation. The technology held behind the glass can simulate rainfall, scan roots underground, and conduct large-scale investigations into the ways plants and soil interact.

As enticing as it may be to explore the innovative wonders germinating from these greenhouses, and indeed the remainder of the UK's largest university campus, my four-day visit this week had a very different agenda. I was actually here to attend another training course, administered by the STARS programme, and convened by two statisticians. The course, in very simplistic terms (although simplicity didn't have much of a role to play), was aimed at outlining the role of statistics in soil science, (re)introducing some basic statistical concepts, and introducing the R platform. I will come to R in a moment.

The correct employment of statistics within the realms of Soil Science is not something that many can boast, according to Richard Webster, another statistician within the discipline. "Much soil research needs statistics to support and confirm impressions and interpretations of investigations in field and laboratory," he writes in a paper published in 2001. "Many soil scientists have not been trained in statistical method and as a result apply quite elementary techniques out of context and without understanding." Reading that (and the rest of his scratchings on the topic), one cannot fail to realize why the STARS programme felt that a statistics course was required.


I have thought very long and hard about how much to write about the course. I could type away with gay abandon and deliver an exhaustive (and exhausting) transcript, detailing the byzantine procedures we exercised in complex data analyses. I could rattle on for days about the mathematical formulae we cautiously applied to build our statistical models. I could guide you through the inscrutable code we used to produce various graphs and charts. But I could not put my readership through such unfathomable bewilderment and thus I will simply describe three of the most interesting sessions.

***

Beyond the panes of glass that separate a Cranfield computer room from the rest of the world, sits a great expanse of grass like a doormat for a giant. It's effectively a lawn, but on Tuesday it was a field. The task was simple. We wanted to know the average water content of this field. With a bursting anxiety to retreat into fresh air, and to reacquaint the hands with the soil, a soil scientist might leap out of his or her chair, grab a soil moisture probe and start stabbing the field to collect results. After many hours at a computer screen, such practice is tempting, but not statistically valid. One must sample randomly. One must sample methodically. When I was young, and considering similar experiments at school, we often used random number tables to select randomized co-ordinates, which we would then use to decide where to sample. Recent computational developments have made such tables redundant; random co-ordinates can now be produced at the touch of a button in a computer programme called R. (The touch of a button actually turns into quite a few buttons and a string of complex code.) Those who persevere with R will eventually obtain random co-ordinates, which can then be plotted onto a map. In our case, we used R to churn out a set of 15 random points.
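For the curious, the idea of simple random sampling can be sketched in a few lines. This is not the R code we used on the course; it is a minimal Python illustration, and the function name, the rectangular 100 m by 60 m "field" and the seed are all assumptions made for the sake of the example.

```python
import random


def random_sample_points(n, width_m, height_m, seed=None):
    """Draw n points uniformly at random inside a width_m x height_m rectangle.

    The rectangle is a hypothetical stand-in for the field; a real field
    boundary would need a proper polygon and a point-in-polygon check.
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    return [(rng.uniform(0, width_m), rng.uniform(0, height_m)) for _ in range(n)]


# Fifteen sampling points, as in the exercise; the dimensions are invented.
points = random_sample_points(15, width_m=100.0, height_m=60.0, seed=23)
for i, (x, y) in enumerate(points, start=1):
    print(f"Point {i:2d}: x = {x:5.1f} m, y = {y:5.1f} m")
```

Fixing the seed means the same fifteen points can be regenerated later, which is handy if the sampling plan ever needs to be checked or repeated.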


A revolving door whirls at the front of the Vincent Building, and it is in this orbiting mechanism that a statistician turns back into a pedologist. Emerging out onto the field of grass, my colleague and I proceeded to measure the moisture content of the surface inch of soil. In my hand was a GPS device, which sent us trooping to each of our 15 randomly selected points. In my colleague's hand was a soil Theta probe which measures soil moisture.


Equipped with fifteen measurements, we processed ourselves back through the revolving doors, turning back into statisticians and ready to analyse our findings. We were to learn much more about our 'field'. We were to learn that the average water content was 47.6%; that the minimum content was 39.5% and the maximum was 54.4% and that the water content this year was on average 12% higher than that measured last year by the first cohort of STARS.
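The summary itself is simple arithmetic. The sketch below is in Python rather than R, and the readings in it are invented placeholders, not our field data. The standard error is my own addition, included only as a rough hint of how uncertain a mean of fifteen readings is.

```python
import statistics


def summarise(readings):
    """Return basic summary statistics for a list of moisture readings (%)."""
    n = len(readings)
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)   # sample standard deviation
    return {
        "n": n,
        "mean": mean,
        "min": min(readings),
        "max": max(readings),
        "std_error": sd / n ** 0.5,   # standard error of the mean
    }


# Placeholder values for illustration only; NOT the measurements from the field.
example_readings = [41.2, 44.8, 46.0, 47.5, 49.1, 50.3, 52.7]
summary = summarise(example_readings)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```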

The R package is useful, not only in outputting a set of random co-ordinates, but in suggesting the quickest route between sampling points. To some extent, the doctrine that 'time is money' is true even in soil science, and it is thus sensible to develop an efficient sampling plan. This includes aspects such as the journey one executes to yield the required data. In some experiments, this 'travelling salesman' tool is not needed, especially if you're sampling in a straight line. But in the case of a randomly selected range of co-ordinates, which might be scattered over a map, it can be useful to have a pre-defined route. The map of points in the bottom right of the figure below to some extent demonstrates what I mean.
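I cannot say which algorithm the R tool uses internally, so the sketch below swaps in the simplest route-planning heuristic I know of: always walk to the nearest point not yet visited. It is a Python illustration with hypothetical function names, and it gives a reasonable route rather than a guaranteed shortest one.

```python
import math


def nearest_neighbour_route(points, start=0):
    """Order points by repeatedly visiting the closest unvisited one.

    A greedy heuristic for the travelling salesman problem; it does not
    guarantee the shortest possible route.
    """
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    route = [start]
    while unvisited:
        here = points[route[-1]]
        # Ties are broken by index so the result is deterministic.
        nxt = min(unvisited, key=lambda i: (math.dist(here, points[i]), i))
        route.append(nxt)
        unvisited.remove(nxt)
    return route


def route_length(points, route):
    """Total straight-line distance walked along the route."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(route, route[1:]))


# Four made-up points along a line, given in metres.
demo_points = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
demo_route = nearest_neighbour_route(demo_points)
print(demo_route, route_length(demo_points, demo_route))
```

Visiting the demo points in the order they are listed would mean walking 23 m; the greedy route covers the same points in 10 m.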


As the week grew older, and as we skated through various statistical methods, I was beginning to feel rather giddy, as if I had become intoxicated by a cocktail of mathematics. Even the convenors, I think, realized that the four days had been very intensive. On the final day, I noted one apologizing for the "deluge of geostatistical jargon". Later on, one particular statistical test was described as a "palaver". I am not ashamed (nor do I believe I am the only PhD student) to have departed feeling like I had left the straight furrows of Soil Science to emerge into a baffling mathematical wilderness. But journey into this world we must, for the very validity of our investigations hinges upon the use of rigorous statistical methods. And as taxing as it is to titivate over the maths, we must persevere.

There is, however, a point that is worth mentioning, and it stems from a paragraph about 'regression' written by Richard Webster again, in the same article that I cited a little earlier. And it set me thinking. It follows:

"I described regression, guided authors on its correct application, and warned of its improper use in a previous article. Yet misunderstanding and abuse continue, and I do not know what more I can do to educate authors on the subject. Papers in which regression has been applied thoughtlessly continue to pour into my office, and it is no exaggeration to say that in most the regression is inadequately explained, inappropriate, unnecessary, or just plain wrong." (Richard Webster, 2001).

So, if Webster and his colleagues have reached the point at which they know of no solution to educate the authors, why do authors continue to misunderstand? Could it be that the mathematical "jargon" (as it was described this week by a statistician) clouds their comprehension? Could it be that computer software like R has reduced scientists to pure 'mouse-clickers'? Like Webster himself, I am unsure.

But perhaps being unsure is the desirable instinct when doing statistics. After all, as George Canning once said, "I can prove anything by statistics except the truth".



***
