Archive for Data Innovation Research Institute

Science for the Citizen

Posted in Education, Open Access, The Universe and Stuff on March 20, 2017 by telescoper

I spent all day on Friday on business connected with my role in the Data Innovation Research Institute, attending an event to launch the new Data Justice Lab at Cardiff University. It was a fascinating day of discussions about all kinds of ethical, legal and political issues surrounding the “datafication” of society:

Our financial transactions, communications, movements, relationships, and interactions with government and corporations all increasingly generate data that are used to profile and sort groups and individuals. These processes can affect both individuals as well as entire communities that may be denied services and access to opportunities, or wrongfully targeted and exploited. In short, they impact on our ability to participate in society. The emergence of this data paradigm therefore introduces a particular set of power dynamics requiring investigation and critique.

As a scientist whose research is in an area (cosmology) which is extremely data-intensive, I have a fairly clear interpretation of the phrase “Big Data” and recognize the need for innovative methods to handle the scale and complexity of the data we use. This clarity comes largely from the fact that we are asking very well-defined questions which can be framed in quantitative terms within the framework of well-specified theoretical models. In this case, sophisticated algorithms can be constructed that extract meaningful information even when individual measurements are dominated by noise.
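To make the point about noise-dominated measurements concrete, here is a minimal Python sketch of fitting the amplitude of a known template by least squares. It is a generic illustration, not a method taken from any particular cosmological analysis. Each simulated sample carries a signal of at most 0.5 buried in Gaussian noise with σ = 2, so no single measurement tells you much. However, because the shape of the signal is assumed known, the fitted amplitude has a standard error of σ/√(Σt²), roughly 0.009 here. The template, amplitude, noise level and random seed are all arbitrary choices made for the illustration.

```python
import math
import random

def fit_amplitude(data, template):
    """Least-squares amplitude of a known template: A = sum(d*t) / sum(t*t).

    Also returns 1/sqrt(sum(t*t)); multiply by the noise sigma to get the standard error.
    """
    num = sum(d * t for d, t in zip(data, template))
    den = sum(t * t for t in template)
    return num / den, 1.0 / math.sqrt(den)

rng = random.Random(42)          # fixed seed so the run is reproducible
n = 100_000
true_amp, sigma = 0.5, 2.0       # per-sample signal-to-noise of at most 0.25
template = [math.sin(2 * math.pi * i / 50) for i in range(n)]
data = [true_amp * t + rng.gauss(0.0, sigma) for t in template]

amp, err_unit = fit_amplitude(data, template)
print(f"A = {amp:.3f} +/- {sigma * err_unit:.3f}")
```

The key assumption is the one the text stresses: the estimate is only meaningful because the model (here, the sine template) is correct. Fit the wrong template and the same arithmetic returns a confident-looking but meaningless number.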

The use of “Big Data” in civic society is much more problematic because the questions being asked are often ill-posed and there is rarely any compelling underlying theory. A naive belief exists in some quarters that harvesting more and more data necessarily leads to an increase in relevant information. Instead there is a danger that algorithms simply encode false assumptions and produce unintended consequences, often with disastrous results for individuals. We heard plenty of examples of this on Friday.

Although it is clearly the case that personal data can be – and indeed is – deliberately used for nefarious purposes, I think there’s a parallel danger that we increasingly tend to believe that just because something is based on numerical calculations it somehow must be “scientific”. In reality, any attempt to extract information from quantitative data relies on assumptions. If those assumptions are wrong, then you get garbage out no matter what you put in. Some applications of “data science” – those that don’t recognize these limitations – are in fact extremely unscientific.

I mentioned in discussions on Friday that there is a considerable push in astrophysics and cosmology for open science, by which I mean that not only are the published results openly accessible, but all the data and analysis algorithms are published too. Not all branches of science work this way, and we’re very far indeed from a society that applies such standards to the use of personal data.

Anyway, after the day’s discussion we adjourned to the School of Journalism, Media and Cultural Studies for a set of more formal presentations. The Head of School, Professor Stuart Allan, introduced this session with some quotes from a book called Science for the Citizen, written by Lancelot Hogben in 1938. I haven’t read the book, but it looks fascinating and prescient. I have just ordered it and look forward to reading it. You can get the full text free online here.

Here is the first paragraph of Chapter 1:

A MUCH abused writer of the nineteenth century said: up to the present philosophers have only interpreted the world, it is also necessary to change it. No statement more fittingly distinguishes the standpoint of humanistic philosophy from the scientific outlook. Science is organized workmanship. Its history is co-extensive with that of civilized living. It emerges so soon as the secret lore of the craftsman overflows the dam of oral tradition, demanding a permanent record of its own. It expands as the record becomes accessible to a widening personnel, gathering into itself and coordinating the fruits of new crafts. It languishes when the social incentive to new productive accomplishment is lacking, and when its custodians lose the will to share it with others. Its history, which is the history of the constructive achievements of mankind, is also the history of the democratization of positive knowledge. This book is written to tell the story of its growth as a record of human achievement, a story of the satisfaction of the common needs of mankind, disclosing as it unfolds new horizons of human wellbeing which lie before us, if we plan our new resources intelligently.

The phrase that struck me with particular force is “the democratization of positive knowledge”. That is what I believe science should do, but the closed culture of many fields of modern science makes it difficult to argue that’s what it actually does. Instead, there is an increasing tendency for scientific knowledge in many domains to be concentrated among a small number of people with access to the literature and the expertise needed to make sense of it.

In an increasingly technologically-driven society, the gap between the few in and the many out of the know poses a grave threat to our existence as an open and inclusive democracy. The public needs to be better informed about science (as well as a great many other things). Two areas need attention.

In fields such as my own there’s a widespread culture of working very hard at outreach. This overarching term includes trying to get people interested in science and encouraging more kids to take it seriously at school and college, but also engaging directly with members of the public and institutions that represent them. Not all scientists take the same attitude, though, and we must try harder. Moves are being made to give more recognition to public engagement, but a drastic improvement is necessary if our aim is to make our society genuinely democratic.

But the biggest issue we have to confront is education. The quality of science education must improve, especially in state schools where pupils sometimes don’t have appropriately qualified teachers and so are unable to learn subjects such as physics properly. The less wealthy are becoming systematically disenfranchised through their lack of access to the education they need to understand the complex issues relating to life in an advanced technological society.

If we improve school education, we may well get more graduates in STEM areas too, although this government’s cuts to Higher Education make that unlikely. More science graduates would be good for many reasons, but I don’t think the greatest problem facing the UK is the lack of qualified scientists – it’s that too few ordinary citizens have even a vague understanding of what science is and how it works. They are therefore unable to participate in an informed way in discussions of some of the most important issues facing us in the 21st century.

We can’t expect everyone to be a science expert, but we do need higher levels of basic scientific literacy throughout our society. Unless this happens we will be increasingly vulnerable to manipulation by the dark forces of global capitalism via the media they control. You can see it happening already.

Signs of the Data Innovation Institute

Posted in Biographical on February 13, 2017 by telescoper

I’ve only been in my new office in the Data Innovation Research Institute for 5 months, so it came as a big surprise to see that they’ve already started putting up the signs telling people where we are. In fact a couple of chaps came this morning to do the necessary, and now we look very professional. It’s hard to tell that this used to be a chip shop.

[Photo: the new Data Innovation Research Institute sign outside the office]

Please don’t tell the Health & Safety people about the power cable trailing through the window!

And here’s me answering the door to strangers…

[Photo: answering the door]

Thanks to Dan Read for taking that second one.

Back to Work…

Posted in Biographical, The Universe and Stuff on January 3, 2017 by telescoper

Well, the Christmas break is over at Cardiff University and I’m back in the office of the Data Innovation Research Institute. To be honest, it’s rather quiet around here. Most staff seem to be still on holiday. There are a few students around, mainly international ones. This is actually a revision week at Cardiff University in advance of the mid-year examinations which start next week and go on for a fortnight. After that we’ll be back into teaching. I’ll be doing a Masters-level module on The Physics of the Early Universe in the forthcoming term, and I’m very much looking forward to it.

The outcomes of the annual round of consolidated grants administered by the Astronomy Grants Panel of the Science and Technology Facilities Council were announced just before Christmas, with success for some and disappointment for others. I only have anecdotal evidence from personal contacts but it seems to have been a tough round, which wouldn’t surprise me because the funding for basic scientific research in the UK has been flat in cash terms for many years now, and is gradually being eroded by inflation. It’s a tough climate, but when, in a couple of years, we lose access to all forms of EU funding, things will get even tougher…

Anyway, as new grants are announced and old ones terminated, this is a busy time of year for postdocs (who are largely funded by research grants) seeking new positions. I’ve spent most of the day so far writing references for applicants and will return to that task for a couple of hours after lunch. It’s particularly tough on those whose positions lapse at the end of March and who only learned just before Christmas that their existing funding is not going to be renewed. There’s little time in such a position to get a new job sorted, but on the other hand, new grants are starting from 1st April so there are opportunities out there. It’s not easy to respond if you have a family or other commitments, though.

Another thing that happened just before Christmas was that the Data Innovation Research Institute here at Cardiff University announced its first tranche of “seedcorn” grants to foster interdisciplinary research. These grants are quite small in cash terms but it is hoped that at least some of them will help develop substantial projects by bringing together parts of the University that haven’t previously collaborated enough. Congratulations to those whose proposals were selected, and commiserations to those who were unsuccessful.

I was pleased that my proposal – together with Professor Nikolai Leonenko of the School of Mathematics – was one of the successful bids. That means that, probably in the spring, we will be organizing a short workshop relating to the analysis and modelling of astrophysical data defined on the sphere, a topic which has interesting mathematical aspects as well as very practical implications for astronomy and cosmology. We’ll be starting to organize that soon, which adds another item to my to-do list, but it should be a fun conference when it happens.

Before you ask: yes, I do work for the Data Innovation Research Institute, but because I was an applicant I recused myself from judging the applications in case there was any perception of a conflict of interest. So there.

Most of my work between now and the start of teaching term is going to be devoted to a couple of MSc courses we’re planning to launch this year, but I’ll write more about them – and plug them shamelessly – when they’re all formally announced and ready to go!

And with that I’d better get back to work again.

Magnets, Data Science and the Intelligent Pig

Posted in Biographical, The Universe and Stuff on November 18, 2016 by telescoper

The other day I was talking to some colleagues in the pub (as one does). At one point the subject of conversation turned to the pressure we academics are under these days to collaborate more with the world of industry and commerce. That’s one of the things that the Cardiff University Data Innovation Research Institute – which currently pays half my wages – is supposed to do, but there was general consternation when I mentioned that I have in the past spent quite a long time working in industry. I am, after all, Professor of Theoretical Astrophysics. Of what possible interest could that be to industry?

My time in industry was spent at one of the research stations of British Gas, called the On-Line Inspection Centre (“OLIC”) which was situated in Cramlington, Northumberland. I started work there in 1981, just after I’d finished my A-levels and the Cambridge Entrance Examination, and I worked there for about 9 months before leaving to start my undergraduate course in 1982. At that time British Gas was still state-owned, and one of the consequences of that was that I had to sign the Official Secrets Act when I joined the staff. Among other things that forbade me from making “unauthorized disclosures” of what I was working on for thirty years. I feel comfortable discussing that work now, partly because the thirty years passed some time ago and partly because OLIC no longer exists. I’m not sure exactly what happened to it, but I presume it got flogged off on the cheap when British Gas was privatized during the Thatcher regime.

The main activity of the On-Line Inspection Centre was developing and exploiting techniques for inspecting gas pipelines for various forms of faults. The UK’s gas transmission network comprises thousands of kilometres of pipelines, made from steel in sections joined together by seam welds. I always thought of it as like a road network: the motorways were made of 36″ diameter pipes; the A-roads were of smaller, 24″, diameter; and the minor roads were generally made of 12″ pipes. It’s interesting that despite the many failings of my memory now that I’ve reached middle age, I can still remember the names of some of the routes: “Huddersfield to Hopton Top” and “Seabank to Frampton Cotterell” spring immediately to mind.

Anyway, as part of the Mathematics Group at OLIC my job was to work on algorithms to analyse data from various magnetic inspection vehicles. These vehicles – known as “pigs” – were of different sizes to fit snugly in the various pipes. The term “pig” had originally been applied to simple devices used to clean the gunk from the inside of a pipe. They were just put in at one end of the line and gas pressure would push them all the way to the other end, often tens of kilometres away. The pipeline could thus be cleaned without taking it out of service.

This basic idea was modified to produce the much more sophisticated “intelligent pig” which produced the data I worked on. You can read much more about this here. This looked very similar to the cleaning pig, but had a complicated assembly of magnets and sensors, shown schematically here:

[Figure: schematic of the intelligent pig’s magnet and sensor assembly]

The two sets of magnets are connected to the pipe wall by steel brushes to maintain good contact. The magnetic field applied by the front set of magnets is contained within the pipe wall and forms a kind of circuit with the rear set, as shown, unless there is a variation in the thickness of the material. In that case magnetic flux leaks out and is detected by the sensors. The magnets and sensors are deployed in rings to cover the whole circumference of the pipe. A 24″ diameter pig would have 240 sensors, each recorded as a separate channel on the vehicle.
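As a rough check on what 240 sensors means in practice, this small Python calculation assumes the sensors are evenly spaced around the nominal 24″ circumference. The real sensor heads sit at some effective diameter that may differ, so treat the result as approximate. The gap comes out at about 8 mm, finer than the 1 cm corrosion pits that the post goes on to describe as potentially dangerous.

```python
import math

def sensor_spacing_mm(diameter_inches, n_sensors):
    """Circumferential gap between adjacent sensors, assuming even spacing round the pipe."""
    circumference_mm = math.pi * diameter_inches * 25.4   # 1 inch = 25.4 mm
    return circumference_mm / n_sensors

spacing = sensor_spacing_mm(24, 240)
print(f"{spacing:.1f} mm between sensors")   # -> 8.0 mm between sensors
```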

The actual system is fairly complicated so some of the work was experimental. Sections of pipe were made with defects of various sizes machined into them. The pig would then be pulled through these sections and the signals studied to build up an understanding of how the magnetic field would respond in different situations.

The actual pig (which could be several metres long and weigh a couple of tonnes) looks like this:

[Photo: an intelligent pig]

I always thought they looked a bit like spacecraft.

The pig usually travels at something like walking pace along the pipeline, and the sampling rate of the sensors was such that a reading would be taken every few millimetres. That sampling rate was necessary because corrosion pits as small as 1 cm across could be dangerous. The larger vehicles had “on-board thresholding” so that recordings of quiescent sections were discarded. Even so, pipe surfaces (especially those of smaller bore) could be uneven for various reasons to do with their production rather than the effects of corrosion. Moreover, every few metres there would be a circumferential seam weld where two sections of pipe were joined together; these features would produce a large signal on all channels which the thresholding algorithm did not suppress. The net result was that a lot of data had to be stored on the vehicle. When I say “a lot”, I mean for that time. A full run might produce about 5 × 10⁷ readings. That seems like nothing now, but it was “Big Data” in those days!
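The post doesn’t describe how the on-board thresholding actually worked, so the following Python sketch is an assumption. It keeps a sample, together with its position index so that it can still be located later, only if at least one channel exceeds a fixed threshold. The 40 km run length and 4 mm reading interval in the volume estimate are hypothetical values. They are chosen only to show why discarding quiescent stretches mattered once every position is multiplied by 240 channels.

```python
def threshold_samples(samples, threshold):
    """Keep (index, readings) pairs where any channel's magnitude exceeds threshold.

    samples: one list of channel readings per axial position.
    """
    return [(i, readings) for i, readings in enumerate(samples)
            if any(abs(r) > threshold for r in readings)]

# Hypothetical run: 40 km of pipe, one reading every 4 mm, on 240 channels.
run_length_mm = 40_000_000
step_mm = 4
channels = 240
raw_readings = (run_length_mm // step_mm) * channels
print(f"{raw_readings:.1e} readings without thresholding")   # -> 2.4e+09 readings without thresholding

# Tiny synthetic example with 4 channels: only position 2 shows leakage flux.
demo = [
    [0.1, -0.2, 0.0, 0.1],
    [0.2, 0.1, -0.1, 0.0],
    [0.1, 3.5, 0.2, 0.1],
    [0.0, 0.1, 0.1, -0.1],
]
kept = threshold_samples(demo, threshold=1.0)
print([i for i, _ in kept])   # -> [2]
```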

So how was all this data processed back at the station? You probably won’t believe this, but it was printed out on Versatec printers in the form of a chart recording for each channel. Operators identified funny-looking signals by eye, and we then pulled the data down from tape and had a further look, usually comparing the patterns visually with those obtained from “pull-through” experiments.

Among the things I worked on was an algorithm to recognize seam weld signals automatically. That was quite easy actually – because it just requires looking for simultaneous activity on all channels – although it had to be made robust enough to deal with the odd dead channel and other instrumental glitches. This algorithm proved to be useful because sometimes the on-board telemetry would go wrong and we had to locate the pig by counting the number of welds it had passed since the start of the run.
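The weld detector described above can be sketched in a few lines of Python. This is an illustration of the idea, not the original Fortran-77 code. A sample counts as weld-like when at least a given fraction of channels (a hypothetical 80% by default) exceed a threshold, so one or two dead channels don’t prevent detection. Weld-like samples lying within a hypothetical `min_gap` of the previous one are merged into a single weld. Counting the detections gives the number of pipe sections passed, which is how the pig could be located when the telemetry failed. The test data are synthetic.

```python
def detect_welds(samples, threshold, min_fraction=0.8, min_gap=10):
    """Return the sample index at which each weld-like signal starts.

    A sample is weld-like if at least min_fraction of the channels exceed threshold,
    which tolerates a few dead (flat) channels. Weld-like samples no more than
    min_gap after the previous weld-like sample belong to the same weld.
    """
    welds = []
    last_hit = None
    for i, readings in enumerate(samples):
        active = sum(1 for r in readings if abs(r) > threshold)
        if active >= min_fraction * len(readings):
            if last_hit is None or i - last_hit > min_gap:
                welds.append(i)
            last_hit = i
    return welds

# Synthetic run: 8 channels, channel 3 dead, welds at samples 10-11 and 40-41,
# and a localized defect at sample 25 seen only on channel 6.
n_channels, n_samples, dead = 8, 60, 3
samples = []
for i in range(n_samples):
    row = []
    for c in range(n_channels):
        value = 0.1 * ((i + c) % 3 - 1)   # small background wiggle
        if i in (10, 11, 40, 41):         # weld: every working channel responds
            value = 5.0
        if i == 25 and c == 6:            # defect: one channel only
            value = 5.0
        row.append(0.0 if c == dead else value)
    samples.append(row)

welds = detect_welds(samples, threshold=1.0)
print(welds)   # -> [10, 40]
```

Demanding activity on every channel (`min_fraction=1.0`) would miss both welds here because of the dead channel, which is the robustness issue the post mentions.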

A far more difficult challenge was dealing with data from 12″ diameter pipes. These are manufactured in a way that’s completely different from that used to make pipes of larger diameter, which are made of rolled steel. The 12″ pipes were made from a solid plug of molten steel, the centre of which is bored out by a device that rotates as it goes along. This imposes a peculiar variation on the pipe wall, in the form of a spirally modulated “noise”. Annoyingly, the pitch and amplitude of the spiral varied from one section of pipe to another. After many failed attempts, the group finally came up with an algorithm that used the weld detector as a starting point to establish that the vehicle had entered a new section of pipe. It then used data from the start of each section to estimate the parameters of the spiral pattern for that section, and applied a filter to remove it from the rest of the section. It wasn’t particularly elegant, but it certainly cleaned up the data massively and made it much easier to spot significant features.
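The post doesn’t say what model or filter the group actually used, so the following Python sketch swaps in a simple assumption. The spiral is modelled as a single helical harmonic, A cos(2π(z/p − c/N) + φ), where z is the sample index along the section, c the channel number, N the number of channels and p the pitch in samples. For each trial pitch in a grid, the sketch fits A and φ by linear least squares over the first `n_calib` samples of the section. It then keeps the best-fitting pitch and subtracts the fitted pattern from the whole section. In practice the section would start at a weld found by the detector; here a synthetic, defect-free calibration window is assumed.

```python
import math

def spiral_basis(z, c, n_channels, pitch):
    """cos and sin of the helical phase at sample z, channel c (hypothetical model)."""
    u = 2 * math.pi * (z / pitch - c / n_channels)
    return math.cos(u), math.sin(u)

def fit_spiral(section, pitch, n_calib):
    """Least-squares fit of a*cos(u) + b*sin(u) over the first n_calib samples.

    Returns (a, b, residual sum of squares) for this trial pitch.
    """
    n_channels = len(section[0])
    scc = scs = sss = sdc = sds = 0.0
    for z in range(n_calib):
        for c, d in enumerate(section[z]):
            co, si = spiral_basis(z, c, n_channels, pitch)
            scc += co * co
            scs += co * si
            sss += si * si
            sdc += d * co
            sds += d * si
    det = scc * sss - scs * scs            # 2x2 normal equations, solved by Cramer's rule
    a = (sdc * sss - sds * scs) / det
    b = (sds * scc - sdc * scs) / det
    resid = 0.0
    for z in range(n_calib):
        for c, d in enumerate(section[z]):
            co, si = spiral_basis(z, c, n_channels, pitch)
            resid += (d - a * co - b * si) ** 2
    return a, b, resid

def remove_spiral(section, candidate_pitches, n_calib):
    """Pick the pitch with the smallest calibration residual and subtract that spiral everywhere."""
    n_channels = len(section[0])
    a, b, _, pitch = min(
        (fit_spiral(section, p, n_calib) + (p,) for p in candidate_pitches),
        key=lambda fit: fit[2],
    )
    cleaned = []
    for z, row in enumerate(section):
        cleaned_row = []
        for c, d in enumerate(row):
            co, si = spiral_basis(z, c, n_channels, pitch)
            cleaned_row.append(d - a * co - b * si)
        cleaned.append(cleaned_row)
    return cleaned, pitch

# Synthetic section: 16 channels, 200 samples, a spiral with a pitch of 37 samples,
# plus one localized defect at sample 150 on channel 5.
N, Z = 16, 200
section = [[math.cos(2 * math.pi * (z / 37 - c / N) + 0.4) for c in range(N)]
           for z in range(Z)]
section[150][5] += 3.0

cleaned, pitch = remove_spiral(section, candidate_pitches=range(20, 61), n_calib=60)
print(pitch)                       # -> 37
print(round(cleaned[150][5], 3))   # -> 3.0
```

Because the amplitude and phase enter linearly once the pitch is fixed, only the pitch needs a search. Real pipe noise would not fit a single clean harmonic this well, so a practical filter would need more terms or a more robust fit.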

You might ask why I’ve written at such length about this when it’s got nothing to do with my current research (or indeed, anything else I’ve done since I graduated from Cambridge in 1985). One reason is that, although I didn’t know it at the time, my time at OLIC prepared me very well for starting my PhD. All the programming I did there used VAX computers, which turned out to be the computers used by STARLINK. When I started my life as a research student I was already fluent in the command language (DCL) as well as the database software DATATRIEVE, which was a great advantage. Another reason is that working in this environment I had to learn to make my code (which, incidentally, was all in Fortran-77) conform to various very strict standards. I didn’t like some of the things we were forced to do, but I was shouted at sufficiently often that I gave up and did what I was told. I have never been particularly good at doing that in general, but in the context of software it is a lesson I’m glad I learned. Above all, though, I think working outside academia gave me a different perspective on research. As academics we are very lucky to be able, at least some of the time, to choose our own research problems, but I believe that in the long run it can be very good for your intellectual development to do something completely different every now and then.

We’re currently discussing a scheme whereby Physics and Astrophysics research students can interrupt their PhD for up to 6 months to undertake a (paid) work placement outside academia. I suspect many graduate students will not be keen on this, as they’ll see it as a distraction from their PhD topic, but I think it has many potential advantages as I hope I’ve explained.


That Was The Data Innovation Day That Was

Posted in Uncategorized on November 7, 2016 by telescoper

Time, methinks, for a quick work-related post. You may know that my current appointment is in association with Cardiff University’s Data Innovation Research Institute, and it’s that part of my job that is taking up most of my time at the moment. Last Friday (4th November) we had our first Data Innovation Day, the aim of which was to encourage collaboration between Schools and Research Institutes in the area of Data Science.

To this end, on Friday morning we had a dozen short(ish) talks on data science aspects of all kinds of subjects, from neuroimaging to gravitational wave research to healthcare to biosocial computing to statistical modelling and so on and so forth. It was a fascinating mixture of presentations and about 75 people attended, which was a pretty good audience. After lunch we broke into groups to develop specific research projects and establish what the Data Innovation Institute can do to help foster collaborations across disciplinary and administrative boundaries. That’s much harder than it might sound, and is certainly harder than it should be in modern universities. We had no shortage of ideas, and let’s hope we can turn them into concrete projects.

Anyway, one of my contributions to the day was to set up a Twitter account for the Data Innovation Research Institute together with a logo:

[Image: the Data Innovation Research Institute logo]

We currently have a princely 37 followers. Feel free to follow if you’re on Twitter and interested in Data Science!

Back to Cosmology, Data Analysis and Cardiff

Posted in Biographical, The Universe and Stuff on September 1, 2016 by telescoper

Today is my first day back in the School of Physics and Astronomy at Cardiff University. Although my job title, Professor of Theoretical Astrophysics, is the same as it was when I was here in a previous incarnation, it will be quite a different job and I’m going to be located in a different building (though not far from my old office). In fact my office is in a newly refurbished space connected with the Data Innovation Research Institute just on the other side of a car park from my old office. It looks like being an exciting time over the next few months and years as new staff across a range of disciplines join the Institute, expanding its research portfolio from astrophysics (especially gravitational wave research) into biomedical sciences and beyond.

Here’s a little video about the Data Innovation Research Institute, which is about conducting fundamental research into the aspects of managing, analysing and interpreting massive volumes of textual and numerical information:

But for the moment it’s been a day for administrative matters: taking my P45 to the Human Resources Department, getting my new Staff ID card, trying to get myself set up on the University computer network, and so on. Oh, and I’ve agreed to do some teaching in the Spring Semester, a Level 4 module on The Physics of the Early Universe. It will be nice to be teaching some cosmology again!