Archive for Data Innovation Research Institute

Moving On..

Posted in Biographical, Cardiff on August 22, 2019 by telescoper

After attending my second Repeat Examination Board of the week (this one in the Department of Engineering) it’s now time to begin the task of moving the contents of my office into the new one I’ll be in as Head of Department. Roughly simultaneously, the current Head of Department, Jonivar Skullerud, will be moving his clobber from the Head of Department’s office into my current office. Some coordination may be necessary to avoid collisions and/or other confusion, but I’m confident of a successful outcome…

While I’m on the subject of moving to a new job, though in my case remaining at the same institution, this very afternoon my wonderful former colleague from the School of Mathematical and Physical Sciences at the University of Sussex, Dorothy Lamb, is having a leaving do. She will soon be moving to a position at the University of Birmingham (in the Midlands). I’m very sad that I couldn’t be there for her farewell party, but the least I can do is wish Dorothy (aka Miss Lemon) all the best in her new job, and hope that her move from Brighton to Birmingham, after (I think) 25 years, goes as smoothly and as free from stress as possible.

UPDATE: You can read Dorothy’s farewell edition of the MPS Newsletter here.

Dorothy isn’t the only former colleague to be moving on to pastures new. I heard this morning that Ian Harvey and Unai Lopez from the Data Innovation Research Institute at Cardiff University are leaving soon. Unai is taking up a Lectureship at the University of the Basque Country in Bilbao so Bon Voyage Unai!


Cardiff Blues: Sustainability and UK Universities

Posted in Cardiff, Education on February 20, 2019 by telescoper

Just before I left on my travels last week I saw a rather depressing news item about Cardiff University. It seems that, after posting a deficit of £22.8 M last year, the University is planning to cut about 380 staff positions. According to the news item:

“The university plans to reduce current staff levels by 7%, or 380 full-time equivalent over five years,” said vice chancellor Colin Riordan in an email to staff.

Since I left Cardiff University in the summer I didn’t get the email from which this is quoted and I don’t know the wider picture. (If anyone would like to forward the V-C’s email to me I’d be very interested.)

The news item also says

Its aim is to get back into surplus by 2019-20 and it wants to cut staffing costs from 59.6% of total income to no more than 56% of income by 2022-23.

Between you and me I was quite surprised that a University can be spending less than 60% of its income on staff, since staff are by far its most valuable resource. Bear in mind also that academic staff will be responsible for only a fraction of this expenditure. In some universities this fraction is only about half. Cutting this still further seems a very retrograde step to me, as it means that student-staff ratios will inevitably rise, making the institution less attractive to prospective students, as well as increasing the workload on existing staff to intolerable levels.

I sincerely hope none of my former colleagues in the School of Physics & Astronomy is affected by the deterioration of the University’s finances. At least the news item I referred to does mention new investments in Data Science, so that is presumably a positive development for the Data Innovation Research Institute with which I was formerly associated.

Incidentally, best wishes to anyone at Cardiff who is reading this, and good luck against England in the Six Nations on Saturday!

I’ve mentioned Cardiff here just because I noticed a specific news item (and I used to work there) but it seems a number of other universities are suffering financial problems. There are cold winds blowing through the sector. Many institutions (including Cardiff) have committed to ambitious building programs funded by a combination of borrowing and optimistic assumptions about growth in student numbers and consequent increases in fee income. Although I no longer work in the UK Higher Education system, I do worry greatly about its sustainability. Even from across the Irish Sea the situation looks extremely precarious: the recent boom could easily end in some institutions going bust. I don’t think that will include Cardiff, by the way. I don’t think the Welsh Government would ever allow that to happen. But I think the English Government wouldn’t act if an English university went bankrupt.

Big News for Big Data in Cardiff

Posted in Biographical on June 20, 2018 by telescoper

I know I’m currently in Maynooth, but I am still employed part-time by Cardiff University, and specifically by the Data Innovation Research Institute there. When I started there a couple of years ago, I moved into a big empty office that looked like this:

Over the last two years the DIRI office has gradually filled up. It is now home to an Administrative Officer (Clare), two Research Software Engineers (Ian & Unai), Ben and Owain from Supercomputing Wales, and the newest arrival, a Manager for the Centre for Doctoral Training in Data-Intensive Physics (Rosemary). That doesn’t include myself, the Director of DIRI (Steve Fairhurst), DIRI Board member Bernard Schutz and a number of occasional users of various ‘hot desks’. And there’s another Research Software Engineer on the way.

Now the latest news is of a huge injection of cash (£3.5M) for a new Data Innovation Accelerator, funded by the Welsh Government and the European Regional Development Fund. The Welsh Government has joined forces with Cardiff University to develop the project, which has the aim of transferring data science and analytics knowledge from Cardiff University to Small to Medium Sized Enterprises (SMEs) in Wales so they can develop and grow their businesses. The funding will enable researchers to work on collaborative projects with companies specialising in things like cyber security, advanced materials, energy and eco-innovation. For more information, see here.

Among other things this project will involve the recruitment of no less than eight data scientists to kick-start the project, which will probably launch in November 2018. With another eight people to be based in the Data Innovation Research Institute by the end of the year, the office promises to be a really crowded place. My departure next month will release one desk space, but it will still be a crush! That’s what you call being a victim of your own success.

Anyway, it’s exciting times for Data Science at Cardiff University and it has been nice to have played a small part in building up the DIRI activity over the last two years. I’m sure it will go on developing and expanding for a very long time indeed.

In Praise of Research Software Engineers

Posted in Cardiff on May 1, 2018 by telescoper

Yesterday in the Data Innovation Research Institute we held a special event, our first ever Conference for Research Software Engineers. Sadly I was too busy yesterday to attend in person, but I did turn up for the drinks reception at the end.

In case you weren’t aware, the term Research Software Engineer (RSE) is applied to the growing number of people in universities and other research organisations who combine expertise in programming with an intricate understanding of research. Although this combination of skills is extremely valuable, these people lack a formal place in the academic system. Without a name, it is difficult for people to rally around a cause, hence the creation of the term Research Software Engineer and the Research Software Engineer Association.

We have quite a few RSEs associated with the Data Innovation Research Institute in Cardiff – as you can see here. These are quite different from system administrators or other computing support staff as they are involved directly in research, working in teams alongside academics and other specialists.

One of the biggest problems facing RSEs in the UK university system is that there isn’t a well-established promotions route for them. For researchers in an academic environment, performance is usually judged through publications, PhD students supervised, grants awarded and so on. Although RSEs play a vital role, especially (but not exclusively) in large collaborations, they do not usually end up as lead authors on papers and generally do not apply for grants in their own name. That means that if they are judged by these criteria they struggle to get promotion and often leave academia to work for higher pay and better terms and conditions elsewhere.

In my opinion, one of the important things that must be done to improve the lot of Research Software Engineers is to construct a career structure in parallel with the academic route  and other grades (such as laboratory technician) but judged by more appropriate criteria tailored to the reality of the job. Writing the necessary grade profiles and getting them agreed by the relevant university committees will take some time, but I think it will pay dividends in terms of better retention and job satisfaction for these highly talented people.

I hope Cardiff can take some sort of a lead in defining the role of an RSE, but this is really a national need. There are pretty uniform grade descriptions for academic and research staff across the United Kingdom so I don’t see any reason why this can’t be the case for Research Software Engineers. They are vital to many research fields already, and their importance can only grow in the future.


A Year Back

Posted in Biographical, The Universe and Stuff on September 1, 2017 by telescoper

So, with the summer drawing to a close, and the contents of my weekly veggie box changing to autumnal varieties, I realise that today is the first anniversary of my first day back in the School of Physics and Astronomy at Cardiff University. In other words, I’ve now been in office in the Data Innovation Research Institute for a full year. Very soon we get to the official launch of a couple of things that have started during this time – including a new Centre for Doctoral Training in Data-Intensive Science and two new MSc courses which have recruited their first students for entry this year.


I seem to remember this day last year mainly involving running around dealing with administrative matters: taking my P45 to the Human Resources Department, getting my new Staff ID card, trying to get myself set up on the University computer network, and so on. I moved into a large empty office, but it’s now gradually filling up with staff: a couple of Research Software Engineers have been appointed, together with an administrator, and two members of Supercomputing Wales are joining us soon too.

Anyway, I’m shortly off to London for the weekend to catch up with an old friend I haven’t seen for ages. I’m currently pissed off with Great Western Railways for failing to pay a compensation claim I lodged back in June and for slow running on the mainline to Paddington today due to planned engineering works, so I’ll be travelling to the Big Smoke and back by National Express Coach.




Science for the Citizen

Posted in Education, Open Access, The Universe and Stuff on March 20, 2017 by telescoper

I spent all day on Friday on business connected with my role in the Data Innovation Research Institute, attending an event to launch the new Data Justice Lab at Cardiff University. It was a fascinating day of discussions about all kinds of ethical, legal and political issues surrounding the “datafication” of society:

Our financial transactions, communications, movements, relationships, and interactions with government and corporations all increasingly generate data that are used to profile and sort groups and individuals. These processes can affect both individuals as well as entire communities that may be denied services and access to opportunities, or wrongfully targeted and exploited. In short, they impact on our ability to participate in society. The emergence of this data paradigm therefore introduces a particular set of power dynamics requiring investigation and critique.

As a scientist whose research is in an area (cosmology) which is extremely data-intensive, I have a fairly clear interpretation of the phrase “Big Data” and recognize the need for innovative methods to handle the scale and complexity of the data we use. This clarity comes largely from the fact that we are asking very well-defined questions which can be framed in quantitative terms within the framework of well-specified theoretical models. In this case, sophisticated algorithms can be constructed that extract meaningful information even when individual measurements are dominated by noise.

The use of “Big Data” in civic society is much more problematic because the questions being asked are often ill-posed and there is rarely any compelling underlying theory. A naive belief exists in some quarters that harvesting more and more data necessarily leads to an increase in relevant information. Instead there is a danger that algorithms simply encode false assumptions and produce unintended consequences, often with disastrous results for individuals. We heard plenty of examples of this on Friday.

Although it is clearly the case that personal data can be – and indeed is – deliberately used for nefarious purposes, I think there’s a parallel danger that we increasingly tend to believe that just because something is based on numerical calculations it somehow must be “scientific”. In reality, any attempt to extract information from quantitative data relies on assumptions. If those assumptions are wrong, then you get garbage out no matter what you put in. Some applications of “data science” – those that don’t recognize these limitations – are in fact extremely unscientific.

I mentioned in discussions on Friday that there is a considerable push in astrophysics and cosmology for open science, by which I mean that not only are the published results openly accessible, but all the data and analysis algorithms are published too. Not all branches of science work this way, and we’re very far indeed from a society that applies such standards to the use of personal data.

Anyway, after the day’s discussion we adjourned to the School of Journalism, Media and Cultural Studies for a set of more formal presentations. The Head of School, Professor Stuart Allan, introduced this session with some quotes from a book called Science for the Citizen, written by Lancelot Hogben in 1938. I haven’t read the book, but it looks fascinating and prescient. I have just ordered it and look forward to reading it. You can get the full-text free online here.

Here is the first paragraph of Chapter 1:

A MUCH abused writer of the nineteenth century said: up to the present philosophers have only interpreted the world, it is also necessary to change it. No statement more fittingly distinguishes the standpoint of humanistic philosophy from the scientific outlook. Science is organized workmanship. Its history is co-extensive with that of civilized living. It emerges so soon as the secret lore of the craftsman overflows the dam of oral tradition, demanding a permanent record of its own. It expands as the record becomes accessible to a widening personnel, gathering into itself and coordinating the fruits of new crafts. It languishes when the social incentive to new productive accomplishment is lacking, and when its custodians lose the will to share it with others. Its history, which is the history of the constructive achievements of mankind, is also the history of the democratization of positive knowledge. This book is written to tell the story of its growth as a record of human achievement, a story of the satisfaction of the common needs of mankind, disclosing as it unfolds new horizons of human wellbeing which lie before us, if we plan our new resources intelligently.

The phrase that struck me with particular force is “the democratization of positive knowledge”. That is what I believe science should do, but the closed culture of many fields of modern science makes it difficult to argue that’s what it actually does. Instead, there is an increasing tendency for scientific knowledge in many domains to be concentrated in a small number of people with access to the literature and the expertise needed to make sense of it.

In an increasingly technologically-driven society, the gap between the few in and the many out of the know poses a grave threat to our existence as an open and inclusive democracy. The public needs to be better informed about science (as well as a great many other things). Two areas need attention.

In fields such as my own there’s a widespread culture of working very hard at outreach. This overarching term includes trying to get people interested in science and encouraging more kids to take it seriously at school and college, but also engaging directly with members of the public and institutions that represent them. Not all scientists take the same attitude, though, and we must try harder. Moves are being made to give more recognition to public engagement, but a drastic improvement is necessary if our aim is to make our society genuinely democratic.

But the biggest issue we have to confront is education. The quality of science education must improve, especially in state schools where pupils sometimes don’t have appropriately qualified teachers and so are unable to learn, e.g. physics, properly. The less wealthy are becoming systematically disenfranchised through their lack of access to the education they need to understand the complex issues relating to life in an advanced technological society.

If we improve school education, we may well get more graduates in STEM areas too although this government’s cuts to Higher Education make that unlikely. More science graduates would be good for many reasons, but I don’t think the greatest problem facing the UK is the lack of qualified scientists – it’s that too few ordinary citizens have even a vague understanding of what science is and how it works. They are therefore unable to participate in an informed way in discussions of some of the most important issues facing us in the 21st century.

We can’t expect everyone to be a science expert, but we do need higher levels of basic scientific literacy throughout our society. Unless this happens we will be increasingly vulnerable to manipulation by the dark forces of global capitalism via the media they control. You can see it happening already.

Signs of the Data Innovation Institute

Posted in Biographical on February 13, 2017 by telescoper

I’ve only been in my new office in the Data Innovation Research Institute for 5 months so it came as a big surprise to see that they’ve already started putting up the signs telling people where we are. In fact a couple of chaps came this  morning to do the necessary, and now we look very professional. It’s hard to tell that this used to be a chip shop.


Please don’t tell the Health & Safety people about the power cable trailing through the window!

And here’s me answering the door to strangers…


Thanks to Dan Read for taking that second one.

Back to Work…

Posted in Biographical, The Universe and Stuff on January 3, 2017 by telescoper

Well, the Christmas break is over at Cardiff University and I’m back in the office of the Data Innovation Research Institute. To be honest, it’s rather quiet around here. Most staff seem to be still on holiday. There are a few students around, mainly international ones. This is actually a revision week at Cardiff University in advance of the mid-year examinations which start next week and go on for a fortnight. After that we’ll be back into teaching. I’ll be doing a Masters-level module on The Physics of the Early Universe in the forthcoming term, and I’m very much looking forward to it.

The outcomes of the annual round of consolidated grants administered by the Astronomy Grants Panel of the Science and Technology Facilities Council were announced just before Christmas, with success for some and disappointment for others. I only have anecdotal evidence from personal contacts but it seems to have been a tough round, which wouldn’t surprise me because the funding for basic scientific research in the UK has been flat in cash terms for many years now, and is gradually being eroded by inflation. It’s a tough climate but when, in a couple of years, we lose access to all forms of EU funding things will get even tougher…

Anyway, as new grants are announced and old ones terminated, this is a busy time of year for postdocs (who are largely funded by research grants) seeking new positions. I’ve spent most of the day so far writing references for applicants and will return to that task for a couple of hours after lunch. It’s particularly tough on those whose positions lapse at the end of March who only got notice just before Christmas that their existing funding is not going to be renewed. There’s little time in such a position to get a new job sorted, but on the other hand, new grants are starting from 1st April so there are opportunities out there. It’s not easy to respond if you have a family or other commitments, though.

Another thing that happened just before Christmas was that the Data Innovation Research Institute here at Cardiff University announced its first tranche of “seedcorn” grants to foster interdisciplinary research. These grants are quite small in cash terms but it is hoped that at least some of them will help develop substantial projects by bringing together parts of the University that didn’t previously collaborate enough. Congratulations to those whose proposals were selected, and commiserations to those who were unsuccessful.

I was pleased that my proposal – together with Professor Nikolai Leonenko of the School of Mathematics – was one of the successful bids. That means that, probably in the spring, we will be organizing a short workshop relating to the analysis and modelling of astrophysical data defined on the sphere, a topic which has interesting mathematical aspects as well as very practical implications for astronomy and cosmology. We’ll be starting to organize that soon, which adds another item to my to-do list, but it should be a fun conference when it happens.

Before you ask: yes, I do work for the Data Innovation Research Institute but because I was an applicant I recused myself from judging the applications in case there was any perception of a conflict of interest. So there.

Most of my work between now and the start of teaching term is going to be devoted to a couple of MSc courses we’re planning to launch this year, but I’ll write more about them – and plug them shamelessly – when they’re all formally announced and ready to go!

And with that I’d better get back to work again.

Magnets, Data Science and the Intelligent Pig

Posted in Biographical, The Universe and Stuff on November 18, 2016 by telescoper

The other day I was talking to some colleagues in the pub (as one does). At one point the subject of conversation turned to the pressure we academics are under these days to collaborate more with the world of industry and commerce. That’s one of the things that the Cardiff University Data Innovation Research Institute – which currently pays half my wages  – is supposed to do, but there was general consternation when I mentioned that I have in the past spent quite a long time working in industry. I am, after all, Professor of Theoretical Astrophysics. Of what possible interest could that be to industry?

My time in industry was spent at one of the research stations of British Gas, called the On-Line Inspection Centre (“OLIC”) which was situated in Cramlington, Northumberland. I started work there in 1981, just after I’d finished my A-levels and the Cambridge Entrance Examination and I worked there for about 9 months, before leaving to start my undergraduate course in 1982. At that time British Gas was still state-owned, and one of the consequences of that was that I had to sign the Official Secrets Act when I joined the staff. Among other things that forbade me from making “unauthorized disclosures” of what I was working on for thirty years. I feel comfortable discussing that work now, partly because the thirty years passed some time ago and partly because OLIC no longer exists. I’m not sure exactly what happened to it, but I presume it got flogged off on the cheap when British Gas was privatized during the Thatcher regime.

The main activity of the On-Line Inspection Centre was developing and exploiting techniques for inspecting gas pipelines for various forms of faults. The UK’s gas transmission network comprises thousands of kilometres of pipelines, made from steel in sections joined together by seam welds. I always thought of it as like a road network: the motorways were made of 36″ diameter pipes; the A-roads were of smaller, 24″, diameter; and the minor roads were generally made of 12″ pipes. It’s interesting that despite the many failings of my memory now that I’ve reached middle age, I can still remember the names of some of the routes: “Huddersfield to Hopton Top” and “Seabank to Frampton Cotterell” spring immediately to mind.

Anyway, as part of the Mathematics Group at OLIC my job was to work on algorithms to analyse data from various magnetic inspection vehicles. These vehicles – known as “pigs” – were of different sizes to fit snugly in the various pipes. The term “pig” had originally been applied to simple devices used to clean the gunk from the inside of a pipe. They were just put in one end of the line and gas pressure would push them all the way to the other end, often tens of kilometres away. The pipeline could thus be cleaned without taking it out of service.

This basic idea was modified to produce the much more sophisticated “intelligent pig” which produced the data I worked on. You can read much more about this here. This looked very similar to the cleaning pig, but had a complicated assembly of magnets and sensors, shown schematically here:


The two sets of magnets are connected to the pipe wall by steel brushes to maintain good contact. The magnetic field applied by the front set of magnets is contained within the pipe wall and forms a kind of circuit with the rear set as shown, unless there is a variation in the thickness of the material. In that case magnetic flux leaks out and is detected by the sensors. The magnets and sensors are deployed in rings to cover the whole circumference of the pipe. A 24″ diameter pig would have 240 sensors, each recorded as a separate channel on the vehicle.

The actual system is fairly complicated so some of the work was experimental. Sections of pipe were made with defects of various sizes machined into them. The pig would then be pulled through these sections and the signals studied to build up an understanding of how the magnetic field would respond in different situations.

The actual pig (which could be several metres long and weigh a couple of tonnes) looks like this:


I always thought they looked a bit like spacecraft.

The pig usually travels at something like walking pace along the pipeline, and the sampling rate of the sensors was such that a reading would be taken every few millimetres. That sampling rate was necessary because corrosion pits as small as 1cm across could be dangerous. The larger vehicles had “on-board thresholding” so that recordings of quiescent sections were discarded. Even so pipe surfaces (especially those of smaller bore) could be uneven for various reasons to do with their production rather than the effects of corrosion. Moreover, every few metres there would be a circumferential seam weld where two sections of pipe were joined together; these features would produce a large signal on all channels which the thresholding algorithm did not suppress. The net result was that a lot of data had to be stored on the vehicle. When I say “a lot”, I mean for that time. A full run might produce about 5 × 10⁷ readings. That seems like nothing now, but it was “Big Data” in those days!
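To put rough numbers on the scale described above, here is a back-of-envelope sketch in Python. Only the 240 channels, the 24″ diameter and the stored total (taken here as 5e7 readings) come from the post; the 50 km run length, the 3 mm sample spacing and the even ring of sensors on the nominal diameter are illustrative assumptions, so treat the results as orders of magnitude rather than actual OLIC figures.

```python
import math

# Figures taken from the text.
CHANNELS = 240                # sensors on a 24-inch pig
PIPE_DIAMETER_INCH = 24.0
STORED_READINGS = 5e7         # approximate readings stored per full run

# Illustrative assumptions, not from the text.
RUN_LENGTH_M = 50_000         # "tens of kilometres" taken as 50 km
SAMPLE_SPACING_M = 0.003      # "every few millimetres" taken as 3 mm
MM_PER_INCH = 25.4


def sensor_spacing_mm(diameter_inch, channels):
    """Gap between neighbouring sensors, assuming an even ring on the nominal diameter."""
    return math.pi * diameter_inch * MM_PER_INCH / channels


def raw_readings(run_length_m, spacing_m, channels):
    """Readings produced if every sample on every channel were kept."""
    samples_per_channel = round(run_length_m / spacing_m)
    return samples_per_channel * channels


spacing = sensor_spacing_mm(PIPE_DIAMETER_INCH, CHANNELS)
raw = raw_readings(RUN_LENGTH_M, SAMPLE_SPACING_M, CHANNELS)
kept_fraction = STORED_READINGS / raw

print(f"sensor spacing around the ring: {spacing:.1f} mm")
print(f"raw readings for the run:       {raw:.2e}")
print(f"fraction stored:                {kept_fraction:.1%}")
```

Under these assumptions the sensors sit a little under a centimetre apart, which is comparable to the size of the smallest dangerous pits, and an unthresholded run would generate a few billion readings, so the stored total would be of order one percent of the raw stream.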

So how was all this data processed back at the station? You probably won’t believe this, but it was printed out on Versatec printers in the form of a chart recording for each channel. Operators then identified funny-looking signals by eye and we then pulled down the data from tape and had a further look, usually comparing the patterns visually with those obtained from “pull-through” experiments.

Among the things I worked on was an algorithm to recognize seam weld signals automatically. That was quite easy actually – because it just requires looking for simultaneous activity on all channels – although it had to be made robust enough to deal with the odd dead channel and other instrumental glitches. This algorithm proved to be useful because sometimes the on-board telemetry would go wrong and we had to locate the pig by counting the number of welds it had passed since the start of the run.
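The idea of looking for simultaneous activity on all channels, while tolerating the odd dead channel, can be sketched as follows. This is a minimal illustration in Python, not the original code (which was Fortran-77); the function names, the threshold, the 90% fraction and the synthetic data are all assumptions made for the example.

```python
def find_dead_channels(samples, tol=1e-9):
    """Treat channels whose reading never changes as dead (an assumed criterion)."""
    n_channels = len(samples[0])
    dead = set()
    for ch in range(n_channels):
        values = [row[ch] for row in samples]
        if max(values) - min(values) <= tol:
            dead.add(ch)
    return dead


def detect_welds(samples, threshold, min_fraction=0.9):
    """Return the sample index at which each weld-like event starts.

    A sample is flagged when at least `min_fraction` of the live channels
    exceed `threshold` in absolute value; runs of consecutive flagged
    samples are merged into a single event.
    """
    dead = find_dead_channels(samples)
    live = [ch for ch in range(len(samples[0])) if ch not in dead]
    if not live:
        return []
    welds = []
    in_weld = False
    for i, row in enumerate(samples):
        active = sum(1 for ch in live if abs(row[ch]) > threshold)
        flagged = active >= min_fraction * len(live)
        if flagged and not in_weld:
            welds.append(i)
        in_weld = flagged
    return welds


def make_demo(n_samples=30, n_channels=8):
    """Synthetic data: quiet background, three weld samples, one local defect."""
    samples = []
    for i in range(n_samples):
        row = [0.1 * ((i + ch) % 3) for ch in range(n_channels)]
        if i in (10, 11, 22):      # welds: large signal on every channel
            row = [5.0 + v for v in row]
        if i == 16:                # local defect: only two channels respond
            row[1] += 4.0
            row[2] += 4.0
        row[3] = 0.0               # dead channel: stuck at zero
        samples.append(row)
    return samples


demo = make_demo()
welds = detect_welds(demo, threshold=1.0)
print("weld start indices:", welds)
print("welds passed so far:", len(welds))
```

The local defect at sample 16 is ignored because only two channels respond, while the adjacent weld samples at 10 and 11 are merged into one event. Counting the entries in `welds` is then the fallback way of locating the vehicle when the telemetry fails.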

A far more difficult challenge was dealing with data from 12″ diameter pipe. These are manufactured in a way that’s completely different from that used to make pipes of larger diameter, which are made of rolled steel. The 12″ pipes were made from a solid plug of molten steel, the centre of which is bored out by a device that rotates as it goes along. The effect of this is that it imposes a peculiar form of variation on the pipe wall, in the form of a spirally modulated “noise”. Annoyingly, the pitch and amplitude of the spiral varied from one section of pipe to another. After many failed attempts, the group finally came up with an algorithm that used the weld detector as a starting point to establish the vehicle had entered a new section of pipe. It then used data from the start of each section to estimate the parameters of the spiral pattern for that section, and then applied a filter to remove it from the rest of the section. It wasn’t particularly elegant, but it certainly cleaned up the data massively and made it much easier to spot significant features.
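The fit-then-filter step can be illustrated with a toy model. The sketch below assumes the spiral “noise” is a single sinusoid whose phase advances both along the pipe and around the ring of sensors, estimates its pitch, amplitude and phase by least squares from the first samples of a section, and subtracts the fitted pattern from the whole section. The sinusoidal model, the grid of candidate pitches and all names are assumptions for illustration; the post does not say what form the group’s filter actually took.

```python
import math


def spiral_phase(i, ch, pitch, n_channels):
    """Phase of a helix at axial sample i and circumferential channel ch."""
    return 2.0 * math.pi * (i / pitch + ch / n_channels)


def fit_spiral(section, n_fit, pitches):
    """Fit a*cos(phase) + b*sin(phase) to the first n_fit samples.

    Tries each candidate pitch (in samples per turn), solves the 2x2
    normal equations, and keeps the pitch with the smallest residual.
    Returns (pitch, a, b).
    """
    n_channels = len(section[0])
    best = None
    for pitch in pitches:
        scc = sss = scs = syc = sys_ = syy = 0.0
        for i in range(n_fit):
            for ch in range(n_channels):
                ph = spiral_phase(i, ch, pitch, n_channels)
                c, s, y = math.cos(ph), math.sin(ph), section[i][ch]
                scc += c * c
                sss += s * s
                scs += c * s
                syc += y * c
                sys_ += y * s
                syy += y * y
        det = scc * sss - scs * scs
        if abs(det) < 1e-12:
            continue
        a = (syc * sss - sys_ * scs) / det
        b = (sys_ * scc - syc * scs) / det
        residual = syy - a * syc - b * sys_
        if best is None or residual < best[0]:
            best = (residual, pitch, a, b)
    if best is None:
        raise ValueError("no usable candidate pitch")
    return best[1], best[2], best[3]


def remove_spiral(section, pitch, a, b):
    """Subtract the fitted spiral pattern from every sample in the section."""
    n_channels = len(section[0])
    cleaned = []
    for i, row in enumerate(section):
        cleaned.append([
            y - (a * math.cos(spiral_phase(i, ch, pitch, n_channels))
                 + b * math.sin(spiral_phase(i, ch, pitch, n_channels)))
            for ch, y in enumerate(row)
        ])
    return cleaned


def make_section(n_samples=200, n_channels=8, pitch=25.0, amp=1.0, phi=0.7):
    """Synthetic section: pure spiral pattern plus one localized feature."""
    section = [
        [amp * math.cos(spiral_phase(i, ch, pitch, n_channels) + phi)
         for ch in range(n_channels)]
        for i in range(n_samples)
    ]
    section[150][2] += 3.0   # feature hidden under the spiral
    return section


section = make_section()
pitch, a, b = fit_spiral(section, n_fit=50, pitches=range(10, 41))
cleaned = remove_spiral(section, pitch, a, b)
print("estimated pitch:", pitch, "amplitude:", round(math.hypot(a, b), 3))
print("feature after cleaning:", round(cleaned[150][2], 3))
```

In a real run the weld detector would supply the section boundaries, and the fit would be repeated for each section because the pitch and amplitude change from one length of pipe to the next.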

You might ask why I’ve written at such length about this when it’s got nothing to do with my current research (or indeed, anything else I’ve done since I graduated from Cambridge in 1985). One reason is that, although I didn’t know it at the time, my time at OLIC was going to prepare me very well for when I started my PhD. That was the case because all the programming I did used VAX computers, which turned out to be the computers used by STARLINK. When I started my life as a research student I was already fluent in the command language (DCL) as well as the database software DATATRIEVE, which was a great advantage. Another reason is that working in this environment I had to learn to make my code (which, incidentally, was all in Fortran-77) conform to various very strict standards. I didn’t like some of the things we were forced to do, but I was shouted at sufficiently often that I gave up and did what I was told. I have never been particularly good at doing that in general, but in the context of software it is a lesson I’m glad I learned. Above all, though, I think working outside academia gave me a different perspective on research. As academics we are very lucky to be able – at least some of the time – to choose our own research problems, but I believe that in the long run it can be very good for your intellectual development to do something completely different every now and then.

We’re currently discussing a scheme whereby Physics and Astrophysics research students can interrupt their PhD for up to 6 months to undertake a (paid) work placement outside academia. I suspect many graduate students will not be keen on this, as they’ll see it as a distraction from their PhD topic, but I think it has many potential advantages as I hope I’ve explained.



That Was The Data Innovation Day That Was

Posted in Uncategorized on November 7, 2016 by telescoper

Time, methinks, for a quick work-related post. You may know that my current appointment is in association with Cardiff University’s Data Innovation Research Institute, and it’s that part of my job that is taking up most of my time at the moment. Last Friday (4th November) we had our first Data Innovation Day, the aim of which was to encourage collaboration between Schools and Research Institutes in the area of Data Science.

To this end, on Friday morning we had a dozen short(ish) talks on data science aspects of all kinds of subjects, from neuroimaging to gravitational wave research to healthcare to biosocial computing to statistical modelling and so on and so forth. It was a fascinating mixture of presentations and about 75 people attended, which was a pretty good audience. After lunch we broke into groups to develop specific research projects and establish what the Data Innovation Institute can do to help foster collaborations across disciplinary and administrative boundaries. That’s much harder than it might sound, and is certainly harder than it should be in modern universities. We had no shortage of ideas, and let’s hope we can turn them into concrete projects.

Anyway, one of my contributions to the day was to set up a Twitter account for the Data Innovation Research Institute together with a logo:


We currently have a princely 37 followers. Feel free to follow if you’re on Twitter and interested in Data Science!