How to Do R&D Right - ChinaTalk

I think it was Michael Nielsen who proposed the thought experiment: could we get better research if we just cut the money? There’s this paper called “The Decline of Unfettered Research” that makes the argument that the more money we put into research, the more people there are, and the more it becomes a commodity from which we expect consistent results. But by expecting consistent results we kill the thing that is at the core of research, namely this high-variance discovery process.

If you know that you have to deliver something in three months, you’re not going to work on the weird project. You’re going to work on the thing that you can deliver in three months.

Get rid of reporting requirements!

Whenever you get a federal grant, you have to tell them exactly what you’re spending all the money on, and every semester or reporting period you have to say exactly how much money you spent and exactly what you spent it on. If you don’t, or you screw that up, or you spend it on the wrong thing, they take your money away.

Very often it’s required that you spend most of the money on grad students, which is probably a whole other problem. Why not decouple research funding from this idea of training America’s research workforce, which is rhetoric you see all the time? What ends up happening is that all the money gets earmarked for grad students. That incentivizes people to do grad student-heavy research, which means nobody will invest in building research tools.

Also, it creates too many grad students. If you look at the number of grad students versus professor positions, the number of grad students is increasing exponentially while the number of professor positions is basically holding constant.

All of these things I’m saying to cut, they exist for a reason, right? Like, the reporting requirements exist because at the end of the day, it’s taxpayer money. I don’t want the government to be irresponsible with my money, but at the same time, in order to get truly amazing things, you actually do need to be irresponsible.

Seven Years of Factorio Friday Facts

Fun post, I recommend starting with the ones labelled must read. Factorio is just amazing. Previously:

we actually have 0 known issues and 0 active bug reports

Cores that don’t count

We are accustomed to thinking of computers as fail-stop, especially the cores that execute instructions, and most system software implicitly relies on that assumption. During most of the VLSI era, processors that passed manufacturing tests and were operated within specifications have insulated us from this fiction. As fabrication pushes towards smaller feature sizes and more elaborate computational structures, and as increasingly specialized instruction-silicon pairings are introduced to improve performance, we have observed ephemeral computational errors that were not detected during manufacturing tests. These defects cannot always be mitigated by techniques such as microcode updates, and may be correlated to specific components within the processor, allowing small code changes to effect large shifts in reliability. Worse, these failures are often “silent” – the only symptom is an erroneous computation.

We refer to a core that develops such behavior as “mercurial.” Mercurial cores are extremely rare, but in a large fleet of servers we can observe the disruption they cause, often enough to see them as a distinct problem – one that will require collaboration between hardware designers, processor vendors, and systems software architects.

Making is Show Business now – Welcome to Dancoland

How did we go from Brooks’ /Mythical Man-Month/, where adding more contributors was a net cost, to the 90s open source ideal where adding new contributors magically worked, and then back to this new present situation, where they’re a cost again?

When you create great products, and people get engaged, you become a show. Shows create their own success: fans create more fans, just like product use creates more of itself. Conversely, if no one’s at your show, no one will go next time. If no one uses your product, it won’t improve. There’s no business on earth with a greater divergence of outcomes between the winners and everyone else.

Everything You Might Want to Know about Whaling – Matt Lakeman

David Spivak Stuff

Category theory is a universal modeling language.

Ologs: a category-theoretic foundation for knowledge representation

Ontology is the study of what something is, the intrinsic nature of a subject. Ologs, or ontology logs, are a framework for recording the results of such a study by breaking it down into types (objects), aspects (arrows), and facts (commutative diagrams). Functors between ologs can create translating dictionaries between various perspectives on a subject, and set-valued functors amount to instance data or “real world examples” for the system of concepts in an olog.
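
To make that vocabulary concrete, here is a minimal Python sketch of a toy olog with instance data. The types, aspects, and example people are all invented for illustration and are not taken from Spivak's work; the sketch only shows what it means for a fact (a commutative diagram) to hold on a set-valued instance.

```python
# Hypothetical toy olog: types are objects, aspects are arrows, and a
# fact says that two paths of arrows agree. All names here are made up.

# Instance data: each type is assigned a set of examples.
instances = {
    "a person": {"alice", "bob"},
    "a city": {"paris", "lyon"},
    "a country": {"france"},
}

# Each aspect is (source type, target type, function stored as a dict).
aspects = {
    "lives in": ("a person", "a city", {"alice": "paris", "bob": "lyon"}),
    "is located in": ("a city", "a country", {"paris": "france", "lyon": "france"}),
    "resides in": ("a person", "a country", {"alice": "france", "bob": "france"}),
}


def follow(path, start):
    """Apply a sequence of aspects, in order, to one example."""
    value = start
    for name in path:
        _source, _target, mapping = aspects[name]
        value = mapping[value]
    return value


def fact_holds(path_a, path_b):
    """Check that two paths from the same source type agree on every example."""
    source = aspects[path_a[0]][0]
    return all(follow(path_a, x) == follow(path_b, x) for x in instances[source])


# The fact "lives in, then is located in" equals "resides in".
triangle_commutes = fact_holds(["lives in", "is located in"], ["resides in"])
print(triangle_commutes)
```

Changing one entry, for example sending "bob" to a different country under "resides in", would make the check fail, which is roughly what it means for instance data to violate a declared fact.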

Poly: An abundant categorical setting for mode-dependent dynamics

Dynamical systems---by which we mean machines that take time-varying input, change their state, and produce output---can be wired together to form more complex systems. Previous work has shown how to allow collections of machines to reconfigure their wiring diagram dynamically, based on their collective state. This notion was called “mode dependence”, and while the framework was compositional (forming an operad of re-wiring diagrams and algebra of mode-dependent dynamical systems on it), the formulation itself was more “creative” than it was natural.

In this paper we show that the theory of mode-dependent dynamical systems can be more naturally recast within the category Poly of polynomial functors. This category is almost superlatively abundant in its structure: for example, it has four interacting monoidal structures (+,×,⊗,∘), two of which (×,⊗) are monoidal closed, and the comonoids for ∘ are precisely categories in the usual sense. We discuss how the various structures in Poly show up in the theory of dynamical systems. We also show that the usual coalgebraic formalism for dynamical systems takes place within Poly. Indeed one can see coalgebras as special dynamical systems---ones that do not record their history---formally analogous to contractible groupoids as special categories.
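
For readers who want a concrete picture of "machines that take time-varying input, change their state, and produce output", here is a small Python sketch of two such machines wired in series. The `Machine` class and `run_in_series` helper are hypothetical names chosen for this example. The wiring is fixed, so this illustrates only the starting point of the abstract, not mode dependence or anything specific to Poly.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Machine:
    """A simple state machine: the output depends only on the current state."""
    state: Any
    update: Callable[[Any, Any], Any]  # (state, input) -> next state
    readout: Callable[[Any], Any]      # state -> output

    def step(self, inp):
        """Emit the output for the current state, then advance the state."""
        out = self.readout(self.state)
        self.state = self.update(self.state, inp)
        return out


def run_in_series(first, second, inputs):
    """Feed first's output into second's input and collect second's outputs."""
    outputs = []
    for inp in inputs:
        outputs.append(second.step(first.step(inp)))
    return outputs


# A running-sum machine feeding a one-step delay machine.
accumulator = Machine(state=0, update=lambda s, i: s + i, readout=lambda s: s)
delay = Machine(state=None, update=lambda s, i: i, readout=lambda s: s)

series_outputs = run_in_series(accumulator, delay, [1, 2, 3, 4])
print(series_outputs)  # [None, 0, 1, 3]
```

A mode-dependent version would let the choice of wiring itself depend on the machines' current states, which is the part the paper recasts in Poly.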

Thinking Like Transformers

What is the computational model behind a Transformer? Where recurrent neural networks have direct parallels in finite state machines, allowing clear discussion and thought around architecture variants or trained models, Transformers have no such familiar parallel. In this paper we aim to change that, proposing a computational model for the transformer-encoder in the form of a programming language. We map the basic components of a transformer-encoder — attention and feed-forward computation — into simple primitives, around which we form a programming language: the Restricted Access Sequence Processing Language (RASP). We show how RASP can be used to program solutions to tasks that could conceivably be learned by a Transformer, and how a Transformer can be trained to mimic a RASP solution. In particular, we provide RASP programs for histograms, sorting, and Dyck-languages. We further use our model to relate their difficulty in terms of the number of required layers and attention heads: analyzing a RASP program implies a maximum number of heads and layers necessary to encode a task in a transformer. Finally, we see how insights gained from our abstraction might be used to explain phenomena seen in recent works.
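
As a rough feel for what such a program looks like, here is a plain-Python emulation of the histogram task. The function names mirror the select, aggregate, and selector-width primitives described in the paper, but the signatures are simplified assumptions made for this sketch, not the authors' implementation or RASP syntax.

```python
def select(keys, queries, predicate):
    """Build a boolean attention pattern: row q marks the keys that query q selects."""
    return [[predicate(k, q) for k in keys] for q in queries]


def selector_width(selector):
    """For each query position, count how many key positions it selects."""
    return [sum(row) for row in selector]


def aggregate(selector, values):
    """For each query position, average the selected values (0 if none selected)."""
    out = []
    for row in selector:
        chosen = [v for v, picked in zip(values, row) if picked]
        out.append(sum(chosen) / len(chosen) if chosen else 0)
    return out


def histogram(tokens):
    """Each position reports how often its own token occurs in the input."""
    same_token = select(tokens, tokens, lambda k, q: k == q)
    return selector_width(same_token)


hist = histogram(list("hello"))
print(hist)  # [1, 1, 2, 2, 1]

# Averaging over all earlier-or-equal positions gives a running mean.
indices = [0, 1, 2]
prefix = select(indices, indices, lambda k, q: k <= q)
prefix_mean = aggregate(prefix, [1, 2, 3])
print(prefix_mean)  # [1.0, 1.5, 2.0]
```

The point of the exercise is that each `select` corresponds to an attention pattern, so counting them gives a handle on how many heads and layers a task needs.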

What We Learned Doing Fast Grants

Recommended. My thoughts:

UTC, TAI, and UNIX time

For many years, the UNIX localtime() time-display routine didn’t support leap seconds. In effect it treated TAI as UTC. Its displays slipped 1 second away from the correct local time as each leap second passed. Nobody cared; clocks weren’t set that accurately anyway. Unfortunately, xntpd, a program that synchronizes clocks using the Network Time Protocol, pandered to those broken localtime() libraries, at the expense of reliability. By resetting the clock at each leap second, xntpd extracts a correct UTC display (except, of course, during leap seconds) from the broken localtime() libraries. Meanwhile, it produces incorrect results for applications that add and subtract real times.
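
The "incorrect results for applications that add and subtract real times" can be seen with Python's `datetime`, which, like POSIX time, does not count leap seconds. This sketch uses the leap second inserted at the end of 2016-12-31 UTC; the constant `LEAP_SECONDS_BETWEEN` is hard-coded by hand for this one interval and is not something the standard library provides.

```python
from datetime import datetime, timezone

# Two UTC instants on either side of the leap second 2016-12-31 23:59:60.
before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# POSIX-style timestamps pretend every day has exactly 86400 seconds.
posix_elapsed = int(after.timestamp() - before.timestamp())

# Assumption for this sketch: one leap second fell between the two instants.
LEAP_SECONDS_BETWEEN = 1
real_elapsed = posix_elapsed + LEAP_SECONDS_BETWEEN

print(posix_elapsed)  # 1
print(real_elapsed)   # 2
```

A clock based on TAI would report the two-second interval directly, without needing a leap second table at subtraction time.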

It’s easy enough to fix xntpd. It’s also easy to fix localtime() to handle leap seconds. In fact, some vendors have already adopted Olson’s time library. The main obstacle is POSIX. POSIX is a “standard” designed by a vendor consortium several years ago to eliminate progress and protect the installed base. The behavior of the broken localtime() libraries was documented and turned into a POSIX requirement. Fortunately, the POSIX rules are so outrageously dumb—for example, they require that 2100 be a leap year, contradicting the Gregorian calendar—that no self-respecting engineer would obey them.
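
The 2100 complaint is easy to check. The sketch below compares a "every fourth year is a leap year" rule against the Gregorian rule; it assumes the behavior the quote objects to amounts to the simpler rule, and both function names are made up for this example.

```python
import calendar


def divisible_by_four_rule(year):
    """Simplified rule: every fourth year is a leap year."""
    return year % 4 == 0


def gregorian_rule(year):
    """Gregorian rule: century years are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (2000, 2024, 2100):
    print(year, divisible_by_four_rule(year), gregorian_rule(year), calendar.isleap(year))
```

The two rules agree on every year from 1901 through 2099, which is why the difference is easy to overlook; 2100 is the first year where they disagree.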


Interesting, but I don’t know enough about Taiwan to comment on the veracity.

Grain of salt, contrarian take.

I think geoengineering like this is at the edge of the Overton window, but I predict it will start to become mainstream.