At the beginning of my Ph.D. studies, I had the intuition that Grothendieck toposes could effectively serve as ‘bridges’ for connecting different mathematical theories with each other and allowing an effective transfer of knowledge between them.

We present a method to construct “native” type systems for a broad class of languages, in which types are built from term constructors by predicate logic and dependent types. Any language with products and functions can be modelled as a higher-order algebraic theory with rewrites, and the internal language of its presheaf topos provides total specification of the structure and behavior of terms. This construction is functorial, so translations of languages give translations of type systems. The construction therefore provides a shared framework of higher-order reasoning for most existing programming languages.

Gwern’s favorite books

Fields are not algebraic. An algebraic theory, for example, has free objects: there are free rings, free groups, free monoids. The free functor is left adjoint to the forgetful functor to sets (okay, I’m talking about models in Set). There are, though, no free fields. Alternatively, one can talk about meadows: an algebraic theory that is a modified version of the theory of fields.
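To make the “free object” idea concrete, here is a minimal Python sketch of the free monoid on a set. It assumes words are modelled as tuples, with concatenation as the operation and the empty tuple as the unit. The helper name `extend` and the example map are my own illustrative choices, not from the quoted post. The sketch shows the universal property: any function from the generators into a monoid extends to a monoid homomorphism on words.

```python
from functools import reduce
import operator


def extend(f, op, unit):
    """Extend f: X -> M to a map hom: words over X -> M.

    Words are tuples; (M, op, unit) is assumed to be a monoid.
    """
    def hom(word):
        # Fold the images of the letters together, starting from the unit.
        return reduce(op, (f(x) for x in word), unit)
    return hom


# Hypothetical example: send each generator (a string) to its length,
# landing in the monoid of integers under addition.
total_length = extend(len, operator.add, 0)

u = ("ab", "c")
v = ("def",)

# Homomorphism laws: the empty word goes to the unit, and
# concatenation of words goes to the monoid operation.
assert total_length(()) == 0
assert total_length(u + v) == total_length(u) + total_length(v)
```

The quoted point is that no analogous construction exists for fields.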

Hagoromo chalk photo essay / history

Let’s say you want to know how likely it is that an innovative new product will succeed, or that China will invade Taiwan in the next decade, or that a global pandemic will sweep the world — basically any question for which you can’t just use “predictive analytics,” because you don’t have a giant dataset you can plug into some statistical models like (say) Amazon can when predicting when your package will arrive. Is it possible to produce reliable, accurate forecasts for such questions? Somewhat amazingly, the answer appears to be “yes, if you do it right.”

Really good post, covers Berkson’s Paradox and The Good Enough Effect.

In magazines and newspapers, you’ve probably read articles about the benefits enjoyed by tall people: higher salaries, increased confidence, greater intelligence, and more.

The selection effect is a very important concept.

We should keep selection effects at the top of our minds when reviewing any observational data that does not have a randomized control trial to measure the true impact. Whenever you are comparing two groups — whether it’s employees, customers, candidates, or professional basketball players — and you see a difference, your first thought should NEVER be: “something we are doing must be driving that difference.” Instead, your initial analysis should try to understand how members of the two groups might be different. And the most obvious difference usually relates to the way the groups are selected.
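A quick simulation makes the selection effect (and Berkson’s Paradox) easy to see. This is a minimal sketch under my own assumptions, not code from the linked posts: two traits are drawn independently from a standard normal distribution, and a unit is “selected” only if the sum of its traits clears a threshold. The function names, the threshold, and the sample size are all illustrative.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, threshold=1.0, seed=0):
    """Compare the trait correlation before and after selection.

    Assumption: the two traits are independent in the full population,
    and selection keeps only units whose traits sum above `threshold`.
    """
    rng = random.Random(seed)
    population = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    selected = [(a, b) for a, b in population if a + b > threshold]
    r_all = pearson(*zip(*population))
    r_selected = pearson(*zip(*selected))
    return r_all, r_selected, len(selected)


r_all, r_selected, n_selected = simulate()
print(f"full population: r = {r_all:.3f}")
print(f"selected group:  r = {r_selected:.3f} (n = {n_selected})")
```

In the full population the correlation sits near zero, while inside the selected group it comes out clearly negative, even though nothing causal links the two traits. The difference is produced entirely by how the group was selected.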

This paper does not describe a working system. Instead, it presents a single idea about representation which allows advances made by several different groups to be combined into an imaginary system called GLOM. The advances include transformers, neural fields, contrastive representation learning, distillation and capsules. GLOM answers the question: How can a neural network with a fixed architecture parse an image into a part-whole hierarchy which has a different structure for each image? The idea is simply to use islands of identical vectors to represent the nodes in the parse tree. If GLOM can be made to work, it should significantly improve the interpretability of the representations produced by transformer-like systems when applied to vision or language.

See thread for results on chess, coding, and learning an instrument.

Select answers:

I like the work on learning invariant representations. Examples: Invariant risk minimization (https://arxiv.org/abs/1907.02893), Domain-adversarial ML (https://arxiv.org/abs/1505.07818). These have flaws, but point in interesting directions.

I see promise in some incomplete and complementary directions, yet to show significant results:
- Friston actually implementing FEP solutions
- Gained relevance of graph networks
- A bit of capsule networks from Hinton
- Open ended systems
- Bengio’s attempt to unify S1 and S2

1: AlphaZero and MuZero (learning by self-play alone, MuZero doesn’t even need a simulator)
2: Self-supervised learning (no need for labels to learn good representations)
3: Attention (NN layer that can solve more symbolic tasks)
4: Encode-Process-Decode GNN architecture (which can solve more “reasoning” tasks, see Neural Execution Engines)
5: World Models
6: PAIRED and POET (one agent creating tasks for others)