Since neural networks are universal function approximators, we can often use them to solve problems we don’t know exactly how to solve; NeRF is a great example. But once a problem is solved, we should try to reverse engineer the solution and optimize it. https://arxiv.org/abs/2112.05131 did exactly that and found you can get NeRF quality without any neural network at all, and 100x faster as a result.
This post collects formulas that let you bootstrap the functions listed above into all the trig and hyperbolic functions and their inverses.
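To make the bootstrapping idea concrete, here is a minimal Python sketch, assuming the starting primitives include exp, log, and sqrt (the post’s actual starting list may differ):

#+begin_src python
# Hedged sketch: build the hyperbolic functions and their inverses from
# exp, log and sqrt alone. (The post's own bootstrap set may be different.)
import math

def sinh(x): return (math.exp(x) - math.exp(-x)) / 2
def cosh(x): return (math.exp(x) + math.exp(-x)) / 2
def tanh(x): return sinh(x) / cosh(x)

# The inverses come from solving the defining equations for x:
def asinh(y): return math.log(y + math.sqrt(y * y + 1))
def acosh(y): return math.log(y + math.sqrt(y * y - 1))   # requires y >= 1
def atanh(y): return 0.5 * math.log((1 + y) / (1 - y))    # requires |y| < 1
#+end_src

The circular functions work the same way once you allow complex arguments, which is where the branch-cut care below comes in.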
Up to this point this post has implicitly assumed we’re only working with real numbers. When working over complex numbers, inverse functions get more complicated. You have to be explicit about which branch you’re taking when you invert a function that isn’t one-to-one. Common Lisp worked through all this very thoroughly, defining arc tangent first, then defining everything else in a carefully chosen sequence. See Branch cuts and Common Lisp.
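For a taste of what “carefully chosen” means, here is a small Python illustration (not the derivation from that post): over the complex numbers, arcsine can be written in terms of log and sqrt, so its branch cut is whatever falls out of the branches those two functions use.

#+begin_src python
# Illustration only: asin over the complexes via the identity
#   asin(z) = -i * log(i*z + sqrt(1 - z^2)),
# so the branch cut of asin is inherited from the principal branches
# of log and sqrt.
import cmath

def asin_via_log(z):
    return -1j * cmath.log(1j * z + cmath.sqrt(1 - z * z))

z = 2 + 0.5j                   # off the real interval [-1, 1]
print(asin_via_log(z))         # agrees with the library's branch choice...
print(cmath.asin(z))           # ...away from the cuts; on the cuts the
                               # sign conventions are exactly the subtle part
#+end_src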
Email is our only reliable communication method between different organizations.
As j. b. crawford notes, the prospects for another federated, Internet-wide communication system seem very remote at this point in time, so email is it.
Famous short story published in The New Yorker in 1948. Conveys a wonderful sense of place, of being in a small, conservative town.
Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces? The contrast between their huge parameter complexity and their stunning performance in practice is all the more intriguing, and cannot be explained by the standard theory of regular models. In this work, we propose a geometrically flavored information-theoretic approach to study this phenomenon. Namely, we introduce the locally varying dimensionality of the parameter space of neural network models by considering the number of significant dimensions of the Fisher information matrix, and model the parameter space as a manifold using the framework of singular semi-Riemannian geometry. We derive model complexity measures which yield short description lengths for deep neural network models based on their singularity analysis, thus explaining the good performance of DNNs despite their large number of parameters.
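Not the paper’s code, but a toy numpy sketch of the central quantity it studies: the number of “significant” dimensions of the Fisher information matrix, estimated here for a small logistic model as the count of empirical-Fisher eigenvalues above a cutoff (the model, data, and threshold are all made up for illustration).

#+begin_src python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
w = rng.normal(size=d)

# Logistic model p(y=1|x) = sigmoid(w.x); per-example score = grad_w log p(y|x)
p = 1.0 / (1.0 + np.exp(-X @ w))
y = (rng.uniform(size=n) < p).astype(float)
scores = (y - p)[:, None] * X              # shape (n, d)

fisher = scores.T @ scores / n             # empirical Fisher matrix, (d, d)
eigvals = np.linalg.eigvalsh(fisher)
significant = int((eigvals > 1e-3 * eigvals.max()).sum())
print(f"{significant} of {d} parameter directions carry non-negligible information")
# As I read the abstract, the paper's point is that for an overparameterized
# DNN many of these eigenvalues are (near) zero, so the effective
# dimensionality is far below the raw parameter count.
#+end_src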
the SAT is gone, now fairness and egalitarianism will reign
getting rid of the SATs is just another way for them to consolidate total and unfettered privilege to choose whoever is going to make their pockets even heavier, and that they are and will always be in the business of nominating an aristocracy that will deepen inequality and intensify exploitation no matter what kind of faces they happen to have, Black or white, Jew or gentile, all of them the elect, Elois over Morlocks, and this is the system to which you have lent your faith, the vehicle you expect to deliver us equality
I’m an independent researcher working on AI alignment and the theory of agency. I’m 29 years old, will make about $90k this year, and set my own research agenda. I deal with basically zero academic bullshit - my grant applications each take about one day’s attention to write (and decisions typically come back in ~1 month), and I publish the bulk of my work right here on LessWrong / AF. Best of all, I work on some really cool technical problems which I expect are central to the future of humanity.
Transformer-based models have pushed the state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.
Thinks about nutrition in a very different way from what I’m accustomed to
Amazingly, it turns out this site is hosted on the platform of which I’m the lead engineer (Cloudflare Workers / Pages) and the problem is in fact our own fault, not Ben’s.
The reader may conclude that I do not expect anything startling in the way of scientific discovery. That is not the case; I am convinced that in 2022 the advancement of science will be amazing, but it will be nothing like so amazing as is the present day in relation to a hundred years ago. A sight of the world today would surprise President Jefferson much more, I suspect, than the world of 2022 would surprise the little girl who sells candies at Grand Central Station. For Jefferson knew nothing of railroads, telegraphs, telephones, automobiles, aeroplanes, gramophones, movies, radium, &c.; he did not even know hot and cold bathrooms. The little girl at Grand Central is a blasé child; to her these things are commonplace; the year 2022 would have to produce something very startling to interest her ghost. The sad thing about discovery is that it works toward its own extinction, and that the more we discover the less there is left.
Similar reforms apply to cooking, a great deal of which will survive among old fashioned people, but a great deal more of which will probably be avoided by the use of synthetic foods. It is conceivable, though not certain, that in 2022 a complete meal may be taken in the shape of four pills. This is not entirely visionary; I am convinced that corned beef hash and pumpkin pie will still exist, but the pill lunch will roll by their side.
It is practically certain that in 2022 nearly all women will have discarded the idea that they are primarily “makers of men”. Most fit women will then be following an individual career… But it is unlikely that women will have achieved equality with men.
I can run my own mail server, but it doesn’t functionally matter for privacy, censorship resistance, or control – because GMail is going to be on the other end of every email that I send or receive anyway. Once a distributed ecosystem centralizes around a platform for convenience, it becomes the worst of both worlds: centralized control, but still distributed enough to become mired in time.
The second edition of Deep Learning Interviews is home to hundreds of fully solved problems, from a wide range of key topics in AI. It is designed both to rehearse interview- or exam-specific topics and to provide machine learning MSc/PhD students, and those awaiting an interview, with a well-organized overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills, but they’re framed within thought-provoking questions and engaging stories. That is what makes the volume so specifically valuable to students and job seekers: it provides them with the ability to speak confidently and quickly on any relevant topic, to answer technical questions clearly and correctly, and to fully understand the purpose and meaning of interview questions and answers. Those are powerful, indispensable advantages to have when walking into the interview room. The book’s contents are a large inventory of numerous topics relevant to DL job interviews and graduate-level exams. That places this work at the forefront of the growing trend in science to teach a core set of practical mathematical and computational skills. It is widely accepted that the training of every computer scientist must include the fundamental theorems of ML, and AI appears in the curriculum of nearly every university. This volume is designed as an excellent reference for graduates of such programs.
The L2 latency from CPU core 0 was pretty reliably four cycles lower than the L2 latency from CPU core 1 or 2. So that’s interesting.
The crucial thing is that the path to the top-right CPU is quite short, whereas the path to the two CPUs on the left is noticeably longer – the signals have to travel horizontally across the interconnect.
/So the 5.5 mm actually takes only 0.625 nanoseconds each way: the signal covers 11 mm in 1.25 nanoseconds, which means the original title of this blog post is a lie./
Light can travel roughly 30 cm in one nanosecond, so you may be wondering why this signal is traveling /so slowly/.
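A quick back-of-envelope check, using only the figures quoted above (11 mm round trip in 1.25 ns) and the speed of light:

#+begin_src python
# Rough arithmetic: how fast is the interconnect signal compared to c?
c = 299_792_458        # speed of light in vacuum, m/s
distance = 11e-3       # 11 mm round trip, per the quote above
time = 1.25e-9         # 1.25 ns

speed = distance / time
print(f"{speed:.2e} m/s, about {speed / c:.1%} of c")  # roughly 3% of c
#+end_src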
/aka nerds nerding out about / obsessing about / fetishizing Truth/
We build tools to make the command line glamorous.
I’ve used NixOS as the only OS on my laptop for around three years at this point. Installing it has felt sort of like a curse: on the one hand, it’s so clearly the only operating system that actually gets how package management should be done. After using it, I can’t go back to anything else. On the other hand, it’s extremely complicated, constantly changing software that requires configuration with the second-worst homegrown config programming language I’ve ever used.
The worst is, of course, GCL/borgcfg, the (Turing-complete) configuration language for Google’s internal job scheduling software. I used to sit next to the team that was reimplementing the interpreter for it (since the original interpreter leaked memory all over the place), after performing what they referred to as a “forensic analysis” of the semantics of the language, and they were elated when they finally managed to get 90% of the config files in the monorepo to have the same behaviour as the original interpreter. Truly one of the worst languages I have ever seen.
The best known algorithm for this problem is in PSPACE. That is, it takes exponential time but only polynomial space. That means that as far as we know, this problem is crazily hard—among other things, PSPACE is harder than NP.
Most mathematical questions one could have about Wordle are settled by now, and a few remain open. I summarize here what is known, as far as I can tell.
“The iron law of capitalism”
The fundamental reason health care in the United States is so bad is that it breaks the iron law of capitalism, which is that the person who uses a good or service should be the person who chooses it, and who pays the bill. Doctors & patients together choose a medical treatment, but insurance pays the bill. Individuals use insurance, but don’t actually choose or pay for it; that’s done by their employers, and the plans are so complicated that individual consumers can’t make heads or tails of them anyway.