this post was submitted on 17 Sep 2024
xkcd
I think a lot of the reason is that fields (the real numbers in this case) have some pretty lousy categorical properties, and you can't define a very nice additive and multiplicative structure on ℝ^n^ for n ≥ 3: the only ways to make ℝ^n^ into a field extending the real numbers are n = 1 (the reals themselves) and n = 2 (the complex numbers). So you end up having to deal with vector spaces instead of fields. That is, you can't (in general) multiply or divide points in ℝ^n^ by other points in ℝ^n^, so you have far fewer tricks at your disposal. The other thing is that there's no natural way to order points in ℝ^n^, so nice things like the mean value theorem sort of disappear. There are a few other complications as well, but I think those are the big ones. It's a whole other beast than single-variable analysis.
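Here is one concrete way to watch the mean value theorem break once the output is a vector. The example curve f(t) = (cos t, sin t) on [0, 2π] is my own choice for illustration. The curve ends where it starts, so its average rate of change is the zero vector. The derivative f'(t) = (-sin t, cos t), however, always has length 1, so there is no t with f'(t) equal to the average rate of change. This is a minimal sketch that checks this numerically on a grid of sample points.

```python
import math

# Illustrative example (my choice): the unit circle traced once, f: [0, 2π] -> ℝ^2.
def f(t):
    return (math.cos(t), math.sin(t))

def f_prime(t):
    # Derivative worked out by hand: d/dt (cos t, sin t) = (-sin t, cos t).
    return (-math.sin(t), math.cos(t))

a, b = 0.0, 2 * math.pi

# Average rate of change over [a, b]; the curve ends where it started.
secant = tuple((fb - fa) / (b - a) for fa, fb in zip(f(a), f(b)))

# Length of f'(t) at 1001 evenly spaced sample points in [a, b].
samples = [a + (b - a) * k / 1000 for k in range(1001)]
min_speed = min(math.hypot(*f_prime(t)) for t in samples)

print("average rate of change:", secant)  # essentially (0, 0), up to rounding
print("smallest |f'(t)|:", min_speed)     # 1 up to rounding, so never (0, 0)
```

In one variable, the mean value theorem would promise a point where the derivative equals the average slope. Here no such point exists.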
I feel like this is just an unfortunate part of learning math. I'm not really sure that feeling ever goes away, but it usually means you're making progress. My experience has been that the more math I learn, the more comfortable I get with the things I already know, and the more I realize how much is left to learn. So it feels like I only really know the "basic stuff" and continue to struggle with the "hard stuff". My advice would be to try to not be discouraged by it, although it's easier said than done.
On to the more technical questions. I'll try to keep things handwavy to hopefully let the "big picture" shine through a bit. I think analysis textbooks are a bit guilty of getting too wrapped up in the details and missing the forest for the trees (or however the saying goes).
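The total derivative is basically just a way to turn calculus problems into linear algebra problems. I think it's best understood by first looking at the one-dimensional case and then generalizing it a bit to higher dimensions. The key idea is this: the derivative of f at x~0~ is the linear function that best approximates f near x~0~. In one dimension, that means f(x) ≈ f(x~0~) + f'(x~0~)(x - x~0~), where the error shrinks faster than |x - x~0~| as x approaches x~0~.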
Notice how in the 1-dimensional case this is just a "clever" way to rephrase that f'(x~0~) is the "instantaneous slope" of f at x~0~.
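You can see the "best straight-line approximation" idea numerically. The example below is my own choice: f(x) = e^x^ at x~0~ = 0, where f'(x~0~) = 1. The sketch measures how far f is from its tangent line at x~0~ + h and divides by h. If the line really is the best linear approximation, that ratio should go to 0 as h shrinks.

```python
import math

# Illustrative example (my choice): f(x) = exp(x) at x0 = 0, where f'(x0) = 1.
f = math.exp
x0 = 0.0
fprime_x0 = 1.0

def tangent(x):
    """The line through (x0, f(x0)) with slope f'(x0)."""
    return f(x0) + fprime_x0 * (x - x0)

# Error of the tangent line, divided by the distance from x0.
ratios = []
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    err = abs(f(x0 + h) - tangent(x0 + h))
    ratios.append(err / h)
    print(f"h = {h:g}: error / h = {err / h:.2e}")
```

For this f, the error is roughly h^2^/2, so the ratio shrinks roughly in proportion to h. Any line with a different slope would leave a ratio that stays away from 0.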
In higher dimensions, it no longer makes sense to approximate f with a straight line, because lines are 1-dimensional objects, whereas the domain and codomain of the function might not be 1-dimensional. However, it still makes sense to talk about the best linear approximation of f. A bit of linear algebra knowledge helps to make this idea clearer, but I'll do my best to explain it with as little linear algebra as I can. (But let me know if you want a more linear-algebra-heavy explanation.)
A higher-dimensional linear function is (basically) just a matrix, and a matrix is basically just a way to (linearly) turn one vector into another vector. At a high level, you can think of an m × n matrix as turning a copy of ℝ^n^ into a copy of ℝ^m^, possibly rotating, reflecting, stretching, or squashing things in the process. One thing it never does is translate: a linear map always sends 0 to 0. (Compare this to the 1-dimensional case, where a 1 × 1 matrix is just a number, and multiplying by a number "turns one copy of ℝ into another copy of ℝ", provided that number isn't 0.)
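To make "a matrix turns vectors into vectors" concrete, here is a minimal plain-Python sketch. `matvec` is a hypothetical helper written just for this example, and both matrices are my own illustrative choices. A 2 × 3 matrix takes vectors with 3 entries to vectors with 2 entries. A 2 × 2 matrix that rotates by 90° and doubles lengths turns ℝ^2^ into ℝ^2^. The sketch also checks two facts: A(u + v) = Au + Av, and the zero vector always stays put. The second fact is why a matrix on its own can't shift things over.

```python
import math

def matvec(A, v):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]

# A 2 x 3 matrix (illustrative): takes vectors in ℝ^3 to vectors in ℝ^2.
B = [[1, 2, 3],
     [4, 5, 6]]
print(matvec(B, [1, 0, -1]))  # [-2, -2]

# A 2 x 2 matrix (illustrative): rotate by 90 degrees, then scale by 2.
theta = math.pi / 2
A = [[2 * math.cos(theta), -2 * math.sin(theta)],
     [2 * math.sin(theta),  2 * math.cos(theta)]]

u, v = [1.0, 0.0], [0.0, 1.0]
Au, Av = matvec(A, u), matvec(A, v)
A_sum = matvec(A, [ui + vi for ui, vi in zip(u, v)])

print(Au)                                     # roughly [0, 2]: (1, 0) turned to (0, 1), then doubled
print(A_sum, [p + q for p, q in zip(Au, Av)])  # A(u + v) matches Au + Av up to rounding
print(matvec(A, [0.0, 0.0]))                  # the zero vector is sent to the zero vector
```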
So the total derivative of f at a point x~0~ is basically just the matrix that gives the best linear approximation of f near x~0~. Writing Df(x~0~) for that matrix, f(x~0~ + h) ≈ f(x~0~) + Df(x~0~)h for small steps h, and if f goes from ℝ^n^ to ℝ^m^, then Df(x~0~) is an m × n matrix. As you vary the input, this approximation traces out a copy of ℝ^n^. That is, you get an n-dimensional plane through (x~0~, f(x~0~)) that is the "best" approximation to f there. And "best approximation" is just a slightly less fancy way of saying "tangent".
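Here is a numerical version of "the total derivative is the matrix that best approximates f". In coordinates, that matrix is the Jacobian: the matrix of partial derivatives, with one row per output and one column per input. The example map f(x, y) = (x^2^y, sin x + y) and the helper `numerical_jacobian` are hypothetical, chosen only for this sketch. The helper estimates each column with a central difference. The sketch compares the result with the hand-computed partials and checks that f(p~0~ + h) ≈ f(p~0~) + Jh for a small step h.

```python
import math

# Illustrative map f: ℝ^2 -> ℝ^2, f(x, y) = (x^2 * y, sin(x) + y).
def f(p):
    x, y = p
    return [x * x * y, math.sin(x) + y]

def exact_jacobian(p):
    # Partial derivatives worked out by hand: rows are outputs, columns are inputs.
    x, y = p
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

def numerical_jacobian(f, p, eps=1e-6):
    """Estimate the m x n Jacobian of f at p using central differences."""
    n, m = len(p), len(f(p))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        forward, backward = list(p), list(p)
        forward[j] += eps
        backward[j] -= eps
        fp, fm = f(forward), f(backward)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

p0 = [1.0, 2.0]
J = numerical_jacobian(f, p0)

# Linear approximation f(p0 + h) ≈ f(p0) + J h for a small step h.
h = [1e-3, -2e-3]
approx = [fi + sum(J[i][j] * h[j] for j in range(2)) for i, fi in enumerate(f(p0))]
actual = f([p0[0] + h[0], p0[1] + h[1]])

print(J)
print(exact_jacobian(p0))
print(approx, actual)  # agree to within roughly |h|^2
```

Central differences are just one convenient way to approximate the partials here. The point is that a single matrix predicts how f responds to small steps in every direction at once.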
I always found the gradient to be a bit confusing, but I think it's best understood in terms of what it does, not in terms of how it's defined. The "purpose" of the gradient is to let you compute the directional derivative, i.e., the derivative of f in the direction of a given vector v. So, let's use the notation
(∇f)(v) to denote the directional derivative of f in the direction of v.
Let's consider the 3-dimensional case and write v = a~1~e~1~ + a~2~e~2~ + a~3~e~3~ for the standard basis vectors e~i~ and real numbers a~i~.
Since the directional derivative is linear in the direction v (at least when f is differentiable), we get
(∇f)(v) = (∇f)(a~1~e~1~ + a~2~e~2~ + a~3~e~3~) = a~1~(∇f)(e~1~) + a~2~(∇f)(e~2~) + a~3~(∇f)(e~3~).
In other words, we only need to compute the directional derivatives in the directions of the basis vectors to figure out the gradient. That's pretty nice! Also, the derivative of f in the direction of e~i~ is exactly the partial derivative of f with respect to the i-th coordinate. Let's write f~i~ for that partial derivative (just because I don't know how well Lemmy handles double subscripts). Then we can rewrite the above equation as
(∇f)(v) = a~1~f~1~ + a~2~f~2~ + a~3~f~3~.
Now compare that with the dot product of the vectors (f~1~, f~2~, f~3~) and v = (a~1~, a~2~, a~3~). It's exactly the same. So the gradient can be defined as the vector of partial derivatives ∇f = (f~1~, f~2~, f~3~), and the directional derivative in the direction of v is then the dot product ∇f · v. But I think that definition loses a lot of the intuitive meaning of the gradient.
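Here is a small check of (∇f)(v) = a~1~f~1~ + a~2~f~2~ + a~3~f~3~. The function f(x~1~, x~2~, x~3~) = x~1~x~2~ + sin x~3~, the point, and the vector v = 3e~1~ - e~2~ + 2e~3~ are illustrative choices of mine. The helper `directional_derivative` is hypothetical. It estimates the derivative along v with a central difference, and the sketch compares that estimate with the dot product of the hand-computed partials (f~1~, f~2~, f~3~) and (a~1~, a~2~, a~3~).

```python
import math

# Illustrative function: f(x1, x2, x3) = x1 * x2 + sin(x3).
def f(x):
    return x[0] * x[1] + math.sin(x[2])

def grad_f(x):
    # (f_1, f_2, f_3): partial derivatives with respect to x1, x2, x3, worked out by hand.
    return [x[1], x[0], math.cos(x[2])]

def directional_derivative(f, x, v, t=1e-6):
    """Estimate the derivative of f at x in the direction v with a central difference."""
    plus = [xi + t * vi for xi, vi in zip(x, v)]
    minus = [xi - t * vi for xi, vi in zip(x, v)]
    return (f(plus) - f(minus)) / (2 * t)

x0 = [1.0, 2.0, 0.5]
v = [3.0, -1.0, 2.0]  # v = 3 e1 - e2 + 2 e3, i.e. a1 = 3, a2 = -1, a3 = 2

lhs = directional_derivative(f, x0, v)                # derivative along v, measured directly
rhs = sum(fi * ai for fi, ai in zip(grad_f(x0), v))   # a1 f1 + a2 f2 + a3 f3
print(lhs, rhs)  # agree up to small numerical error
```

Only the three partials were needed to get the derivative in an arbitrary direction, which is exactly the "we only need the basis vectors" observation.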
I hope you found some of this helpful, and feel free to ask if you have any more questions/found something I said confusing.
Thanks for answering my frustrated questions; it was a long day yesterday. I'll try to understand the deeper truths later, but I can already tell the matrix stuff goes over my head.
Anytime. I've also had my fair share of long days studying analysis, and I feel like most of my time spent trying to learn analysis was spent fighting with the textbooks. I think the (ε, δ) stuff is to blame for that, but that's a whole other topic.
Anyway, I was thinking a bit more about the matrix stuff, and I think I have a better explanation if you're interested, since my previous one was probably a bit too abstract. I honestly think it should be criminal to teach multivariate analysis before linear algebra, since a lot of the purpose of multivariate analysis is to turn complicated problems into linear problems. But anyway, here's the big picture:
You don't really need to understand the ins and outs of matrices or be super familiar with them to get a sense of what the total derivative is and how it should behave. For that purpose, here are some of the highlights of matrices and the total derivative:
Let A be an m × n matrix. Then:
So those are two ways to look at the total derivative: you can try to get a geometric understanding of what it does (approximate the function with the best-fitting plane), or look at why it's useful (turning harder problems into easier problems). But just to be clear, dealing with matrices is still hard; it's just a lot easier than dealing with arbitrary functions.