State in imperative programs
Or, the hidden and unchangeable mystery
Here's the way I like to think about state in imperative programs — it's hard because it's not something you can get far away from enough to see, usually.
In imperative programs, the value of a variable 'a' at one point is not always the value of the variable 'a' at another point later in the code. In some sense, each statement that gets executed is passed the entire state of the machine (the world) implicitly, and then when the statement ends, it passes the state of the world on to the next statement. If you want to access the value of the variable 'a', then 'a' gets looked up in the environment/state.
In C/C++/Java/C#/Python/Perl, etc. this is done for you automatically and efficiently. It's just the way the machine works. But you don't have the choice to change this or, as someone put it, "overload the semicolon."
In Haskell none of this variable-mutating, state-passing can occur, so it gets created from scratch, and voila, we have the State Monad. It makes it sound like a lot more work than it should be just to do something that comes for free in most other languages, but in these languages, you can't overload the semicolon! And if you could, who knows what could go wrong at runtime (imagine Perl with semicolon overloading... I bet some day they will do this just because they can...). In Haskell, everything is watched over by the type system, so the parts of your program that explicitly need to munge state are isolated with the some type tag, e.g. ParseContext, while the rest of your program is "normal" and pure and functional.
The Zen of Monads: there is no Zen of Monads
In Haskell you don't overload the semicolon, instead you overload >>= ("bind") and create a new instance of the Monad typeclass with this function (as well as a function called "return", where these two functions obey certain laws that keep things kosher, i.e. "return" is the identity, >>= is associative, and they interact as you would expect). It just so happens that the semicolons (or newlines) in do blocks get converted into >>=s as well as lambdas to scope your variables over the rest of your actions.
The reason monads are cool and magical is that now that we've made this sequencing explicit in the type, we can swap out the machinery (what type of implicit state we're passing around or the bookkeeping >>= function itself) and change a minimal amount of code, i.e. nothing... except adding code to use the new functionality (if all goes well). We can now get continuation passing, or true mutable variables (e.g. IORef), or a simple sort of non-determinism (e.g. List as a monad), or failure (Maybe) or exception handling, or any number of behaviors (or combinations) by changing the monad.
The problem with monads is not that they are "advanced" but that they are so painfully and subtly abstract (It just so happens that you can do amazing, convenient, efficient, magic and otherwise advanced things with them, especially with the libraries. I was going to say "subtly simple" but maybe they aren't for most working non-Haskell programmers... and there are a lot of things about them that I don't understand yet.) Another problem is that everyone has different ways of explaining them or trying to define what they are (a way of sequencing computation? or a type constructor? or a type class? or a burrito?). Of course, they are all those things (except the burrito), which makes it even more confusing. At a certain point, though, I think they just subconsciously click and boom, now you get it.
To respond to the accusation that Haskell knowledge accumulates like a snowball rolling downhill, I just have to say I experienced that too. I like how Haskell is about the concepts (even things you didn't know you were already doing implicitly) and the language is just a very transparent vehicle for manipulating them. Haskell has really taught me a lot and helped me think a lot clearer — but now I think in Haskell! The danger is that since I've taken the red "Haskell" pill I have a hard time going back and looking at the imperative world the same way. I have to resist trying to cram all these elegant approaches into a system that can't support them and instead program like a normal person. (The good news is that many languages are slowly soaking up features, e.g. C# delegates are higher-order functions and anonymous delegates are lambdas, C# 3.0 will have some kind of type inference, etc. but these features are many decades old. They have a lot of catching up to do.)