Rob, please tell me you're not serious about HTML-encoding all input by default. I appreciate that you're trying to protect people from their own ignorance or forgetfulness, but as shinakuma indicated, this causes more problems than it solves.
[1] It's an inappropriate mix of technologies. HTML encoding is only a concern of displaying HTML - it has no place in a SQL database. It's ugly when querying, interferes with full-text indexing, and clashes with other applications that might be reading from
the same DB. What about when I'm not outputting to HTML (e.g. when rendering to a plain-text email, WinForms app, or 3rd party system) - do I have to unescape it first? Yuck!
[2] It leads to more serious security vulnerabilities. You say it's safer to "protect inbound", but here the opposite is true. Remember that by saying "we'll protect inbound", you're also saying "you must not protect outbound" (otherwise you'll double-escape
and & will render as &). So now we are forced to trust that all systems writing to the DB remember to encode all strings exactly once, and any single vulnerability spreads to all parts of the system because nothing protects outbound. "Perimiter security"
is not safer when it actually prevents you from adding real security.
[3] It's inconsistent with how decent ASP.NET controls work. Well-written controls take responsibility for HTML-encoding their own output, so pre-escaped strings would cause display problems.
[4] It's inconsistent with how other aspects of the MVC framework work. You're not pre-escaping all values passed to IHttpRequest.Form, so myObject.Value = form['key'] would behave differently to myObject.UpdateFrom(form).
If someone on my team encoded strings before putting them into the DB, then depended on that when rendering them later, I'd have to tell them off and ask them to change it. Please don't encourage this behaviour!
ScottGu and I talked a bit about this and HTMLEncoding a security concern so we have to take this into consideration when deciding how to deal with these things.
SteveSanderson1
It's an inappropriate mix of technologies. HTML encoding is only a concern of displaying HTML - it has no place in a SQL database. It's ugly when querying, interferes with full-text indexing, and clashes with other applications that might be reading from the
same DB.
In terms of it's "place" - I highly disagree. Anything that is being reported back to a web front end (usually a browser) should, at some level, be checked so that you don't inadvertantly sabotage your user. Database, flatfile, cache - the storage medium
has nothing to do with this :). FullText indexing ignores anything in <>.
One thing you can do here is to Encode on the way out - but again that's a perf thing you'd need to think about. I hear you though about the other readability issues, and we can have an override to turn it off.
SteveSanderson1
"Perimiter security" is not safer when it actually prevents you from adding real security.
Yep. Security comes at a cost at some level and I agree with you in principle here. However you're talking about a 1% issue at best (shared DB input) and many architects would completely cringe at sharing a DB with a web site for this very reason. However
your point is valid and again - we'll provide an override so the decision to turn this off if you need.
SteveSanderson1
It's inconsistent with how other aspects of the MVC framework work.
Hopefully not - but if we're not encoding properly we need to take a look at that.
Ultimately it boils down to line of responsibility - should Microsoft, by default, disable security or should you, as the developer, disable it for yourself? I'm all for trying to find a better answer here - would another route be acceptable - something
like "UpdateFromEncode(Request.Form)"? The thing is - and I'm going to stick to this - the decision to turn encoding off needs to be the developer's.
--Iniviting ScottGu and DamienG to this thread - it's a good one and it's important.
WebForms by default validates the request for injection attacks, it does NOT however by default html encode everything to be passed along "securely" into your DB. It's unreasonable to assume that your data will be in a silo consumed only by your web frontend
when dealing with enterprise applications. I don't want to have to worry about html decoding when I import these data into my warehouse or sync over to a legacy system, or like someone said output to a non-html format. That is just backwards. The default behavior
should be you encode for display, not for storage.
Any helper functions that output HTML should encode values using HttpUtility.HtmlEncode for literals and HttpUtility.HtmlAttributeEncode for attributes such as value on input tags.
If users want to output unencoded HTML then let them do that themselves without a helper as it's not easy to deal with.
Where you are bringing data back for processing through a form you don't need to do anything - the request variable will contain it unencoded.
Definitely do not encode it going into the database - to do so would pollute your data with HTML encoding. If you thought having presentation and model logic separate was important then imagine what stuffing HTML into the database would mean. If you ever
wanted to show it via another application such as a WinForm, Console or XML variant you'll have to do cross-conversion. It would also mean that you can no longer HTML encode the output or you would double encode it and actually start displaying < in text
boxes.
Rob (and the others), thanks for giving this issue such careful consideration. I appreciate that you're trying to do the right thing by "enabling security by default".
Many arguments could be made, but for me the key thing is "what kind of a future are you creating". As you're in the priviledged position of designing a Microsoft-endorsed framework, the mindset you take is going to influence countless web developers for
many years to come (otherwise, this would just be a minor technicality and I wouldn't really care either way). If you proceed with the idea "we'll encode the data inbound", you are not just perpetuating ignorance about encoding it outbound, you're actually
*enforcing inaction*. It means that developers learn like monkeys they must *not* encode the data they send, because some black magic somewhere in the system does that and they have to trust it to avoid double-escaping. Do you think they are going to behave
differently when working with different data sources? It just creates unpredictability, instead of the simple rule: "escape any strings that you don't want to render as HTML".
In other words, it only enables an illusion of security, and ultimately causes a greater loss of security (as well as millions of wasted person-hours trying to remember which pieces of data are pre-encoded and which ones aren't). So yes, it does boil down
to responsibility - responsibility for our future, and our children's future... ([+o(])
What I would ultimately like, though I appreciate this would turn a few things upside-down, would be to change the meaning of <%= ... %> on MVC ViewPages, so it escapes the strings you pass it. You would force developers to use more keystrokes when they
want to avoid escaping (instead of fewer), maybe using <%!= ... %> or even <%= RawHtml(...) %>.
robconery
However you're talking about a 1% issue at best (shared DB input) and many architects would completely cringe at sharing a DB
Considering the other comments posted, the people who hang out on this forum are somewhat architecturally minded - sharing databases, legacy data storage, and interacting with external systems aren't 1% issues for us - it's what we do every day. Perhaps
we're not representative of your wider customer base, but we are the opinion shapers.
robconery
One thing you can do here is to Encode on the way out - but again that's a perf thing you'd need to think about
Have you done any benchmarking of this? As a quick test, my humble desktop can HTML-encode 1.5 million 30-character strings per second (in which about every 5th character needs encoding), and about half of that processing time goes into allocating a new
string on the heap. Compared to the cost of a database query, a few hundred encodings is completely insignificant, and if you're pushing your webserver that hard, I'd hope you're using output caching anyway.
robconery
UpdateFromEncode(Request.Form)
Yeah, great! Or just, UpdateFrom(Request.Form, Encoding.HtmlEncodeAllFields), as long as the default is not to encode.
HTML encoding is an output-only issue - there should NOT be any facility to encode data going into the database. 100%, no exceptions, not even as an option the developer can specify.
If you HTML encode data into the database you force web developers into writing unsafe code because if they do HTML encode now it will be double-encoded and will display as a mess of " etc. all over the screen/output/XML/RSS.
Of course there are times when HTML is desired in the database - rich text might be one option - but that's a totally different game and in fact requires neither encoding or decoding which would actually break that. In those scenarios you require careful
scrubbing to ensure only the tags you wanted in there get kept.
You effectively turn what was a simple rule - encode data into the output stream correctly - into a guessing game by doing this.
SteveSanders...
Member
6 Points
3 Posts
Re: UpdateFrom and Encoding
Dec 16, 2007 05:59 PM|LINK
Rob, please tell me you're not serious about HTML-encoding all input by default. I appreciate that you're trying to protect people from their own ignorance or forgetfulness, but as shinakuma indicated, this causes more problems than it solves.
[1] It's an inappropriate mix of technologies. HTML encoding is only a concern of displaying HTML - it has no place in a SQL database. It's ugly when querying, interferes with full-text indexing, and clashes with other applications that might be reading from the same DB. What about when I'm not outputting to HTML (e.g. when rendering to a plain-text email, WinForms app, or 3rd party system) - do I have to unescape it first? Yuck!
[2] It leads to more serious security vulnerabilities. You say it's safer to "protect inbound", but here the opposite is true. Remember that by saying "we'll protect inbound", you're also saying "you must not protect outbound" (otherwise you'll double-escape and & will render as &). So now we are forced to trust that all systems writing to the DB remember to encode all strings exactly once, and any single vulnerability spreads to all parts of the system because nothing protects outbound. "Perimiter security" is not safer when it actually prevents you from adding real security.
[3] It's inconsistent with how decent ASP.NET controls work. Well-written controls take responsibility for HTML-encoding their own output, so pre-escaped strings would cause display problems.
[4] It's inconsistent with how other aspects of the MVC framework work. You're not pre-escaping all values passed to IHttpRequest.Form, so myObject.Value = form['key'] would behave differently to myObject.UpdateFrom(form).
If someone on my team encoded strings before putting them into the DB, then depended on that when rendering them later, I'd have to tell them off and ask them to change it. Please don't encourage this behaviour!
abombss
Member
575 Points
164 Posts
Re: UpdateFrom and Encoding
Dec 16, 2007 08:16 PM|LINK
+1Steve
SteveSanders...
Member
6 Points
3 Posts
Re: UpdateFrom and Encoding
Dec 16, 2007 08:54 PM|LINK
Um... maybe that was an excessive rant - I do really like the MVC framework so far! Good work [:)]
robconery
Participant
852 Points
195 Posts
Re: UpdateFrom and Encoding
Dec 17, 2007 11:02 PM|LINK
ScottGu and I talked a bit about this and HTMLEncoding a security concern so we have to take this into consideration when deciding how to deal with these things.
In terms of it's "place" - I highly disagree. Anything that is being reported back to a web front end (usually a browser) should, at some level, be checked so that you don't inadvertantly sabotage your user. Database, flatfile, cache - the storage medium has nothing to do with this :). FullText indexing ignores anything in <>.
One thing you can do here is to Encode on the way out - but again that's a perf thing you'd need to think about. I hear you though about the other readability issues, and we can have an override to turn it off.
Yep. Security comes at a cost at some level and I agree with you in principle here. However you're talking about a 1% issue at best (shared DB input) and many architects would completely cringe at sharing a DB with a web site for this very reason. However your point is valid and again - we'll provide an override so the decision to turn this off if you need.
Hopefully not - but if we're not encoding properly we need to take a look at that.
Ultimately it boils down to line of responsibility - should Microsoft, by default, disable security or should you, as the developer, disable it for yourself? I'm all for trying to find a better answer here - would another route be acceptable - something like "UpdateFromEncode(Request.Form)"? The thing is - and I'm going to stick to this - the decision to turn encoding off needs to be the developer's.
--Iniviting ScottGu and DamienG to this thread - it's a good one and it's important.
abombss
Member
575 Points
164 Posts
Re: UpdateFrom and Encoding
Dec 17, 2007 11:15 PM|LINK
We are talking about Encoding not Decoding.... The name alone dictates the direction.
What if I am not even outputting html but some other format. Now I need to ensure that all my input is not encoded?
On by default is just confusing. I would change the method names all together so it is explicit what is going on.
shinakuma
Member
378 Points
92 Posts
Re: UpdateFrom and Encoding
Dec 17, 2007 11:42 PM|LINK
WebForms by default validates the request for injection attacks, it does NOT however by default html encode everything to be passed along "securely" into your DB. It's unreasonable to assume that your data will be in a silo consumed only by your web frontend when dealing with enterprise applications. I don't want to have to worry about html decoding when I import these data into my warehouse or sync over to a legacy system, or like someone said output to a non-html format. That is just backwards. The default behavior should be you encode for display, not for storage.
damieng
Member
38 Points
16 Posts
Re: UpdateFrom and Encoding
Dec 17, 2007 11:48 PM|LINK
Any helper functions that output HTML should encode values using HttpUtility.HtmlEncode for literals and HttpUtility.HtmlAttributeEncode for attributes such as value on input tags.
If users want to output unencoded HTML then let them do that themselves without a helper as it's not easy to deal with.
Where you are bringing data back for processing through a form you don't need to do anything - the request variable will contain it unencoded.
Definitely do not encode it going into the database - to do so would pollute your data with HTML encoding. If you thought having presentation and model logic separate was important then imagine what stuffing HTML into the database would mean. If you ever wanted to show it via another application such as a WinForm, Console or XML variant you'll have to do cross-conversion. It would also mean that you can no longer HTML encode the output or you would double encode it and actually start displaying < in text boxes.
damieng
Member
38 Points
16 Posts
Re: UpdateFrom and Encoding
Dec 18, 2007 12:41 AM|LINK
I have a new post up on my blog entitled 5 signs your ASP.NET application may be vulnerable to HTML injection if anyone is interested in the subject (it affects WebForms and HtmlControls too).
SteveSanders...
Member
432 Points
119 Posts
Microsoft
Re: UpdateFrom and Encoding
Dec 18, 2007 08:46 AM|LINK
Rob (and the others), thanks for giving this issue such careful consideration. I appreciate that you're trying to do the right thing by "enabling security by default".
Many arguments could be made, but for me the key thing is "what kind of a future are you creating". As you're in the priviledged position of designing a Microsoft-endorsed framework, the mindset you take is going to influence countless web developers for many years to come (otherwise, this would just be a minor technicality and I wouldn't really care either way). If you proceed with the idea "we'll encode the data inbound", you are not just perpetuating ignorance about encoding it outbound, you're actually *enforcing inaction*. It means that developers learn like monkeys they must *not* encode the data they send, because some black magic somewhere in the system does that and they have to trust it to avoid double-escaping. Do you think they are going to behave differently when working with different data sources? It just creates unpredictability, instead of the simple rule: "escape any strings that you don't want to render as HTML".
In other words, it only enables an illusion of security, and ultimately causes a greater loss of security (as well as millions of wasted person-hours trying to remember which pieces of data are pre-encoded and which ones aren't). So yes, it does boil down to responsibility - responsibility for our future, and our children's future... ([+o(])
What I would ultimately like, though I appreciate this would turn a few things upside-down, would be to change the meaning of <%= ... %> on MVC ViewPages, so it escapes the strings you pass it. You would force developers to use more keystrokes when they want to avoid escaping (instead of fewer), maybe using <%!= ... %> or even <%= RawHtml(...) %>.
Considering the other comments posted, the people who hang out on this forum are somewhat architecturally minded - sharing databases, legacy data storage, and interacting with external systems aren't 1% issues for us - it's what we do every day. Perhaps we're not representative of your wider customer base, but we are the opinion shapers.
Have you done any benchmarking of this? As a quick test, my humble desktop can HTML-encode 1.5 million 30-character strings per second (in which about every 5th character needs encoding), and about half of that processing time goes into allocating a new string on the heap. Compared to the cost of a database query, a few hundred encodings is completely insignificant, and if you're pushing your webserver that hard, I'd hope you're using output caching anyway.
Yeah, great! Or just, UpdateFrom(Request.Form, Encoding.HtmlEncodeAllFields), as long as the default is not to encode.
http://blog.codeville.net/
damieng
Member
38 Points
16 Posts
Re: UpdateFrom and Encoding
Dec 18, 2007 09:21 AM|LINK
HTML encoding is an output-only issue - there should NOT be any facility to encode data going into the database. 100%, no exceptions, not even as an option the developer can specify.
If you HTML encode data into the database you force web developers into writing unsafe code because if they do HTML encode now it will be double-encoded and will display as a mess of " etc. all over the screen/output/XML/RSS.
Of course there are times when HTML is desired in the database - rich text might be one option - but that's a totally different game and in fact requires neither encoding or decoding which would actually break that. In those scenarios you require careful scrubbing to ensure only the tags you wanted in there get kept.
You effectively turn what was a simple rule - encode data into the output stream correctly - into a guessing game by doing this.