Hello. This is a very short video to discuss some defensive programming techniques in the cloud. Let's get started.

The first technique, which may be well known to you already, is to make sure that we include retry logic. Of course, we have included retry logic when building on-premises applications as well, but retry logic takes on an important new role in the cloud, because transient failures can happen in the cloud more often than on premises. When we write retry logic against cloud services, a number of those services come with client libraries that have retry logic baked into them, and when that retry capability is not available to us as part of the client library, we can use third-party libraries like Polly to implement retry logic in the code that accesses these cloud services.

So here we are inside Visual Studio. In this case we are going to take advantage of the client library that comes as part of Azure Storage. As I said earlier, this client library comes with retry options, and you can see in this code that, for now, I'm going with a retry policy of NoRetry. Keep in mind that this machine, and therefore this code, is running in Azure, so I cannot simply turn off network connectivity. Instead, to demonstrate a transient failure, I have Fiddler running on this machine, and within Fiddler, as you know, I can set up custom rules. In this case I've set up a custom rule which can return a specific failure, here a 408 timeout, so I can simulate that failure condition. Let's go ahead and uncomment this failure rule, and keep in mind that our code is going to make a call to this storage location, copragrs.blob.core.windows.net. Let's save that, go back to Visual Studio, and run this code. We get an exception, of course, because of our custom rule: a timeout error has happened. Let's stop this program, go back, and apply a retry policy. I'm going to comment out the NoRetry policy right here and uncomment the retry policy, which I'll talk about in just one second. So here we are setting up a retry policy that gives us an exponential retry, which is really important to understand: not only do we want to retry on a transient failure, we also want our operation to actually complete, and as you can see, our operation now completes.

Okay, so we have seen the retry logic, but it is also important to make sure that we have tested our retry logic, because oftentimes you're not able to simulate the transient exception conditions that occur in a production setting. I want to look at an example of trying to simulate a transient failure, and like I said, it is sometimes hard to create transient failures. I want to show you an example with a Redis cache server. In this instance, the source code for the Redis cache server is available on Windows as C++ source code. What we did was inject some random transient failures into that source code so that, when connections are being made, some connections get throttled or exceptions get thrown.
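Before we go further with the Redis demo, here is a minimal sketch of the kind of retry configuration shown in the storage demo above. It assumes the classic WindowsAzure.Storage client library on the full .NET Framework, and the connection string, container, and blob names are placeholders rather than the actual demo code.

```csharp
// Minimal sketch (not the exact demo code): configuring the retry policy on the
// classic Azure Storage blob client. All names and values are illustrative.
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.RetryPolicies;

class StorageRetryDemo
{
    static void Main()
    {
        var account = CloudStorageAccount.Parse(
            Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING"));
        var container = account.CreateCloudBlobClient()
                               .GetContainerReference("demo-container");
        container.CreateIfNotExists();

        var options = new BlobRequestOptions
        {
            // Start with no retries so the simulated 408 timeout surfaces immediately:
            RetryPolicy = new NoRetry()

            // ...then switch to exponential back-off so transient failures are
            // retried before the exception ever reaches our code:
            // RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 5)
        };

        var blob = container.GetBlockBlobReference("sample.txt");
        blob.UploadText("hello from the retry demo", options: options);
    }
}
```

The point is simply the switch from NoRetry, which lets the simulated 408 bubble up on the first attempt, to ExponentialRetry, which backs off and retries until the operation completes.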
Back to the Redis demo. I'm going to start the server right here; we took the C++ source code and injected these random transient failures into it, so let's go ahead and run it. We're starting the Redis cache server here, as you can see. Let me zoom into this for a moment: we have started a Redis cache server, which you can do too, with some faults injected, and I'll point you to a link where we show the code we added in order to inject these failures. While we have this server running, let me go back to Visual Studio and show you a piece of client code, in this case Redis client code, which is going to make a number of calls to the server we just started. This is that server, and the client is going to make hundreds of calls, and because random failures are going to be thrown, we want to see how well our retry code does in the face of those failures. Let's go ahead and run this application; it will take several seconds to complete, and it is now making calls against the server I showed you previously. Once this operation completes, we can look at how many failures our client and its exception-handling logic handled gracefully, and what needs to be done to make sure our code is indeed robust. Here are the results of the test we ran a moment ago, where we were continuously making calls against the Redis cache (remember that the Redis cache server was injecting transient failures), and now we can come back and see how many operations did indeed fail, why they failed, and whether we had the right retry logic baked in or not.

If you want to try this yourself, we have put out a blog post; let me go ahead and show that to you. Here is the post we published earlier that walks through all of the code needed to inject the transient failures. Let me zoom in so you can see it: if you go to appliedis.com and find this post, you'll have access to all of the code needed to try this experiment yourself. The key takeaway, of course, is that you need to think of creative ways to make sure you are indeed able to test your retry logic adequately.

The third and final defensive programming technique we want to talk about in this first video is the circuit breaker pattern, which you may already be very familiar with. Once again, this pattern is extremely important in the cloud. Let me show you a quick blog post. Whenever a cloud outage happens, oftentimes a root cause analysis is conducted, and I just randomly searched for one such outage. You can see the root cause: this is from October 2017, and it doesn't really matter what originally caused the problem, which happens to be a slow-running query, but on October 21st, as the author says, a threshold was crossed for the first time where the queue was long enough that more and more items kept ending up in it, eventually bringing the system to its knees and causing an outage. So I just randomly picked a cloud outage and its root cause analysis, and you can see why implementing this pattern is so critical. Let's see it in action. You'll remember the Polly library, which I talked about earlier in the context of retries and exception handling; it turns out it is also good for implementing the circuit breaker pattern.
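To make the retry-testing idea a bit more concrete before we get to the circuit breaker demo, here is a rough sketch of the kind of test client described above: it hammers the fault-injecting Redis server with many calls, retries each one, and counts the operations that still fail. It assumes StackExchange.Redis and Polly; the connection string, keys, and retry settings are illustrative placeholders, not the demo's actual code.

```csharp
// Rough sketch (not the demo's actual client): repeatedly call a Redis server
// that injects random transient faults, retry each call, and count the
// operations that still fail so we can judge how robust the retry logic is.
using System;
using Polly;
using StackExchange.Redis;

class RedisRetryTest
{
    static void Main()
    {
        var redis = ConnectionMultiplexer.Connect("localhost:6379,abortConnect=false");
        var db = redis.GetDatabase();

        // Retry transient Redis failures up to 3 times with a short, growing delay.
        var retry = Policy
            .Handle<RedisConnectionException>()
            .Or<RedisTimeoutException>()
            .WaitAndRetry(3, attempt => TimeSpan.FromMilliseconds(200 * attempt));

        int failures = 0;
        const int totalCalls = 500;

        for (int i = 0; i < totalCalls; i++)
        {
            try
            {
                retry.Execute(() => db.StringSet("test:key:" + i, "value-" + i));
            }
            catch (Exception ex)
            {
                // The call failed even after all retries: our retry logic was not enough.
                failures++;
                Console.WriteLine($"Call {i} failed after retries: {ex.Message}");
            }
        }

        Console.WriteLine($"{failures} of {totalCalls} operations failed despite retries.");
    }
}
```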
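And here is a minimal sketch of what a Polly circuit breaker policy looks like, before we walk through the demo project itself. The endpoint, failure threshold, and break duration below are placeholders chosen only to mirror the demo's behavior of opening the circuit after a few consecutive failures.

```csharp
// Minimal sketch (not the demo project's code): a Polly circuit breaker that
// opens after three consecutive failures and stays open for a few seconds,
// during which calls fail fast with BrokenCircuitException.
using System;
using System.Net.Http;
using System.Threading;
using Polly;
using Polly.CircuitBreaker;

class CircuitBreakerSketch
{
    static void Main()
    {
        var httpClient = new HttpClient();

        var breaker = Policy
            .Handle<HttpRequestException>()
            .CircuitBreaker(exceptionsAllowedBeforeBreaking: 3,
                            durationOfBreak: TimeSpan.FromSeconds(3));

        // Placeholder endpoint standing in for the demo's server app.
        const string serviceUrl = "http://localhost:5000/";

        for (int i = 0; i < 30; i++)
        {
            try
            {
                var body = breaker.Execute(() =>
                    httpClient.GetStringAsync(serviceUrl).GetAwaiter().GetResult());
                Console.WriteLine($"Call {i} succeeded ({body.Length} bytes).");
            }
            catch (BrokenCircuitException)
            {
                // The circuit is open: we skip the call instead of hammering the server.
                Console.WriteLine($"Call {i} skipped, circuit is open.");
            }
            catch (HttpRequestException ex)
            {
                Console.WriteLine($"Call {i} failed: {ex.Message}");
            }

            Thread.Sleep(1000);
        }
    }
}
```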
So let me go back to Visual Studio. What you see on the screen here is the Polly library, and I'm implementing the circuit breaker policy, which means that if we make a certain number of attempts and they all fail, the circuit opens and no more calls go through. In this case, the circuit breaker solution is really comprised of two projects, the server app and the client. I went ahead and ran the server, and you can see a very simple website here. Now let's go back to our client. Notice that our server is indeed running, and I'm going to bring up the client. Let's go ahead and run the circuit breaker client, and you should see that our client is indeed able to successfully call the service. Now let's go back to Visual Studio, stop our server temporarily, and then go back to our client, where we should start seeing errors. You can see our client failed to connect to the server once, twice, and a third time, and now that the circuit is open, no more calls are made to the server for a certain period of time; in this case we wait two or three seconds. Then, once the server comes back up and we come out of the sleep, we start making the calls again. So let me go ahead and start the server one more time: I'll go back to the Visual Studio project and run the server again. Once the server comes back up, we should see that the circuit is closed once again and our client is once again able to make its calls to the server. A simple pattern like this can go a long way toward keeping a failure in the cloud from cascading through your system.

Hope you found these tips useful. We'll come back with part two of this video to talk about some more defensive programming techniques that are really important when working with cloud-based projects. Thank you.