Are class variables persistent between requests?
By default, on development no, on production yes.
Why?
Because classes are loaded only once in production but are reloaded on every request in development. Instances of those classes, however, are created anew on every request.
Again why?
It takes time to reload the classes every time. But in development this allows you to make changes on the fly and test them without restarting your servers.
Can I change this configuration in development to emulate production?
Yes. In the development config file config/environments/development.rb do:
config.cache_classes = true
So, should I freely use class variables to store data between requests?
No, there are a few things you need to consider first. Reading and writing to the same variables from multiple requests is not safe if you enable multithreaded dispatching. If you are only reading the variables and not modifying them then you are good to go.
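As a rough illustration of the read-only case, here is a minimal sketch. CurrencyFormatter and its table are hypothetical names, and it uses a frozen constant rather than a @@ variable so that accidental writes fail loudly instead of racing.

```ruby
# Hypothetical read-only lookup table shared by every thread in the process.
class CurrencyFormatter
  # Frozen at load time, so no request can modify it afterwards.
  SYMBOLS = { "usd" => "$".freeze, "cad" => "C$".freeze }.freeze

  def self.symbol_for(code)
    SYMBOLS.fetch(code.to_s.downcase, "?")
  end
end

# Several threads read concurrently; no lock is needed because nothing writes.
threads = 4.times.map { Thread.new { CurrencyFormatter.symbol_for("USD") } }
results = threads.map(&:value)
```

Any attempt to write to SYMBOLS raises an error, which is usually what you want for data that is supposed to stay constant.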
What if I want to modify them too?
Then make it thread safe. If the value changes in one thread, another running thread will not necessarily pick up that change. For that reason, this is a good solution for some cases but not all. A global cache is one problem you can solve with class variables: if a thread doesn't find the value, it will just fetch it again. Not a big deal.
Okay, so how do I make it thread safe?
You can use Ruby's Mutex class. Here is an example.
class ExampleController < ApplicationController
  # Shared by every request served by this process.
  @@lock = Mutex.new
  @@cache = {}

  def operation(expensive_operation_A)
    unless ExampleController.read_cache(expensive_operation_A)
      do_expensive_operation(expensive_operation_A)
      ExampleController.write_cache(expensive_operation_A)
    end
  end

  # True only if the operation has already been recorded as done.
  def self.read_cache(expensive_operation_A)
    @@lock.synchronize do
      @@cache[expensive_operation_A.to_s] == true
    end
  end

  # Record the operation as done.
  def self.write_cache(expensive_operation_A)
    @@lock.synchronize do
      @@cache[expensive_operation_A.to_s] = true
    end
  end
end
In this example you have expensive operations that only need to be done once. It's okay if one is done again, but that would hurt performance. So you record whether each operation has been done in a class variable, @@cache, a hash that maps each expensive operation to whether it has completed.
So if @@cache = {"expensive_operation_A" => true}, then expensive_operation_A has already been done and there is no need to do it again. Otherwise, do the operation and update the cache.
The cache is persistent between requests and therefore needs to be thread safe. self.read_cache and self.write_cache have to be class methods.
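To see the pattern run outside Rails, here is a plain-Ruby sketch under a few assumptions: OperationTracker, done?, mark_done and the :rebuild_index key are illustrative names, and pushing onto a queue stands in for the expensive work.

```ruby
require "thread" # harmless on modern Rubies, where Mutex and Queue are built in

# Plain-Ruby version of the pattern; the names are illustrative, not Rails API.
class OperationTracker
  @@lock = Mutex.new
  @@cache = {}

  def self.done?(name)
    @@lock.synchronize { @@cache[name.to_s] == true }
  end

  def self.mark_done(name)
    @@lock.synchronize { @@cache[name.to_s] = true }
  end
end

runs = Queue.new # thread-safe record of how often the work actually ran

threads = 8.times.map do
  Thread.new do
    unless OperationTracker.done?(:rebuild_index)
      runs << :ran # stands in for the expensive work
      OperationTracker.mark_done(:rebuild_index)
    end
  end
end
threads.each(&:join)
```

The check and the write are two separate critical sections, so more than one thread may do the work before the flag is set. That matches the stated assumption that repeating the operation is wasteful but harmless; if it must run exactly once, the check, the work and the write would all have to happen inside a single synchronize block.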
Update: Use with care
In this example it is assumed that you have a limited number of expensive operations that must be performed at least once, and that running one more than once is not destructive. If it's not a closed set of keys, you will create a memory leak.
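If the set of keys is not closed, one possible way to cap memory use is to evict the oldest entries. The sketch below is only an illustration: BoundedCache is a hypothetical helper, not a Rails or stdlib class, and it relies on Ruby hashes preserving insertion order.

```ruby
# Hypothetical size-limited cache; drops the oldest entry once the limit is exceeded.
class BoundedCache
  def initialize(max_size)
    @max_size = max_size
    @lock = Mutex.new
    @store = {}
  end

  def write(key, value)
    @lock.synchronize do
      @store.delete(key)   # re-inserting moves the key to the newest position
      @store[key] = value
      # Hash#shift removes the oldest (first inserted) pair.
      @store.shift while @store.size > @max_size
    end
  end

  def read(key)
    @lock.synchronize { @store[key] }
  end

  def size
    @lock.synchronize { @store.size }
  end
end

cache = BoundedCache.new(2)
cache.write("a", 1)
cache.write("b", 2)
cache.write("c", 3) # "a" is evicted here
```

This bounds the number of entries but still has no expiry and is local to one process, so it does not replace a real cache store.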
If you are looking for a full-blown cache, check other existing work. This is only good as a global variable, and it is definitely not a one-size-fits-all solution.
But this is not good if you are running production with multiple server instances (say a few mongrels, passenger etc). Honestly, I don't know where this would be the right approach: not for caching, not for storing data between requests...
Use Memcache for caching ;-)
Hi
Thank you for taking your time and sharing your thoughts with us.
A few thoughts to consider:
@@class_variables might not be the best choice due to their behaviour. They are shared over class hierarchy. The "private" in your class does nothing, and if it did actually make the class methods private, it'd render your example broken as you call read-/write_cache with a receiver.
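The sharing across the class hierarchy can be seen in a small sketch. Base and Child are hypothetical classes; the class-level instance variable @own is shown as one common alternative, not as what the post uses.

```ruby
class Base
  @@shared = "base" # class variable: one slot for the whole hierarchy
  @own = "base"     # class-level instance variable: one slot per class

  def self.shared
    @@shared
  end

  def self.own
    @own
  end
end

class Child < Base
  @@shared = "child" # overwrites the value Base sees as well
  @own = "child"     # affects Child only
end

base_shared = Base.shared # the subclass assignment leaked into Base
base_own = Base.own       # untouched by the subclass
child_own = Child.own
```

Here the assignment in Child changes what Base.shared returns, while each class keeps its own @own.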
The example code is broken as your *_cache methods don't take arguments.
Also caching without a pruning strategy is only viable if you have a closed set of keys, otherwise you create a memory leak by adding new key/value pairs to the cache without ever deleting them. Additionally as Hubert Łępicki said, it is also only viable if you're either on a single server or a stale cache invalidates itself over multiple servers on its own or doesn't need to be invalidated at all.
Taking all that into account, one might be well advised to use existing work that takes distributed servers, cache expiration/invalidation and other things into account and is well tested. Rails comes with built-in low level caching capabilities. See Rails.cache.
Last but not least, a small correction: Mutex is not a Rails thing ("You can use Rails' Mutex Class"); it's part of Ruby's stdlib "thread" (require 'thread' and you get Mutex and some other nice classes).
While I have given a lot of reasons why I consider this post bad advice, I want to emphasize that I appreciate that you took the time to write it. Apart from what I said, I find it well written and well explained. Thank you.
Best regards
Stefan Rusterholz, aka apeiros
An alternative implementation: https://gist.github.com/831425 - just as an example for how one could do the part that you did too. It doesn't take expiration/pruning of the cache into consideration either, nor does it provide any cross-process or cross-server sharing mechanics.
Thanks for all your comments. I updated the post to reflect some of the things you said.
This is a great tutorial …one of the best I’ve seen from you yet. I really appreciate you sharing your inside tips and tricks…
Or you could just do...
unless Rails.cache.exist?("mydata")
  Rails.cache.write("mydata", "value")
end
Rails.cache.read("mydata") # => "value"
I wrote some code a few years ago that uses a simple class-level hash variable to store the results of some expensive queries. It was before we had memcached set up, and it worked quite well (I thought). Just yesterday we started having some severe issues in production. The culprit was the class-level hash. Turns out that after a certain number of requests, that cache would be GC'ed, which I did not think would happen in production mode.
I highly advise that you use memcached to do this. Rails provides a nice Rails.cache.fetch(key) {...} with a block. It will either find the key in the cache, or execute the block, take its return value and stuff it back into the cache for the next execution. It takes only a few lines of code to accomplish this, and you even get the benefit of every server process having access to the same cache.
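The read-or-compute behaviour of Rails.cache.fetch with a block can be sketched in plain Ruby. MiniCache below is a hypothetical stand-in written for illustration only; it is not the Rails implementation and has no expiry or cross-process sharing.

```ruby
# Hypothetical stand-in that mimics fetch-with-a-block semantics.
class MiniCache
  def initialize
    @lock = Mutex.new
    @store = {}
  end

  # Returns the cached value on a hit; on a miss, runs the block,
  # stores its return value and returns it.
  def fetch(key)
    @lock.synchronize do
      return @store[key] if @store.key?(key)
      @store[key] = yield
    end
  end
end

cache = MiniCache.new
calls = 0
first = cache.fetch("mydata") { calls += 1; "value" }  # miss: block runs
second = cache.fetch("mydata") { calls += 1; "other" } # hit: block is skipped
```

In a Rails application the equivalent call would be Rails.cache.fetch("mydata") { ... } against whichever cache store is configured. Note that this sketch holds the lock while the block runs, which serializes all computations; that is a simplification.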
Remember to take advantage of defining a #cache_key instance method on any objects that may need one. ActiveRecord objects have a #cache_key method built in. So a User object with a primary key of 420 would return this string for #cache_key => "users/420". Put some thought into which objects may need that method and what the best string to return is. Rails makes use of asking objects for a #cache_key and even has an ActiveSupport::Cache.expand_cache_key method that accepts an array whose elements can be strings or objects that respond to #cache_key.
Have fun :)
I should also point out that ActiveRecord's #cache_key method is much smarter. If the object has an updated_at column, that is included in the string as an integer, which helps bust the cache when it changes. FFT.
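The cache-busting effect of a timestamp in the key can be shown without Rails. In this sketch Report is a hypothetical stand-in for a model and a plain hash stands in for the cache store; the exact key format that ActiveRecord produces differs between Rails versions, so treat the format here as an assumption.

```ruby
# Hypothetical stand-in for a model; real models get #cache_key from Rails.
Report = Struct.new(:id, :updated_at) do
  def cache_key
    "reports/#{id}-#{updated_at.to_i}"
  end
end

store = {} # stands in for the cache store

report = Report.new(420, Time.at(1_000))
old_key = report.cache_key
store[old_key] ||= "rendered at version 1"

report.updated_at = Time.at(2_000) # the record changed
new_key = report.cache_key
stale_hit = store.key?(new_key)    # the old entry is no longer found
```

Because the key changes whenever updated_at changes, stale entries are simply never read again. They still occupy the store until it evicts them, which is why this works best with a store that expires or prunes entries.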
I wish more people explained things in this manner. Great style - thank you.