where vs select impact in Rails on memory & performance
I was writing a seed file where in it was needed to update details pertaining to service_id in a table. My current task needed to look for 10000+ services and update the records.
In a traditional setup, I would had stored all the services in a variable and worked on each record (I know it is bad but wanted to see how bad it can get).
To understand the impact of this I wanted to see two kinds of impact on system & resources as:
- Memory allocated to the variable
- Time take by SQL to execute the query
There is an excellent module in Ruby to check the memory used by a variable called as Object Space. To use it inside Rails Console, do this :
This module includes a lot of awesome things to actually study the internals of an object but lets focus on our current considered issue. The memsize_of method helps to get the size of an object in bytes in following fashion.
So I decided to improvise and just select the id of the services using select method in ActiveRecord
So curiously I had anticipated that this should be a lot smaller than the earlier approach. To my disappointment it was not :(
Suprisingly the size of this object was exactly the same. So ActiveRecord treats both of the queries as same instantitates all of the objects & attributes immediately.
Now, I thought that to optimise this issue lets try to use the where method in ActiveRecord. I was aware of the number of records I needed to work with so tweaked the query accordingly.
And now went on to the explorative path to get astonished.
As per the behaviour of ActiveRecord, where returns an ActiveRecord Relation having characterstic lazy loading. In simple terms the actual query is not performed till user asks for the object and it is in an uninstantiated way in the beginning saving us the precious memory.
Now to study the SQL impact of these same ways to get the data, I went ahead to use the Benchmark module that ruby provides. Benchmark module has the measure method which does all the time measuring stuff for us
These numbers displayed as an output are actually time taken by code to execute in seconds in four different categories viz. user / system / total / real. The actual meaning of these times are as follows :
- user : Total user CPU time
- system : System CPU time
- total : Sum of User & System CPU times
- real : Elapsed real time required to process this benchmarking activity
Now on to the analysis which should be inline with our earlier observation realted to memory usage.
Yes it is, We can see the query load lowering dramatically and eventually during the where query ActiveRecord did not made any query which clears our perspective about lazy loading in a practical way.
So in the end objspace
& Benchmark
modules are really a bliss to study the impact on memory and system loads.