Create a search…then speed it up by a factor of 50!

One great way to implement a basic search is to put all your data fields into one big field and then perform the search on that one field. Let’s say you’ve got a Customer model. Open up customer.rb (in app/models/) and write these few lines of code:

class Customer < ActiveRecord::Base
  scope :search, ->(search) { where('keywords LIKE ?', "%#{search.downcase}%") }

  before_save :set_keywords

  protected

  def set_keywords
    # Concatenate every searchable field into one lowercase string;
    # to_s guards against non-string fields like birthdate
    self.keywords = [name, email, birthdate].map { |field| field.to_s.downcase }.join(' ')
  end
end

To pick out only the matching records, we create a scope that takes the search term as an argument. The set_keywords callback turns all the separate fields into one big keywords string before every save.

Now, open up your controller (app/controllers/customers_controller.rb):

class CustomersController < ApplicationController
  def index
    @customers = Customer.search(params[:search])
  end
end

If desired, you can add a guard for the case when no search parameter is provided.
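
For example, a minimal version of such a guard could look like this (the Customer.none fallback, an empty relation, is my assumption rather than the original code):

def index
  @customers = if params[:search].present?
                 Customer.search(params[:search])
               else
                 Customer.none # empty relation instead of nil, keeps the view code uniform
               end
end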

This approach won’t give you Amazon-like results, but it can take you pretty far.

Step up the game

If you want to go one little step further and don’t want to bother with “heavy industry” tools like Elasticsearch, you can use PostgreSQL’s awesome built-in indexing features. This approach requires a bit more setup than the first one, but once you follow these steps, it turns out to be straightforward.

The main goal is to write our custom SQL query. In the end what we want to achieve is something like this:

SELECT
  *
FROM
  customers
WHERE
  lower(first_name) LIKE 'bob%' OR
  lower(last_name)  LIKE 'bob%' OR
  lower(email)      = 'bob@example.com'
ORDER BY
  email = 'bob@example.com' DESC,
  last_name ASC

In Rails syntax, this means we need a query like this:

Customer.where("lower(first_name) LIKE :first_name OR " +
  "lower(last_name) LIKE :last_name OR " +
  "lower(email) = :email", {
    first_name: "bob%",
    last_name:  "bob%",
    email: "bob@example.com"
}).order("email = 'bob@example.com' desc, last_name asc")

To build this query and keep things clean, I’m using an extra class (app/models/customer_search_term.rb):

class CustomerSearchTerm
  attr_reader :where_clause, :where_args, :order

  def initialize(search_term)
    search_term = search_term.downcase
    @where_clause = ""
    @where_args = {}
    if search_term =~ /@/
      build_for_email_search(search_term)
    else
      build_for_name_search(search_term)
    end
  end

  # The helper methods below go here
end

Now we need the build_for_name_search helper method (build_for_email_search is sketched after the helpers below).

def build_for_name_search(search_term)
  @where_clause << case_insensitive_search(:first_name)
  @where_args[:first_name] = starts_with(search_term)

  @where_clause << " OR #{case_insensitive_search(:last_name)}"
  @where_args[:last_name] = starts_with(search_term)

  @order = "last_name asc"
end

def starts_with(search_term)
  search_term + "%"
end

def case_insensitive_search(field_name)
  "lower(#{field_name}) like :#{field_name}"
end
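
Judging from the query visible in the log output further below, a plausible sketch of the omitted build_for_email_search looks like this (the extract_name helper and its name-cleaning rules are my reconstruction, not the original code):

def build_for_email_search(search_term)
  # Search the name columns by the alphabetic part before the '@',
  # as well as the email column by the full address
  @where_clause << case_insensitive_search(:first_name)
  @where_args[:first_name] = starts_with(extract_name(search_term))

  @where_clause << " OR #{case_insensitive_search(:last_name)}"
  @where_args[:last_name] = starts_with(extract_name(search_term))

  @where_clause << " OR #{case_insensitive_search(:email)}"
  @where_args[:email] = search_term

  # Exact email matches sort first; quote() guards against SQL injection
  @order = "lower(email) = " +
    ActiveRecord::Base.connection.quote(search_term) +
    " desc, last_name asc"
end

def extract_name(email_search_term)
  # 'andres148127@dicki.biz' -> 'andres'
  email_search_term.gsub(/@.*$/, '').gsub(/[0-9]+/, '')
end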

These helper methods do nothing more than build the strings for our SQL query. You can verify this by examining the variables: @where_clause is a string, while @where_args is a hash.
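
For example, in rails console (assuming the class exactly as defined above):

term = CustomerSearchTerm.new("Bob")
term.where_clause
# => "lower(first_name) like :first_name OR lower(last_name) like :last_name"
term.where_args
# => { first_name: "bob%", last_name: "bob%" }
term.order
# => "last_name asc"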

Finally, let’s build our controller:

class CustomersController < ApplicationController
  def index
    if params[:keywords].present?
      @keywords = params[:keywords]
      customer_search_term = CustomerSearchTerm.new(@keywords)
      @customers = Customer.
        where(customer_search_term.where_clause, customer_search_term.where_args).
        order(customer_search_term.order)
    else
      @customers = []
    end
  end
end

We’re almost there. This search works, but it’s still a bit slow (see below for speed results).

Now we need to create custom indexes on those columns. We’ll stick to the old up and down migration methods, since Rails won’t know how to reverse our custom SQL if we stick to “change”.

class AddLowerIndexesToCustomers < ActiveRecord::Migration
  def up
    execute %{
      CREATE INDEX
        customers_lower_last_name
      ON
        customers (lower(last_name) varchar_pattern_ops)
    }

    execute %{
      CREATE INDEX
        customers_lower_first_name
      ON
        customers (lower(first_name) varchar_pattern_ops)
    }

    execute %{
      CREATE INDEX
        customers_lower_email
      ON
        customers (lower(email))
    }
  end

  def down
    remove_index :customers, name: 'customers_lower_last_name'
    remove_index :customers, name: 'customers_lower_first_name'
    remove_index :customers, name: 'customers_lower_email'
  end
end

It’s important to use “varchar_pattern_ops” on the name columns, since we’re not doing an exact match there but a LIKE prefix search. Now, if we fire up rails dbconsole and run EXPLAIN ANALYZE on both versions, we can see the difference.
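
For example, inside rails dbconsole (table and index names as in the migration above):

EXPLAIN ANALYZE
SELECT * FROM customers
WHERE lower(last_name) LIKE 'bob%';

Before the migration the plan shows a sequential scan over the whole customers table; afterwards it should use customers_lower_last_name instead (an index or bitmap index scan).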

This was the original result:

Started GET "/customers?utf8=%E2%9C%93&keywords=andres148127%40dicki.biz&commit=Find+Customers" for ::1 at 2017-01-04 22:19:21 +0300
Processing by CustomersController#index as HTML
  Parameters: {"utf8"=>"✓", "keywords"=>"andres148127@dicki.biz", "commit"=>"Find Customers"}
  User Load (0.3ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1  ORDER BY "users"."id" ASC LIMIT 1  [["id", 1]]
  Customer Load (561.3ms)  SELECT "customers".* FROM "customers" WHERE (lower(first_name) like 'andres%' OR lower(last_name) like 'andres%' OR lower(email) like 'andres148127@dicki.biz')  ORDER BY lower(email) = 'andres148127@dicki.biz' desc, last_name asc
  Rendered customers/index.html.haml within layouts/application (572.4ms)
Completed 200 OK in 591ms (Views: 27.8ms | ActiveRecord: 561.6ms)

After spicing things up:

Started GET "/customers?utf8=%E2%9C%93&keywords=andres148127%40dicki.biz&commit=Find+Customers" for ::1 at 2017-01-04 22:40:59 +0300
Processing by CustomersController#index as HTML
  Parameters: {"utf8"=>"✓", "keywords"=>"andres148127@dicki.biz", "commit"=>"Find Customers"}
  User Load (0.2ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1  ORDER BY "users"."id" ASC LIMIT 1  [["id", 1]]
  Customer Load (2.1ms)  SELECT "customers".* FROM "customers" WHERE (lower(first_name) like 'andres%' OR lower(last_name) like 'andres%' OR lower(email) like 'andres148127@dicki.biz')  ORDER BY lower(email) = 'andres148127@dicki.biz' desc, last_name asc
  Rendered customers/index.html.haml within layouts/application (12.7ms)
Completed 200 OK in 32ms (Views: 28.3ms | ActiveRecord: 2.3ms)

This is a speedup by a factor of 243 (561.6 ms of ActiveRecord time down to 2.3 ms)!

Connect to a remote PostgreSQL database using Ruby

Ever written a script where you needed to connect to a remote PG database from a PORO (plain old Ruby object), without Rails?

This can be quite tricky, as I found out. Here is a how-to.

Let’s say you are parsing some page on the interwebs.

require 'nokogiri'
require 'open-uri'
require 'pg'

@doc = Nokogiri::HTML(open("http://www.some-url.com/products"))

# Collect the href of every link inside the matching table cells
# ('td.some-class' is a placeholder selector)
links = @doc.css("td.some-class a").map { |link| link["href"] }

@new_doc_array = []
@name_array = []

links.each do |link|
  new_doc = Nokogiri::HTML(open("http://www.some-url.com/products/#{link}"))

  name_doc = new_doc.css("div h1")
  new_doc = new_doc.xpath("//div[@class='cardpage']")

  # Strip forms and links, we only want the text content
  new_doc.search("form").remove
  new_doc.search("a").remove

  @name_array << name_doc.map(&:inner_text).first
  @new_doc_array << new_doc.map(&:inner_text).first
end

Now you need to prepare your Postgres config on the remote machine to allow connections from your local machine (by default Postgres allows only localhost connections, i.e. from the remote machine to itself).

There is an amazing tutorial on this page – http://www.cyberciti.biz/faq/postgresql-remote-access-or-connection/

I will sum it up:

Edit pg_hba.conf:

$ nano /var/lib/pgsql/data/pg_hba.conf

It might live in some other folder. If that is the case, just search for it:

$ find / -name pg_hba.conf

Now look up your own IP and add a line for it in pg_hba.conf:

host all all 192.168.1.0 255.255.255.0 trust

(Replace 192.168.1.0 with your obtained IP. I left the netmask unchanged.)

Now find and edit postgresql.conf:

$ nano /var/lib/pgsql/data/postgresql.conf

The value to change is tcpip_socket = true. Now comes the important part: if you don’t find any tcpip_socket value (it only exists in older PostgreSQL versions), DON’T add it! Instead search for

# - Connection Settings -

listen_addresses = '*'
port = 5432

And change listen_addresses = 'localhost' to listen_addresses = '*'.

That’s it! Restart PostgreSQL:

# /etc/init.d/postgresql restart

and try connecting:

$ psql -h 82.102.42.56 -U username -d database

where:

-h 82.102.42.56 is the IP of the remote server where the PG database is running
-U username is the name of the PG user
-d database is the database name

Now we’re all ready to rock and roll!

@conn = PG.connect(host: "82.102.42.56", port: 5432, dbname: "my_database",
                   user: "my_user", password: "my_topsecret_password")

def prepare_insert_product
  # Prepared statement: parse the INSERT once, reuse it for every row
  @conn.prepare("insert_product",
                "INSERT INTO products (name, description) VALUES ($1, $2)")
end

def add_product(name, description)
  @conn.exec_prepared("insert_product", [name, description])
end

def connect(array_content, array_names)
  prepare_insert_product

  array_content.each_with_index do |content_value, index|
    name = array_names[index]
    description = content_value

    add_product(name, description)

    # Overwrite the same console line as a simple progress indicator
    print "Processing entry - #{index}"
    print "\r"
  end
end

connect(@new_doc_array, @name_array)

Properly create and restore your development PostgreSQL database on production

So maybe you are like me: you parse a lot of static data into a development database, build an app around this data, and then you don’t want an empty database on production, but instead a full copy of your development database.

After spending a few hours trying to make this work, I found one solution that yielded exactly 0 errors (yay!).

On your local machine (in the shell, wherever you currently are):

pg_dump --no-owner your_db_development > backup.bak

Then put it in some safe place on your remote machine:

scp backup.bak deploy@192.168.111.222:/home/deploy/restore/backup.bak

Make a backup of your production database for safety’s sake (on your production server, of course):

pg_dump --no-owner your_db_production > /home/backups/04-03-2016/backup.bak

If there is no production database on the server yet, proceed with creating it. Otherwise (if the app has users) you might need to cut the current sessions first, so check these out:

http://stackoverflow.com/questions/12924466/capistrano-with-postgresql-error-database-is-being-accessed-by-other-users

or

http://stackoverflow.com/questions/1237725/copying-postgresql-database-to-another-server?rq=1
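
For reference, one common way to cut those sessions (this assumes PostgreSQL 9.2+, where the pg_stat_activity column is called pid):

-- Terminate every other connection to the production database
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb_production'
  AND pid <> pg_backend_pid();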

OK, now we have no database on the production machine at all. Let’s create the new production database:

postgres=# CREATE DATABASE mydb_production;
CREATE DATABASE

postgres=# CREATE USER production_user WITH password 'qwerty';
CREATE ROLE

postgres=# GRANT ALL privileges ON DATABASE mydb_production TO production_user;
GRANT

Make sure to use the same password as the one set up in database.yml for the production user.
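
For reference, the matching production section of config/database.yml would look roughly like this (the values here are just the ones from this example):

production:
  adapter: postgresql
  database: mydb_production
  username: production_user
  password: qwerty
  host: localhost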

That’s it! No need to change the owner. Now let’s restore the backup file:

sudo psql -d mydb_production -U production_user < /home/deploy/restore/backup.bak;

Password for user production_user: 
SET
SET
SET
SET
SET
SET
CREATE EXTENSION
ERROR:  must be owner of extension plpgsql
SET
SET
SET
CREATE TABLE
CREATE SEQUENCE
ALTER SEQUENCE
CREATE TABLE
CREATE SEQUENCE
ALTER SEQUENCE
CREATE TABLE
CREATE SEQUENCE
ALTER SEQUENCE
CREATE TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
COPY 14964
 setval 
--------
 114041
(1 row)

COPY 3151
 setval 
--------
   3236
(1 row)

COPY 28011
 setval 
--------
   7081
(1 row)

COPY 8
ALTER TABLE
ALTER TABLE
ALTER TABLE
CREATE INDEX
CREATE INDEX
CREATE INDEX
ALTER TABLE
ALTER TABLE

As of now I have had no issues related to the one error above (the restore tries to modify the built-in plpgsql extension, which our non-superuser production_user doesn’t own; since the extension already exists, the failure is harmless).