We are going to deploy the latest version of the project on Heroku first and then we’ll see what we need to change if we want to deploy this version on local machine or a server.

Heroku deployment:

This link from Heroku dev center “Getting Started with Rails 3.x on Heroku” is covering the basics needed to connect to Heroku and deploy, which is a very simple process and it relies on git, and the three main commands is:

Creating the application on the Heroku to deploy on, you run this command from the project directory

 heroku create pageonex

And then push the project to the Heroku server via git

git push heroku master

Last command is to run the migrations, and that’s it

heroku run rake db:migrate

There is an important note, which is Heroku is using PostgreSQL for the production, so you will have to install PostgreSQL on you’r machine first, and the “pg” and run the bundle command before pushing the code

Installing PostgreSQL isn’t easy, and configuring it is much complex than MySQL for example, so my suggestion if you are not interested to use PostgreSQL (which you can do) so at least install it, so you’ll be able to install the pg gem and bundle the gems

The limitations of Heroku deployment, and how we are dealing with it:

  • Disk storage limitation which was causing this images low resolution problem, and this is happening because Heroku remove any images after few hours of storing them, and if the thread contains a huge number of images, so it will fail.The solution for this problem was that we’ve decided not to store the images on the disk, and we will use the direct links from Kiosko to display the images, so because we are trying to fetch many images from the same domain, so Kiosko server send the 300px images instead of the 740px to reduce the bandwidth, but we’ve added the original link of the image beside each image, so if you want to see the full image, you can copy and past the link in a new tab, and we didn’t use a direct link, because it’ll cause the same problem because the request comes from the same domain
  • The processing power is very limited (because it’s a free version for sure) so we are not be able to use any image processing libraries in this version like “RMagick”The solution is to comment this library, and use other ways to get the images coordinates (this was the use of RMagick in the kiosko scraper) and not to use the elpais scraper, because it use RMagick to convert the pdf scraped file into images
We are going no to see how to remove the limitation changes in the Heroku deployed version to back to the original code, because we have commented the parts which can’t fit for Heroku, instead of creating new branch for this version, so we’ll list all the files which have to be change, and which parts will have to change exactly in this files
The files that we’ll need to change, and the lines inside each one related to this, to add more highlighted areas:
  1.  app/assets/javascript/coding.js
    line 116-117:  change this with a loop over all highlighted areas
    line 161: change this method to loop over all highlighted areas
    line 229: change this method to  loop over all highlighted areas instead of checking with if statement for each one
    line 338: change this as the line 229
    line 297: change this to loop over all highlighted areas
  2. app/assets/javascript/display.js
    line 43: change this method to loop over all the highlighted areas
  3. app/views/coding/display.html.erb
    link 150-152: replace this two line with any number of highlighted areas
  4. app/views/coding/process_images.html.erb
    link 96-98: replace this two line with any number of highlighted areas
  5. app/controllers/threads_controller.rb
    line 133-136, 278-285: change this with any number of highlighted areas
The files that we’ll need to change and the lines inside each one related to this to, to switch to the older version, where we download the images and store them:
  1. lib/scraper.rb
    line 4: un-comment the RMagick library
    line 29-35: un-comment this part were we open the images links and path their content to saving method
    line 37-39: delete this part
    line 118-126: un-comment this part which save the downloaded image to the disk
    line 130-136: delete this part
  2. app/controller/threads_controller.rb
    line 95-96: un-comment this part, to get the images size
    line 98-100: delete this part
    line 105-106: un-comment this part
    line 234-235, 243-244: un-comment this part
  3. app/views/threads/index.html.erb
    line 21: un-comment this to get the images from the local storage
    line 23-25: delete this part
  4. app/views/threads/new.html.erb
    line 7-10: delete this part 
  5. app/views/coding/display.html.erb
    line 135: un-comment this line
    line 137-148: delete this part
  6. app/views/coding/process_images.html.erb
    line 60, 74: un-comment this line 
    line 62-68, 76-83: delete this part
  7. app/assets/javascript/coding.js
    line 216-219: delete this part
  8. app/assets/javascript/display.js
    line 141-144: delete this part

 

Notes on the pending feature, and how they can be implemented

  • Exporting the display result as an image
    The gem we’ll need is IMGKit, and in the coding controller we’ll use this gem to convert the rendered view into image
  • Create user profile pages
    We’ll override the user controller of Devise, and add a view for the user profile pages
  • Implementing tags
    To implement the full featured tags we’ll need to use ActsAsTaggableOn

I’ll give in this post a technical overview “Coding View”, and how it works and why I’m using specific library, plugin, or even techniques.

Let’s start with the basic structure of the coding view it self;

Coding view divided mainly into two parts:
1 – The left side part which contains the list of codes and their codes, in a colored box (users decide the color of each box in initiate step), and then Newspaper info (name, publication date, image source)

2 – The right side (or the middle part, because the right side is part of the layout) which contains the images slider “Carousel”, and we have faced two options for displaying images in Coding view:

1 – The first one is to display image by image and submit each image highlighted areas values by itself, and the problem with this is the following; first even if the user coding 10 images it take time and the user will even take time to skip images and back to them later, so the main problem was the navigating between images, but this option was much simpler to handel on the front side and even on the server side, becuse we will be dealing with only one image at a time, but for scalability purpose it will be bad, and we will need to refactor a big part of the code for larg set of images

2 – The second one, which we actually using now is using a bootstrap jQuery slider plugin http://twitter.github.com/bootstrap/javascript.html#carousel, to display a large set of images with a very easy navigation way, so users can slide to any image to code first and the back again to the uncoded images, the problem with this option is it impose more complexity on how to store the highlighted area in the browser and how to submit this values to the server

Before explaining how we store highlighted areas, we should know how we generate them, which is done using imgAreaSelect jQuery plugin http://odyniec.net/projects/imgareaselect/ which is simple and easy to use.

I’ll explain now how we store highlighted areas in the browser: we are using hidden fields to store the values of highlighted areas, line 3 show hidden field with an id for instance “image3_ha1” with default value “0”, and this field is used to tell us if highlighted area number “1” is used with image or not, then line 4 which store the code id which this highlighted area is represent, by setting the number we can decide the color of the highlighted area (I’ll explain this part after this), then line 5 which stores the x1 value and so on for the following fields (for now we are using x1,y1, width, and height to draw the highlighted area only) and the same for the fields starting from line 11, the differenc is that it represent the other highlighted area

We are using just two highlighted areas to code, but we are going to make it unlimited in the next version

  1. <% @image_counter.downto(1) do |ic| %>
  2.  <div id="image<%= ic %>">
  3.   <%= hidden_field_tag "image#{ic}_ha1","0" %>
  4.   <%= hidden_field_tag "image#{ic}_ha1_code_id","0" %>
  5.   <%= hidden_field_tag "image#{ic}_ha1_x1" %>
  6.   <%= hidden_field_tag "image#{ic}_ha1_y1" %>
  7.   <%= hidden_field_tag "image#{ic}_ha1_x2" %>
  8.   <%= hidden_field_tag "image#{ic}_ha1_y2" %>
  9.   <%= hidden_field_tag "image#{ic}_ha1_width" %>
  10.   <%= hidden_field_tag "image#{ic}_ha1_height" %>
  11.   <%= hidden_field_tag "image#{ic}_ha2","0" %>
  12.   <%= hidden_field_tag "image#{ic}_ha2_code_id","0" %>
  13.   <%= hidden_field_tag "image#{ic}_ha2_x1" %>
  14.   <%= hidden_field_tag "image#{ic}_ha2_y1" %>
  15.   <%= hidden_field_tag "image#{ic}_ha2_x2" %>
  16.   <%= hidden_field_tag "image#{ic}_ha2_y2" %>
  17.   <%= hidden_field_tag "image#{ic}_ha2_width" %>
  18.   <%= hidden_field_tag "image#{ic}_ha2_height" %>
  19.  </div>
  20. <%end%>

We are using bootstrp jQuery modals plugin http://twitter.github.com/bootstrap/javascript.html#modals to allow users to select the code of a highlighted area the following snippet shows how codes colors attached to the options, and it’s important to point this part because; loading codes colors fetched from this elements, in line 3, we have added an attribute to the radio button element called “color” and sets it value with the code color, and also code_id element to store the code id, this two attributes is very important, becuase we are using them to set the highlighted areas colors

  1. <% @thread.codes.each do |code| %>
  2.   <%= radio_button_tag "codes", code.code_text, false, color: code.color, code_id: code.id %> <%= code.code_text %><br>
  3. <% end %>

Last part which is submiting buttons at center part, “Display Now” button which will direct the user to display view, “Clear Highlighted Areas” which will reset all hidden fields values and highlighted areas, “Nothing to Code” which will add a box of the images saying “Nothing to code here” (will implemented soon), last button “Cancel” which will delete the thread (will implemented soon)

We are now is so close to the first Version 0.1, which will basically give the user the following features; to be able to create an account and creating a Thread, with basic info; name, start date, end date (in the same month, just for alpha version), description, choosing and number of newspaper, and Topics  to code with it and for each topic the user can add a color and description.

Then user start to code scraped images (opened issues) with selected topics, the color of  highlighted area will be based on the topic color, maximum number highlighted areas for alpha version is two, and if the user want to add any other highlighted areas, the system will prompt the user with the option to clear the current highlighted areas or skip adding other highlighted area, after that the display, it’s not finished yet.

Here’s a UI – first drafts and Some of opened issues

Rporres was visiting Cambridge last week, and after listening to one of the online meetings, he decided to jump in the project. By night he had finished the script for grabbing all the newspapers that are available in Kiosko.net. We will need this list soon

You can check the code at https://gist.github.com/2970558 or the output (a csv file with the name, friendly url name, coauntry and country code of all the 377 newspapers) at http://brownbag.me:9001/p/pageonex-kiosko-newspaper-names. It’s written in Perl.

Ahmd had started another similar one in Ruby, but we put it on hold for the short term.

Thanks Rafa for your help!

Continue reading

I’ve made some changes to the script to scrape from different sources (http://kiosko.nethttp:/nytimes.comhttp://elpais.com) and other sources can be added easily, for each source there are two methods, build_source_issues and save_source_issues, the first method is to construct the URI of the issue image based on some pattern which different from source to another, and the other method is to scrape the images and save them on the disk in their specific folders. I’ve wrote some comments to clear some parts of the code.

Note, to scrape from specific source you should comment the others as you can see in the code, for example to scrape from New york Times you should un-comment line 15 and line 32 and comment line 14 and line 31, and also if you want to run the script on https://scraperwiki.com/ you should comment line 3 and don’t try to scrape from elpais because scraperwiki don’t have “RMagick” gem installed.

Information about sources used in the script

1. http://kiosko.net 

Date limits: there are no specific starting date, scraping starting from 2008, but most of the newspapers exist starting from 2011, the script is able to scrape from [2008-2012 ]

Image resolution: [750×1072]

2. http:/nytimes.com

Date limits: first issue available date is 2002/01/24

Image resolution: [348×640] the resolution is not enough for coding!

3. http://elpais.com

Date limits: first issue available date is 2012/03/01

Image resolution: [765×1133] the resolution of produced images can be changed!

Script on Github

https://gist.github.com/2925910/

Ahmd has been working on a scrapper in Ruby for the front Pages at Kiosko.net

I’ve finished the scraping script, and it’s public on https://gist.github.com/2925910 to run the script just pass the file to ruby [ruby scraper.rb] and it will generate the folders (the directories is set for Linux, if you are on Windows you should modify them first), download the images(you can change the variable values in the get_issues method to get different newspapers), and write the log to stdout.

Check the script below.

I’ve also contacted Newseum to see if their “only today” front page data base is avaible for PageOneX.

Script to scrape front pages images of newspapers form kiosko.net

Continue reading