Programming with Google App Engine

Google App Engine uses Python. I am still learning Python while programming with GAE! In this post I will note down the steps for trying my first program with Google App Engine.

Prerequisite

You must download the Google App Engine development kit (SDK) from
http://code.google.com/appengine/downloads.html.
The SDK is available for Windows, Mac OS X, and Linux. It runs on Python 2.5.

What is included in the kit?

The kit is pretty comprehensive and includes:

  • A web server application that simulates the App Engine environment
  • A local copy of the datastore
  • A local version of Google Accounts, plus the ability to fetch URLs and send email from your computer using the APIs

CLIs from SDK

Look for the following two command-line tools once you have downloaded the SDK:

  • dev_appserver.py: the development web server
  • appcfg.py: used to upload your app to App Engine

The app I am going to create is the well-known “Hello World”, but I promise (to myself) that I will create a better app next time to try out some advanced APIs (maybe an app for Bulb and Tube ;)

So here it goes.

Create hello.py and put the following code in it:

print 'Content-Type: text/plain'
print ''
print 'Hello World!'

Next, you need a configuration file. Create a file called app.yaml in the same directory with the following contents:

application: san_hello_world
version: 1.0
runtime: python
api_version: 1

handlers:
- url: /.*
  script: hello.py

Because the configuration file maps every URL to the handler script, the application is done. Trust me! That’s it.
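To make the shape of the handler's output explicit, here is a small sketch that assembles the same header line, blank line and body that hello.py prints. It is written for a current Python 3 interpreter rather than the SDK's Python 2.5, and the helper name build_cgi_response is my own, not part of the SDK.

```python
def build_cgi_response(body, content_type="text/plain"):
    """Return a CGI-style response: header line, blank line, then body."""
    lines = [
        "Content-Type: " + content_type,  # the only header we send
        "",                               # a blank line ends the headers
        body,                             # everything after it is the page
    ]
    return "\n".join(lines) + "\n"
```

Calling build_cgi_response("Hello World!") gives the same three lines that the print statements in hello.py write to standard output.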

Now you can test the app with the web server included with the App Engine SDK.

Start the web server with the following command:

google_appengine/dev_appserver.py hello/

(Here hello is the folder containing the two files above.)

Open http://localhost:8080/ in your browser.

You can continue to modify the files; the web server notices the changes and serves the updated app (of course, once you refresh the browser).

Once you have developed and tested your app, it's time to register it with Google. After all, the main intention of this app is to run in the cloud and not locally on my computer. :)

If this is your first time, you need to verify your account by providing a cell phone number to which Google can text an authentication code.

The next step is to register the application ID for your application at:

https://appengine.google.com/start/createapp

Once the registration is complete, your application will be reachable at

http://application-id.appspot.com.

You are almost done!

To upload your finished application to Google App Engine, run the following command:

appcfg.py update hello

Enter your Google user name and password at the prompts. Now you can see your application on App Engine and all you need to do is open up a web browser and enter

http://application-id.appspot.com.

Note: You can create 10 applications per Google account.

I would love to hear your comments. I will explore some more GAE APIs and am also going to try my hand at Amazon Web Services.

Here is something important about Google App Engine. When you start learning a new technology, it is a good idea to go through its architecture and understand how it works.

[Figure: Architecture of Google App Engine]

libcloud

libcloud is a standard client library for many popular cloud providers, written in Python.

libcloud is a pure Python client library for interacting with many of the popular cloud server providers. It was created to make it easy for developers to build products that work across any of the services it supports. libcloud was originally created by the folks over at Cloudkick, but has since grown into an independent free software project licensed under the Apache License (2.0).

The table shows the list of cloud vendors supported by libcloud at various stages.

 

[Table: cloud providers supported by libcloud]

Here is a sample program using libcloud to create a node:

# These import paths are the ones from early libcloud releases; later
# releases moved them to libcloud.compute.types and libcloud.compute.providers.
from libcloud.types import Provider
from libcloud.providers import get_driver

RACKSPACE_USER = 'your username'
RACKSPACE_KEY = 'your key'

Driver = get_driver(Provider.RACKSPACE)
conn = Driver(RACKSPACE_USER, RACKSPACE_KEY)

# retrieve available images and sizes
images = conn.list_images()  # list of NodeImage objects
sizes = conn.list_sizes()    # list of NodeSize objects

# create a node with the first image and first size
node = conn.create_node(name='test', image=images[0], size=sizes[0])
# node is a Node object describing the new server

How does Google App Engine handle requests?

App Engine is meant for applications that react quickly to requests. An app is expected to respond to a web request within hundreds of milliseconds. A request can be as simple as getting a chunk of data from the datastore or contacting a remote server. Let’s walk through the inside of Google App Engine and see what exactly happens when a request comes in:

- As soon as a user interacts with an app hosted on Google App Engine, the request is sent from the browser to App Engine. The first stop is the “frontend”. It is a load balancer that distributes incoming requests efficiently across multiple nodes. It keeps a mapping, so it figures out which app the request is for and then consults the corresponding configuration file.

- The configuration tells the frontend how to treat a request based on its URL. (This is similar to what we call DOCROOT.)

- If the request does not match any entry in the configuration, a 404 is returned.

- If it matches a static file, control is transferred to the static file server. This server is dedicated to serving static files and is optimized for fast delivery.

- If it matches a pattern mapped to one of the request handlers, the frontend sends the request to an app server. The server invokes the app by calling the request handler that corresponds to the URL path of the request, according to the app configuration.

- Finally, it waits for the response.
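The steps above can be sketched as a tiny dispatcher. This is only an illustration of the decision the frontend makes, not how App Engine is implemented: the HANDLERS table, the route function and the labels it returns are all made up for the example, and it uses Python's re module on a current Python 3 interpreter.

```python
import re

# Hypothetical configuration: (URL pattern, kind, target), checked in order.
HANDLERS = [
    (r"/css/.*", "static", "css"),
    (r"/profile/.*", "script", "userprofile.py"),
    (r"/", "script", "main.py"),
]


def route(path, handlers=HANDLERS):
    """Return (destination, target) for a request path."""
    for pattern, kind, target in handlers:
        # A pattern has to match the whole path, hence the trailing "$".
        if re.match(pattern + "$", path):
            if kind == "static":
                return ("static", target)   # goes to the static file server
            return ("app", target)          # goes to an app server
    return ("not_found", None)              # no entry matched: 404
```

With this table, /css/site.css is sent to the static file server, /profile/42 and / are sent to an app server, and any other path ends up as a 404.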

You can optionally configure the frontend to authenticate users with Google Accounts.

Every app in Google App Engine runs in a capsule called the “runtime environment”. It is basically an abstraction above the actual nodes (operating system and hardware).
This runtime environment is also called the “sandbox”. The sandbox gives every app a fenced environment in which it runs as if it had exclusive access to the underlying hardware and resources. Because of this sandbox, apps developed on Google App Engine do not have access to the filesystem or network ports of the hardware they run on.

Something every app developer should remember all the time! :)

So what can an app do if it cannot access the filesystem or network ports? App Engine provides services to perform various tasks. For example, to fetch URLs, an app can make HTTP requests to remote machines through a service, the same one used by Google’s great apps like Gmail, Picasa, etc.

There is no streaming implemented, so the request handler prepares the whole response, returns it and then terminates. Only then does App Engine send that response to the client. Once all data is sent, the request terminates.
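A rough way to picture this "buffer first, send later" behaviour is the sketch below. The names hello_handler and serve are hypothetical, and the send callback stands in for whatever actually delivers bytes to the client; the only point is that nothing is sent until the handler has returned.

```python
import io


def hello_handler(out):
    """A handler writes its whole response into the buffer it is given."""
    out.write("Content-Type: text/plain\n\n")
    out.write("Hello ")
    out.write("World!\n")


def serve(handler, send):
    """Run the handler to completion, then pass the full response to send()."""
    buffer = io.StringIO()
    handler(buffer)            # nothing reaches the client during this call
    send(buffer.getvalue())    # the complete response goes out in one piece
```

Even though the handler writes three times, send() is called exactly once, with the finished response.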

The controlling authority for the frontends, app servers and static file servers is called the “app master”. One of the tasks of the app master is deploying new versions of application software and configuration.

OK, enough theory for now. If you really have read up to this point, I promise I will put some real meat in the next post. After my first Hello World program I haven’t tried any other program. I will try some more advanced programs and let you know how it goes.

Bye!

Request Handlers in Google App Engine

The app configuration tells the frontend what to do with each request, routing it to either the application servers or the static file servers. The destination is determined by the URL path of the request. For instance, an app might send all requests whose URL paths start with /images/ to the static file server, and all requests for the site’s home page (the path /) to the app servers. The configuration specifies a list of patterns that match URL paths, with instructions for each pattern.

For requests intended for the app servers, the configuration also specifies the request handler responsible for specific URL paths. A request handler is an entry point into the application code. In Python, a request handler is a script of Python code. In Java, a request handler is a servlet class. Each runtime environment has its own interface for invoking the application.

Request handlers in Python

All URL paths for Python apps are described in the app.yaml file using the handlers element. The value of this element is a sequence of mappings, where each item includes a pattern that matches a set of URL paths and instructions on how to handle requests for those paths. Here is an example with four URL patterns:

handlers:
- url: /profile/.*
  script: userprofile.py
- url: /css
  static_dir: css
- url: /info/(.*\.xml)
  static_files: /datafiles/\1
- url: /.*
  script: main.py

The url element in a handler description is a regular expression that matches URL paths. Every path begins with a forward slash (/), so a pattern can match the beginning of a path by also starting with this character. This URL pattern matches all paths:

url: /.*

If you are new to regular expressions, here is the briefest of tutorials: the . character matches any single character, and the * character says the previous symbol, in this case any character, can occur zero or more times. There are several other characters with special status in regular expressions. All other characters, like /, match literally. So this pattern matches any URL that begins with a / followed by zero or more of any character. If a special character is preceded by a backslash (\), it is treated as a literal character in the pattern. Here is a pattern that matches the exact path /home.html:

url: /home\.html

See the Python documentation for the re module for an excellent introduction to regular expressions. The actual regular expression engine used for URL patterns is not Python’s, but it’s similar.

App Engine attempts to match the URL path of a request to each handler pattern in the order the handlers appear in the configuration file. The first pattern that matches determines the handler to use. If you use the catchall pattern /.*, make sure it’s the last one in the list, since a later pattern will never match.
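Both points, escaping the dot and keeping the catchall last, can be tried out with Python's own re module. Since the engine App Engine uses is only similar to Python's, treat this as an approximation; first_match, good_order and bad_order are names invented for the example.

```python
import re


def first_match(path, patterns):
    """Return the first pattern that matches the whole path, or None."""
    for pattern in patterns:
        if re.match(pattern + "$", path):
            return pattern
    return None


# Catchall last: the more specific pattern gets its chance first.
good_order = [r"/profile/.*", r"/.*"]

# Catchall first: the later pattern can never be reached.
bad_order = [r"/.*", r"/profile/.*"]
```

For the path /profile/bob, first_match picks /profile/.* from good_order but /.* from bad_order. Likewise, the unescaped pattern /home.html also matches /homeXhtml, while /home\.html matches only /home.html.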

To map a URL path pattern to application code, you provide a script element. The value is the path to a Python source file, relative to the application root directory. If the frontend gets a request whose path matches a script handler, it routes the request to an application server to invoke the script and produce the response. In the previous example, the following handler definition routes all URL paths that begin with /profile/ to a script named userprofile.py:

- url: /profile/.*
  script: userprofile.py

 

Things App Engine Doesn’t Do…Yet

An app can accept web requests on a custom domain using Google Apps. Google Apps maps a subdomain of your custom domain to an app, and this subdomain can be www if you choose. This does not yet support requests for “naked” domains, such as http://example.com/. It also does not support arbitrary tertiary domains on custom domains (http://foo.www.example.com). App Engine does support arbitrary subdomains on appspot.com URLs, such as foo.app-id.appspot.com.

App Engine does not support streaming or long-term connections. If the client supports it, the app can use XMPP and an XMPP service (such as Google Talk) to deliver state updates to the client. You could also do this using a polling technique, where the client asks the application for updates on a regular basis, but polling is difficult to scale (5,000 simultaneous users polling every 5 seconds = 1,000 queries per second), and is not appropriate for all applications. Also note that request handlers cannot communicate with the client while performing other calculations. The server sends a response to the client’s request only after the handler has returned control to the server.
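The polling figure quoted above is simple arithmetic, written out here as a one-line function (polling_qps is just an illustrative name, and it assumes polls are spread evenly over the interval):

```python
def polling_qps(users, interval_seconds):
    """Average queries per second when every user polls once per interval."""
    return users / float(interval_seconds)
```

polling_qps(5000, 5) gives 1000.0, matching the number in the text; halving the interval to 2.5 seconds doubles the load to 2000.0.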

App Engine only supports web requests via HTTP or HTTPS, and email and XMPP messages via the services. It does not support other kinds of network connections. For instance, a client cannot connect to an App Engine application via FTP.

The App Engine datastore does not support full-text search queries, such as for implementing a search engine for a content management system. Long text values are not indexed, and short text values are only indexed for equality and inequality queries. It is possible to implement text search by building search indexes within the application, but this is difficult to do in a scalable way for large amounts of dynamic data.
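To give an idea of what "building search indexes within the application" means, here is a minimal in-memory inverted index. It is only a sketch: it keeps everything in a Python dictionary instead of the datastore, ignores the scaling problem mentioned above, and the function names are my own.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Every new or changed document means updating the entry for each of its words, which is where the difficulty with large amounts of dynamic data comes from.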

Things Google App Engine doesn’t do

Before going on and learning more about GAE, it is better to set expectations and know in advance what we will not be able to accomplish with GAE due to its limitations or design.

  1. GAE does not support secure connections to custom domains. An app can use the URL Fetch service to make an HTTPS request, but GAE does not verify the remote server’s certificate
  2. GAE can receive email and XMPP chat messages, but not on custom domains
  3. GAE does not support naked domains such as “http://mydomain.com”; it does, however, support “www.mydomain.com”
  4. GAE does not support long-term connections or streaming
  5. Arbitrary tertiary domains on custom domains, such as foo.www.mydomain.com, are not supported yet
  6. GAE does not support long-running background processes
  7. GAE only supports HTTP/HTTPS, email and XMPP
App Engine supports secure connections (HTTPS) to .appspot.com subdomains, but does not yet support secure connections to custom domains. Google Accounts signins always use secure connections.

An application can use the URL Fetch service to make an HTTPS request to another site, but App Engine does not verify the certificate used on the remote server.

An app can receive incoming email and XMPP chat messages at several addresses. As of this writing, none of these addresses can use a custom domain name.

 

Management Console of Google App Engine

Before learning the details of the components used when developing applications for Google App Engine, it is important to learn the dashboard used to manage the apps we develop.
We will manage our apps (test as well as production) using the Google App Engine Administration Console. It can be accessed at:

https://appengine.google.com/

The very first thing you see once you log in with your Google account is the dashboard (see the figure).
It summarizes the current status of your applications, including traffic, load, resource usage and errors.

We can view charts for the request rate, the amount of time spent on each request, error rates, bandwidth and CPU usage, and whether the application is hitting its resource limits.
As you start testing apps, you should see a spike in the requests-per-second chart. (This is a simple test to get an idea of how the administration console works.)

From this console, we can examine how the app is using resources, browse the application’s request and message logs, query the datastore and check the status of its indexes. So it is just like cPanel or Plesk for web hosting services.

We can even manage multiple versions of an app. As we enhance apps over time and bump up versions, we can test a new version before making it “live”. The console is multi-user, just like Google Analytics: one can invite other people with a specific role, such as developer or admin, allowing them to access the console, upload files, etc.

Finally, when our app becomes popular (isn’t that the main idea? ;) and it’s ready to take on large amounts of traffic, we can establish a billing account, set a daily budget and monitor expenses. Something I wish will happen to my apps sometime in the future!

Security in Google App Engine

Secure connections in Python

To configure a URL handler in a Python application to accept secure connections, add a secure element to the handler’s properties in the app.yaml file:

handlers:
- url: /profile/.*
  script: userprofile.py
  secure: always

The value of the secure element can be either always, never, or optional. If you don’t specify a secure element for a URL path, the default is optional.

always says that requests to this URL path should always use a secure connection. If a user attempts to request the URL path over a nonsecure connection, the App Engine frontend issues an HTTP redirect code telling it to try again using a secure HTTP connection. Browsers follow this redirect automatically.

never says that requests to this URL path should never use a secure connection, and requests for an HTTPS URL should be redirected to the HTTP equivalent. Note that browsers often display a warning when a user follows a link from a secure page to a nonsecure page.

optional allows either connection method for the URL path, without redirects. The app can use the HTTPS environment variable to determine which method was used for the request, and produce a custom response.
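The three settings boil down to a small decision table, sketched below. frontend_action and is_secure_request are hypothetical helper names, and the assumption that the HTTPS variable holds the string "on" for secure requests is mine; check the value your runtime actually provides.

```python
def frontend_action(secure, scheme):
    """Decide what the frontend does for a handler's secure setting.

    secure is "always", "never" or "optional"; scheme is "http" or "https".
    """
    if secure == "always" and scheme == "http":
        return "redirect to https"
    if secure == "never" and scheme == "https":
        return "redirect to http"
    return "serve"   # the request is served over the scheme it arrived on


def is_secure_request(environ):
    """Check the HTTPS variable (assumed here to be "on" for secure requests)."""
    return environ.get("HTTPS", "off").lower() == "on"
```

In a handler configured with secure: optional, the app could call is_secure_request(os.environ) and produce a different response for each connection method.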

Authorization with Google Accounts

An App Engine application can integrate with Google Accounts to identify and authenticate users. An app can use library calls to check whether the user making a request is signed in, access the user’s email address, and calculate the sign-in and sign-out URLs of the Google Accounts system. With this API, application code can perform fine-grained access control and customize displays.

Another way to do access control is to leave it to the frontend. With just a little configuration, you can instruct the frontend to protect access to specific URL handlers such that only signed-in users can request them. If a user who is not signed in requests such a URL, the frontend redirects the user to the Google Accounts sign-in and registration form.

Authorization in Python

For a Python app, you establish frontend access control for a URL handler with the login element in app.yaml, like so:

handlers:
- url: /myaccount/.*
  script: account.py
  login: required

The login element has two possible values: required and admin.

If login is required, then the user must be signed in to access URLs for this handler. If the user is not signed in, the frontend returns an HTTP redirect code to send the user to the Google Accounts sign-in and registration form.

If login is admin, then the user must be signed in and must be a registered developer for the application.

If no login is provided, the default policy is to allow anyone to access the resource, whether or not the client represents a signed-in user, and whether or not the app is set to use a members-only access policy. You can use the login element with both script handlers and static file handlers.
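The login rules can be summarized in a short function. This is a sketch of the policy as described here, not of the frontend's real code: access_decision is a made-up name, the "deny" label is a placeholder because the text does not say exactly what a signed-in non-developer receives, and sending a signed-out user to the sign-in form for login: admin is my assumption.

```python
def access_decision(login, signed_in, is_developer):
    """Return the frontend's action for a handler's login setting.

    login is None (element omitted), "required" or "admin".
    """
    if login is None:
        return "serve"                  # default: anyone may access
    if not signed_in:
        return "redirect to sign-in"    # assumed for "admin" as well
    if login == "admin" and not is_developer:
        return "deny"                   # placeholder: exact response not stated
    return "serve"
```

For example, access_decision("required", False, False) asks for a redirect to the sign-in form, while access_decision("admin", True, True) serves the request.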

 

Three components of Google App Engine

  • The runtime environment
  • The datastore
  • The services

The Runtime Environment

The “engine” part of GAE is whatever happens between a browser requesting a web page over HTTP from the server (in this case, Google’s web servers) and the requested page being served back to the browser (the client). From an application’s point of view, the runtime environment comes into existence when a request handler begins and disappears when it ends. Unlike your own web server hosted on a standalone server (or a hosted service like GoDaddy), GAE is able to distribute traffic among multiple servers, so that every incoming request gets its fair share of attention. The runtime environment completely abstracts computation on the server and storage on the filesystem away from the application. Google provides two such runtimes, one for Python and another for Java.

The Datastore

Every app needs storage while doing any computation; reading from and writing to storage is unavoidable. For traditional websites (on shared or dedicated hosting) this would be a relational database. GAE provides the datastore for the same reason. The only difference is that it is an object database (repository) instead of a relational database. The App Engine datastore is an abstraction that allows App Engine to handle the details of distributing and scaling the application, so that you (the user) can focus on building content while Google takes care of the infrastructure.

The Services

The link between the datastore and the runtime is what constitutes “services”. The datastore’s relationship with the runtime environment is that of a service: the application uses an API to access a separate system that manages all of its own scaling needs separately from the runtime environment. Google App Engine includes several other self-scaling services useful for web applications.

The memory cache (or memcache) service is a short-term key-value storage service. Its main advantage over the datastore is that it is fast, much faster than the datastore for simple storage and retrieval. The memcache stores values in memory instead of on disk for faster access. It is distributed like the datastore, so every request sees the same set of keys and values. However, it is not persistent like the datastore: if a server goes down, such as during a power failure, memory is erased. It also has a more limited sense of atomicity and transactionality than the datastore. As the name implies, the memcache service is best used as a cache for the results of frequently performed queries or calculations. The application checks for a cached value, and if the value isn’t there, it performs the query or calculation and stores the value in the cache for future use.
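The check-then-compute-then-store pattern described above looks roughly like this. To keep the sketch runnable anywhere, it uses FakeMemcache, a made-up in-memory stand-in with get() and set() methods, instead of the real memcache service; the cached helper is likewise only illustrative.

```python
class FakeMemcache(object):
    """In-memory stand-in for a key-value cache with get() and set()."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)   # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value


def cached(cache, key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    value = cache.get(key)
    if value is None:
        value = compute()            # the expensive query or calculation
        cache.set(key, value)        # remember it for future requests
    return value
```

The first call for a key runs compute(); later calls for the same key return the stored value without running it again. Because the cache is not persistent, the code must always be prepared for a miss.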

Google Accounts

Applications built on GAE get access to some of the popular services from Google, such as Gmail, Google Docs, Google Calendar, etc.
One can use Google Accounts as the authentication mechanism for one’s website.