Setting up a new Subversion repository and Trac project

Here are the steps I use each time I need to create a new Subversion repository and Trac project:

Create Repository

> sudo svnadmin create /var/svn/MY_PROJECT_NAME


Create Trac project


> sudo trac-admin /var/www/trac/MY_PROJECT_NAME initenv


During install, you will be prompted for the repository path created above.


Set up Apache to serve Trac project


> cd /etc/apache2/sites-enabled
> sudo nano 000-default
# add each repository to apache
# also, we need to let the apache user read the directory:
# (file access happens as www-data, not as the basic-auth user)
> cd /var
> sudo chown -R www-data:svn svn
# (alternative: add www-data to svn group)
# set perms correctly on new trac directory
> cd /var/www
> sudo chown -R www-data:svn trac
# update Apache settings to enable trac authentication; as described here:
# http://trac-server-hostname/trac/MY_PROJECT_NAME/wiki/TracModPython
> cd /etc/apache2/sites-enabled
> sudo nano 000-default
> sudo /etc/init.d/apache2 restart
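
The 000-default edits themselves aren't shown above; as a rough example, the stanzas might look something like this (the paths and password file are assumptions from my setup, not the only way to do it):

# Serve the Subversion repository over WebDAV
<Location /svn/MY_PROJECT_NAME>
    DAV svn
    SVNPath /var/svn/MY_PROJECT_NAME
    AuthType Basic
    AuthName "Subversion repository"
    AuthUserFile /etc/apache2/dav_svn.passwd
    Require valid-user
</Location>

# Serve the Trac project via mod_python
<Location /trac/MY_PROJECT_NAME>
    SetHandler mod_python
    PythonInterpreter main_interpreter
    PythonHandler trac.web.modpython_frontend
    PythonOption TracEnv /var/www/trac/MY_PROJECT_NAME
    PythonOption TracUriRoot /trac/MY_PROJECT_NAME
</Location>

# Require a login for Trac
<Location /trac/MY_PROJECT_NAME/login>
    AuthType Basic
    AuthName "Trac"
    AuthUserFile /etc/apache2/dav_svn.passwd
    Require valid-user
</Location>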


See the original article that inspired this post.

Using SQL Server to analyze IIS logs

Rather than using LogParser to analyze IIS logs, you can import your log files into an instance of SQL Server Express.

First, create a table that has the same columns as your log file. Depending on what data you were capturing in your log files, you will need a different structure.

For example, on my Windows XP IIS 5.1 machine, I had the following fields:

CREATE TABLE [dbo].[iis_logtable_1] (
[time] [datetime] NULL ,
[c-ip] [varchar] (50) NULL ,
[cs-method] [varchar] (50) NULL ,
[cs-uri-stem] [varchar] (255) NULL ,
[sc-status] [int] NULL
)

For an IIS7 server instance, I had these fields captured:

CREATE TABLE [dbo].[iis_logtable_2] (
[date] [datetime] NULL,
[time] [datetime] NULL ,
[s-ip] [varchar] (50) NULL ,
[cs-method] [varchar] (50) NULL ,
[cs-uri-stem] [varchar] (255) NULL ,
[cs-uri-query] [varchar] (2048) NULL ,
[s-port] [varchar] (50) NULL ,
[cs-username] [varchar] (50) NULL ,
[c-ip] [varchar] (50) NULL ,
[cs(User-Agent)] [varchar] (2048) NULL ,
[sc-status] [int] NULL ,
[sc-substatus] [int] NULL ,
[sc-win32-status] [varchar] (255) NULL ,
[time-taken] [int] NULL
)

You may need to modify the table depending on which fields are being logged on your web server.

You could do a BULK INSERT now to pull all the data into SQL Server. However, your log file probably has comments in it, which IIS typically adds each time the server restarts. These comment lines start with a hash (#) and cause BULK INSERT to choke; if you have enough of them in your log file, the insert will fail.

You can remove the comments from your log file using the PrepWebLog utility. The command will look something like this:

preplog.exe c:\temp\iislogs\mylogfile.log > c:\temp\iislogs\mylogfile_new.log

Finally, the comment-free file can be read by SQL Server:

BULK INSERT [dbo].[iis_logtable_2]
FROM 'C:\temp\iislogs\mylogfile_new.log'
WITH (FIELDTERMINATOR = ' ', ROWTERMINATOR = '\n')
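
Once the data is loaded, it can be queried like any other table. For example, a couple of quick sanity checks against the iis_logtable_2 schema above:

-- Top 10 most-requested URLs
SELECT TOP 10 [cs-uri-stem], COUNT(*) AS hits
FROM [dbo].[iis_logtable_2]
GROUP BY [cs-uri-stem]
ORDER BY hits DESC

-- Breakdown of responses by status code
SELECT [sc-status], COUNT(*) AS total
FROM [dbo].[iis_logtable_2]
GROUP BY [sc-status]
ORDER BY total DESC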

For more info, see How To Use SQL Server to Analyze Web Logs.

Installing an Ubuntu 9.10 VMware image on ESXi host

Here’s my process for creating Ubuntu VMs on an ESXi 4.0 host.

I downloaded a pre-built Ubuntu 9.10 VMware image from thoughtpolice. I chose the amd64 version and downloaded it via BitTorrent.

This image won’t run out of the box on ESXi 4.0 – it must first be converted. For this, I needed to download the VMware vCenter Converter Standalone.

Convert the .vmx file and deploy it to the ESXi host using VMware vCenter Converter Standalone:

  • Click the “Convert Machine” button.
  • Select the source type: “VMware Workstation or other VMware virtual machine”.
  • Browse to the .vmx file.
  • For the destination, choose “VMware Infrastructure virtual machine” and enter appropriate login credentials.
  • Type the VM name.

That copies the virtual machine to the ESXi host; on my setup, it took about 15 minutes over the LAN.

Once that was all settled, I performed the following standard updates in the VMware console:

# start VM in vSphere – go to console
passwd
sudo aptitude update
sudo aptitude upgrade
sudo nano /etc/hostname # change hostname as needed
sudo /sbin/shutdown -h -P now
# set up router so VM’s MAC address is linked to a single IP
# start VM again in vSphere – go to console
sudo aptitude install openssh-server

Also, update the hosts file and reconfigure the time zone:

sudo nano /etc/hosts
sudo dpkg-reconfigure tzdata

Furthermore, I created a user as follows:

# create user
sudo useradd -d /home/myusername -m myusername
sudo passwd myusername
# show user's shell
getent passwd myusername
# change user's shell
sudo chsh -s /bin/bash myusername

To ease file transfers, I installed Samba and set up a share.
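
The share setup itself isn't shown here; a minimal share definition in /etc/samba/smb.conf might look like this (the share name and path are placeholders):

[shared]
    path = /home/myusername/shared
    browseable = yes
    read only = no
    valid users = myusername

Restart Samba after editing (sudo /etc/init.d/samba restart on this release).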

Creating a VMware 4.0 host server with ESXi

I procured a Dell PowerEdge T105 server with the following specs:

  • Dual-core 2.3GHz Athlon 4450B processor (2x512K cache)
  • 8GB DDR2 800MHz RAM (4x2GB dual-ranked DIMMs)
  • 160GB 7.2K RPM SATA 3Gbps 3.5-in cabled hard drive
  • Onboard network adapter
  • 16X internal SATA DVD-ROM

Rather than installing VMware on the (smallish) hard drive, I grabbed a 4GB thumb drive I had lying around to host the ESXi hypervisor.

This video was a decent intro to the install options for ESXi.

Steps to install:

  1. Burn the ESXi 4.0 installer ISO.
  2. Follow these instructions to install to the flash drive. In my installer, the flash drive came up as “Disk0 JetFlash Transcend 4GB”.
  3. Restart, enter the BIOS, and select the flash drive as the boot device.
  4. Hit F2 to set the root password.
  5. Connect an ethernet cable to the server, and configure the router to grant a static IP address to the hypervisor’s MAC address.
  6. Install the VMware vSphere client on a laptop.
  7. Follow these instructions to set up the ESXi license key.


All set. Next, I’ll create some virtual machines. Some project ideas:

  • Source control server
  • FreeNAS file server for photos and music
  • Web farm
  • Test for alternative web servers: NGINX, Squid, etc.

Scalable Internet Architectures: Static Content

In some of my previous posts, I discussed ways to instrument static content references so their domain is configurable. This small aspect of application-level design makes it possible to offload static content delivery to a separate infrastructure from the application pages.

Chapter 6 in Scalable Internet Architectures focuses on static content, an aspect of web applications that is often an afterthought for web developers.

Before reading this chapter, I thought of web application design in two steps: a single web farm delivering all static and dynamic content, and then, to scale beyond that, a [potentially expensive] CDN with dynamic capabilities such as Akamai Dynamic Site Accelerator. It turns out there are many interesting solutions in between these two.

The first lightbulb to go off in my head was this: Even though a web application may rely on ASP.NET and therefore Windows, you can scale its ability to serve static content using commodity hardware and free operating systems.

Secondly, an Apache-based solution is not necessarily required either. There are many other free (and simpler) web servers and caches that can be leveraged for serving static content, including thttpd and Squid.

Rewriting references to static CSS, JS, and image content, part 2

Following up on the previous post, here are a few other options to consider:

  1. Use a custom Expression Builder to generate the script and CSS references. The <%$ %> syntax works in no-compile pages; see the sketch below.
  2. Use Combres. I tried the demo and really liked it. It automatically renames scripts as they are updated, and it includes minification and compression. Unfortunately, I don’t think it will work with no-compile pages; I need to investigate further.
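
As a starting point for option 1, here is a minimal sketch of a custom expression builder that prefixes a configurable static-content domain. The class name and the StaticContentDomain appSettings key are placeholders of mine, not from either project:

using System.CodeDom;
using System.Configuration;
using System.Web.Compilation;
using System.Web.UI;

// Markup usage: <asp:Image runat="server" ImageUrl='<%$ StaticUrl:/img/logo.png %>' />
[ExpressionPrefix("StaticUrl")]
public class StaticUrlExpressionBuilder : ExpressionBuilder
{
    // No-compile pages call EvaluateExpression instead of generated code.
    public override bool SupportsEvaluate { get { return true; } }

    public override object EvaluateExpression(object target, BoundPropertyEntry entry,
        object parsedData, ExpressionBuilderContext context)
    {
        return BuildUrl(entry.Expression);
    }

    public override CodeExpression GetCodeExpression(BoundPropertyEntry entry,
        object parsedData, ExpressionBuilderContext context)
    {
        // Compiled pages get a generated call to BuildUrl with the literal path.
        return new CodeMethodInvokeExpression(
            new CodeTypeReferenceExpression(typeof(StaticUrlExpressionBuilder)),
            "BuildUrl",
            new CodePrimitiveExpression(entry.Expression.Trim()));
    }

    public static string BuildUrl(string path)
    {
        // "StaticContentDomain" is an assumed appSettings key, e.g. "http://static.example.com".
        string domain = ConfigurationManager.AppSettings["StaticContentDomain"] ?? "";
        return domain.TrimEnd('/') + "/" + path.Trim().TrimStart('/');
    }
}

The builder would also need to be registered under compilation/expressionBuilders in web.config with the matching "StaticUrl" prefix.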

Rewriting references to static CSS, JS, and image content

For static content such as CSS, JavaScript, and image files, it is a best practice to serve these items from a different domain than the main content. Even if this secondary domain initially points to the same web farm, it opens up the possibility of serving the content from other servers at a later date.

Other benefits:

  1. Static content domain can be cookie-free.
  2. Cache settings can be adjusted specifically for the static content, based on host headers.
  3. Dedicated hardware could be set up to serve the static content.

In dev and QA environments, of course, it may not be possible to point to an external production domain; the CSS/JS/etc. files may need to point to local content during dev/QA. Consequently, any solution should take into account configuration differences across environments.

A couple of solutions to consider:

  1. Instrument IMG and SCRIPT links with a method call that substitutes the domain.
    Drawback: a <%= %> block in an ASPX page means the page cannot be declared CompilationMode="Never", which is needed for CMS-generated pages.
    Will a <%$ %> block work?
  2. Create an HttpModule to comb through the page output and substitute the domain in the proper locations (see the sketch after this list).
  3. Implement an .ascx control to output SCRIPT and CSS references.
    Drawback: Will not address the issue of inline IMG tags.
  4. Use a dynamic #include file with some logic embedded.
    Drawback: same as #3. Also, a CMS user could possibly mangle the code directives in the #include. Finally, it is questionable whether this will even work reliably with ASPX.
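
Here is a rough sketch of the HttpModule approach from option 2: a response filter that rewrites root-relative static paths onto a static-content domain. The domain, path prefixes, and module name are illustrative only, and a real version would have to handle tags split across Write calls and buffered/chunked output:

using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;

public class StaticDomainModule : IHttpModule
{
    public void Init(HttpApplication app)
    {
        app.ReleaseRequestState += (sender, e) =>
        {
            HttpContext ctx = ((HttpApplication)sender).Context;
            // Only rewrite HTML responses.
            if (ctx.Response.ContentType == "text/html")
                ctx.Response.Filter = new RewriteFilter(ctx.Response.Filter);
        };
    }

    public void Dispose() { }

    // Naive filter: rewrites src/href attributes pointing at /css/, /js/, or /img/.
    private class RewriteFilter : MemoryStream
    {
        private readonly Stream _inner;
        public RewriteFilter(Stream inner) { _inner = inner; }

        public override void Write(byte[] buffer, int offset, int count)
        {
            string html = Encoding.UTF8.GetString(buffer, offset, count);
            html = Regex.Replace(html, @"(src|href)=""/(css|js|img)/",
                @"$1=""http://static.example.com/$2/");
            byte[] bytes = Encoding.UTF8.GetBytes(html);
            _inner.Write(bytes, 0, bytes.Length);
        }
    }
}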

More to come...

Blogger Backup

May be useful for backing up blogs:

http://www.codeplex.com/bloggerbackup

May also consider an import to a local WordPress instance.

Review of some IResourceProvider implementations

Problem: an ASP.NET website is connected to a CMS that permits users to modify .resx files.

Modifying .resx resource files at runtime has the unfortunate consequence of causing the App Domain to reload. This can really kill your scalability, since the HttpRuntime.Cache (among other things) is blown away when the App Domain is reloaded.

I reviewed some options for custom resource providers. First, I reviewed this article:

http://www.west-wind.com/presentations/wwDbResourceProvider/

The implementation here uses a database back-end for resource storage. With this solution, there is obviously no problem of ASP.NET monitoring the App_GlobalResources and App_LocalResources files and invoking an App Domain reload.

This implementation uses a private IDictionary member variable in each resource provider instance. The IDictionary’s keys correspond to the cultures supported in the resource data, and its values are themselves IDictionary objects that provide the key-value pairs within each culture. These key-value pairs are the resource strings.

This example is easily adapted to read from .resx files instead of a database. (Note that the resource files would need to be placed in a directory that isn’t monitored by ASP.NET for changes.)

The issues I see with this implementation are as follows:

  1. Since the resource values are cached in member variables, they don’t change when the data source changes. The example code provides a ClearResourceCache() method to clear out the member variable; however, this method is not called automatically when changes to specific resources are made.
    Depending on application requirements, this may not be problematic; it could be acceptable to wait for the next App Domain restart for the resources to update.
  2. There may be multi-threading issues with this code. Note this line of code:
    this._resourceCache[cultureName] = Resources;
    Is it worth placing a lock around the code that populates this member variable?
    UPDATE -- locking has been added to the latest code download (not reflected in the article itself).

Another implementation that I reviewed:

http://www.onpreinit.com/2009/06/updatable-aspnet-resx-resource-provider.html

This implementation is more geared to my specific problem – making modifications to .resx files without causing App Domain restarts.

In this implementation, the resources are not stored in member variables (contrary to the implementation above). Instead, they are stored in HttpRuntime.Cache. This has the benefit that a CacheDependency can monitor the .resx files for changes, making the files updatable, with changes immediately reflected in the resource provider.

Still, there are drawbacks:

  1. In this implementation, the cached object is not an IDictionary. Instead, it is an IResourceReader, obtained from the ResxResourceReader that is leveraged to read the .resx files. Consequently, every call to GetObject() loops through the IResourceReader, so in the worst case the code iterates through every resource key looking for the item. This could be a performance issue for large resource files.
  2. To get around this limitation, some additional code would need to be written to loop through the resource reader and store the results in an IDictionary object; see the sketch below.
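
For illustration, a minimal sketch of that workaround: read the .resx once into a dictionary, then cache it in HttpRuntime.Cache with a file CacheDependency so edits evict the entry. The class and cache key naming are mine; ResXResourceReader lives in the System.Windows.Forms assembly:

using System.Collections;
using System.Collections.Generic;
using System.Resources;
using System.Web;
using System.Web.Caching;

public static class ResxCache
{
    // Reads every key/value pair from a .resx file into a dictionary and
    // caches it; editing the file invalidates the cache entry.
    public static IDictionary<string, object> GetResources(string resxPath)
    {
        string cacheKey = "resx:" + resxPath;
        var resources = HttpRuntime.Cache[cacheKey] as IDictionary<string, object>;
        if (resources == null)
        {
            resources = new Dictionary<string, object>();
            using (var reader = new ResXResourceReader(resxPath))
            {
                foreach (DictionaryEntry entry in reader)
                    resources[(string)entry.Key] = entry.Value;
            }
            HttpRuntime.Cache.Insert(cacheKey, resources, new CacheDependency(resxPath));
        }
        return resources;
    }
}

As with the first implementation, concurrent first reads could race, so a lock around the load would be worth considering here too.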

Some considerations:

  • How often will resources be updated by users?
  • How quickly do resource updates need to be reflected in the web UI?

Scalable Internet Architectures, part 2

Chapter 3 covers Mission-Critical (aka Business Critical) environments.

The five “key aspects” of a Business Critical environment are:

  • High Availability
  • Monitoring
  • Handling release cycles
  • Controlling Complexity
  • Performance optimization

Monitoring

In the section on monitoring, the author mentions that SNMP (Simple Network Management Protocol) is the industry standard for monitoring commercially available products.

If a product isn’t SNMP-capable, an organization may choose to export its monitoring info via another means, or author a new Management Information Base (MIB) to expose the info over SNMP.

While SNMP provides statistics for bottom-up monitoring, the other monitors to consider are business metrics: for example, how many widgets are sold per second. These are ultimately what matter to the customer, so it is important to track them as well.

Handling Release Cycles

A key point here:

The best technical solution is not always the right solution for the business.

While it makes technical sense to have full dev, QA, UAT, and production environments, it may not be feasible for the business. For example, it may be worth the risk of downtime to skimp and run a staging environment that doesn’t match production specs.

Controlling Complexity

For me, the key takeaway from this discussion was this pair of points:

Independent architectural components added to a system complicate it linearly.
Dependent architectural components added to a system complicate it exponentially.

Is a particular component adding dependencies? This is a key question to ask when an architectural component is being considered, because ultimately the organization will need to support the resulting architecture.

Scalable Internet Architectures, part 1

A few interesting points in chapter 1:


  • The only "true" scalability is horizontal, meaning that system capacity is increased by adding more of the same hardware or software. (Scale out, not up)
  • Scaling "vertically" is just adding horsepower (storage, CPU, etc) to an existing machine.
  • One example where horizontal scaling is difficult is large ACID-compliant databases. The current practice to scale such a database is to place the service on a very powerful machine.
  • Services such as DNS, FTP, and static web content are horizontally scalable by nature.
  • Scaling down may be necessary at times: Consider a startup whose infrastructure needs to be scaled back to reduce costs.

Key attributes of an architect:

  • Seeing beyond business requirements and predicting future demands.
  • Experience

Elements to balance in architecture:

  • Cost
  • Complexity
  • Maintainability
  • Implementation latency


Chapter 2, Principles for Avoiding Failure...



The author suggests planning that a system will run on "yesterday's" commodity hardware. If your application needs the latest and greatest hardware, you are veering towards vertical scaling.



In Code Complete, McConnell describes the architect's key role in software construction as managing complexity. In this book, the author identifies "uncontrolled change" as the biggest enemy of software systems. He describes two opposing forces in software projects:


  • The business side, which wants technical innovation "on-demand" (interestingly, he notes that this is indeed available in the hardware world, but not in the realm of application development, architecture, and adaptation of business rules).
  • The technology side, which wants complete requirements for everything upfront.

Three "low-hanging fruits" for controlling change in projects:

  • Source control
  • Having a plan for each of the following (assuming configurations A and B):
    • Push A to B
    • Revert B to A
    • Bare-metal restore of A
    • Testing the A-to-B push and the B-to-A revert
  • Unit testing, preferably across the entire system


As in the conclusion of chapter 1, the ultimate asset is "Good Design," which is attained by an experienced architect who can predict the future :)

Partial tech reading list for 2010

Here are a few books that I need to get through. Some of these I've partially read; I plan to do a more focused reading in the upcoming months.

Microsoft .NET: Architecting Applications for the Enterprise
I've already taken a couple of ideas from this book, specifically regarding UML.

Code Complete, 2nd edition
I've read a bit of this one already - great book that deserves its acclaim.

Pro ASP.NET MVC Framework
Work has not given me an opportunity to use ASP.NET MVC yet -- looking forward to getting into this. It has quite a bit of TDD coverage as well.


Pro ASP.NET 3.5 in C# 2008
Hope to fill out my ASP.NET knowledge. The section on Asynchronous pages has helped a lot.

Head First Design Patterns
This has been out for a while - now I need to read it.

Head First Software Development
More of a project management focus.

Building a Web 2.0 Portal with ASP.NET 3.5
I was intrigued by this book after reading the author's posts on codeproject.com.
The appendix on perfmon counters has been a good resource for me.

Scalable Internet Architectures
More of a PHP focus.

RESTful .NET
I may try a general WCF book first before launching into this one.

Information Architecture

ASP.NET AJAX in Action
I got this one before discovering [the amazing] jQuery - now I'm thinking I should have gotten a jQuery-focused book instead.

Mind Performance Hacks

Secrets of the JavaScript Ninja
I got a pre-release of this after reading about it on John Resig's blog.

Ultra Fast ASP.NET

Beginning XML with C# 2008 - From Novice to Professional

Some solutions for posting code in blogs

GeSHi is discussed here:
http://stackoverflow.com/questions/113440/displaying-code-in-blog-posts

A few takes on the SyntaxHighlighter project, which uses JavaScript to format the code:
http://alexgorbatchev.com/wiki/SyntaxHighlighter
http://ditrans.blogspot.com/2009/03/using-syntaxhighlighter-20-on-blogger.html
http://www.hanselman.com/blog/BestCodeSyntaxHighlighterForSnippetsInYourBlog.aspx

Prettify:
http://code.google.com/p/google-code-prettify/

Some ideas for hosting JavaScript:
http://soswitcher.blogspot.com/2009/05/blogger-host-javascript-file-for-free.html
http://www.corpseofattic.com/2008/10/how-to-host-javascript-for-blogger.html
http://adsense-day.blogspot.com/2009/06/alternative-javascript-hosting-for.html

New Year's 2010

Happy New Year!