XNSIO
  About   Slides   Home  

 
Managed Chaos
Naresh Jain's Random Thoughts on Software Development and Adventure Sports
     
`
 
RSS Feed
Recent Thoughts
Tags
Recent Comments

Pharma Hack: Spammy Links visible to only Search Engine Bots in WordPress, CMS Made Simple and TikiWiki

Sunday, April 10th, 2011

Over the last 6 months, I’ve been blessed with various pharma hacks on almost all my site.

(http://agilefaqs.com, http://agileindia.org, http://sdtconf.com, http://freesetglobal.com, http://agilecoachcamp.org, to name a few.)

This is one of the most clever hacks I’ve seen. As a normal user, if you visit the site, you won’t see any difference. Except when search engine bots visit the page, the page shows up with a whole bunch of spammy links, either at the top of the page or in the footer. Sample below:

Clearly the hacker is after search engine ranking via backlinks. But in the process suddenly you’ve become a major pharma pimp.

There are many interesting things about this hack:

  • 1. It affects all php sites. WordPress tops the list. Others like CMS Made Simple and TikiWiki are also attacked by this hack.
  • 2. If you search for pharma keywords on your server (both files and database) you won’t find anything. The spammy content is first encoded with MIME base64 and then deflated using gzdeflate. And at run time the content is eval’ed in PHP.

This is how the hacked PHP code looks like:

If you inflate and decode this code it looks like:

  • 3. Well documented and mostly self descriptive code.
  • 4. Different PHP frameworks have been hacked using slightly different approach:
    • In WordPress, the hackers created a new file called wp-login.php inside the wp-includes folder containing some spammy code. They then modified the wp-config.php file to include(‘wp-includes/wp-login.php’). Inside the wp-login.php code they further include actually spammy links from a folder inside wp-content/themes/mytheme/images/out/’.$dir’
    • In TikiWiki, the hackers modified the /lib/structures/structlib.php to directly include the spammy code
    • In CMS Made Simple, the hackers created a new file called modules/mod-last_visitor.php to directly include the spammy code.
      Again the interesting part here is, when you do ls -al you see: 

      -rwxr-xr-x 1 username groupname 1551 2008-07-10 06:46 mod-last_tracker_items.php

      -rwxr-xr-x 1 username groupname 44357 1969-12-31 16:00 mod-last_visitor.php

      -rwxr-xr-x 1 username groupname 668 2008-03-30 13:06 mod-last_visitors.php

      In case of WordPress the newly created file had the same time stamp as the rest of the files in that folder

How do you find out if your site is hacked?

  • 1. After searching for your site in Google, check if the Cached version of your site contains anything unexpected.

  • 2. Using User Agent Switcher, a Firefox extension, you can view your site as it appears to Search Engine bot. Again look for anything suspicious.

  • 3. Set up Google Alerts on your site to get notification when something you don’t expect to show up on your site, shows up.

  • 4. Set up a cron job on your server to run the following commands at the top-level web directory every night and email you the results:
    • mysqldump your_db into a file and run
    • find . | xargs grep “eval(gzinflate(base64_decode(“

If the grep command finds a match, take the encoded content and check what it means using the following site: http://www.tareeinternet.com/scripts/decrypt.php

If it looks suspicious, clean up the file and all its references.

Also there are many other blogs explaining similar, but different attacks:

Hope you don’t have to deal with this mess.

Basics of making Webpages Search Engine Friendly

Tuesday, January 25th, 2011

I’m just learning the basics of how to make webpages easily searchable. Search Engine Optimization (SEO) is a vast topics, in this blog, I won’t even touch the surface.

Following are some simple things I learned today that are considered to be some basic, website hygiene stuff:

  • Titles: The title of a web page appears as a clickable link in search results and bookmarks. A descriptive, compelling page title with relevant keywords can increase the number of people visiting your site. Search engines view the text of the title tag as a strong indication of what the page is about. Accurate keywords in the title tag can help the page rank better in search results. A title tag should have fewer than 70 characters, including spaces. Major search engines won’t display more than that.
  • Description Meta-tags: The description meta-tag should tell searchers what a web page is about. It is often displayed below the title in search results, and helps people decide if they want to visit that website. Search engines will read 200 to 250 characters, but usually display only 150, including spaces. The first 150 characters of the meta description should contain the most important keywords for that web page.
  • H1 Heading: The H1 heading is an important sentence or phrase on a web page that quickly and clearly tells people and search engines what they can expect to find there. The H1 heading for a page should be different from its title. Each can target different important keywords for better SEO.
  • Outbound Links: Outbound links tell search engines which websites you find valuable and relevant. Including links to relevant sites is good for your website’s standing with search engines. Outbound links also help search engines classify your site in relationship to others.
  • Inbound Links: More number of website linking to your site is always better. Most search engines look at the reputation of the sites linking to your site. They also consider the anchor text (keywords) used to link to your site.
    • Self Links: Link back to your archives frequently when creating new content. Make sure your webpages are all well connected with proper anchor text (keywords) used to link back.
  • Create a sitemap: A site map (or sitemap) is a list of pages of you web site accessible to crawlers or users. The fewer clicks necessary to get to a page on your website, the better.
  • Pretty URLs: Easy to understand URLs, esp. the ones that contain the correct keywords are more search engine friendly compared to cryptic URLs with many request parameters. Favor mysite.com/ablum/track/page over mysite.com/process?albumname=album&trackname=track&page=name
  • Avoid non-Linkable Content: Some things might look pretty, but it might not good from SEO point of view. For example some flash based content or some javascript based content to which you can’t link.
  • Image descriptions: AKA alt text – is the best way to describe images to search engines and to visitors using screen readers. Describing images on a web page with alt text can help the page rank higher in search results if you include important and relevant keywords.
  • Keywords Meta-tag: Search engines don’t use the keyword meta-tag to determine what the page is about. Search engines detect keywords by looking at how often each word or phrase occurs on the page, and where it occurs. The words that appear most often and prominently are judged to be keywords. If the meta keywords and detected keywords match, that means the desired keywords appear frequently enough, and in the right places.
  • First 250 words: The first 250 words of on a web page are the most important. They tell people and search engines what the page is about. The two to three most important keywords for any web page should appear about five times each in the first 250 words of web page copy. They should appear two to three times each for every additional 250 words on the page.
  • Robots.txt file: A website’s robots.txt file is used to let search engines know which pages or sections of the site shouldn’t be indexed.
  • Canonical URL: A canonical URL is the standard URL for a web page. Because there are many ways a URL can be written, it’s possible for the same web page content to live at several different addresses, or URLs. This becomes a problem when you’re trying to enhance the visibility of a web page in search results. One factor that makes a web page rank higher in search results is the number and quality of other websites that link to it. If a web page is useful enough that lots of people create links to it, you don’t want to dilute the value of those links by having them spread across two or more URLs. Use a 301 redirect on any other version of that web page to get people – and search engines – to the standard version. Some common mistake people do:
    • Leave both www.mysite.com and mysite.com in place.
    • Leave default documents directly accessible. (mysite.com/ and mysite.com/index.html) More details: Twin Home Pages: Classic SEO Mistake
  • Web Presence: Having as much information and links about your website on the web as possible is key. Let it me other people’s website, news sharing and community sites, various social media sites or any other site which many people refer to. Alexa and Compete are two companies which give you a pretty good analysis of your web presence.
  • Fresh Content:  The best sites for users, and consequently for search engines, are full of often-updated, useful information about a given service, product, topic or discipline. Social media distribution via Blogs, Microblog (Twitter), Discussion forums, User Comments, etc. are great in this regard.

Big thanks to AboutUs.org for helping me understand these basic concepts.

    Licensed under
Creative Commons License