Archive for the ‘development’ Category

NoSQL and Web applications

Wednesday, March 24th, 2010

If I’m asked to draw an “easy to understand” diagram about the next-generation architecture for Web-Applications, one of my sketches would look like this:

SQL And NoSQL

An obvious question about this picture is:

Why are two different Database-Systems necessary?

My artless answer is as follows:

In some crucial cases a NoSQL Database is considerably less expensive than a SQL Database, but a NoSQL Database cannot completely replace a SQL Database.

For example:

  • Try to store millions of Media Files or other large Documents on a distributed RDBMS. Each document has to be quickly accessible and updatable.
  • Try to create a very large dynamic key/value table as a replacement for an overgrown in-memory Hashmap.

Of course you can also achieve this with a relational database, but you will face the following problems:

  • RDBMS BLOBs are slow. You can compensate for this with database clustering, but the typical workaround is to store a link or file path and use the file system as data storage. In this case you always have to check for inconsistencies between the database and the file system.
  • You can create a key/value store in your SQL database, but each access carries a heavy processing overhead (transaction handling, logging and versioning).

NoSQL databases are designed and optimized to:

  • Store and access large “binary” objects very fast.
  • Replicate data quickly and separate tables (sharding) across different storages.
  • Provide very fast key/value stores.

(Easy schema modification is often cited as a NoSQL advantage, but I think this is also a trivial task with DDL. Just don’t forget to set the initial value of the new column and to enable the trigger after the table alteration.)

In contrast, there are tasks for which you should use a SQL database:

  • Data Presentation: In most cases, visual entities (for example table rows and columns) aren’t 1:1 representations of database objects. They are sorted, mapped, reduced (filtered) and often consist of joins between different tables. Map-Reduce and sorting can be done efficiently with NoSQL, but joins aren’t supported. Of course you can rewrite the JOIN operation in your application. For example, a simplified HashMap JOIN could look like this:
    // Build a lookup map from tableB, keyed by its join attribute.
    val tableBMap = tableB.map(line => line.joinAttributB -> line).toMap
    // Pair every line of tableA with its matching tableB line, if any.
    val joined = tableA.map(line => (line, tableBMap.get(line.joinAttributA)))

    The disadvantage is that you have to materialize the complete object / table line.

  • NoSQL = No Transaction. You cannot use NoSQL for reliable database transactions (reliable as in ACID). This is a serious restriction for massively parallel database updates. (Hence, you should avoid programming the popular bank account example with a NoSQL database.)
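
The application-side join from the data presentation item can be written as a self-contained sketch. The row types Author and Article and the helper leftJoin are hypothetical names chosen for illustration; they do not come from any real schema or driver. The sketch behaves like a left outer join and, as noted above, it has to hold every row of the indexed table in memory.

```scala
object HashJoinSketch {
  // Hypothetical row types, for illustration only.
  final case class Author(id: Int, name: String)
  final case class Article(title: String, authorId: Int)

  // Left outer join: every row of `left` is kept and paired with the
  // matching row of `right`, or with None if there is no match.
  def leftJoin[A, B, K](left: Seq[A], right: Seq[B])(
      leftKey: A => K,
      rightKey: B => K
  ): Seq[(A, Option[B])] = {
    // Build phase: index the right table by its join attribute.
    // If a key occurs more than once, the last row wins.
    val index: Map[K, B] = right.map(row => rightKey(row) -> row).toMap
    // Probe phase: one map lookup per left row.
    left.map(row => (row, index.get(leftKey(row))))
  }

  def main(args: Array[String]): Unit = {
    val authors  = Seq(Author(1, "Ada"), Author(2, "Linus"))
    val articles = Seq(Article("NoSQL", 1), Article("Orphan", 99))
    leftJoin(articles, authors)(_.authorId, _.id).foreach(println)
  }
}
```

Keeping only one row per key is a simplification; a join on a non-unique attribute would need a Map from key to a list of rows.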

Why I prefer MongoDB as NoSQL Database System:

  • Very fast: Sequential write and random read operations are very fast on an “average” server. (You need a 64-bit OS if your database is larger than 2 GB.)
  • Scala (Java) support: Several drivers are available: scamongo, mongo-scala-driver or akka-persistence.
  • Easy to use: There is no setup, parameter or table type “magic”. In less than an hour you can set up a secure and robust database server. The client uses JS/JSON.
  • A nice community: The forum is very active. Questions are usually answered within a couple of hours.

Addendum to my recent Post: FSM stands for “Flying Spaghetti Monster”

In my recent post I used the abbreviation FSM for “Finite State Machine”.

I was informed that FSM is the common abbreviation for the Flying Spaghetti Monster.

I searched for this term in the Kungle News document-storage and found some evidence for this claim:

Small Screendump

Here is the complete list of references:

Query:

Start Main

New Hist Defined with:

Primary: List(Flying Spaghetti Monster, Pastafarianism, Pastafarian)

Secondary in: List(FSM)

Secondary out: List()

Calculated Interval: { "publishingDate" : { "$gt" : "2010-01-10T23:00:00.000Z" , "$lte" : "2010-03-22T23:00:00.000Z"} , "originalLanguage" : "ENGLISH"}

Title | Published / Recorded | Publisher | Citation
Mississippi dips its toe into antireality | Tue Jan 19 15:00:00 CET 2010 | Discover Magazine | Finally they will spread the message of how we were all created by the Flying Spaghetti Monster
Senator Webb (D) shows fear: "Suspend All Votes On Health Care Bill Until Senator-Elect Brown Is Seated" | Thu Jan 21 00:00:28 CET 2010 | Crooks and Liars | it'll be approximately Dec 2012, slightly before she takes office, so I figure what better time for Jebus, Buddah, Allah, Ra, Flying Spaghetti Monster, the Mayans, The 'V' lizard people, the Vogons, Daleks, V'Ger, etc to arrive?
FSM protect us! | Tue Jan 26 19:40:17 CET 2010 | Discover Magazine | Some people say the Church of the Flying Spaghetti Monster was a joke made ...
Video – Who Knew He Could Be A Swedish Hero? | Mon Feb 01 16:00:11 CET 2010 | Dvorak Uncensored | I wonder if it will work with the Flying Spaghetti Monster?
Minority Contractors Receive Just 2 Percent of Highway Stimulus Cash | Thu Feb 04 20:31:08 CET 2010 | Infrastructurist | What if only 2% of all infrastructure construction companies are women or minority owned? Setting a goal of having 10% of all contract recipients be minority or women owned is about as useful as setting a goal of having 10% of all contract recipients go to Wild Hildabeest and Flying Spaghetti Monster owned businesses.
When Did Jesus Become a Republican (or, for that matter, a Democrat? | Mon Feb 15 00:44:01 CET 2010 | Care2 News | I'm a Pastafarian. To me it's the only pure religion.
Do we really need a religious bill of rights? | Mon Feb 15 15:27:14 CET 2010 | Discover Magazine | If the act passes, we need a Pastafarian as an agent provocateur.
Church of the Flying Spaghetti Monster FSM Store | Fri Feb 19 02:33:10 CET 2010 | Suite101 | Church of the Flying Spaghetti Monster FSM Store
Iraq still embracing the magic | Wed Feb 24 02:02:34 CET 2010 | Discover Magazine | No, just kidding, it’s “For Flying Spaghetti Monster’s Sake”
Miss Beverly Hills tries to one-up Carrie Prejean, says it’s divine law that gays be put to death. | Wed Feb 24 15:55:45 CET 2010 | Think Progress | The Flying Spaghetti Monster offers more in life than any pagan based worship.
Video: Republican legislator says disabled children are 'God's punishment' for abortion | Wed Feb 24 19:00:09 CET 2010 | Crooks and Liars | I think my Pastafarianism makes me less able to understand why so many people think a superior being gives a flying noodle what they do or say?
South Dakota legislators tell schools to teach ‘astrological’ explanation for global warming. | Thu Feb 25 19:49:40 CET 2010 | Think Progress | All hail the Flying Spaghetti Monster!
End of an Era: "Lasts" for Shuttle Program | Fri Feb 26 18:32:36 CET 2010 | Universe Today | Some spectacular pictures from the final SRB test. FSM-17, (that's flight support motor, not Flying Spaghetti Monster) burned for approximately 123 seconds — the same time each reusable solid rocket motor burns during an actual space shuttle launch.
Atheist Groups Visit The White House Causing A Right Wing Tizzy | Mon Mar 01 15:45:41 CET 2010 | Dvorak Uncensored | (Before the religious start jumping up and down “See, atheists ARE a religion”, the whole thing is a joke, like the Flying Spaghetti Monster)
Creationists And Climate Deniers Take On Teaching Climate Science In Schools | Thu Mar 04 17:20:14 CET 2010 | Huffingtonpost | I think we can all look forward to the time when these three theories are given equal time in our science classrooms across the country, and eventually the world; One third time for Intelligent Design, one third time for Flying Spaghetti Monsterism (Pastafarianism), and one third time for logical conjecture based on overwhelming observable evidence.
Massa Will Resign Monday | Fri Mar 05 20:43:18 CET 2010 | Talking Points Memo | There is no doubt that there is a Flying Spaghetti Monster. The question is just how it flies, and what kind of sauce it's covered in.
ARD TV drama sparks Scientology's ire | Mon Mar 08 11:34:00 CET 2010 | The Local Germany | What Would the Flying Spaghetti Monster Do?
Christian leaders urge Congress to ignore misinformation on abortion provisions and pass health reform. | Sat Mar 13 18:17:21 CET 2010 | Think Progress | Since a Pastafarian, I will say RAmen. ;)
Kreutz Comet VIDEO: WATCH Newly-Discovered Comet's Collision Course With The Sun | Sun Mar 14 15:36:13 CET 2010 | Huffingtonpost | It's the great noodly appendage of the Flying Spaghetti Monster.
To The 9th Circuit Court Of Appeals, God Is "Patriotic" And No Longer "Religious" | Sun Mar 14 16:00:51 CET 2010 | Crooks and Liars | I'd tell him to substitute Flying Spaghetti Monster where appropriate.
Boehner Claims Student Loan Reform Will ‘Eliminate Every Bank In The Country’ | Fri Mar 19 23:42:19 CET 2010 | Think Progress | The universe could have been created by a Flying Spaghetti Monster, or it could have always existed.

2; 0; 19

Topic Connections:

(fsm,1)(monster,1)(store,1)(flying,1)(church,1)(spaghetti,1)

Emotions:

(love,14)(hope,13) (+)/(-) (hate,7)(fear,7)

Public Tendency:

10; 5; 6

Country:

gb; us; no; cn; jp; in; se; au; ru; de; fr; ie; gr; nz; ca;

0; 20; 0; 0; 0; 0; 1; 0; 0; 0; 0; 0; 0; 0; 0;

Publisher Tendency:

(Discover Magazine,3)(Huffingtonpost,2)(The Local Germany,1)(Think Progress,0)(Talking Points Memo,-1)(Crooks and Liars,-3)

Calculation Done

Trend – Visualization

Monday, March 22nd, 2010

Since January, the new “Corpus Engine” has been in development and has recorded about 302,000 articles. All in all, 1,130,000 “news headlines” and summaries have been stored since Kungle.de went online.

New algorithms have now been developed to:

  • Identify the public opinion about political and economic topics.
  • Follow the image status of brands, corporations or companies.
  • Track public feelings and emotions about current events.

The Challenge

The current trend calculation, based on static dictionaries, isn’t able to identify new events like an ‘earthquake’ or a ‘political reform’. The “topic-tagging” is static and limited to 9 topics: “Science, Economy, Politic, Technology, Entertainment, Sport, Boulevard, Adult and Religion”.

It would be an exhausting task to code every new subtopic or event in an FSM (Finite State Machine).

Therefore the new engine identifies topics by itself. Not only is the trend calculated dynamically; the topic classification is “calculated” as well.

How is this done?

A simplified breakdown:

NLP (Natural Language Processing) is based on two strategies for text analysis: tagging via dictionaries and word / N-gram frequency analysis.

For example:

This is an animation of a small section of the Kungle English dictionary (about 300,000 words) since January. The column height represents the daily word count (one hour = one frame). The column color changes from green to red if the word occurred in more than 10 percent of all articles. The overall word frequency decreases from left to right.

Bigrams:

This is an even smaller section of the weighted bigram matrix (about 100,000 x 100,000 words) in the same timeframe. Although this animation is compressed, you can identify some horizontal and vertical lines. These lines occur if a topic is heavily discussed.
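
Word and bigram frequencies of this kind can be counted with a few lines of Scala. This is only a minimal sketch: the tokenizer is a naive assumption, the names NgramCounts, tokenize and ngramCounts are invented for illustration, and the weighting used for the real bigram matrix is not described here, so the sketch returns plain counts.

```scala
object NgramCounts {
  // Naive tokenizer (an assumption): lower-case the text and split on
  // anything that is not a letter or a digit.
  def tokenize(text: String): Vector[String] =
    text.toLowerCase.split("[^\\p{L}\\p{Nd}]+").filter(_.nonEmpty).toVector

  // Count how often each n-gram (n adjacent tokens) occurs.
  // n = 1 gives word frequencies, n = 2 gives bigram frequencies.
  def ngramCounts(tokens: Seq[String], n: Int): Map[Seq[String], Int] =
    tokens
      .sliding(n)
      .filter(_.size == n) // drop the short window produced by short inputs
      .toVector
      .groupBy(identity)
      .map { case (gram, hits) => gram -> hits.size }

  def main(args: Array[String]): Unit = {
    val tokens = tokenize("Flying spaghetti monster, flying spaghetti!")
    ngramCounts(tokens, 2).foreach { case (gram, count) =>
      println(s"${gram.mkString(" ")}: $count")
    }
  }
}
```

A full bigram matrix over a 100,000-word dictionary is mostly empty, which is why the sketch stores only the pairs that actually occur.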

This Week on Kungle.de: Nobel Prizes, Riots in Pakistan and the “Balloon Boy”

Friday, October 16th, 2009

Three issues with hundreds of similar news publications blocked the front page of Kungle.de. Each publication is interesting and informative by itself, but together they hide other noteworthy information.

I concluded that it was about time to build a new subsystem to reduce the amount of identical information. You can still find all articles via the new “related link”.

The new subsystem “IssueMerger” now merges news with similar content. The older the news entries are, the more likely they are to be consolidated into one issue.

For this, I defined a function to calculate the proximity of two entries. (The result is 1 if two news entries are identical and 0 if they are completely different.)

It is necessary to build a complete “News Topology” (a matrix with up to 1.5 million elements) which defines the proximities of all entry combinations.

The calculation for all topics requires up to 40 hours. The algorithm itself was coded in 80 lines of Scala.
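
The proximity function itself is not shown in this post. As a stand-in, the following sketch assumes Jaccard similarity over word sets, which shares the stated property (1 for identical entries, 0 for entries with nothing in common). All names (ProximitySketch, tokens, proximity, topology) are hypothetical, and this is not the IssueMerger code.

```scala
object ProximitySketch {
  // Naive word-set extraction (an assumption, not the original tokenizer).
  def tokens(text: String): Set[String] =
    text.toLowerCase.split("[^\\p{L}\\p{Nd}]+").filter(_.nonEmpty).toSet

  // Jaccard similarity: shared words divided by all distinct words.
  // Two empty entries are treated as identical.
  def proximity(a: String, b: String): Double = {
    val (ta, tb) = (tokens(a), tokens(b))
    val unionSize = (ta | tb).size
    if (unionSize == 0) 1.0 else (ta & tb).size.toDouble / unionSize
  }

  // The proximity matrix is symmetric, so only the upper triangle
  // (i < j) is computed and stored.
  def topology(entries: IndexedSeq[String]): Map[(Int, Int), Double] =
    (for {
      i <- entries.indices
      j <- (i + 1) until entries.size
    } yield (i, j) -> proximity(entries(i), entries(j))).toMap

  def main(args: Array[String]): Unit = {
    val news = Vector("Balloon boy found safe", "Balloon boy found at home", "Nobel Prize awarded")
    topology(news).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Comparing every pair grows quadratically with the number of entries, which fits the long running time mentioned above.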

You can find a calculated result here:
http://www.kungle.de/Trend/entry/220033

Update 1: For comparison, this merge was made by hand:

http://www.kungle.de/Trend/entry/225189

Running a Website With Scala and Lift

Wednesday, July 1st, 2009

Scala is a modern, statically typed language. Its bytecode is Java-VM compatible and the design is influenced by languages like Haskell or Erlang.

Lift is a Web Framework written in Scala. Lift applications follow the View First pattern. The View First pattern defines a complete separation of presentation and logic.

Despite my long professional experience, writing my first web application in Scala/Lift wasn’t an easy task. The combination of functional and object-oriented programming paradigms opens up a wide range of new ways to compose your source code. It may take a while to understand wiki examples or library documentation, but writing your application in Scala has crucial advantages.

  • Less code:
    The elegant Type System and the modern Control Flow Structures reduce your code size dramatically.
  • A manageable toolchain:
    Every new tool in your development environment means a new dependency in your application. Every update can cause trouble. To configure these tools you often install new plugins in your IDE, which in turn complicates the build process. Typically you stop updating your development environment at a certain project stage.
    In contrast, you can build your Lift applications with vi. In less than 100 lines of code you can create a complete web application!

The result is a fast and flexible development cycle.

You can visit my site at kungle.de. It is an application to identify relevant news. The news rating is statistically calculated by an adaptive network and updated every 30 minutes.

References:
Scala homepage: http://www.scala-lang.org/
Lift homepage: http://liftweb.net/
Introduction “View First Pattern”: http://wiki.liftweb.net/index.php/Lift_View_First

Memory-Game on Ubuntu

Sunday, April 19th, 2009

You can now play my 3D memory-game on Ubuntu/Linux.

memorygameonubuntu

Make sure you have installed Java 6 from Sun.

You can find it here: http://www.yousry.de/Memory-Game.html

Update: Memory Game

Wednesday, April 15th, 2009

A small bugfix release of the Memory-Game is available. It fixes a newly discovered VM core dump under Vista (OpenGL library conflict).

Update: Verschlüsselung (eng.: Encryption)

Saturday, April 11th, 2009

First results show processing speeds comparable to commercial or proprietary products. A backport to Java 5 / Mac was successful. The application can be used on all current desktop operating systems.

Algorithm Windows Mac
SHA1AndDESede
MD5AndTripleDES
SHA1AndRC2_40
MD5AndDES

In subsequent versions, major functional enhancements are planned. The complex user interface will be simplified.

Tasks until alpha release:

  • The user interface must be completed.
  • Software testing.

Tasks until beta release:

  • Add new encryption algorithms.
  • Add simplified user interface.
  • Add server functionality.
  • Batch processing.
  • Create API documentation.

New application in development: Aufgabenliste (eng.: Task List)

Wednesday, March 11th, 2009

My first simple JavaFX application.

New application in development: Verschlüsselung

Monday, March 9th, 2009

“Verschlüsselung” is an application for fast encryption and compression. It is a useful tool if you want to transport sensitive data on your flash drive.

verschlusselung

Memory-Game Source Code Published

Tuesday, October 14th, 2008

The Memory-Game source code is now available at: http://code.google.com/p/memory-game/. It is published under the GPLv3 license.