Programming Tips - Java: Split a String into English words?

Date: 2011may30
Update: 2025oct13
Language: Java

Q. Java: Split a String into English words?

A. Use BreakIterator like this:
import java.text.BreakIterator;
import java.util.ArrayList;

class Demo {
    static ArrayList<String> splitIntoWords(final String s) {
        ArrayList<String> out = new ArrayList<>();
        BreakIterator wordBreaker = BreakIterator.getWordInstance();
        wordBreaker.setText(s);
        int end = 0;
        for (int start = wordBreaker.first();
                (end = wordBreaker.next()) != BreakIterator.DONE;
                start = end) {
            final String word = s.substring(start, end);
            // The so-called word includes spaces
            final String trimmedWord = word.trim();
            out.add(trimmedWord);
        }
        return out;
    }

    public static final void main(String[] args) {
        var words = splitIntoWords("hello, world how are you?");
        for (String word : words) {
            System.out.println("word=" + word);
        }
    }
}
Output:
word=hello
word=,
word=
word=world
word=
word=how
word=
word=are
word=
word=you
word=?
As you can see, BreakIterator returns each run of whitespace and each punctuation mark as its own "word". After trim(), the whitespace entries show up as empty strings. This is actually useful: the extra entries are there if you want them, and you can easily discard them if you don't.
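If you only want the actual words, one way to discard the rest is to keep just the tokens that contain at least one letter or digit. The following is a minimal sketch, not the only way to do it; the class name WordsOnly and the method wordsOnly are made up for this example, and the letter-or-digit test is an assumption about what should count as a word.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

class WordsOnly {
    // Hypothetical helper: returns only the tokens that contain
    // at least one letter or digit, dropping whitespace and
    // punctuation tokens.
    static List<String> wordsOnly(final String s) {
        List<String> out = new ArrayList<>();
        BreakIterator wordBreaker = BreakIterator.getWordInstance();
        wordBreaker.setText(s);
        int start = wordBreaker.first();
        for (int end = wordBreaker.next();
                end != BreakIterator.DONE;
                start = end, end = wordBreaker.next()) {
            final String token = s.substring(start, end);
            // Assumption: a "word" is any token with a letter or digit in it
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String word : wordsOnly("hello, world how are you?")) {
            System.out.println("word=" + word);
        }
    }
}
```

With the same input as above, this should print only hello, world, how, are and you. If you want to keep punctuation but drop the whitespace, test for an empty trimmed token instead.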